Terminology The term
syntactic predicate was coined by Parr & Quong and differentiates this form of predicate from
semantic predicates (also discussed). Syntactic predicates have been called
multi-step matching,
parse constraints, and simply
predicates in various literature. (See References section below.) This article uses the term
syntactic predicate throughout for consistency and to distinguish them from
semantic predicates.
Formal closure properties Bar-Hillel et al. show that the intersection of two
regular languages is also a regular language, which is to say that the regular languages are
closed under
intersection. The intersection of a
regular language and a
context-free language is also closed, and it has been known at least since Hartmanis that the intersection of two context-free languages is not necessarily a context-free language (and is thus not closed). This can be demonstrated easily using the canonical
Type 1 language, L = \{ a^n b^n c^n : n \ge 1 \} : Let L_1 = \{ a^m b^n c^n : m,n \ge 1 \} (Type 2) Let L_2 = \{ a^n b^n c^m : m,n \ge 1 \} (Type 2) Let L_3 = L_1 \cap L_2 Given the
strings '
, ', and '
, it is clear that the only string that belongs to both L1 and
L2 (that is, the only one that produces a non-empty intersection) is '.
Other considerations In most formalisms that use syntactic predicates, the syntax of the predicate is
noncommutative, which is to say that the operation of predication is ordered. For instance, using the above example, consider the following pseudo-grammar, where
X ::= Y PRED Z is understood to mean: "
Y produces
X if and only if Y also satisfies predicate
Z": S ::= a X X ::= Y PRED Z Y ::= a+ BNCN Z ::= ANBN c+ BNCN ::= b [BNCN] c ANBN ::= a [ANBN] b Given the string ''
, in the case where Y
must be satisfied first
(and assuming a greedy implementation), S will generate aX
and X'' in turn will generate '
, thereby generating '. In the case where
Z must be satisfied first, ANBN will fail to generate '
, and thus ' is not generated by the grammar. Moreover, if either
Y or
Z (or both) specify any action to be taken upon reduction (as would be the case in many parsers), the order that these productions match determines the order in which those side-effects occur. Formalisms that vary over time (such as
adaptive grammars) may rely on these
side effects.
Examples of use ANTLR Parr & Quong constraints of
C++: • If it looks like a declaration, it is; otherwise • if it looks like an expression, it is; otherwise • it is a syntax error. In the first production of rule stat, the syntactic predicate
(declaration)? indicates that declaration is the syntactic context that must be present for the rest of that production to succeed. We can interpret the use of (declaration)? as "I am not sure if declaration will match; let me try it out and, if it does not match, I shall try the next alternative." Thus, when encountering a valid declaration, the rule declaration will be recognized twice—once as syntactic predicate and once during the actual parse to execute semantic actions. Of note in the above example is the fact that any code triggered by the acceptance of the
declaration production will only occur if the predicate is satisfied.
Canonical examples The language L = \{a^n b^n c^n | n \ge 1\} can be represented in various grammars and formalisms as follows:
Parsing expression grammars S ← &(A !b) a+ B !c A ← a A? b B ← b B? c
§-Calculus Using a
bound predicate: S → {A}B A → X 'c+' X → 'a' [X] 'b' B → 'a+' Y Y → 'b' [Y] 'c' Using two
free predicates: A →
a b Ψ(
a b)X
c Ψ(
b c)Y X → 'a' [X] 'b' Y → 'b' [Y] 'c'
Conjunctive grammars (Note: the following example actually generates L = \{a^n b^n c^n | n \ge 0\}, but is included here because it is the example given by the inventor of conjunctive grammars.): S → AB&DC A → aA | ε B → bBc | ε C → cC | ε D → aDb | ε
Raku rules rule S { > a+ } rule A { a ? b } rule B { b ? c } ==Parsers/formalisms using some form of syntactic predicate==