Avoiding ambiguity while keeping the syntax This problem often comes up in
compiler construction, especially
scannerless parsing. The convention when dealing with the dangling else is to attach the else to the nearby if statement, allowing for unambiguous context-free grammars, in particular. Programming languages like Pascal, C, and Java follow this convention, so there is no ambiguity in the semantics of the
language, though the use of a parser generator may lead to ambiguous
grammars. In these cases alternative grouping is accomplished by explicit blocks, such as begin...end in Pascal and {...} in C. Depending on the compiler construction approach, one may take different corrective actions to avoid ambiguity: • If the parser is produced by an SLR, LR(1), or LALR
LR parser generator, the programmer will often rely on the generated parser feature of preferring shift over reduce whenever there is a conflict. Possible solutions are: • Having an "end if" symbol delimiting the end of the if construct. Examples of such languages are
ALGOL 68,
Ada,
Eiffel,
PL/SQL,
Visual Basic,
Modula-2, and
AppleScript. • Disallowing the statement following a "then" to be an "if" itself (it may however be a pair of statement brackets containing only an if-then-clause). Some implementations of
ALGOL 60 follow this approach. • Requiring braces (parentheses) when an "else" follows an "if". • Requiring every "if" to be paired with an "else". To avoid a similar problem concerning semantics rather than syntax,
Racket deviates from
Scheme by considering an if without a fallback clause to be an error, effectively distinguishing conditional
expressions (i.e if) from conditional
statements (i.e when and unless, which do not have fallback clauses). • Using different keywords for the one-alternative and two-alternative "if" statements.
S-algol uses if e do s for the one-alternative case and if e1 then e2 else e3 for the general case. • Requiring braces unconditionally, like
Swift and
Rust. This is effectively true in
Python as its indentation rules delimit every block, not just those in "if" statements.
Avoiding the conflict in LR parsers The above example could be rewritten in the following way to remove the ambiguity : statement: open_statement | closed_statement ; open_statement: IF '(' expression ')' statement | IF '(' expression ')' closed_statement ELSE open_statement ; closed_statement: non_if_statement | IF '(' expression ')' closed_statement ELSE closed_statement ; non_if_statement: ... ; Any other statement-related grammar rules may also have to be duplicated in this way if they may directly or indirectly end with a statement or selection-statement non-terminal. However, we give grammar that includes both of if and while statements. statement: open_statement | closed_statement ; open_statement: IF '(' expression ')' statement | IF '(' expression ')' closed_statement ELSE open_statement | WHILE '(' expression ')' open_statement ; closed_statement: simple_statement | IF '(' expression ')' closed_statement ELSE closed_statement | WHILE '(' expression ')' closed_statement ; simple_statement: ... ; Finally, we give the grammar that forbids ambiguous IF statements. statement: open_statement | closed_statement ; open_statement: IF '(' expression ')' statement | IF '(' expression ')' closed_statement ELSE open_statement | WHILE '(' expression ')' open_statement ; closed_statement: simple_statement | IF '(' expression ')' closed_statement ELSE closed_statement | WHILE '(' expression ')' closed_statement ; simple_statement: ... ; With this grammar the statement if (a) if (b) c else d can only be parsed one way, because the other interpretation (if (a) {if (b) c} else d) is produced as statement open_statement IF '(' expression ')' closed_statement ELSE open_statement 'if' '(' 'a' ')' closed_statement 'else' 'd' and then the parsing fails trying to match closed_statement to "if (b) c". An attempt with closed_statement fails in the same way. The other parse, if (a) {if (b) c else d}) succeeds: statement open_statement IF '(' expression ')' statement IF '(' expression ')' closed_statement IF '(' a ')' (IF '(' expression ')' closed_statement ELSE closed_statement) IF '(' a ')' (IF '(' b ')' c ELSE 'd') ==See also==