Set up To explain an LL(1) parser's workings we will consider the following small LL(1) grammar: • S → F • S →
( S
+ F
) • F →
a and parse the following input: :
( a + a ) An LL(1) parsing table for a grammar has a row for each of the non-terminals and a column for each terminal (including the special terminal, represented here as
$, that is used to indicate the end of the input stream). Each cell of the table may point to at most one rule of the grammar (identified by its number). For example, in the parsing table for the above grammar, the cell for the non-terminal 'S' and terminal '
(' points to the rule number 2: : The algorithm to construct a parsing table is described in a later section, but first let's see how the parser uses the parsing table to process its input.
Parsing procedure In each step, the parser reads the next-available symbol from the input stream, and the top-most symbol from the stack. If the input symbol and the stack-top symbol match, the parser discards them both by moving to the next input symbol and popping the top-most symbol off the stack. This is repeated until the input symbol and top-most symbol on the stack do not match. Thus, in its first step, the parser reads the input symbol '
(' and the stack-top symbol 'S'. The parsing table instruction comes from the column headed by the input symbol '
(' and the row headed by the stack-top symbol 'S'; this cell contains '2', which instructs the parser to apply rule (2). The parser has to rewrite 'S' to '
( S
+ F
)' on the stack by removing 'S' from stack and pushing ')', 'F', '
+', 'S', '
(' onto the stack, and this writes the rule number 2 to the output. The stack then becomes: [
(, S,
+, F,
),
$ ] In the second step, the parser removes the '
(' from its input stream and from its stack, since they now match. The stack now becomes: [ S,
+, F,
),
$ ] Now the parser has an '
a' on its input stream and an 'S' as its stack top. The parsing table instructs it to apply rule (1) from the grammar and write the rule number 1 to the output stream. The stack becomes: [ F,
+, F,
),
$ ] The parser now has an '
a' on its input stream and an 'F' as its stack top. The parsing table instructs it to apply rule (3) from the grammar and write the rule number 3 to the output stream. The stack becomes: [
a,
+, F,
),
$ ] The parser now has an '
a' on the input stream and an '
a' at its stack top. Because they are the same, it removes it from the input stream and pops it from the top of the stack. The parser then has an '
+' on the input stream and '
+' is at the top of the stack meaning, like with 'a', it is popped from the stack and removed from the input stream. This results in: [ F,
),
$ ] In the next three steps the parser will replace 'F' on the stack by '
a', write the rule number 3 to the output stream and remove the '
a' and '
)' from both the stack and the input stream. The parser thus ends with '
$' on both its stack and its input stream. In this case the parser will report that it has accepted the input string and write the following list of rule numbers to the output stream: : [ 2, 1, 3, 3 ] This is indeed a list of rules for a
leftmost derivation of the input string, which is: : S →
( S
+ F
) →
( F
+ F
) →
( a + F
) →
( a + a ) Parser implementation in C++ Below follows a
C++ implementation of a table-based LL parser for the example language: • include • include • include enum Symbols { // the symbols: // Terminal symbols: TS_L_PARENS, // ( TS_R_PARENS, // ) TS_A, // a TS_PLUS, // + TS_EOS, // $, in this case corresponds to '\0' TS_INVALID, // invalid token // Non-terminal symbols: NTS_S, // S NTS_F // F }; /* Converts a valid token to the corresponding terminal symbol • / Symbols lexer(char c) { switch (c) { case '(': return TS_L_PARENS; case ')': return TS_R_PARENS; case 'a': return TS_A; case '+': return TS_PLUS; case '\0': return TS_EOS; // end of stack: the $ terminal symbol default: return TS_INVALID; } } int main(int argc, char **argv) { using namespace std; if (argc pair to action map > table; stack ss; // symbol stack char *p; // input buffer // initialize the symbols stack ss.push(TS_EOS); // terminal, $ ss.push(NTS_S); // non-terminal, S // initialize the symbol stream cursor p = &argv[1][0]; // set up the parsing table table[NTS_S][TS_L_PARENS] = 2; table[NTS_S][TS_A] = 1; table[NTS_F][TS_A] = 3; while (ss.size() > 0) { if (lexer(*p) == ss.top()) { cout
Parser implementation in Python from enum import Enum from collections.abc import Generator class Term(Enum): pass class Rule(Enum): pass • All constants are indexed from 0 class Terminal(Term): LPAR = 0 RPAR = 1 A = 2 PLUS = 3 END = 4 INVALID = 5 def __str__(self): return f"T_{self.name}" class NonTerminal(Rule): S = 0 F = 1 def __str__(self): return f"N_{self.name}" • Parse table table = 1, -1, 0, -1, -1, -1], [-1, -1, 2, -1, -1, -1 RULES = [ [NonTerminal.F], [ Terminal.LPAR, NonTerminal.S, Terminal.PLUS, NonTerminal.F, Terminal.RPAR, ], [Terminal.A], ] stack = [Terminal.END, NonTerminal.S] def lexical_analysis(input_string: str) -> Generator[Terminal]: print("Lexical analysis") for c in input_string: match c: case "a": yield Terminal.A case "+": yield Terminal.PLUS case "(": yield Terminal.LPAR case ")": yield Terminal.RPAR case _: yield Terminal.INVALID yield Terminal.END def syntactic_analysis(tokens: list[Terminal]) -> None: print("tokens:", end=" ") print(*tokens, sep=", ") print("Syntactic analysis") position = 0 while stack: svalue = stack.pop() token = tokens[position] if isinstance(svalue, Term): if svalue == token: position += 1 print("pop", svalue) if token == Terminal.END: print("input accepted") else: raise ValueError("bad term on input:", str(token)) elif isinstance(svalue, Rule): print(f"{svalue = !s}, {token = !s}") rule = table[svalue.value][token.value] print(f"{rule = }") for r in reversed(RULES[rule]): stack.append(r) print("stacks:", end=" ") print(*stack, sep=", ") if __name__ == "__main__": inputstring = "(a+a)" syntactic_analysis(list(lexical_analysis(inputstring))) Outputs: == Remarks ==