Left recursion often poses problems for parsers, either because it leads them into infinite recursion (as in the case of most
top-down parsers) or because they expect rules in a normal form that forbids it (as in the case of many
bottom-up parsers). Therefore, a grammar is often preprocessed to eliminate the left recursion.
== Removing direct left recursion ==
The general algorithm to remove direct left recursion follows. Several improvements to this method have been made. For a left-recursive nonterminal A, discard any rules of the form A\rightarrow A and consider those that remain:
:A \rightarrow A\alpha_1 \mid \ldots \mid A\alpha_n \mid \beta_1 \mid \ldots \mid \beta_m
where:
• each \alpha is a nonempty sequence of nonterminals and terminals, and
• each \beta is a sequence of nonterminals and terminals that does not start with A.
Replace these with two sets of productions, one set for A:
:A \rightarrow \beta_1A^\prime \mid \ldots \mid \beta_mA^\prime
and another set for the fresh nonterminal A' (often called the "tail" or the "rest"):
:A^\prime \rightarrow \alpha_1A^\prime \mid \ldots \mid \alpha_nA^\prime \mid \epsilon
Repeat this process until no direct left recursion remains.

As an example, consider the rule set
:\mathit{Expression} \rightarrow \mathit{Expression}+\mathit{Expression} \mid \mathit{Integer} \mid \mathit{String}
This could be rewritten to avoid left recursion as
:\mathit{Expression} \rightarrow \mathit{Integer}\,\mathit{Expression}' \mid \mathit{String}\,\mathit{Expression}'
:\mathit{Expression}' \rightarrow {}+\mathit{Expression}\,\mathit{Expression}' \mid \epsilon
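The rewriting of a single nonterminal can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the grammar representation (a dictionary mapping each nonterminal to a list of symbol tuples, with the empty tuple standing for \epsilon), the function name remove_direct_left_recursion, and the convention of naming the fresh nonterminal by appending an apostrophe are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Assumed representation: a production body is a tuple of symbols,
# and the empty tuple stands for epsilon.
Body = Tuple[str, ...]
Grammar = Dict[str, List[Body]]


def remove_direct_left_recursion(grammar: Grammar, a: str) -> Grammar:
    """Return a copy of `grammar` in which `a` has no direct left recursion."""
    # Discard trivial rules A -> A, then split what remains into
    # A -> A alpha (keeping only alpha) and A -> beta.
    bodies = [b for b in grammar[a] if b != (a,)]
    alphas = [b[1:] for b in bodies if b and b[0] == a]
    betas = [b for b in bodies if not b or b[0] != a]

    result = {nt: list(bs) for nt, bs in grammar.items()}
    if not alphas:
        # Nothing left-recursive remains once A -> A is dropped.
        result[a] = bodies
        return result

    # Pick a fresh name for the "tail" nonterminal A'.
    tail = a + "'"
    while tail in result:
        tail += "'"

    # A  -> beta_1 A' | ... | beta_m A'
    result[a] = [beta + (tail,) for beta in betas]
    # A' -> alpha_1 A' | ... | alpha_n A' | epsilon
    result[tail] = [alpha + (tail,) for alpha in alphas] + [()]
    return result


grammar = {
    "Expression": [
        ("Expression", "+", "Expression"),
        ("Integer",),
        ("String",),
    ]
}
rewritten = remove_direct_left_recursion(grammar, "Expression")
for head, bodies in rewritten.items():
    print(head, "->", " | ".join(" ".join(b) or "ε" for b in bodies))
# Expression -> Integer Expression' | String Expression'
# Expression' -> + Expression Expression' | ε
```

Applied to the Expression rule set, the sketch reproduces the rewritten rules given above. The input grammar is copied rather than modified, so the original and the rewritten rules can be compared side by side.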
== Removing all left recursion ==
The above process can be extended to eliminate all left recursion, by first converting indirect left recursion to direct left recursion on the highest numbered nonterminal in a cycle.
:Inputs: a grammar, that is, a set of nonterminals A_1,\ldots,A_n and their productions
:Output: a modified grammar generating the same language but without left recursion
:# For each nonterminal A_i:
:## Repeat until an iteration leaves the grammar unchanged:
:### For each rule A_i\rightarrow\alpha_i, \alpha_i being a sequence of terminals and nonterminals:
:#### If \alpha_i begins with a nonterminal A_j and j<i:
:##### Let \beta_i be \alpha_i without its leading A_j.
:##### Remove the rule A_i\rightarrow\alpha_i.
:##### For each rule A_j\rightarrow\alpha_j:
:###### Add the rule A_i\rightarrow\alpha_j\beta_i.
:## Remove direct left recursion for A_i as described above.
Step 1.1.1 amounts to expanding the initial nonterminal A_j in the right-hand side of some rule A_i \to A_j \beta, but only if j<i. If A_i \to A_j \beta was one step in a cycle of productions giving rise to a left recursion, then this has shortened that cycle by one step, but often at the price of increasing the number of rules.

The algorithm may be viewed as establishing a topological ordering on nonterminals: afterwards there can only be a rule A_i \to A_j \beta if j>i. Note that this algorithm is highly sensitive to the nonterminal ordering; optimizations often focus on choosing this ordering well.

== Pitfalls ==