A parser is an important component of a compiler. It parses the source code of a computer programming language to create some form of internal representation. Programming languages tend to be specified in terms of a context-free grammar because fast and efficient parsers can be written for them. Parsers can be written by hand or generated by a parser generator. A context-free grammar provides a simple and precise mechanism for describing how programming language constructs are built from smaller blocks. The formalism of context-free grammars was developed in the mid-1950s by Noam Chomsky.
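To illustrate how a context-free grammar builds larger constructs from smaller blocks, the following sketch encodes a hypothetical ALGOL-like statement grammar as a Python dict. It then derives the sentence `if id then id := num` by repeatedly rewriting the leftmost nonterminal. The grammar, the `leftmost_derivation` helper and the rule-choice indices are illustrative assumptions. They are not taken from any real language definition.

```python
# A hypothetical toy grammar: each nonterminal maps to a list of alternative
# right-hand sides; any symbol that is not a key is a terminal (token).
GRAMMAR = {
    "Stmt": [["if", "Expr", "then", "Stmt"], ["id", ":=", "Expr"]],
    "Expr": [["Expr", "+", "Term"], ["Term"]],
    "Term": [["id"], ["num"]],
}


def leftmost_derivation(grammar, start, choices):
    """Apply one production per entry in `choices`, always rewriting the
    leftmost nonterminal, and return every sentential form produced."""
    form = [start]
    steps = [form]
    for choice in choices:
        # Find the leftmost symbol that is still a nonterminal.
        idx = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if idx is None:
            raise ValueError("no nonterminal left to rewrite")
        # Replace it with the chosen alternative from its production list.
        form = form[:idx] + grammar[form[idx]][choice] + form[idx + 1:]
        steps.append(form)
    return steps


# Build "if id then id := num" from the start symbol, one rewrite at a time.
for step in leftmost_derivation(GRAMMAR, "Stmt", [0, 1, 0, 1, 1, 1]):
    print(" ".join(step))
```

Each printed line is one sentential form. The last line contains only terminals, so it is a sentence of the language.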
Block structure was introduced into computer programming languages by the ALGOL project (1957–1960), which, as a consequence, also featured a context-free grammar to describe the resulting ALGOL syntax. Context-free grammars are simple enough to allow the construction of efficient parsing algorithms which, for a given string, determine whether and how it can be generated from the grammar. If a programming language designer is willing to work within some limited subsets of context-free grammars, more efficient parsers are possible.
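One general-purpose algorithm of this kind is Earley's, discussed later in this section. The following is a minimal recognizer sketch in its style, built from predict, scan and complete steps. It assumes a hypothetical grammar format: a dict from each nonterminal to lists of right-hand sides. It also assumes the grammar has no empty productions, which this simplified version does not handle. The names `Item`, `earley_recognize` and `EXPR` are illustrative.

```python
from typing import NamedTuple


class Item(NamedTuple):
    """An Earley item: production lhs -> rhs with a dot, started at `origin`."""
    lhs: str
    rhs: tuple
    dot: int
    origin: int


def earley_recognize(grammar, start, tokens):
    """Return True if `tokens` can be derived from `start`.
    Assumes `grammar` has no empty (epsilon) productions."""
    n = len(tokens)
    chart = [[] for _ in range(n + 1)]    # items per position, insertion order
    seen = [set() for _ in range(n + 1)]  # duplicate check per position

    def add(i, item):
        if item not in seen[i]:
            seen[i].add(item)
            chart[i].append(item)

    for rhs in grammar[start]:
        add(0, Item(start, tuple(rhs), 0, 0))

    for i in range(n + 1):
        j = 0
        while j < len(chart[i]):  # chart[i] may grow while it is processed
            item = chart[i][j]
            j += 1
            if item.dot < len(item.rhs):
                sym = item.rhs[item.dot]
                if sym in grammar:  # predict: expand the nonterminal
                    for rhs in grammar[sym]:
                        add(i, Item(sym, tuple(rhs), 0, i))
                elif i < n and tokens[i] == sym:  # scan: match a terminal
                    add(i + 1, item._replace(dot=item.dot + 1))
            else:  # complete: advance items that were waiting for item.lhs
                for parent in chart[item.origin]:
                    if (parent.dot < len(parent.rhs)
                            and parent.rhs[parent.dot] == item.lhs):
                        add(i, parent._replace(dot=parent.dot + 1))

    return any(it.lhs == start and it.dot == len(it.rhs) and it.origin == 0
               for it in chart[n])


# Hypothetical expression grammar; left recursion is fine for this algorithm.
EXPR = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["n"]],
}
print(earley_recognize(EXPR, "E", "n + n * n".split()))
print(earley_recognize(EXPR, "E", "n + * n".split()))
```

Because no production is empty, every completed item started at an earlier position. The completion step therefore only reads chart sets that are already finished.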
=== LR parsing ===
The LR parser (left to right) was invented by Donald Knuth in 1965 in a paper, "On the Translation of Languages from Left to Right". An LR parser is a parser that reads input from Left to right (as it would appear if visually displayed) and produces a Rightmost derivation in reverse. The term 'LR(k) parser' is also used, where k refers to the number of unconsumed lookahead input symbols that are used in making parsing decisions.

Knuth proved that LR(k) grammars can be parsed with an execution time essentially proportional to the length of the program. He also proved that every LR(k) grammar for k > 1 can be mechanically transformed into an LR(1) grammar for the same language. In other words, one symbol of lookahead is enough to parse any deterministic context-free language. Korenjak (1969) was the first to show that parsers for programming languages could be produced using these techniques.

Frank DeRemer devised the more practical Simple LR (SLR) and Look-ahead LR (LALR) techniques, published in his PhD dissertation at MIT in 1969. This was an important breakthrough, because LR(k) translators, as defined by Donald Knuth, were much too large for implementation on computer systems in the 1960s and 1970s.

In practice, LALR offers a good solution. The added power of LALR(1) parsers over SLR(1) parsers is useful: LALR(1) can parse more complex grammars than SLR(1). LALR(1) is not comparable with LL(1) (see below), in that LALR(1) cannot parse all LL(1) grammars. Even so, most LL(1) grammars encountered in practice can be parsed by LALR(1). LR(1) grammars are more powerful still than LALR(1). However, an LR(1) grammar requires a canonical LR parser, which would be extremely large and is not considered practical. The syntax of many programming languages is defined by grammars that can be parsed with an LALR(1) parser. For this reason, LALR parsers are often used by compilers to perform syntax analysis of source code.

A recursive ascent parser implements an LALR parser using mutually recursive functions rather than tables. The parser is thus directly encoded in the host language, similar to recursive descent. Direct encoding usually yields a parser that is faster than its table-driven equivalent, for the same reason that compilation is faster than interpretation. It is also (in principle) possible to hand-edit a recursive ascent parser, whereas a tabular implementation is nigh unreadable to the average human. Recursive ascent was first described by Thomas Pennello in his article "Very fast LR parsing" in 1986. The technique was developed further in 1988, and again in a 1992 article by Leermakers, Augusteijn and Kruseman Aretz in the journal Theoretical Computer Science.
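The following sketch contrasts the table-driven and directly encoded styles on a hypothetical two-rule grammar, E → E + n | n. This grammar is left-recursive, which LR parsing handles naturally. The ACTION and GOTO tables were computed by hand with the SLR(1) construction; for this grammar they coincide with the LALR(1) tables. `lr_parse`, `RecursiveAscent` and the state numbering are illustrative names, not the output of any real parser generator. Both versions use a single lookahead token and record the reductions they perform.

```python
# Hypothetical toy grammar; rule 0 is the augmented start rule.
#   0: S' -> E      (reducing it means "accept")
#   1: E  -> E + n
#   2: E  -> n
RULES = {1: ("E", ("E", "+", "n")), 2: ("E", ("n",))}

# Hand-computed SLR(1) tables; FOLLOW(E) = {"+", "$"}.
ACTION = {
    (0, "n"): ("shift", 2),
    (1, "+"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "+"): ("reduce", 2), (2, "$"): ("reduce", 2),
    (3, "n"): ("shift", 4),
    (4, "+"): ("reduce", 1), (4, "$"): ("reduce", 1),
}
GOTO = {(0, "E"): 1}


def lr_parse(tokens):
    """Table-driven LR parse; returns the reductions performed."""
    tokens = list(tokens) + ["$"]
    stack, pos, reductions = [0], 0, []
    while True:
        lookahead = tokens[pos]  # the single lookahead symbol
        action = ACTION.get((stack[-1], lookahead))
        if action is None:
            raise SyntaxError(f"unexpected {lookahead!r} at position {pos}")
        kind, arg = action
        if kind == "shift":
            stack.append(arg)
            pos += 1
        elif kind == "reduce":
            lhs, rhs = RULES[arg]
            del stack[-len(rhs):]  # pop one state per right-hand-side symbol
            stack.append(GOTO[(stack[-1], lhs)])
            reductions.append(f"{lhs} -> {' '.join(rhs)}")
        else:  # accept
            return reductions


class RecursiveAscent:
    """The same automaton as mutually recursive methods, one per LR state.
    A reduction returns (nonterminal, caller frames still to unwind)."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]
        self.pos = 0
        self.reductions = []

    def _peek(self):
        return self.tokens[self.pos]

    def _fail(self):
        raise SyntaxError(f"unexpected {self._peek()!r} at position {self.pos}")

    def parse(self):
        return self._state0()

    def _state0(self):  # S' -> .E   E -> .E + n   E -> .n
        if self._peek() != "n":
            self._fail()
        self.pos += 1
        lhs, _ = self._state2()  # counts reaching state 0 are always 0 here
        while lhs != "S'":  # goto(0, E) = 1; reducing S' -> E accepts
            lhs, _ = self._state1()
        return self.reductions

    def _state1(self):  # S' -> E.   E -> E. + n
        if self._peek() == "$":
            return "S'", 0
        if self._peek() != "+":
            self._fail()
        self.pos += 1
        lhs, count = self._state3()
        return lhs, count - 1  # unwinding continues through this frame

    def _state2(self):  # E -> n.
        if self._peek() not in ("+", "$"):
            self._fail()
        self.reductions.append("E -> n")
        return "E", 0  # |rhs| - 1 = 0 caller frames to unwind

    def _state3(self):  # E -> E + .n
        if self._peek() != "n":
            self._fail()
        self.pos += 1
        lhs, count = self._state4()
        return lhs, count - 1

    def _state4(self):  # E -> E + n.
        if self._peek() not in ("+", "$"):
            self._fail()
        self.reductions.append("E -> E + n")
        return "E", 2  # unwind states 3 and 1, then take the goto in state 0


print(lr_parse("n + n + n".split()))
print(RecursiveAscent("n + n + n".split()).parse())
```

Both calls return `['E -> n', 'E -> E + n', 'E -> E + n']`. Read from last to first, these reductions give the rightmost derivation E ⇒ E + n ⇒ E + n + n ⇒ n + n + n. In the recursive ascent version, the parse stack is the Python call stack, and each table lookup has become an `if` test.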
=== LL parsing ===
An LL parser parses the input from Left to right and constructs a Leftmost derivation of the sentence (hence LL, as opposed to LR). The class of grammars that are parsable in this way is known as the LL grammars. LL grammars are an even more restricted class of context-free grammars than LR grammars. Nevertheless, they are of great interest to compiler writers, because such a parser is simple and efficient to implement. LL(k) grammars can be parsed by a recursive descent parser, which is usually coded by hand, although a notation such as META II might alternatively be used.

The design of ALGOL sparked investigation of recursive descent, since the ALGOL language itself is recursive. The concept of recursive descent parsing was discussed in the January 1961 issue of Communications of the ACM in separate papers by A.A. Grau and Edgar T. "Ned" Irons. Richard Waychoff and colleagues also implemented recursive descent in the Burroughs ALGOL compiler in March 1961. The two groups used different approaches but were in at least informal contact. The idea of LL(1) grammars was introduced by Lewis and Stearns (1968). Recursive descent was popularised by Niklaus Wirth with PL/0, an educational programming language used to teach compiler construction in the 1970s.

LR parsing can handle a larger range of languages than LL parsing. It is also better at error reporting: it detects a syntax error as soon as possible, that is, as soon as the input stops conforming to the grammar.
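Recursive descent can be sketched as follows, assuming a hypothetical grammar for sums of `n` with parenthesised subexpressions. The grammar is written without left recursion, using E → T E' instead of E → E + T, because an LL parser cannot handle left-recursive rules. Each nonterminal becomes one method that chooses its production by looking at a single token, which makes this an LL(1) parser. The productions it records, in order, form the leftmost derivation. The class and method names are illustrative.

```python
class RecursiveDescent:
    """LL(1) recursive-descent parser for a hypothetical grammar:
        E  -> T E'
        E' -> + T E' | (empty)
        T  -> n | ( E )
    One method per nonterminal; each picks a production from the next token.
    """

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]
        self.pos = 0
        self.derivation = []  # productions in leftmost-derivation order

    def _peek(self):
        return self.tokens[self.pos]

    def _expect(self, tok):
        if self._peek() != tok:
            raise SyntaxError(
                f"expected {tok!r}, found {self._peek()!r} at position {self.pos}")
        self.pos += 1

    def parse(self):
        self._E()
        self._expect("$")  # the whole input must be consumed
        return self.derivation

    def _E(self):
        self.derivation.append("E -> T E'")
        self._T()
        self._E_rest()

    def _E_rest(self):
        tok = self._peek()
        if tok == "+":
            self.derivation.append("E' -> + T E'")
            self._expect("+")
            self._T()
            self._E_rest()
        elif tok in (")", "$"):  # FOLLOW(E'): choose the empty production
            self.derivation.append("E' -> (empty)")
        else:
            raise SyntaxError(f"unexpected {tok!r} at position {self.pos}")

    def _T(self):
        tok = self._peek()
        if tok == "n":
            self.derivation.append("T -> n")
            self._expect("n")
        elif tok == "(":
            self.derivation.append("T -> ( E )")
            self._expect("(")
            self._E()
            self._expect(")")
        else:
            raise SyntaxError(f"unexpected {tok!r} at position {self.pos}")


print(RecursiveDescent("n + ( n )".split()).parse())
```

For example, `RecursiveDescent("n + n".split()).parse()` returns `["E -> T E'", "T -> n", "E' -> + T E'", "T -> n", "E' -> (empty)"]`, which is the leftmost derivation E ⇒ T E' ⇒ n E' ⇒ n + T E' ⇒ n + n E' ⇒ n + n.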
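=== Earley parser ===
In 1970, Jay Earley invented what came to be known as the Earley parser. Earley parsers are appealing because they can parse all context-free languages reasonably efficiently. Their running time is cubic in the length of the input in the worst case and quadratic for unambiguous grammars.

== Grammar description languages ==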