==Modern definition by Hopcroft and Ullman==

In contemporary publications following Hopcroft and Ullman (1979), an indexed grammar is formally defined as a 5-tuple G = ⟨N, T, F, P, S⟩ where
• N is a set of variables or nonterminal symbols,
• T is a set ("alphabet") of terminal symbols,
• F is a set of so-called index symbols, or indices,
• S ∈ N is the start symbol, and
• P is a finite set of productions.

In productions as well as in derivations of indexed grammars, a string ("stack") σ ∈ F* of index symbols is attached to every nonterminal symbol A ∈ N, denoted by A[σ]. Terminal symbols may not be followed by index stacks. For an index stack σ ∈ F* and a string α ∈ (N ∪ T)* of nonterminal and terminal symbols, α[σ] denotes the result of attaching [σ] to every nonterminal in α; for example, if α equals a B C d E with a, d ∈ T terminal symbols and B, C, E ∈ N nonterminal symbols, then α[σ] denotes a B[σ] C[σ] d E[σ]. Using this notation, each production in P has to be of one of the forms
• A[σ] → α[σ],
• A[σ] → B[fσ], or
• A[fσ] → α[σ],
where
A, B ∈ N are nonterminal symbols, f ∈ F is an index, σ ∈ F* is a string of index symbols, and α ∈ (N ∪ T)* is a string of nonterminal and terminal symbols. Some authors write ".." instead of "σ" for the index stack in production rules; the rules of type 1, 2, and 3 then read A[..] → α[..], A[..] → B[f..], and A[f..] → α[..], respectively.

Derivations are similar to those in a
context-free grammar except for the index stack attached to each nonterminal symbol. When a production such as A[σ] → B[σ] C[σ] is applied, the index stack of A is copied to both B and C. Moreover, a rule can push an index symbol onto the stack, or pop its "topmost" (i.e., leftmost) index symbol. Formally, the relation ⇒ ("direct derivation") is defined on the set (N[F*] ∪ T)* of "sentential forms" as follows:
• If A[σ] → α[σ] is a production of type 1, then β A[φ] γ ⇒ β α[φ] γ, using the above definition. That is, the rule's left-hand side's index stack φ is copied to each nonterminal of the right-hand side.
• If A[σ] → B[fσ] is a production of type 2, then β A[φ] γ ⇒ β B[fφ] γ. That is, the right-hand side's index stack is obtained from the left-hand side's stack φ by pushing f onto it.
• If A[fσ] → α[σ] is a production of type 3, then β A[fφ] γ ⇒ β α[φ] γ, using again the definition of α[σ]. That is, the first index f is popped from the left-hand side's stack, which is then distributed to each nonterminal of the right-hand side.
As usual, the derivation relation ⇒* is defined as the reflexive transitive closure of the direct derivation ⇒. The language L(G) = { w ∈ T* : S ⇒* w } is the set of all strings of terminal symbols derivable from the start symbol.
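The copy, push, and pop behaviors of the three production types can be traced mechanically. The following is a minimal sketch, not taken from Hopcroft and Ullman; the encoding (a sentential form as a list in which each nonterminal carries its stack as a tuple, leftmost element on top) and all function names are invented for illustration.

```python
# Sketch of the three derivation steps of an indexed grammar
# (hypothetical encoding; not from the cited sources).
# A sentential form is a list: a terminal is a plain string, a
# nonterminal is a pair (name, stack) with the stack a tuple of
# indices whose leftmost element is the "topmost" symbol.

def attach(alpha, stack, nonterminals):
    """Compute alpha[stack]: attach the stack to every nonterminal in alpha."""
    return [(s, stack) if s in nonterminals else s for s in alpha]

def apply_type1(form, i, alpha, nonterminals):
    """A[sigma] -> alpha[sigma]: copy A's stack to every nonterminal of alpha."""
    _, stack = form[i]
    return form[:i] + attach(alpha, stack, nonterminals) + form[i+1:]

def apply_type2(form, i, b, f):
    """A[sigma] -> B[f sigma]: push index f onto the stack."""
    _, stack = form[i]
    return form[:i] + [(b, (f,) + stack)] + form[i+1:]

def apply_type3(form, i, f, alpha, nonterminals):
    """A[f sigma] -> alpha[sigma]: pop f, distribute the remaining stack."""
    _, stack = form[i]
    assert stack and stack[0] == f, "rule applies only if f is topmost"
    return form[:i] + attach(alpha, stack[1:], nonterminals) + form[i+1:]

N = {"A", "B", "C"}
form = [("A", ("f", "g"))]                       # A[fg]
form = apply_type1(form, 0, ["B", "a", "C"], N)  # B[fg] a C[fg]
form = apply_type2(form, 0, "B", "h")            # B[hfg] a C[fg]
form = apply_type3(form, 2, "f", ["c", "C"], N)  # B[hfg] a c C[g]
print(form)
```

Note how type 1 copies the whole stack to each new nonterminal, while types 2 and 3 only ever touch the topmost index.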
==Original definition by Aho==

Historically, the concept of indexed grammars was first introduced by Alfred Aho (1968) using a different formalism. Aho defined an indexed grammar to be a 5-tuple (N, T, F, P, S) where
• N is a finite alphabet of variables or nonterminal symbols,
• T is a finite alphabet of terminal symbols,
• F ⊆ 2^(N × (N ∪ T)*) is the finite set of so-called flags (each flag is itself a set of so-called index productions),
• P ⊆ N × (NF* ∪ T)* is the finite set of productions, and
• S ∈ N is the start symbol.
Direct derivations were as follows:
• A production
p = (A → X1η1...Xkηk) from P matches a nonterminal A ∈ N followed by its (possibly empty) string of flags ζ ∈ F*. In context, γ Aζ δ, via p, derives to γ X1θ1...Xkθk δ, where θi = ηiζ if Xi was a nonterminal and the empty word otherwise. The old flags of A are therefore copied to each new nonterminal produced by p. Each such production can be simulated by appropriate productions of type 1 and 2 in the Hopcroft/Ullman formalism.
• An index production
p = (A → X1...Xk) ∈ f matches Afζ (the flag f it comes from must match the first symbol following the nonterminal A) and copies the remaining index string ζ to each new nonterminal: γ Afζ δ derives to γ X1θ1...Xkθk δ, where θi is the empty word when Xi is a terminal and ζ when it is a nonterminal. Each such production corresponds to a production of type 3 in the Hopcroft/Ullman formalism.

This formalism is used e.g. by Hayashi (1973, pp. 65–66).

==Examples==
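A standard illustration of the modern formalism is a grammar for { aⁿbⁿcⁿ : n ≥ 1 }, where the index stack counts n. The sketch below traces one derivation in such a grammar; the concrete productions listed in the comments and all names in the code are an illustrative reconstruction, not taken from the sources above.

```python
# Sketch: deriving a^n b^n c^n in an indexed grammar (illustrative
# reconstruction; productions assumed, textbook-style):
#   S[σ] -> T[gσ]       T[σ] -> T[fσ]      T[σ] -> A[σ] B[σ] C[σ]
#   A[fσ] -> a A[σ]     B[fσ] -> b B[σ]    C[fσ] -> c C[σ]
#   A[gσ] -> a          B[gσ] -> b         C[gσ] -> c
# Index stacks are strings; the leftmost character is the top.

def derive(n):
    """Follow one derivation of a^n b^n c^n step by step."""
    stack = "g"                 # S => T[g]        (type 2: push g)
    for _ in range(n - 1):
        stack = "f" + stack     # T[σ] => T[fσ]    (type 2: push f, n-1 times)
    # T[σ] => A[σ] B[σ] C[σ]: type 1 copies the stack to all three.
    word = []
    for terminal in "abc":      # expand A, then B, then C in turn
        s = stack
        while s[0] == "f":      # X[fσ] => x X[σ]  (type 3: pop f, emit x)
            word.append(terminal)
            s = s[1:]
        word.append(terminal)   # X[gσ] => x       (type 3: pop g, finish)
    return "".join(word)

print(derive(2))  # aabbcc
```

Because the stack built while rewriting T is copied unchanged to A, B, and C, all three are forced to emit the same number of terminals, which is exactly what a context-free grammar cannot enforce.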