Goal: write a multilingual grammar for expressing statements about John and Mary loving each other.
Abstract and concrete modules

In GF, grammars are divided into two module types:
• an abstract module, containing the judgement forms cat and fun.
  • cat, or category declarations, list the categories, i.e. all the possible types of trees there can be.
  • fun, or function declarations, state the functions and their types; these must be implemented by concrete modules (see below).
• one or more concrete modules, containing the judgement forms lincat and lin.
  • lincat, or linearization type definitions, say what type of object linearization produces for each category listed in cat.
  • lin, or linearization rules, implement the functions declared in fun. They say how trees are linearized.

Consider the following:
Abstract syntax

    abstract Zero = {
      cat
        S ; NP ; VP ; V2 ;
      fun
        Pred : NP -> VP -> S ;
        Compl : V2 -> NP -> VP ;
        John, Mary : NP ;
        Love : V2 ;
    }
Concrete syntax: English

    concrete ZeroEng of Zero = {
      lincat
        S, NP, VP, V2 = Str ;
      lin
        Pred np vp = np ++ vp ;
        Compl v2 np = v2 ++ np ;
        John = "John" ;
        Mary = "Mary" ;
        Love = "loves" ;
    }

Notice: Str (token list or "string") is the only linearization type.
Making a grammar multilingual

A single abstract syntax can have many concrete syntaxes, in our case one for each natural language we wish to add. The same system of trees can be given:
• different words
• different word orders
• different linearization types
Concrete syntax: French

    concrete ZeroFre of Zero = {
      lincat
        S, NP, VP, V2 = Str ;
      lin
        Pred np vp = np ++ vp ;
        Compl v2 np = v2 ++ np ;
        John = "Jean" ;
        Mary = "Marie" ;
        Love = "aime" ;
    }
Translation and multilingual generation

We can now use our grammar to translate phrases between French and English. The following commands can be executed in the GF interactive shell.
Import many grammars with the same abstract syntax

    > import ZeroEng.gf ZeroFre.gf
    Languages: ZeroEng ZeroFre
Translation: pipe parsing to linearization

    > parse -lang=Eng "John loves Mary" | linearize -lang=Fre
    Jean aime Marie
Multilingual generation: linearize into all languages

    > generate_random | linearize -treebank
    Zero: Pred Mary (Compl Love Mary)
    ZeroEng: Mary loves Mary
    ZeroFre: Marie aime Marie
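Where generate_random picks a single tree at random, the shell's generate_trees command enumerates trees exhaustively up to a depth bound. The following is a minimal sketch, assuming ZeroEng.gf and ZeroFre.gf have been imported as above. The Zero grammar licenses only four trees of category S (two possible subjects, one verb, two possible objects), so the listing stays short. The output is left out here because its order may differ between GF versions.

```gf
> generate_trees | linearize -treebank
```

Each tree should appear once, followed by its English and French linearizations, in the same layout as the generate_random example.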
Parameters, tables

Latin has cases: nominative for the subject, accusative for the object.
• Ioannes Mariam amat "John-Nom loves Mary-Acc"
• Maria Ioannem amat "Mary-Nom loves John-Acc"

We use a parameter type for case (just 2 of Latin's 6 cases). The linearization type of NP is a table type: from Case to Str. The linearization of John is an inflection table. When using an NP, we select (!) the appropriate case from the table.
Concrete syntax: Latin

    concrete ZeroLat of Zero = {
      lincat
        S, VP, V2 = Str ;
        NP = Case => Str ;
      lin
        Pred np vp = np ! Nom ++ vp ;
        Compl v2 np = np ! Acc ++ v2 ;
        John = table {Nom => "Ioannes" ; Acc => "Ioannem"} ;
        Mary = table {Nom => "Maria" ; Acc => "Mariam"} ;
        Love = "amat" ;
      param
        Case = Nom | Acc ;
    }
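To see selection at work, the two example sentences can be reproduced in the shell. This is a sketch that assumes the grammar above is saved as ZeroLat.gf in the current directory. For the tree Pred John (Compl Love Mary), Compl selects Mary ! Acc ("Mariam") and puts it before the verb, and Pred then selects John ! Nom ("Ioannes") for the subject. Swapping the two names in the tree swaps the cases as well.

```gf
> import ZeroLat.gf
> linearize -lang=Lat Pred John (Compl Love Mary)
Ioannes Mariam amat
> linearize -lang=Lat Pred Mary (Compl Love John)
Maria Ioannem amat
```

The trees are the same ones the English and French grammars use; only the linearization type of NP and the word order in Compl differ.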
Discontinuous constituents, records

In Dutch, the verb heeft lief is a discontinuous constituent. The linearization type of V2 is a record type with two fields. The linearization of Love is a record. The values of fields are picked by projection (.).
Concrete syntax: Dutch

    concrete ZeroDut of Zero = {
      lincat
        S, NP, VP = Str ;
        V2 = {v : Str ; p : Str} ;
      lin
        Pred np vp = np ++ vp ;
        Compl v2 np = v2.v ++ np ++ v2.p ;
        John = "Jan" ;
        Mary = "Marie" ;
        Love = {v = "heeft" ; p = "lief"} ;
    }
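The effect of the two projections can be checked in the shell. The sketch below assumes the grammar above is saved as ZeroDut.gf in the current directory. In Compl Love Mary, the field Love.v ("heeft") is placed before the object and Love.p ("lief") after it, so the object ends up between the two parts of the verb.

```gf
> import ZeroDut.gf
> linearize -lang=Dut Pred John (Compl Love Mary)
Jan heeft Marie lief
> parse -lang=Dut "Jan heeft Marie lief"
Pred John (Compl Love Mary)
```

Parsing the sentence recovers the same tree, even though the two parts of the verb are not adjacent in the string.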
Variable and inherent features, agreement, Unicode support

For Hebrew, NP has gender as its inherent feature: a field in the record. VP has gender as its variable feature: an argument of a table. In predication, the VP receives the gender of the NP.
Concrete syntax: Hebrew

    concrete ZeroHeb of Zero = {
      flags coding=utf8 ;
      lincat
        S = Str ;
        NP = {s : Str ; g : Gender} ;
        VP, V2 = Gender => Str ;
      lin
        Pred np vp = np.s ++ vp ! np.g ;
        Compl v2 np = table {g => v2 ! g ++ "את" ++ np.s} ;
        John = {s = "ג׳ון" ; g = Masc} ;
        Mary = {s = "מרי" ; g = Fem} ;
        Love = table {Masc => "אוהב" ; Fem => "אוהבת"} ;
      param
        Gender = Masc | Fem ;
    }
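Agreement can be observed by linearizing the same verb with subjects of different gender. This is a sketch that assumes the grammar above is saved as ZeroHeb.gf in the current directory and that the terminal displays UTF-8. In Pred, the subject's inherent gender np.g selects the verb form from the VP table, so the verb agrees with the subject and is unaffected by the gender of the object.

```gf
> import ZeroHeb.gf
> linearize -lang=Heb Pred John (Compl Love Mary)
ג׳ון אוהב את מרי
> linearize -lang=Heb Pred Mary (Compl Love John)
מרי אוהבת את ג׳ון
```

With the masculine subject the verb is linearized as "אוהב", and with the feminine subject as "אוהבת".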
Visualizing parse trees

GF has built-in functions for visualizing parse trees and word alignments. The following commands generate parse trees for the given phrases and open the resulting PNG images with the system's eog command.

    > parse -lang=Eng "John loves Mary" | visualize_parse -view="eog"
    > parse -lang=Dut "Jan heeft Marie lief" | visualize_parse -view="eog"
Generating word alignment

• In languages L1 and L2: link every word with its smallest spanning subtree.
• Delete the intervening tree, combining links directly from L1 to L2.

In general, this gives phrase alignment. Links can cross, and phrases can be discontinuous. The align_words command follows a similar syntax:

    > parse -lang=Fre "Marie aime Jean" | align_words -lang=Fre,Dut,Lat -view="eog"

==Resource Grammar Library==