LG511 Parsing: Basics

Doug Arnold ([email protected])

1 Basics

• A parser for a grammar G is a program that takes an input string w and produces a parse tree for w as output (if w is a sentence of G; otherwise, it fails or produces some kind of error message). If G is ambiguous, we may want all parse trees, or just one.

• A recognizer differs in that it merely reports success (or failure), i.e. it does not produce a parse tree. (The difference is not very important, and the obvious way to decide whether w is a sentence is to see whether it has a parse tree.)
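In these terms, the parse/3 predicates defined in section 7 below are really recognizers: they succeed or fail on a string but return no tree. For example, with the grammar of section 7.3 loaded:

?- parse(s, [the,dog,chases,the,cat], []).
true.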

In principle, parsing may proceed:
• left-to-right (i.e. from the start of the string towards the end);
• right-to-left (i.e. from the end of the string towards the start);
• bidirectionally (e.g. from "islands of certainty", or from syntactic heads).
Here we focus on left-to-right parsing. The major classification of parsing algorithms is in terms of whether they construct the parse tree:
• from the root/top (top-down);
• from the bottom (bottom-up).

1.1 Example Grammar

(1) S   --> NP VP
    NP  --> DET N
    VP  --> V NP
    VP  --> V
    NP  --> Sam
    NP  --> Kim
    V   --> saw
    V   --> cried
    N   --> baby
    N   --> child
    DET --> a
    DET --> the

(2) [S [NP [DET a] [N baby]] [VP [V saw] [NP [DET a] [N child]]]]

(a parse tree for "a baby saw a child")


(3) [S [NP Sam] [VP [V saw] [NP [DET a] [N baby]]]]

(a parse tree for "Sam saw a baby")
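In the Prolog notation used for the parsers in section 7, grammar (1) could be encoded as follows (a direct transcription for illustration; the grammar actually supplied in section 7.3 is a different, smaller one):

% Phrase structure rules of grammar (1):
s  ---> [np,vp].
np ---> [det,n].
vp ---> [v,np].
vp ---> [v].

% Lexicon of grammar (1):
word(np,sam).   word(np,kim).
word(v,saw).    word(v,cried).
word(n,baby).   word(n,child).
word(det,a).    word(det,the).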

2 Top Down Parsing

2.1 Basic Idea

The basic idea is to build the parse tree from the top (the root, labelled with the start symbol of G) downwards. Beginning with the start symbol, the nodes of the parse tree are expanded in preorder. That is, when a node M is expanded by a rule M → D1, . . . , Dn, we first expand D1, then D2, etc. In the case of a rule that introduces a terminal, e.g. M → d, we check whether the next item in the input string matches d, and if it does, we advance the input pointer over d.

2.1.1 Example

Input: a baby cried.

(a) [S]
(b) [S [NP] [VP]]
(c) [S [NP [DET] [N]] [VP]]
(d) [S [NP [DET a] [N]] [VP]]                 (a matched)
(e) [S [NP [DET a] [N baby]] [VP]]            (baby matched)
(f) [S [NP [DET a] [N baby]] [VP [V]]]
(g) [S [NP [DET a] [N baby]] [VP [V cried]]]  (cried matched)

Cf. a leftmost derivation:

S
NP VP
DET N VP
a N VP
a baby VP
a baby V
a baby cried

Top-down parsing computes a leftmost derivation. The structure of the parse tree mirrors the structure of the parsing process directly. Top-down parsing is of necessity non-deterministic in general, because there will typically be several rules expanding any given non-terminal: non-determinism arises whenever a category can be rewritten in more than one way.
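A concrete illustration of this, using the recursive-descent parser of section 7.1 with the grammar of section 7.3 (rather than grammar (1)):

% Both vp rules are tried, in database order: vp ---> [v,np] is
% tried first and fails here (it leaves input unconsumed), and
% the parse then succeeds via vp ---> [v,np,pp].
?- parse(vp, [chases,the,cat,near,the,elephant], []).
true.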

2.2 Advantages and Limitations

Top-down parsing is standard with ATNs (which have been used for some of the largest practical NLP systems in existence), and with DCGs as implemented in Prolog. The chief advantage of top-down parsing is that it appears to be expectation-led ('predictive'), and has the property that the parse is always connected. This lends it some psycholinguistic plausibility. It can also deal with ε-productions. But there are several disadvantages:
• difficulty in parsing incomplete or fragmentary data, or inputs that fall outside the grammar;
• the need for backtracking when the wrong rule is chosen (inefficient, psychologically implausible);
• inefficiency: rules are tried regardless of the input data, so many rules are tried that play no role in the actual derivation;
• problems with left-recursive rules, i.e. rules of the form A → Aγ, which lead to non-termination (looping).


2.3 Inefficiency

Examples like (4a) suggest that sentences can begin with a PP, which might be captured by a rule like (4b):

(4) a. In the park, Sam saw a baby.
    b. S --> PP S

This may lead the parser to consider parse trees like (5), even for an example like (6), when a look at the first word would indicate this is wrong.

(5) [S [PP [P in] [NP . . . ]] [S . . . ]]

(6) A child saw a baby.
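To see the wasted work concretely, suppose (hypothetically) that the rule s ---> [pp,s] were added before the other rules of the grammar in section 7.3. For a query such as the one below, the top-down parser of section 7.1 would then first look for a PP at the start of the string, calling word(p,the) and failing, before the correct NP-VP analysis is even tried:

?- parse(s, [the,dog,chases,the,cat], []).
true.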

2.4 Left Recursion

There are many constructions of English that seem to need left-recursive rules:

1. Coordination:
(7) a. a child and a baby
    b. NP --> NP and NP

2. Sentential Subjects:
(8) That the world is round is not obvious.
(9) a. S --> S' VP
    b. S' --> that S
(10) a. S --> NP VP
     b. NP --> S'
     c. S' --> that S

3. Genitive NPs:
(11) John's mother's sister's son
     [NP [NP [NP [NP John's] [N mother's]] [N sister's]] [N son]]
(12) a. NP --> NP 's N

4. Adjuncts/Modifiers:
(13) the child in the park with the baby . . .
     [NP [NP [NP [DET the] [N child]] [PP in the park]] [PP with the baby]]
(14) a. NP --> NP PP
     b. NP --> DET N
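The effect of such rules on a top-down parser can be seen directly. A sketch, assuming we add a left-recursive rule in the style of (14a) to the grammar of section 7.3:

% Hypothetical left-recursive rule, as in (14a):
np ---> [np,pp].

% The recursive-descent parser of section 7.1 now calls
% parse(np,S1,_) from within parse(np,S1,_) without having
% consumed any input, so its search space contains an infinite
% branch. A query such as
%   ?- parse(np, [the,dog,near,the,cat], []).
% finds the analysis [NP [NP the dog] [PP near the cat]], but
% loops if further solutions are requested; a query with no
% parse at all loops without returning.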

3 Bottom Up Parsing

3.1 Basic Idea

In bottom-up parsing, the parser attempts to construct the parse tree from the leaves upwards to the root. Informally, it involves scanning the input for substrings which match the RHSs of rules, and replacing them with the corresponding LHS symbols:

(15) a. the baby saw the child.
     b. DET baby saw the child.
     c. DET N saw the child.
     d. NP saw the child.
     etc.

3.2 Shift-Reduce Parsing

Shift-reduce parsing is a very simple form of bottom-up parsing. It involves two data structures:
• the input string;
• a push-down store (PDS).
At each stage of parsing, the parser may take one of four actions:
1. shift the first element from the input string onto the PDS;
2. reduce a sequence of elements at the top of the PDS to a single element (in particular, given a rule M → D1, . . . , Dn, if the sequence D1, . . . , Dn appears on top of the stack, it can be reduced to M; that is, D1, . . . , Dn are popped from the stack and replaced by M);
3. succeed (i.e. signal that the string has been accepted, when the input string is empty and the start symbol of the grammar is the only element in the PDS);
4. fail (i.e. indicate some error condition).
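These four actions correspond directly to the clauses of the Prolog shift-reduce parser given in section 7.2 (shown here schematically; the full definitions appear there):

shift_reduce(C,[C],[],[]).            % 3: succeed
shift_reduce(C,Stack,S1,S2) :-        % 2: reduce
    reduce(Stack,NewStack),
    shift_reduce(C,NewStack,S1,S2).
shift_reduce(C,Stack,S1,S3) :-        % 1: shift
    shift(Stack,NewStack,S1,S2),
    shift_reduce(C,NewStack,S2,S3).
% 4: fail corresponds to Prolog failure, when no clause applies.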


3.2.1 Example

Stack         Input Left        Action
[]            the baby cried    shift
[the]         baby cried        reduce DET --> the
[DET]         baby cried        shift
[DET,baby]    cried             reduce N --> baby
[DET,N]       cried             reduce NP --> DET N
[NP]          cried             shift
[NP,cried]    []                reduce V --> cried
[NP,V]        []                reduce VP --> V
[NP,VP]       []                reduce S --> NP VP
[S]           []                succeed

Cf. the corresponding rightmost derivation:

the baby cried
DET baby cried
DET N cried
NP cried
NP V
NP VP
S

SR parsing computes a rightmost derivation in reverse. Constructing an SR parser simply requires that we associate the items on the stack with appropriate pieces of structure, which are combined when a reduce action occurs (a sketch of this is given at the end of section 7.2). For an SR parser, non-determinism arises in two cases:
• shift-reduce conflicts;
• reduce-reduce conflicts.

3.2.2 Shift-Reduce Conflicts

Shift-reduce conflicts arise when there is a choice between shifting the next input element and reducing the top of the stack. For example, consider the rules in (16), which would be appropriate for the structures below:

(16) a. VP --> V NP PP
     b. VP --> VP PP
     c. VP --> V NP

[VP [V put] [NP the car] [PP in the garage]]

[VP [VP [V mended] [NP the car]] [PP in the morning]]


Given the following configuration:

(17) Stack      Input Left             Action
     [V, NP]    in the garage . . .    ????

the parser must choose between shifting in onto the stack:

Stack           Input Left             Action
[V, NP]         in the garage . . .    shift
[V, NP, in]     the garage . . .       reduce
[V, NP, P]      the garage . . .       . . .
[V, NP, PP]     . . .                  reduce
[VP]            . . .

in which case the V, NP, and PP may become sisters:

(18) [VP [V . . . ] [NP . . . ] [PP in the garage]]

and reducing V and NP to VP:

Stack      Input Left             Action
[V, NP]    in the garage . . .    reduce
[VP]       in the garage . . .    shift

in which case only the V and NP become sisters:

(19) [VP [VP [V . . . ] [NP . . . ]] [PP in the garage]]

Presumably only one of these structures is right in any particular case.

3.2.3 Reduce-Reduce Conflicts

Reduce-reduce conflicts arise when there is more than one rule that can be used to reduce. The rules in (22) might be proposed to analyze the structures in (20) and (21):

(20) [VP persuade [NP Sam] [VP to go]]
(21) [VP believe [S Sam left]]

(22) a. VP --> V NP VP
     b. VP --> V S
     c. S --> NP VP

This gives two possible reductions for the configuration in (23):

(23) Stack          Input    Action
     [V, NP, VP]    ...      ????

(24) Stack          Input    Action
     [V, NP, VP]    ...      reduce (a)
     [VP]           ...


(25) Stack          Input    Action
     [V, NP, VP]    ...      reduce (c)
     [V, S]         ...

3.3 Advantages and Limitations

The main advantages of the approach are that it appears to be data-driven, can parse fragmentary input, and has no problems with left recursion. However, there are drawbacks:
• problems with ε-productions;
• it appears "non-predictive";
• inefficiency, since reductions may be made that can never contribute to a correct analysis.

3.3.1 ε-productions

Given a rule A → ε, the empty string (ε) can be found at any point in the input string, so A can be pushed onto the stack at any point (and an indefinite number of times: looping/inefficiency). It is perhaps not immediately clear that ε-productions are useful for natural languages, but constructions such as the following ("Gapping") suggest they may be:

(26) a. Sam went to Spain, and Kim ∆ to France.
     b. Sam hated the white wallpaper, but admired the red ∆.

Moreover, one view of examples like (27), where the preposed Wh-item behaves in various ways as though it were in the ∆ position, is that they have a structure as below:

(27) Where did you put the car ∆ yesterday?

[S [COMP [NPi Where]] [S [Aux did] [NP you] [VP [VP [V put] [NP the car] [NPi ∆]] [ADVP yesterday]]]]
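Returning to the looping problem noted above, its effect on the shift-reduce parser of section 7.2 is easy to demonstrate (a hypothetical addition to the grammar of section 7.3, for illustration only):

% Hypothetical ε-production: an NP rewrites as the empty string.
np ---> [].

% reduce/2 can now push np onto the stack without popping
% anything (the reversed RHS is empty), at any point and any
% number of times. Since shift_reduce/4 tries reduce before
% shift, a query such as
%   ?- parse(s, [the,dog,chases,the,cat], []).
% pushes np onto the stack over and over and never terminates.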

3.3.2 Inefficiency

Just as a top-down parser can expand a non-terminal using a rule that plays no part in the derivation of the input sentence, so an SR parser can perform reductions that play no role.

(28) That fish can swim is hardly surprising.


[S [S' [COMP that] [S [NP fish] [VP can swim]]] [VP [V is] [AP hardly surprising]]]

Given that that can be either a complementizer or a determiner, an SR parser may treat that fish as a constituent; but this, and any analysis that involves it, is wasted, given the correct structure (though it would be correct for That fish can (really) swim).

Stack      Input                                        Action
[]         that fish can swim is hardly surprising      . . .
[DET,N]    can swim is hardly surprising                reduce NP --> DET N
[NP]       can swim is hardly surprising                . . .
*blocks*

Similarly, the lexical categorial ambiguity of saw (N or V) can produce a situation like the following, where it has been analyzed as an N:

(29) The baby saw the child.

(30) Stack     Input                      Action
     []        the baby saw the child.    . . .
     [NP,N]    the child.                 . . .
     *blocks*

4 Comparison

The following trees show the order in which nodes are begun and completed by particular top-down and bottom-up parsing algorithms. The bottom up algorithm would be one which always tries to reduce before it shifts. (You may find it more intuitive to draw arrows connecting the numbers). (31)

[1S18 [2NP9 [3DET5 4the4] [6N8 7baby7]] [10VP17 [11V13 12saw12] [14NP16 15Sam15]]]

(The number to the left of each label records when that node is begun; the number to the right, when it is completed.)

(32)

[17S18 [7NP8 [2DET3 1the1] [5N6 4baby4]] [15VP16 [10V11 9saw9] [13NP14 12Sam12]]]

5 Non-Determinism and Search

These algorithms are non-deterministic, in the sense that the action they perform is not fully determined by their state and the input. That is, they must explore (search) a range of alternatives, even for unambiguous grammars. A space of alternatives is often represented as a (search) tree. The following is part of the search tree that arises for a top-down parser in relation to a grammar like that in (1).

5.1 A Search Tree

[S [NP] [VP]]
 |
 |-- A: [S [NP [DET] [N]] [VP]]              (via NP --> DET N)
 |    |-- A1: [S [NP [DET a] [N]] [VP]]      (via DET --> a)  . . .
 |    |-- A2: [S [NP [DET the] [N]] [VP]]    (via DET --> the)  . . .
 |    |-- . . .
 |
 |-- B: [S [NP Sam] [VP]]                    (via NP --> Sam)  . . .
 |
 |-- . . .

Search can be (inter alia):
• depth-first (i.e. exploring one alternative, and its consequences, at a time);
• breadth-first (i.e. exploring all alternatives "in parallel").
There are advantages and disadvantages to each; e.g. breadth-first is "fair", and useful if exhaustive search is necessary anyway (e.g. if all parses must be found). The alternative to searching is to provide an "oracle" which ensures that the parser always makes the right choice, i.e. so that the parser becomes deterministic. Obviously, for an ambiguous grammar (like a natural language grammar) no oracle can be "exact"; however, it is possible to augment SR parsers and top-down parsers with oracles that greatly reduce their non-determinism. For example:
• for a top-down parser, one could compute a table of which lexical items can begin ('be first in') which kinds of phrase: a can be the first word of a DET (and hence of an NP and an S), but not of a PP;
• for a bottom-up parser, one can compute a table which allows top-down information to be encoded, resolving some conflicts. The word saw can be a noun or a verb, but in a context like . . . the saw . . . it can only be a noun. If a shift-reduce parser has DET on top of its stack, and saw as the first word of input, such a table can be used to tell the parser to reduce this to N, disregarding the alternative of reduction to V (cf. LR parsing).
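As a concrete sketch of the first kind of oracle (the predicate names first_word/2 and parse_pred/3 are inventions here, not part of the handout's code), one can derive the table from the grammar of section 7.3 and use it to prune the top-down parser of section 7.1, which otherwise explores alternatives depth-first via Prolog's backtracking:

% first_word(C,W): word W can begin a phrase of category C.
% (In practice this table would be precomputed; deriving it on
% the fly like this is only safe because the grammar of section
% 7.3 is not left-recursive.)
first_word(C,W) :-
    word(C,W).
first_word(C,W) :-
    (C ---> [C1|_]),
    first_word(C1,W).

% parse_pred(C,S1,S): like parse/3 of section 7.1, but consult
% the oracle before expanding a category, pruning expansions
% that cannot match the next word.
parse_pred(C,[W|Ws],S) :-
    first_word(C,W),
    parse(C,[W|Ws],S).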

6 Reading

Useful discussions of top-down and bottom-up parsing can be found in Covington (1994, Ch. 6), Gazdar and Mellish (1989, Ch. 5), and Allen (1987, Ch. 3). Technical discussion unrelated to NLP can be found in most books on compiler design, e.g. Aho et al. (1986, Ch. 4). There is an interesting and very intuitive discussion which looks at psychological plausibility in Johnson-Laird (1983). Since Tomita's work on Generalized LR parsing (a form of shift-reduce parsing), e.g. Tomita (1986), Tomita (1987), there has been a good deal of practically oriented work on this approach; see, for example, Tomita (1991) and Bunt and Tomita (1996). There has been some work on the psychological plausibility of shift-reduce parsing: e.g. Pereira (1985), Shieber (1983). There is discussion of Augmented Transition Networks in Gazdar and Mellish (1989); original papers include Woods (1970, 1986) and Kaplan (1972).

References

Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, Reading, MA, 1986.

J. Allen. Natural Language Understanding. Benjamin Cummings, Menlo Park, CA, 1987.

Harry Bunt and Masaru Tomita, editors. Recent Advances in Parsing Technology, volume 1 of Text, Speech and Language Technology. Kluwer Academic Publishers, Dordrecht, 1996.

Michael A. Covington. Natural Language Processing for Prolog Programmers. Prentice Hall, Englewood Cliffs, NJ, 1994.

G. Gazdar and C. Mellish. Natural Language Processing in Prolog. Addison-Wesley, Wokingham, 1989.

P. N. Johnson-Laird. Mental Models. Cambridge University Press, Cambridge, 1983.

R. M. Kaplan. Augmented transition networks as psychological models of sentence comprehension. Artificial Intelligence, 3:77-100, 1972.

Fernando C. N. Pereira. A new characterization of attachment preferences. In David R. Dowty, Lauri Karttunen, and Arnold M. Zwicky, editors, Natural Language Parsing, pages 307-319. Cambridge University Press, Cambridge, 1985.

Stuart M. Shieber. Sentence disambiguation by a shift-reduce parsing technique. In ACL Proceedings, 21st Annual Meeting, pages 113-118, 1983.

Masaru Tomita. Efficient Parsing for Natural Language: A Fast Algorithm for Practical Systems. Kluwer, Boston, 1986.

Masaru Tomita. An efficient augmented context-free parsing algorithm. Computational Linguistics, 13(1-2):31-46, 1987.

Masaru Tomita, editor. Generalized LR Parsing. Kluwer Academic Publishers, Boston, 1991.

William A. Woods. Transition network grammars for natural language analysis. In Barbara J. Grosz, Karen Sparck-Jones, and Bonnie Lynn Webber, editors, Readings in Natural Language Processing, pages 71-87. Morgan Kaufmann, Los Altos, 1970, 1986.


7 Prolog Code

7.1 Top Down

% topdown parsing
% Recursive-descent top-down parser

?- op(1200,xfy,--->).

% parse(?C,?S1,?S)
% Parse a constituent of category C
% starting with input string S1 and
% ending up with input string S.

% terminals:
parse(C,[W|Ws],Ws) :-
    word(C,W).

% non-terminals:
parse(C,S1,S) :-
    (C ---> Cs),
    parse_list(Cs,S1,S).

% parse_list(+Cs,?S1,?S)
% Like parse/3, but Cs is a list of
% categories to be parsed in succession.
parse_list([C|Cs],S1,S) :-
    parse(C,S1,S2),
    parse_list(Cs,S2,S).
parse_list([],S,S).

7.2 Bottom Up

%% bottom-up parsing
%% shift-reduce parser

?- op(1200,xfy,--->).

%% parse(?C,?S1,?S2)
%% Parse a constituent of category C
%% starting with input string S1 and
%% ending up with input string S2.
parse(C,S1,S2) :-
    shift_reduce(C,[],S1,S2).

%% shift_reduce(Cat,Stack,String1,String2)
%% use either shifting or reduction to parse
%% from String1 to String2. Stack is a PDS.
%% Succeed when the goal category Cat is the only thing on the Stack,
%% and the input is finished.
shift_reduce(C,[C],[],[]).
shift_reduce(C,Stack,S1,S2) :-
    reduce(Stack,NewStack),
    shift_reduce(C,NewStack,S1,S2).
shift_reduce(C,Stack,S1,S3) :-
    shift(Stack,NewStack,S1,S2),
    shift_reduce(C,NewStack,S2,S3).

%% reduce(Stack,NewStack)
%% pop the rhs categories of a rule off Stack, and
%% push the lhs category onto it to give NewStack
reduce(Stack,[C|ReducedStack]) :-
    (C ---> Cats),
    reverse(Cats,StackCats),
    append(StackCats,ReducedStack,Stack).

%% shift(Stack,NewStack,String,NewString)
%% shift the first item from String onto the Stack;
%% NewStack and NewString are what result.
shift(Stack,[Cat|Stack],[W|String],String) :-
    word(Cat,W).

% reverse(L,RL)
% RL is the reverse of L
reverse(L,RL) :-
    reverse(L,[],RL).
reverse([],R,R).
reverse([H|T],Aux,RL) :-
    reverse(T,[H|Aux],RL).

% classic append/3
append([],L,L).
append([H|T],L,[H|NT]) :-
    append(T,L,NT).
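Section 3.2.1 noted that an SR parser can be made to build structure by associating stack items with trees, combined at each reduce. The following is a minimal sketch of one way to do that with the parser above (the Cat/Tree stack convention and the predicate names shift_tree/4, reduce_tree/2, and match/3 are assumptions of this sketch, not part of the original code):

% Each stack item is a pair Cat/Tree.
% shift_tree/4: shift a word, pairing its category with a
% lexical tree.
shift_tree(Stack,[Cat/lex(Cat,W)|Stack],[W|String],String) :-
    word(Cat,W).

% reduce_tree/2: pop the (reversed) RHS of a rule off the stack
% and push the LHS, with a tree built from the popped subtrees.
reduce_tree(Stack,[C/tree(C,Subtrees)|Rest]) :-
    (C ---> Cats),
    length(Cats,N),
    length(Top,N),
    append(Top,Rest,Stack),      % Top = topmost N stack items
    reverse(Top,Items),          % put them back in RHS order
    match(Items,Cats,Subtrees).

% match(Items,Cats,Trees): each stack item matches the
% corresponding RHS category; Trees collects the subtrees.
match([],[],[]).
match([Cat/Tree|Items],[Cat|Cats],[Tree|Trees]) :-
    match(Items,Cats,Trees).

% The success clause of the driver would then become:
%   shift_reduce_tree(C,Tree,[C/Tree],[],[]).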

7.3 Grammar

% maximally simple grammar for top-down and bottom-up parsing.

% Phrase Structure Rules:
s ---> [np,vp].
np ---> [d,n].
vp ---> [v,np].
vp ---> [v,np,pp].
pp ---> [p,np].

% Lexicon:
word(d,the).
word(p,near).


word(n,dog).       word(n,dogs).
word(n,cat).       word(n,cats).
word(n,elephant).  word(n,elephants).
word(v,chase).     word(v,chases).
word(v,see).       word(v,sees).
word(v,amuse).     word(v,amuses).
