In: 3e Colloque International sur les Grammaires d'Arbres Adjoints (TAG+3). Rapport Technique TALANA-RT-94-01, pp. 77-82, Paris, 1994

HPSG and TAG

Klaus Netter* / Robert Kasper† / Bernd Kiefer* / K. Vijay-Shanker‡

* DFKI GmbH, Saarbrücken, Germany — {netter|kiefer}@dfki.uni-sb.de

† Ohio State University, Columbus, USA — [email protected]

‡ University of Delaware, Newark, USA — [email protected]

Abstract¹

We will present a compilation algorithm that translates Head Driven Phrase Structure Grammars into lexicalized feature-based Tree Adjoining Grammars. Through this exercise we attempt to gain further insights into the nature of the two theories and to identify correlating concepts. While HPSG has a more elaborate lexicalized and principle-based theory of functor-argument structures, TAG provides the means to represent lexically based structural information more explicitly and to factor out recursion. Our objectives are met by giving clear and simple definitions for projecting structure from the lexicon, for determining "maximal" projections, and for identifying potential auxiliary trees and foot nodes.

1 Introduction

Head Driven Phrase Structure Grammar (HPSG) and Tree Adjoining Grammar (TAG) are two frameworks which so far have been largely pursued in parallel, taking little or no account of each other. In this paper we will describe an algorithm which allows us to compile HPSG grammars, obeying certain constraints, into TAGs. However, we are not only interested in mapping one formalism into another, but also in exploring the relationship between concepts employed in the two frameworks.

HPSG sets out to provide a feature-based grammatical framework which is characterized by a modular specification of linguistic generalizations through extensive use of principles and lexicalization of grammatical information. Traditional grammar rules are generalized to schemata providing an abstract definition of grammatical relations, such as head-of, complement-of, subject-of, adjunct-of etc.

¹ This paper is based on a forthcoming extended version by Kasper/Kiefer/Netter/Vijay-Shanker. We would like to thank A. Abeillé, A. Joshi, T. Kroch, O. Rambow and H. Uszkoreit for valuable comments and discussions. The research underlying the paper was supported by research grants from the German Bundesministerium für Forschung und Technologie to the DFKI projects Disco (FKZ ITW 9002 0), Paradice (FKZ ITW 9403) and the VerbMobil project (FKZ 01 IV 101 K/1).

Principles, such as the head-feature, valency, non-local or semantics principle, determine the projection of information from the lexicon and recursively define the flow of information in a global structure. Through this modular design, grammatical descriptions are broken down into minimal units referring to local trees of depth one, jointly constraining the set of well-formed utterances.

TAG, on the other hand, if viewed as a linguistic framework, has put emphasis on a different aspect of grammatical relations, namely the distinction between recursive and non-recursive structural representations and dependencies. Non-recursive dependencies are those which are finite with respect to a lexical functor, i.e., the majority of head-complement relations. Recursive dependencies are those which can occur in a non-finite number, i.e., adjunctive constructions or certain types of complementation, such as raising or equi constructions. As a consequence of this distinction, which is mirrored in the classification as initial vs. auxiliary (modifier or predicative) trees, the structural units of the grammar are trees of depth greater than one in which functor-argument relations are localized.

In HPSG, the notion of local vs. non-local relations is a distinction on the linguistic level which is based on the concept of a "head domain". Non-recursive dependencies which are realized within the domain defined by the syntactic head of a structure are considered local. Those which go beyond this domain are either non-local (in the case of filler-gap relations) or they can be local relative to an extended head domain, as is the case in raising constructions in the widest sense (including so-called verb-raising phenomena).

From a formal point of view this distinction cuts across the notion of recursive and non-recursive dependencies in TAGs. Basically, there is no difference in HPSG between, say, a "local" head-subject relation and a "non-local" filler-gap relation, since the paths connecting the respective items in question may have to be specified in a recursive way in both cases. Likewise, the notion of a syntactic "head" and the notion of a "lexical anchor" are disjoint; while the former primarily determines properties of (maximal) projections, the latter is more closely bound to functor-argument relations.

If, at first sight, the two frameworks are so disparate, why try to compile the one into the other? The reason is simply that by combining the two approaches both frameworks could profit. Roughly speaking, HPSG is a strongly lexicalized grammar framework which builds on a kind of projection principle which has to be computed during processing. Since its specifications are in terms of minimal structural units, a lot of computation has to be repeatedly performed at run-time although it could be pre-computed at compile time, as long as it is constrained to non-recursive information. TAGs, on the other hand, provide a framework which attempts to optimize on the distinction between recursive and non-recursive dependencies. However, TAG does not naturally offer the modularity of formal means to constrain the internal structure of its basic trees according to principled generalizations and more fine-grained distinctions of grammatical relations. Thus, as we will show, a compilation from HPSG will also give a principled definition of potential auxiliary trees (including the distinction between predicative and modifier trees) as well as of the notion of potential foot node.

In the following we will first briefly describe the basic constraints we assume for the HPSG input grammar and the resulting form of TAG. We then give a description of the essential algorithm determining the projection of trees from the lexicon, the termination criterion and the formal definition of potential auxiliary tree and foot node.
We show how the computation of "sub-maximal" projections can be triggered and carried out in a two-phase compilation.

2 Background

As the target of our translation we assume a Lexicalized Tree Adjoining Grammar (LTAG), in which every elementary tree is anchored by a lexical item [SAJ88]. We do not assume, as is the case in traditional TAGs, that root and foot nodes of an auxiliary tree are labelled identically; hence the recursion induced by adjunction is not constrained by such an identity relation. This raises the question of the status and the identification of foot nodes, which would be pertinent, however, even under the identity assumption, since more than one node in the frontier could be labelled identically with the root. Dropping the constraint, on the other hand, offers the potential to dispense with empty nodes (such as PRO) which are introduced to motivate certain projections (S vs. VP), and it allows us to treat long-distance dependencies in a homogeneous way, so that extraction out of non-clausal complements can be treated analogously to clausal extraction.

Our translation process will yield a lexicalized feature-based TAG [VSJ88] in which feature structures are associated with nodes in the frontier of trees, and two feature structures (top and bottom) with nodes in the interior. Following [VS92], the relationship between such top and bottom feature structures represents an underspecified domination link. Two nodes standing in this domination relation can be the same, and become different nodes if adjoining takes place. Adjoining separates the two, with the path from the root to the foot node of the auxiliary tree further specifying the underspecified domination link at the site of adjunction.

As input to our translation we basically assume an extended HPSG following the specifications in [PS94] (p. 404). The basic rule schemata we consider here comprise two rules for complementation, covering head-subject and head-complement relations, one schema for head-adjunct relations and one for filler-head relations. In our implementation all schemata have been simplified to binary branching (affecting in particular the head-comps schema) to allow for a parametrization for configurational and non-configurational languages. We assume a slightly modified and constrained treatment of non-local dependencies, in which empty nodes are eliminated and replaced by a lexical (or unary) rule.
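The top/bottom mechanism can be illustrated with a minimal sketch; the flat-dictionary feature structures and the merge-style unification (no reentrancy) are our own simplifications, not the paper's formalism:

```python
def unify(fs1, fs2):
    """Very simplified unification of flat feature structures
    (dicts): fail (return None) on conflicting atomic values."""
    out = dict(fs1)
    for attr, val in fs2.items():
        if attr in out and out[attr] != val:
            return None
        out[attr] = val
    return out

class Node:
    """An interior node carries two feature structures, top and
    bottom, standing for an underspecified domination link."""
    def __init__(self, top, bottom):
        self.top, self.bottom = top, bottom

def finish(node):
    """If nothing adjoins at the node, top and bottom collapse
    into a single node by unification."""
    return unify(node.top, node.bottom)

def adjoin(node, aux_root_top, aux_foot_bottom):
    """Adjoining separates top and bottom: the auxiliary tree's
    root unifies with the top, its foot with the bottom, and the
    root-to-foot path further specifies the domination link."""
    return (unify(node.top, aux_root_top),
            unify(node.bottom, aux_foot_bottom))
```

For instance, a V node whose bottom is already specified as finite can either collapse into one node or host an adjunction that adds information on the top side only.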
The non-local feature SLASH we assume to be list-valued, its introduction and percolation being determined by the following modified Non-Local Principle (referring to SLASH only).

Slash-Principle: In a filler-head schema the SLASH value of the mother is equal to the SLASH value of the HEAD-DTR minus the SYNSEM value of the FILLER-DTR; else, it is equal to the HEAD-DTR's SLASH value.
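As an illustration, the Slash-Principle can be sketched as a function over list-valued SLASH specifications; representing SYNSEM values as plain strings is our simplification, not the paper's encoding:

```python
def mother_slash(schema, head_dtr_slash, filler_synsem=None):
    """Modified Slash-Principle (SLASH only), as stated above.

    In a filler-head schema the mother's SLASH is the HEAD-DTR's
    SLASH minus the FILLER-DTR's SYNSEM value; in every other
    schema it is identical to the HEAD-DTR's SLASH value."""
    if schema == "filler-head":
        if filler_synsem not in head_dtr_slash:
            raise ValueError("filler must bind an element of SLASH")
        out = list(head_dtr_slash)
        out.remove(filler_synsem)   # subtract exactly one occurrence
        return out
    return list(head_dtr_slash)
```

Binding off a fronted accusative NP, for example, empties a singleton SLASH list, while every other schema passes the list up unchanged.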


Slash termination is accounted for by an optional unary rule, which eliminates one of the elements from the valency features if it can be co-indexed with the SLASH value of the mother. The percolation of SLASH across head domains is lexically determined. Most lexical items will be specified lexically as having an empty SLASH list. Bridge verbs or other heads allowing extraction out of a complement co-index their own SLASH value with the SLASH of the respective complement.²

Equi verbs we assume to take a non-finite verbal complement with a non-empty SUBJ list. For raising verbs (comprising also auxiliaries and modals), which in languages like German can combine with verbs which do not require a subject, and which can form verb clusters with their governed verbs, we postulate the following generalized structure:

    [ S|L|C [ SUBJ  1
              COMPS ⟨ [ S|L|C [ SUBJ 1, COMPS 2 ] ] ⟩ ⊕ 2 ] ]
Finally, we assume that rule schemata and principles have been compiled out (manually or automatically) to yield subtypes or instances of rule schemata. This does not involve a loss of generalization but simply means a further refinement of the type lattice. LP constraints could be compiled out beforehand or during the compilation of TAGs, since the algorithm is lexicon-driven, i.e., it initiates the process with a lexical type, and the length of the daughters list will be known at the time when a rule schema is applied.

3 Algorithm

3.1 Basic Idea

We will now discuss the relevant steps of the algorithm. The goal is to derive non-local trees (of depth ≥ 1) which explicitly encode the functor-argument structure implicitly defined in an HPSG. While in TAG all functor-argument relations are represented in one single structure, the "functional application" in HPSG is distributed over individual rule schemata. Therefore we have to identify which constituents in a rule schema count as functors and arguments.

² As far as we can see, the only limitation arising from the percolation of SLASH along head projections is on extraction out of adjuncts, which may be desirable for some languages. On the other hand, these constructions would have to be treated by multi-component TAGs, which are not covered by the intended interpretation of the compilation algorithm anyway.

In TAG different functor-argument relations, such as head-complement, head-modifier etc., are represented in the same format as branches of a trunk projected from a lexical anchor. As mentioned, this anchor is not equivalent to the HPSG notion of a head; in a tree projected from a modifier, for example, a non-head (ADJ-DTR) counts as a functor. We therefore have to generalize over different types of daughters in HPSG and define a general notion of a functor.

We compute the functor-argument structure on the basis of a general selection relation. Following [Kas92], we adopt the notion of a selector daughter (SD), which contains a selector feature (SF) whose value constrains the argument (or non-selector) daughter (non-SD). We make the assumption that each rule schema can be viewed as defining a local tree (of depth one), such that with each rule schema we can identify a single daughter, the selector daughter, that serves as the functor daughter and selects the remaining daughter(s) as argument(s).

Each SD in a rule schema has an SF whose value constrains (and defines) the argument daughters. The value of an SF (say f) of an SD having selected the argument daughters undergoes a "reduction" with respect to the mother node, i.e., the information contained in the value of f of the SD that selects the argument daughters is not shared with the mother node, while the remaining selector information is shared with the mother node. For these reasons, we say that the rule schema in question has reduced the SF f.

The head-comps schema reduces COMPS and the head-subject schema the SUBJ attribute, on the basis of the valency principle. The head-filler schema reduces SLASH according to the Slash-Principle. In all these cases, the head daughter is the SD. In a head-adjunct schema, the adjunct daughter is the SD, constraining through its MOD feature the selected (head) daughter. This selection information no longer appears at the mother node, and hence we say that MOD has been reduced by this schema.
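The selection relation just described can be summarized in a small table mapping each schema to its SD and the SF it reduces; the encoding below, including modelling every SF as a list, is our own sketch rather than the paper's implementation:

```python
# For each rule schema: which daughter is the selector daughter (SD)
# and which selector feature (SF) the schema reduces.
SCHEMATA = {
    "head-subject": ("HEAD-DTR",    "SUBJ"),
    "head-comps":   ("HEAD-DTR",    "COMPS"),
    "filler-head":  ("HEAD-DTR",    "SLASH"),
    "head-adjunct": ("ADJUNCT-DTR", "MOD"),
}

def reduce_sf(sd_synsem, schema):
    """One reduction step: the reduced element of the SF is not
    shared with the mother node, while all remaining selector
    information is passed up unchanged.  SFs (including, in this
    simplification, MOD) are lists; binary branching means exactly
    one element is consumed per application."""
    _, sf = SCHEMATA[schema]
    mother = {feat: list(val) for feat, val in sd_synsem.items()}
    mother[sf] = mother[sf][1:]   # the selected argument is realized
    return mother
```

A transitive verb's SD, for example, loses its single COMPS element under the head-comps schema while SUBJ and SLASH are shared with the mother.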
For the basic schemata assumed we can thus identify the set of SFs to comprise the valency features SUBJ and COMPS, SLASH, and the MOD feature. Note that complementation in TAG can be expressed by predicative or initial trees, modification (MOD) corresponds to modifier trees, and SLASH as an SF can be used in order to localize potential non-local dependencies.

We further assume that in the expanded lexical types, which are the starting point of the


algorithm, no SF is fully unspecified. Furthermore no rule will ignore an SF, i.e., it will reduce at least one SF and pass on the remaining SFs from either the SD or one of the non-SDs. These assumptions guarantee that an unspecified SF at the root node of a produced tree will be at least partly co-indexed with an SF of a node in the frontier.

At this point we can give a general description of how we obtain the lexicalized elementary trees from the lexicon and the rule schemata reducing SFs. Given a lexical item (type) and the selection information associated with it, we produce the trees by using the rule schemata to reduce all and only the selection information associated with the lexical item. Each instance of the rule schemata used will be manifested as a local tree, and every single piece of the selection information is reduced exactly once.

Basic Algorithm: Take a lexical type L and initialize by creating a node n with this type. Add a node dominating this node. For any schema S in which specified SFs of n are reduced, try to instantiate S with n corresponding to the SD of S. Add a node dominating the new root node. (The domination links are introduced to allow for the possibility of adjoining.) Repeat this step until no further reduction is possible.

Thus, the trees produced have a trunk from the lexical anchor (the node for the given lexical type) to the root. The nodes that are siblings of nodes on the trunk, the selected daughters, are not elaborated further and will serve as foot nodes or substitution nodes. In a TAG, a derivation will involve the substitution operation, which inserts trees below substitution nodes, or the adjoining operation, which identifies the root and foot node of an auxiliary tree with the top and bottom of a domination link, respectively.

There are several details that we still need to fill in: what information should be raised across the domination links that are introduced to allow for adjoining; how do we determine whether a tree is auxiliary or initial, and in the former case, how do we detect which nodes can serve as foot nodes; and finally, what is the termination criterion, i.e., how do we detect that there is nothing more to be reduced.
3.2 Raising Features Across Domination Links

Quite obviously, we must raise the SFs, since they determine the applicability of a schema and licence the instantiation of an SD. If no SF were raised, we would lose any information about the saturation status of a functor, and the algorithm would terminate after the first iteration. Of course, if we only raise the SFs there is a potential for over-generation, in the sense that some trees may never participate in derivations, leading to unproductive trees. If, on the other hand, we raised more than the SFs, there is a danger that we under-generate and thus lose completeness. For example, the head-subject schema in German would typically constrain a verbal head to be finite. Raising HEAD features would block its application to non-finite verbs, and we would not produce the trees required for raising-verb adjunction. The ultimate reason for this is again that heads in HPSG are not equivalent to lexical anchors in TAG, and that other local properties of the top and bottom of a domination link can be changed through adjunction. Therefore HEAD features, but also other LOCAL features, cannot in general be raised across domination links. For now, we will therefore assume that we raise only the selection information, given by the SFs.

A question arises whether some or all selection information must be preserved from the root to the foot. Raising all SFs produces only fully saturated functor-argument structures and would require that any auxiliary tree that can be adjoined preserve all selection information. Since this need not be the case, we insist that only some selection information needs to be preserved. Below, we will show how it can be determined in a multi-step process when and what to raise across domination links.

3.3 Detecting Auxiliary Trees and Foot Nodes

Recall that the traditional definition of an auxiliary tree is that it gives a minimal recursive structure, enabling the factoring of recursion in TAG. Auxiliary trees are said to be recursive structures because there is a node in the frontier, the foot node, that is labelled identically with the root (using labels such as S, NP, and VP). Factoring of recursion into the auxiliary trees thus also means that each auxiliary tree defines a path (from the root to the foot) where we have the same selection information at the ends of the path. Although we do not assume that the selection information is defined on the basis of such labels but rather by the SFs and their values, we too adopt the notion that the auxiliary trees factor recursion in that they define a path from the root to the foot where (some part of) the selection information is preserved.

Thus, we can detect that a tree produced is an auxiliary tree if the root and some frontier node have some selection information in common. In this case, the frontier node in question would be a potential foot node. Initial trees, of course, are then just those trees which have no such frontier nodes. For example, in the adverbial modifier trees, the head daughter functions as the foot node, as all its SFs are co-indexed with the root. This co-indexation also occurs with trees lexically anchored in raising verbs, the difference being that here the foot node is a COMP-DTR. Similarly, equi verbs or other bridge lexemes can project into an auxiliary tree if the SLASH value of a COMP-DTR is co-indexed with the root node. However, it is worth noting that in the case of the equi-verb tree, only some of the SF values are co-indexed, since the selection information given by the value for SUBJ is not shared.

[Tree T1, anchored in an equi verb: the root carries [SUBJ ⟨⟩, COMPS ⟨⟩, SLASH 3]; below a domination link, the anchor combines with a COMP-DTR — the potential foot node — carrying [SUBJ ⟨1⟩, COMPS ⟨⟩, SLASH 3], so that SLASH is shared between root and foot while SUBJ is not.]

The distinction between modifier and predicative trees can be read off the DTRS label of the foot node, i.e., HEAD-DTR in the former and a non-head daughter in the latter case.

3.4 Termination

Recall that our aim is to start with a lexical item (or type) and use exactly those rule schemata that will reduce all and only the selection information associated with the given lexical type. Thus our process has to terminate when all SFs have "irreducible" values.

Normally an SF has a specified value that will be reduced completely (to an empty list or atom) by applications of appropriate rule schemata. But it may be the case that an SF (or its reduction) bears an unspecified value. A typical case arises from the head-modifier schema, whose SFs are all unspecified at the root. Similarly, the trees projected from raising verbs in German do not have a specified SUBJ, and bridge verbs do not have a specified SLASH value. To the root nodes of these trees basically all rule schemata are applicable, in the sense that their SD would unify with the specification. Since applying some rule schemata could result in an unspecified SF at their root, we may run into an infinite recursion. This kind of recursion is also linguistically unmotivated, since it implies that a functor could be applied to an argument which it never explicitly selected.

Simply blocking further reduction of an SF as soon as its value is unspecified will not work for configurations where the value of an SF is introduced into the tree by a node that is not on the trunk, i.e., a non-SD. This is the case if a lexical item constrains a complement to have a non-empty SF without (semantically) licencing this argument itself, such as English VP adverbs but also simple raising verbs, which can only adjoin at a VP level.

[Tree T2, anchored in a VP adverb: the adjunct anchor selects via MOD a head daughter — the foot node — with a non-empty SUBJ list and an empty COMPS list, all of whose SFs are co-indexed with the corresponding SFs of the root.]

[Tree T3, anchored in a raising verb with the entry [SUBJ 1, COMPS ⟨2⟩, SLASH 3]: the foot node is the COMP-DTR 2, carrying [SUBJ 1, COMPS ⟨⟩, SLASH 3], all of whose SFs are co-indexed with the root.]

However, the above structures have in common that all SFs which we want to exempt from further reduction carry an index which also occurs on a non-trunk (frontier) node. Thus, our termination criterion will simply be as follows:

Termination Criterion: The value of an SF f at the root node of a tree is not reduced further if it is co-indexed with the value of f at some non-trunk node in the frontier.

Intuitively, this means that an argument is not realized in a basic tree if it is guaranteed that it must be realized elsewhere, in another tree. For initial trees, it is never the case that an argument selected by the anchor can be realized elsewhere, because by definition the selection of arguments is not passed on to a node in the frontier. Selection information coming in from a foot and being shared by the root of an auxiliary tree therefore eventually has to be realized by some initial tree.

What we obtain from this criterion is also a notion of local completeness. A tree is locally complete as soon as all arguments which it licences, and which are not licenced elsewhere, are realized. Global completeness is guaranteed because the notion of "elsewhere" is only and always defined for auxiliary trees, which have to adjoin into an initial tree.

3.5 Additional Phases

Above, we noted that the preservation of some selection information along a path (realized as a path from the root to the foot of an auxiliary tree) need not imply that all selection information is preserved along that path. We could have paths in the derivation trees where only some but not all selection information is raised across domination links in the compiled TAG structures. All selection information could be raised in general only if the rule schemata alone reduced selection information. However, lexical items like equi verbs trigger the reduction of an SF by taking a complement that is unsaturated for SUBJ but never share this value with one of their own SFs.

Of course, one could simply systematically raise any part of the selection information across domination links, which would lead to a large number of unproductive trees. Instead, we need better guidance in determining when we need to raise only some part (and exactly which parts) of the selection information.

Take the example of a tree projected from an equi verb (T1 above). The foot node (i.e., the VP complement) differs from the root by requiring a non-empty SUBJ list, whereas the root is reduced for SUBJ. From the information on the foot node we can infer that a SUBJ SF should not be raised if we have a bottom of a domination link that has a list of length one as a value for SUBJ, but only if in addition the COMPS list is empty. Having then an empty SUBJ list associated with the top of the domination link would reflect the relationship between the root and foot of the auxiliary tree. Thus, the relationship between foot and root nodes gives us exactly the kind of information we need.

This leads to the following multi-step compilation algorithm. In the first phase, we raise all SFs. We decide which trees are auxiliary trees and then note the relationship between the selection information associated with the root and foot in these auxiliary trees. In the additional phase, we start with lexical types and consider the application of sequences of rule schemata as before. Any time we reach a bottom of a domination link, we examine the selection information with respect to foot nodes of trees with differing selection information at foot and root. When we find a node, immediately after applying a rule schema, that is compatible with such a foot node, we raise the selection information as determined by the relationship between the root and foot of the auxiliary tree in question. This process may need iteration based on the trees produced. However, one can note that all auxiliary trees will have a root whose top carries fully reduced selection information or shares it with the foot node.³ Hence the process of iterating the phases will terminate, the number of iterations necessary being bounded by the (bounded amount of) selection information contained in lexical entries.
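The detection of auxiliary trees and the termination criterion can be sketched together under a simplified encoding of our own (reentrancy indices modelled as integers, SFs as dictionary entries):

```python
def classify(tree):
    """A produced tree is an auxiliary tree iff its root shares
    selection information (here: a reentrancy index on some SF)
    with a frontier node, which then is a potential foot node;
    otherwise it is an initial tree."""
    root, frontier = tree["root"], tree["frontier"]
    root_indices = {v for v in root.values() if isinstance(v, int)}
    for node in frontier:
        shared = root_indices & {v for v in node.values()
                                 if isinstance(v, int)}
        if shared:
            return "auxiliary", node, shared
    return "initial", None, set()

def reducible_sfs(tree):
    """Termination criterion: an SF at the root is not reduced
    further if its value is co-indexed with the same SF at some
    non-trunk node in the frontier; empty or unspecified values
    are likewise irreducible."""
    root, frontier = tree["root"], tree["frontier"]
    blocked = {sf for sf, v in root.items() if isinstance(v, int)
               and any(node.get(sf) == v for node in frontier)}
    return {sf for sf, v in root.items()
            if sf not in blocked and v not in ([], None)}
```

For a raising-verb tree like T3, whose root shares SUBJ and SLASH with the COMP-DTR, the criterion leaves nothing to reduce, so projection stops exactly where the paper requires.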

References

[Kas92] Robert Kasper. On Compiling Head Driven Phrase Structure Grammar into Lexicalized Tree Adjoining Grammar. In Proceedings of the 2nd Workshop on TAGs, Philadelphia, 1992.

[PS94] Carl Pollard and Ivan Sag. Head Driven Phrase Structure Grammar. CSLI, Stanford & University of Chicago Press, 1994.

[SAJ88] Y. Schabes, A. Abeillé, and A. K. Joshi. New Parsing Strategies for Tree Adjoining Grammars. In COLING-88, 1988.

[VS92] K. Vijay-Shanker. Using Descriptions of Trees in a Tree Adjoining Grammar. Computational Linguistics, 18(4):481-517, 1992.

[VSJ88] K. Vijay-Shanker and A. K. Joshi. Feature Structure Based Tree Adjoining Grammars. In COLING-88, 1988.

³ This can easily be established by induction. At the end of the first phase (the base case of the proof) this holds by the design of the algorithm and the termination condition. By induction, we can show that it holds after any phase.
