DEPENDENCY PARSING FOR INFORMATION RETRIEVAL

D. P. Metzler, T. Noreault, L. Richey, B. Heidorn
Information Science Department, University of Pittsburgh, Pittsburgh, PA 15260 USA

This paper describes the development of a parser based on the Moulton and Robinson (1981) dependency theory of syntax, and several strategies by which we are attempting to apply the outputs of this parser to the processes of Information Retrieval. We first discuss the limits of present Information Retrieval theory and the potential benefits of linguistic analysis for Information Retrieval. Next we briefly present the Moulton and Robinson theory, contrast it to rewrite rule based theories, and outline its general advantages as an approach to natural language processing. Next we describe the parser we have implemented based on the Moulton and Robinson theory, and some of the implementation issues we have addressed. Finally, we discuss several strategies by which this parser could be applied to Information Retrieval, and the problems involved in this application.

1. INFORMATION RETRIEVAL: THE LIMITS OF KEY WORD APPROACHES

Present Information Retrieval techniques are based on the matching of key words in a query with key words in a document or document representation. Techniques such as probabilistic indexing or the use of thesaurus relations are used to extend the range of useful relations between terms, but these techniques are all based on overall statistical relationships among the terms of the collection or the language as a whole. These techniques do not offer any way of capturing the shifts in meaning of a term as it is used in different contexts, or the meanings of combinations of terms as they are used together. This inability to represent differences in how terms are used in a document or query places an upper bound on the performance of Information Retrieval systems. In fact, an analysis of experimental results shows that new techniques, while achieving statistically significant improvement in performance, offer only slight gains in absolute performance (e.g., McGill et al., 1979). We feel that the existing techniques employed in Information Retrieval research have succeeded in effectively utilizing the information available within the keyword approach to document representation. To gain substantial improvement, it will be necessary to perform a deeper analysis of both the document and the query in order to obtain a more precise match.

Two obvious categories of such potential "deeper analyses" are syntactic and semantic analyses of the document and of the natural language expression of the query; however, there are difficulties with each of these approaches. Semantic processing is simply not yet feasible for such open ended applications as Information Retrieval. Syntactic parsing has also not proven to be very useful for Information Retrieval. In part this has been due simply to the lack of a parser that is efficient enough to handle large amounts of full text, or flexible enough to handle incomplete or ungrammatical strings, or the relationships among the words of separate sentences. Moreover, it is quite difficult to use the level of detail and complexity provided by a standard parser without some means of extracting generalizations over the syntactic structures. The Moulton and Robinson based parser promises to address all of these issues.

2. THE MOULTON AND ROBINSON MODELS

There are two essential aspects of the Moulton and Robinson theory. There is a structural model, which refers to the nature of the underlying conceptual relations that are presumed to result from syntactic parsing, and a processing model, which refers to the nature of the parsing process itself.

2.1 Underlying Representations: The Structural Model

The Moulton and Robinson theory, which is a particularly pure version of syntactic dependency grammar (e.g., Hudson, 1976), suggests that underlying conceptual representations encode only two relations, scope and dependency. Scope, which is usually binary, refers to the specification of which words or concepts are immediately related to each other. In a phrase such as "fire engine dog," "fire" and "engine" are in each other's scope, and together specify a meaning which only as a unit relates to "dog." Scope, therefore, specifies a nonordered hierarchical structure describing the relationships among the words of the sentence.

Within a pair of words or constituents that are in each other's scope, it is almost always the case that one element plays a dominant role in determining the meaning of the pair, while the other modifies that concept. Thus, "fire engine" refers to an engine of a particular sort. Taken together, this compound modifies "dog" in the phrase "fire engine dog." Since "dog" is dominant in this phrase, the phrase as a whole refers to a kind of a dog, rather than a kind of fire or a kind of engine. Each binary pair of concepts that share each other's scope is marked with respect to dependency. In figure 1, the dominant branches are indicated with an asterisk. Moulton and Robinson contend that, with an adequate reliance on semantics and pragmatics, this very simple syntax is enough to specify the meaning of sentences and larger units of language.
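To make this concrete, here is a minimal sketch of how such binary scope pairs with a marked dominant branch might be encoded; the Node class and its field names are our illustration, not the authors' representation:

```python
# A minimal sketch of a scope/dependency pair; class and field names
# are illustrative, not the authors' implementation.

class Node:
    """A binary scope pair. `dominant` marks which child carries the
    meaning of the pair (the starred branch in figure 1)."""
    def __init__(self, left, right, dominant):
        self.left = left          # a Node or a word (str)
        self.right = right
        self.dominant = dominant  # "left" or "right"

    def head(self):
        """Follow dominant branches down to the head word."""
        child = self.left if self.dominant == "left" else self.right
        return child.head() if isinstance(child, Node) else child

# "fire engine dog": ("fire" + "engine") as a unit modifies "dog".
fire_engine = Node("fire", "engine", dominant="right")  # an engine of a sort
phrase = Node(fire_engine, "dog", dominant="right")     # a kind of dog
assert phrase.head() == "dog"
```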

2.2 The Processing Model

The second aspect of the Moulton and Robinson theory concerns the nature of the syntactic processes which mediate between the underlying hierarchical structures and surface strings.

The essential nature of the Moulton and Robinson model of parsing is that it minimizes the role of external abstract rules (e.g., phrase grammar rules) applied to the linguistic string, and the control structures that are necessary to apply those rules. The control structure of the system is extremely simple. It substitutes for rule and control complexity a high degree of data complexity, which in effect encodes the relationships that are more usually encoded as rules. This difference provides an extremely high degree of modularity and flexibility, and generally avoids the problems of rule ordering which have plagued other computational and formal approaches to language parsing.

The data structures which carry the bulk of the computational mechanisms in this model are individual modules. Each module can be thought of as a four sided entity in that it has four edges by which it can be linked to other modules. Each edge contains codes which enable two modules to be linked if and only if they have matching codes.

[Figure 1: scope and dependency structure for "fire engine dog"; asterisks mark the dominant branches.]

Lexical modules have lexical codes (potentially links to their semantic representations) on their bottom edges. The top edges of lexical modules, like all edges of the structure modules, contain abstract codes whose only meaning is determined by the connections which they permit. The sides of lexical modules are blank.
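A sketch of how such four-sided modules might be represented is given below; the edge names, codes, and matching rule are assumptions based on the description above, and the actual modules carry more machinery than this:

```python
# Illustrative sketch of a four-sided module; edge names and the
# matching rule are assumptions drawn from the prose description.

from dataclasses import dataclass

@dataclass
class Module:
    top: str | None = None      # abstract code; None for a blank edge
    bottom: str | None = None   # lexical code on lexical modules
    left: str | None = None     # blank on lexical modules
    right: str | None = None

def can_link(code_a, code_b):
    """Two edges may be joined iff both carry the same code.
    (Optional codes and blank edges never demand a match.)"""
    return code_a is not None and code_a == code_b

# A lexical module for "dog": lexical code below, abstract code above.
dog = Module(top="N6", bottom="lex:dog")
# A structure module whose bottom edge accepts an N6 noun.
np = Module(top="N5", bottom="N6")
assert can_link(dog.top, np.bottom)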

The majority of the complexity of syntactic structure is accounted for by abstract structure modules whose edges contain only abstract codes. The work of the parser is reduced to finding a set of structure modules that can be connected to the lexeme modules of an input string such that no outside edges are left with unmatched obligatory connection codes. (Edges may have optional codes or no codes at all.) A few simple topological rules allow for stretching unordered underlying representations to map them onto the linear input string.

The underlying conceptual structure of an input string is directly available from a completed structural model of the input by abstracting the hierarchical structure. The branching of the hierarchy determines scope relations, while dependency is determined by which element of a scoped pair is linked to the higher constituents of the structure. (See figure 2.) Variation in the allowable structures is permitted by the existence of optional codes on the edges.

[Figure 2: structure modules assembled for "fire engine dog," showing abstract connection codes (N2, N5, N6) on the edges.]

It is important to note that although the underlying structures of the Moulton and Robinson model are much sparser than

conventional syntactic trees, the parser modules implicitly contain all the information required for conventional syntactic parsing, and, in fact, it is not difficult to derive conventional descriptions from the parser. One thus has available two levels of analysis: the Moulton and Robinson scope and dependency descriptions, and conventional constituent labels. It is possible not only to move from one to the other as the needs of a language processing strategy dictate, but also to augment or alter the descriptions at one level according to the information supplied by the other level. For instance, one might traverse scope and dependency trees with checks based on the nature of particular constituents.
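As a sketch of that last possibility (the constituent labels and the check itself are hypothetical):

```python
# Sketch: walk a scope/dependency tree while consulting conventional
# constituent labels at each node; labels and names are hypothetical.

def traverse(node, visit):
    """node: ("label", modifier, head) triples or plain words."""
    if isinstance(node, str):
        return
    label, modifier, head = node
    if visit(label, node):          # a check on the constituent type
        traverse(modifier, visit)
        traverse(head, visit)

tree = ("NP", ("NP", "fire", "engine"), "dog")
traverse(tree, lambda label, n: print(label, n) or label == "NP")
```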

3. THE IMPLEMENTATION

We have begun work on an implementation of this theory, based not only on the work of Moulton and Robinson, but also on the SCRYP parser (Gruenewald, 1981). Like Gruenewald, we have adapted the Moulton and Robinson processing model by writing all modules in terms of mandatory binary structures, rather than tertiary structures with optional parts. We are interested also in implementing explicit

heuristics of the sort that people use (Clark and Clark, 1977), not only because of our interest in cognitive simulation but also because we are concerned with developing a computationally feasible parser for practical applications.

The parse is accomplished by searching for a way in which the lexical modules that correspond to the string and some structural modules can be connected in a hierarchical structure that preserves the relationships of the original input. (The underlying structure itself is unordered, but its hierarchical structure is determined by the surface string order.) First, lexical modules are assigned to the words of the string. Next, each of the pairs of lexical modules is connected in any possible way that preserves the word order of the original string. Later passes successively add these simple structures together in more and more complex ways, always constrained by the need to maintain the correct surface order of the original string.
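In outline, the search might be organized as in the following sketch; the helper names are our own, and in the real parser combination is licensed by matching edge codes rather than by this toy adjacency test:

```python
# Schematic sketch of the pass structure described above.

def parse(words, can_combine):
    """Bottom-up search to a fixed point: start from lexical items and
    repeatedly join structures while preserving surface order."""
    structures = [(w,) for w in words]
    grew = True
    while grew:
        grew = False
        for a in list(structures):
            for b in list(structures):
                if can_combine(a, b) and a + b not in structures:
                    structures.append(a + b)
                    grew = True
    return structures

# Toy run: license any order-preserving adjacent join.
words = ["approximate", "string", "matching"]
follows = lambda a, b: words.index(b[0]) == words.index(a[-1]) + 1
print(parse(words, follows))
```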

Thus, the model is basically a bottom up parser, but combines certain advantages of both bottom up and top down parsers with some advantages that are not found in either of these two as they have been traditionally implemented. For instance, although the parser is basically bottom up, the fact that all possible pairwise ordered combinations of terms are explored in the first pass means that a high level relationship, such as that between a subject and a main verb, can be built immediately. These high level relations would then be available for confirmation by general semantic or pragmatic processes. In a typical bottom up parser, such high level relationships cannot be entertained until all the lower level structures have already been built. In a typical top down parser, high level relationships are

hypothesized at an early stage, but these relationships are not tied to any data in the string until the lower level structures have been built.

This type of parser is, however, subject to the combinatorial problem, since there is the possibility of an exponentially increasing number of structures built on each pass unless there are strong constraints on the ways that structures are allowed to combine. Two ways to control this problem are (1) by locally restricting the ways that modules are permitted to combine, and (2) by using heuristic rules to determine the order in which structures are built, restrict the number of structures built, and/or restrict the number of structures that are maintained for consideration.

It has been our experience so far that as we increase the number of structural modules, the increasing specificity of the modules offsets to some degree the combinatorial increase. Since we do not believe that this local restriction alone will adequately constrain the combinatorial explosion, we have begun work on the explicit use of heuristics to improve efficiency.

We have so far partially implemented two of the "syntactic strategies" discussed by Clark and Clark (1977) that relate to the separation of noun phrase and prepositional phrase processing from that of the entire input string. The explicit use of these strategies, with a look ahead for the end of a constituent, and a recursive call to the parser to deal only with the local constituent, has resulted in large reductions in the number of structures generated and processing time, and, in fact, may roughly reduce the problem to linear complexity.

In addition, we are starting to develop several other strategies which are basically computational in their nature, although they too can be related to psychological considerations. One strategy, for instance, would maintain a window of recently developed structures in which one would look for the terms of a new input string. When a term is found in this structure, modules would be tried first that could replicate the structures in which the term was recently used.
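A sketch of what such a window might look like follows; the window size and the preference ordering are our assumptions, not parameters from the implementation:

```python
# Sketch of the recency-window heuristic described above.

from collections import deque

class StructureWindow:
    """Keep the N most recently built structures and use them to
    decide which modules to try first for a familiar term."""
    def __init__(self, size=50):
        self.recent = deque(maxlen=size)

    def record(self, structure, modules_used):
        self.recent.appendleft((structure, modules_used))

    def preferred_modules(self, term):
        """Modules that recently covered `term`, most recent first."""
        hits = []
        for structure, modules in self.recent:
            if term in structure and modules not in hits:
                hits.append(modules)
        return hits

window = StructureWindow()
window.record(("fire", "engine"), ["N5-compound"])
print(window.preferred_modules("engine"))  # try N5-compound first
```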


4. STRATEGIES FOR INFORMATION RETRIEVAL

This parser promises to be useful for Information Retrieval for a variety of reasons. First, its speed would allow for either preprocessing of large amounts of full text or real time processing of full text surrounding retrieved key words for the purposes of estimating their relevance. Second, the formal simplicity of its output suggests a variety of heuristic strategies for estimating the relatedness between the uses of a term in a text and in a query. Third, the parser can handle incomplete or ungrammatical strings.

These factors and the nature of the parser suggest a variety of information retrieval strategies. Some of these make use only of the dependency and scope relations of the Moulton and Robinson theory, while others utilize some additional conventional syntactic information mediated by or made available through the dependency parser. Although it is expected that these strategies will prove useful in estimating the relevance of the use of terms in a query to the use of those terms in a text, thus improving precision, it is anticipated that some of these strategies will reject relevant documents. The empirical test of these costs, and the comparison to the costs of other means of improving precision, such as including additional query terms, awaits the implementation of a relatively complete version of this parser. Some of the general approaches we have been exploring, and illustrations of their implementation, follow.

In general, we are planning an Information Retrieval environment in which the user has available all the standard facilities (e.g., Boolean combinations of terms, stemming, etc.), but in addition has the facility to utilize a limited set of basic natural language structures, such as noun phrases (including prepositional phrases), simple sentences, and simple embedded clauses, to express the relations between terms (including adjectives and verbs). Such complex terms could themselves be treated as units by the conventional Information Retrieval processes.

4.1 Pattern Matching

Our original hope was that the formal simplicity of the Moulton and Robinson model would

lend itself fairly directly to relatively simple pattern matching. Figure 3 illustrates the power of this approach.

[Figure 3: scope and dependency structures for "approximate string matching," "approximate matching," "approximate matching of strings," "matching approximations," and "approximate number of matching strings."]

A query for "approximate string matching" would match

"approximate matching of strings" perfectly in terms of the scope and dependency relations among the three terms.

In addition, the string

"approximate matching" matches the query in terms of the scope and dependency relations among the two terms which are present.

Clearly,

this permits a more delicate specification of a query than does the simple use of Boolean operators.

Permitting the query to match text

with a subset of the query terms widens the recall of the query, as would adding an OR term, while insisting that the terms have the appropriate scope and dependency relations tends to restrict the matching to appropriate uses of the terms in the text.

"Matching approximations"

and "approximate number of matching strings" are two examples of text that do not match the query in terms of scope and dependency.

The latter

is particularly interesting in that the terms of the query and text appear in identical order. The implementation of this approach is not without difficulties, however.
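One way such a comparison might be realized is sketched below, reducing each analysis to its set of (modifier, head) pairs; the nested-tuple encoding, the stemming, and the scoring are our illustration, not the implemented procedure:

```python
# Sketch of scope/dependency pattern matching. Analyses are nested
# (modifier, head) tuples with the dominant element second; terms are
# assumed stemmed and prepositions folded into the analysis.

def head(t):
    """Dominant content word of an analysis."""
    return t if isinstance(t, str) else head(t[1])

def dep_pairs(t, out=None):
    """All (modifier-head, dominant-head) relations in an analysis."""
    if out is None:
        out = set()
    if isinstance(t, tuple):
        mod, dom = t
        out.add((head(mod), head(dom)))
        dep_pairs(mod, out)
        dep_pairs(dom, out)
    return out

def overlap(query, text):
    """Fraction of the query's pairs found in the text analysis;
    partial matches (a subset of query terms) score between 0 and 1."""
    q = dep_pairs(query)
    return len(q & dep_pairs(text)) / len(q)

query = ("approximate", ("string", "matching"))   # approximate string matching
full = ("approximate", ("string", "matching"))    # approximate matching of strings
partial = ("approximate", "matching")             # approximate matching
other = (("matching", "string"), "number")        # ...number of matching strings
print(overlap(query, full), overlap(query, partial), overlap(query, other))
# -> 1.0 0.5 0.0: identical word order, but the reversed dependency
#    between "matching" and "string(s)" blocks the last match.
```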

The implementation of this approach is not without difficulties, however. These are due principally to the fact that the relationship between two terms is not only dependent on their structural relationship, but also on the semantics of the other terms involved in the structure. We are exploring the possibility that the structural relations of terms in a query can be matched against the relations of those terms in a text utilizing a combination of general pattern matching procedures and special word and word class specific rules.

The preposition "by" illustrates this point. The parser treats all prepositions as dominant over the noun phrase portion of prepositional phrases. This can be useful, for instance, in distinguishing between direct objects, which are the most dominant term under a transitive verb, and indirect objects, which are parsed as modifying a preposition. However, in a passive construction, the most dominant term in the sentence structure becomes the prepositional phrase whose head is the word "by." To identify the head noun of such a construction as the head concept of the sentence, it is necessary either to attach special procedures to the word "by" or to modify the pattern matching procedures to look for reduced patterns consisting only of words of a particular class, especially nouns. We are currently investigating both approaches.
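The first of these might be sketched as a table of word-specific rules consulted during matching; the rule table and names here are our illustration of the idea, not the implemented procedure:

```python
# Sketch of a word-specific rule layered over general matching: for a
# passive "by"-phrase, promote the noun under "by" to head concept.

def head_concept(analysis, special_rules):
    """analysis: (dominant_word, children) pairs, much simplified."""
    word, children = analysis
    rule = special_rules.get(word)
    return rule(children) if rule else word

SPECIAL = {
    # "by" dominates its noun phrase, but in a passive sentence the
    # noun it governs is the underlying subject: promote it.
    "by": lambda children: children[0],
}

# "...was attacked by Poland": "by" heads the dominant phrase.
passive_by_phrase = ("by", ["Poland"])
print(head_concept(passive_by_phrase, SPECIAL))  # -> "Poland"
```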

4.2 Indices of Relatedness

Rather than trying to match patterns directly, it is possible to derive summary estimates of the similarity of relatedness between pairs of query terms and pairs of terms in candidate documents. A first pass at such an index would assign a high value when a pair appears in each with the same dependency relation, a low value when the dependency is reversed (e.g., "fire engines" vs. "engine fires"), and an intermediate score when the dependency relation in one or both is not determined. One refinement of this approach would make use of the hierarchical distance between the terms in the two structures.
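A sketch of such an index; the particular numeric values are placeholders, not tuned weights:

```python
# Sketch of the three-way relatedness index described above.

def relatedness(query_rel, text_rel):
    """Compare the dependency relation of a term pair in the query
    with the same pair in a document.

    Each relation is (dependent, dominant), or None if undetermined.
    """
    if query_rel is None or text_rel is None:
        return 0.5                          # dependency undetermined
    if query_rel == text_rel:
        return 1.0                          # same direction
    if query_rel == (text_rel[1], text_rel[0]):
        return 0.1                          # reversed ("engine fires")
    return 0.0

print(relatedness(("fire", "engine"), ("fire", "engine")))  # 1.0
print(relatedness(("fire", "engine"), ("engine", "fire")))  # 0.1
print(relatedness(("fire", "engine"), None))                # 0.5
```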

4.3 Weighting the Importance of Terms

A variety of strategies involve using syntactic information to weight the importance of individual query terms found in the text. One particularly simple but promising strategy is to discount any noun term found to be dependent on another noun, unless the dominant noun is also a query term. This strategy, carried out within noun phrases, would discount terms used only to modify other nouns, whether by noun noun modification or prepositional phrase. A text which contained "fire engine" but not "fire" would be unlikely to be directly relevant to a "fire" query. Similarly, a text that contained "skyscrapers in Seattle," but not "Seattle" as a head of a noun phrase, would be unlikely to be related to a query which contained "Seattle" without "skyscrapers."

This strategy can also be carried out on the sentential level. At this level it has the effect of demanding that any query noun be found as the dominant noun (e.g., underlying subject) in the text, unless the dominant noun of the sentence is also a query term. The assumption here, of course, is that a relevant document is likely to mention query terms as the topics of sentences.
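A sketch of this discounting strategy; the weights and the dependency map are illustrative assumptions:

```python
# Sketch of the noun-discounting strategy described above.

def weight(term, query_terms, dependents):
    """Discount a query noun that only modifies another noun, unless
    that dominant noun is itself a query term.

    `dependents` maps a noun in the text to the noun it modifies
    (None if it heads its phrase).
    """
    dominant = dependents.get(term)
    if dominant is None or dominant in query_terms:
        return 1.0     # head use, or modifies another query term
    return 0.2         # modifier-only use: probably not about `term`

# Text contains "fire engine" but never "fire" as a head noun:
text_deps = {"fire": "engine", "engine": None}
print(weight("fire", {"fire"}, text_deps))            # 0.2, discounted
print(weight("fire", {"fire", "engine"}, text_deps))  # 1.0
```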

4.4 Isolation of Key Portions of Text

Variants of the previous approach isolate for consideration only the more important portions of text. Such strategies might, for instance, ignore key words found in embedded clauses (unless the query included embedded clauses), or ignore any but the two most dominant nouns in a sentence. (These nouns are typically the underlying subject and direct object.)

4.5 Simple Sentences

The previous strategies have focused essentially on the relations among nouns in a text. Some variants of these approaches also allow for the graceful use of adjectives, since they can be considered in relation to the particular noun on which they are dependent. The dependency parser also allows one to utilize the relationship between two (or more) nouns specified by a verb. One might, for example, be interested in retrieving possible examples of attacks by Poland on Germany, without retrieving the numerous documents concerning the reverse. The parser produces very similar underlying dependency structures not only for active and passive sentences, but also for active and passive nominalizations of these concepts, such as "Poland's attack on Germany" or "the attack by Poland on Germany." In each case, "Poland" is the dominant content word of the construction, "attack" is dependent on "Poland," and "Germany" is dependent on "attack." Rather than querying with a Boolean combination of "Poland," "Germany" and "attack," which would not specify the nature of the relationships among these concepts, one would query with the simple sentence "Poland attacks Germany." Of course verbs introduce new problems of synonymy and paraphrase; however, as this example may illustrate, these problems may be less severe than
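A sketch of how such verb-mediated queries might be compared, assuming both query and text are reduced to stemmed dependency triples (our encoding, not the parser's output format):

```python
# Sketch of verb-mediated matching for simple-sentence queries.

def triples(subject, verb, obj):
    """Dependency relations shared by the active sentence, the passive,
    and both nominalizations: the verb depends on the dominant subject,
    and the object depends on the verb."""
    return {(verb, subject), (obj, verb)}

query = triples("poland", "attack", "germany")   # "Poland attacks Germany"
text_1 = triples("poland", "attack", "germany")  # "Poland's attack on Germany"
text_2 = triples("germany", "attack", "poland")  # the reverse direction
print(query == text_1)  # True
print(query == text_2)  # False: the relations distinguish the directions
```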

those one may face in the absence of verb specification.

4.6 Cross Sentence Relations

The Moulton and Robinson theory offers no structural analysis of connected discourse in the sense of a hierarchical description of the constituent structure of large units of text. Nor does it address pronoun reference. It is, however, possible to overlay the descriptions of individual sentences when they contain overlapping nouns. These overlapping structural descriptions can provide dependency information concerning the relations between terms of the separate sentences. Although intersentence relations may be harder to specify than intrasentence relations, they may nonetheless prove useful, for instance in conjunction with strategy 4.2.

REFERENCES

Clark, H. & Clark, E. (1977). Psychology and Language. New York: Harcourt Brace Jovanovich.

Gruenewald, P. J. (1981). SCRYP, the syntax crystal parser: a computer implementation. In Moulton and Robinson.

Hudson, R. A. (1976). Arguments for a Non-Transformational Grammar. Chicago: University of Chicago Press.

McGill, M., Koll, M. & Noreault, T. (1979). An Evaluation of Factors Affecting Document Ranking by Information Retrieval Systems. Final Report to the National Science Foundation, NSF-IST-78-10454.

Moulton, J. & Robinson, G. (1981). The Organization of Language. New York: Cambridge University Press.