Semantic bootstrapping of type-logical grammar

Sean A. Fulop ([email protected])
Departments of Linguistics and Computer Science, The University of Chicago

Abstract. A procedure is described which induces type-logical grammar lexicons from sentences annotated with skeletal terms of the simply typed lambda calculus. A generalized formulae-as-types correspondence is exploited to obtain all the type-logical proofs of the sample sentences from their lambda terms, and the resulting lexicons are then optimally unified, which effectively unifies the syntactic categories of words that have the same syntactic behavior evident in the induced structures. This effort extends the earlier induction of such lexicons for classical categorial grammar (Buszkowski and Penn, 1990), first to the non-associative Lambek calculus, and then to a large class of type logics enriched by modal operators and structural rules. The motivation for this approach is linguistic: we have implemented a theoretically operational procedure for semantic bootstrapping of natural language syntax, which is the first one in any setting with sufficient scope to meet the demands of descriptively adequate natural language grammars. One of the main points of the enterprise is that the syntactic and semantic categories operating in the language are learned, in direct opposition to more familiar grammar induction procedures which begin with a fixed set of categories, and frequently with part-of-speech tagged data as well. This general approach could be used by linguists to learn something about lexical categories and to develop linguistically insightful complete grammars for large fragments.

© 2002 Kluwer Academic Publishers. Printed in the Netherlands.

1. Introduction and preliminaries

When it comes to automatically learning a syntactic grammar for a natural language, many different approaches have been tried. This paper puts forth another contribution to this area of research, one which offers the promise of results that could inform linguistic theory about the nature of language. Given the highly lexicalized nature of categorial grammars, the focus here is on lexical inference—ultimately learning the system of lexical categories operating in the language from information about the syntactic and semantic distribution of the lexemes. This effort should be viewed as a computational implementation of a linguistically motivated procedure; the practical limitations of the approach are manifold, and it will be some time before this program of research bears fruit for the applied computational linguist. Starting from the notions of type-theoretic semantics (i.e. the lambda calculus) and categorial grammar, we introduce type-logical grammar (e.g. Moortgat, 1997) as a framework for syntactic analysis of natural language. Noticing the similarities between type-logical syntax and type-theoretic semantics leads to a natural connection between the two,

semboot1.tex; 17/09/2002; 13:50; p.1

a generalization of the Curry-Howard morphism which provides a foundation for the syntax-semantics connection (van Benthem, 1991; Moortgat, 1997). This connection is then used to drive a new procedure for learning observationally adequate and linguistically perspicuous syntactic grammars from term-labeled strings—sentences annotated by skeletal lambda calculus terms. The origin of this idea is in theoretical psycholinguistics; a leading suggestion in cognitive science has been that humans learn syntax using some sort of “semantic bootstrapping” (Pinker, 1984). The procedure is so far of largely theoretical rather than practical interest because of its inefficiency, but since it is the first algorithm in the literature that could conceivably learn a linguistically interesting grammar for an entire natural language and is herein proven to terminate, there is considerable reason to work toward improving the performance specifications of the algorithms. Elsewhere (Fulop, 2002), the basic problem has been proven learnable—identifiable in the limit in Gold’s sense (Gold, 1967). Although space constraints prevent discussion of the learnability proofs here, the learning algorithms are presented in detailed form. An important element of the algorithms is the application of Buszkowski and Penn’s (1990) notion of optimal unification of syntactic categories; it is this idea that permits the procedure to learn a system of syntactic categories in accord with distributional evidence in the data. This is opposite in spirit to the more common means of grammar induction using treebank input data—in a treebank, the parts of speech are already assigned and used to show the syntactic structure, but in this enterprise we are trying to learn what the parts of speech are from minimal initial assumptions. 
The view is extreme: it says essentially that linguists have been arrogant in assuming that they know what the key syntactic category differences are in natural language, and that we should now set those assumptions aside and find out about the lexicon using the procedures to be described.

We will assume a basic familiarity with the simply typed lambda calculus. Our set Typ of semantic types is recursively defined as follows: Pr ⊂ Typ, where Pr is the set of primitive types; if σ, τ ∈ Typ then (τ ← σ) ∈ Typ. Traditional type notation would write the above function type as σ → τ, but we are going to employ an unusual "result-first" style (cf. Andrews, 1986). We skip a definition of lambda terms, but point out that both application terms and abstraction terms should be available to us in the formation of semantic recipes for the learner. The lambda calculus terms themselves provide us with a compositional means of representing meanings of a natural language. The meanings of words are represented schematically using the words typeset in bold face, and are
constrained to be atomic terms. One can't really investigate meanings using this kind of schematic representation; eventually something must be said about what words in fact mean. We will remain uncommitted here, since that is the province of semantic theory. Our only concern is the way in which meanings are composed, not the actual meanings themselves or the question of whether they compose to provide a truth-functionally and linguistically correct meaning for the composite. We also assume a basic familiarity with categorial grammar; it will help if the reader has some experience with logical deductive systems, including Gentzen's sequent calculus. In a logical approach to categorial grammar, every syntactic category is viewed as corresponding to some type in a type theory. Recall that types are inhabited by expressions; just as a word or phrase inhabits a semantic type, so will it inhabit a syntactic type.
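As a concrete illustration of the recursive definition of Typ, the result-first function type can be rendered as follows. This is an illustrative encoding, not part of the formal development; the primitive types e and t and the class names are assumptions made only for the sketch.

```python
# A sketch of the set Typ: primitives plus the result-first function
# type (tau <- sigma), i.e. what is usually written sigma -> tau.
from dataclasses import dataclass

@dataclass(frozen=True)
class Prim:
    """A primitive type such as e or t."""
    name: str
    def __str__(self):
        return self.name

@dataclass(frozen=True)
class Fun:
    """(result <- argument), the paper's result-first notation."""
    result: object      # a Prim or Fun
    argument: object    # a Prim or Fun
    def __str__(self):
        return f"({self.result} <- {self.argument})"

e, t = Prim("e"), Prim("t")
iv = Fun(t, e)          # an intransitive-verb meaning type: (t <- e)
tv = Fun(iv, e)         # a transitive-verb meaning type: ((t <- e) <- e)
print(tv)               # ((t <- e) <- e)
```

The point of the notation is visible in the printed form: the result type appears first, to the left of the arrow.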

2. An approach to learning categorial grammars

Every grammar learning procedure provides some information in the form of positive examples (and perhaps also negative ones) and learns what is not provided, with the goal of learning a grammar that generates the language containing the positive examples. There is a wide range of variation in the nature of these discovery procedures, with respect to what there is to be learned and what is to be learned from. The particular endeavor of learning a categorial grammar is accordingly not a single idea or procedure, and there is again a wide range of possible discovery procedures which can be contemplated for categorial grammars. This paper presents in detail a discovery procedure for categorial grammars in their type-logical form which occupies a unique position in this learning landscape insofar as what is learned and what is learned from.

The learning scheme to be described here clearly descends from the methods of Buszkowski and Penn (1990), and can be viewed as an updating of their proposals which is generalizable across most of the modern type-logical grammars that extend the categorial grammars of Bar-Hillel and Lambek.¹ Buszkowski and Penn, building on earlier work by Buszkowski (1987), described a two-stage procedure that first discovers a categorial lexicon in most general form from syntactic structures annotated by function-argument information, and which then applies a novel unification procedure, which they called optimal unification, to derive perhaps more than one categorial lexicon capable of generating a language beyond the learning sample. The categorial paradigm of their work was purely classical, in the tradition descending from Ajdukiewicz (1935). A streamlining of the method, along with many important results about it, was provided by Kanazawa (1998).

¹ It should be noted that our method cannot be applied directly to Steedman's Combinatory Categorial Grammar (e.g. Steedman, 1997), and nothing more will be said about that framework.

The present paper provides a similar two-stage design, which we call Optimal Unification for Type Logics; in fact only the first stage is completely changed, with the optimal unification step working just as it did in Buszkowski and Penn (1990). The discovery of the initial most general lexicon(s) does not use syntactic structures as part of its input; we shift to using semantic terms for the sample sentences, which take the form of lambda calculus expressions that are not required to show the types of their subterms. The initial inference of these general form lexicons (Buszkowski and Penn, 1990) then exploits a generalized version of the Curry-Howard morphism connecting lambda calculi with categorial type logics, as van Benthem (1991) suggested could be done. The lambda terms are really just outlines of the semantic composition (though abstraction may be involved), and the individual lexical meanings are constrained to be atomic (another suggestion from van Benthem, 1991). The approach to be presented was foreshadowed by Bergström (1995) in an unpublished report, in which much the same ideas were implemented for the basic Lambek calculus, although there the Curry-Howard correspondence between Lambek proofs and lambda terms was not very clearly used.

Let us consider where the present contribution is situated in the landscape of learning categorial grammars.

− We use only positive examples. For an alternative, Adriaans (1992; 2000) has examined active learning of classical CGs from both positive and negative examples.

− We infer the lexical categories: they are what is to be learned.
A variety of procedures have been proposed for learning categorial grammars that assume a fixed set of lexical categories and then infer something else, such as the assignment of categories in the lexicon (Watkinson and Manandhar, 2000) or probabilities associated with the categories for use in a stochastic categorial grammar (Osborne and Briscoe, 1998).

− The examples do not provide sentence structure trees or complete type-logical proof trees; only partial information about the type-logical proof is available from the lambda term labeling a sentence example, particularly in the modally enriched case. This is similar to related work by Tellier and colleagues (Tellier, 1999; Dudau-Sofronie et al., 2001), while the procedures of Kanazawa (1998) learn from either sentence structures or pure strings, and the procedure of Bonato and Retoré (2001) uses complete type-logical proof trees.

− Our discovery procedure is not restricted to classical CG, and is amenable to any one of a large class of modally enriched type-logical grammars taking us beyond the Lambek calculus as well. The work by Tellier and her colleagues is limited to learning in the classical framework, while the contribution of Bonato and Retoré learns full Lambek calculus lexicons but cannot go beyond this.

Though space constraints prevent a thorough review of the literature, we would like to emphasize that the present method is apparently the only one which learns the syntactic categories for a multimodal type-logical grammar together with their lexical assignment and the corresponding semantic categories of the simply typed lambda calculus. To become practical, however, the general algorithmic framework will need streamlining in order to pare down the search space and render it usable, if not tractable.

In summary, the work to be described in the sequel provides a general method for learning a type-logical lexicon as well as the system of categories in the language, one which can be formulated in much the same fashion for any of a large class of modally enriched type-logical calculi that have been suggested on descriptive linguistic grounds. The induction acts on a sample of sentences annotated by lambda terms showing the semantic composition using atomic lexical meaning constants, but which may involve lambda abstraction. What is more, any class of these optimally unified lexicons restricted to assign at most k types to any word has recently been proven identifiable in the limit from the sort of data we provide (Fulop, 2002). This kind of learnability result provides a theoretical pillar supporting the potential utility of the learning paradigm.

3. Type-logical grammar

One can express categorial grammar type reduction facts in the logical setting of a sequent calculus (v. Gentzen, 1934), wherein one writes rules of inference stipulating what kinds of sequents (i.e. type reduction statements) can be derived from which other kinds of sequents. A type-reduction calculus of such a form is called a type logic. It must be added that, unlike the general logical case, sequents with empty antecedent are not allowed in a type logic. A type-logical grammar G consists of a type
logic R_G, a vocabulary of words V_G, and a lexical function I_G assigning some set of types (well-formed formulae of R_G) to each element of V_G. We employ a framework of deductive systems for type logics which augments the basic type-logical sequent calculi with words and structures of words serving to label the type formulae of the antecedent. These structure-labeled type logics are necessary in order to determine the correspondence between lambda terms and proofs.

3.1. Nonassociative Lambek calculus

In what follows, A, B, C, . . . stand for type formulae, and ∆, Γ, . . . for formal trees of type formulae that are called G-terms. The somewhat ambiguous notational convention Γ[A] either means simply a G-term Γ containing an occurrence of type A somewhere within, or it means a G-term Γ in which A has replaced an occurrence of something else, the identity of which would be clear from a previous use of the notation Γ[·]. The symbol ¦ stands for a binary non-associative combination operation that is used to recursively build up the trees of formulae. Just as G-terms are assembled as trees using the nonassociative binary operation '¦', the words which label individual types in a G-term are assembled in parallel fashion into syntactic structure trees. The trees of words are written out with parentheses or square brackets as convenient, in which the simple comma serves as the counterpart to '¦'. For example, sentences are assigned structures like (John, sings) and (Mary, (loves, John)). A sentence tree S is said to be generated by a grammar G just in case there are types assigned to the words in the tree by the lexical function I_G that form a G-term Γ having the same structure as the tree S, such that the sequent Γ ⇒ s is derivable in the grammar's logic R_G. The type s is used for the principal type, inhabited by sentences.
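The "same structure" requirement in this definition is easy to make concrete. The sketch below uses a hypothetical toy lexicon (our own, not the paper's) and maps a sentence tree of words to the G-term of lexical types having exactly the same tree shape; whether that G-term then derives s is the separate job of the sequent rules of R_G.

```python
# Hypothetical toy lexicon; raw strings keep the backslash of np\s literal.
I_G = {"John": "np", "Mary": "np", "sings": r"np\s", "loves": r"(np\s)/np"}

def g_term(tree):
    """Map a sentence tree (nested pairs of words) to the type tree of
    the same shape, as the definition of generation requires."""
    if isinstance(tree, str):
        return I_G[tree]
    return (g_term(tree[0]), g_term(tree[1]))

print(g_term(("Mary", ("loves", "John"))))
```

The resulting type tree mirrors the word tree node for node, which is exactly the structural condition placed on Γ.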
In our structure-labeled representations, a G-term can have its individual types labeled by the assigned words from the lexicon, but alternatively a larger structure of words can label a larger G-term. Since either labeling option may be convenient, we must handle equivalence classes of labeled G-terms, because whichever manner of labeling is chosen makes no grammatical or logical difference. The equivalence relation of two G-terms being redistributed labelings of each other is recursively defined by the following clauses. A labeled G-term Γ : S, with S a tree of words, is a redistributed labeling of itself; a labeled G-term Γ ¦ ∆ (with Γ, ∆ each labeled G-terms) is a redistributed labeling of the term (Γ′ ¦ ∆′) : (S1, S2) (with Γ′, ∆′ each unlabeled G-terms) and vice versa, so long as Γ and Γ′ : S1 are redistributed labelings of each other and ∆ and ∆′ : S2 are redistributed labelings of each other. The use of the
square bracket notation for substitutions within G-terms is extended in the obvious fashion to labeled G-terms, which may implicitly need to have their labels redistributed within the term in order for the intended substitution to make sense.

DEFINITION 3.1. The inference rules for a structure-labeled version of Lambek's non-associative calculus without product (herein called 'NL') are given below. The 'Cut' rule is omitted, since the system provably enjoys Cut-elimination.

(Axiom)
    A : word ⇒ A

(/L)
    ∆ : S2 ⇒ B    Γ[A : (S1, S2)] ⇒ C
    ---------------------------------
    Γ[A/B : S1 ¦ ∆ : S2] ⇒ C

(\L)
    ∆ : S1 ⇒ B    Γ[A : (S1, S2)] ⇒ C
    ---------------------------------
    Γ[∆ : S1 ¦ B\A : S2] ⇒ C

(/R)
    (Γ ¦ B) : (S, hyp_i) ⇒ A
    ------------------------
    Γ : S ⇒ A/B

(\R)
    (B ¦ Γ) : (hyp_i, S) ⇒ A
    ------------------------
    Γ : S ⇒ B\A

where hyp_i is a unique member of an available denumerable set of "syntactic hypothesis variables".

Figure 1 shows a linguistic example of the classical Ajdukiewicz-Bar-Hillel (AB) fragment of NL in action. The assignment I_G of types to the vocabulary in (1) constitutes a lexicon that will, together with the logical rules of the NL sequent calculus, generate the sentence structure in the figure by proving it.

(1)  I_G(Kazimierz) = np
     I_G(the) = np/n
     I_G(mathematician) = n
     I_G(talks) = (np\s)/pp
     I_G(to) = pp/np
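For the application-only (AB) fragment, derivability of a sequent Γ ⇒ s can be checked by reducing the G-term bottom-up with the /L and \L patterns. The following sketch is our own encoding of types as tagged tuples, not the paper's implementation; it verifies lexicon (1) against the sentence of Figure 1.

```python
# Types: a primitive is a string; ("/", A, B) is A/B; ("\\", B, A) is B\A.
# G-terms and sentence trees are nested pairs.
def slash(a, b):  return ("/", a, b)     # A/B: yields A when B follows
def bslash(b, a): return ("\\", b, a)    # B\A: yields A when B precedes

lex = {                                  # lexicon (1) from the text
    "kazimierz": "np",
    "the": slash("np", "n"),
    "mathematician": "n",
    "talks": slash(bslash("np", "s"), "pp"),
    "to": slash("pp", "np"),
}

def types_of(tree):
    """Replace each word in a sentence tree by its lexical type."""
    if isinstance(tree, str):
        return lex[tree]
    return (types_of(tree[0]), types_of(tree[1]))

def reduce_gterm(g):
    """Reduce a G-term by forward/backward application; None on failure."""
    if isinstance(g, str) or g[0] in ("/", "\\"):
        return g                         # already a single type
    l, r = reduce_gterm(g[0]), reduce_gterm(g[1])
    if isinstance(l, tuple) and l[0] == "/" and l[2] == r:
        return l[1]                      # A/B . B  =>  A
    if isinstance(r, tuple) and r[0] == "\\" and r[1] == l:
        return r[2]                      # B . B\A  =>  A
    return None

sentence = ("kazimierz", ("talks", ("to", ("the", "mathematician"))))
print(reduce_gterm(types_of(sentence)))  # s
```

The reduction mirrors the /L and \L steps of Figure 1; the full sequent system additionally supports hypothetical reasoning via /R and \R, which a simple reducer like this does not capture.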

Figure 1. A simple type-logical proof. Read from the leaves down, the derivation proves the sequent (np ¦ (((np\s)/pp) ¦ ((pp/np) ¦ ((np/n) ¦ n)))) : [kazimierz, [talks, [to, [the, mathematician]]]] ⇒ s as follows:

    (np/n ¦ n) : [the, mathematician] ⇒ np
        by /L, from n : mathematician ⇒ n and np : [the, mathematician] ⇒ np
    ((pp/np) ¦ ((np/n) ¦ n)) : [to, [the, mathematician]] ⇒ pp
        by /L, from the sequent above and pp : [to, [the, mathematician]] ⇒ pp
    (np ¦ (np\s)) : [kazimierz, [talks, [to, [the, mathematician]]]] ⇒ s
        by \L, from np : kazimierz ⇒ np and s : [kazimierz, [talks, [to, [the, mathematician]]]] ⇒ s
    (np ¦ (((np\s)/pp) ¦ ((pp/np) ¦ ((np/n) ¦ n)))) : [kazimierz, [talks, [to, [the, mathematician]]]] ⇒ s
        by /L, from the two sequents above

3.2. Modally enriched type logics

Although NL is a powerful framework, recent type-logical literature (e.g. Morrill, 1994; Moortgat, 1997) makes clear that more logical machinery is necessary to model natural syntax adequately. The two slashes together with the G-term operator '¦' can be considered as making up a residuated family of operators (Dunn, 1993). Such a family can be used in linguistic applications as a way, or mode, of combining types into larger structures of types (i.e. G-terms) (Moortgat, 1997). It is often observed that in order to adequately describe the various ways in which words are put together and to control their behavior, more than one such residuated family is desirable (Moortgat, 1999). We distinguish among such modes using numerical subscripts on the operators. Each of the respective slashes is related in the same way to its corresponding structural '¦_i', so there is a left rule and a right rule for each slash in the same manner as before.

When, in a basic linguistic sense, surface structure is different from a deep structure derived from the semantics, it is useful to be able to reconfigure the structure during the proof. To facilitate this, one adds various families of unary operators □↓_i, ♦_i together with their corresponding G-term operators, which mark structures as ⟨·⟩_i (Moortgat, 1997). Structures of types marked in such a way can then be singled out as domains within which structural rules added to the type logic can manipulate word order and structure marking. The additional rules governing the unary modal operators are given below. We must emphasize that the grammar-learning technique presented here is sufficiently general to be easily extensible from the base Lambek system to a wide class of such enriched type logics, and we present the algorithm in just such a general format.

(♦L)
    Γ[⟨A⟩] ⇒ B
    ----------
    Γ[♦A] ⇒ B

(♦R)
    Γ ⇒ A
    ---------
    ⟨Γ⟩ ⇒ ♦A

(□↓L)
    Γ[A] ⇒ B
    ------------
    Γ[⟨□↓A⟩] ⇒ B

(□↓R)
    ⟨Γ⟩ ⇒ A
    --------
    Γ ⇒ □↓A

As an illustration of such an enriched type logic, consider the somewhat challenging problem of so-called right node raising, exemplified in (2).

(2)  John buys and Mary sells records.


As the evident coordination can be interpreted as a test for syntactic constituency, such examples should be thought of as comprising two conjoined subject-verb constituents, which do not normally form constituents to the exclusion of the object in this fashion. It is possible to find an enriched type logic which will derive sentences like (2) in a theoretically pleasing fashion. We require a family of unary modal operators, together with two structural rules governed by them. The following structural rule permits left associativity only in the presence of the appropriate structural modal environment.

(LAssoc)
    ∆[Γ1 ¦ (Γ2 ¦ ⟨Γ3⟩_a)] ⇒ C
    -------------------------
    ∆[(Γ1 ¦ Γ2) ¦ ⟨Γ3⟩_a] ⇒ C

The only other rule that is needed is one which expands the structural domain of a modal environment, when read from top to bottom. The need for this kind of rule turns out to be ubiquitous in the linguistic application of modally enriched type logics (cf. Moortgat, 1999; Kraak, 1998).

(Expand)
    ∆[Γ1 ¦ ⟨Γ2⟩_a] ⇒ C
    ------------------
    ∆[⟨Γ1 ¦ Γ2⟩_a] ⇒ C
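Read bottom-up (conclusion to premise), as in proof search, LAssoc and Expand act as rewrites on G-terms. The sketch below uses an assumed encoding (a pair for '¦', a wrapper class Br for ⟨·⟩_a; the atoms "a", "b", "c" are placeholders), and is only an illustration of how the two rules transform structures.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Br:
    """A structure <G>_a marked by the unary mode a."""
    body: object

def lassoc_up(g):
    """LAssoc, read conclusion-to-premise:
    (G1 . G2) . <G3>_a  ~>  G1 . (G2 . <G3>_a)."""
    (g1, g2), br = g
    assert isinstance(br, Br), "the rule is modally licensed"
    return (g1, (g2, br))

def expand_up(g):
    """Expand, read conclusion-to-premise:
    <G1 . G2>_a  ~>  G1 . <G2>_a."""
    g1, g2 = g.body
    return (g1, Br(g2))

# Shrink the bracketed domain, then re-associate to the right:
g = Br((("a", "b"), "c"))
print(lassoc_up(expand_up(g)))   # ('a', ('b', Br(body='c')))
```

Only a term with the bracket in the required position matches each rewrite, which is how the modal licensing of the structural rules is enforced.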

The following lexicon will permit the derivation of (2), given in Figure 2. The syntactic structure labels have been left off of the sequents in the proof to permit it to fit on a single page. Notice that the work is essentially performed by the special category assigned to and, which can be justified by noticing that the special sort of s/np constituent formed by John buys is created only in the presence of a few specific words, among them the word and itself.

(3)  I_G(records) = np
     I_G(John) = np
     I_G(Mary) = np
     I_G(sells) = (np\s)/np
     I_G(buys) = (np\s)/np
     I_G(and) = (□↓s/□↓np)\((s/np)/(□↓s/□↓np))

Some type logics have nice properties for the purposes of sequent deduction. NL, for instance, enjoys Cut-elimination and has the subformula property, and thus is decidable as well. Let us limit our further considerations to those type logics possessing a finite number of families of slashes all governed by the slash rules of NL, a finite number of families of unary modalities all governed by the modal rules above, and a finite number of structural rules which preserve the nice properties


Figure 2. English right node raising. In the proof (syntactic structure labels omitted), each conjunct of the form np ¦ (np\s)/np is shown to derive the type □↓_a s/□↓_a np by means of the slash rules together with the □↓ rules, LAssoc, and Expand; the category of and then coordinates the two conjuncts, yielding the final sequent

    (((np ¦ (np\s)/np) ¦ (□↓_a s/□↓_a np)\((s/np)/(□↓_a s/□↓_a np))) ¦ (np ¦ (np\s)/np)) ¦ np ⇒ s


of NL just mentioned. These restrictions are natural, since we desire to deal just with finitary logics: the learnability of our grammars would be spoiled by the possibility of an unbounded number of different families of operators. We insist that all structural rules are modally licensed, meaning they only apply to G-terms with some structural modal environment delimited. Such structural rules have been called "interaction postulates" in the literature. We also insist that the set of structural rules be non-cyclic, meaning there can be no series of structural rules in a proof that begins and ends with the same sequent. This is more for the convenience of theorem proving than anything else, but the limitation does not seem to have any significant effects on the classes of languages that can possibly be generated by our grammars. Note that the standard associative Lambek calculus is excluded from consideration, because its associativity rule is not modally licensed and will apply circularly; note further, however, that the effects of global associativity can be imitated in a grammar using well-chosen interaction postulates and lexical types. Finally, we insist on an ad hoc restriction on the number of modal operators permitted to adorn a type in a well-formed formula. Ordinarily this could be any finite number, but we will consider classes of type logics in which this number is limited to some specific k; otherwise formulae are considered not well-formed, and thus not part of provable sequents even if they would otherwise be. Let us write, for each such k, the class of modally enriched type logics possessing the other properties mentioned above as L_k. We conjecture without proof that the class of grammars using logics in ⋃_k L_k weakly generates all context-free languages and some context-sensitive ones, but nothing beyond this.

This is rendered plausible by some new results by Jäger (2002b), who proved that the class of grammars using NL with modals generates exactly the context-free languages in the absence of interaction postulates. The modally licensed structural rules can allow the generation of non-context-free languages (Jäger, 2002a), but the idea that no grammar based on a logic in any class L_k can generate any non-context-sensitive language is suggested by Jäger's examination of Carpenter's (1999) Turing-completeness proof (and is also hinted at in Carpenter's own remarks). Since we cannot focus on these issues here, a catalogue of results about the strong and weak generative capacity of the subclasses of ⋃_k L_k will have to wait for future research.


4. Syntax-semantics interface

4.1. Meaning recipes for syntactic type logics

It has been suggested by van Benthem (1991) and Moortgat (1997) that the established application of the typed lambda calculus in formal semantics can be put to good use alongside a type-logical grammar. Consider a type-logical grammar whose type logic will be called R_G. A sentence, together with its sentence structure, derivable in the grammar will arise from a derivable sequent in the type logic, which states that the structure of types assigned to the vocabulary items in the sentence entails the principal type s assigned to sentences. This sequent will possess a construction term in an appropriate variant of the typed lambda calculus by a Curry-Howard formulae-as-types interpretation (Howard, 1980; Wansing, 1992). But such a construction taken literally simply echoes the information provided by the type-logical derivation of the sequent. Let us follow van Benthem and consider a better division of labor between the type-logical derivation and the associated lambda term. The properties of the logic R_G, on the one hand, directly determine the kind of syntactic combination countenanced by the grammar. If the correspondence between lambda terms and sequent derivations were merely homomorphic, we could consider pure simply typed lambda terms as semantic representations, in accord with the idea that composition and function-argument structure are important to semantics, while word order and other aspects of syntactic structure are not. The lambda calculus terms can provide meaning recipes that are partially insensitive to the syntactic organization of the vocabulary elements. A type logic R_G ∈ L_k can be shown to induce a fragment Λ_NL of the simply typed lambda calculus Λ such that for every sequent Γ ⇒ s in R_G provable by some proof Π there will be a term N^s ∈ Λ_NL such that N is a homomorphic construction of the sequent proof Π.
This notion is defined as an extension of the Curry-Howard morphism between proofs and lambda terms. We develop our correspondence between semantic terms and syntactic proofs by the following methods. First, with each lambda term N there is associated a tree var(N) of its free variable or constant occurrences, which is inductively defined:

(4)  var(x^A) = x^A;

(5)  var(λx^A.t) = the tree that results from dropping all occurrences of x^A in var(t);

(6)  var(t(u)) = (var(t) ¦ var(u)).
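Clauses (4)-(6) translate directly into code. The sketch below uses our own untyped term representation (superscript types are omitted for brevity) and computes var(N), collapsing a branch when abstraction removes all atoms on one side.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Const:
    name: str

@dataclass(frozen=True)
class App:          # t(u)
    fn: object
    arg: object

@dataclass(frozen=True)
class Abs:          # lambda x. t
    var: object
    body: object

def drop(tree, x):
    """Remove all occurrences of atom x from a var-tree."""
    if tree == x:
        return None
    if isinstance(tree, tuple):
        l, r = drop(tree[0], x), drop(tree[1], x)
        if l is None: return r
        if r is None: return l
        return (l, r)
    return tree

def var_tree(term):
    """Clauses (4)-(6): the tree of free atom occurrences of a term."""
    if isinstance(term, (Var, Const)):
        return term                                   # clause (4)
    if isinstance(term, Abs):
        return drop(var_tree(term.body), term.var)    # clause (5)
    return (var_tree(term.fn), var_tree(term.arg))    # clause (6)

n = Abs(Var("x"), App(App(Const("loves"), Var("x")), Const("mary")))
print(var_tree(n))   # (Const(name='loves'), Const(name='mary'))
```

Abstraction over x deletes the occurrence of x from the tree, leaving only the lexical constants, which is exactly what makes var(N) usable as a stand-in for the antecedent structure of a sequent.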


A tree var(N) of free atom occurrences in turn induces in the obvious way a tree type(var(N)) of the types of the free atoms. Then we say that a term N corresponds in structure to a G-term Γ just when type(var(N)) = Γ. A rigid Curry-Howard isomorphism would require that a semantic term of type B correspond in structure to the antecedent of a proven sequent Γ ⇒ B. For a better division of labor between syntax and semantics, we relax the morphism so that the term structure and the antecedent G-term structure are required only to be equivalent modulo direction of application, defined below.

DEFINITION 4.1. A syntactic type T is said to be equivalent modulo direction of application (for which we write emda(T, τ(T))) to a semantic type τ(T) under the mapping τ below. The subscript i signifies the possibility of having multiple families of slash operators.

    τ(c) = c′ for corresponding primitive types c, c′;
    τ(A/_i B) = τ(B\_i A) = τ(A) ← τ(B);
    τ(♦A) = τ(A);
    τ(□↓A) = τ(A).

Notice that, in general, a single semantic type is equivalent modulo direction to an infinite number of possible syntactic types, but not if we set a limit k on the number of modal operators permitted to adorn any subformula of a well-formed formula. To handle the equivalence between lambda terms and trees of syntactic types, some preliminaries are needed.

DEFINITION 4.2. A tree (i.e. G-term) Γ of syntactic types is said to reduce by application to the tree Υ according to the following recursive cases, in which the function modstrip(·) produces the result of removing all unary modal operators from within its argument.

1. If Γ is some type and modstrip(Γ) = A, then Υ = A.

2. If Γ is some tree such that modstrip(Γ) = (A/_i B ¦_i B), then Υ = A.

3. If Γ is some tree such that modstrip(Γ) = (B ¦_i B\_i A), then Υ = A.

4. If Γ is some tree such that modstrip(Γ) = (Tree1 ¦_i Tree2), then Υ is the reduction of (R1 ¦_i R2), where Tree1 reduces to R1 and Tree2 reduces to R2.
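The mapping τ of Definition 4.1 forgets slash direction and modal decoration, so two syntactic types that differ only in these respects are emda-related to the same semantic type. A sketch under an assumed tuple encoding of types (("/", A, B) for A/B, ("\\", B, A) for B\A, ("dia", A) and ("box", A) for ♦A and □↓A; mode subscripts are dropped):

```python
def tau(T):
    """Definition 4.1: map a syntactic type to its semantic type,
    written result-first as ("<-", result, argument)."""
    if isinstance(T, str):                  # a primitive type
        return T
    tag = T[0]
    if tag == "/":                          # A/B
        return ("<-", tau(T[1]), tau(T[2]))
    if tag == "\\":                         # B\A: argument stored first
        return ("<-", tau(T[2]), tau(T[1]))
    return tau(T[1])                        # "dia" or "box": erased

# np\s, s/np, and box(s)/np all collapse to the same semantic type:
print(tau(("\\", "np", "s")))               # ('<-', 's', 'np')
print(tau(("/", ("box", "s"), "np")))       # ('<-', 's', 'np')
```

With a bound k on modal decoration, only finitely many syntactic types map to a given semantic type, which is the observation made after the definition.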
DEFINITION 4.3. A G-term X ¦ Y of syntactic types is now said to be equivalent modulo direction of application to a tree (x, y) of semantic types just when either emda(X, x), emda(Y, y), and X reduces by


application to a type of the form C/D, or emda(Y, x), emda(X, y), and Y reduces by application to a type of the form C\D.

Let NL denote the non-associative Lambek calculus without product, as in Def. 3.1, and let PROOF_NL denote its set of proofs. We can in this case easily specify a fragment Λ_NL of the simply typed lambda language such that every Π ∈ PROOF_NL can be encoded modulo direction of application by some term of this fragment.

DEFINITION 4.4. Let Λ_NL denote the largest sublanguage of the simply typed lambda language each of whose terms N conforms to the following constraints:

1. Term N may not contain "vacuous abstraction";

2. No subterm of N is without free variables or constants;

3. No subterm of N contains more than one free occurrence of the same variable;

4. Every prefix λx in N binds exactly one free variable occurrence, and that occurrence is either leftmost or rightmost in the subterm abstracted over by the prefix.

These are a combination of the restrictions on lambda terms that were worked out by van Benthem (1988) and by Wansing (1992). Wansing in essence proved a theorem like the following one, but his treatment omits condition 2 above because it deals with a logic allowing sequents to have empty antecedents. The proposition below is preliminary to the more important (for us) correspondence between lambda terms and modally enriched type-logical proofs, and is stated for the moment without proof.

PROPOSITION 4.5. Given a proof in PROOF_NL of a sequent Γ ⇒ B, one can find a homomorphic construction M^B ∈ Λ_NL, and conversely, given a lambda term one can find (at least one) proof.

This proposition will be seen to emerge as a corollary of the more general Theorem 4.8 proven later. Moreover, with the logic R_G set to NL, a term M^τ(B) ∈ Λ is then found to be a homomorphic construction of every proof of a sequent Γ ⇒ B just in case emda(Γ, type(var(M^τ(B)))).
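Constraint 4 of Definition 4.4 is the least standard one; the sketch below (our own hypothetical term encoding; variable shadowing is assumed not to occur) checks it by collecting the free atoms of the abstracted subterm in left-to-right order.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Const:
    name: str

@dataclass(frozen=True)
class App:
    fn: object
    arg: object

@dataclass(frozen=True)
class Abs:
    var: object
    body: object

def atoms(term, bound=frozenset()):
    """Free variable/constant occurrences, left to right."""
    if isinstance(term, Const):
        return [term.name]
    if isinstance(term, Var):
        return [] if term.name in bound else [term.name]
    if isinstance(term, App):
        return atoms(term.fn, bound) + atoms(term.arg, bound)
    return atoms(term.body, bound | {term.var.name})

def binds_one_edge_occurrence(t):
    """Constraint 4: lambda x binds exactly one free occurrence of x,
    and it is leftmost or rightmost in the abstracted subterm."""
    seq = atoms(t.body)
    hits = [i for i, a in enumerate(seq) if a == t.var.name]
    return len(hits) == 1 and hits[0] in (0, len(seq) - 1)

ok  = Abs(Var("x"), App(Const("sings"), Var("x")))                  # edge
bad = Abs(Var("x"), App(App(Const("loves"), Var("x")), Const("m"))) # middle
print(binds_one_edge_occurrence(ok), binds_one_edge_occurrence(bad))  # True False
```

The second term abstracts over an occurrence buried in the middle of the atom sequence, which is exactly what the constraint excludes from Λ_NL.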
By simple extensions of the methods of Wansing (1992), given a proof in the set of all proofs PROOFRG in any type logic RG ∈ Lk of a sequent Γ ⇒ B, one can find its unique homomorphic construction M^B ∈ ΛNL, and conversely given a lambda term in ΛNL one can


find (at least one) proof in PROOFRG. One can specify, following the methods of Wansing, a function mapping proofs to their corresponding lambda terms, but there is no inverse available, because one lambda term may homomorphically construct two or more logically distinct sequent proofs (i.e. proofs that are not different solely as a result of spurious ambiguity). One can instead define a relation which specifies the correspondence between lambda terms and the proof figure(s) that each constructs. This object could as well be regarded as a "multivalued function" from lambda terms to proofs; an implementation of its proper specification drives the semantic bootstrapping algorithm in the sequel.

4.2. Structure-labeled sequent systems

In order to induce a lambda calculus whose terms serve as homomorphic constructions of the sequents in structure-labeled NL, we must ensure that a lambda term is identified as a construction just when its free atoms can stand for the syntactic structure labels on the corresponding types in the proven sequent. This notion is valuable for using lambda calculus semantic recipes for type-logically derived syntactic structures, since we must also specify in such an application when a particular lambda term atom is really the atomic semantic representation (or meaning) of a word or syntactic atom. What is for now the purely formal notion of free atoms standing for some labeled syntactic types is intended to be interpreted as a free atom "being the meaning" of a word with its syntactic type.

A constant c can stand for a syntactic structure just when we say it can, i.e. when we say that c is the constant meaning of the structure. A variable v can always stand for a structure. A term M such that var(M) = (t1 ¦ t2) can stand for a structure [c1, c2] if and only if either t1 can stand for c1 and t2 can stand for c2, or t2 can stand for c1 and t1 can stand for c2.
It should be noted that it is necessary to use a structure-labeled type logic in general (i.e. if one expects to be able to move beyond NL to the enriched systems) in order to be able to specify precisely which semantic terms serve as homomorphic constructions of which proofs. Since we are most interested in exploiting the multiplicity of type-logical proofs that can correspond with any particular lambda term in our learning algorithms in the sequel, a formal definition of the homomorphic construction correspondence is given below, showing the casewise description of the various proofs that are constructed by a given sort of lambda term. The construction algorithms of section 4 are really nothing more than declarative implementations of a restricted form of the definition below. The function τ appearing in the types of lambda terms refers to the mapping from syntactic to semantic types


in Def. 4.1. We are thus now defining a correspondence modulo more than just direction of application; the lambda terms carry no reflex of modal or structural rules operating in the type logic, so a lambda term can construct a particular sequent modulo modalities and structural shifting in addition to direction of application. We must, however, point out that no provision is made for "semantically vacuous" syntactic items like expletives and complementizers; every syntactic word has to have a corresponding meaning atom. In fact, we do not subscribe to a view that allows the notion of a word's being semantically vacuous, and the methodological spirit represented here is highly compositional, albeit in a generalized sense.

The correspondence defined below can accommodate any type logic in the class Lk, allowing any number of modes of composition as well as any number of families of unary modal operators together with structural rules. In particular the correspondence is the desired one for the type logic described above which handles non-constituent coordination (let us call that NLNCC). This generality is in keeping with the position that all of these syntactic mechanisms that are so essential to getting a type-logical syntactic theory to be observationally adequate are not elements of a basic type-theoretic representation of the composition of meanings.

The following notational devices are used, after Wansing (1992). M^A is a lambda term of type A. M[x^B := N^B] is used to mean the result of substituting a lambda term N for the free occurrences of x in M. M[x^B] denotes a term in which x occurs free, and M[N^B] is the result of replacing a single occurrence of x in M[x] with the term N. Finally, a (binary branching) tree of type formulae Γ′ is said to be a restructuring of another (binary branching) tree Γ just in case the two trees contain exactly the same formula occurrences.

DEFINITION 4.6.
This definition depends on the following definition of construction by modals, and vice versa. Let us say that for term M of the simply typed lambda calculus and proof Π of any type logic in Lk with structure labels, M is a homomorphic construction of Π when one of the following correspondences holds.

(a) M = x^τ(A),

        Π = A : word ⇒ A

    and x can stand for word;

(b) M = N^τ(C)[z^(τ(A)←τ(B)) G^τ(B)],

        Π1: ∆ : S2 ⇒ B        Π2: Γ : Struc[A : (S1, S2)] ⇒ C
        ------------------------------------------------------ (/L)
        Π = Γ : Struc[A/jB : S1 ¦j ∆ : S2] ⇒ C

    where Γ is a restructuring of some Γ′ such that emda(Γ′, type(var(N[v_i^τ(A)]))) (and v_i is an atom of type τ(A) not occurring in N), ∆ is a restructuring of some ∆′ such that emda(∆′, type(var(G))), G is either a homomorphic construction or a construction by modals of Π1 (which proves ∆ : S2 ⇒ B), and N[v_i^τ(A)] is either a homomorphic construction or a construction by modals of Π2 (which proves Γ : Struc[A : (S1, S2)] ⇒ C). Additionally, z can stand for S1 and G can stand for S2;

(c) M = N^τ(C)[z^(τ(A)←τ(B)) G^τ(B)],

        Π1: ∆ : S1 ⇒ B        Π2: Γ : Struc[A : (S1, S2)] ⇒ C
        ------------------------------------------------------ (\L)
        Π = Γ : Struc[∆ : S1 ¦j B\jA : S2] ⇒ C

    where Γ is a restructuring of some Γ′ such that emda(Γ′, type(var(N[v_i^τ(A)]))) (and v_i is an atom of type τ(A) not occurring in N), ∆ is a restructuring of some ∆′ such that emda(∆′, type(var(G))), G is either a homomorphic construction or a construction by modals of Π1, and N[v_i^τ(A)] is either a homomorphic construction or a construction by modals of Π2. Additionally, z can stand for S2 and G can stand for S1;

(d) M = λx^τ(B).N^τ(A),

        Π1: (Γ ¦j B) : (S, hyp_k) ⇒ A
        ------------------------------ (/R)
        Π = Γ : S ⇒ A/jB

    where Γ ¦j B is a restructuring of some Γ′ such that emda(Γ′, type(var(N))) and N is either a homomorphic construction or a construction by modals of Π1;

(e) M = λx^τ(B).N^τ(A),

        Π1: (B ¦j Γ) : (hyp_k, S) ⇒ A
        ------------------------------ (\R)
        Π = Γ : S ⇒ B\jA

    where B ¦j Γ is a restructuring of some Γ′ such that emda(Γ′, type(var(N))) and N is either a homomorphic construction or a construction by modals of Π1.

Finally, let us stipulate for now that any lambda term M is a homomorphic construction of a proof Π so long as M is a construction by modals (see below) of Π.


DEFINITION 4.7. Say that a lambda term M is a construction by modals of a proof

        Π: Γ′ : Upper ⇒ B
        ------------------ (Rule)
        Γ : Struc ⇒ A

just in case the last step of the proof is performed by any permissible modal or structural Rule, and so long as the term M is either a homomorphic construction or a construction by modals of the proof Π of Γ′ : Upper ⇒ B.

Now, it would be helpful to prove that the above defined relation between lambda terms and multimodal type-logical proofs has the desired property that for any simply typed lambda term M in the restricted fragment ΛNL, every proof constructed following the definition proves a sequent Γ ⇒ C such that M ∈ τ(C) and emda(Γ, type(var(M^τ(C)))), except for the modal operators and any rearrangements by means of structural rules.

THEOREM 4.8. In any type logic in any of the classes Lk, the antecedent Γ of a proven sequent with homomorphic construction term M ∈ ΛNL is a structural rearrangement of a G-term Γ′ which is emda with type(var(M)).

Proof. This can be proven by a simple structural induction on the length of the constructed sequent proof. There will need to be a number of cases, corresponding to those in the definition of homomorphic construction. The syntactic structure labels will be dropped in the proof, as they will be taken care of with the proviso that the "stand for" conditions in Def. 4.6 must also be met in each correspondent case.

1. Suppose the constructed sequent Γ ⇒ C has the form A ⇒ A, and this is an axiom with zero premisses. Then Def. 4.6 provides M = x^τ(A), and emda(A, τ(A)) by the definition of emda.

2. Suppose the constructed sequent has the form Γ[A/jB ¦j ∆] ⇒ C and follows from two premisses by pattern /L. Then Def. 4.6 tells us that M ∈ τ(C), and that M has a subterm of the form z^(τ(A)←τ(B)) G^τ(B). From the definition and an inductive hypothesis, emda(∆, type(var(G))) and emda(Γ[A], type(var(N[v_i]))). It follows now from Defs. 4.6 and 4.3 that emda((A/jB ¦j ∆), type(var(z^(τ(A)←τ(B)) G^τ(B)))), because emda(A/jB, τ(A) ← τ(B)) (obviously), emda(∆, type(var(G))), and A/jB is already a type of the form C/D.


3. Suppose the constructed sequent has the form Γ[∆ ¦j B\jA] ⇒ C and follows from two premisses by pattern \L. This case is exactly parallel to that above, so details are omitted.

4. Suppose the constructed sequent has the form Γ ⇒ A/jB and follows from one premiss by pattern /R. By definition, M ∈ τ(A) ← τ(B). Our objective now is to prove that emda(Γ, type(var(λx.N))). First of all, type(var(λx.N)) equals the structure that results from dropping all occurrences of τ(B) in type(var(N)), by definition. Secondly, emda(Γ ¦j B, type(var(N))) by Def. 4.6. Now, type(var(N)) is composed by binary combination from two immediate substructures as (N1 ¦ N2), and Def. 4.3 requires that either (i) emda(Γ, N1), emda(B, N2), and Γ reduces by application to the form X/B; or (ii) emda(B, N1), emda(Γ, N2), and B reduces by application to the form D\X. These conditions cannot be satisfied unless the type τ(B) occurs only once in type(var(N)), and unless in fact it appears "on either end" of this tree of types. We are thus ensured that only lambda terms meeting the conditions of Def. 4.4 will be able to construct proofs of a type logic.

5. Suppose the constructed sequent has the form Γ ⇒ B\jA and follows from one premiss by pattern \R. This case is exactly parallel to the immediately preceding one.

6. Suppose the constructed sequent has the form Γ ⇒ A and follows by one of the four modal introduction rules. Then by Def. 4.3, emda(A, τ(A)). Moreover, emda(Γ, type(var(N))) so long as the premiss of the rule has antecedent Γ′ for which emda(Γ′, type(var(N))), a condition assumed by the inductive hypothesis.

7. Suppose the constructed sequent has the form Γ ⇒ A and follows from one premiss Γ′ ⇒ A by a structural rule. By assumption emda(A, τ(A)) and emda(Γ1, type(var(N))) for Γ1 some structural rearrangement of Γ′. The desideratum follows because Γ is itself merely a structural rearrangement of Γ′.
The above theorem can now be seen to have the earlier Proposition 4.5 as a corollary.

Definitions 4.6 and 4.7 are completely general, and as such they cannot be implemented very easily as algorithms. Since our objective is to construct proofs of type-reductions to the atomic type s (i.e. sentences), and since the structural rules licensed by modals cannot possibly be available at the bottom of the proof of a sentence, there is no need for such a general procedure which could find proofs of all kinds of different sequents.


DEFINITION 4.9. Let us say that a G-term (tree of formulae) Γ is a sentence tree in logic R just in case

1. it consists entirely of type formulae that have either no external modal operators or only boxes externally;
2. no subterm of Γ is adorned with a structural modal environment;
3. Γ ⇒ s is a provable sequent in R.

The following result is important for the correctness of the algorithms in the sequel.

LEMMA 4.10. For any type logic R in any of the classes Lk, every proof of a sequent Γ ⇒ s in which Γ is a sentence tree must have an instance of (/L) or (\L) as its final rule.

Proof.

1. The final rule cannot be a structural rule, since we are limited to modally licensed structural rules in which some substructure of Γ must be contained within a structural modal environment ⟨·⟩i. This means Γ is not an unadorned tree of type formulae.

2. The final rule cannot be a modal introduction, since none of (◇L), (◇R), (□↓L) have a lower sequent with a sentence tree as antecedent, and (□↓R) yields a lower sequent with a nonatomic consequent.[2]

3. The final rule cannot be (/R) or (\R), because the lower sequent of these rules cannot have an atomic type formula as consequent.

5. A discovery procedure for type-logical lexicons

5.1. Outline of the method

We view a type-logical grammar as a triple G = ⟨VG, IG, RG⟩ consisting of a vocabulary VG, a lexical assignment function IG, and a type logic RG. Consider now the task of discovering a lexicon IG when RG is fixed to a particular logic in Lk and ΛNL is our previously defined version

[2] Disallowing external diamond operators in sentence trees (and thus in the lexicon) makes this result possible, and leads to some formal loss of generality. If they are allowed, then we would also have to search for potential proofs whose last step is an instance of (◇L), and this would be a nuisance. We conjecture that no loss of grammatical power is suffered by the framework when this restriction is imposed.


of the lambda calculus whose terms correspond homomorphically to the derivations in RG. The data are sentences annotated by lambda calculus meaning recipes, with variable semantic types (except for the principal type s of a sentence). Let us call such annotated sentences term-labeled strings. The lambda calculus is used in a standard fashion to model the compositional meaning structures of natural language; an example of a term-labeled string is:

(7) ((loves^((s←α)←α) (Mary^α))(John^α))^s : ⟨John, loves, Mary⟩;

or alternatively, without subterm types,

(8) ((loves(Mary))(John))^s : ⟨John, loves, Mary⟩.

The meaning recipes show the basic compositional construction of the sentence meaning in terms of application and abstraction, and can be used as directionally non-specific recipes for the construction of type-logical proofs of the labeled sentence which correspond via a generalized Curry-Howard homomorphism. The bold-faced items are constants of the lambda language; the constant Mary, for example, is usually taken to be the meaning of 'Mary.' It is possible to develop everything that follows in a more general framework in which annotating terms are not required to show which meanings are for which words, in which case a lambda variable should be used instead of a constant for each atom.

In order to first discover a general form lexicon as an intermediate step, which assigns syntactic types whose primitive types are distinct variables except for the principal type constant s which is assigned to sentences, our algorithm is intended to learn from a sample of term-labeled strings such as the above example, whose labels either contain no explicit semantic types on the subterms (they are unsubtyped), or are subtyped but contain only semantic types in which the primitive types are all variables except for the principal type s of a sentence. In the example above, the first term-labeled string (tls) is subtyped, while the second is unsubtyped but otherwise the same.
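For concreteness, a term-labeled string like (8) can be encoded as a recipe term paired with a word string. The sketch below is our own Python encoding (the tuple tags 'con', 'var', 'app', 'lam' and the helper `constants` are not from the paper); it also checks the paper's assumption that every word of the string has a corresponding meaning atom in the recipe:

```python
# The unsubtyped tls (8): ((loves(Mary))(John))^s : <John, loves, Mary>.
tls = {
    'term': ('app', ('app', ('con', 'loves'), ('con', 'Mary')), ('con', 'John')),
    'string': ('John', 'loves', 'Mary'),
    'type': 's',                      # principal type of the whole term
}

def constants(term):
    """Left-to-right constants of a recipe term."""
    tag = term[0]
    if tag == 'con':
        return [term[1]]
    if tag == 'var':
        return []
    if tag == 'app':
        return constants(term[1]) + constants(term[2])
    if tag == 'lam':
        return constants(term[2])
    raise ValueError(term)

# Every word has a meaning atom; order may differ, since the recipe is
# directionally non-specific.
assert sorted(constants(tls['term'])) == sorted(tls['string'])
```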
The complete algorithm is charged with learning the system of categories (other than s) together with the lexical assignment function, for a fixed vocabulary and type logic. The broad outline of the procedure, called Optimal Unification for Type-Logical grammars (OUTL), is as follows:

1. Given a sample D of unsubtyped tls's, compute a counterpart sample D′ of subtyped tls's whose terms are typed in a most general way. This can be accomplished using some kind of principal typing algorithm, such as the one which is discussed at length in (Hindley, 1997).


2. Compute the set of general form type-logical lexicons GFTL(D′), in each of which distinct variable primitive types will each occur atomically only once, and such lexicons will generate only the sample D′. This is accomplished by taking the following steps:

   a) For each tls in the sample, determine all proofs in the type logic at hand which can be constructed by using the subtyped lambda term as a construction term, and which are also compatible with the word order that is evident in the sentence.

   b) Non-deterministically select one proof for each tls in the entire sample; a general form lexicon can then be read off from the types labeling the words. Repeat this step until all different ways of selecting one proof for each tls have been exhausted. This will provide all general form lexicons that could generate the learning sample.

3. Find all of the optimal unifications (Buszkowski and Penn, 1990) of each of the lexicons in GFTL(D′).

The notion of a general form lexicon and its role as an intermediate step in grammar discovery was elucidated by Buszkowski and Penn (1990) in their work on the discovery of classical categorial grammars from syntactic skeletons. The basis of an algorithm for discovering such a lexicon is implicit in Def. 4.6, which specifies the kinds of proofs in a modally enriched type logic that a particular kind of lambda term serves as a construction of. An implementation of this correspondence could be used to recover the entire family of proofs in a suitable type logic RG which correspond homomorphically to a given construction term. The members of such a proof family will each determine a general form lexicon over the vocabulary elements in the tls. Now, the set of tls's which constitutes the learning sample corresponds to a set of such proof families (one proof family to every tls).
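Step 2b amounts to taking a Cartesian product over the proof families. A minimal sketch of this step, under our own assumptions: the proof families are given directly as lists of word-to-type fragments (in reality they would be produced by the construction algorithm), and the type strings and merging policy are purely illustrative.

```python
import itertools

# Hypothetical proof families: each tls contributes a family of candidate
# word-to-type fragments, one fragment per proof of that tls.
families = [
    [{'Mary': 'a1', 'sings': 'a1\\s'}],     # tls 1 admits a single proof
    [{'Susan': 'a2', 'sings': 'a2\\s'}],    # tls 2 admits a single proof
]

def general_form_lexicons(families):
    """One lexicon per way of choosing a single proof from each family."""
    lexicons = []
    for choice in itertools.product(*families):
        lex = {}
        for fragment in choice:
            for word, category in fragment.items():
                lex.setdefault(word, set()).add(category)
        lexicons.append(lex)
    return lexicons
```

With the single-proof families above, the product has one element, and the resulting lexicon assigns sings both a1\s and a2\s.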
Every possible combination of proofs obtained by choosing one proof from each family will then induce an algorithmically obtainable general form lexicon over the vocabulary found in the learning sample.

5.2. Discovering the general form lexicons

Algorithm 1, which we call MGA (for Most General Assignment), provides a most general subtyping for an unsubtyped lambda term. The algorithm is based upon Hindley's (1997) principal typing algorithm, but it is much less complicated because the task of providing a most general subtyping is rendered easy by giving the type of the entire term.
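The MGA procedure (Algorithm 1 below) can be sketched as a short recursive function. The Python encoding here is our own (tuple-tagged terms; types are strings or ('arr', result, arg), read result←arg); it follows the algorithm's case split, drawing a fresh type variable for each application argument:

```python
import itertools

_fresh = (f"b{n}" for n in itertools.count(1))   # fresh type variables b1, b2, ...

def mga(term, ty):
    """Most general subtyping of `term`, given the type `ty` of the whole term.
    Terms: ('var', x), ('con', c), ('lam', x, body), ('app', fun, arg).
    Returns the term with a type attached to every node."""
    tag = term[0]
    if tag in ('var', 'con'):
        return (tag, term[1], ty)
    if tag == 'lam':
        if not (isinstance(ty, tuple) and ty[0] == 'arr'):
            raise ValueError('abstraction requires an arrow type')  # the algorithm's "fail"
        _, alpha, beta = ty            # the lam binds its variable at the argument type beta
        _, x, body = term
        return ('lam', (x, beta), mga(body, alpha), ty)
    if tag == 'app':
        _, fun, arg = term
        beta = next(_fresh)            # new type variable for the argument
        return ('app', mga(fun, ('arr', ty, beta)), mga(arg, beta), ty)
    raise ValueError(term)
```

On the recipe of (8) with whole-term type 's', this sketch types John at a fresh variable b1, Mary at b2, and loves at (s←b1)←b2.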


Algorithm 1 mga(M^τ, Stterm)
Require: M^τ to be an unsubtyped lambda term
Ensure: Stterm is a subtyped version of M^τ whose subtypes are assigned in a most general way
  if M^τ is a variable x^τ then
    Stterm ← x^τ
  else if M^τ is a constant c^τ then
    Stterm ← c^τ
  else if M has the form (λx.P)^(α←β) then
    assign x to type β and do mga(P^α, P′) with free x in P assigned to type β
    Stterm ← (λx.P′)^(α←β)
  else if M has the form (P Q)^τ then
    do mga(P^(τ←β), P′) for new type variable β
    do mga(Q^β, Q′)
    Stterm ← (P′ Q′)^τ
  else
    fail
  end if

We are assuming that all the term labels in the data are of the principal semantic type written s (as is the principal syntactic type).

5.2.1. Discovery in pure NL

Algorithm 2 will find all of the general form lexicons for a sample of tls's, and the procedure shown in turn uses a number of subsidiary algorithms. In order to provide a general form lexicon, one condition of such being that it assigns syntactic types whose primitive types consist entirely of variables except for the principal type constant, the algorithm is intended to run on a sample of tls's whose labels contain only such semantic types (i.e. the primitive types therein are all variables except for the principal type). The algorithm requires the use of a structure-labeled type logic, so that the burden of determining whether a lambda term can stand for a sentence is shouldered by the construction algorithm 4 that finds all the type-logical proofs of which a given term serves as a homomorphic construction. The details of construction may need to be modified for the type logic RG that is being used, although in principle this is merely an implementation of Def. 4.6. We first give a version that is correct for basic NL.

The lexicons discovered by GFTL will only be true general forms in Buszkowski and Penn's sense of the phrase when there is a distinct variable primitive type assigned to each atomic subterm that must


inhabit a primitive type for proper typing to work out; this is a natural outcome of using MGA on a sample of unsubtyped tls's. Such subterms are analogous to the argument substructures of Buszkowski and Penn (1990). On the other hand, GFTL doesn't care whether its input sample will produce a purely general form lexicon, and so it can also be used to discover a "not-so-general" form lexicon from any sample of subtyped tls's drawn from the restricted set of tls's with just finitely many semantic type variables.

A brief word about the notations employed in these algorithms is in order. We use the standard Prolog list notation for lists, wherein [Head | Tail] means a list with first member Head and remainder Tail. The concatenation of two lists L1 and L2 is symbolized L1_L2, while the addition of an element E to (the end of) a list L is symbolized as L + E. The list L minus some element E is written L − E. The arrow ← is herein used in its standard fashion in algorithms, which is to indicate the assignment of the value expression at the arrow's tail to the variable at the arrow's head.

Algorithm 2 gftl(Sample, GF)
Require: Sample to be a list, either empty or of the form [String : Semantics | TLSs]
Ensure: GF is a list of all general form lexicons determined by Sample
  GF ← ∅
  repeat
    gf_type1(Sample, GF1)
    GF ← GF + GF1
  until there are no more lexicons provided by gf_type1(Sample, GF1)

The algorithm gf_type1 provides a single complete general form type assignment for the entire learning sample by picking a single proof tree out of each set of proofs constructed by a tls. The algorithm construction is at the core of GFTL, and provides a proof tree as its second argument, of a sequent of which the first argument serves as a homomorphic construction according to our earlier defining conditions. Only the second argument can be queried, and the third argument, which is the syntactic structure that is desired as a structure label on the corresponding sequent, must be provided.
The other components of the gf_type1 procedure are simpler, and are not given in detail. Briefly, gf_typec extracts the structured sequence of labeled types found in one sequent and provides a structure, in the form of a G-term, of type assignments to individual words suitable for inclusion into a lexicon of such assignments. The algorithm


Algorithm 3 gf_type1(Sample, GF1)
Require: Sample to be a list, either empty or of the form [String : Semantics | TLSs]
Ensure: GF1 is a general form lexicon determined by Sample
  if Sample = [] then
    GF1 ← []
    exit
  else if Sample has the form [String : Semantics | TLSs] then
    Select some (binary branching) syntactic structure Struc for String
    construction(Semantics, [Π / Γ : Struc ⇒ s (Rule)], Struc)
    gf_typec(Γ : Struc, Partg)
    gf_type1(TLSs, Rest)
    list_of_types(Partg, GFpart)
    GF1 ← GFpart_Rest
  end if

list_of_types flattens a structure, in the form of a G-term, down to a simple list with the same left-to-right order.

There is perhaps a fear that GFTL is not guaranteed to terminate. Let us demonstrate that the algorithm applied using NL always terminates. The crux of the matter is the construction algorithm 4; if this terminates, GFTL must terminate.

THEOREM 5.1. Algorithm 4 construction for pure NL always terminates.

Proof. Algorithm 4 implements a Prolog-style search for a single proof of the form [Π / Γ : Struc ⇒ s], given a simply typed lambda term M^s and a bare syntactic tree over the corresponding sentence of words. The algorithm shown is recursive, calling itself. We can demonstrate it will terminate by noticing that the recursion successively peels away the layers of the lambda term.

1. If M contains an application subterm of the form zG, construction calls itself upon both (i) a new term M′ which is exactly like M but with the subterm zG reduced by application, and (ii) the argument subterm G. Both of these recursive calls have lambda term arguments which are smaller than the original term.

2. If M is an abstraction term of the form λx.N, then construction is called upon the subterm N.

3. If M is an atomic term (either free variable or constant), the routine attempts to provide a simple axiom as the constructed proof step.


Algorithm 4 construction(Semantics, Proof, Struc) for NL
Require: Semantics to be a lambda term, Struc a syntactic structure for which Semantics is a licit label
Ensure: Proof is a proof tree in NL of the sequent Γ : Struc ⇒ C of which Semantics is a homomorphic construction
1: if Semantics has the form N^τ(C)[z^(τ(A)←τ(B)) G^τ(B)] then
2:   Either (i) assign a proof tree

         P1                    P2
         ∆ : S2 ⇒ B      Γ : Struc[A : (S1, S2)] ⇒ C
         -------------------------------------------- (/L)
         Γ : Struc[A/B : S1 ¦ ∆ : S2] ⇒ C

     according to Def. 4.6 in which emda(Γ, type(var(N[v_i]))) and emda(∆, type(var(G))), and ensuring that construction(N[v_i^τ(A)], [P2 / Γ : Struc[A : (S1, S2)] ⇒ C], Struc) and construction(G, [P1 / ∆ : S2 ⇒ B], S2)
3:   or (ii) assign a proof tree

         P1                    P2
         ∆ : S1 ⇒ B      Γ : Struc[A : (S1, S2)] ⇒ C
         -------------------------------------------- (\L)
         Γ : Struc[∆ : S1 ¦ B\A : S2] ⇒ C

     according to Def. 4.6 in which emda(Γ, type(var(N[v_i]))) and emda(∆, type(var(G))), and ensuring that construction(N[v_i^τ(A)], [P2 / Γ : Struc[A : (S1, S2)] ⇒ C], Struc) and construction(G, [P1 / ∆ : S1 ⇒ B], S1)
4: else if Semantics has the form λx^τ(B).N^τ(A) then
5:   assign either M ← A/B or M ← B\A
6:   if M = A/B then
7:     assign a proof tree

         P1
         (Γ ¦ B) : (S, hyp_i) ⇒ A
         ------------------------- (/R)
         Γ : Struc ⇒ M

       according to Def. 4.6 in which emda((Γ ¦ B), type(var(N))), and ensuring that construction(N, [P1 / (Γ ¦ B) : (Struc, hyp_i) ⇒ A], (Struc, hyp_i))
8:   else if M = B\A then
9:     assign a proof tree

         P1
         (B ¦ Γ) : (hyp_i, Struc) ⇒ A
         ----------------------------- (\R)
         Γ : Struc ⇒ M

       according to Def. 4.6 in which emda((B ¦ Γ), type(var(N))), and ensuring that construction(N, [P1 / (B ¦ Γ) : (hyp_i, Struc) ⇒ A], (hyp_i, Struc))
10:  end if
11: else if Semantics has the form x^τ(A) then
12:   Proof ← [A : word ⇒ A] such that x can stand for word
13: else
14:   fail
15: end if

If this is not possible, as when the meaning constant cannot stand for the corresponding word in the sentence, the routine will fail.

Since the lambda terms provided to the recursive calls are always smaller, eventually all calls will be on atomic terms after a finite number of steps, at which point the routine will either succeed and return a constructed proof, or fail. The construction of the corresponding proof tree proceeds at each step in one way or another according to the recipe, and cannot cause a problem.

We now give an instructive little example. Although ultimately we intend to provide the learner with tls's not showing any subterm types, the examples will be presented with subterm types showing since that allows more clarity.

EXAMPLE 5.2 (Computing the general form lexicon). Consider providing the algorithm GFTL with just the following two input tls's, which could be provided as the output of the MGA algorithm:

(9) (sings^(s←α1) Mary^α1)^s : ⟨Mary, sings⟩
    (sings^(s←α2) Susan^α2)^s : ⟨Susan, sings⟩

and let us take NL as our type logic. This data tells us:

1. (sings^(s←α1) Mary^α1)^s is a homomorphic construction of a sequent Γ1 : [Mary, sings] ⇒ s

2. (sings^(s←α2) Susan^α2)^s is a homomorphic construction of a sequent Γ2 : [Susan, sings] ⇒ s

Now we can determine lexicons that generate the learning sample using Def. 4.6 restricted to NL. This restriction may be accomplished by dropping the cases dealing with modals and structural rules, and by changing instances of the phrase "is a restructuring of" to "is identical to." Now, according to the third case of that definition, the term and proof

(10) (sings^(s←α1) Mary^α1)^s

         α1 : Mary ⇒ α1      s : [Mary, sings] ⇒ s
         ------------------------------------------ (\L)
         (α1 : Mary ¦ α1\s : sings) ⇒ s

are a correspondent pair subject to the side conditions given there.
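Derivability claims like the one in (10) can be checked with a naive backward-chaining prover for product-free NL. The sketch below is our own encoding (types are strings or ('/', A, B) for A/B and ('\\', B, A) for B\A; antecedents are nested pairs); it omits the structure labels and "stand for" side conditions, and it terminates because each rule, read backward, removes one connective:

```python
def is_type(t):
    """A type is an atomic string or a 3-tuple ('/', A, B) or ('\\', B, A)."""
    return isinstance(t, str) or (isinstance(t, tuple) and len(t) == 3)

def left_steps(gamma):
    """Undo one /L or \\L step: yield (reduced antecedent, delta, b), with the
    side obligation that delta => b be provable."""
    if is_type(gamma):
        return
    l, r = gamma
    if isinstance(l, tuple) and len(l) == 3 and l[0] == '/':   # (A/B, delta)
        _, a, b = l
        yield a, r, b
    if isinstance(r, tuple) and len(r) == 3 and r[0] == '\\':  # (delta, B\A)
        _, b, a = r
        yield a, l, b
    for sub, d, b in left_steps(l):
        yield (sub, r), d, b
    for sub, d, b in left_steps(r):
        yield (l, sub), d, b

def provable(gamma, c):
    """Backward proof search for the NL sequent gamma => c."""
    if gamma == c:                                   # axiom
        return True
    if isinstance(c, tuple) and len(c) == 3:
        if c[0] == '/':                              # /R
            _, a, b = c
            if provable((gamma, b), a):
                return True
        if c[0] == '\\':                             # \R
            _, b, a = c
            if provable((b, gamma), a):
                return True
    return any(provable(d, b) and provable(g2, c)    # /L, \L
               for g2, d, b in left_steps(gamma))
```

Under this encoding the sequent of (10) is provable(('a1', ('\\', 'a1', 's')), 's'), and type lifting np ⇒ s/(np\s) also checks out, while the order-violating (np\s, np) ⇒ s fails.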


By analogous reasoning from the definition one can obtain the analogous fact that the following term and proof are also a correspondent pair:

(11) (sings^(s←α2) Susan^α2)^s

         α2 : Susan ⇒ α2      s : [Susan, sings] ⇒ s
         -------------------------------------------- (\L)
         (α2 : Susan ¦ α2\s : sings) ⇒ s

again subject to the side conditions. The side conditions, in particular the conditions stipulating that the lambda terms must stand for (i.e. must be the meaning for) the corresponding structure labeled types, prevent any other proofs from standing in the relation with the two given tls's. The algorithm GFTL thus finds just one general form lexicon that is consistent with the learning sample:

(12) IG(Mary) = α1
     IG(sings) = α1\s
     IG(Susan) = α2
     IG(sings) = α2\s

5.2.2. Discovery in modally enriched systems

In order to modify GFTL to work with modally enriched versions of NL, it is sufficient to enrich the construction algorithm so that it implements the version of Def. 4.6 that is suited to the type logic RG that one has at hand. All other parts of GFTL are independent of the nature of the type logic RG. Such an enrichment for NLNCC is presented below.

The final stipulation of Def. 4.6 cannot be directly implemented, as this would make the construction and consbymodals routines circularly call each other. Omitting the stipulation has the effect that the last line of the constructed proof must be the result of a slash introduction rule; in this way the first step of any call to construction cannot invoke consbymodals, and this breaks the cycle. Fortunately, any proof of a sequent whose antecedent is a sentence tree (Def. 4.9) is guaranteed to end with a slash introduction rule thanks to Lemma 4.10. We are not interested in getting the algorithm to find proofs of sequents that do not involve sentence trees, since we are trying to induce grammars here.

Algorithm 5, using its codependent Algorithm 6, provides a proof tree as its second argument of a sequent of which the first argument serves as a homomorphic construction.
As before, only the second argument can be queried, and the third argument is the desired syntactic structure.


Algorithm 5 construction(Semantics, Proof, Struc) for NLNCC
Require: Semantics to be a lambda term, Struc a syntactic structure
Ensure: Proof is a proof tree of the sequent Γ : Struc ⇒ C of which Semantics is a homomorphic construction
1: if Semantics has the form N^τ(C)[z^(τ(A)←τ(B)) G^τ(B)] then
2:   Either (i) assign a proof tree

         P1                    P2
         ∆ : S2 ⇒ B      Γ : Struc[A : (S1, S2)] ⇒ C
         -------------------------------------------- (/L)
         Γ : Struc[A/B : S1 ¦ ∆ : S2] ⇒ C

     according to Def. 4.6 in which Γ is any restructuring of some Γ′ such that emda(Γ′, type(var(N[v_i]))) and ∆ is a restructuring of some ∆′ such that emda(∆′, type(var(G))), and ensuring that either construction(N[v_i^τ(A)], [P2 / Γ : Struc[A : (S1, S2)] ⇒ C], Struc) or consbymodals(N[v_i^τ(A)], [P2 / Γ : Struc[A : (S1, S2)] ⇒ C], Struc), and ensuring that either construction(G, [P1 / ∆ : S2 ⇒ B], S2) or consbymodals(G, [P1 / ∆ : S2 ⇒ B], S2)
3:   or (ii) assign a proof tree

         P1                    P2
         ∆ : S1 ⇒ B      Γ : Struc[A : (S1, S2)] ⇒ C
         -------------------------------------------- (\L)
         Γ : Struc[∆ : S1 ¦ B\A : S2] ⇒ C

     according to Def. 4.6 in which Γ is a restructuring of some Γ′ such that emda(Γ′, type(var(N[v_i]))) and ∆ is a restructuring of some ∆′ such that emda(∆′, type(var(G))), and ensuring that either construction(N[v_i^τ(A)], [P2 / Γ : Struc[A : (S1, S2)] ⇒ C], Struc) or consbymodals(N[v_i^τ(A)], [P2 / Γ : Struc[A : (S1, S2)] ⇒ C], Struc), and ensuring that either construction(G, [P1 / ∆ : S1 ⇒ B], S1) or consbymodals(G, [P1 / ∆ : S1 ⇒ B], S1)
4: else if Semantics has the form λx^τ(B).N^τ(A) then
5:   assign either M ← A/B or M ← B\A
6:   if M = A/B then
7:     assign a proof tree

         P1
         (Γ ¦ B) : (S, hyp_i) ⇒ A
         ------------------------- (/R)
         Γ : Struc ⇒ M

       according to Def. 4.6 in which Γ ¦j B is a restructuring of some Γ′ such that emda(Γ′, type(var(N))), and ensuring that either construction(N, [P1 / (Γ ¦ B) : (Struc, hyp_i) ⇒ A], (Struc, hyp_i)) or consbymodals(N, [P1 / (Γ ¦ B) : (Struc, hyp_i) ⇒ A], (Struc, hyp_i))
8:   else if M = B\A then
9:     assign a proof tree

         P1
         (B ¦ Γ) : (hyp_i, Struc) ⇒ A
         ----------------------------- (\R)
         Γ : Struc ⇒ M

       according to Def. 4.6 in which B ¦j Γ is a restructuring of some Γ′ such that emda(Γ′, type(var(N))), and ensuring that either construction(N, [P1 / (B ¦ Γ) : (hyp_i, Struc) ⇒ A], (hyp_i, Struc)) or consbymodals(N, [P1 / (B ¦ Γ) : (hyp_i, Struc) ⇒ A], (hyp_i, Struc))
10:  end if
11: else if Semantics has the form x^τ(A) then
12:   Proof ← [A : word ⇒ A] such that x can stand for word
13: else
14:   fail
15: end if


Algorithm 6 consbymodals(N^τ(A), Proof, Struc), where Proof has the form

         P
    Γ′ : Upper ⇒ B
    ────────────── (Rule)
    Γ : Struc ⇒ A

Require: N^τ(A) to be a lambda term, Struc to be a bare syntactic structure, Γ : Struc ⇒ A to be a sequent
Ensure: Γ′ : Upper ⇒ B is an upper sequent in a proof of the lower sequent by a modal or structural Rule

 1: if Γ : Struc ⇒ A is provable (using standard sequent reduction) by a proof

         P′
    Γ′′ : Upper′ ⇒ B′
    ───────────────── (Rule′)
    Γ : Struc ⇒ A

    whose last step by Rule′ involves any modal or structural rule, and either construction(N^τ(A), P′, Upper′) or consbymodals(N^τ(A), P′, Upper′) then
 2:   assign Γ′ ← Γ′′, Upper ← Upper′, B ← B′, and Rule ← Rule′
 3: else
 4:   fail
 5: end if

structure label on the proven sequent. As with the NL case, the information that allows the corresponding proofs to be induced is provided by the applications and abstractions found in the lambda term. The algorithm is designed as before to work directly from these, but at each step in building a corresponding proof tree the routine must now guess at all possible sequences of modal and structural rules that could have led to the slash rule that in fact corresponds with the application or abstraction under consideration in the lambda term.

"All possible sequences of modal and structural rules" would in general mean an infinity of possibilities, since each semantic type corresponds with an infinite number of syntactic types. This is the practical reason we limit the correspondence between syntactic and semantic types so that at most some number k of modal operators are permitted to adorn each subtype of a type A that is equivalent modulo direction to some semantic type α.

Unfortunately, all the nondeterministic guessing and checking in this more enriched case slows the procedure down to the point where it cannot really be used; as usual, "the problem is search." This has so far precluded the testing of any example data requiring a multimodal lexicon to be learned, and parallelization of the above algorithm, or the development of a suitable approximate algorithm, is now the most important goal of this continuing research program. There is


no immediately obvious way of improving the algorithm using favored techniques such as dynamic programming or search heuristics. In light of the purely theoretical interest of Algorithms 5 and 6, it seems all the more crucial to prove that together they form a routine which always terminates.

THEOREM 5.3. Algorithm 5, with its coroutine 6, always terminates when given the proper form of input.

Proof. Key to the success of Algorithm 5 in the general case is that the routine must begin on an input semantic term from which a proof can be constructed whose last step is not a modal or structural rule, but a slash introduction rule. This is guaranteed by Lemma 4.10 for every meaning term constructing a proof of a sentence tree sequent. Moreover, Algorithm 5 has the same casewise structure as the earlier Algorithm 4, which was proven to terminate in Theorem 5.1. It follows that no step of Algorithm 5 can lead to a non-terminating condition, except possibly the calls to consbymodals. We now demonstrate that these calls also terminate.

Calls to consbymodals do not change the lambda term, and herein lies a danger. Each such call extends the constructed proof upwards by either a modal or a structural rule; in the most general sort of type-logical proof there could be any finite number of such rules in sequence, but our class of type logics is restricted so that only a fixed finite number k of modal operators are permitted to adorn the types in a well-formed formula. Additionally, we have required that the set of structural rules cannot form a closed loop in a proof, wherein some upper sequent is identical with a lower sequent any number of steps away. Under these conditions, there are only finitely many possible modal and/or structural steps between slash introduction steps, so only a finite number of steps can be added to the constructed proof by the consbymodals routine at each phase without decreasing the size of the lambda term. This proves that the lambda terms supplied to the recursive calls will eventually decrease to atoms, and thus termination is guaranteed in the same fashion as with Algorithm 4.
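The finiteness claim at the heart of this proof can be made concrete with a small counting sketch. The encoding below is hypothetical (Python rather than the declarative implementation discussed here): semantic types are nested ("fn", argument, result) tuples, and only the number of modal operators on each subtype is counted, not their identities or order, so the formula is an illustration of the bound rather than the exact correspondence.

```python
K = 2  # assumed bound k on modal operators per subtype

def correspondents(sem, k=K):
    """Count the syntactic types equivalent modulo direction to the
    semantic type `sem`, with at most k modal operators per subtype."""
    if isinstance(sem, tuple):  # ("fn", argument, result)
        # two slash directions (/ or \) at this connective, times the
        # choices within each immediate subtype, times 0..k modal
        # operators adorning the whole subtype
        return 2 * correspondents(sem[1], k) * correspondents(sem[2], k) * (k + 1)
    return k + 1                # an atom can carry 0..k modal operators

sem_sings = ("fn", "a", "s")    # semantic type of an intransitive verb
print(correspondents(sem_sings))  # 2 * (K+1)^3 = 54: finite, though it grows quickly
```

Without the bound k the count would be infinite, which is exactly why the restriction is needed for consbymodals to terminate.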

5.3. Optimal unification

It seems clear from Example 5.2 above that the general form lexicon for a learning sample is not very good, in the loose sense that it fails to capture any generalizations about the behavior of the elements of the learning sample. It describes the sample perfectly, but is not very informative about the "true" syntactic categories. One way of altering


the lexicon to provide a more generally stated description of the observables is to apply a substitution over the variable types in the family T_FL, so that any two distinct variables that in some sense behave in exactly the same way throughout the sample are collapsed, or unified, into a single type that is then identified as a learned type constant. We do this using Buszkowski and Penn's (1990) optimal unification idea.

Let T = {T1, ..., Tn} be a non-empty finite family of non-empty sets of types. A substitution σ is called a unifier of T if, for all 1 ≤ i ≤ n and all a, b ∈ Ti, we have σ(a) = σ(b) (Buszkowski and Penn, 1990). Call T unifiable if there is a unifier for it. A most general unifier (mgu) of T is a unifier σ of T such that, for any unifier σ′ of T, there exists a substitution α for which σ′(a) = α(σ(a)) for all types a (Buszkowski and Penn, 1990).

We follow Buszkowski and Penn's development of the notion of an optimal unifier. For any substitution σ, denote its kernel by ker(σ); following standard algebraic practice, this is the equivalence relation on the set Tp of types defined by

(13) ⟨a, b⟩ ∈ ker(σ) if and only if σ(a) = σ(b)

for a, b ∈ Tp. Let [a]_σ denote the equivalence class of type a with respect to ker(σ). For T ⊆ Tp,

(14) T/ker(σ) =def {T ∩ [a]_σ | a ∈ T}.

For T = {T1, ..., Tn},

(15) T/ker(σ) = T1/ker(σ) ∪ ... ∪ Tn/ker(σ).

An optimal unifier (ou) of a family T of sets of types is a substitution σ such that:

1. for all 1 ≤ i ≤ n and all a, b ∈ Ti, if σ(a) ≠ σ(b) then the set {σ(a), σ(b)} is not unifiable;
2. σ is an mgu of T/ker(σ).

More informally, an ou of T unifies the members as far as possible, and is a most general substitution with this property.

THEOREM 5.4 (Buszkowski and Penn, 1990). For any family T, there exists at least one ou of T, and the total number of ou's of T is finite up to alphabetic variants.
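For readers who wish to experiment, here is a minimal sketch (Python, not the declarative implementation used in this work) of unification over categorial types encoded as tuples, together with a greedy routine that collapses the members of each set as far as possible, in the spirit of an ou. It computes a single unifying substitution and makes no attempt to enumerate all the ou's that Theorem 5.4 guarantees; the tuple encoding and variable-naming convention are assumptions of this sketch.

```python
def is_var(t):
    """Type variables are strings beginning with 'α'; atoms like 's' are constants."""
    return isinstance(t, str) and t.startswith("α")

def walk(t, s):
    """Resolve a type through the substitution s."""
    while is_var(t) and t in s:
        t = s[t]
    if isinstance(t, tuple):  # (connective, left subtype, right subtype)
        return (t[0], walk(t[1], s), walk(t[2], s))
    return t

def unify(a, b, s):
    """Return a most general substitution extending s that identifies a and b, or None."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return {**s, a: b}
    if is_var(b):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0]:
        s1 = unify(a[1], b[1], s)
        return unify(a[2], b[2], s1) if s1 is not None else None
    return None  # clash of atoms or of connectives

def greedy_unifier(family):
    """Greedily unify members within each set; pairs that clash are left distinct,
    matching condition 1 in the definition of an ou."""
    s = {}
    for types in family:
        ts = list(types)
        for i in range(len(ts)):
            for j in range(i + 1, len(ts)):
                s1 = unify(ts[i], ts[j], s)
                if s1 is not None:
                    s = s1
    return s

# The two categories assigned to 'sings' in lexicon (12): α1\s and α2\s
family = [[("\\", "α1", "s"), ("\\", "α2", "s")]]
sigma = greedy_unifier(family)
print(walk(("\\", "α1", "s"), sigma))  # both occurrences collapse to one category, as in (16)
```

The greedy pass suffices for this toy family; a faithful implementation would backtrack over the order of pairings to produce every ou.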


5.4. Optimally unified lexicons

A method for generating all of the optimal unifications of each of the lexicons output by GFTL has been implemented as the last step of the OUTL procedure. This uses an adaptation of Buszkowski and Penn's (1990) optimal unification algorithm.

Algorithm 7 OUTL(Sample, OG)
Require: Sample to be a list, either empty or of the form [String : Semantics | TLSs]
Ensure: OG is a list of all optimally unified lexicons determined by Sample
 1: gftl(Sample, GF)
 2: unify_all(GF, GLists)
 3: liststomembers(GLists, G)
 4: unify_grammars(G, OG)

We won't describe the optimal unification procedure in detail, because it is simply a declarative implementation of Buszkowski and Penn's idea. The algorithm unify_all applies the optimal unification algorithm to each element of a list of lexicons as many times as there are optimal unifiers, producing a corresponding list of all the optimally unified lexicons from each general lexicon. Algorithm liststomembers(Lists, L) ensures that L is a list of the members of the lists in Lists, in the obvious left-to-right order. The algorithm unify_grammars takes a list of lexicons and produces a single optimal unification of them. This has the effect of removing redundant lexicons from the original list, since any two lexicons which are simply alphabetic variants will always unify to a single one.

We illustrate the results of OUTL with a few small examples showing lexicons induced using NL as the underlying type logic. Induction of lexicons for the more enriched logics is still purely theoretical, owing to the above-mentioned search space problems involved in finding all of the general form lexicons within an enriched type-logical framework.

EXAMPLE 5.5. Example 5.2 computed the single general form lexicon

IG(Mary) = α1
IG(sings) = α1\s
IG(Susan) = α2
IG(sings) = α2\s


from a learning sample of two tls's. This lexicon can then be optimally unified, which results in:

(16) Mary   α
     sings  α\s
     Susan  α

Let us now consider providing the OUTL algorithm with tls's whose labels involve lambda abstraction.

EXAMPLE 5.6 (Term labels with lambda abstraction). The following term labels follow the semantic proposals of Keenan and Faltz (1985), in which intransitive verbs are posited to be functions taking generalized quantifiers as their only argument. The abstraction in the meaning terms below allows this to work for proper nouns, whose lexical type will then be discovered to be atomic.

(17) (sings^(s←(s←(s←α1))) λx2.(x2^(s←α1) Mary^(α1)))^s : ⟨Mary, sings⟩
     (sings^(s←(s←(s←α2))) λx5.(x5^(s←α2) Susan^(α2)))^s : ⟨Susan, sings⟩

The abstraction cases in Def. 4.6 are now invoked, which yields the following two proofs from the first tls:

(18)
    α1 : mary ⇒ α1      s : [mary, [hypothesis]] ⇒ s
    ──────────────────────────────────────────────── (L\)
    α1 ¦ α1\s : [mary, [hypothesis]] ⇒ s
    ──────────────────────────────────── (R/)
    α1 : mary ⇒ s/(α1\s)              s : [mary, sings] ⇒ s
    ─────────────────────────────────────────────────────── (L\)
    α1 ¦ (s/(α1\s))\s : [mary, sings] ⇒ s

(19)
    α1 : mary ⇒ α1      s : [[hypothesis], mary] ⇒ s
    ──────────────────────────────────────────────── (L/)
    s/α1 ¦ α1 : [[hypothesis], mary] ⇒ s
    ──────────────────────────────────── (R\)
    α1 : mary ⇒ (s/α1)\s              s : [mary, sings] ⇒ s
    ─────────────────────────────────────────────────────── (L\)
    α1 ¦ ((s/α1)\s)\s : [mary, sings] ⇒ s

IG (Mary) = α1 IG (sings) = (s/(α1 \s))\s IG (Susan) = α2 IG (sings) = ((s/α2 )\s)\s

semboot1.tex; 17/09/2002; 13:50; p.35

36

Fulop

IG (Mary) = α1 IG (sings) = ((s/α1 )\s)\s IG (Susan) = α2 IG (sings) = (s/(α2 \s))\s

IG (Mary) = α1 IG (sings) = ((s/α1 )\s)\s IG (Susan) = α2 IG (sings) = ((s/α2 )\s)\s

Optimally unifying each of these four yields distinct optimally unified lexicons:

(21) IG(Mary) = α
     IG(sings) = (s/(α\s))\s
     IG(Susan) = α

     IG(Mary) = α1
     IG(sings) = (s/(α1\s))\s
     IG(Susan) = α2
     IG(sings) = ((s/α2)\s)\s

     IG(Mary) = α1
     IG(sings) = ((s/α1)\s)\s
     IG(Susan) = α2
     IG(sings) = (s/(α2\s))\s

     IG(Mary) = α
     IG(sings) = ((s/α)\s)\s
     IG(Susan) = α

The two smallest of these could then be selected post hoc, by invoking some further principle of optimality such as least cardinality of the lexicon, or even Minimum Description Length (Fulop, 2001).

We wish ultimately to learn from unsubtyped tls's, so that the algorithm will truly induce the systems of both syntactic and semantic lexical types. When providing a learning sample without any subterm types, we simply have to initiate an application of OUTL using the subtyping algorithm MGA. Let us refer to the resulting discovery procedure for optimally unified lexicons from unsubtyped tls's as OUTL_ust. To demonstrate, the next example shows the result of applying the procedure to two entirely different samples drawn from the same language, which have the same vocabulary. The procedure settles on a grammar for the same language in each case (the same grammar too, in fact).

EXAMPLE 5.7. The following pair of samples gives rise to the same optimally unified lexicon by OUTL_ust.

(22) (sings(John))^s : ⟨John, sings⟩
     ((loves(Mary))(John))^s : ⟨John, loves, Mary⟩
     ((loves(a(man)))(Mary))^s : ⟨Mary, loves, a, man⟩
     ((sees(John))(a(man)))^s : ⟨a, man, sees, John⟩


(23) (sings(Mary))^s : ⟨Mary, sings⟩
     ((loves(John))(Mary))^s : ⟨Mary, loves, John⟩
     ((loves(Mary))(a(man)))^s : ⟨a, man, loves, Mary⟩
     ((sees(a(man)))(John))^s : ⟨John, sees, a, man⟩

Using the output of MGA on the above, the general form lexicons which GFTL discovers are respectively:

(24) IG(John) = α1
     IG(John) = α2
     IG(John) = α9
     IG(sings) = α1\s
     IG(loves) = (α2\s)/α3
     IG(loves) = (α4\s)/α5
     IG(Mary) = α3
     IG(Mary) = α4
     IG(a) = α5/α6
     IG(a) = α7/α8
     IG(man) = α6
     IG(man) = α8
     IG(sees) = (α7\s)/α9

and

     IG(Mary) = α1
     IG(Mary) = α2
     IG(Mary) = α9
     IG(sings) = α1\s
     IG(loves) = (α2\s)/α3
     IG(loves) = (α7\s)/α9
     IG(John) = α3
     IG(John) = α4
     IG(a) = α5/α6
     IG(a) = α7/α8
     IG(man) = α6
     IG(man) = α8
     IG(sees) = (α4\s)/α5

Each of the above optimally unifies to the same result:

(25) IG(Mary) = α
     IG(sings) = α\s
     IG(loves) = (α\s)/α
     IG(John) = α
     IG(a) = α/β
     IG(man) = β
     IG(sees) = (α\s)/α

Moving from NL to the modally enriched systems, we do not have any similar results to report. Owing to search space size and software development obstacles, the enriched type-logical learning algorithm is purely theoretical for the moment, but at least it has been proven to terminate. It has elsewhere been proven that the basic solution it seeks is a learnable one in Gold's sense (Fulop, 2002). We also know exactly


what its output will be: a set of optimally unified, modally enriched lexicons that are guaranteed to generate the term-labeled strings in the learning sample, and beyond. We hope that further development of these algorithms will pave the way to automatic learning of the most scientifically justified and linguistically desirable grammars for natural languages seen to date.
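The collapse from the general form lexicons of (24) to the learned lexicon (25) can be replayed with a small sketch (Python; the substitution σ is written out by hand rather than computed, and plain string replacement stands in for proper substitution over type terms, which suffices here because no variable name is a prefix of another):

```python
# The first general form lexicon of (24), with categories as plain strings
lexicon = [
    ("John", "α1"), ("John", "α2"), ("John", "α9"),
    ("sings", "α1\\s"),
    ("loves", "(α2\\s)/α3"), ("loves", "(α4\\s)/α5"),
    ("Mary", "α3"), ("Mary", "α4"),
    ("a", "α5/α6"), ("a", "α7/α8"),
    ("man", "α6"), ("man", "α8"),
    ("sees", "(α7\\s)/α9"),
]

# One optimal unifier of this lexicon, written out by hand:
# the noun-phrase variables collapse to α and the noun variables to β
sigma = {"α1": "α", "α2": "α", "α3": "α", "α4": "α", "α5": "α",
         "α7": "α", "α9": "α", "α6": "β", "α8": "β"}

def apply_sub(cat, sigma):
    """Apply the substitution textually to a category string."""
    for v, c in sigma.items():
        cat = cat.replace(v, c)
    return cat

# Applying sigma and removing duplicate entries yields lexicon (25)
unified = sorted({(w, apply_sub(c, sigma)) for w, c in lexicon})
for w, c in unified:
    print(w, "=", c)
```

The thirteen general form entries collapse to the seven entries of (25), with both occurrences of loves, and loves and sees alike, receiving the single transitive category (α\s)/α.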

6. Concluding remarks

A damaging volley that has been directed at past ideas of semantic bootstrapping simply points out that a naive implementation demands too much specific information to be acceptable a priori. In other words, the demands of a priori plausibility seem to negate the availability of the necessary bootstraps. Pinker's (1984) Canonical Structure Realization, for instance, demands that a syntactic learning system have ready access to previously learned knowledge of the specific semantic categories into which each word falls. This places a considerable burden on the universal component of linguistic theory (so-called Universal Grammar), since the learner must then know what the categories are in advance of learning the grammar.

It is our position that the learner should instead be charged with learning those categories, and which words fall into them, as a byproduct of learning the grammar. Our paradigm doesn't require the learner to have previously learned meanings in the conceptual sense, nor must the learner know anything about how lexical items are categorized semantically or syntactically. The system operates at a higher level of abstraction, if you will, and learns a set of syntactic categories and their lexical assignment from sentences that are provided with schemata showing the shape, though not necessarily the exact nature, of their semantic composition.

Can semantic composition recipes themselves be learned? The work of Siskind (1991) on semantic learning has explored implementations which successfully learn semantic representations from diagrammatic presentations of events, to cite one relevant example. Learning from term-labeled strings, in which no commitment is made about precise semantic categories or even about whether certain sets of words fall into the same categories, permits the induction of an adequate type-logical lexicon with smaller semantic bootstraps.
There is a kind of canonical structure realization here, but it provides only the possible forms that a syntactic type can take when the form of a semantic type is provided. This is embodied in the generalized Curry-Howard correspondence, and in particular the syntactic-semantic type correspondence that we have specified.

It seems clear that the null hypothesis about Universal Grammar should simply be that it includes as little specific information as possible. Not everyone agrees with this assertion at first, because it seems to take a position on the nature/nurture controversy, but this objection arises from a misapplication of the notion of "null hypothesis." The null hypothesis is not required to be a priori the most reasonable (a condition plagued by subjective disagreements); it is simply the one that assumes the least. We therefore do not assume that UG somehow provides a fixed system of syntactic or semantic categories, and it is better to avoid such assumptions because no specific content for UG has ever been proven necessary. Such systems of categories have been suggested over the years as a result of careful analysis by linguists, and they are traditionally based on facts of intersubstitutability anyway. We have thus shifted from UG to the learner the burden of providing an appropriate taxonomy of the lexicon based on intersubstitutability, and are pursuing in further research a fuller account of the automatically generated "linguistic theory" that will result.

References

Adriaans, P.: 1992, 'Language Learning from a Categorial Perspective'. Academische proefschrift, Universiteit van Amsterdam.
Adriaans, P. and E. de Haas: 2000, 'Grammar Induction as Substructural Logic Programming'. In (Cussens and Džeroski, 2000), pp. 127–142.
Ajdukiewicz, K.: 1935, 'Die syntaktische Konnexität'. Studia Philosophica 1, 1–27.
Andrews, P.: 1986, An Introduction to Mathematical Logic and Type Theory: To Truth through Proof. Academic Press.
Bergström, D.: 1995, 'Generalising Categorial Grammar Discovery'. Manuscript, Barcelona.
Bonato, R. and C. Retoré: 2001, 'Learning Rigid Lambek Grammars and Minimalist Grammars from Structured Sentences'. In: L. Popelinsky and M. Nepil (eds.): Proceedings of the Third Learning Language in Logic (LLL) Workshop. FI MU Brno, Czech Republic. Technical report FIMU-RS-2001-08.
Buszkowski, W.: 1987, 'Discovery Procedures for Categorial Grammars'. In: E. Klein and J. van Benthem (eds.): Categories, Polymorphism, and Unification. Universiteit van Amsterdam and University of Edinburgh, pp. 36–64.
Buszkowski, W. and G. Penn: 1990, 'Categorial grammars determined from linguistic data by unification'. Studia Logica 49, 431–454.
Carpenter, B.: 1999, 'The Turing-completeness of multimodal categorial grammars'. In: J. Gerbrandy, M. Marx, M. de Rijke, and Y. Venema (eds.): JFAK: Essays dedicated to Johan van Benthem on the occasion of his 50th birthday. Institute for Logic, Language, and Computation, University of Amsterdam. Available on CD-ROM at http://turing.wins.uva.nl.
Cussens, J. and S. Džeroski (eds.): 2000, Learning Language in Logic, No. 1925 in Lecture Notes in Artificial Intelligence. Berlin: Springer.
Dudau-Sofronie, D., I. Tellier, and M. Tommasi: 2001, 'From Logic to Grammars via Types'. In: Proceedings of the 3rd Learning Language in Logic Workshop. Strasbourg, France, pp. 35–46.
Dunn, J. M.: 1993, 'Partial Gaggles Applied to Logics with Restricted Structural Rules'. In: K. Došen and P. Schroeder-Heister (eds.): Substructural Logics. Oxford University Press, pp. 63–108.
Fulop, S. A.: 2001, 'On the Logic and Learning of Language'. Manuscript, University of Chicago.
Fulop, S. A.: 2002, 'Learnability of Type-logical Grammars'. In: Proceedings of Formal Grammars / Mathematics of Language, Vol. 53 of Electronic Notes in Theoretical Computer Science. Elsevier.
Gentzen, G.: 1934, 'Untersuchungen über das logische Schliessen'. Math. Zeitschrift 39, 176–210, 405–431. English translation in (Szabo, 1969).
Gold, E. M.: 1967, 'Language identification in the limit'. Information and Control 10, 447–474.
Hindley, J. R.: 1997, Basic Simple Type Theory. Cambridge University Press.
Howard, W. A.: 1980, 'The formulas-as-types notion of construction'. In: J. P. Seldin and J. R. Hindley (eds.): To H. B. Curry: Essays on Combinatory Logic, Lambda Calculus and Formalism. New York: Academic Press, pp. 479–490.
Jäger, G.: 2002a, 'On the Generative Capacity of Multi-modal Categorial Grammars'. Journal of Language and Computation. To appear.
Jäger, G.: 2002b, 'Residuation, Structural Rules and Context Freeness'. Manuscript, University of Potsdam.
Kanazawa, M.: 1998, Learnable Classes of Categorial Grammars, Studies in Logic, Language and Information. CSLI Publications and the European Association for Logic, Language and Information.
Keenan, E. L. and L. M. Faltz: 1985, Boolean Semantics for Natural Language. Kluwer.
Kraak, E.: 1998, 'A deductive account of French object clitics'. In: E. Hinrichs, A. Kathol, and T. Nakazawa (eds.): Complex Predicates. Academic Press.
Moortgat, M.: 1997, 'Categorial Type Logics'. In: J. van Benthem and A. ter Meulen (eds.): Handbook of Logic and Language. Elsevier.
Moortgat, M.: 1999, 'Meaningful patterns'. In: J. Gerbrandy, M. Marx, M. de Rijke, and Y. Venema (eds.): JFAK: Essays dedicated to Johan van Benthem on the occasion of his 50th birthday. Institute for Logic, Language, and Computation, University of Amsterdam. Available on CD-ROM at http://turing.wins.uva.nl.
Morrill, G. V.: 1994, Type Logical Grammar: Categorial Logic of Signs. Dordrecht: Kluwer.
Osborne, M. and T. Briscoe: 1998, 'Learning Stochastic Categorial Grammars'. In: T. M. Ellison (ed.): Proceedings of CoNLL. Somerset, NJ: ACL.
Pinker, S.: 1984, Language Learnability and Language Development. Cambridge, MA: Harvard University Press.
Siskind, J.: 1991, 'Dispelling Myths about Language Bootstrapping'. Manuscript, MIT AI Laboratory.
Steedman, M.: 1997, Surface Structure and Interpretation. Cambridge, Mass.: The MIT Press.
Szabo, M.: 1969, The Collected Papers of Gerhard Gentzen. Amsterdam: North-Holland.
Tellier, I.: 1999, 'Towards a Semantic-based Theory of Language Learning'. In: Proceedings of the 12th Amsterdam Colloquium. pp. 217–222.
van Benthem, J.: 1988, 'The semantics of variety in categorial grammar'. In: W. Buszkowski, W. Marciszewski, and J. van Benthem (eds.): Categorial Grammar. Amsterdam: John Benjamins, pp. 37–55.
van Benthem, J.: 1991, Language in Action. Amsterdam: North-Holland.
Wansing, H.: 1992, 'Formulas-as-types for a hierarchy of sublogics of intuitionist propositional logic'. In: D. Pearce and H. Wansing (eds.): Non-classical Logics and Information Processing, Vol. 619 of Lecture Notes in Artificial Intelligence. Berlin: Springer-Verlag, pp. 125–145.
Watkinson, S. and S. Manandhar: 2000, 'Unsupervised Lexical Learning with Categorial Grammars Using the LLL Corpus'. In (Cussens and Džeroski, 2000), pp. 218–233.

Address for Offprints: Dept. of Linguistics, The University of Chicago, 1010 E. 59th Street, Chicago, IL 60637
