Electronic Notes in Theoretical Computer Science 53 (2001)
URL: http://www.elsevier.nl/locate/entcs/volume53.html
Tree Unification Grammar
Problems and Proposals for Topology, TAG, and German
Kim Gerdes¹
Lattice, UFRL, Université Paris 7
Abstract

This work presents a lexicalized grammar formalism which can be seen as a variant of multi-component tree adjoining grammar (TAG). This formalism is well-suited for describing the syntax of German because it relates a syntactic dependency graph with a hierarchy of topological domains. The topological phrase structure encodes the placement of verbal and nominal elements in the (ordered) field structure, just as in the classical topological analysis of the German sentence. This module constitutes an intermediate step between the semantic and the prosodic representations of the sentence and allows deriving cases of scrambling that are problematic for classical TAGs and for some phrase structure based formalisms.
Die deutsche Wortfolge ist nicht „frei“, sondern denkbedingt.²
1 Introduction

This paper proposes yet another lexicalized tree grammar formalism in the TAG family with the purpose of capturing German word order phenomena, and yet another linearization system for dependency grammars based on the topological model of the German sentence. What sets this work apart from previous work is that the proposed formalism, Tree Unification Grammars, accomplishes both these tasks at the same time. Moreover, we see the purpose of a lexicalized tree grammar as serving as a syntactic correspondence module inside the Meaning-Text Theory (MTT, Mel’čuk 1987).

¹ Email: [email protected]. This paper has benefited greatly from discussions with Sylvain Kahane, Igor Mel’čuk, Hi-Yon Yoo, and Patrice Lopez. I assume the customary responsibility for content and shortcomings.
² The German word order is not ‘free’, but thought-conditioned. Drach 1937, page 26.
© 2001 Published by Elsevier Science B. V.
GERDES
In MTT, language is described as a (reversible) process of generating text (the actual spoken or written language) from meaning, which is thought of as a conceptual structure of what to say. On its way to becoming text, specific correspondence modules transform the meaning into different intermediate representations (see Figure 1). In the present work, we use a slightly revised version of MTT following Gerdes and Kahane 2001a: Between the phonological and the syntactic representation, we stipulate the existence of a hierarchy of word domains, called topological phrase structure, to be introduced below, replacing the ‘morphological level’ of usual MTT.

[Figure 1: Structures in MTT. Levels shown: possibly a conceptual graph; semantic graph (encodes theta roles and the communicative hierarchy of theme/rheme); possibly other intermediate levels like deep syntax; surface syntax tree (encodes subcategorization and passes through word groups); proposed module; topological phrase structure tree (encodes the hierarchy of word groups); surface string; phonological representation.]

The line of argument of this paper goes as follows: Two sections intend to justify why an adequate lexicalized tree grammar for German has to go far off the track of usual Tree Adjoining Grammars: In section 2, I briefly recall why a phrase structure view of German will not give satisfying insights into the functioning of the language. In section 4, I summarize the efforts that have been made to adapt TAG to German and vice versa. Section 3 gives a short introduction to an alternative view of phrase structure based on the classical topological model of German sentence structure. This phrase structure does not itself carry any syntactic or semantic information, but it can easily be linked to a phonological analysis on one side and to an (unordered) syntactic dependency structure on the other side. In section 5, I then present some phenomena of German constituent formation which are difficult to handle at the syntactic level. These constituents will be shown to be of a different nature than syntactic constituents, and should be controlled on a semantic level, i.e. by the module in the MTT framework that links the semantic with the syntactic representation. We end up with a ‘lightened’ task for the syntactic module. This task will finally be shown to be accomplished by Tree Unification Grammars, to be defined and illustrated in sections 6 and 7.

The appearance of a topological structure in an MTT grammar, where phonology and syntax have to be joined, seems surprisingly natural: Erich Drach, the father of German sentence topology, who coined today’s usual field names, also wrote the following lines in 1937:

Um Aufschluß zu geben, „wie eine gesprochene oder niedergeschriebene Äußerung als zutreffender Ausdruck des gemeinten Bedeutungserlebnisses zustande kommt,“ muß „von der Sprechdenk-Funktion, dem Schöpfungsakt des Satzes in der Seele des Sprechenden, ausgegangen werden: von der Beobachtung des Sprechens als Persönlichkeitsleistung und als sozialen Handelns.“³

A clear forerunner of the application of a topological structure in an MTT framework.
2 Phrase structure and its shortcomings

Classical phrase structure tries to collapse syntactic and ordering information. This conception of the syntax of language is erroneous because it assumes that word order is always an immediate reflection of the syntactic hierarchy and that any deviation from this constitutes a problem, denoted by terms like scrambling⁴.

Modern linguistic frameworks propose a double structure consisting at least of the basic syntactic structure (valency, functor-argument structure, f-structure, deep structure – we will use the term syntactic dependency) and a linear structure (surface structure, c-structure, phrase structure, precedence rules, etc.); some frameworks like LFG put this duality at the basis of their system, others, like HPSG, unify the different structures (e.g. SYNSEM and DTRS) in one sign.

While the syntactic dependency gives rise to little controversy, the different phrase structures proposed for linearization constitute the bone of syntactic contention. The different approaches appear to be caught in the transformational thinking which assumes two closely related structures: An already-ordered deep structure holds functional information and transforms, via movements, into a surface structure which then still carries some functional information hidden in the nodes that have not gotten hold of any of the itinerant elements. Accordingly, a good surface phrase structure is one that carries as much functional information as possible, since it is then easy to obtain the surface structure by transformation from the deep syntax tree. For English, this attempt can go quite far; for languages with case, however, word order serves mainly to represent communicative goals other than functional recovery.

The attempts at a transformational grammar of German resulted in a great variety of surface phrase structures: Evers 1975’s analysis puts all NPs on the highest possible node after the transformation from a deep structure. Müller 1999, on the contrary, advocates binary and right-combed phrase structure trees for his HPSG grammar. They all end up with surface phrase structures whose subtrees do not correspond to linguistic (functional, prosodic, semantic…) objects of their own; their justification relies on their transformational proximity to the deep structures.

³ “For an explanation of how a spoken or written utterance becomes a correct expression of the meant significant experience, we have to hypothesize on the speech and thought function, on the act of creation of the sentence in the soul of the speaker: on the observation of the act of speaking as a personal achievement and as a social action.” (Drach 1937, p. 7)
⁴ First used by Ross (1967).
3 German sentence topology

The classical analysis of German sentence structure (Drach 1937, Bech 1955) divides the sentence into a fixed sequence of fields, in which the syntactic elements are placed. We denote by domain a sequence of fields. The main domain of a declarative sentence consists of Vorfeld (VF), left bracket (‘[’), Mittelfeld (MF), right bracket (‘]’) and Nachfeld (NF). We call the fields VF, MF, and NF major fields.

The idea is that words do not position themselves in relation to each other, but that they appear in the fields, which are present in every utterance. The field structure controls the possible orders by constraining the number of elements that they must hold.

Kathol 1995 proposes a formalization of the topological structure in HPSG, refining work of Reape 1994. He shows that this structure is independent of phrase structure and essential for linearizing German. However, being based on the HPSG framework, he still needs to keep phrase structure for combining signs. In a sense, he keeps three levels of description: the domain structure (DOM), giving the linearization; the phrase structure tree (DTRS), representing how the structure has been built; and the dependency graph (encoded under SYNSEM), corresponding to the subcategorization.

Recent works in dependency grammar have tried to link the dependency⁵ structure directly to the placement of the words in different fields, skipping the constituent structure underlying formalisms like HPSG. See Bröker 1998 for a lexicalized description of very simple phenomena based on modal logic, Duchier and Debusmann 2001 for a description in constraint programming, and Gerdes and Kahane 2001a for a description of a topological hierarchy seen as a syntactic module of Meaning-Text Theory. The development of Tree Unification Grammars is an attempt to lexicalize and fine-tune this latter approach.

⁵ The word order problem of many dependency analyses is somehow orthogonal to the problems of phrase structure grammars stated above: It is difficult to encode the linear ordering of the words into the dependency graph, since purely local rules (on the ordering of the dependents of a lexeme) are evidently not sufficient. See Lombardo and Lesmo 2000 for a short summary of the different attempts to define the ‘right’ degree of projectivity.

[Figure 2: Correspondence syntax – topology. Dependency tree: hat ‘has’ with subject niemand ‘nobody’ and pp versprochen ‘promised’; versprochen with indirobj dem Lehrer ‘to the teacher’ and zu-inf zu lesen ‘to read’; zu lesen with obj den Roman ‘the novel’. Topology: Vorfeld ‘dem Lehrer zu lesen versprochen’, left bracket ‘hat’, Mittelfeld ‘den Roman niemand’.]

Let us now turn to the ideas underlying the topological phrase structure: The algorithm of Gerdes and Kahane 2001a takes as input an unordered surface syntactic dependency tree⁶ with a markup of groupings of the words of the tree: The lexical elements are all part of a second hierarchy, indicating which elements will have to form a topological constituent. The meaning of these constituents and the underlying restrictions on their formation will be discussed in section 5. Out of this marked-up tree, we construct an ordered hierarchy of topological domains, the topological phrase structure, thus linearizing the words of the dependency tree.

As an illustration, consider Figure 2: The linearization is done by placing the elements of the syntactic dependency tree into the main domain of the declarative sentence. We start from the root of the tree and place the finite verb in the left bracket. Its subject could go into one of the major fields, and it goes, for instance, into the Mittelfeld.

Essential to this analysis is that a verbal complement can be placed in two ways into the topological structure: a verbal complement always goes into the right bracket of a domain, but this domain can either be the existing domain of its verbal governor or, if the verb heads a grouping on its own, a new embedded domain that it creates in a major field of its governor’s domain or of a higher domain containing its governor. An embedded domain of non-finite verbs consists only of a Mittelfeld, a right bracket, and a Nachfeld. If the verb creates such a new domain, the domain as a whole behaves like a non-verbal complement of the verb. In the example, versprochen ‘promised’ heads a grouping and therefore opens a new embedded domain. This new domain, as it is headed by a past participle, can only go into the Vorfeld – a zu-infinitive could join any major field.
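The placement options for a verbal complement described above can be sketched as a small function. This is a minimal illustration under stated assumptions, not the algorithm of Gerdes and Kahane 2001a: the names `MAIN_DOMAIN`, `EMBEDDED_DOMAIN` and `verbal_complement_options` are hypothetical, only the governor's own domain is considered (placement in higher domains is left out), and the restriction of a past-participle domain to the Vorfeld simply follows the statement made for the example above.

```python
# Hypothetical encoding of the field sequences described in the text.
MAIN_DOMAIN = ("VF", "[", "MF", "]", "NF")   # declarative main domain
EMBEDDED_DOMAIN = ("MF", "]", "NF")          # domain of a non-finite verb
MAJOR_FIELDS = ("VF", "MF", "NF")


def verbal_complement_options(heads_grouping, form, governor_fields):
    """List the placements open to a verbal complement.

    heads_grouping  -- True if the verb heads a grouping of its own
    form            -- "pp" (past participle) or "zu-inf" (zu-infinitive)
    governor_fields -- the fields of the governor's domain
    Higher domains containing the governor are ignored in this sketch.
    """
    if not heads_grouping:
        # Part of its governor's grouping: it joins the right bracket
        # of the governor's domain.
        return [("governor domain", "]")]
    # Heads a grouping: it opens a new embedded domain in a major field.
    fields = ("VF",) if form == "pp" else MAJOR_FIELDS
    return [("new embedded domain", f) for f in fields if f in governor_fields]


# versprochen heads a grouping under the finite auxiliary hat:
print(verbal_complement_options(True, "pp", MAIN_DOMAIN))
# → [('new embedded domain', 'VF')]
# zu lesen belongs to the grouping of versprochen:
print(verbal_complement_options(False, "zu-inf", EMBEDDED_DOMAIN))
# → [('governor domain', ']')]
```

Note that an embedded domain offers no Vorfeld, so a zu-infinitive heading its own grouping inside such a domain can only open its domain in the Mittelfeld or the Nachfeld in this sketch.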
⁶ This approach uses a very ‘surfacy’ version of dependency. Since subject placement in German is identical for auxiliaries, raising and control verbs, we only encode actual syntactic sub-categorization: The controlled verb zu lesen does not govern its deep subject niemand ‘nobody’, and the subject belongs to the auxiliary or the past participle. See Figure 2.
The non-verbal dependent of the past participle, dem Lehrer ‘to the teacher’, is a part of the grouping of its governor, and consequently it has to stay in its governor’s domain. Of the two major fields in the embedded domain, it chooses the Mittelfeld⁷.

The next verbal dependent, the infinitive, could again create a new domain in one of the major fields of its governor’s domain or of a domain containing its governor. The other choice is to join the right bracket of its governor’s domain. In the example, the creation of a new domain is not possible because the infinitive is part of its governor’s grouping. The infinitive has to stay in its governor’s domain, and in this case, it must go directly to the left of its governor in the right bracket.

Now it only remains to place the last complement, “the novel”. It can again go into one of the major fields of its governor’s domain, or of a domain containing its governor. Since the grouping cuts it out of its governor’s domain, it finds itself naturally in the higher domain, next to niemand in the Mittelfeld. All elements of the dependency tree have been positioned in the topological structure and the derivation is completed.

So we have linearized the (unordered) nodes of the dependency tree into sentence (1), a variation of Rambow 1994’s main example⁸.

(1) Dem Lehrer zu lesen versprochen hat den Roman niemand.
    The teacher to read promised has the novel nobody.
    Nobody has promised the teacher to read the novel.
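The result of this derivation can be represented as a nested structure of domains and fields whose left-to-right reading yields sentence (1). The sketch below is a hypothetical encoding (the list-of-pairs representation and the function `linearize` are assumptions, not part of the formalism); it only reads off a given topology and does not choose the placements, and the order inside the Mittelfeld is simply stipulated.

```python
def linearize(domain):
    """Read off the words of a domain, field by field, left to right.

    A domain is a list of (field, contents) pairs; contents are words
    (strings) or nested embedded domains (lists of pairs).
    """
    words = []
    for _field, contents in domain:
        for item in contents:
            if isinstance(item, list):      # an embedded domain
                words.extend(linearize(item))
            else:
                words.append(item)
    return words


# Embedded domain opened by 'versprochen': Mittelfeld, right bracket, Nachfeld.
embedded = [
    ("MF", ["dem Lehrer"]),
    ("]", ["zu lesen", "versprochen"]),  # infinitive directly left of its governor
    ("NF", []),
]
# Main domain of the declarative sentence.
main = [
    ("VF", [embedded]),
    ("[", ["hat"]),
    ("MF", ["den Roman", "niemand"]),   # order inside the MF is stipulated here
    ("]", []),
    ("NF", []),
]

print(" ".join(linearize(main)))
# → dem Lehrer zu lesen versprochen hat den Roman niemand
```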
The other possible surface orders of German can be obtained with other groupings⁹. Consider also the other examples in Figure 3: In case A, the sentence is not further subdivided into groups, and the constructed topology consequently has no embedded domains. In case B, the infinitive and its complement form a group, and the corresponding domain could occupy any major field. In this example the domain goes into the Vorfeld, but starting from the same structure, we could also generate the surface string as in case C. This string is identical to the string in case A, i.e. a sentence can be topologically ambiguous. We have shown in Gerdes and Kahane 2001b that this distinction is not a spurious ambiguity, but corresponds to different prosodic patterns of the same string, and thus to different linguistic structures. The pure dependency tree without the group mark-up does not capture this difference¹⁰.
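To illustrate that one marked-up dependency tree can give rise to several topologies, the sketch below encodes cases B and C of Figure 3 as far as they can be read from the figure, with the group ‘den Roman zu lesen’ forming an embedded domain in the Vorfeld and in the Nachfeld respectively. The encoding and the helper names are hypothetical; the comparison only shows that both topologies contain the same words in different orders.

```python
def linearize(domain):
    """Read off the words of a domain, field by field, left to right."""
    words = []
    for _field, contents in domain:
        for item in contents:
            if isinstance(item, list):   # a nested (embedded) domain
                words.extend(linearize(item))
            else:
                words.append(item)
    return words


def infinitive_domain():
    """Embedded domain for the group 'den Roman zu lesen'."""
    return [("MF", ["den Roman"]), ("]", ["zu lesen"]), ("NF", [])]


# Case B: the embedded domain occupies the Vorfeld.
case_b = [("VF", [infinitive_domain()]), ("[", ["hat"]),
          ("MF", ["dem Lehrer", "niemand"]), ("]", ["versprochen"]), ("NF", [])]
# Case C: same grouping, embedded domain in the Nachfeld.
case_c = [("VF", ["niemand"]), ("[", ["hat"]),
          ("MF", ["dem Lehrer"]), ("]", ["versprochen"]),
          ("NF", [infinitive_domain()])]

for name, topology in (("B", case_b), ("C", case_c)):
    print(name, " ".join(linearize(topology)))
# B den Roman zu lesen hat dem Lehrer niemand versprochen
# C niemand hat dem Lehrer versprochen den Roman zu lesen

# Same words, different orders:
print(sorted(linearize(case_b)) == sorted(linearize(case_c)))  # → True
```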
⁷ NPs in the Nachfeld have a heaviness constraint, not discussed here. See for example Müller 1999, section 13.1.1.3.
⁸ Reading the novel is a better example than repairing the fridge, because we avoid confusion with the benefactive dative (ihm das Fahrrad reparieren vs. *ihm den Roman lesen).
⁹ The constraints on what can form a group are discussed in section 5.
¹⁰ Topological structure A requires a very specific discourse context and is therefore more difficult to obtain than the three other examples. See Gerdes and Kahane 2001b for details.
[Figure 3: Syntactic dependency tree with group hierarchy and their corresponding topological phrase structure trees]
The rules presented above constitute the backbone of a description of dependency linearization in a topological model. For details and finer-grained rules we refer to Gerdes and Kahane 2001a. I would just like to draw the reader’s attention to the fact that I do not treat the structure of noun phrases in this work. In the syntactic dependency trees, NPs are represented without inner structure, evoking Tesnière’s nuclei (Tesnière 1959), i.e. unstructured clusters of meaning. Of course, in the light of a topological analysis of German, it would be reasonable to explore the possibilities of analysing the NP as a specific kind of domain, from which extraction is possible under certain conditions, just as from verbal domains.
4 TAGs and their shortcomings

A lexicalized TAG is a simple mathematical language model with nice computational properties: A lexical entry consists of elementary trees that combine with other trees by very simple rules to form the final phrase structure of the analyzed sentence. Noting down the steps taken yields a derivation tree, interpretable as a semantic dependency structure consisting of the lexical units. A complete analysis consists of the string, the attached derived tree, and the derivation tree. Becker et alii 1991 called the capacity to obtain these correct objects weak, strong, and derivational generative power¹¹, respectively.

Different approaches have tried to construct a semantic structure during the TAG derivation: Synchronous TAGs construct a semantic tree in parallel to the usual derivation tree, which considerably raises the computational complexity of the formalism (Shieber & Schabes 1990). Joshi and Kallmeyer 1999 give TAG a restricted multi-component make-up, designed for scope interpretability of the derivation tree.

This latter approach presumes the goal of TAG to be a direct link between the surface string and a semantic structure. Equally, in the two important existing TAG grammars, XTAG (XTAG group, 1995) and FTAG (Abeillé 1991), the derivation trees are supposed to encode dependencies of a more profound level than simple syntactic sub-categorization; e.g. raising verbs, not carrying their subject in their elementary tree, are adjoined into the infinitives, resulting in a derived structure where the finite raising verb is not linked to the subject it agrees with, i.e. the raising verb is given the role of a pure modifier of the “main” verb. However, this “semantic ambition”¹² of the derived tree falls far short of perfection when, for example, adjectives adjoin to one another and not to the noun (Schabes and Shieber 1994), or when the derived tree of control verb constructions does not encode the “controlled” link between subject and infinitive (Candito and Kahane 1998a). It seems much more reasonable to limit ourselves right from the start to a surface syntax dependency encoding exclusively surface syntactic relations. However, the writers of the LTAG grammars did not have a choice: For example, the only way to cover long-distance relationships in the (single-component) TAG formalism is the adjunction of the matrix verbs, resulting in derived structures with mixed (semantic and syntactic) information content.

[Figure 4: Structures in the standard TAG analysis: semantic tree (obtained with synchronous TAGs); derivation tree (semantic and deep syntax interpretation); deep syntax tree (implicitly defined with the derived tree); derived tree (semantic, syntactic, or prosodic interpretation); surface string]

In this correspondence from the surface string to the (semantic) derivation tree, the role of the derived tree remains theoretically and computationally unclear: It attempts to resemble GB’s surface syntactic tree, as some empty nodes are marked with epsilons, but the deep syntactic tree whose elements have been moved is never calculated. These trees inherit the handicap of GB’s surface syntactic trees: Some nodes are simply computational necessities, others tend to represent semantic, syntactic, or prosodic units.

¹¹ This notation disregards the fact that the derivation tree is part of the analysis that TAG attaches to a sentence, and it should simply be considered as a part of the strong generative power.
¹² It is at least a ‘deep-syntax ambition’, depending on where we place verbal valency questions.
For example, GB’s surface syntactic trees need intermediate landing sites for the moving objects. TAG’s derived tree has even more nodes whose subtrees do not correspond to syntactic entities: Each adjunction adds an additional node level into the derived structure, and even elementary trees cannot be flat, because sister adjunction is not available¹³. The nodes allow controlling adjunction between elements, and are vital in TAGs mainly for expressing the linearization rules; they do not stem from linguistic observation¹⁴. The resulting mostly right-branching VP or NP structures are often justified with scope properties of adverbials or adjectives (see for example Schabes and Shieber 1994). In a sense, the raison d’être of the derived tree is that it allowed us to obtain a semantically interpretable derivation tree; its status is not the representation of a linguistic entity.

However, even this compulsory open-mindedness of the LTAG writers concerning the derived tree does not suffice: No matter which derived tree we take, as long as TAG’s strong cooccurrence constraint¹⁵ is supposed to hold, we cannot obtain the predicate-argument structure of a double matrix construction with a fronted inner argument in English (Rambow, Vijay-Shanker, Weir 1995), and for German, Becker et alii 1991, 1992 show that TAG cannot describe the ‘scrambling’ phenomena in a satisfying manner.

In spite of these drawbacks we should not give up right away the idea of a lexicalized tree grammar, more precisely a lexical grammar whose lexical entries can be combined in two manners in parallel: to form an ordered phrase structure and an unordered dependency tree. My objective is to use a lexicalized tree grammar as a module in an MTT approach, i.e. as the correspondence module between the topological and the (surface) syntactic structure of our linguistic representation. The existence of the different levels can be justified computationally by the simplicity of the two correspondence modules, one for obtaining a structure, the other for translating it into the following structure. However, the levels can also be validated intuitively and psycho-linguistically by the expressiveness of the structures and their (possible) well-formedness rules. So I suppose a specific phrase structure (a topological hierarchy, which I tried to justify in section 3) and a specific dependency (only simple sub-categorization structures for agreement), and I am looking for an algorithm that links the two structures compositionally (with corresponding substructures). We could call this capacity of a lexicalized grammar the descriptive strong generative power.

¹³ The same holds for DTGs (Rambow et alii 1995) and GAGs (Candito, Kahane 1998b).
¹⁴ Of course some linguist might really want to have these intermediate nodes in the phrase structure. All we can really say is that a linguistic description that does not have these nodes is not possible.
Since TAG cannot analyze German with any phrase structure and with a derived tree that encodes syntactic or semantic dependencies, it is clear a fortiori that TAGs (and their close relatives, which do not allow sister adjunction) lack the descriptive strong generative power for the topological model, i.e. the power to engender the desired topological structure and the surface syntax derivation tree in parallel. My goal is thus to define a lexicalized tree grammar with enough descriptive strong generative power for the relation between surface syntax dependency and topology.

We will see that this grammar should also remedy another flaw of TAG: Since elementary trees of standard TAG have ordered branches, we obtain a combinatorial explosion of trees indistinguishable from syntactic ambiguity, and thus a high information redundancy, in particular for freer word order languages like German¹⁶.
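The redundancy caused by ordered elementary trees can be made concrete with a back-of-the-envelope count. Under the simplifying assumption (ours, not a claim about any existing TAG grammar) that each left-to-right order of a verb's nominal dependents requires its own ordered elementary tree, a verb with n such dependents needs n! trees, all of which share one and the same unordered dependency structure. The function name `ordered_variants` is hypothetical.

```python
from itertools import permutations
from math import factorial


def ordered_variants(dependents):
    """One ordered elementary tree per left-to-right order of the
    dependents (assumption: every permutation is grammatical)."""
    return list(permutations(dependents))


deps = ("subject", "indirobj", "obj")   # e.g. three nominal dependents of a verb
variants = ordered_variants(deps)
print(len(variants))                          # → 6
# All variants encode the same unordered dependency structure:
print(len({frozenset(v) for v in variants}))  # → 1
# Growth with the number of dependents:
print([factorial(n) for n in range(1, 6)])    # → [1, 2, 6, 24, 120]
```

An unordered dependency tree, as used in the approach proposed here, needs only the single structure counted in the second line.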
¹⁵ A predicate contains in its elementary tree at least one node for each of its arguments.
¹⁶ One proposed solution, the metagrammar (Candito 1999), solves the practical aspects of grammar generation, but moves linguistic description out of the tree sets into the metagrammar, reducing the elementary trees to some algorithmic side product. See Gerdes 2002 for details.

5 Communicative groups and topological well-formedness – dividing the tasks among the modules

It is well known that the “freedom” of German (or any other case language’s) word order is only relative to a given sub-categorization when the context in which the sentence is uttered is disregarded. It is mostly agreed that the speaker chooses one order of elements over another, for example in the German Mittelfeld, to distinguish old information from new information, to distinguish what she talks about from what she says about it, and to distinguish what she finds important from less important information. We call these distinctions communicative structure; a more commonly used term is ‘information structure’¹⁷.

5.1. Data on VP Fronting

It is less clear what kind of rules govern the formation of so-called (partial) VPs. The problem stands out clearly when the VP takes the Vorfeld, because “das ins Vorfeld verlegte Satzglied – gleichviel, wie es grammatisch verwendet sei – kann beliebig untergliedert werden. Immer jedoch bleibt es ein Ganzes”¹⁸ (Drach 1937, page 21). The examples in (2), (3), and (4) have an NP joining the past (or passive) participle to form one constituent in the Vorfeld. The first question is: What kind of entity is the Vorfeld in sentences like (2) and (3)?

(2) a. Den Roman gelesen hat Peter bisher nicht.
       The novel(acc) read has Peter so-far not.
       So far, Peter has not read the novel.
    b. Ein berühmter Geiger geworden wäre er gerne.¹⁹
       A famous violinist(nom) become would he with-pleasure.
       He would have liked to become a famous violinist.

¹⁷ I prefer the term ‘communicative structure’, because I do not know what information is. Moreover, the complex NP ‘information structure’ is ambiguous between the intended reading ‘structure of the information’ = ‘structured information’ and the reading ‘structure that contains/gives information’ = ‘informative structure’. Interesting discussions of the terms in question can be found in Choi 1999 (section 3.2.2), Vallduví 1992 and Lambrecht 1994.
¹⁸ “The phrase that is moved to the Vorfeld – whatever its grammatical use may be – can be subdivided arbitrarily. It always remains an entity.”
¹⁹ The contrast between (2b) and (4a), both fronted constituents with nominative arguments, goes to show that the distinction between the terms subject and nominative argument is worthwhile.
(3) a. Ein Linguist angekommen ist (*sind) bisher nicht.²⁰
       A linguist(nom) arrived has so-far not.
       So far, no linguist has arrived.
    b. Solche schönen Geschenke gemacht wurden (*wurde) mir noch nie.
       Such nice presents(nom) offered were (*was) to-me so-far never.
       I have never gotten any gifts that beautiful before.
    c. Von Grammatikern angeführt werden auch Fälle mit dem Partizip intransitiver Verben.²¹
       By grammarians cited are also cases with the participle of intransitive verbs.
       Cases with the participle of intransitive verbs are also cited by grammarians.

(4) a. ?*Ein Linguist geschlafen hat bisher nicht.
       A linguist(nom) slept has so-far not.
    b. ?*Dieser Frau unterlaufen ist ein Fehler noch nie.
       To-this woman(dat) slipped-in is a mistake so-far never.
5.2. Prosodic and Communicative Interpretation of the Data
One first answer to the question of the nature of these groups is that they certainly are prosodic constituents: The words in the Vorfeld form a group of words that does not support a pause in its midst and that obtains as a whole a typical melodic curve, depending on the context in which the sentence is uttered: As an answer to (5a), (2a) is only possible with a falling contour on the Vorfeld. This context makes the Vorfeld the rheme of the sentence, and the falling contour is identified as the typical rhematic accent²². Equivalently, when we put the sentence in a context where den Roman gelesen is of thematic character (5b), the Vorfeld can either have a flat prosodic curve, usually associated with a non-prominent theme, or it can have a rising pitch accent on the last lexically stressed syllable, used in many languages for contrast and perseverance of a thematic element. We find identical data for the questions in (6) with (2b) as an answer.

(5) a. Was hat Peter noch nicht getan?
       What hasn’t Peter done yet?
    b. Hat Peter den Roman gelesen?
       Has Peter read the novel yet?

(6) a. Was wäre er gerne?
       What would he like to be?
    b. Wollte er ein berühmter Geiger werden?
       Did he want to become a famous violinist?
These data clearly indicate that the grouping of the elements in the Vorfeld not only has a specific prosodic appearance, but also that this grouping as a whole plays a specific communicative role. This is another answer to our question.

To sum up, we use an analysis based on two basic binary features, similar to Choi 1999’s point of view²³: theme/rheme and prominence/non-prominence. The communicative role of the fronted VP can be thematic or rhematic; if the constituent is thematic, it can be prominent or non-prominent²⁴; if it is rhematic, it has to be prominent in order to be placed in the Vorfeld²⁵.

²⁰ The example is from Haider 1985.
²¹ The examples are from Müller 1999.
²² See Gibbon 1998.

5.3. Pushing the communicative responsibility onto the semantic level
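As a point of reference for the discussion in this section, the combinations of the two binary features summarized at the end of section 5.2 can be enumerated mechanically. A minimal sketch: the labels T, Tp and Rp are the values used in this paper for non-prominent theme, prominent theme and prominent rheme; the label R for the excluded non-prominent rheme and the function name `vorfeld_values` are hypothetical.

```python
from itertools import product

# T, Tp and Rp are the values used in the text; "R" (non-prominent
# rheme) is a hypothetical label for the excluded combination.
LABELS = {("theme", False): "T", ("theme", True): "Tp",
          ("rheme", False): "R", ("rheme", True): "Rp"}


def vorfeld_values():
    """Feature combinations a fronted (Vorfeld) constituent can carry."""
    allowed = []
    for role, prominent in product(("theme", "rheme"), (False, True)):
        if role == "rheme" and not prominent:
            continue  # a rheme must be prominent to be placed in the Vorfeld
        allowed.append(LABELS[(role, prominent)])
    return allowed


print(vorfeld_values())  # → ['T', 'Tp', 'Rp']
```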
The next question is: What really is a communicative structure? It is clear that the prosodic group, the corresponding string, the corresponding domain, and the corresponding part of the syntactic tree can all be said to possess this communicative feature, but where does it really come from, where does it materialize? At which level, in an MTT view of language, is the communicative structure created? I will not be capable of giving a complete answer to the question, and I refer the interested reader to the book on communicative structure, Mel’čuk 2001. I just want to give some indication, important for the justification of what follows: Many treatises on prosody analyze the prosodic patterns on a word-string basis. They would say: “Den Roman gelesen carries the prosodic theme marking”, and this is in a sense correct, as the string is one realization of the underlying speech act.

The present analysis of the German sentence, however, relies heavily on the existence of these groups in the syntactic dependency tree. In an MTT analysis of language, we have to wonder at which level these structures are instantiated. For this we have to distinguish the fronted constituents in (2) and (3) from the ungrammatical structures of (4), and we have to ask: At which level should we best instantiate the communicative grouping in order to capture easily the existing restrictions on these groupings?

Generally, all constituents can enter the fronted constituent except for subjects as in (4a), which leads to the idea that German non-finite verbs form VPs. There are nevertheless exceptions to this rule: Some NPs with case marking other than nominative are equally difficult to group with the verb, as demonstrated in (4b), where the dative NP and its verbal head cannot form a communicative entity. This arises in cases where the argument plays a very agentive role. Ergative verbs (3a) and verbs in their passive voice (3b) seem to tolerate being hooked to the subject (and deep object). The difficulty for all phrase structure based approaches, like for example HPSG, is that linearization, agreement, and the construction of the predicate-argument structure are based on the phrase. For (3a,b), it has to be explained how and where the subject-verb agreement is done: Does the whole fronted VP carry the agreement value, or do ‘spirits’²⁶ carry the information into the fronted VP? Inversely, the optional PP in (3c) has to be assigned the agent’s θ-role of the verb (see Müller 2000).

On the whole, the semantic relation between predicate and noun appears to play a more important role in the restrictions on VP fronting than the nouns’ actual case marking, as Webelhuth 1985 already observed. Unsurprisingly, it seems that when the speaker decides on the communicative grouping of her speech act (theme/rheme, prominent/non-prominent), restrictions apply that rely on semantic information. (3c) shows that it does not suffice to simply block all communicative groupings of agent and predicate, but we only need one specific rule in the syntax-semantics interface for capturing the phenomenon:

(A) The absence of an agentive argument as well as the communicative grouping of the agentive argument with its predicate both trigger the passive construction.²⁷

All this leads to the conclusion that our language model should place the emergence of the communicative grouping at the semantic level of representation (or even higher), because at this level the restrictions are easy to capture. The semantic module provides the correspondence between this semantic structure and a surface syntactic dependency tree. We are not concerned here with the detailed description of this module; however, when declaring that the restrictions on VP fronting are a semantic problem, we are obliged to show that the burden we put on the module is not too heavy.

²³ Choi uses the terms topic and focus, ending up with non-prominent focus, which sounds to me like defocalized focus. I prefer theme and rheme, while using her binary feature prominence.
²⁴ This is similar to Vallduví 1992, who distinguishes topic and tail.
²⁵ A rough draft of the possible prosodic and communicative values of the Vorfeld constituent can already be found in Drach 1937: He notes that the Vorfeld can either be occupied by the expressive position (Ausdruckstelle) for “semantically non-empty words with a value of emotion or will”, or by “minor information or a connector with given information”. He does not yet explicitly state the possibility of filling the expressive position with given information, i.e. the prominent theme case.
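Rule (A) can be read as a filter on fronted groups in the syntax-semantics interface. The sketch below is a hypothetical reading: the class `FrontedGroup`, the boolean feature `argument_is_agentive` and the function `rule_a_accepts` are our assumptions, only the grouping half of rule (A) is modeled, and the agentivity values assigned to the examples (in particular treating the dative in (4b) as agentive, following the remark on very agentive arguments above) are illustrative judgments.

```python
from dataclasses import dataclass


@dataclass
class FrontedGroup:
    predicate: str
    argument: str
    argument_is_agentive: bool  # semantic property, not case marking
    passive: bool               # diathesis of the clause


def rule_a_accepts(group):
    """Hypothetical filter derived from rule (A): grouping an agentive
    argument with its predicate triggers the passive, so an active
    clause with such a fronted group is rejected."""
    return not (group.argument_is_agentive and not group.passive)


EXAMPLES = {
    "(2a)": FrontedGroup("gelesen", "den Roman", False, False),
    "(3a)": FrontedGroup("angekommen", "ein Linguist", False, False),
    "(3b)": FrontedGroup("gemacht", "solche schönen Geschenke", False, True),
    "(3c)": FrontedGroup("angeführt", "von Grammatikern", True, True),
    "(4a)": FrontedGroup("geschlafen", "ein Linguist", True, False),
    # The dative of (4b) is treated as agentive (an illustrative judgment).
    "(4b)": FrontedGroup("unterlaufen", "dieser Frau", True, False),
}

rejected = [name for name, g in EXAMPLES.items() if not rule_a_accepts(g)]
print(rejected)  # → ['(4a)', '(4b)']
```

The filter looks only at semantic features, never at case marking, which is the point made above about the limited role of the nouns' actual case.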
5.4. What’s left for the syntactic module?

In the (still unordered) surface dependency tree, agreement can be carried out independently of the subsequent actual surface order. Thus, our syntactic module, the linking of surface syntactic dependency and topological hierarchy, does not have to worry about the restrictions on the formation of embedded VPs. In the direction of synthesis, the module generates the possible word orders, which transform the given groupings into embedded domain structures, while the analysis reports the encountered grouping into the surface syntactic structure. It remains the duty of the semantic module to refuse the ungrammatical structures of the sentences in (4). This corresponds well to our intuition that the ungrammaticality of these sentences is of a different nature than, for example, the agreement clash in the ungrammatical variants of (3a) and (3b); it seems less clear that the sentences in (4) are really ungrammatical; rather, it seems difficult to guess what the speaker wants to say.

²⁶ Meurers 1999: Raising Spirits (and assigning them case).
²⁷ There are, of course, other triggers of the passive construction, like e.g. discourse continuity, that are of no concern in this paper. I am unsure, though, which communicative, semantic, or discursive feature distinguishes the two surface realizations of Figure 1.
Figure 5. Schematic representations corresponding to grammatical sentences. Left: active diathesis (Peter hat den Roman gelesen 'Peter has read the novel'). Right: passive diathesis (with wurde 'was' and von Peter 'by Peter'). Each side links a semantic graph (LESEN 'read' with agent PETER and undergoer ROMAN), a surface syntax tree, and a topological tree through the proposed module.
As the analysis of the sentences in (4) reaches the syntactic module, where agreement is checked successfully, we call sentences like those in (4) syntactically well-formed and semantically defective. Likewise, we were able to assign a topological structure to the ungrammatical versions of (3a) and (3b); the clash arises only when agreement is checked at the syntactic level. We call these sentences topologically well-formed and syntactically defective.
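These notions of well-formedness suggest a simple cascade: each level is examined only if the previous one succeeded, and the first level that fails names the defect. The sketch below assumes that each level can be reduced to a single check over a hand-built dictionary; the keys and function names are hypothetical, and the syntactic check is reduced to agreement alone.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interface: a check returns None if the analysis passes, else a reason.
Check = Callable[[dict], Optional[str]]

def classify(analysis: dict, checks: List[Tuple[str, Check]]) -> str:
    """Apply the levels in order of analysis; the first failing level names the defect.

    A sentence labeled e.g. 'semantically defective' has passed every earlier level,
    i.e. it is topologically and syntactically well-formed.
    """
    for level, check in checks:
        reason = check(analysis)
        if reason is not None:
            return f"{level} defective: {reason}"
    return "grammatical"

# Toy stand-ins for the three modules, reading assumed keys of a hand-built dict.
def topological(a: dict) -> Optional[str]:
    return None if a.get("topological_tree_connected") else "no connected topological tree"

def syntactic(a: dict) -> Optional[str]:
    return None if a.get("agreement_ok") else "agreement clash"

def semantic(a: dict) -> Optional[str]:
    return None if a.get("rule_a_ok") else "rule (A) violated"

CHECKS = [("topologically", topological), ("syntactically", syntactic), ("semantically", semantic)]

# A sentence like those in (4): structure and agreement fine, rule (A) fails.
sentence_4 = {"topological_tree_connected": True, "agreement_ok": True, "rule_a_ok": False}
print(classify(sentence_4, CHECKS))  # semantically defective: rule (A) violated
```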
5.5. Examples of correspondences
Accordingly, I advocate analyzing a grammatical German sentence as a compositional correspondence between at least three representations with specific well-formedness conditions. Figure 5 shows the simplified structures for two grammatical sentences given as written text. In our analysis, these sentences correspond to the same semantic representation, because our semantic representation is much simplified and not fine-grained enough to capture the choice of the diathesis. Likewise, our syntactic representation does not capture all word order variation inside one field, and two surface orders can correspond to the same syntactic representation. As written text contains no indication of the prosodic structure of the Vorfeld constituent, its communicative feature remains underspecified between the possible values a Vorfeld constituent can take: non-prominent theme (T), prominent theme (Tp), and prominent rheme (Rp). An input from the prosodic module (a speech analyzer) would specify the melodic pattern, and the value of the communicative feature could then be instantiated.²⁸

²⁸ I believe that the fields and domains in the topological hierarchy of the German sentence are reflections of prosodic groupings involved in the linearization process. The topological analysis is in a sense a compromise to capture word order with a prosodic tool. In fact, the Mittelfeld and the Nachfeld are certainly not prosodic units. We could revolutionize the topological model and stipulate, for example, the replacement of the Mittelfeld by finer-grained fields of precise communicative and prosodic value, corresponding to what can happen as a melodic scheme in the Mittelfeld: the field for the prominent theme, the field for Rp, the field for the theme, and the field for the rheme. However, for the moment we want to build an analyzer that analyzes strings and words without prosodic information, and the partition of the Mittelfeld would mainly lead to a great amount of ambiguity on the topological level. So we have to stick to what is observable in written text: the verbal brackets, and a heap of many different structures in between.
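The underspecified communicative feature of the Vorfeld constituent can be modeled as the set of still-possible values, which a prosodic input narrows by intersection. This is a hypothetical sketch: the paper does not specify the interface of the prosodic module, so the `prosodic_input` parameter and the function name `vorfeld_comm_feature` are assumptions.

```python
from typing import FrozenSet, Optional

# Values a Vorfeld constituent can take, as listed in the text:
# non-prominent theme (T), prominent theme (Tp), prominent rheme (Rp).
VORFELD_COMM_VALUES: FrozenSet[str] = frozenset({"T", "Tp", "Rp"})

def vorfeld_comm_feature(prosodic_input: Optional[FrozenSet[str]] = None) -> FrozenSet[str]:
    """Return the set of still-possible comm values for the Vorfeld constituent.

    Written text (prosodic_input is None) leaves the feature underspecified, i.e. the
    full set. A prosodic module is assumed (hypothetically) to deliver the values
    compatible with the observed melodic pattern; intersecting instantiates the feature.
    """
    if prosodic_input is None:
        return VORFELD_COMM_VALUES
    narrowed = VORFELD_COMM_VALUES & prosodic_input
    if not narrowed:
        raise ValueError("prosodic pattern incompatible with a Vorfeld constituent")
    return narrowed

print(sorted(vorfeld_comm_feature()))                   # ['Rp', 'T', 'Tp']
print(sorted(vorfeld_comm_feature(frozenset({"Rp"}))))  # ['Rp']
```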
Figure 6. Left: Schematic representation of a syntactically correct and semantically defective sentence (*Peter gelesen hat den Roman 'read Peter has the novel'). Right: Schematic representation of a grammatical sentence with passive diathesis (Von Peter gelesen wurde der Roman 'read by Peter was the novel'). Each side shows a semantic graph, a surface syntax tree, and a topological tree linked by the proposed module.
The left-hand analysis of Figure 6 shows a semantically defective sentence. We can construct a topological phrase structure, and we can transform this structure into a syntactic structure. The semantic module, however, fails, as it has to transfer a grouping of verb and subject into a grouping of predicate and agent, which rule (A) forbids in the active voice. This grouping of predicate and agent is possible with the passive construction, as shown in the derivation on the right-hand side.
Figure 7. Schematic representations corresponding to a syntactically defective sentence: agreement problem (*Den Roman gelesen haben Peter 'read the novel have Peter'). A surface syntax tree and a topological tree are built, but no semantic graph.

Figure 8. Schematic representations corresponding to a syntactically defective sentence: valency problem (*Den Roman gelesen hat des Mannes 'read the novel has of the man'). Only parts of a surface syntax tree are built, and no semantic graph.
Figures 7 and 8 show syntactically defective sentences. Nonetheless, they are topologically well-formed, for the agreement problem of Figure 7 does not prevent the syntactic module from producing a topological phrase structure and even from constructing a syntactic dependency tree out of it. It is the well-formedness condition of the syntactic level that checks subject-verb agreement. The case of Figure 8 is different: since the genitive NP is not a syntactic argument of the auxiliary, we cannot establish a syntactic relation between them. Either we say that in this case a syntactic structure cannot be created, or we conclude that the unconnected parts are a syntactic structure that does not fulfill the well-formedness condition of connectedness.²⁹

Figure 9 shows a topologically defective sentence. Den Roman and Peter cannot create a new domain that could offer a landing site for both of them in the Vorfeld, and a connected topological tree cannot be constructed. At this point, we have seen a sufficient number of illustrations of the syntactic module at work; the next step is the formalization of the corresponding algorithm.
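The two syntactic well-formedness conditions at stake in Figures 7 and 8, subject-verb agreement and connectedness of the dependency structure, can be sketched as checks on a toy dependency structure. The encoding (one node per word, `(governor, relation, dependent)` triples, feature dictionaries with `nb` and `pers`) and the class name `DepStructure` are assumptions for illustration; the relation labels subject, pp, and obj follow the figures.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class DepStructure:
    """Toy surface-syntactic structure (hypothetical encoding, one node per word)."""
    features: Dict[str, Dict[str, object]]  # word -> features such as {"nb": "sg", "pers": 3}
    edges: List[Tuple[str, str, str]] = field(default_factory=list)  # (governor, relation, dependent)

    def is_connected_tree(self) -> bool:
        """Connectedness: a single root from which every word is reached exactly once."""
        dependents = {dep for _, _, dep in self.edges}
        roots = [w for w in self.features if w not in dependents]
        if len(roots) != 1:
            return False
        children: Dict[str, List[str]] = {}
        for gov, _, dep in self.edges:
            children.setdefault(gov, []).append(dep)
        seen, stack = set(), [roots[0]]
        while stack:
            word = stack.pop()
            if word in seen:  # reached twice: two governors or a cycle
                return False
            seen.add(word)
            stack.extend(children.get(word, []))
        return seen == set(self.features)

    def agreement_ok(self) -> bool:
        """Subject-verb agreement: the verb and its subject share number and person."""
        for gov, rel, dep in self.edges:
            if rel == "subject":
                g, d = self.features[gov], self.features[dep]
                if (g.get("nb"), g.get("pers")) != (d.get("nb"), d.get("pers")):
                    return False
        return True

# Figure 7: *Den Roman gelesen haben Peter (connected, but plural verb with singular subject)
fig7 = DepStructure(
    features={"haben": {"nb": "pl", "pers": 3}, "Peter": {"nb": "sg", "pers": 3},
              "gelesen": {}, "den Roman": {"nb": "sg", "pers": 3}},
    edges=[("haben", "subject", "Peter"), ("haben", "pp", "gelesen"),
           ("gelesen", "obj", "den Roman")],
)
print(fig7.is_connected_tree(), fig7.agreement_ok())  # True False

# Figure 8: *Den Roman gelesen hat des Mannes (the genitive NP cannot be attached)
fig8 = DepStructure(
    features={"hat": {"nb": "sg", "pers": 3}, "gelesen": {},
              "den Roman": {"nb": "sg", "pers": 3}, "des Mannes": {"nb": "sg", "pers": 3}},
    edges=[("hat", "pp", "gelesen"), ("gelesen", "obj", "den Roman")],
)
print(fig8.is_connected_tree())  # False
```

In this sketch the Figure 8 structure fails already on connectedness (two roots), which corresponds to the second of the two readings given in the text.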
Figure 9. Schematic representations corresponding to a topologically defective sentence (*Den Roman Peter hat gelesen 'the novel Peter has read'). Only parts of a topological tree are built; no syntactic tree is built.
²⁹ For the sentence of Figure 8, the situation would be different without the genitive NP. The syntactic structure would be connected and well-formed, since no agreement feature can clash. The semantic module, however, will notice the lack of an agentive argument of the semanteme LESEN 'read'. This view also allows an elegant description of constructions with 'subjectless' verbs as in (i): for the third person singular form hat 'has', the subject remains optional. The semantic module passes a nominative argument into the agent position only if it agrees in number and person.
(i) Mir hat gegraut.
To me (dat.) has dreaded.
'I dreaded.'

6 Giving German a TUG
In what follows, I introduce a new lexicalized tree grammar in the TAG family based on superposing and unifying tree structures. We call the formalism Tree Unification Grammar (TUG). In the preceding sections I defined what the algorithm of the syntactic module is supposed to perform: taking a string of words, building a topological phrase structure on it, and compositionally building, in parallel, a surface syntactic dependency tree. In addition, the grouping of lexical heads into one topological domain (with its possible communicative feature value) should be passed on and marked on the dependency tree. We would like to perform the task of building the topological phrase structure with a simple combination procedure of lexicalized tree chunks, taking Tree Adjoining Grammars as a model. Moreover, the construction has to be compositional in the sense that the successful combination of two lexical entries on the topological level should find its immediate reflection in the surface syntactic dependency tree. TUG is principally a lexicalization of the algorithm for the topological phrase structure analysis of German presented in Gerdes and Kahane 2001a. It borrows notions and ideas from TAGs and their relatives: DTGs (Rambow et alii 1995) address the same problem of so-called long-distance dependencies that are not peripheral (like wh-extraction, which TAGs can handle) but occur in the middle of the phrase structure (scrambling). The AES of Alexis Nasr 1996 placed a lexical tree grammar in a Meaning-Text framework for the first time, and bubble grammars and GAGs (Kahane 1997, Candito and Kahane 1998b), just as TUGs, address the problem of the link between a dependency grammar and an ordered phrase structure. An attempt to describe a complete Meaning-Text grammar lexically can be found in Kahane 2001. The task of TUGs is merely to serve as a correspondence module between syntactic dependency and topological phrase structure:
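The compositionality requirement can be illustrated with a toy combination step in which placing a word into a topological field and attaching it in the dependency tree happen together or not at all. This is not the TUG definition given in 6.1; `Entry`, `combine`, and the field name `rb` for the right verbal bracket are hypothetical, while `vf`, `mf`, and `nf` follow the figures. Letting the host domain differ from the governor (den Roman sits in the Mittelfeld of hat but depends on gelesen) is likewise an assumption of the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Entry:
    """Hypothetical lexical entry: a word, its category, the topological fields it opens,
    and its still-open valency slots (relation -> required category)."""
    word: str
    category: str
    fields: Dict[str, List[str]] = field(default_factory=dict)
    valency: Dict[str, str] = field(default_factory=dict)

Dependency = Tuple[str, str, str]  # (governor, relation, dependent)

def combine(deps: List[Dependency], host: Entry, fld: str, dep: Entry,
            governor: Entry, relation: str) -> None:
    """Place `dep` in field `fld` of `host` and attach it to `governor` by `relation`.

    Both checks run before anything is changed, so the topological step and the
    dependency step succeed or fail together.
    """
    if fld not in host.fields:
        raise ValueError(f"{host.word} opens no field {fld!r}")
    if governor.valency.get(relation) != dep.category:
        raise ValueError(f"{governor.word} has no open {relation!r} slot for category {dep.category}")
    host.fields[fld].append(dep.word)                  # topological side
    del governor.valency[relation]                     # each valency slot is filled once
    deps.append((governor.word, relation, dep.word))   # dependency side

# Peter hat den Roman gelesen; the position of hat itself (left bracket) is not modeled.
hat = Entry("hat", "V", {"vf": [], "mf": [], "rb": [], "nf": []}, {"subject": "N", "pp": "V"})
gelesen = Entry("gelesen", "V", valency={"obj": "N"})
deps: List[Dependency] = []
combine(deps, hat, "vf", Entry("Peter", "N"), hat, "subject")
combine(deps, hat, "rb", gelesen, hat, "pp")
combine(deps, hat, "mf", Entry("den Roman", "N"), gelesen, "obj")
print(hat.fields)  # {'vf': ['Peter'], 'mf': ['den Roman'], 'rb': ['gelesen'], 'nf': []}
print(deps)
```

A second subject, for instance a genitive NP offered as subject of hat, would now raise `ValueError` without touching any field, so a failed combination leaves both structures unchanged.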
6.1. The definition
Let V be an alphabet, let D ∈ V be a distinguished letter, and let W be the set of words. We call tree nodes atoms if they are distinguished by a label L out of V and by a binary color feature. This feature can take the value full (i.e.