A Statistics-Based Semantic Textual Entailment System

Partha Pakray¹, Utsab Barman¹, Sivaji Bandyopadhyay¹, and Alexander Gelbukh²

¹ Computer Science and Engineering Department, Jadavpur University, Kolkata, India
² Center for Computing Research, National Polytechnic Institute, Mexico City, Mexico
{parthapakray,utsab.barman.ju}@gmail.com, sbandyopadhyay@cse.jdvu.ac.in
www.gelbukh.com

Abstract. We present a Textual Entailment (TE) recognition system that uses semantic features based on the Universal Networking Language (UNL). The proposed TE system compares the UNL relations in the text and the hypothesis to arrive at the two-way entailment decision. The system has been separately trained on each development corpus released as part of the Recognizing Textual Entailment (RTE) competitions RTE-1, RTE-2, RTE-3 and RTE-5 and tested on the respective RTE test sets.

Keywords: textual entailment, Universal Networking Language, Recognizing Textual Entailment data sets.

1   Introduction

Recognizing Textual Entailment (RTE) is one of the recent challenges in Natural Language Processing (NLP) [1]. Textual entailment is defined as a directional relationship between two text expressions, denoted by T, the entailing "Text", and H, the entailed "Hypothesis". T entails H if the meaning of H can be inferred from the meaning of T, as would typically be interpreted by people.

Textual entailment has many applications in NLP tasks. For example, in Summarization (SUM), a summary should be entailed by the text; Paraphrases (PP) can be seen as mutual entailment between two texts; in Information Extraction (IE), the extracted information should also be entailed by the text; in Question Answering (QA), the answer obtained for a question after the Information Retrieval (IR) process must be entailed by the supporting snippet of text.

To date there have been six Recognizing Textual Entailment (RTE) competitions: RTE-1 [2] in 2005, RTE-2 [3] in 2006, RTE-3 [4] in 2007, RTE-4 [5] in 2008, RTE-5 [6] in 2009 and RTE-6 [7] in 2010. Each new competition introduced several new features of the task. In 2010, the Parser Evaluation using Textual Entailments task [8] was organized at SemEval-2010. Our work has been tested on the data sets released in those competitions, except the most recent RTE-6; for RTE-4, no separate development data was released, so the system was evaluated on the RTE-4 test set only.


Our system is based on the ideas of [9] but has been optimized for the entailment YES/NO decision using the development sets; our results improve on those of [9].

The paper is organized as follows. Related work is described in Section 2. Section 3 explains UNL expressions. Section 4 presents our semantic-based RTE system architecture. The experiments carried out on the development and test data sets, as well as their results, are described in Section 5. Section 6 presents discussion and error analysis. Conclusions are drawn in Section 7.

2   Related Work

In the various RTE Challenges, several methods have been applied to the textual entailment task. Most systems use some sort of lexical matching. A number of systems represent the texts as parse trees (e.g., syntactic or dependency trees) before the actual task. Some systems use semantic relations (e.g., logical inference, Semantic Role Labeling) to solve the text and hypothesis entailment problem.

The VENSES system [10] (Venice Semantic Evaluation System) is organized as a pipeline of two subsystems: the first is a reduced version of GETARUN, their system for text understanding. The output of the system is a flat list of head-dependent structures with Grammatical Relation and Semantic Role labels. The version [11] presented at RTE-3 uses a linguistically based approach for semantic inference.

A syntax-driven semantic analysis system is presented in [12]; it uses the notion of an atomic proposition as its main element for entailment recognition. The idea is to find the entailment relation in the sentence pairs by comparing the atomic propositions contained in the text and hypothesis sentences.

The GROUNDHOG system [13] for recognizing textual entailment uses a classification-based approach to combine lexico-semantic information derived from text processing applications with a large collection of paraphrases automatically acquired from the WWW.

A baseline system for modeling textual entailment is presented in [14] that combines deep syntactic analysis with structured lexical meaning descriptions in the FrameNet paradigm. Textual entailment is approximated by degrees of structural and semantic overlap of text and hypothesis, which the authors measure in a match graph.

A machine learning approach with Support Vector Machines and AdaBoost for the RTE challenge is presented in [15]. Its authors perform a lexical, syntactic and semantic analysis of the entailment pairs, from which they compute a set of semantics-based distances between sentences.

The system presented in [16] generates paraphrases of semantically labeled input sentences using the semantics and syntax encoded in FrameNet, a freely available lexico-semantic database. The algorithm generates a large number of paraphrases with a wide range of syntactic and semantic distances from the input.

The system presented in [17] maps premise and hypothesis pairs into an abstract knowledge representation (AKR) and then performs entailment and contradiction detection on the resulting AKRs.


The system reported in [18] presents a new data structure, termed a compact forest, which allows efficient generation and representation of entailed consequents represented as parse trees.

The system presented in [19] performs semantic interpretation of the sentence pairs. It tries to determine whether the logic for the H sentence subsumes some inference-elaborated version of the T sentence, using WordNet and the DIRT paraphrase database as its sources of knowledge.

The Monte Carlo Pseudo Inference Engine for Text system [20] addresses the RTE problem in a new theoretic framework for robust inference and logical pattern processing based on integrated deep and shallow semantics.

The Boeing Language Understanding Engine [21] can be viewed as comprising three main elements: parsing, WordNet and DIRT, built on top of a simple baseline of bag-of-words comparison.

A joint syntactic-semantic representation to better capture the key information shared by the T-H pair is proposed in [22]. The system applies a co-reference resolver to group cross-sentential mentions of the same entities together.

In [23], entailment recognition is attempted by computing shallow lexical deductions and richer inferences based on semantics. The system relies on WordNet, detection of negation terms, named entity recognition, verb implications and frame semantic analysis.

3   Universal Networking Language

Universal Networking Language (UNL) [24] is an artificial language that expresses information or knowledge in the form of a semantic network with hyper-nodes. UNL has found applications in the domains of Machine Translation, Information Retrieval and Multilingual Document Generation [26].

UNL consists of a set of Universal Words (UWs), relations and attributes. Universal Words are concepts. The binary relationships among the Universal Words in a sentence are specified as relations; attributes are properties of the Universal Words. A UNL semantic network thus includes a set of binary relations, each of which relates the two Universal Words that hold the relation. A binary relation of UNL has the following format:

<relation name> ( <UW1>, <UW2> )

The process of representing natural language sentences as UNL graphs is called enconverting, and the process of generating natural language sentences out of UNL graphs is called deconverting. An EnConverter is a language-independent parser, which provides a framework for morphological, syntactic, and semantic analysis performed synchronously. A DeConverter is a language-independent generator, which provides a framework for syntactic and morphological generation performed synchronously.
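For illustration only (the record and field names below are ours, not part of the UNL specification), such a binary relation can be modeled as a small data structure:

import dataclasses

@dataclasses.dataclass
class BinaryRelation:
    # <relation name> ( <UW1>, <UW2> )
    name: str  # e.g. "obj", "agt", "plc"
    uw1: str   # first Universal Word, e.g. "accuse(icl>do).@entry.@present"
    uw2: str   # second Universal Word, e.g. "pfizer.@topic"

# One relation of the hypothesis "Pfizer is accused of murdering 11 children":
rel = BinaryRelation(name="obj",
                     uw1="accuse(icl>do,equ>charge).@entry.@present",
                     uw2="pfizer.@topic")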

4   System Description

Our semantic-based textual entailment system accepts pairs of text snippets (T and H) as input and outputs a binary value: "YES" if the text T entails H and "NO" otherwise. It is a point-based scoring system, which makes its decision based on the scores of the UNL relations of a T-H pair. The score of the T-H pair is compared with a threshold value, calculated empirically from observations on the development data, to obtain the entailment decision.

Fig. 1. Semantic Textual Entailment System

4.1   UNL En-Conversion Module

The T-H pairs are converted into UNL expressions using the UNL En-Converter (www.unl.ru). An example of the UNL expression for a hypothesis from the RTE-5 development data is shown in Figure 2.

[S:00]
{org:en}
Pfizer is accused of murdering 11 children
{/org}
{unl}
obj(accuse(icl>do,equ>charge,cob>abstract_thing,agt>person,obj>person).@entry.@present,pfizer.@topic)
qua:01(child(icl>juvenile>thing).@pl,11)
obj:01(murder(icl>kill>do,agt>thing,obj>living_thing).@entry,child(icl>juvenile>thing).@pl)
cob(accuse(icl>do,equ>charge,cob>abstract_thing,agt>person,obj>person).@entry.@present,:01)
{/unl}
[/S]

Fig. 2. Example of a hypothesis of the RTE-5 development data in UNL

4.2   Pre-processing Module

Separation. From the UNL graphs of T and H, individual UWs are extracted using regular expressions. The regular expressions used to extract the individual UWs are as follows:

For UW1: [#] + [-a-z0-9R:._- &=*'`~\"\\ + [\\s]] + [\\(.,]
For UW2: [,] + [-a-z0-9R:._- &=*'`~\"\\ + [\\s]] + [\\(.#]

The relation name, scope ID, constraint list and attribute list are separated from a single UNL relation. All relations are stored in a logical set in the following format, as required by our system:

[Relation Name] [Relation Scope ID] {[UW1][UW1 Scope id], [UW2][UW2 Scope id]}

Extraction of the different components from a single UNL relation is likewise done using regular expressions, as in the sketch below.
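A minimal sketch of this extraction (a deliberate simplification of ours: real UWs contain nested parentheses and richer character sets, so the actual patterns are more involved):

import re

# relation name, optional ":NN" scope ID, then the argument list in parentheses
REL = re.compile(r'^(?P<name>[a-z]+)(?::(?P<scope>\d+))?\((?P<args>.*)\)$')

def parse_relation(line):
    m = REL.match(line.strip())
    if m is None:
        return None
    # Split UW1 from UW2 at the comma that lies outside all parentheses.
    args, depth, split = m.group('args'), 0, None
    for i, ch in enumerate(args):
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
        elif ch == ',' and depth == 0:
            split = i
    if split is None:
        return None
    return {'relation': m.group('name'),
            'scope': m.group('scope') or '00',
            'uw1': args[:split],
            'uw2': args[split + 1:]}

print(parse_relation('qua:01(child(icl>juvenile>thing).@pl,11)'))
# {'relation': 'qua', 'scope': '01', 'uw1': 'child(icl>juvenile>thing).@pl', 'uw2': '11'}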

Scope Resolution. The specific task at this step is to resolve the scope IDs of UNL relations. For example, consider the hypothesis of the RTE-5 development set shown in UNL relation format in Figure 2. In the fourth relation, cob, we find the scope ID ':01' in place of UW2; it specifies the relation between the present UNL graph and the other UNL graph that specifies UW2. In the sentence, the main subject, the noun in focus, is 'Pfizer', while the other noun, 'children', in the predicate part has less focus. However, the second noun is directly affected by the action of the first one, which has occurred in parallel. The UNL specification defines the relation cob as "a thing that is directly affected by an implicit event done in parallel or an implicit state in parallel." The result is shown in Figure 3.

Fig. 3. UNL graph of Cob relation
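As a rough sketch under the simplified representation introduced above (our own, not the system's actual data structures), scope resolution can be done by indexing relations by scope ID and substituting, for each ":NN" argument, the list of UWs of the scoped sub-graph; this list is what the later UW comparison consults:

def resolve_scopes(relations):
    # Index relations by their scope ID ("00" denotes the top-level graph).
    by_scope = {}
    for r in relations:
        by_scope.setdefault(r['scope'], []).append(r)
    # Replace a ":NN" argument by the UWs of the corresponding sub-graph.
    for r in relations:
        for slot in ('uw1', 'uw2'):
            if isinstance(r[slot], str) and r[slot].startswith(':'):
                scoped = by_scope.get(r[slot][1:], [])
                r[slot] = ([x['uw1'] for x in scoped] +
                           [x['uw2'] for x in scoped])
    return relations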

Relation Grouping. The relation grouping step groups UNL relations that are semantically identical [25]. A UNL expression is a hyper semantic network in which UW1 and UW2 are two nodes connected by an edge labeled with the relation name. The grouping strategy is based on the thematic roles of the different relations. Table 1 shows the relation groups and the set of relations in each group.

Table 1. UNL relation grouping

Group Name    Relations
Agent         agt, cag, aoj, cao, ptn
Object        obj, cob, opl, ben
Place         plc, plf, plt
Instrument    ins, met
State         src, gol, via
Time          tim, tmf, tmt, dur
Manner        man, bas
Logical       and, or
Concept       equ, icl, iof
Cause         con, pur, rsn
Sequence      coo, seq, cnt, mod, nam, per, pof, pos, qua

4.3   Scoring Module

The scoring module calculates the score between a pair of T-H UNL relations. The module assigns points to each relation pair using a set of rules: the Relation Grouping rule, the UW rule, and the Named Entity rule.
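In code, Table 1 amounts to a lookup from relation name to group, which is what the Relation Grouping rule below consults; a minimal sketch:

RELATION_GROUPS = {
    'Agent':      {'agt', 'cag', 'aoj', 'cao', 'ptn'},
    'Object':     {'obj', 'cob', 'opl', 'ben'},
    'Place':      {'plc', 'plf', 'plt'},
    'Instrument': {'ins', 'met'},
    'State':      {'src', 'gol', 'via'},
    'Time':       {'tim', 'tmf', 'tmt', 'dur'},
    'Manner':     {'man', 'bas'},
    'Logical':    {'and', 'or'},
    'Concept':    {'equ', 'icl', 'iof'},
    'Cause':      {'con', 'pur', 'rsn'},
    'Sequence':   {'coo', 'seq', 'cnt', 'mod', 'nam', 'per', 'pof', 'pos', 'qua'},
}
# Invert the table once for constant-time lookups.
GROUP_OF = {rel: grp for grp, rels in RELATION_GROUPS.items() for rel in rels}

def same_group(rel_a, rel_b):
    # Two relations match if they fall into the same thematic group.
    return rel_a in GROUP_OF and GROUP_OF[rel_a] == GROUP_OF.get(rel_b)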

The Relation Grouping rule checks whether two UNL relations, one from the text and one from the hypothesis, belong to the same relation group. If so, the pair is considered a match and one point is assigned to it.

The UW rule checks whether the UWs in the two matched UNL relations are the same or belong to the same synset, i.e., refer to the same meaning. We used RiWordNet (www.rednoise.org/rita/wordnet/documentation/index.htm) for synset matching. One point is assigned for each UW match. If a scope ID occurs in place of a UW, the comparison is done against the UW list created by the scope resolution module.

The Named Entity rule: if there are n named entities in H, m named entities in T, and k is the number of named entities present in both H and T, then the named entity point is the fraction of the named entities in the hypothesis that match, i.e., k/n.

The composite score of a T-H pair is calculated as follows:

Total Score (TS) = Relation Match Point (RMP) + UW1 point (UW1) + UW2 point (UW2) + (k/n)    (1)
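A sketch of equation (1) in code (same_group is the Table 1 lookup above; is_synonym is our stand-in callback for the RiWordNet synset check, not the library's actual API):

def score_pair(t_rel, h_rel, t_entities, h_entities, is_synonym):
    # Relation Match Point: 1 if both relations fall in the same group.
    rmp = 1.0 if same_group(t_rel['relation'], h_rel['relation']) else 0.0
    # One point per UW slot whose words are equal or share a synset.
    uw1 = 1.0 if (t_rel['uw1'] == h_rel['uw1']
                  or is_synonym(t_rel['uw1'], h_rel['uw1'])) else 0.0
    uw2 = 1.0 if (t_rel['uw2'] == h_rel['uw2']
                  or is_synonym(t_rel['uw2'], h_rel['uw2'])) else 0.0
    # Named Entity point: fraction k/n of the hypothesis NEs found in T.
    k = len(set(h_entities) & set(t_entities))
    ne = k / len(h_entities) if h_entities else 0.0
    return rmp + uw1 + uw2 + ne   # Total Score TS, equation (1)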

4.4   Decision Module

Individual Relation Pair Decision: the total score of an individual relation pair is calculated using equation (2), and its maximum value TSmax is given by equation (3). Since each component score has a maximum value of 1 (equation (4)), the maximum total score is 4 (equation (5)):

TS = RMP + UW1 + UW2 + (k/n)                      (2)
TSmax = RMPmax + UW1max + UW2max + (k/n)max       (3)
RMPmax = UW1max = UW2max = (k/n)max = 1           (4)
TSmax = 4                                         (5)

The minimum value of the total score (TSmin) for an individual relation pair to count as a match has been observed to be 3.5 on the training sets of the various RTEs. Hence, if the total score of a relation pair falls between 3.5 and 4, the relation pair is considered a match.

Final Decision by Total Relation Score Calculation: let Hn be the number of UNL relations in the hypothesis and Tn the number of UNL relations in the text. The number of matched relation pairs (Mn) is then identified, and the final score (FS) for the T-H pair is calculated as FS = Mn / Hn. It has been observed on the training sets of the various RTEs that the minimum value of FS for an entailed T-H pair is 0.96. Hence, if the FS score for a T-H pair is 0.96 or above, the T-H pair is judged as entailed.
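Putting the two thresholds together (3.5 for an individual pair, 0.96 for the final score, both observed from the training sets), the decision step can be sketched as follows, reusing score_pair from Section 4.3:

def entailment_decision(t_rels, h_rels, t_entities, h_entities, is_synonym):
    # Count hypothesis relations with at least one matching text relation,
    # i.e. a pair whose Total Score lies between TSmin = 3.5 and TSmax = 4.
    matched = sum(
        1 for h in h_rels
        if any(3.5 <= score_pair(t, h, t_entities, h_entities,
                                 is_synonym) <= 4.0
               for t in t_rels))
    fs = matched / len(h_rels) if h_rels else 0.0   # FS = Mn / Hn
    return 'YES' if fs >= 0.96 else 'NO'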

5   Experimental Results

We used the following data sets for the two-way classification task: the RTE-1 development and test sets, the RTE-2 development and test sets, the RTE-3 development and test sets, the RTE-4 test set, and the RTE-5 main development and test sets. The results are shown in Table 2.

Table 2. Evaluation results

RTE Data            Entailment  Entailments  Correct        Total          Precision  Recall  F-Score
                    decision    in Gold      entailments    entailments
                                Standard     by our system  by our system
RTE-1 Dev. Set (1)  YES         143          63             102            0.44       0.61    0.51
                    NO          144          104            185            0.72       0.56    0.63
RTE-1 Test Set      YES         400          172            346            0.43       0.49    0.46
                    NO          400          262            454            0.65       0.57    0.60
RTE-2 Dev. Set      YES         400          170            272            0.42       0.62    0.50
                    NO          400          297            528            0.74       0.56    0.64
RTE-2 Test Set      YES         400          262            460            0.65       0.56    0.60
                    NO          400          212            340            0.53       0.62    0.57
RTE-3 Dev. Set      YES         412          189            278            0.45       0.67    0.54
                    NO          388          299            522            0.77       0.57    0.65
RTE-3 Test Set      YES         410          285            409            0.69       0.69    0.69
                    NO          390          262            391            0.67       0.67    0.67
RTE-4 Test Set      YES         500          352            522            0.67       0.70    0.68
                    NO          500          320            478            0.66       0.64    0.65
RTE-5 Dev. Set      YES         300          176            300            0.58       0.58    0.58
                    NO          300          175            300            0.58       0.58    0.58
RTE-5 Test Set      YES         300          169            318            0.56       0.53    0.54
                    NO          300          150            282            0.50       0.53    0.51


The RTE-1 challenge has two development sets, one consisting of 287 text-hypothesis pairs and the other of 280 pairs. The RTE-1 test set and the RTE-2 and RTE-3 development and test sets consist of 800 text-hypothesis pairs each, while the RTE-5 development and test sets consist of 600 pairs each. At RTE-4, no development set was provided, as the proposed pairs were very similar to those contained in the RTE-3 development and test sets. The RTE-4 test set consisted of 1000 text-hypothesis pairs. Four applications, i.e., IE, IR, QA and SUM, were set as the contexts for pair generation. The length of the hypotheses was the same as in the past data sets (RTE-3); however, the texts were generally longer.

6   Error Analysis

The system has some limitations in the matching of Universal Words (UWs). Two UWs may belong to different layers of a geographical hierarchy; one may be a named entity and the other a description of that named entity; or one may be an anaphor and the other its antecedent. In all such cases, the current level of UW matching does not succeed. Text-hypothesis pairs illustrating these situations are shown below.

Geographical knowledge:
T: He lives in West Bengal.
H: He lives in India.
Here all the UNL relations are the same: West Bengal and India both occur in a plc relation, but the system does not have the capability to identify that West Bengal is in India.

Knowledge of named entities:
T: Madonna has 3 children.
H: The rock star has 3 children.
The system does not have the resources to identify that Madonna is a rock star.

Pronoun replacement:
T: Albert Einstein discovered the theory of relativity.
H: He discovered the theory of relativity.
If the pronoun were replaced with the corresponding proper noun / named entity, the performance of the system would improve.

7   Conclusions

Our results show that a semantic-based approach appropriately tackles the textual entailment problem. Experiments have been initiated on a combined semantic- and syntactic-based RTE task.

Acknowledgements. The work was done under partial support of the DST India–CONACYT Mexico project "Answer Validation through Textual Entailment", European Project WIQ-EI 269180, IPN (COFAA, SIP 20100773 & 20113295), CONACYT (SNI and Sabbatical program), and Mexico City Government project ICYT PICCO10-120.

References

1. Ledeneva, Y., Sidorov, G.: Recent Advances in Computational Linguistics. Informatica: International Journal of Computing and Informatics 34, 3–18 (2010)
2. Dagan, I., Glickman, O., Magnini, B.: The PASCAL Recognising Textual Entailment Challenge. In: Proceedings of the First PASCAL Recognising Textual Entailment Workshop (2005)
3. Bar-Haim, R., Dagan, I., Dolan, B., Ferro, L., Giampiccolo, D., Magnini, B., Szpektor, I.: The Second PASCAL Recognising Textual Entailment Challenge. In: Proceedings of the Second PASCAL Challenges Workshop on Recognising Textual Entailment, Venice, Italy (2006)
4. Giampiccolo, D., Magnini, B., Dagan, I., Dolan, B.: The Third PASCAL Recognizing Textual Entailment Challenge. In: Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing, Prague, Czech Republic (2007)
5. Giampiccolo, D., Dang, H.T., Magnini, B., Dagan, I., Cabrio, E.: The Fourth PASCAL Recognizing Textual Entailment Challenge. In: TAC 2008 Proceedings (2008), http://www.nist.gov/tac/publications/2008/papers.html
6. Bentivogli, L., Dagan, I., Dang, H.T., Giampiccolo, D., Magnini, B.: The Fifth PASCAL Recognizing Textual Entailment Challenge. In: TAC 2009 Workshop, National Institute of Standards and Technology, Gaithersburg, Maryland, USA (2009)
7. Bentivogli, L., Clark, P., Dagan, I., Dang, H.T., Giampiccolo, D.: The Sixth PASCAL Recognizing Textual Entailment Challenge. In: TAC 2010 Notebook Proceedings (2010)
8. Yuret, D., Han, A., Turgut, Z.: SemEval-2010 Task 12: Parser Evaluation using Textual Entailments. In: Proceedings of the SemEval-2010 Evaluation Exercises on Semantic Evaluation (2010)
9. Pakray, P., Poria, S., Bandyopadhyay, S., Gelbukh, A.: Semantic Textual Entailment Recognition using UNL. Polibits (43), 23–27 (2011)
10. Delmonte, R., Tonelli, S., Boniforti, M.A.P., Bristot, A., Pianta, E.: VENSES – a Linguistically-Based System for Semantic Evaluation. In: Proceedings of the First PASCAL Recognising Textual Entailment Workshop (2005)
11. Delmonte, R., Bristot, A., Boniforti, M.A.P., Tonelli, S.: Entailment and Anaphora Resolution in RTE3. In: Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing, Prague, Czech Republic (2007)
12. Akhmatova, E.: Textual Entailment Resolution via Atomic Propositions. In: Proceedings of the First PASCAL Recognising Textual Entailment Workshop (2005)
13. Hickl, A., Bensley, J., Williams, J., Roberts, K., Rink, B., Shi, Y.: Recognizing Textual Entailment with LCC's GROUNDHOG System. In: Proceedings of the Second PASCAL Challenges Workshop on Recognising Textual Entailment, Venice, Italy (2006)
14. Burchardt, A., Frank, A.: Approaching Textual Entailment with LFG and FrameNet Frames. In: Proceedings of the Second PASCAL Challenges Workshop on Recognising Textual Entailment, Venice, Italy (2006)
15. Ferres, D., Rodriguez, H.: Machine Learning with Semantic-Based Distances Between Sentences for Textual Entailment. In: Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing, Prague, Czech Republic (2007)
16. Ellsworth, M., Janin, A.: Mutaphrase: Paraphrasing with FrameNet. In: Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing, Prague, Czech Republic (2007)
17. Bobrow, D.G., Condoravdi, C., Crouch, R., de Paiva, V., Karttunen, L., King, T.H., Nairn, R., Price, L., Zaenen, A.: Precision-focused Textual Inference. In: Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing, Prague, Czech Republic (2007)
18. Bar-Haim, R., Berant, J., Dagan, I., Greental, I., Mirkin, S., Shnarch, E., Szpektor, I.: Efficient Semantic Deduction and Approximate Matching over Compact Parse Forests. In: TAC 2008 Proceedings (2008), http://www.nist.gov/tac/publications/2008/papers.html
19. Clark, P., Harrison, P.: Recognizing Textual Entailment with Logical Inference. In: TAC 2008 Proceedings (2008), http://www.nist.gov/tac/publications/2008/papers.html
20. Bergmair, R.: Monte Carlo Semantics: MCPIET at RTE4. In: TAC 2008 Proceedings (2008), http://www.nist.gov/tac/publications/2008/papers.html
21. Clark, P., Harrison, P.: An Inference-Based Approach to Recognizing Entailment. In: TAC 2009 Workshop Notebook, Gaithersburg, Maryland, USA (2009)
22. Wang, R., Zhang, Y., Neumann, G.: A Joint Syntactic-Semantic Representation for Recognizing Textual Relatedness. In: TAC 2009 Workshop Notebook, Gaithersburg, Maryland, USA (2009)
23. Ferrandez, O., Munoz, R., Palomar, M.: Alicante University at TAC 2009: Experiments in RTE. In: TAC 2009 Workshop Notebook, Gaithersburg, Maryland, USA (2009)
24. UNDL Foundation: Universal Networking Language (UNL) Specifications, edition 2006 (August 2006), http://www.undl.org/unlsys/unl/unl2005-e2006/
25. Ishizuka, M.: A Solid Foundation of Semantic Computing toward Web Intelligence. School of Information Science and Technology, http://www.jst.go.jp/sicp/ws2010_austria/presentation/presentation_14.pdf
26. Cardeñosa, J., Gelbukh, A., Tovar, E. (eds.): Universal Networking Language: Advances in Theory and Applications. Research on Computing Science, vol. 12, 443 p. (2005), http://www.cicling.org/2005/UNL-book/