
Word Sense Disambiguation Based on Word Sense Clustering
Henry Anaya-Sánchez, Aurora Pons-Porrata and Rafael Berlanga-Llavori


Outline

1. Introduction
2. A Knowledge-driven Framework for WSD
3. A new WSD Method
4. Experimental results
5. Conclusions


Word Sense Disambiguation

Word Sense Disambiguation (WSD) is the general task of deciding the appropriate sense for a particular use of a polysemous word given its textual context.


Approaches to WSD

- Word Sense Induction
- Data-driven WSD
- Knowledge-driven WSD


Behaviour of most knowledge-driven WSD methods

Match a textual context against the knowledge source, select the best match, and retrieve from it the suitable senses for the context constituents.
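The match/select/retrieve pattern can be made concrete with a toy gloss-overlap matcher in the style of the simplified Lesk algorithm (Lesk is one of the methods in the overall comparison). This is a minimal sketch, not the clustering method proposed here: the sense inventory `GLOSSES`, the stopword list and the tie-breaking rule are hypothetical choices for illustration.

```python
from __future__ import annotations

# Hypothetical toy knowledge source: (word, sense number) -> gloss.
GLOSSES = {
    ("bank", 1): "sloping land beside a body of water such as a river",
    ("bank", 2): "financial institution that accepts deposits and lends money",
}

# Hypothetical stopword list; real systems use a much larger one.
STOPWORDS = {"a", "an", "the", "of", "and", "such", "as", "that", "on", "to", "in"}


def content_words(text: str) -> set[str]:
    """Lower-cased words of a text, minus stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def lesk_like(word: str, context: str) -> int:
    """Return the sense number whose gloss best matches the context."""
    ctx = content_words(context)
    candidates = [(num, gloss) for (w, num), gloss in GLOSSES.items() if w == word]
    # 1) match each gloss against the context, 2) select the best match
    #    (ties go to the lower sense number), 3) retrieve its sense number.
    best_num, _ = max(
        candidates,
        key=lambda c: (len(content_words(c[1]) & ctx), -c[0]),
    )
    return best_num


if __name__ == "__main__":
    print(lesk_like("bank", "he sat on the bank of the river watching the water"))  # prints 1
```

Such methods commit to a single best match per word, which is the behaviour the clustering-based proposal departs from.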


Our proposal

Introduction of a knowledge-driven framework and a first prototype algorithm for the disambiguation of nouns.
Idea: clustering sense representations is a natural way to represent the cohesion reflected among the words of a textual unit.


Clustering in the WSD area

Two main usages:
- Clustering textual contexts to represent different senses, in Data-driven WSD and Word Sense Induction.
- Clustering fine-grained senses into coarse-grained ones, to reduce the average polysemy of words.


Knowledge-driven Framework Algorithm

Input: The finite set of nouns N and the textual context T.
Output: The disambiguated noun senses.

Let S be the set of all senses of nouns in N;
repeat
    G = group(S)
    G′ = filter(G, T, matching-function)
    S = ∪_{g ∈ G′} {s | s ∈ g}
until stopping-criterion
return S
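The loop is parameterised by its components (grouping, filtering, matching, stopping). A minimal Python sketch of that skeleton, assuming each component is passed in as a function; `senses_of`, the callable signatures and the `stopping_criterion(iteration, S)` interface are hypothetical choices for illustration, not the authors' code.

```python
from __future__ import annotations

from typing import Callable, Hashable, Iterable

Sense = Hashable
Group = frozenset  # one cluster of senses


def framework_wsd(
    nouns: Iterable[str],
    context: object,
    senses_of: Callable[[str], Iterable[Sense]],
    group: Callable[[set], list[Group]],
    filter_: Callable[[list[Group], object, Callable], list[Group]],
    matching_function: Callable[[Group, object], object],
    stopping_criterion: Callable[[int, set], bool],
) -> set:
    """Skeleton of the repeat ... until loop; every component is pluggable."""
    # Let S be the set of all senses of the nouns in N.
    S = {s for n in nouns for s in senses_of(n)}
    iteration = 0
    while True:  # repeat
        G = group(S)                                       # G  = group(S)
        G_prime = filter_(G, context, matching_function)   # G' = filter(G, T, matching-function)
        S = {s for g in G_prime for s in g}                # S  = union of the selected groups
        iteration += 1
        if stopping_criterion(iteration, S):               # until stopping-criterion
            return S


if __name__ == "__main__":
    # Toy run: every noun has senses #1 and #2, each sense is its own group,
    # and the filter keeps only groups made of first senses.
    result = framework_wsd(
        nouns=["competition", "athlete"],
        context=None,
        senses_of=lambda n: [(n, 1), (n, 2)],
        group=lambda S: [frozenset({s}) for s in S],
        filter_=lambda G, T, match: [g for g in G if match(g, T)],
        matching_function=lambda g, T: all(num == 1 for _, num in g),
        stopping_criterion=lambda i, S: i >= 1,
    )
    print(sorted(result))  # prints [('athlete', 1), ('competition', 1)]
```

Keeping the components as arguments mirrors the claim that different disambiguation algorithms can be obtained from the same framework.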


Definition of components

Sense representation: Topic Signatures as representations of WordNet nominal senses.
Clustering algorithm: Extended Star Clustering Algorithm, with the cosine measure as similarity function.
Matching function:

$$\text{matching-function}(g, T) = \left( |\text{nouns}(g)|,\ \frac{\sum_i \min(\bar{g}_i, T_i)}{\min\left(\sum_i \bar{g}_i,\ \sum_i T_i\right)},\ -\sum_{s \in g} \text{number}(s) \right) \qquad (1)$$
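Equation (1) scores a cluster g against the context T with a triple. The sketch below computes it under stated assumptions: ḡ is taken to be the centroid (mean vector) of the topic signatures in g, vectors are sparse dicts with non-negative weights, number(s) is the WordNet sense number of s, and clusters are ranked by comparing the triples lexicographically. The `Sense` class and the toy vectors in the example are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Sense:
    noun: str                 # noun the sense belongs to
    number: int               # WordNet sense number, e.g. 2 for competition#2
    vector: dict[str, float]  # topic-signature weights (non-negative)


def centroid(group: list[Sense]) -> dict[str, float]:
    """Mean topic-signature vector of a cluster (our reading of g-bar)."""
    total: dict[str, float] = {}
    for s in group:
        for term, w in s.vector.items():
            total[term] = total.get(term, 0.0) + w
    return {term: w / len(group) for term, w in total.items()}


def matching_function(group: list[Sense], context: dict[str, float]) -> tuple[int, float, int]:
    """Triple of Eq. (1); larger tuples (compared element by element) match better."""
    g_bar = centroid(group)
    # Terms absent from g_bar have weight 0, so min(0, T_i) = 0 adds nothing.
    overlap = sum(min(w, context.get(term, 0.0)) for term, w in g_bar.items())
    denominator = min(sum(g_bar.values()), sum(context.values()))
    ratio = overlap / denominator if denominator > 0 else 0.0
    n_nouns = len({s.noun for s in group})  # |nouns(g)|
    return (n_nouns, ratio, -sum(s.number for s in group))


if __name__ == "__main__":
    T = {"sport": 1.0, "race": 1.0}
    g1 = [Sense("competition", 2, {"sport": 1.0, "race": 1.0}),
          Sense("athlete", 1, {"sport": 1.0, "train": 1.0})]
    g2 = [Sense("competition", 1, {"business": 1.0})]
    print(matching_function(g1, T))  # prints (2, 0.75, -3)
    print(matching_function(g1, T) > matching_function(g2, T))  # prints True
```

Read this way, a cluster that covers more nouns always wins; overlap with the context breaks ties, and the negated sum of sense numbers finally favours lower-numbered (more frequent) WordNet senses.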


Definition of components (Cont.)

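Filtering function: selects clusters in order, according to the matching function, to build a cover of N.
Stopping criterion:

$$\beta_0(i) = \begin{cases} \text{percentile}(90, \text{sim}(S)) & \text{if } i = 0,\\ \min\limits_{q \in \{0,5,10\}} \{\beta = \text{percentile}(90 + q, \text{sim}(S)) \mid \beta > \beta_0(i-1)\} & \text{otherwise.} \end{cases} \qquad (2)$$

where percentile(p, sim(S)) is the p-th percentile value of $\text{sim}(S) = \{\cos(s_i, s_j) \mid s_i, s_j \in S,\ i \neq j\} \cup \{1\}$.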
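The filtering function and Eq. (2) can be sketched as follows. Assumptions, all ours: percentiles use linear interpolation between sorted values (NumPy's default method), sim(S) is built as a literal set of cosine values, the filter takes clusters in decreasing matching order and skips any cluster that adds no uncovered noun, and `beta0` returns `None` when no candidate percentile exceeds β0(i − 1). All helper names are hypothetical.

```python
from __future__ import annotations

import math
from itertools import combinations


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def sim_set(vectors: list[dict[str, float]]) -> list[float]:
    """sim(S): cosines between distinct senses, plus 1, sorted ascending."""
    values = {cosine(a, b) for a, b in combinations(vectors, 2)} | {1.0}
    return sorted(values)


def percentile(p: float, values: list[float]) -> float:
    """p-th percentile, linearly interpolated between sorted values."""
    xs = sorted(values)
    rank = p / 100 * (len(xs) - 1)
    lo, hi = math.floor(rank), math.ceil(rank)
    return xs[lo] + (xs[hi] - xs[lo]) * (rank - lo)


def beta0(i: int, sims: list[float], previous: float | None = None) -> float | None:
    """Eq. (2): 90th percentile first, then the smallest larger candidate."""
    if i == 0:
        return percentile(90, sims)
    candidates = [percentile(90 + q, sims) for q in (0, 5, 10)]
    larger = [b for b in candidates if b > previous]
    return min(larger) if larger else None


def filter_clusters(groups, context, matching_function, nouns_of):
    """Pick clusters in decreasing matching order until every noun is covered."""
    ranked = sorted(groups, key=lambda g: matching_function(g, context), reverse=True)
    to_cover = set().union(*(nouns_of(g) for g in groups))
    covered, selected = set(), []
    for g in ranked:
        if covered >= to_cover:
            break                    # the selected clusters already cover N
        if nouns_of(g) - covered:    # skip clusters that add no new noun
            selected.append(g)
            covered |= nouns_of(g)
    return selected
```

For example, with sim(S) = [0.0, 0.25, 0.5, 0.75, 1.0], `beta0(0, sims)` gives 0.9 and `beta0(1, sims, previous=0.9)` gives 0.95. Because the value 1 is always in sim(S), the 100th percentile is 1, so once β0 reaches 1 no larger candidate exists.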


Example

Disambiguation of all nouns in the sentence “The competition gave evidence of the athlete’s skills”.


competition:
#1 refers to a business relation in which two parties compete to gain customers.
#2 refers to an occasion on which a winner is selected from among two or more contestants (hypernym of athletic contest, race, trial, etc.).
#3 refers to the act of competing as for profit or a prize.
#4 refers to the contestant you hope to defeat.


evidence:
#1 refers to your basis for belief or disbelief; knowledge on which to base belief.
#2 refers to an indication that makes something evident.
#3 refers to (law) all the means by which any alleged matter of fact whose truth is investigated at judicial trial is established or disproved.


athlete:
#1 refers to a person trained to compete in sports.
skill:
#1 refers to an ability that has been acquired by training.
#2 refers to an ability to produce solutions in some problem domain.


Figure: Disambiguation of nouns in “The competition gave evidence of the athlete’s skills”.


Table: WSD performance in SemCor categories.

Category | Polysemous nouns | All nouns
---|---|---
A. Press: reportage | 0.606 | 0.683
C. Press: reviews | 0.504 | 0.602
L. Mystery & detective fiction | 0.498 | 0.589
F. Popular lore | 0.482 | 0.604
P. Romance & love story | 0.480 | 0.581
H. Miscellaneous | 0.479 | 0.590
M. Science fiction | 0.479 | 0.587
B. Press: editorial | 0.476 | 0.599
K. General fiction | 0.476 | 0.580
E. Skill & hobbies | 0.473 | 0.586
G. Belles-lettres, biography, essays | 0.462 | 0.563
R. Humor | 0.461 | 0.576
N. Adventure & western fiction | 0.452 | 0.552
J. Learned | 0.444 | 0.571
D. Religion | 0.388 | 0.494
Brown 1 | 0.475 | 0.588
Brown 2 | 0.467 | 0.576
Whole SemCor | 0.472 | 0.582


Table: Results using different topic signatures.

Signature | Nouns | Recall | Precision | F | Coverage
---|---|---|---|---|---
Based only on WordNet | Polysemous | 0.501 | 0.501 | 0.501 | 100 %
Web-based | Polysemous | 0.433 | 0.461 | 0.447 | 93.8 %
Based only on WordNet | All | 0.603 | 0.603 | 0.603 | 100 %
Web-based | All | 0.536 | 0.565 | 0.550 | 94.9 %
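The F values in this table are consistent with the usual F1 score, the harmonic mean of precision and recall. A small sketch that recomputes them from the Web-based rows; the function name `f1` is ours.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Web-based topic signatures, values copied from the table.
    print(round(f1(0.461, 0.433), 3))  # polysemous nouns: prints 0.447
    print(round(f1(0.565, 0.536), 3))  # all nouns: prints 0.55
```

With full coverage precision equals recall, so F equals both, as in the WordNet-only rows. With partial coverage, recall is roughly precision times coverage (0.536 / 0.565 ≈ 0.949 for all nouns).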


Table: Overall performance.

WSD method | Recall | Full coverage
---|---|---
Conceptual density | 0.220 | no
Lesk | 0.274 | no
UNED method | 0.313 | no
Specification marks | 0.391 | yes
Our method | 0.472 | yes


Conclusions and Future Work

1. A general framework for the disambiguation of nouns and a first prototype algorithm have been introduced.
2. Its novelty lies in the use of clustering.
3. Different disambiguation algorithms can be obtained from the framework.
4. Results are encouraging: the prototype reaches a recall of 0.472 with full coverage, higher than the other methods compared.

Further work: to instantiate and extend the framework.


Thanks!
