How to cope with questions typed by dyslexic users

Laurianne Sitbon
Laboratoire d'Informatique d'Avignon
University of Avignon
339, chemin des Meinajaries
Avignon, France
[email protected]

Patrice Bellot
Laboratoire d'Informatique d'Avignon
University of Avignon
339, chemin des Meinajaries
Avignon, France
[email protected]

ABSTRACT

In this paper we propose a way to cope with questions typed by dyslexic users, as they are usually a deformation of the intended query that cannot be corrected with classical spell checkers. We first propose a new model for statistical question answering systems, based on a probabilistic information retrieval model and a combination of results. This model accepts a query of multiple weighted terms as input. We also introduce a phonology-based approach at the sentence level to derive possible intended terms from typed questions. This approach uses the finite state machine framework to go from phonetic hypotheses and spell-checker proposals to hypothesised sentences, thanks to a language model. The final weighted queries are obtained through posterior probability computation. They are evaluated with new density and appearance-rating measures, which adapt recall and precision to non-binary data.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: retrieval models; I.2.7 [Natural Language Processing]: text analysis; H.5.2 [User Interfaces]: user-centered design

General Terms
Human Factors

Keywords
robust probabilistic model, dyslexic users, rewriting, question-answering

1. INTRODUCTION

When dyslexic users search for information through text media, they are likely to type their queries in a form unexpected by an information retrieval (IR) system. IR systems are usually designed according to evaluation campaign specifications such as those proposed by TREC [30]. With dyslexic spellers, however, the words typed are often a deformation of the intended query, and in most cases only humans can guess the intended words, usually through a reading-aloud process. We focus our work on question answering because dyslexia mostly concerns children, who are themselves most likely to type questions instead of keyword-based queries.

We showed previously in [26] that question answering systems tend to fail on questions typed by real users. That work also demonstrated that systems have a particularly hard time with queries typed by dyslexic or non-native users. These two types of users tend to write phonetically and sometimes group words together. Here we focus only on dyslexic users. While some research has addressed robust information retrieval (described later in Section 2), the possibility of using these results in a question answering context has so far not been shown. Statistical question answering systems generally answer a question in four steps. First, the question is analysed in order to determine its expected answer type (which is a kind of named entity) and to extract its key terms. The named entities are usually organised hierarchically, which involves an inclusive correspondence between types (for example, president name is a type corresponding to person name). The next step is the retrieval of documents according to the key terms, usually achieved by a separate search engine. In a third step, the most relevant passages of a few sentences are selected according to their density of key terms and the presence of a named entity of a type corresponding to the expected answer type. Lastly, each named entity of the expected answer type found in the passages, referred to as a candidate answer, is scored according to its distance to the keywords.

Our first contribution in this paper is a probabilistic model for question answering directly derived from the probabilistic IR model introduced by [23]. This model accepts a query in the form of a vector of terms weighted by a confidence score reflecting how likely the user intended each term in her noisy query. Our next contribution is an approach at the sentence level to extract these vectors from the noisy query. The solution we propose is based on a phonetic representation of the sentence in the FSM framework, from which best-path hypotheses are obtained. The vectors are extracted through posterior probability computation. Our third contribution is an evaluation framework for weighted vectors, inspired by the classical recall/precision measures. It has been conducted on 37 questions typed by 8 dyslexic children.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. AND '08, July 24, 2008, Singapore. Copyright © 2008 ACM 978-1-60558-196-5 ... $5.00
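The four-step pipeline described above can be sketched in miniature. This is an illustrative toy, not the authors' implementation: the answer-type table, stopword list, corpus and all helper names are hypothetical stand-ins for the real components (named entity hierarchy, search engine, tf.idf weighting).

```python
import re

# Toy sketch of the four-step statistical QA pipeline. All data and
# helper names are illustrative, not the paper's actual system.

ANSWER_TYPES = {"who": "PERSON", "when": "DATE", "where": "LOCATION"}
STOPWORDS = {"is", "the", "a", "how", "old", "who", "when", "where", "was"}

def analyze(question):
    """Step 1: expected answer type + key terms."""
    words = re.findall(r"\w+", question.lower())
    atype = ANSWER_TYPES.get(words[0], "OTHER")
    terms = [w for w in words if w not in STOPWORDS]
    return atype, terms

def retrieve(terms, corpus):
    """Step 2: rank documents by key-term counts."""
    return sorted(corpus, key=lambda d: -sum(d.lower().count(t) for t in terms))

def select_passages(doc, terms):
    """Step 3: keep sentences dense in key terms."""
    sents = re.split(r"(?<=[.!?])\s+", doc)
    return sorted(sents, key=lambda s: -sum(t in s.lower() for t in terms))

def score_candidates(passage, terms, entities):
    """Step 4: score candidate entities by distance to the keywords."""
    toks = re.findall(r"\w+", passage.lower())
    scores = {}
    for ent in entities:
        if ent in toks:
            pos = toks.index(ent)
            dists = [abs(pos - toks.index(t)) for t in terms if t in toks]
            scores[ent] = 1.0 / (1 + min(dists)) if dists else 0.0
    return scores

corpus = ["Abbe Pierre was born in 1912. He founded Emmaus.",
          "Paris is the capital of France."]
atype, terms = analyze("When was Abbe Pierre born?")
best_doc = retrieve(terms, corpus)[0]
passage = select_passages(best_doc, terms)[0]
print(atype, score_candidates(passage, terms, ["1912"]))
```

The point of the sketch is the data flow: each step narrows the search space while carrying forward the key terms and the expected answer type.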

2. ROBUST QUESTION ANSWERING APPROACHES

2.1 Robustness in existing systems

Some IR applications are particularly sensitive to robustness issues because of the task they address. They usually deal with either input or output issues, but rarely with both at the same time.

2.1.1 Uncertainty over IR queries

Many IR systems with typed entries run a spell check before the main retrieval steps. They use it to modify (rewrite) the query, usually by replacing unknown words with the first proposal of the spell checker. This method is likely to introduce new errors if unknown proper nouns have been typed, and the decision between two alternatives for an erroneous word can sometimes rest on a small difference and be incorrect. Keeping several hypotheses, including the initially typed word, could avoid this. Some IR systems use multiple hypotheses over the query, such as the spoken query information retrieval system designed by [31]. They base their retrieval on a vector space representation where queries are vectors over all words in the corpus and the associated values are the a posteriori probabilities of each word in the transcription lattice of the query. Trans-lingual IR systems can also benefit from several translations of a single query typed in a language different from that of the documents. [21] established a parallel between translated queries and expanded queries, because in both cases the new query has only a partial correspondence with the initial query. They propose a revised computation of the probability for a document D to be relevant to a query Q according to common expanded terms t', as shown in Equation 1.

P(R|D, Q) = Σ_{t'} P(R|D, t') P(t'|Q)    (1)

2.1.2 Confidence scores for document retrieval

This emergent domain has been concretised through a robust track in the TREC campaign since 2003. The main issue of this challenge is to find the set of features that predict, a priori (with the query only) and a posteriori (thanks to the IR system's intermediary scores and to features of the proposed results), whether the proposed results are correct. Proposed features based on the query are the frequency and combined frequency of its words [17], an ambiguity rate [5], or the syntactic complexity of the sentence and the polysemy of its words [20]. Proposed features based on the retrieved documents include the distribution of query terms in top-ranked documents [2], sometimes weighted by inverse document frequency [14]. [12] introduced a machine learning method based on linguistic and statistical features of both the query and the retrieved documents.

2.1.3 Systems combining relevance sources

An IR system can start to adapt to user needs by considering relevance beyond the purely topical point of view. From the question answering perspective, the expected answer type can be seen as such a user need. At best, systems propose a redefinition of relevance combining various topical aspects (context, novelty). This limitation is encouraged by evaluation campaigns that do not propose tasks based on an orthogonal user need (based, for example, on document type or on the linguistic level of the text). However, much work has been done on combining similarity results from various IR approaches in meta-search engines. The most cited approach, CombMNZ [9], is based on normalised scores and on the number of systems retrieving each document. Derived probabilistic approaches such as ProbFuse [15] model relevant documents. These approaches assume a common objective for all sub-systems (namely topical relevance of top-ranked documents). In contrast, the linear combination approaches detailed in [29] and investigated by [10] allow the integration of an orthogonal measure, although this has not been studied in our context yet.

2.2 A QA probabilistic framework

Previous experiments [26] showed that QA systems must adapt to ill-formed queries. A potential answer to a question is estimated to be relevant according to two criteria: the topical relevance of the candidate answer and the named entity type of this candidate answer. The uncertainty must be kept on both dimensions. For the input special need, a parallel can be drawn with spoken query systems and trans-lingual document retrieval; the output special need is similarly taken into account in multimedia document retrieval (where the orthogonal dimension is a preferred medium). The probabilistic model for information retrieval introduced by [23] involves a document unit D, a query Q and the terms m_i they both contain, and estimates the probability of D being relevant to Q through the similarity between D and Q. From a question answering point of view, the document unit is a candidate named entity (CNE) C in a document D, of a type t belonging to the set T of expected answer types of the query, which is a question Q. The terms are weighted according to their distance to the CNE C and to their importance in the document D according to a tf.idf score. Following the probabilistic model, the probability for C to be a correct answer can be expressed through Equation 2.

P(R|Q_T, C_{t∈T} ∈ D) ∼ Sim(Q_T, C_{t∈T} ∈ D) = Σ_{m_i ∈ Q∩D} w_{m_i,D} · w_{m_i,Q}    (2)

If we now consider that the actual question has to be processed in order to extract the actual key terms, and that this process can lead to a new question Q' containing terms m'_i, each with its own probability P(m'_i|Q), the similarity can be expressed through Equation 3.

Sim(Q'_T, C_{t∈T} ∈ D) = Σ_{m'_i ∈ Q'∩D} w_{m'_i,D} · P(m'_i|Q) · w_{m'_i,Q}    (3)

The set of expected answer types of the initial question Q should also be expressed through the probability P(t ∈ T) for each possible type t to correspond to an actual expected answer type of the question. Current models only consider candidates C whose answer type is assumed to fully correspond to the expected answer type. If this correspondence is instead expressed through a probability P(t ∈ T), it can be seen as an orthogonal score to be taken into account when estimating the accuracy of the CNE. This can be integrated with a linear combination following the proposal of [29]. The probability for a CNE C to be a correct answer is then computed with Equation 4.

P(R|Q_T, C_{t∈T} ∈ D) ∼ α · P(t ∈ T|Q, D) + β · Sim(Q_T, C_{t∈T} ∈ D)    (4)

3. HOW TO REWRITE WHOLE SENTENCES FOR EXTRACTING TERMS?

3.1 How do dyslexic users type?

There are similarities between the way dyslexic users write and the way they type, and also a few differences, most of them due to the higher concentration and motivation they have in front of a computer. This motivation is also the reason why more blogs written by dyslexics appear every day despite their difficulties. Dysorthographia refers to dyslexic spellers whose condition interferes with grapho-phonemic correspondence. More specifically, a lack of phonological awareness makes them consider a sentence as a continuum of phonemes instead of a sequence of semantic units [11]. This leads to frequent word segmentation errors in written sentences, which calls for sentence-level processing.

The data we use were collected in the context of an experiment on the robustness of question answering systems [26]. In this experiment, 9 children were asked to type questions corresponding to an orally given constraint. The experiment was led by a speech therapist during remediation sessions. The main observation about their behaviour is the enthusiasm they show in front of a computer. With this enthusiasm comes a better concentration than the one observed by teachers for manual writing tasks. The computer also provides a very convenient way for the children to correct the misspellings they see when reading back their sentences. This leads to a near absence of the inversions of letters or syllables that typically appear in dyslexics' manuscripts. However, wrong choices of graphemes for a given phoneme and wrong segmentations of words remain. A wrong choice of graphemes can sometimes lead to a real-word error, when the replacement produces another existing word with a completely different meaning. In our data the word maire (mayor) has been replaced by mère (mother), which is pronounced in French the exact same way. Some words are typed with more than one wrong choice of graphemes, such as koman instead of comment. Wrong segmentation generally consists in agglutinating words that should not be joined, or in splitting one word into two non-existent words. It can also be more complicated, as in la Bepierre instead of l'abbé Pierre.

3.2 Why do classical spell checkers fail?

Commercial spell checkers compute suggestion lists for each out-of-vocabulary word by computing a distance between the written word and each word in the lexicon. The distance is mainly the Levenshtein edit distance, and sometimes includes phonetic features. But these systems work on words separately, and for impaired spellers such as dysorthographics the right correction is rarely in top position of the suggestion list. The low accuracy of available spell checkers on the writing of dyslexic learners is documented in a study by [13], which demonstrates that dyslexic spellers have specific needs and highlights issues ranging from identifying "real word" errors to proposing a correct assumption.

Systems dedicated to dyslexics usually avoid the typing constraint and focus on spoken query transcription. Even if this requires less effort from the user, it involves more hardware while discarding typing habits, by encouraging users to use voice instead of text. The first work to computationally model the errors of dyslexic spellers started with [16], who claims that dyslexia simply involves worse errors than those of normal spellers. Error modelling techniques have been proposed, but they all assume a correct word segmentation of the sentence. [7] implement for each word an automaton based on confusions learned from a model of error causes; this technique assumes isolated and regular errors. [24] also considers the mistyping of dyslexic spellers to be worse than that of regular spellers, and introduces a user-specific model which provides results as accurate as commercial spell checkers; this study shows that such systems collapse on dyslexic typing. [22] concentrates on the detection and correction of real-word errors, using syntactic and semantic context.

A sentence-level rewriting system mainly based on phonetics can provide for the detection of both word segmentation errors and real-word errors. Automatic speech recognition systems address this disambiguation issue with language models.
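The word-by-word behaviour of classical spell checkers described above can be made concrete with a small sketch: ranking lexicon entries by Levenshtein edit distance. The tiny lexicon is purely illustrative; the point is that a segmentation error such as labepier (for l'abbé Pierre) stays far from every single-word candidate, so no per-word suggestion list can recover it.

```python
# Minimal sketch of word-by-word correction as done by classical spell
# checkers: rank lexicon entries by Levenshtein edit distance.
# The lexicon below is a toy, not a real dictionary.

def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb))) # substitution
        prev = cur
    return prev[-1]

def suggest(word, lexicon, n=3):
    """Suggestion list for one out-of-vocabulary word."""
    return sorted(lexicon, key=lambda w: levenshtein(word, w))[:n]

lexicon = ["quel", "age", "a", "la", "abbe", "pierre"]
# A segmentation error like "labepier" (three words glued together)
# cannot be repaired by any single-word suggestion:
print(suggest("labepier", lexicon))
```

This is exactly why the paper moves to sentence-level processing: the unit of correction must be the whole phoneme stream, not the typed word.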

3.3 Whole sentence based approach

3.3.1 Phonetic interpretation

The phonological route can be simulated by phonetisation and transcription tools sequentially applied to the initial text. Automatic speech recognition tools are designed to process a continuous stream of phonemes without word segmentation, since the audio signal does not provide any. This is a solution to the erroneous word segmentation issue.

The phonetisation is done with the LIA_Phon tool [3]. This phonetiser combines a phonetic lexicon (for in-vocabulary words) with rules (for out-of-vocabulary words) that are robust to misspellings. This step transforms a letter sequence into a phoneme sequence (a linear graph).

The phonetiser implements academic rules of how words should be pronounced, but in practice many people mispronounce some vowels in French, confusing open and closed or short and long vowels. This pronunciation confusion is also reflected in writing confusions (like living instead of leaving). That is why alternative phonetic hypotheses must be generated on the basis of a confusion matrix. This step finally provides a lattice of phonetic hypotheses.

Following finite state machine transcription work [18], and with the help of the AT&T FSM toolkit [19], the phonetic lattice is encoded as a finite state automaton (FSA) and composed with a language model automaton learned on a journalistic corpus [1] and with a phonetic lexicon transducer. The N-best paths in this graph are the possible rewritings of the question.

Figure 1: Grapho-phonemic processing of a question. (The original figure shows the pipeline: the initial question goes through the Aspell spell checker to build a word hypothesis lattice, through LIA_Phon to build a phoneme lattice, and is then composed, via FSM composition, with the phonetic lexicon transducer and the language model lattice to produce the hypothesised questions.)

This method is accurate on parts of the sentence where mistyping does not involve phoneme additions or omissions. Transcriptions are weighted with the sum of the transition costs along the corresponding best path in the language model, which provides a clue to the linguistic coherence of the sentence.
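The N-best extraction just described can be sketched in miniature as a uniform-cost search over a toy word lattice in the tropical semiring (costs add along a path, lower is better). All states, words and costs below are illustrative stand-ins for the composed lattice:

```python
import heapq

# Sketch of N-best path extraction over a small word lattice, as done
# after composing the phonetic lattice with the language model.
# Arcs: state -> list of (next_state, word, cost). Toy values only.
LATTICE = {
    0: [(1, "quel", 0.0), (1, "gel", 0.5)],
    1: [(2, "age", 0.0), (2, "aj", 1.0)],
    2: [(3, "a", 0.0)],
}
FINAL = 3

def n_best(lattice, start, final, n=3):
    """Uniform-cost search returning up to n cheapest word sequences."""
    heap = [(0.0, start, [])]
    results = []
    while heap and len(results) < n:
        cost, state, words = heapq.heappop(heap)
        if state == final:
            results.append((cost, words))
            continue
        for nxt, word, c in lattice.get(state, []):
            heapq.heappush(heap, (cost + c, nxt, words + [word]))
    return results

for cost, words in n_best(LATTICE, 0, FINAL):
    print(round(cost, 2), " ".join(words))
```

Real toolkits such as the AT&T FSM library perform the same shortest-path computation over much larger composed automata; this sketch only shows the semantics of the cost accumulation.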

3.3.2 Spell checker hypotheses

Some mistyping errors, such as omissions or substitutions, may also be made by dyslexic users, and they cannot be processed by the phonetic interpretation, which produces incoherent sentences when they appear. A way of preventing these few misspellings from compromising the phonetic lattice is to add graphemic hypotheses on isolated words. These hypotheses can be obtained with a classical spelling corrector, which increases the robustness of the method.

[28] introduces an approach combining phone and letter models on words. This approach is based on the noisy channel model [4], and shows improvements. The GNU project Aspell spell checker (http://aspell.sourceforge.net/) implements both a phonetic distance and a Levenshtein edit distance. Its evaluation results, especially in bad-spellers mode, have been shown to be higher than those of classical spell checkers. This approach seems the most appropriate for dyslexics, as phonetic errors are more frequent than inversions or substitutions. However, it assumes an accurate word segmentation, and cannot recover "real word" errors.

Consider for example the typed sentence kel aje a labepier ? (nearly equivalent to O auldiz abowpiair ?) instead of Quel âge a l'abbé Pierre ?, which means How old is abbot Pierre?

Figure 2: Lattice of hypothesised word corrections for kel aje a la Bepierre. (The automaton keeps each typed word at cost 0 — kel/0, aje/0, a/0, la/0, Bepierre/0 — and adds the spell-checker alternatives at cost 0.1: kil, bel, gel for kel; je, âge, âgé for aje; épierre, épierré, N'épierre for Bepierre.)

All alternative hypotheses, graphemic or phonemic, must be associated with a path cost in order to privilege the initial path. Indeed, in order to avoid producing new errors (essentially on proper nouns), the system must primarily suppose that the user writes well, and keep the initial production as an hypothesis. Hypothesised words H provided by the spell checker may be assigned a graphemic weight:

Wg(H) = f(d(H, I))    (5)

where f is a normalisation function of the distance d(H, I) between the hypothesised word H and the initially written word I. The distance can be a score provided by the spell checker. Phonetic alternatives H obtained from a confusion matrix may also be assigned transition costs on their alternative path:

Wp(H) = g(m(H, I))    (6)

where g is a normalisation function of the confusion score m(H, I) between the alternative and the initial phoneme (obtained from the confusion matrix).

3.3.3 Combination

In order to take advantage of both the sentence-level processing of the phonetic interpretation and the graphemic alternatives based on written words, the combination system uses the graphemic hypotheses for the construction of an enhanced phonetic lattice. Figure 1 illustrates the whole process.

Table 1: Top 3 rewritings of the typed sentence kel aje a la Bepierre

Sentence                    | Cost
quel âge a l'abbé pierre    | 45.7884293
quel âge à l'abbé pierres   | 46.495369
quel âge alla et pierre     | 48.4406662
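The construction of the word-hypothesis lattice of Figure 2, with the flat costs of Equations 5–8 (0 for the initial word, 0.1 for an alternative), can be sketched as follows. The alternatives dictionary stands in for Aspell's suggestion lists and is purely illustrative:

```python
# Sketch of the word-hypothesis lattice of Figure 2: each typed word is
# kept at cost 0 (Equation 5 with H = I) and spell-checker alternatives
# are added at a flat cost of 0.1. The ALTERNATIVES dict is a toy
# stand-in for Aspell's suggestion lists.

ALTERNATIVES = {
    "kel": ["kil", "bel", "gel"],
    "aje": ["je", "âge", "âgé"],
    "Bepierre": ["épierre", "épierré"],
}

def word_lattice(sentence, alternatives, alt_cost=0.1):
    """One lattice slot per typed word: a list of (hypothesis, cost)."""
    slots = []
    for word in sentence.split():
        hyps = [(word, 0.0)]  # always keep the initial production
        hyps += [(h, alt_cost) for h in alternatives.get(word, [])]
        slots.append(hyps)
    return slots

for slot in word_lattice("kel aje a la Bepierre", ALTERNATIVES):
    print(slot)
```

Any path through these slots is one candidate word sequence; the phonetisation step then expands each candidate into the phoneme lattice of Figure 3.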

Figure 3: Lattice of hypothesised phonemes for kel aje a la Bepierre. (The original figure shows the phoneme-level automaton, where open/closed vowel alternatives such as ei/ai carry confusion costs on their transitions.)

Figure 2 illustrates the graphemic hypotheses on words produced by Aspell, encoded in a finite state automaton whose transitions are the words and their associated costs. Figure 3 is the phonetic lattice resulting from the phonetisation of all sentences accepted by the preceding automaton. Table 1 contains the top 3 rewritings obtained when composing this lattice with a phonemes-to-words transducer and a language model. The first hypothesis is the correct sentence, at both the sense and syntax levels, while the other two are only partially correct.

In the following experiments the cost normalisation functions for alternative hypotheses are empirically set to:

f(d(H, I)) = 0 if H = I, 0.1 if H ≠ I    (7)

g(m(H, I)) = 0 if H = I, 0.1 if H ≠ I    (8)

The phoneme confusion matrix is restricted to confusions between open and closed vowels, and the graphemic alternatives are the first three hypotheses provided by Aspell in bad-spellers mode.

3.4 Extraction of correct sentences

The final lattice is obtained from the lattice of phonetic hypotheses, the acceptor of the language model, and the transducer of the phonetic lexicon. Each of these automata provides transition costs, which are summed to assign each possible path a cost. The final lattice is then a lattice of possible words with weighted transition links. The three most probable hypotheses according to these costs (usually called the 3-best paths) for the previous example, corresponding to Figure 3, are shown in Table 1. For this particular example the first proposed hypothesis is the correct rewriting of the sentence.

This approach has been evaluated on our 37 sentences typed by dyslexic children [25]. The word error rates of the best and 3-best hypotheses were obtained by measuring the minimal edit distance between any hypothesis and the correct sentence. Because it is computed on lemmatised sentences, the measure is called the lems error rate. The results were compared to the error rate of the initially typed sentences and to the error rate of the hypotheses obtained with the Aspell spell checker alone. The combined system based on the FSM framework shows good accuracy, decreasing the lems error rate from 51% to 20% when considering the 3 best hypotheses (while Aspell remains at a 31% lems error rate). Only 5.4% of the typed sentences were initially fully correct after lemmatisation. The first hypothesis provided by Aspell increases this score to 13.5%, while our system reaches 43.2% when considering only the first hypothesis. A further evaluation shows that the system performs better for some individuals and is quite inefficient for others.

The immediately following issue is how to use the hypotheses provided by the n-best paths in the composed lattice in information retrieval systems, and more specifically in question answering systems. An obvious solution would be to feed the n first rewritings to the system and assign confidence scores to the outputs, according to the cost of the hypothesis used and to the quality of the proposed answer, as proposed in [27] following the approaches introduced in Section 2.1.3. But on some occasions the correct rewriting is only visible when several hypotheses are merged, as different partially correct rewritings appear in different hypotheses. In these cases no hypothesis is fully correct and every answer proposed by the system is likely to be wrong.

The merged graph can instead be used to extract posterior probabilities for each word appearing in the graph. This can be achieved, for each word, by computing the ratio between the cost of all paths using this word and the cost of all paths, where the cost of several paths is the sum of their costs. Having a probability score for each possible word, we can generate a weighted vector of key terms of the question or query. Most information retrieval systems today can be adapted to take weighted keywords as input, and the model we introduced in Section 2 can handle such input for a question answering task.
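The posterior computation just described can be sketched over an n-best list. Note one assumption: the paper speaks of a ratio of path costs, and this sketch maps each cost to a score with exp(-cost) before normalising, which is the usual conversion from tropical-semiring costs to probabilities; the n-best data below is toy values shaped like Table 1.

```python
import math
from collections import defaultdict

# Sketch of turning an n-best list of rewritings into a weighted term
# vector via word posteriors. Costs are mapped to scores with exp(-cost)
# before normalising (an assumption on our part; the paper describes a
# ratio of path costs). The n-best list is illustrative.

NBEST = [
    (45.79, "quel age a l abbe pierre"),
    (46.50, "quel age a l abbe pierres"),
    (48.44, "quel age alla et pierre"),
]

def word_posteriors(nbest):
    """P(word) = mass of paths containing the word / total mass."""
    base = min(c for c, _ in nbest)  # rescale costs for numerical stability
    total = 0.0
    mass = defaultdict(float)
    for cost, sent in nbest:
        score = math.exp(-(cost - base))
        total += score
        for word in set(sent.split()):
            mass[word] += score
    return {w: m / total for w, m in mass.items()}

post = word_posteriors(NBEST)
print({w: round(p, 2) for w, p in sorted(post.items())})
```

Words shared by all hypotheses (quel, age) get posterior 1.0, while words supported by only one weak hypothesis (et, pierres) get small weights — exactly the shape of the vectors evaluated in the next section.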

P p(m, Qpond ) m∈Q dens(Qpond , Qcorrect ) = P correct m∈Qpond p(m, Qpond ) P m∈Qcorrect p(m, Qpond ) pres(Qpond , Qcorrect ) = |m ∈ Qcorrect |

EVALUATION OF PROBABILITY VECTORS

Appearance of initial terms ininitiaux the vector indice derating présence deskey mots-clés (recall) (rappel)

4.

according to Equation 10.

In the particular context of information retrieval we usually focus on lemmas instead of terms. This leads to a tolerant evaluation regarding grammatical issues. This consideration is mostly useful in French rather than in English because lots of grammatical markers are never pronounced, which leads to several hypothesis only differentiated by grammatical flexions. Following these specific needs, we conduced an evaluation specific to the information retrieval task instead of a classical spell checking evaluation. The evaluation has been done for both considered outputs of the system. In this evaluation we compare for each sentence the weighted vector obtained with posterior probabilities with vectors extracted from correct sentence. The reference for correct sentence has been obtained manually. The focus of the typed question being known there were no remaining ambiguity in the choice of correct words. We still made a cross validation by two different persons to be sure there is an agreement on the correct form of intended queries.

a ˆge 0.98

a ` 0.98

et 0.06

la 0.06

abb´e 0.94

pierre 0.64

(10)

1,00 0,90 0,80 0,70 0,60 0,50 0,40 0,30 0,20 0,10 0,00 0,00

0,20

0,40

0,60

0,80

1,00

Densitéofdes mots-clés dansinlathe requête Density correct key terms vector(précision) (precision)

Figure 4: Evaluation of each vector of weighted terms according its density and its appearance rating The graph on Figure 4 shows the measures of appearance rating and density for the vectors of weighted terms obtained from each of the questions typed by the dyslexic children. Exception made of a couple of vectors with very low indices, most of them reaches at least 0.5 for both measures. More than one third of them have a density higher than 0.8. We hypothesize that when the appearance rating is high a density value over .5 is good enough as we hope that the cohesion between the words from the query inside documents will support the disambiguation of added terms. The results showing medium rates for both measures should be correlated with some IR results to find out how we can consider their quality.

The evaluation of vectors of weighted terms has to take into consideration both correct terms, wrong terms and missing terms. Table 2 provides an example of a vector of weighted terms for the previously studied example sentence. Even if these vectors have the same form as vectors of expanded queries, it is clear that wrong words inserted have an immediate negative impact on the query. This impact should be evaluated according to their weight in the vector. So far we are only interested in evaluating these vectors as themselves instead of evaluating them in the context of an information retrieval task where many other parameters such as the stability of the used system can interfere with results. quel 0.98

(9)

pierres 0.29

5.

CONCLUSION AND PERSPECTIVES

We proposed in this paper a full way of dealing with questions typed by dyslexic users, from the basis of the model to the technical solutions. The evaluation has been conduced with measures that integrate the uncertainty level of the extracted terms on a set of questions typed by dyslexic children. The approach we use for rewriting the question has previously show some good improvements upon the use of a single spell checker when considering its first proposed sentences. We show here that the outputs of the system can be used altogether in order to form vectors that contains most of the words intended by the dyslexic spellers. These results are really promising even though they have to be confirmed in the complete processing from the question to the answers proposed by a system implementing our model. This will be achieved thanks to the new open-source question-answering system OpenEphyra (http://www.ephyra.info/).

Table 2: Vector of weighted terms corresponding to the sentence "kel aje a la bépierre"

    quel     0.98
    ...
    pierres  0.29

We propose to evaluate the quality of the vectors of weighted terms according to two criteria inspired by precision and recall. The density of correct non-empty lemmas Qcorrect in the query vector Qpond is a measure related to precision. The weight of a term m in the vector of weighted terms, p(m, Qpond), is defined according to its posterior probability; this probability is null if the word does not belong to the vector. The density, given in Equation 9, measures the ratio between the sum of the weights of the correct terms in the weighted vector and the sum of the weights of all the terms it contains:

    dens(Qpond, Qcorrect) = Σ_{m ∈ Qcorrect} p(m, Qpond) / Σ_{m ∈ Qpond} p(m, Qpond)    (9)

The appearance rating is a second measure, related to recall. It is meant to estimate how well the correct lemmas are represented in the vector of weighted terms. The appearance rating pres(Qpond, Qcorrect) of the correct key words in the vector of weighted terms is computed

The assumption that the correct words tend to be grouped in the same documents has to be verified. If it shows good results, we will be able to modify the probabilities of the vector with a second a posteriori phase taking into account the correlation between the words. This can be achieved thanks to semantic space models such as Latent Semantic Analysis [6]. The rewriting system works much better for some children than for others, which leads us to think that it is better adapted to a particular form of dyslexia. It could be used as a diagnostic system to estimate the severity of the pathology, and maybe to track or detect potential dyslexic users on Internet forums.
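A minimal sketch of how such a second phase could measure word correlation in a semantic space: a toy term-document matrix and a plain truncated SVD stand in here for an LSA space built from a large corpus, and the vocabulary and counts are invented for the example.

```python
import numpy as np

# Toy term-document co-occurrence matrix (rows = terms, cols = documents).
terms = ["quel", "age", "pierre", "caillou", "velo"]
X = np.array([[2., 1., 0., 1.],
              [2., 1., 0., 0.],
              [1., 2., 0., 0.],
              [0., 1., 2., 0.],
              [0., 0., 0., 3.]])

# LSA: a truncated SVD projects terms into a low-dimensional latent space.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
term_vectors = U[:, :k] * s[:k]  # latent representation of each term

def similarity(a, b):
    """Cosine similarity between two terms in the latent space."""
    va = term_vectors[terms.index(a)]
    vb = term_vectors[terms.index(b)]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))
```

Terms of the vector that are strongly correlated with the rest of the query in such a space could see their posterior probabilities reinforced, while isolated terms, likely to be wrongly inserted, could be down-weighted.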

The model we propose can also be extended to allow the integration of uncertainty on the document side. This could be used to retrieve information from forums and blogs. A few adaptations may be needed in order to extract sentence hypotheses from texts written in SMS language. The main specificity of this language is that letters and numbers can be pronounced as themselves to obtain the right phonology of the sentence (as in "How R U?" or "Im l8t"). There can also be missing spaces between the typed words [8].

6. REFERENCES

[1] C. Allauzen and M. Mohri. The design principles and algorithms of a weighted grammar library. International Journal of Foundations of Computer Science, 16(3):403–421, 2005.
[2] G. Amati, C. Carpineto, and G. Romano. Query difficulty, robustness and selective application of query expansion. In Proceedings of ECIR'04, Lecture Notes in Computer Science, pages 127–137, Sunderland, 2004. Springer.
[3] F. Béchet. LIA_PHON : un système complet de phonétisation de textes. Traitement Automatique des Langues (T.A.L.), 42(1), 2001.
[4] E. Brill and R. C. Moore. An improved error model for noisy channel spelling correction. In Proceedings of the 38th Annual Meeting of the ACL, pages 286–293, 2000.
[5] S. Cronen-Townsend, Y. Zhou, and W. B. Croft. Predicting query performance. In Proceedings of SIGIR'02, pages 299–306. ACM, August 2002.
[6] S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391–407, 1990.
[7] S. Deorowicz and M. G. Ciura. Correcting spelling errors by modelling their causes. International Journal of Applied Mathematics and Computer Science, 15(2):275–285, 2005.
[8] C. Fairon and S. Paumier. A translated corpus of 30,000 French SMS. In Proceedings of LREC 2006, Genoa, Italy, May 2006.
[9] E. A. Fox and J. A. Shaw. Combination of multiple searches. In Proceedings of the 2nd Text REtrieval Conference (TREC-2), pages 243–252, 1994.
[10] J. Gao, H. Qi, X. Xia, and J.-Y. Nie. Linear discriminant model for information retrieval. In Proceedings of SIGIR'05, pages 290–297, 2005.
[11] G. T. Gillon. Phonological Awareness: From Research to Practice. Guilford Press, 2004.
[12] J. Grivolla, P. Jourlin, and R. De Mori. Automatic classification of queries by expected retrieval performance. In Proceedings of SIGIR'05, Salvador, 2005. ACM Press.
[13] A. James and E. Draffan. The accuracy of electronic spell checkers for dyslexic learners. PATOSS Bulletin, August 2004.
[14] K. L. Kwok. An attempt to identify weakest and strongest queries. In Proceedings of SIGIR'05, Salvador, 2005. ACM Press.
[15] D. Lillis, F. Toolan, R. Collier, and J. Dunnion. ProbFuse: a probabilistic approach to data fusion. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Seattle, Washington, USA, 2006. ACM Press.
[16] R. P. W. Loosemore. A neural net model of normal and dyslexic spelling. In International Joint Conference on Neural Networks, volume 2, pages 231–236, Seattle, USA, 1991.
[17] C. de Loupy and P. Bellot. Evaluation of document retrieval systems and query difficulty. In Proceedings of the LREC 2000 Satellite Workshop "Using Evaluation within HLT Programs: Results and Trends", pages 31–38, Athens, 2000.
[18] M. Mohri, F. C. N. Pereira, and M. Riley. Weighted finite-state transducers in speech recognition. Computer Speech and Language, 16(1):69–88, 2002.
[19] M. Mohri, F. C. N. Pereira, and M. D. Riley. AT&T FSM Library – Finite-State Machine Library, 1997.
[20] J. Mothe and L. Tanguy. Linguistic features to predict query difficulty: a case study on previous TREC campaigns. In Proceedings of SIGIR'05, pages 7–10, Salvador, 2005. ACM Press.
[21] J.-Y. Nie. CLIR as query expansion as logical inference. Technology Letters, 4(1):69–76, 2000.
[22] J. Pedler. The detection and correction of real-word spelling errors in dyslexic text. In Proceedings of the 4th Annual CLUK Colloquium, 2001.
[23] S. E. Robertson, C. J. van Rijsbergen, and M. F. Porter. Probabilistic models of indexing and searching. In 3rd Annual ACM Conference on Research and Development in Information Retrieval, pages 35–36, Cambridge, England, 1980.
[24] R. Spooner. A spelling checker for dyslexic users: user modelling for error recovery. PhD thesis, Human Computer Interaction Group, Department of Computer Science, University of York, Heslington, York, September 1998.
[25] L. Sitbon, P. Bellot, and P. Blache. Phonetic based sentence level rewriting of questions typed by dyslexic spellers in an information retrieval context. In Proceedings of Interspeech 2007, Antwerp, Belgium, September 2007.
[26] L. Sitbon, P. Bellot, and P. Blache. A corpus of real-life questions for evaluating robustness of QA systems. In Proceedings of the 6th Language Resources and Evaluation Conference (LREC 2008), Marrakech, Morocco, May 2008.
[27] L. Sitbon, L. Gillard, J. Grivolla, P. Bellot, and P. Blache. Vers une prédiction automatique de la difficulté d'une question en langue naturelle. In 13ème conférence Traitement Automatique des Langues Naturelles (TALN), pages 337–346, Louvain, Belgium, April 2006.
[28] K. Toutanova and R. C. Moore. Pronunciation modeling for improved spelling correction. In Proceedings of the 40th Annual Meeting of the ACL, pages 144–151, Philadelphia, July 2002.
[29] C. C. Vogt and G. W. Cottrell. Fusion via a linear combination of scores. Information Retrieval, 1(3):151–173, 1999.
[30] E. M. Voorhees and D. Harman. Overview of the Eighth Text REtrieval Conference (TREC-8). In Proceedings of the Eighth Text REtrieval Conference, pages 1–24, Gaithersburg, Maryland, USA, November 1999.
[31] P. Wolf and B. Raj. The MERL SpokenQuery information retrieval system. In IEEE International Conference on Multimedia and Expo (ICME), volume 2, pages 317–320, August 2002.