Statistical Methods for Machine Translation

Stephan Vogel, Franz Josef Och, S. Nießen, H. Sawaf, C. Tillmann, Hermann Ney
Lehrstuhl für Informatik VI, RWTH Aachen, Germany

May 30, 2000

Abstract. In this article we describe the statistical approach to machine translation as implemented in the stattrans-module of the VERBMOBIL system.

1 Introduction

In this paper, we describe the present status of the machine translation approach developed at RWTH Aachen and report experimental results obtained for the Verbmobil task. The ultimate goal of this task is spontaneous speech translation, as opposed to text translation. Experimental results are reported for both text and speech input. There are several characteristic features addressed by our system and in this paper:

– In the Verbmobil task, the translation direction from German to English poses special problems due to the big difference in the word order of the German and English verb groups. In addition, there are word compounds in the German language, like Geschäftsreise for business trip, that require refined alignment models.
– The bilingual corpus is a transcription of spontaneously spoken sentences. Thus, it exhibits the typical phenomena of spontaneous speech, such as high variability of the syntactic structures and hesitations.
– We use a comparatively small amount of bilingual training data, namely about 500 000 running words for a vocabulary of 10 000 words (in the source language).

2 The Statistical Approach to Translation

2.1 Principle

The goal is the translation of a text given in some source language into a target language. We are given a source string $f_1^J = f_1 \ldots f_j \ldots f_J$, which is to be translated into a target string $e_1^I = e_1 \ldots e_i \ldots e_I$. In this paper, the term word always refers to a full-form word.

[Figure 1: block diagram. Source Language Text → Transformation → $f_1^J$ → Global Search: maximize $\Pr(e_1^I) \cdot \Pr(f_1^J \mid e_1^I)$ over $e_1^I$, using the Lexicon Model and the Alignment Model for $\Pr(f_1^J \mid e_1^I)$ and the Language Model for $\Pr(e_1^I)$ → Transformation → Target Language Text.]

Figure 1. Architecture of the translation approach based on Bayes decision rule.

Among all possible target strings, we will choose the string with the highest probability, which is given by Bayes' decision rule (Brown et al., 1993):

$$\hat{e}_1^I = \operatorname*{argmax}_{e_1^I} \left\{ \Pr(e_1^I \mid f_1^J) \right\} \qquad (1)$$
$$\;\;\; = \operatorname*{argmax}_{e_1^I} \left\{ \Pr(e_1^I) \cdot \Pr(f_1^J \mid e_1^I) \right\}. \qquad (2)$$

$\Pr(e_1^I)$ is the language model (LM) of the target language, whereas $\Pr(f_1^J \mid e_1^I)$ is

the string translation model. The argmax operation denotes the search problem, i.e. the generation of the output sentence in the target language. The overall architecture of the statistical translation approach is summarized in Figure 1. In general, as shown in this figure, there may be additional transformations to make the translation task simpler for the algorithm. The transformations may range from the categorization of single words and word groups to more complex preprocessing steps that require some parsing of the source string. We have to keep in mind that in the search procedure both the language and the translation model are applied after the text transformation steps. However, to keep the notation simple, we will not make this explicit distinction in the subsequent exposition.

2.2 Basic Alignment Models

A key issue in modeling the string translation probability $\Pr(f_1^J \mid e_1^I)$ is the question of how we define the correspondence between the words of the target sentence and the words of the source sentence.

[Figure 2: alignment matrix between the German sentence 'ja ich denke wenn wir das hinkriegen an beiden Tagen acht Uhr' and the English sentence 'well I think if we can make it at eight on both days'.]

Figure 2. Manual alignment.

In typical cases, we can assume a sort of pairwise dependence by considering all word pairs $(f_j, e_i)$ for a given sentence pair $[f_1^J; e_1^I]$. Here, we will further constrain this model by assigning each source word to exactly one target word. Later, this requirement will be relaxed. Models describing these types of dependencies are referred to as alignment models (Brown et al., 1993; Dagan et al., 1993; Kay and Röscheisen, 1993; Vogel et al., 1996).

When aligning the words in parallel texts (for Indo-European language pairs like Spanish-English, French-English, Italian-German, ...), we typically observe a strong localization effect. Figure 2 illustrates this effect for the language pair German-English. In many cases, although not always, there is an even stronger restriction: over large portions of the source string, the alignment is monotone. To arrive at a quantitative specification, we first define the

alignment mapping: $j \to i = a_j$,

which assigns a word $f_j$ in position $j$ to a word $e_i$ in position $i = a_j$. The concept of these alignments is similar to the alignments introduced by Brown et al. (1993). By looking at such alignments, it is evident that the mathematical model should try to capture the strong dependence of $a_j$ on the preceding alignment. Therefore, for our ultimate model, the probability of alignment $a_j$ for position $j$ should have a dependence on the previous alignment position $a_{j-1}$:

$$p(a_j \mid a_{j-1}, I, J).$$

We can rewrite the probability by introducing the 'hidden' alignments $a_1^J := a_1 \ldots a_j \ldots a_J$ for each sentence pair $[f_1^J; e_1^I]$:

$$\Pr(f_1^J \mid e_1^I) = p(J \mid I) \cdot \sum_{a_1^J} \Pr(f_1^J, a_1^J \mid e_1^I)$$
$$= p(J \mid I) \cdot \sum_{a_1^J} \prod_{j=1}^{J} \Pr(f_j, a_j \mid f_1^{j-1}, a_1^{j-1}, e_1^I)$$
$$= p(J \mid I) \cdot \sum_{a_1^J} \prod_{j=1}^{J} p(f_j, a_j \mid a_{j-1}, e_1^I)$$
$$= p(J \mid I) \cdot \sum_{a_1^J} \prod_{j=1}^{J} \left[ p(a_j \mid a_{j-1}, I, J) \cdot p(f_j \mid e_{a_j}) \right],$$

where we have included a sentence length probability $p(J \mid I)$. In the last two equations, the dependence has been confined to a first-order dependence. Putting everything together, we have the following ingredients:

– the sentence length probability $p(J \mid I)$, which is included here for completeness, but can be omitted without loss of performance;
– the lexicon probability $p(f \mid e)$;
– the alignment probability $p(a_j \mid a_{j-1}, I, J)$, which here has been chosen as a first-order model.

Rather than a first-order dependence, we can also use a zero-order model $p(a_j \mid j, I, J)$, where there is only a dependence on the absolute position index $j$ of the source string. For this zero-order model, it can be shown (Brown et al., 1993) that we have the following identity:

$$\Pr(f_1^J \mid e_1^I) = p(J \mid I) \cdot \sum_{a_1^J} \prod_{j=1}^{J} \left[ p(a_j \mid j, I, J) \cdot p(f_j \mid e_{a_j}) \right]$$
$$= p(J \mid I) \cdot \prod_{j=1}^{J} \sum_{i=1}^{I} \left[ p(i \mid j, I, J) \cdot p(f_j \mid e_i) \right].$$

The sum in the last equation can be interpreted as a mixture-type distribution with mixture weights $p(i \mid j, I, J)$ and with component distributions $p(f_j \mid e_i)$ that model the pairwise dependencies between $f_j$ and $e_i$. Except for the missing "empty word", this model is identical to the so-called IBM-2 model (Brown et al., 1993). Assuming a uniform alignment probability

$$p(i \mid j, I, J) = \frac{1}{I}\,,$$

we arrive at the so-called IBM-1 model (Brown et al., 1993). The attractive property of the IBM-1 model is that, for maximum likelihood training (Brown et al., 1993), there is only one optimum and therefore the EM algorithm (Baum, 1972) always finds the global optimum.
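As an illustration of this training procedure, here is a minimal sketch of EM training for the IBM-1 model on a toy corpus. This is our own code, not the Verbmobil implementation; the empty word is omitted, as in the model variant described above, and all names are illustrative.

# Minimal sketch of IBM-1 EM training on a toy parallel corpus.
from collections import defaultdict

def train_ibm1(corpus, iterations=10):
    """corpus: list of (f_words, e_words) pairs; returns t[(f, e)] = p(f | e)."""
    # Uniform initialization over the source vocabulary.
    f_vocab = {f for fs, _ in corpus for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))

    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(f, e)
        total = defaultdict(float)   # expected counts c(e)
        for fs, es in corpus:
            for f in fs:
                # Posterior alignment probability under the uniform prior.
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: relative frequencies of the expected counts.
        for (f, e) in count:
            t[(f, e)] = count[(f, e)] / total[e]
    return t

corpus = [("das Haus".split(), "the house".split()),
          ("das Buch".split(), "the book".split())]
t = train_ibm1(corpus)
print(round(t[("Haus", "house")], 3))  # converges towards 1.0

Because the likelihood of this model has a single optimum, the result is independent of the initialization, which is exactly the attractive property noted above.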

3 Alignment Template Approach

A general deficiency of the baseline alignment models is that they are only able to model correspondences between single words. Therefore, we will now consider whole phrases rather than single words as the basis for the alignment models. In other words, a whole group of adjacent words in the source sentence may be aligned with a whole group of adjacent words in the target language. As a result, the context of words has a greater influence, and the changes in word order from source to target language can be learned explicitly.

3.1 The word level alignment: alignment templates

The key element of the extended translation model are the alignment templates. An alignment template $z$ is a triple $(\tilde{F}, \tilde{E}, \tilde{A})$ which describes the alignment $\tilde{A}$ between a source class sequence $\tilde{F}$ and a target class sequence $\tilde{E}$. The use of classes instead of the words themselves has the advantage of better generalization: if there exist classes in the source and target language which contain all towns, it is possible that an alignment template learned using one particular town can be generalized to all towns. The classes used in $\tilde{F}$ and $\tilde{E}$ are automatically trained bilingual classes obtained with the method described in Och (1999), and they constitute a partition of the vocabulary of the source and target language. The class functions $F$ and $E$ map words to their classes. The alignment $\tilde{A}$ is represented as a matrix with binary values. A matrix element with value 1 means that the words at the corresponding positions are aligned, and the value 0 means that the words are not aligned. If a source word is not aligned to a target word, it is aligned to the empty word $e_0$, which is placed at the imaginary position $i = 0$.

An alignment template $z = (\tilde{F}, \tilde{E}, \tilde{A})$ is applicable to a sequence of source words $\tilde{f}$ if the alignment template classes and the classes of the source words are equal: $F(\tilde{f}) = \tilde{F}$. The application of the alignment template $z$ constrains the target words $\tilde{e}$ to correspond to the target class sequence: $E(\tilde{e}) = \tilde{E}$.

The application of an alignment template does not determine the target words, but only constrains them. For the selection of words from classes we use a statistical model for $p(\tilde{e} \mid z, \tilde{f})$ based on the lexicon probabilities of a statistical lexicon $p(f \mid e)$. We assume a mixture alignment between the source and target language words constrained by the alignment matrix $\tilde{A}$:

$$p(\tilde{f} \mid (\tilde{F}, \tilde{E}, \tilde{A}), \tilde{e}) = \delta(E(\tilde{e}), \tilde{E}) \cdot \delta(F(\tilde{f}), \tilde{F}) \cdot \prod_{j=1}^{J} p(f_j \mid \tilde{A}, \tilde{e}) \qquad (3)$$

$$p(f_j \mid \tilde{A}, \tilde{e}) = \sum_{i=0}^{I} p(i \mid j, \tilde{A}) \cdot p(f_j \mid e_i) \qquad (4)$$

$$p(i \mid j, \tilde{A}) = \frac{\tilde{A}(i, j)}{\sum_{i'} \tilde{A}(i', j)} \qquad (5)$$
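To make Equations (4) and (5) concrete, the following sketch evaluates the constrained lexicon mixture for one source position. The matrix layout, toy lexicon, and all names are our own illustration, not the paper's notation made executable by the authors.

# Sketch of Equations (4)-(5): lexicon mixture constrained by the
# binary alignment matrix A of a template. Row i = 0 is the empty word.
def p_f_given_template(j, A, e_words, f_word, lex):
    """A[i][j] is 1 iff target position i links to source position j."""
    col_sum = sum(A[i][j] for i in range(len(e_words)))
    prob = 0.0
    for i, e in enumerate(e_words):
        if A[i][j]:
            # Mixture weight p(i | j, A) times lexicon probability p(f | e).
            prob += (A[i][j] / col_sum) * lex.get((f_word, e), 0.0)
    return prob

e_words = ["NULL", "the", "house"]   # position 0 = empty word e_0
A = [[0, 0], [1, 0], [0, 1]]         # "das" -> the, "Haus" -> house
lex = {("das", "the"): 0.9, ("Haus", "house"): 0.8}
print(p_f_given_template(1, A, e_words, "Haus", lex))  # 0.8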

3.2 The phrase level alignment

In order to describe the phrase level alignments in a formal way, we first decompose both the source sentence $f_1^J$ and the target sentence $e_1^I$ into a sequence of phrases ($k = 1, \ldots, K$):

$$f_1^J = \tilde{f}_1^K, \qquad \tilde{f}_k = f_{j_{k-1}+1}, \ldots, f_{j_k},$$
$$e_1^I = \tilde{e}_1^K, \qquad \tilde{e}_k = e_{i_{k-1}+1}, \ldots, e_{i_k}.$$

In order to simplify the notation and the presentation, we ignore the fact that there can be a large number of possible segmentations and assume that there is only one segmentation. In the previous section, we have described the alignment within the phrases. For the alignment $\tilde{a}_1^K$ between the source phrases $\tilde{f}_1^K$ and the target phrases $\tilde{e}_1^K$, we obtain the following equation:

$$\Pr(f_1^J \mid e_1^I) = \Pr(\tilde{f}_1^K \mid \tilde{e}_1^K) = \sum_{\tilde{a}_1^K} \Pr(\tilde{a}_1^K, \tilde{f}_1^K \mid \tilde{e}_1^K)$$
$$= \sum_{\tilde{a}_1^K} \Pr(\tilde{a}_1^K \mid \tilde{e}_1^K) \cdot \Pr(\tilde{f}_1^K \mid \tilde{a}_1^K, \tilde{e}_1^K)$$
$$= \sum_{\tilde{a}_1^K} \prod_{k=1}^{K} p(\tilde{a}_k \mid \tilde{a}_1^{k-1}, K) \cdot p(\tilde{f}_k \mid \tilde{e}_{\tilde{a}_k}) \; .$$

For the phrase level alignment we use a first-order alignment model $p(\tilde{a}_k \mid \tilde{a}_1^{k-1}, K) = p(\tilde{a}_k \mid \tilde{a}_{k-1}, K)$, which is in addition constrained to be a permutation of the $K$ phrases. For the translation of one phrase, we introduce the alignment template as an unknown variable:

$$p(\tilde{f} \mid \tilde{e}) = \sum_{z} p(z \mid \tilde{e}) \cdot p(\tilde{f} \mid z, \tilde{e}) \qquad (6)$$

The probability $p(z \mid \tilde{e})$ of applying an alignment template is estimated by relative frequencies (see next section). The probability $p(\tilde{f} \mid z, \tilde{e})$ is decomposed by Equation (3).

3.3 Training

The training of the alignment templates requires the following steps. First, we train two word-based alignment models for the two translation directions $f \to e$ and $e \to f$ by applying the EM algorithm. In this step, any of the basic alignment models can be used. For each translation direction we then calculate the Viterbi alignment of the translation models determined in the previous step. Thus we get two alignment vectors $a_1^J$ and $b_1^I$ for each sentence pair.

We increase the quality of the alignments by combining the two alignment vectors into one alignment matrix using the following method. Let $A_1 = \{(a_j, j) \mid j = 1, \ldots, J\}$ and $A_2 = \{(i, b_i) \mid i = 1, \ldots, I\}$ denote the sets of links in the two Viterbi alignments. In a first step, the intersection $A = A_1 \cap A_2$ is determined. The elements within $A$ are justified by both Viterbi alignments and are therefore very reliable. We now extend the alignment $A$ iteratively by adding links $(i, j)$ occurring only in $A_1$ or in $A_2$ if they have a neighboring link already in $A$ or if neither the word $f_j$ nor the word $e_i$ is aligned in $A$. The alignment $(i, j)$ has the neighboring links $(i-1, j)$, $(i, j-1)$, $(i+1, j)$, and $(i, j+1)$. This enhanced alignment is now used to obtain the parameters required for the alignment template approach.
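A minimal sketch of this combination heuristic, assuming the two Viterbi alignments are given as sets of $(i, j)$ links; the function name and iteration order are our own illustration:

# Sketch of the alignment combination: start from the intersection and
# iteratively add links from the union that either touch a neighboring
# link or connect two still-unaligned words.
def combine_alignments(A1, A2):
    A = A1 & A2                      # reliable links, justified twice
    candidates = (A1 | A2) - A
    added = True
    while added:
        added = False
        for (i, j) in sorted(candidates - A):
            neighbors = {(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)}
            has_neighbor = bool(neighbors & A)
            f_unaligned = all(jj != j for (_, jj) in A)
            e_unaligned = all(ii != i for (ii, _) in A)
            if has_neighbor or (f_unaligned and e_unaligned):
                A.add((i, j))
                added = True
    return A

A1 = {(0, 0), (1, 1)}
A2 = {(0, 0), (2, 1)}
print(sorted(combine_alignments(A1, A2)))  # [(0, 0), (1, 1), (2, 1)]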

The bilingual word lexicon $p(f \mid e)$ is estimated by the relative frequencies of the alignment determined in the previous step:

$$p(f \mid e) = \frac{n_A(f, e)}{n(e)} \qquad (7)$$

Here $n_A(f, e)$ is the frequency with which the word $e$ is aligned to $f$, and $n(e)$ is the frequency of $e$ in the training corpus.

We determine correlated bilingual word classes for the source and target language by using the method described in Och (1999). The basic idea of this method is to apply a maximum-likelihood approach to the joint probability of the parallel training corpus. The resulting optimization criterion for the bilingual word classes is similar to the one used in monolingual maximum-likelihood word clustering.

Finally, we have to collect the alignment templates. To do so, we count all phrase pairs of the training corpus which are consistent with the enhanced alignment matrix. A phrase pair is consistent with the alignment if the words within the source phrase are aligned only to words within the target phrase. Thus we obtain a count $n(z)$ of how often an alignment template occurred in the aligned training corpus. The probability of using an alignment template needed by Equation (6) is estimated by relative frequency:

$$p(z = (\tilde{F}, \tilde{E}, \tilde{A}) \mid \tilde{e}) = \frac{n(z) \cdot \delta(\tilde{E}, E(\tilde{e}))}{n(E(\tilde{e}))} \qquad (8)$$

Figure 3 shows some of the extracted alignment templates. The extraction algorithm does not perform a selection of good or bad alignment templates; it simply extracts all possible alignment templates. In the figure, only the maximal alignment templates are shown. Other, smaller templates extracted from that alignment include: 'wie sieht es' – 'how about', 'am neunzehnten' – 'the nineteenth', and 'nachmittags' – 'in the afternoon'.

[Figure 3: word alignment matrix between the German sentence 'okay , wie sieht es am neunzehnten aus , vielleicht um zwei Uhr nachmittags ?' and the English sentence 'okay , how about the nineteenth , maybe at two o'clock in the afternoon ?', with the learned alignment templates marked as boxes.]

Figure 3. Example of a word alignment and some learned alignment templates.
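The consistency criterion used when collecting the alignment templates can be sketched as follows. This is our own illustration, not the original extraction code; the link representation and the maximal span length are assumptions.

# Sketch of the consistency check: a phrase pair is kept iff the words
# inside the source span are aligned only to words inside the induced
# target span (no link leaves the box).
def extract_phrase_pairs(links, J, max_len=5):
    """links: set of (i, j) pairs; yields ((j1, j2), (i1, i2)) spans."""
    pairs = []
    for j1 in range(J):
        for j2 in range(j1, min(j1 + max_len, J)):
            # Target positions linked to the source span [j1, j2].
            tgt = [i for (i, j) in links if j1 <= j <= j2]
            if not tgt:
                continue
            i1, i2 = min(tgt), max(tgt)
            if all(j1 <= j <= j2 for (i, j) in links if i1 <= i <= i2):
                pairs.append(((j1, j2), (i1, i2)))
    return pairs

links = {(0, 0), (1, 2), (2, 1)}     # a small reordered alignment
print(extract_phrase_pairs(links, J=3))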

3.4 Search

For decoding we use the following search criterion:

$$\operatorname*{argmax}_{e_1^I} \left\{ p(e_1^I) \cdot p(e_1^I \mid f_1^J) \right\} \qquad (9)$$

This decision rule is an approximation to Equation (1), which would use the translation probability $p(f_1^J \mid e_1^I)$. Using this simplification, it is easy to integrate the translation and language model in the search process, as both models predict target words. As experiments have shown, this simplification does not affect the quality of the translation results. To allow for the influence of long contexts, we use a class-based five-gram language model with backing-off.

Figure 4 shows the decisions taken during the search process. First, the source sentence words are mapped onto their word classes. Those alignment templates matching part of the word class sequence are selected. Reorderings of the alignment templates are possible to allow for global word reordering. The alignment templates generate a sequence of target word classes. In the final step, the actual word sequence is generated. During this step, the target language model and the lexicon probabilities are taken into account to score the translation hypothesis. In Figure 4, the symbol '$' denotes the sentence start/end marker.

[Figure 4: schematic of the search decisions for an example sentence: the source words f1 ... f6 are mapped onto word classes, covered by alignment templates z1 ... z4 (possibly reordered), which generate a sequence of target word classes and finally the target words e1 ... e6, delimited by the sentence markers $.]

Figure 4. Decisions during the search process.

In search we produce partial hypotheses, each of which contains the following information:

1. the last target word produced,
2. the state of the language model (the classes of the last four target words),
3. a bit-vector representing the already covered positions of the source sentence,
4. a reference to the alignment template instantiation which produced the last target word,
5. the position of the last target word in the alignment template instantiation,
6. the accumulated costs (the negative logarithm of the probabilities) of all previous decisions,
7. a reference to the previous partial hypothesis.

A partial hypothesis is extended by appending one target word. The set of all partial hypotheses can be structured as a graph, with a source node representing the sentence start, leaf nodes representing full translations, and intermediate nodes representing partial hypotheses. We recombine partial hypotheses which can be distinguished by neither the language model nor the translation model: when elements 1-5 of two partial hypotheses do not allow to distinguish between them, the hypothesis with the higher costs can be dropped for the subsequent search process.

We also use beam search in order to handle the huge search space. In beam search we compare hypotheses which cover different parts of the input sentence, which makes the comparison of the costs somewhat problematic. Therefore we integrate an (optimistic) estimation of the remaining costs needed to arrive at a full translation. This can be done efficiently by determining in advance, for each word in the source language sentence, a lower bound for the costs of the translation of this word. Together with the bit-vector stored in a partial hypothesis, this allows an efficient estimation of the remaining costs.
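A minimal sketch of such a partial hypothesis and of the recombination test on elements 1-5; the field names are our own illustration, not taken from the Verbmobil implementation.

# Sketch of a partial search hypothesis mirroring items 1-7 above.
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Hypothesis:
    last_word: str                        # 1. last target word produced
    lm_state: Tuple[int, ...]             # 2. classes of the last four target words
    coverage: int                         # 3. bit-vector over source positions
    template_id: int                      # 4. alignment template instantiation
    template_pos: int                     # 5. position within that instantiation
    cost: float                           # 6. accumulated negative log-probability
    predecessor: Optional["Hypothesis"]   # 7. back-pointer to the previous hypothesis

    def recombination_key(self):
        # Hypotheses with equal keys cannot be distinguished by the
        # language or translation model; only the cheapest one is kept.
        return (self.last_word, self.lm_state, self.coverage,
                self.template_id, self.template_pos)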

4 System Integration

The statistical approach to machine translation is embodied in the stattrans-module, which is integrated into the Verbmobil system. The implementation allows for translating from German to English and from English to German. In normal processing mode, the stattrans-module gets its input from the repair-module. At that time, the word lattices and best hypotheses from the speech recognition systems have been prosodically annotated. Translation is performed on the best hypothesis of the recognizer.

The prosodic boundaries and mode information are utilized by stattrans using very simple heuristics. If there is a major phrase boundary, a full stop or question mark is inserted into the word sequence, depending on the sentence mode as indicated by the prosody-module. Additional commas are inserted for other types of segment boundaries. As the prosody-module gives probabilities for segment boundaries, thresholds are used to decide whether the sentence marks are to be inserted. These thresholds were selected to give on average a good segmentation of the input. The segment boundaries restrict possible word reorderings between source and target language. This not only improves translation quality but also restricts the search space, thereby speeding up translation.

The output of the stattrans-module is the translation as a plain string together with a confidence measure required by the selection-module. The score for the translation, which results from the accumulation of the log-probabilities from the different knowledge sources combined in the search process, is normalized to the sentence length and mapped into the interval [0, 100].

The stattrans-module uses approximately 200 MB of memory, mainly to store the alignment templates, the lexicons and the language models for the two translation directions. Translation speed is very high, typically only a few tenths of a second even for longer turns. In the overall Verbmobil system, the processing time used by the stattrans-module is about 2%.

5 Experimental Results

5.1 The Task and the Corpus

The statistical translation approach was tested on the Verbmobil corpus. The transliterations of the recorded dialogs have been translated by Verbmobil partners (Hildesheim for Phase I and Tübingen for Phase II). As different translators were involved, there is great variability in the translations. The turns are sometimes rather long and may consist of several sentences. To prepare the training corpus, these turns were split into shorter segments using sentence marks as potential split points. As the sentence marks do not always coincide, a dynamic programming approach was used to find the optimal segmentation points: n source segments can be aligned to m target segments, and each candidate alignment is scored using a word-based alignment model. That segmentation of a sentence pair is used which gives the best overall score. Additional restrictions are applied to avoid segment pairs with very low scores. The translation and language models were then trained on the segmented corpus.

An official vocabulary has been agreed upon for the speech recognizers. However, not all of these vocabulary items are covered by the training corpus. Therefore, an additional lexicon was constructed semi-automatically. Online lexicons were used to extract translations for words missing in the training corpus, and some additions had to be made manually. The resulting lexicon contained not only word-to-word items but also multi-word translations, especially for the large number of German compounds.

In Table 1 the characteristics of the training and test sets are summarized.

Table 1. Training and test corpus.

                            German    English
Train    Sentences               58 332
         Words              519 523    549 921
         Vocabulary           7 940      4 673
Lexicon  Items                   12 779
         Words               15 101     18 213
         ex. Vocabulary      11 501      6 867
Test     Sentences                  147
         Words                1 968      2 173
         Trigram PP          (40.3)       28.8

5.2 Alignment Quality

We measure the quality of the alignment models described above with respect to both alignment quality and translation quality. To obtain a reference alignment, we manually aligned about 1.4 percent of our training corpus. It is well known that manually performing a word alignment is a complicated and ambiguous task (Melamed, 1998). Therefore, we allowed the humans who performed the alignment to specify two different kinds of alignments: an S (sure) alignment, which is used for alignments which are unambiguous, and a P (possible) alignment, which is used for alignments which might or might not exist. The P relation is used especially to align words within idiomatic expressions, free translations, and missing function words. Figure 5 shows an example of a manually aligned sentence with S and P relations. The quality of an alignment $A = \{(j, a_j)\}$ is then measured using the following error rate:

$$1 - \frac{|A \cap S| + |A \cap (P \cup S)|}{|A| + |S|} \; .$$
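A minimal sketch of this error rate, with the alignments represented as sets of position pairs (our own illustration; the toy link sets are invented):

# Sketch of the alignment error rate defined above; A, S, P are sets
# of links, with S a subset of P by convention.
def alignment_error_rate(A, S, P):
    return 1.0 - (len(A & S) + len(A & (P | S))) / (len(A) + len(S))

A = {(1, 1), (2, 2), (3, 4)}
S = {(1, 1), (2, 2)}
P = S | {(3, 3)}
print(round(alignment_error_rate(A, S, P), 2))  # 0.2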

[Figure 5: manual alignment matrix between the German sentence 'ja , dann würde ich sagen , verbleiben wir so .' and the English sentence 'yes , then I would say , let us leave it at that .']

Figure 5. Example of a manual alignment with sure (filled dots) and possible connections.

The reference alignment does not prefer any translation direction (it is symmetric) and contains many-to-one and one-to-many relationships. Therefore, the Viterbi alignments of the baseline alignment models will not have zero errors. The following table shows the alignment quality of different alignment models on the Verbmobil task:

Alignment errors [%]
Dictionary          no      no     yes
Empty word          no     yes     yes
Model 1           17.8    16.9    16.0
Model 2 (diag)    12.7    11.7    10.6
HMM               11.7     9.9     9.2
Model 4            9.2     7.9     6.6

We conclude that more refined alignment models are crucial for good alignment quality. In particular, the use of a first-order alignment model and the modeling of an empty word and of fertilities are important. The improvement obtained by using a dictionary is small compared to the effect of proper statistical modelling.

5.3 Translation Results

Performance Measures for Translation Quality. We measure the translation quality using two different criteria:

– Word Error Rate (WER): The edit distance $d(t, r)$ (number of insertions, deletions and substitutions) between the produced translation $t$ and one predefined reference translation $r$ is calculated. The edit distance has the great advantage of being automatically computable; as a consequence, the results are inexpensive to obtain and reproducible, because the underlying data and the algorithm are always the same. The great disadvantage of the WER is that it depends fundamentally on the choice of the sample translation and that it does not take into account how serious different errors are for the meaning of the translation.
– Subjective Sentence Error Rate (SSER): The translations are classified into a small number of quality classes, ranging from "perfect" to "absolutely wrong". In comparison to the WER, this criterion is more reliable and conveys more information, but measuring the SSER is expensive, as it is not computed automatically but is the result of laborious evaluation by human experts. The SSER is used e.g. in Nießen et al. (1998). To support the assignment of the subjective error scores and to guarantee high consistency, an evaluation tool has been developed which displays already evaluated translations along with the new translation and also allows for an extrapolation of the SSER by finding nearest matches to former evaluations stored in a database (Nießen et al., 2000; Vogel et al., 2000).

Effect of Preprocessing. There are a number of problems for the statistical approach to translation which can mainly be attributed to the sparse data problem. Many words and many syntactic constructions are seen only once in the training data, and for these it is often not possible to train reliable alignments. However, in some cases the problems can be lessened by appropriate preprocessing steps. Most important for the Verbmobil task is the handling of numbers in time expressions like 'halb zehn', to be translated as 'half past nine'. Therefore, simple substitution rules are used to normalize such expressions in the training corpus. The same substitutions are applied online to the test corpus. The effect of this preprocessing is given in Table 2. Whereas the WER shows only a small improvement, the translation quality as measured by the subjective sentence error rate shows a clear improvement.

Table 2. Effect of preprocessing on translation quality.

                   WER [%]   SSER [%]
no preprocessing     50.6      22.9
preprocessing        48.6      16.8
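The WER reported in these tables is the standard Levenshtein distance normalized by the reference length; a minimal sketch of this computation (our own illustration, not the Verbmobil evaluation code):

# Sketch of the WER: edit distance between hypothesis and one
# reference translation, normalized by the reference length.
def word_error_rate(hyp, ref):
    h, r = hyp.split(), ref.split()
    # d[i][j]: edit distance between r[:i] and h[:j].
    d = [[max(i, j) if i == 0 or j == 0 else 0
          for j in range(len(h) + 1)] for i in range(len(r) + 1)]
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[-1][-1] / len(r)

print(word_error_rate("we will go then", "then we will go"))  # 0.5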

Effect of Alignment Models. It has already been shown that stronger alignment models result in improved alignment quality. How this affects the translation quality can be seen in Table 3. The translations were produced for text input.

Table 3. Effect of the alignment model on translation quality.

Model      WER [%]   SSER [%]
IBM-1        49.8      22.2
HMM          47.4      19.3
inv. HMM     48.6      16.8

The improvement is due to better lexicons and better alignment templates extracted from the resulting alignments. The search process and also the preprocessing were the same for all three runs.

5.4 Translation Examples

Disambiguation. For the statistical approach to translation, no explicit information about different meanings of words is stored. Rather, this has to be extracted from the corpus and stored in the alignment templates, the lexicon and the language model in an implicit way. The following examples show that in many cases the context stored in this way allows for correct disambiguation. The first two groups of sentences contain the verbs 'gehen' and 'annehmen', which have different translations. Some of these examples are rather collocational; correct translation is only possible by taking the whole phrase into account. The last two sentences show the disambiguation of prepositions with the example of temporal and locational 'vor'.

Table 4. Disambiguation examples.

          Input                                   Translation
gehen     Wir gehen ins Theater .                 we will go to the theater .
          Mir geht es gut .                       I am fine .
          Es geht um Geld .                       it is about money .
          Geht es bei Ihnen am Montag ?           is it possible for you on Monday ?
          Das Treffen geht bis 5 Uhr .            the meeting is to five .
annehmen  Wir sollten das Angebot annehmen .      we should accept that offer .
          Ich nehme das Schlimmste an .           I will take the worst .
vor       Wir treffen uns vor dem Fruehstueck .   we meet before the breakfast .
          Wir treffen uns vor dem Hotel .         we will meet in front of the hotel .

The translation of 'Ich nehme das Schlimmste an .' as 'I will take the worst .' shows the problem of long distance dependencies. In this case the strong connection between the words 'nehme' and 'an' was not captured by the alignment templates. This can be improved with additional morphosyntactic preprocessing which transforms 'nehme ... an' into 'annehme'. Training and testing on corpora with this preprocessing will produce the translation 'I suppose the worst' .

Note also that 'Wir treffen uns' in the last two sentences gets two different translations, both of which are correct. This demonstrates that it is not the case that segments of the sentence are translated in isolation and the results concatenated. Rather, the target sentence is the result of an overall search process combining different knowledge sources.

Examples from the Test Corpus. In Table 5 we give some translation examples taken from the test corpus used for our internal evaluation. Translations were produced on text and on speech input.

Table 5. Examples from the test-147 corpus.

Text:   wie wäre es denn mit dem achtzehnten , weil ich am siebzehnten noch verhindert bin .
        how about the eighteenth , because I am still booked on the seventeenth .
Speech: wie wäre es denn mit dem achtzehnten , weil ich am siebzehnten noch verhindert , dann
        how about the eighteenth , because I still booked on the seventeenth then .

Text:   sehr gut , ja . dann fahren wir da los . alles klar . danke schön .
        very good , yes . then we will go then leave . all right . thank you .
Speech: sehr gut , ja ich dann fahren wir da uns , alles klar dann schon
        very good , well then we will go then I us , all right then already .

Text:   Mittwoch , den sechsten , geht nicht . äh Montag , der elfte .
        Wednesday , the sixth , isn't possible . äh Monday , the eleventh .
Speech: wie Mittwoch den sechsten geht , nicht , Montag , der elfte ?
        how is , not Wednesday the sixth , Monday , the eleventh ?

Text:   ah , ja , ja , die haben einen guten Service .
        uh , well , well , they have a good service .
Speech: ah , ja , die ja guten Service .
        oh , yes , good yes the service .

Text:   genau , das wäre dann eine Übernachtung .
        exactly , then , that would be an overnight stay .
Speech: genau , das wäre dann eine Übernachtung .
        exactly , then , that would be an overnight stay .

Examples from End-to-End Evaluation Dialogs. During March and April 2000, an end-to-end evaluation of the Verbmobil system was performed by the Verbmobil partners at the University of Hamburg (Tessiore and Hahn, 2000). In this evaluation, two native speakers conducted a dialog, having no direct contact but only listening to the translations synthesized by the Verbmobil system. From the log-files of these dialogs, the best hypothesis of the speech recognizer for each turn and the translation produced by the stattrans-module were extracted. In Table 6, some turns from dialog 000412-152011 are shown: first the transliteration of the spoken sentence, next the output of the speech recognizer, and finally the translation produced by the stattrans-module.

Table 6. Translation examples from the end-to-end evaluation.

Spoken:     Guten Tag , fahren wir mit dem Auto nach Hannover ?
Recognized: guten Tag fahren wir mit dem Auto nach Hannover
Translated: hello , let's go by car , to Hanover ,

Spoken:     Okay , that's fine. What time should we leave on the fourth of December ?
Recognized: I okay that is fine what time should we leave on the fourth of December
Translated: ich , okay , das ist in Ordnung , um wieviel Uhr , sollten wir losfliegen , am vierten Dezember ,

Spoken:     Wir sollten um zehn Uhr morgens losfahren , damit wir um zwölf Uhr in Hannover sind .
Recognized: wir sollten um zehn Uhr morgens losfahren damit wir um zwölf Uhr in Hannover sind
Translated: we should at ten o'clock in the morning , so that we leave at twelve o'clock , will be in Hanover .

Spoken:     Okay , that's fine . Let's meet at ten o'clock in the morning .
Recognized: okay that is fine let us meet at ten o'clock in the morning
Translated: okay , das ist in Ordnung , dann treffen wir uns , um zehn Uhr , in der Früh ,

Spoken:     Soll ich ein Hotel buchen ?
Recognized: soll ich ein Hotel buchen
Translated: shall I book a hotel ,

Spoken:     Okay , I like the Queens-hotel , could you reserve two single rooms with the shower ?
Recognized: I okay I I like the UNK:Hotel.kvi:nsho:tEl.Queens-Hotel could you reserve two single rooms what the UNK:Hotel.Sa:vOY.Savoy
Translated: ich , okay , ich mag den , könnten Sie UNK:Hotel.kvi:nsho:tEl.Queens-Hotel zwei Einzelzimmer reservieren wie das UNK:Hotel.Sa:vOY.Savoy .

Spoken:     Ich buche in dem Koenigshotel zwei Einzelzimmer mit Dusche .
Recognized: ich buche in dem UNK:Surname.k2:.KOEH zwei Einzelzimmer mit Dusche
Translated: I will book in the UNK:Surname.k2:.KOEH two single room with shower .

Spoken:     That's fine . What is the name of the hotel ?
Recognized: that is fine what is the name of that hotel
Translated: das ist in Ordnung , wie ist der Name dieses Hotel .
6 Summary

References

Alshawi, H., Bangalore, S., and Douglas, S. (1998). Automatic acquisition of hierarchical transduction models for machine translation. In Proc. 36th Annual Conference of the Association for Computational Linguistics and the 17th International Conference on Computational Linguistics, 41–47.

Baum, L. (1972). An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes. Inequalities 3:1–8.

Brown, P. F., Della Pietra, S. A., Della Pietra, V. J., and Mercer, R. L. (1993). The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics 19(2):263–311.

Dagan, I., Church, K., and Gale, W. A. (1993). Robust bilingual word alignment for machine aided translation. In Proceedings of the Workshop on Very Large Corpora, 1–8.

Fung, P., and Church, K. W. (1994). K-vec: A new approach for aligning parallel texts. In COLING '94: The 15th Int. Conf. on Computational Linguistics.

Jelinek, F. (1976). Continuous speech recognition by statistical methods. Proc. of the IEEE 64(10):532–556.

Kay, M., and Röscheisen, M. (1993). Text-translation alignment. Computational Linguistics 19(1):121–142.

Melamed, I. D. (1998). Manual annotation of translational equivalence: The Blinker project. Technical Report 98-07, IRCS.

Nießen, S., Vogel, S., Ney, H., and Tillmann, C. (1998). A DP based search algorithm for statistical machine translation. In Proc. 36th Annual Conference of the Association for Computational Linguistics and the 17th International Conference on Computational Linguistics, 960–967.

Nießen, S., Och, F. J., Leusch, G., and Ney, H. (2000). An evaluation tool for machine translation: Fast evaluation for MT research. In Proceedings of LREC, Athens, Greece, May 2000.

Och, F. J. (1999). An efficient method to determine bilingual word classes. In EACL '99: Ninth Conf. of the Europ. Chapter of the Association for Computational Linguistics.

Tessiore, L., and Hahn, W. v. (2000). Functional end-to-end evaluation of an MT system: Verbmobil. In this volume.

Vogel, S., Ney, H., and Tillmann, C. (1996). HMM-based word alignment in statistical translation. In COLING '96: The 16th Int. Conf. on Computational Linguistics, 836–841.

Vogel, S., Nießen, S., and Ney, H. (2000). Automatic extrapolation of human assessment of translation quality. In Proceedings of the Workshop on the Evaluation of Machine Translation, Athens, Greece, May 2000.
