Towards a model of statistical machine translation Arabic-French

Khaireddine Bacha

Mounir Zrigui

Laboratoire LaTICE, Équipe de Monastir, Faculté des sciences de Monastir [email protected]

Laboratoire LaTICE, Équipe de Monastir, Faculté des sciences de Monastir [email protected]

Abstract— The automatic translation of human-written texts is a very complex application, since it must handle open-domain text without any constraint on its nature or diversity. Several attempts have been made to solve this problem, each aiming at better translation quality on parallel corpora. Given the many ambiguities of natural language, however, the translation problem remains far from solved. To increase translation quality, we propose a model that generates the semantic cases of the different components of the sentence, in order first to determine its meaning and then to generate the translation in the target language. This allowed us to obtain satisfactory results compared with similar studies using other techniques. Our approach uses a model guided by semantics to learn translation techniques, with the aim of approaching human performance. Our project translates source sentences from Arabic to French through a statistical approach that includes a dictionary, and is evaluated automatically with the BLEU metric; the results are encouraging even though the tools we used are very limited. Keywords— statistical machine translation, alignment, parallel corpora, statistical approach

I. INTRODUCTION
Understanding a foreign language, translating a text and learning to speak are tasks that a human performs almost instantly, while even the most powerful computer is incapable of performing them as well. The study of such tasks is called the cognitive approach. It is multidisciplinary: computer scientists, psychologists and linguists work side by side. It is applied to complex problems, poorly or only partially specified, that require processing a huge mass of information of different kinds. In this work, we introduce our statistical machine translation model, which translates source sentences from Arabic to French through a statistical approach that includes a dictionary; the results are encouraging even though the tools we used are very limited. Finally, we present our experimental results and end with a conclusion that summarizes them and identifies the perspectives opened by this work.

II. RELATED WORK
In the early 1990s, a team of IBM researchers proposed an operational approach to statistical machine translation. Among other works, there is translation by statistical methods based on the mathematical theory of distributions and probabilistic estimation [1], in particular the Candide prototype, a translation system built from speeches available in French and English [2], and translation by methods based on parallel corpora [3] [4]. For example, [5] translate the source with eight systems and then select the final translation. Other studies attempt to correct the errors of a rule-based system automatically with a statistical system [6] [7]. Although the number of approaches to statistical machine translation has increased in recent years, to our knowledge only one other Arabic/French statistical translation system has been developed, in the FRAMES project [8]. We have reviewed different approaches and methods used in the field of statistical machine translation [9]. Given the lack of work on the semantic side, we propose a novel method for the automatic translation of Arabic that is strongly guided by semantics [10]; it is detailed in the next section.

III. SYSTEM ARCHITECTURE
A. General Description
We propose a new architecture to achieve a coherent and well-structured final translation. We designed a method based on a semantic approach that includes a dictionary and integrates a statistical approach, which gives an acceptable final result. The system works as follows:
• Step 1: an Arabic parallel corpus is given as input to the pre-treatment. After segmentation, lexical analysis and morphological analysis, an affix-labeled (tagged) sentence is obtained.
• Step 2: the tagged sentence is passed to the statistical approach, which estimates the likelihood of each possible translation and chooses the most consistent and acceptable one, using the dictionary inserted during the translation phase.
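The segmentation part of the pre-treatment can be illustrated with a minimal sketch. It assumes that sentences end with `.`, `!`, `?` or the Arabic question mark, and that words are separated by spaces and by the Arabic comma; the function names `segment_sentences` and `tokenize` are hypothetical and are not taken from the system described here.

```python
import re

# Assumption: a sentence ends with . ! ? or the Arabic question mark (U+061F),
# followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?\u061F])\s+")

# Assumption: punctuation marks (including the Arabic comma U+060C) are
# separate tokens; any other run of non-space characters is a word.
TOKEN = re.compile(r"[^\s.!?\u060C\u061F]+|[.!?\u060C\u061F]")


def segment_sentences(text):
    """Split a text into sentences after sentence-final punctuation."""
    return [part.strip() for part in SENTENCE_END.split(text) if part.strip()]


def tokenize(sentence):
    """Split a sentence into word and punctuation tokens."""
    return TOKEN.findall(sentence)


if __name__ == "__main__":
    for sentence in segment_sentences("Âne! Stupide! Étudiez-vous?"):
        print(tokenize(sentence))
```

Lexical and morphological analysis (root and affix detection) need language-specific resources and are not covered by this sketch.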

Fig. 1. Operation phases of the model. (1) Pre-treatment of the parallel corpora: segmentation, lexical analysis and morphological analysis, producing a tagged sentence. (2) Treatment by the statistical approach: language model, translation model and decoder. (3) Translated sentence.

The basic idea of our model is that each text or corpus to translate goes through the following steps:
1. Segmentation of the text into sentences.
2. Lexical analysis of the sentences, detecting the roots of the words and identifying their affixal components.
3. Morphological analysis of these components, to find the correct alignment and avoid any inconsistency in translation.
4. Finally, a semantic analysis phase based on a probabilistic statistical method that uses the dictionary of the ocean as a reference and as a database for translating sentence segments.

B. Description of the statistical approach
Suppose we want to translate a sentence from Arabic to French. Call the Arabic sentence f and the French sentence e. In statistical machine translation, no formal knowledge of the language, such as grammar, syntax or semantics, is used. In this setting, all sentences of the target language (French) are potential translations of f [1]. To each candidate e we associate a number $\Pr(e \mid f)$, the conditional probability of the translation e given f. Bayes' theorem gives equation (1):

\[ \Pr(e \mid f) = \frac{\Pr(e)\,\Pr(f \mid e)}{\Pr(f)} \tag{1} \]

Call $\hat{e}$ the word sequence that maximizes equation (1). Since $\Pr(f)$ does not depend on e, finding $\hat{e}$ can be expressed by equation (2):

\[ \hat{e} = \arg\max_{e} \Pr(e)\,\Pr(f \mid e) \tag{2} \]

Equation (2) highlights three challenges of statistical machine translation [1], namely:
1. Modeling the language: $\Pr(e)$.
2. Modeling the translation: $\Pr(f \mid e)$.
3. A decoder that searches for the best translation:

\[ \hat{e} = \arg\max_{e} \Pr(e)\,\Pr(f \mid e) \tag{3} \]

The following three sections give a brief overview of the steps already taken by some researchers to address these challenges.

1) Language model: A language model is a description of a language. There are two main schools of thought about the nature of this description:
1. The linguistic tradition, in which the language is modeled deterministically.
2. The more recent trend, which moves away from formal language description and relies instead on probabilistic models with the n-gram as the basic unit.
Here we focus on the latter. We first explain what an n-gram is, then briefly describe its use in a probabilistic language model.

N-gram: An n-gram model considers sequences of n words and assigns a probability to each of them. We introduce this concept with a bi-gram model, i.e. n = 2. Consider a sequence s consisting of the words $w_1 \ldots w_l$. The probability of this sequence, denoted p(s), is given by the following equation:

\[ p(s) = p(w_1)\,p(w_2 \mid w_1)\,p(w_3 \mid w_1 w_2) \cdots p(w_l \mid w_1 \ldots w_{l-1}) = \prod_{i=1}^{l} p(w_i \mid w_1 \ldots w_{i-1}) \]

The work of Chen and colleagues [11] builds on [12] to make the approximation that, for a bi-gram, the probability of a word depends only on the context provided by the word that precedes it in s. This is called the Markov approximation and is of order n - 1; for a bi-gram, the order is 2 - 1 = 1:

\[ p(s) = \prod_{i=1}^{l} p(w_i \mid w_1 \ldots w_{i-1}) \approx \prod_{i=1}^{l} p(w_i \mid w_{i-1}) \]

When the probability of $w_i$ is conditioned on more than the single word that precedes it, i.e. n > 2, the approximation generalizes to:

\[ p(s) \approx \prod_{i=1}^{l} p(w_i \mid w_{i-n+1}^{i-1}) = \prod_{i=1}^{l} p(w_i \mid h) \]

where $h = w_{i-n+1}^{i-1}$ denotes the history formed by the n - 1 words that precede $w_i$.

Alignment of n-grams: For statistical machine translation, the language model alone is insufficient. The concept of n-gram alignment is therefore introduced: every target word is associated with zero, one or more source words. Fig. 2 shows an example of such an alignment. Note that an alignment is not strictly a one-to-one correspondence between the words of a source sentence and those of a target sentence. We denote by A(c, s) the set of all possible alignments between the source sentence s and the target sentence c.

Fig. 2. A word alignment between a source sentence s1 s2 s3 s4 s5 and a target sentence c1 c2 c3 c4.
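As an illustration of the bi-gram approximation described in the language-model section, the following minimal sketch estimates $p(w_i \mid w_{i-1})$ by relative frequency on a toy corpus. The boundary markers `<s>` and `</s>`, the function names and the toy sentences are assumptions made for the example; no smoothing of the kind studied in [11] is applied, so any unseen bi-gram gives a probability of zero.

```python
from collections import Counter


def train_bigram(sentences):
    """Count context words and bi-grams, with <s> and </s> as boundary markers."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent word pairs
    return contexts, bigrams


def bigram_probability(sentence, contexts, bigrams):
    """Return the product of p(w_i | w_{i-1}) estimated by relative frequency."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    probability = 1.0
    for previous, word in zip(tokens, tokens[1:]):
        if contexts[previous] == 0:
            return 0.0  # unseen context: no estimate without smoothing
        probability *= bigrams[(previous, word)] / contexts[previous]
    return probability


if __name__ == "__main__":
    contexts, bigrams = train_bigram(["le chat dort", "le chien dort"])
    print(bigram_probability("le chat dort", contexts, bigrams))
```

With this toy corpus, "le" is followed by "chat" in one of its two occurrences, so the sentence "le chat dort" receives a probability of one half.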

2) Translation model: We now come to the second component, the translation model, which gives us Pr(f | e). In this section, we briefly describe the steps leading to the construction of such a model. Recall the statement made at the beginning of the description of the statistical approach: any sequence in the target language is a possible translation of any sequence in the source language. A probability Pr(f | e) is assigned to each of these translations. A wise choice in the distribution of these probabilities yields good-quality translations [1]. This involves, among other things, estimating the probability of translating a sequence of zero or more words in the source language into a sequence of zero or more words in the target language. These sequences are called segments. The quality of a translation model depends on the quality of the text on which it was trained. Training is done on a large bitext. A bitext is a parallel corpus consisting of two texts in which each sentence of one text is the translation of the sentence in the corresponding position in the other text, and vice versa.
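One simple way to obtain segment probabilities of the kind described above is relative-frequency counting over aligned segment pairs. The sketch below is only an illustration under that assumption: the function name `estimate_translation_table` and the transliterated toy pairs are hypothetical, and the source does not state that its model is estimated this way.

```python
from collections import Counter, defaultdict


def estimate_translation_table(segment_pairs):
    """Estimate Pr(f | e) by relative frequency.

    segment_pairs is a list of (f, e) tuples, where f is a source segment
    and e is the target segment aligned with it.
    """
    pair_counts = Counter(segment_pairs)
    target_counts = Counter(e for _, e in segment_pairs)
    table = defaultdict(dict)
    for (f, e), count in pair_counts.items():
        table[e][f] = count / target_counts[e]
    return dict(table)


if __name__ == "__main__":
    # Hypothetical toy data (transliterated Arabic, French).
    pairs = [("kitab", "livre"), ("kitab", "livre"),
             ("sifr", "livre"), ("qalam", "stylo")]
    print(estimate_translation_table(pairs))
```

For each target segment e, the estimated probabilities over the source segments f sum to one, which is what a conditional distribution Pr(f | e) requires.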

3) Decoder: The language model, the alignment method and the translation model are not sufficient on their own to perform statistical machine translation. A decoder is also needed, which uses these models to find the best translation hypotheses in the search space. The scores associated with each model allow the decoder to select the most likely target segments when a source sentence is translated. Each score can be defined by a set of functions f_n that take as parameters the source sentence P, the target sentence T and their possible alignments. Starting from the translation model, the sequence of segment pairs (P_i, T_i) is used to generate translation hypotheses. Then, based on the language model, the scores of these hypotheses are weighted by the probability of observing these segments in the order proposed by the alignment model. Finally, the decoder selects a translation according to this equation:

\[ \hat{e} = \arg\max_{e} \Pr(e)\,\Pr(f \mid e) \]

A corpus dedicated to this task is commonly used and is called the development parallel corpus. The weights are optimized iteratively according to the scores obtained when translating the development corpus, based on a reference translation and an automatic metric.

C. Automatic Evaluation
Human evaluation has generally been the method used to evaluate the quality of translations produced by an automated system. On the one hand, this method gives a better estimate of translation quality, although it is expensive [13]; on the other hand, it limits evaluation, as some authors have stated in their work [14]. In our method, we can use one of the most popular approaches to automatic evaluation, which compares the segments of one to four words that the hypotheses produced by the system have in common with their reference translations.
The approach proposed by Papineni et al. [15] was the first automatic measure accepted as a reference for the evaluation of translations. Its principle is to compute the degree of similarity between an automatic translation and one or more reference translations, based on n-gram precision. The BLEU score is defined by the following formula:

\[ \mathrm{BLEU} = \mathrm{BP} \times \exp\left( \sum_{n=1}^{N} w_n \log p_n \right) \tag{4} \]

It is the geometric mean of the n-gram precisions $p_n$, obtained with n-grams of order 1 to N and positive weights $w_n$. $p_n$ is the number of n-grams of the machine translation that are also present in one or more reference translations, divided by the total number of n-grams of the machine translation. Comparing the machine translation against several references gives the translation system more freedom in choosing the translation of a word, especially when it has several synonyms. BP is the brevity penalty, which penalizes machine translations that are short compared with the reference. Since BLEU is a precision measure, overly short sentences would be favored without this penalty. It is defined by:

\[ \mathrm{BP} = \begin{cases} 1 & \text{if } c > r \\ e^{(1 - r/c)} & \text{if } c \le r \end{cases} \tag{5} \]
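The BLEU score and brevity penalty defined above can be sketched in a few lines for a single candidate sentence. This is a simplified illustration, not the evaluation tool used in the experiments: it assumes uniform weights $w_n = 1/N$, clips n-gram counts against the references, takes r as the reference length closest to c, and returns 0 as soon as one precision is zero. All function names are chosen for the example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision p_n of a candidate against the references."""
    candidate_counts = Counter(ngrams(candidate, n))
    if not candidate_counts:
        return 0.0
    max_reference = Counter()
    for reference in references:
        for gram, count in Counter(ngrams(reference, n)).items():
            max_reference[gram] = max(max_reference[gram], count)
    clipped = sum(min(count, max_reference[gram])
                  for gram, count in candidate_counts.items())
    return clipped / sum(candidate_counts.values())


def brevity_penalty(c, r):
    """BP = 1 if c > r, otherwise exp(1 - r / c)."""
    return 1.0 if c > r else math.exp(1 - r / c)


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights w_n = 1 / max_n."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # log(0) is undefined; the score is taken as 0
    c = len(candidate)
    # r: reference length closest to c (shortest one in case of a tie).
    r = min((len(reference) for reference in references),
            key=lambda length: (abs(length - c), length))
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(c, r) * math.exp(log_mean)


if __name__ == "__main__":
    reference = "le chat dort sur le tapis".split()
    print(bleu("le chat dort sur le tapis".split(), [reference]))
```

The original metric is computed over a whole test corpus rather than sentence by sentence, so scores from this sketch are not directly comparable with corpus-level figures.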

In the brevity penalty, c is the length of the machine translation and r is the reference translation length closest to c. The n-gram precisions are usually combined up to 4-grams with uniform weights $w_n$. A machine translation is assigned a BLEU score of 1 if it is identical to a reference, and a score of 0 if none of its n-grams is present in a reference. The BLEU method has proven its ability to evaluate statistical machine translations: the ranking of machine translation systems given by BLEU turned out to be the same as the ranking provided by expert judges [16].

D. Data and Tools
1) Dictionary of the Ocean: Among the tools we used in our experiments is a dictionary, more specifically the dictionary of the ocean. The interest of this dictionary [17] lies in its seven thousand selected words and in the encyclopedic comments or summary tables that accompany most definitions and provide knowledge and information on the many fields related to the ocean: marine flora and fauna, seabed forms, geological, physical or biological sciences, fisheries and mariculture, shipping, sailing or scuba diving, ocean engineering, hyperbaric medicine, law of the sea, etc., but also geographical and historical topics, short biographies of the great discoverers and navigators, inventors or oceanographers, as well as articles on famous ships of the past. By its richness and diversity, this dictionary provides comprehensive and practical information, first to ocean specialists when their research or work takes them outside their discipline, then to maritime professionals, politicians and administrators, and to an ever wider public curious about the sea for their culture or leisure. For all of them, the dictionary of the ocean is a rich working tool and a reference book. Despite the various research efforts on the automatic processing of Arabic, it was difficult to find ready-made resources [18].

2) Corpora used: After inserting the dictionary, we must choose the texts on which to apply our translation method, called the "training parallel corpora". We chose two well-labeled Arabic corpora. The first is "Portraits of two teachers", an extract selected from a novel by Mohamed Choukri [19] ("Time of Errors"); this extract was published with its lexicon in TextArab No. 49. The second is taken from a book published in Cairo and Rabat, in which the author, Talha Jibreel, recounts the life of the famous Sudanese writer Salah Taieb; this text was published in TextArab No. 47 with its lexicon and grammatical comments. The following table shows the characteristics of the two parallel corpora used to test and evaluate our approach.

TABLE I
CONSTRUCTION OF THE WORKING CORPORA

| Corpora | Size | Number of words |
|---|---|---|
| Corpora 1 | 1.24 MB | 459 |
| Corpora 2 | 743 KB | 363 |

3) Probabilistic translation toolbox used: Recall that the approach to statistical machine translation is as follows. Given an Arabic sentence s, we look for the French translation t that maximizes p(t | s), the probability that t is the translation of s:

\[ \hat{t} = \arg\max_{t} p(t \mid s) = \arg\max_{t} p(t)\,p(s \mid t) \]

Fig. 3. Automatic statistical machine translation Arabic/French. Components shown: tagged sentence, language model P(t), translation model P(s|t), dictionary of the ocean, decoder (ê = argmax_e Pr(e) Pr(f|e)), final translation.

Fig. 3 shows the main components of probabilistic machine translation. The decoder takes as input the results provided by the translation model and the language model, and outputs the translated text. Note that the language into which one translates is called the "target language".

E. Experiments and evaluation

To translate words in a correct (acceptable) manner, we must use a system based on the principle of triliteral roots, whose structure lets us choose the appropriate equivalent in the target-language dictionary. Our dictionary has the following structure:
1. Root of the word: the root of the word or verb in the source language.
2. Schema of the word: the schema (pattern) of the word or verb in the source language.
3. Word in the source language: the word or verb in the source language.
4. Semantic role: this field plays an important role in the validity of the translation of a word from the source language to the target language; however, since we work on phrases that contain a single word, for words whose semantic role is an action we replace the semantic role by their primitive.
5. Type: the type of the source-language word, e.g. word, feminine noun, masculine noun, ...
6. Verb group: the group of the verb in the target language; this information is essential for conjugating the verb, e.g. first group, second group, ...
7. Auxiliary verb: the auxiliary of the verb in the target language; this information is essential to conjugate the verb properly, e.g. to be / to have (être / avoir).
8. Equivalent: the target-language equivalent of the source-language word.
9. Plural equivalent: the plural form of the target-language equivalent.
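The nine fields listed above can be pictured as one record per dictionary entry. The sketch below is a hypothetical representation: the class name `DictionaryEntry`, the field names, the `lookup` helper and the transliterated sample values are all assumptions made for illustration, and the actual storage format of the dictionary is not described in the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DictionaryEntry:
    root: str                                # 1. root of the source word
    schema: str                              # 2. schema (pattern) of the word
    source_word: str                         # 3. word in the source language
    semantic_role: str                       # 4. semantic role (or primitive)
    word_type: str                           # 5. type: verb, feminine noun, ...
    verb_group: Optional[str]                # 6. target-language verb group
    auxiliary: Optional[str]                 # 7. target-language auxiliary
    equivalent: str                          # 8. target-language equivalent
    plural_equivalent: Optional[str] = None  # 9. plural of the equivalent


def lookup(entries, root, schema):
    """Return the equivalents of the entries matching both root and schema."""
    return [entry.equivalent for entry in entries
            if entry.root == root and entry.schema == schema]


if __name__ == "__main__":
    # Hypothetical sample entry, transliterated for readability.
    entries = [
        DictionaryEntry(root="k-t-b", schema="fa3ala", source_word="kataba",
                        semantic_role="action", word_type="verb",
                        verb_group="third group", auxiliary="avoir",
                        equivalent="écrire"),
    ]
    print(lookup(entries, "k-t-b", "fa3ala"))
```

Keying the lookup on the root and schema together mirrors the triliteral-root principle mentioned above: the same root combined with a different schema would select a different equivalent.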

TABLE II
RESULTS FOUND BY OUR MODEL

| Experiment | Source-language sentence | French translation |
|---|---|---|
| Experiment 1 | في قسم الشهادة الإبتدائية، يدرسنا مواد اللغة العربية معلم شاب متبجح بنفسه | Dans le département du certificat d'études primaires, l'étude des matériaux en langue arabe, professeur homme vantard lui-même |
| Experiment 2 | وزاد حبي للغة الإنقليزية بعد أن إنتقلت إلى الثانوية | Et augmenté mon amour pour la langue anglaise après avoir déménagé à l'enseignement secondaire |
| Experiment 3 | يقرأ الصحف و الكتب في القسم | Lire des journaux et des livres dans la classe. |
| Experiment 4 | أظن إسمه ليلى بنت الصحراء | Je crois que son nom est Layla bint désert |
| Experiment 5 | حين بدأت أتعلم اللغة الإنقليزية، إكتشفت حبي لهذه اللغة | Quand j'ai commencé à apprendre l'anglais, j'ai découvert mon amour pour cette langue (Fig. 4) |
| Experiment 6 | حمار!!غبي!! أأنت ستدرس؟ | Âne! Stupide! Étudiez-vous? |
| Experiment 7 | لكن أبناء منطقتنا في الفصل كانوا يتمتعون بذكاء شديد | Mais les gens de notre région dans le chapitre qu'ils avaient très habilement |
| Experiment 8 | ورغم أنه درس الزراعة لكن أعتقد أنه لو اِتجه أيّ إتجاه لتفوق فيه | Bien qu'il ait étudié l'agriculture, mais je pense que même si a choisi n'importe quelle direction il sera le vainqueur |
| Experiment 9 | يغضب بسرعة، يسبّ من يخطئ في أدنى شيء | Fâché rapidement, insulte le fautif dans la moindre de chose |

The following is one example of the experiments conducted with our model: [حين بدأت أتعلم اللغة الإنقليزية، إكتشفت حبي لهذه اللغة]

Fig. 4. Example of results in the experiments.

F. Experimental results and interpretations
In a translation system, experimental results are a key element for testing its validity. To perform the test, we proceeded as follows:
1. Building our own parallel corpora: this is the slowest phase; it requires coordination between linguists and computer scientists to build a complete parallel corpus.
2. Adapting the training corpora used in the system that generates a representation of meaning in Arabic based on semantic cases.
3. Launching the learning: in our case, this phase requires little time.
4. Test and generalization: during this phase, we used generalization corpora of various kinds (2-word sentences, 3-word sentences, ...).

TABLE III

RESULTS OF THE GENERALIZATION TESTS FOR VARYING SENTENCE SIZES

| Nature of the parallel corpora | 2-word sentences | 3-word sentences | 4-word sentences | Sentences of more than 5 words |
|---|---|---|---|---|
| Recognition rate (%) | 96.66 | 93.33 | 88.33 | 64.19 |
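As a small arithmetic aid for reading Table III, the sketch below computes the unweighted mean of the four recognition rates and the drop between the shortest and the longest sentences. The unweighted mean is an assumption made here; the source does not specify how its own average is computed, so a figure quoted elsewhere in the text may rest on a different weighting (for example by corpus size).

```python
# Recognition rates (%) copied from Table III.
rates = {
    "2 words": 96.66,
    "3 words": 93.33,
    "4 words": 88.33,
    "more than 5 words": 64.19,
}

# Assumption: every sentence-size category has the same weight.
mean_rate = sum(rates.values()) / len(rates)

# Loss in recognition between the shortest and the longest sentences.
drop = rates["2 words"] - rates["more than 5 words"]

print(f"unweighted mean: {mean_rate:.2f}%")
print(f"drop from 2-word to longer sentences: {drop:.2f} points")
```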

The results we obtained in our experiments show the average recognition rate found for each size of the tested sentences.

Fig. 5. Distribution of the recognition rate depending on the size of the sentence (bar chart; vertical axis from 0 to 120; categories: 2-word, 3-word, 4-word and more-than-5-word sentences).

G. Results and interpretation of experiments
In order to test the results of the learning phase, we performed a series of tests with corpora of different sentence sizes (2 words, 3 words, 4 words, and varying sizes). Each corpus contains 30 sentences, except the corpus of sentences of varying sizes, for which we used the body of a document. Each test simply repeats the generation of a representation of the meaning of an Arabic sentence based on the semantic cases. In general, the test results are satisfactory: the positions of the words in the translated sentence are computed, and in most cases we obtained fair surface representations.

Recognition reaches 96.66% on average for the shortest sentences, but when the size of the sentences increases, the measured rate declines to 64.19%. The average recognition rate of our model is 85.13%. According to previous works, a recognition rate between 79% and 97% usually results in an acceptable translation in most cases.

H. Qualitative Analysis
The following table shows the results of the qualitative analysis by morphosyntactic category of the words in our Arabic-French translation model.

TABLE IV
RESULTS OF THE EVALUATION OF THE QUALITY OF TRANSLATION OF SOURCE WORDS OF OUR MODEL

| Test | Adj | Adv | Noun | Pro | Verb | Total | BLEU (×100) | 1g | 2g | 3g | 4g |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Test 1 | 10 | 8 | 20 | 7 | 20 | 65 | 7% | 63.3 | 36.0 | 22.7 | 14.8 |
| Test 2 | 15 | 7 | 30 | 12 | 30 | 94 | 61% | 63.8 | 36.6 | 23.2 | 15.2 |
| Test 3 | 25 | 5 | 50 | 5 | 50 | 135 | 44% | 63.6 | 36.5 | 23.1 | 15.1 |
| Test 4 | 19 | 9 | 70 | 29 | 70 | 197 | 50% | 64.0 | 36.7 | 23.2 | 15.1 |

Columns Adj to Total give the categories of words in the source; columns 1g to 4g give the n-gram precisions in the target.

According to the statistics in this table, the BLEU score varies from test to test. We obtained only 7% in the first test, whereas in the second test we reached a very acceptable and even encouraging score of 61%. For the other tests, the results are good.

To understand the differences in BLEU scores observed in the experiments reported above, we examined sample translations by eye, choosing some translations from a randomly selected batch. We note that there are sometimes errors in the human reference translations, even though the BLEU score is among the best automatic evaluation metrics and generally provides accurate and meaningful results. The variation of the BLEU score across our tests shows a clear imbalance between the test results: in one test the BLEU score reached 61%, while in another it fell to 7%. This can be explained by the large variation in the words used in the parallel corpora and by the limited size of our reference, the ocean dictionary. A comparative study that we conducted to understand the differences between the BLEU scores of our approach and those of the approach of [23] led us, in the second series of experiments on the second corpus, to change the amount of data used. This change yields BLEU scores above 30, in other words understandable and readable automatic translations. Such results are therefore to be expected. Overall, our system achieved acceptable and encouraging figures, which can be improved by using more powerful learning tools [24], and offers a solution for Arabic-French machine translation.

IV. CONCLUSION AND OUTLOOK
The objective of this project was to lay some of the main building blocks of statistical machine translation. We focused our efforts on two main components of this approach: the parallel corpora used for learning translation models, and the translation table, one of the basic resources that the decoder needs to compute the most probable translations of the input sentences. The first problem that confronts anyone wishing to set up a statistical MT system is collecting the aligned bilingual parallel corpora from which the machine will draw its knowledge. Collecting such corpora is a difficult and expensive task: one must find texts that are available in several languages and then build the corpus so that each sentence is mapped to its translation. In our work, we proposed a novel method based on a bilingual dictionary, the "ocean" dictionary, that replaces the parallel corpora. The benefit of this dictionary is that its translation data are abundant and freely accessible. It is therefore possible to build large parallel corpora for learning translation models.

The initial aim of our experiments was to build an efficient and robust machine translation model. To do so, we conducted experiments in which we evaluated several formulas and tools to find the one best suited to our goal. After a thorough study of the literature, we decided to build a machine translation model strongly guided by semantics, given the lack of work on this side. Our model therefore translates an Arabic text into French through a statistical, probabilistic method.

First, we relied on the work of [20] and [21] for segmentation, lexical analysis and morphological analysis. Segmentation consists in splitting the text into sentences and then into words, using well-defined criteria such as punctuation and the spaces between words. In the lexical analysis phase, the words found are analyzed by identifying their roots and detecting their affixal components, which facilitates their morphological labeling, that is, determining their grammatical categories in the sentence, in order to avoid the trap of inconsistency. In the semantic analysis phase, the labeled sentences are taken as input, and we introduced a probabilistic statistical approach that includes the dictionary of the ocean.

Our model includes a language model, which is the description of a language; in our case it is based on an n-gram model, which considers sequences of n words and assigns a probability to each of them, as opposed to other works that were based on the "IBM" alignment models, including [22]. It also includes the translation model, which assigns a probability to each translation generated from the ocean dictionary.

REFERENCES
[1] P. F. Brown, John Cocke, Stephen Della Pietra, Vincent J. Della Pietra, Frederick Jelinek, John D. Lafferty, Robert L. Mercer, and Paul S. Roossin. A statistical approach to machine translation. Computational Linguistics, 16(2):79–85, 1990.
[2] A. L. Berger, Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, John R. Gillett, John D. Lafferty, Robert L. Mercer, Harry Printz, and Luboš Ureš. The Candide system for machine translation. In Proc. of the Workshop on Human Language Technology, pages 157–162, Plainsboro, NJ, 1994.
[3] M. Nagao. A framework of a mechanical translation between Japanese and English by analogy principle. In Proc. of the Intl. NATO Symposium on Artificial and Human Intelligence, pages 173–180, Lyon, France, 1984.
[4] H. Somers. Review article: Example-based machine translation. Machine Translation, 14(2):113–157, June 1999.
[5] M. Paul, Takao Doi, Youngsook Hwang, Kenji Imamura, Hideo Okuma, and Eiichiro Sumita. Nobody is perfect: ATR's hybrid approach to spoken language translation. In Proc. of Intl. Workshop on Spoken Language Translation, Pittsburgh, USA, October 2005.
[6] M. Simard, Nicola Ueffing, Pierre Isabelle, and Roland Kuhn. Rule-based translation with statistical phrase-based post-editing. In Proc. of the Workshop on Statistical Machine Translation, pages 203–206, Prague, Czech Republic, June 2007.
[7] L. Dugast, Jean Senellart, and Philipp Koehn. Statistical post-editing on SYSTRAN's rule-based translation system. In Proc. of the Workshop on Statistical Machine Translation, pages 220–223, Prague, Czech Republic, June 2007.
[8] S. Hasan and H. Ney. A multi-genre SMT system for Arabic to French. In LREC, pages 2167–2170, 2008.
[9] K. Bacha and Mounir Zrigui. Machine Translation System on the Pair of Arabic / English. KEOD 2012, pages 347–351.
[10] A. Zouaghi, Mounir Zrigui, Georges Antoniadis, and Laroussi Merhbene. Contribution to Semantic Analysis of Arabic Language. Adv. Artificial Intelligence 2012 (2012).
[11] S. Chen and J. Goodman. An empirical study of smoothing techniques for language modeling. Computer Speech and Language, 359–394, October 1999.
[12] L. E. Baum, Ted Petrie, George Soules, and Norman Weiss. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. The Annals of Mathematical Statistics, 41(1):164–171, 1970.
[13] L. Specia. Exploiting objective annotations for measuring translation post-editing effort. In Proc. of EAMT, pages 73–80, 2011.
[14] B. Sagot, K. Fort, G. Adda, J. Mariani, B. Lang, et al. Un turc mécanique pour les ressources linguistiques : critique de la myriadisation du travail parcellisé. In Proc. of TALN, 2011.
[15] K. Papineni, S. Roukos, T. Ward, and W. Zhu. Bleu: a method for automatic evaluation of machine translation. In Proc. of ACL, pages 311–318, 2002.
[16] P. Koehn. Statistical significance tests for machine translation evaluation. In Proc. of EMNLP, Volume 4, pages 388–395, 2004.
[17] K. Bacha and Mounir Zrigui. Design of a Synthesizer and a Semantic Analyzer's Multi Arabic, for use in Computer Assisted Teaching. International Journal of Information Sciences and Application (IJISA), Volume 4, Number 1 (2012), Research India Publications, pages 11–33.
[18] K. Bacha and Mounir Zrigui. Designing a Model of Arabic Derivation, for Use in Computer Assisted Teaching. KEOD 2012, pages 352–356.
[19] M. Choukri. Le Pain Nu (in English, For Bread Alone), the most famous literary work of the late writer Mohamed Choukri; written in 1972 but not published in Arabic until 1982, and translated into 38 foreign languages. Published December 31, 1998 by Éditions du Seuil.
[20] L. Belguith Hadrich, L. Baccour, and G. Mourad. Segmentation de textes arabes basée sur l'analyse contextuelle des signes de ponctuations et de certaines particules. 12ème conférence sur le Traitement Automatique des Langues Naturelles (TALN'2005), Dourdan, France, June 6–10, 2005, pages 451–456.
[21] L. Hadrich Belguith, C. Aloulou, and A. Ben Hamadou. MASPAR : De la segmentation à l'analyse syntaxique de textes arabes. Revue Information Interaction Intelligence I3, vol. 7, no. 2, pages 9–36, May 2008. CÉPADUÈS-Editions. (ISSN: 1630-49x). http://www.revue-i3.org/
[22] D. Déchelotte, H. Schwenk, G. Adda, and J. Gauvain. Improved machine translation of speech-to-text outputs. In Proc. of the NAACL-HLT Workshop on Syntax and Structure in Statistical Translation, pages 2441–2444, Antwerp, Belgium, August 2007.
[23] P. Koehn, F. Och, and D. Marcu. Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics Conference, pages 48–54, Edmonton, May-June 2003.
[24] K. Bacha, M. Zrigui, M. Amine Nahdi, M. Maraoui, and A. Zouaghi. TELA: Towards Environmental Learning Arabic. The 2011 International Conference on Artificial Intelligence (ICAI'11), WORLDCOMP'11, Las Vegas, Nevada, USA, 2011, 6 pages.