Domain Adaptation in Statistical Machine Translation Using Comparable Corpora: Case Study for English-Latvian IT Localisation
Mārcis Pinnis, Inguna Skadiņa and Andrejs Vasiļjevs
{marcis.pinnis;inguna.skadina;andrejs}@tilde.lv

CICLing 2013, March 29, 2013

Challenges of Data-Driven MT
The lack of sufficiently large parallel corpora limits the building of reasonably good quality machine translation (MT) solutions for less-resourced languages. For many languages only a few parallel corpora of reasonable size are available. Statistical machine translation (SMT) systems trained on parallel corpora perform well on texts from the same domain, but are almost unusable for other domains.

Goals of the Paper
To show that, for language pairs and domains where not enough parallel data is available:
- in-domain comparable corpora can be used to increase translation quality
- if comparable corpora are large enough and can be classified as strongly comparable, then the trained SMT systems, applied in the localisation process, can lead to increased human translator productivity

Comparable Corpora in Machine Translation
The comparable corpus is a relatively recent concept in MT: methods for using parallel corpora in MT are well studied (e.g., Koehn), while comparable corpora in MT have not been thoroughly investigated.

Recent research has shown that:
- parallel data extracted from comparable corpora improves SMT performance by reducing the number of untranslated words (Hewavitharana and Vogel, 2008)
- language pairs and domains with little parallel data can benefit from the use of comparable corpora (Munteanu and Marcu, 2005; Lu et al., 2010; Abdul-Rauf and Schwenk, 2009 and 2011)

Comparable Corpora in Machine Translation (cont.)
Most experiments have been performed with widely used language pairs, such as French-English, Arabic-English or German-English. For under-resourced languages (e.g., Latvian), the exploitation of comparable corpora for machine translation tasks is less studied (e.g., ACCURAT, TTC).


Briefly about Latvian
Latvian belongs to the Baltic branch of the Indo-European language family: it is a morphologically rich language with a rather free word order, and it has fewer than 2.5 million speakers worldwide.

Few bi-/multilingual parallel corpora exist; among the largest are JRC-Acquis*, DGT-TM** and OPUS***.
* Steinberger et al., 2006; available at: http://www.statmt.org/europarl
** Steinberger et al., 2012; available at: http://langtech.jrc.it/DGT-TM.html
*** Tiedemann, 2009; available at: http://opus.lingfil.uu.se


Collecting and Processing Comparable Corpus

[Workflow: Collection of Comparable Corpora → Extraction of Semi-parallel Sentence Pairs / Extraction of Bilingual Term Pairs → Baseline System Training → Domain Adaptation → SMT System Evaluation]

English-Latvian IT Domain Comparable Corpus
Artificially created to simulate a strongly comparable corpus, composed using two strategies.
1st part: a comparable corpus built from different versions of software manuals; large documents were split into chunks of 100 paragraphs (as sketched below) and aligned at document level with DictMetric*.

  EN documents   LV documents   Aligned doc. pairs   Aligned doc. pairs (filtered)
  5,200          5,236          363,076              23,399

* Su & Babych, 2012; available through the ACCURAT Toolkit (www.accurat-project.eu)
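As an illustration of the chunking step, a minimal sketch that splits a document into 100-paragraph chunks before document-level alignment; the file name and the blank-line paragraph convention are assumptions, and the alignment itself is done by DictMetric, not by this code:

```python
# Minimal sketch of the 100-paragraph chunking step described above.
# Hypothetical helper; document alignment is performed with DictMetric.
def split_into_chunks(paragraphs, chunk_size=100):
    """Split a list of paragraphs into consecutive chunks of chunk_size."""
    return [paragraphs[i:i + chunk_size]
            for i in range(0, len(paragraphs), chunk_size)]

with open("manual_en.txt", encoding="utf-8") as f:
    # Paragraphs are assumed to be separated by blank lines.
    paragraphs = [p for p in f.read().split("\n\n") if p.strip()]

chunks = split_into_chunks(paragraphs)  # each chunk is treated as a "document"
```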


English-Latvian IT Domain Comparable Corpus (cont.)
2nd part: a comparable corpus of Web-crawled documents combined with parallel software manuals.

  Language   Documents   Unique sentences   Tokens in unique sentences
  English    22,498      1,316,764          16,927,452
  Latvian    22,498      1,215,019          13,036,066

Extraction of Semi-parallel Sentence Pairs
The corpus was pre-processed: broken into sentences and tokenised. In-domain semi-parallel sentence pairs were extracted with LEXACC*. Due to the different distribution of comparable data within the two corpus parts, different thresholds were applied in the extraction process (see the sketch after the table).

  Corpus part   Threshold   Unique sentence pairs
  First part    0.6         9,720
  Second part   0.35        561,994

* Ştefănescu et al., 2012; available through the ACCURAT Toolkit (www.accurat-project.eu)
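A minimal sketch of the threshold filtering applied to the LEXACC output; the tab-separated file layout (source, target, parallelism score) is an assumption for illustration, as LEXACC's real output format may differ:

```python
# Keep only sentence pairs whose parallelism score passes the
# part-specific threshold. The file format is assumed:
# source<TAB>target<TAB>score
def filter_pairs(path, threshold):
    kept = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            source, target, score = line.rstrip("\n").split("\t")
            if float(score) >= threshold:
                kept.append((source, target))
    return kept

first_part = filter_pairs("lexacc_part1.tsv", threshold=0.6)
second_part = filter_pairs("lexacc_part2.tsv", threshold=0.35)
```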


Extraction of Bilingual Term Pairs
Terms were monolingually tagged in the corpus using Tilde's Wrapper System for CollTerm (TWSC)*. In-domain bilingual term pairs were extracted with TerminologyAligner (TEA)*. The highest-ranked translation equivalent for each Latvian term was kept (see the sketch below).

  Corpus part   Unique English terms   Unique Latvian terms   Mapped term pairs (total)   Mapped term pairs (filtered)
  First part    127,416                271,427                847                         689
  Second part   415,401                2,566,891              3,501                       3,393

* Pinnis et al., 2012; available through the ACCURAT Toolkit (www.accurat-project.eu)
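A minimal sketch of the final filtering step, keeping only the highest-ranked English equivalent for each Latvian term; the (Latvian term, English term, score) triples below are illustrative stand-ins for TEA's output:

```python
# For each Latvian term, keep only the English translation equivalent
# with the highest alignment score. The example triples are illustrative.
def keep_best_translations(term_pairs):
    best = {}
    for lv_term, en_term, score in term_pairs:
        if lv_term not in best or score > best[lv_term][1]:
            best[lv_term] = (en_term, score)
    return {lv: en for lv, (en, _) in best.items()}

pairs = [("datne", "file", 0.92), ("datne", "document", 0.41)]
print(keep_best_translations(pairs))  # {'datne': 'file'}
```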


Building SMT Systems


Training SMT Systems
Three English-Latvian systems were trained on the LetsMT!* infrastructure: the baseline system, the intermediate adapted system, and the adapted system.

All systems were tuned on 1,837 and automatically evaluated on 926 unique sentence pairs from the IT domain.
* Vasiļjevs et al., 2012; available at: www.letsmt.eu


Baseline SMT System
Trained on relatively large publicly available parallel corpora, the DGT-TM parallel corpora of two releases (2007 and 2011): 1,828,317 unique parallel sentence pairs and 1,736,384 unique monolingual Latvian sentences.

The parallel corpora were cleaned (duplicates and corrupt sentence pairs were removed) before training.
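A minimal sketch of this kind of cleaning, under the assumption that "corrupt" covers empty sides and implausible length ratios; the paper does not spell out its exact criteria:

```python
# Remove duplicate and (by simple heuristics) corrupt sentence pairs.
# The corruption criteria used here are assumptions for illustration.
def clean_parallel_corpus(pairs, max_len_ratio=9.0):
    seen = set()
    cleaned = []
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt:
            continue  # drop pairs with an empty side
        ratio = max(len(src), len(tgt)) / min(len(src), len(tgt))
        if ratio > max_len_ratio:
            continue  # drop pairs with an implausible length ratio
        if (src, tgt) in seen:
            continue  # drop exact duplicates
        seen.add((src, tgt))
        cleaned.append((src, tgt))
    return cleaned
```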


Intermediate Adapted System
In-domain bilingual data (sentence and term pairs) extracted from the comparable corpus were added to the parallel data. The in-domain monolingual corpus was used to build a second language model.

  Data                               Parallel corpus (unique pairs)   Monolingual corpus (sentences)
  DGT-TM (2007 and 2011)             1,828,317                        1,576,623
  Sentences from comparable corpus   558,168                          1,317,298
  Terms from comparable corpus       3,594                            3,565

Final Adaptation
The phrase table of the intermediate adapted system was transformed into a term-aware phrase table: a sixth feature was added identifying phrases that contain bilingual in-domain terminology. All tokens are stemmed prior to comparison (in order to capture inflected variants of terms). The 3,594 bilingual term pairs extracted with TEA were used in the adaptation process, and the SMT system was re-tuned with MERT.
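A minimal sketch of the term-awareness marking, assuming a naive prefix-truncation stemmer and in-memory phrase pairs; the actual system uses its own stemmer and operates on the Moses phrase table format:

```python
# Mark phrase pairs that contain an in-domain term pair on both sides.
# stem() is a crude stand-in for the stemmer actually used.
def stem(token):
    return token.lower()[:4]  # naive prefix "stem" for illustration

def stems(text):
    return {stem(t) for t in text.split()}

def term_feature(src_phrase, tgt_phrase, term_pairs):
    """Value of the added sixth feature: 1 if the phrase pair
    contains a stemmed in-domain term pair, 0 otherwise."""
    src_stems, tgt_stems = stems(src_phrase), stems(tgt_phrase)
    for en_term, lv_term in term_pairs:
        if stems(en_term) <= src_stems and stems(lv_term) <= tgt_stems:
            return 1
    return 0

terms = [("operating system", "operētājsistēma")]
print(term_feature("the operating system restarts",
                   "operētājsistēma tiek restartēta", terms))  # 1
```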


Evaluation


Evaluation (cont.)
Automatic score calculation: BLEU, NIST, TER and METEOR scores.
Manual translation evaluation: comparative evaluation, and usability for localization (productivity and quality).
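For readers who want to reproduce this kind of scoring today, a minimal sketch using the sacrebleu Python library, a modern substitute rather than the scoring tools used in the paper; the sentences are illustrative:

```python
# Corpus-level BLEU with sacrebleu (a modern substitute for the
# scoring tools used in the paper). Sentences are illustrative.
import sacrebleu

hypotheses = ["atveriet failu un saglabājiet izmaiņas"]
references = [["atveriet datni un saglabājiet izmaiņas"]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")
```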


Automatic Evaluation Results

  System                        Case sensitive?   BLEU    NIST     TER     METEOR
  Baseline                      No                11.41   4.0005   85.68   0.1711
  Baseline                      Yes               10.97   3.8617   86.62   0.1203
  Intermediate adapted system   No                56.28   9.1805   43.23   0.3998
  Intermediate adapted system   Yes               54.81   8.9349   45.04   0.3499
  Final adapted system          No                56.66   9.1966   43.08   0.4012
  Final adapted system          Yes               55.20   8.9674   44.74   0.3514

Comparative Evaluation


System Comparison by Total Points

Of the 697 cases in which sentence translations were evaluated:
- in 490 cases (70.30±3.39%) the output of the improved SMT system (System 2) was chosen as the better translation
- in 207 cases (29.70±3.39%) the users preferred the translation of the baseline system (System 1)
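The ±3.39% margin is consistent with a 95% normal-approximation confidence interval for a binomial proportion, with $\hat{p} = 490/697 \approx 0.7030$ (a reconstruction; the paper does not state which interval it used):

$$1.96 \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 1.96 \cdot \sqrt{\frac{0.7030 \cdot 0.2970}{697}} \approx 0.0339 = 3.39\%$$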

Evaluation in the Localisation Task

Localization Work
The localization process generally involves the cultural adaptation and translation of software, video games and websites, and less frequently other written translation. Translation memories (TMs) are widely used in the localization industry to increase translators' productivity and the consistency of translated material. Support from the TM is minimal on out-of-domain texts and on texts with different terminology.

Key Requirements for Application of MT
Increasing the efficiency of the translation process without degrading quality is the most important goal for a localization service provider. Key requirements for the application of MT in localization:
- Quality of translation
- Language coverage
- Domain coverage
- Terminology usage
- Cost of adaptation

Application of MT in Localisation
The localization industry experiences a growing pressure on efficiency and performance.

MT is integrated into several computer-assisted translation (CAT) products, e.g. SDL Trados, ESTeam TRANSLATOR and Kilgray memoQ.


Related Work: MT in Localisation
Microsoft (Schmidtke, 2008) used MT trained on the Microsoft technical domain in the Office Online 2007 localization task for three languages: Spanish, French and German. Applying MT to all new words increased productivity by 5% to 10% on average.
Adobe (Flournoy and Duran, 2009) used rule-based MT for translation into Russian (PROMT) and SMT for Spanish and French (Language Weaver); productivity increased by between 22% and 51%.
Autodesk (Plitt and Masselot, 2010) used the Moses SMT system for translation from English into French, Italian, German and Spanish; a varying productivity increase from 20% to 131% was observed.

Evaluation Task
Productivity was compared in two scenarios:
- translation using translation memories (TMs) only
- translation using suggestions from TMs and the SMT system enriched with data from the comparable corpus


MT Integration into the Localization Workflow
[Workflow diagram with steps: evaluate original / assign translator and editor; analyse against TMs; MT-translate new sentences; translate using translation suggestions from TMs and MT; evaluate translation quality / edit; fix errors; ready translation]

Integration of SMT Systems in SDL Trados


Integration into SDL Trados
Translations by the SMT systems are provided for those translation segments that have no exact or close match in the translation memory. Suggestions coming from the MT are clearly marked, and localization specialists can post-edit them for a professional result.
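A minimal sketch of this fallback logic; the 0.75 fuzzy-match threshold and the tm_lookup/mt_translate callables are hypothetical, as the paper does not state the actual cut-off used in the SDL Trados configuration:

```python
# Offer a TM suggestion when a sufficiently close match exists,
# otherwise fall back to the (clearly marked) MT suggestion.
# The 0.75 fuzzy-match threshold is a hypothetical example value.
def suggest(segment, tm_lookup, mt_translate, fuzzy_threshold=0.75):
    match = tm_lookup(segment)  # -> (translation, similarity) or None
    if match is not None and match[1] >= fuzzy_threshold:
        translation, similarity = match
        return {"source": "TM", "similarity": similarity,
                "text": translation}
    return {"source": "MT", "text": mt_translate(segment)}
```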


Evaluation Setup
30 documents were split into two equally sized parts in order to perform the two translation scenarios. The length of each part of a document was 250 to 260 adjusted words on average, resulting in two packages of documents with about 7,700 words each. Three evaluators each translated 10 documents without SMT support and 10 documents with SMT support.


Evaluation of Productivity
Both experienced and novice translators were involved. The translators were well trained to use SDL Trados Studio 2009 in their translation work. They performed the test without interruption and without switching to other translation tasks during their 8-hour working day. The time spent on translation was reported to the nearest minute. The individual productivity of each translator was measured in words per hour:

$$Productivity(scenario) = \frac{\sum_{text=1}^{N} AdjustedWords(text, scenario)}{\sum_{text=1}^{N} ActualTime(text, scenario)}$$
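A worked example with hypothetical numbers: a translator who completes two texts of 260 and 250 adjusted words in 0.5 h and 0.4 h respectively has a productivity of

$$\frac{260 + 250}{0.5 + 0.4} \approx 567 \text{ words/hour.}$$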

Results of Productivity Evaluation

  Translator     Scenario   Productivity (words/hour)   Change
  Translator 1   TM         493.2
                 TM+MT      667.7                       +35.39%
  Translator 2   TM         380.7
                 TM+MT      430.3                       +13.02%
  Translator 3   TM         756.9
                 TM+MT      712.3                       -5.89%
  All            TM         503.2
                 TM+MT      571.9                       +13.63%
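As a check on the reported changes, Translator 1's increase follows directly from the two productivity figures:

$$\frac{667.7 - 493.2}{493.2} \approx 0.3538,$$

matching the reported +35.39% up to rounding of the displayed productivity values.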


Evaluation of Quality
Performed by two experienced editors as part of their regular QA process. The editors did not know whether or not MT had been applied to assist a translator.

Quality was measured by filling in a Quality Assessment form in accordance with the Tilde QA methodology, which is based on the Localization Industry Standards Association (LISA) QA model.


Evaluation of Quality (cont.)
The evaluation process involves inspecting translations and classifying errors according to the following error categories:
- Accuracy
- Spelling and grammar
- Style
- Terminology


QA Evaluation Form
For each error category the form records a weight, the amount of errors found, and the resulting negative points:

  Error category                                          Weight
  1. Accuracy
    1.1. Understanding of the source text                 3
    1.2. Understanding the functionality of the product   3
    1.3. Comprehensibility                                3
    1.4. Omissions / unnecessary additions                2
    1.5. Translated / untranslated                        1
    1.6. Left-overs                                       1
  2. Language quality
    2.1. Grammar                                          2
    2.2. Punctuation                                      1
    2.3. Spelling                                         1
  3. Style
    3.1. Word order, word-for-word translation            1
    3.2. Vocabulary and style choice                      1
    3.3. Style Guide adherence                            2
    3.4. Country standards                                1
  4. Terminology
    4.1. Glossary adherence                               2
    4.2. Consistency                                      2

The form closes with: additional plus points for style (if applicable), the Grand Total, the negative points per 1,000 words, and the resulting quality evaluation.

Error Score
The error score was calculated by counting the errors identified by the editor and applying a weighted multiplier based on the severity of the error type. The error score is calculated per 1,000 words as:

$$ErrorScore = \frac{1000}{n} \sum_{i} w_i e_i$$

where $n$ is the number of words in the translated text, $e_i$ is the number of errors of type $i$, and $w_i$ is the coefficient (weight) indicating the severity of type-$i$ errors.
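A worked example with hypothetical counts: in a 2,000-word translation with two grammar errors (weight 2) and one comprehensibility error (weight 3), the error score is

$$ErrorScore = \frac{1000}{2000}\,(2 \cdot 2 + 3 \cdot 1) = 3.5.$$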

Quality Evaluation Results

  Translator     Scenario   Accuracy   Lang. quality   Style   Terminology   Total error score
  Translator 1   TM         6.8        8.0             6.8     1.6           23.3
  Translator 1   TM+MT      9.9        14.4            7.8     4.1           36.3
  Translator 2   TM         8.2        10.1            11.7    0.0           30.0
  Translator 2   TM+MT      3.8        11.7            7.6     1.5           24.6
  Translator 3   TM         4.6        9.5             7.3     0.0           21.4
  Translator 3   TM+MT      3.0        8.3             6.0     0.8           18.1
  Average        TM         6.5        9.3             8.6     0.5           24.9
  Average        TM+MT      5.4        11.4            7.1     2.1           26.0

Quality Grades
A quality grade was assigned to a translation depending on the severity of the error score:

  Error score   Resulting quality evaluation
  0–9           Superior
  10–29         Good
  30–49         Mediocre
  50–69         Poor
  >70           Very poor

Quality Evaluation Results

  Translator     Scenario   Total error score   Quality grade
  Translator 1   TM         23.3                Good
  Translator 1   TM+MT      36.3                Mediocre
  Translator 2   TM         30.0                Mediocre
  Translator 2   TM+MT      24.6                Good
  Translator 3   TM         21.4                Good
  Translator 3   TM+MT      18.1                Good
  Average        TM         24.9                Good
  Average        TM+MT      26.0                Good

Conclusion
It is feasible to adapt SMT systems for highly inflected under-resourced languages to a particular domain with the help of comparable data. The use of English-Latvian domain-adapted SMT suggestions (trained on comparable data) in addition to the TMs increased translation performance by 13.6% while maintaining an acceptable ("Good") quality of translation. We observed a relatively high variation in translator performance changes (from -5.89% to +35.39%); therefore, for more reliable results the experiment should be carried out with more than three participants.

Thank you for your attention!
THE RESEARCH LEADING TO THESE RESULTS HAS RECEIVED FUNDING FROM THE RESEARCH PROJECT "2.6. MULTILINGUAL MACHINE TRANSLATION" OF EU STRUCTURAL FUNDS, CONTRACT NR. L-KC-11-0003 SIGNED BETWEEN ICT COMPETENCE CENTRE AND INVESTMENT AND DEVELOPMENT AGENCY OF LATVIA.