Lexicalist Machine Translation of Colloquial Text

Fred Popowich, Davide Turcato, Paul McFetridge
Natural Language Lab, School of Computing Science
Simon Fraser University, Burnaby, BC, V5A 1S6, Canada
{popowich,turk,mcfet}@cs.sfu.ca

Abstract. Colloquial text as found in television programs or typical conversations is different from text found in technical manuals, newspapers and books. Phrases tend to be shorter and less sophisticated. In this paper, we look at some of the theoretical and implementational issues involved in translating Colloquial English (CE). We present a fully automatic large-scale multilingual natural language processing system for the translation of CE, as found in commercially transmitted closed caption television signals, into simple target sentences. Our approach is based on Whitelock's Shake and Bake machine translation paradigm, which relies heavily on lexical resources. The system currently translates from English to Spanish, with translation modules for Brazilian Portuguese under development.

1. Introduction. Colloquial text as found in typical conversations is different from text found in weather reports, technical manuals, newspapers and books. Phrases tend to be shorter and less sophisticated, and may often contain violations of the linguistic rules and conventions found in written language. These and other characteristics distinguish the translation of colloquial text from the traditional areas of focus of machine translation, and present special problems of their own. We have tackled an MT task involving the translation of Colloquial English (CE): the translation of the closed caption text transmitted with the vast majority of North American television programs and all major release videos. A translation of closed captions need not be as accurate as a translation of other text in order to be highly understandable, provided it is presented simultaneously with the original closed captioned television broadcast; visual and contextual information from the broadcast supplies additional information, just as it does with the usual presentation of the original closed caption text. After examining issues specific to the translation of CE as found in closed captions, we shall describe a lexicalist approach to MT which is well suited to the translation task. We will then introduce a fully automatic large-scale multilingual natural language processing system for translation of CE input text as found in many closed caption broadcasts, and examine issues relating to extending the system to handle colloquial text in additional languages. The linguistic information in the system is explicitly represented in grammars and lexicons making up language-specific modules, so that modules for additional target languages can be added in the future. Spanish target language modules have already been developed, with Brazilian Portuguese modules currently under development. A discussion of evaluation issues and some preliminary results for the current system will then be presented.

2. Translation of Closed Captions. The language of closed captions is peculiar and differs in many ways from any other linguistic domain currently addressed by machine translation systems. In this section, this peculiarity will be described, with reference both to the characteristics of the language found in closed captions and to the conditions under which the translated closed captions are used.

CE sentences tend to be very short. Analysis of 11 million words of English text captured from typical North American prime-time television broadcasts has shown the average sentence length to be 5.4 words, with approximately 75% of the sentences in the corpus having a length of 7 words or less and 90% of the sentences consisting of 10 words or less. In many cases they are not actually complete sentences, but phrasal fragments of some kind. Their syntax is considerably simpler than that found in written text: phrases tend to have a very simple structure, subordination is rarely used, and parataxis often appears instead. Elliptical expressions are frequent. CE contains a great many idiomatic expressions and slang, and is frequently ungrammatical. Moreover, although closed captions are written text, they are meant to render spoken language, and therefore they contain the phenomena which characterize spoken language: hesitations, interruptions, repetitions, non-linguistic utterances, etc. A taste of the kind of text available in closed captions is provided by the fragment in Table 1, taken from the script of the television program Baywatch.

As in almost all applications of Natural Language Processing, translation of closed captions must address ambiguity, although with a slightly different emphasis. On the one hand, the shortness of phrases leaves little room for syntactic ambiguity. On the other hand, semantic ambiguity is often a problem. Whereas in the more traditional areas of MT well-written text contains enough contextual information to avoid possible misreadings, dialogue relies heavily on contextual information from other sources, such as sound effects and the visual scene, which are not available to the translation system. Therefore, the meaning of a sentence is sometimes difficult to recover. Reading the script of a previously unseen movie is an instructive (and frustrating) experience, since it can be hard to understand what is happening and what the characters are talking about. By contrast, professional translators of movie scripts are provided with detailed instructions about the intention of the author to help them understand the script.

Unlike translating sublanguages like technical manuals or weather reports, translating TV programs means dealing with an unrestricted semantic domain. Since we cannot currently equip the system with any of the current NLP techniques for dealing with an open semantic domain (world knowledge databases, example-based training, etc.), the system cannot at present guarantee a truly semantically accurate translation. Therefore, in the traditional MT dilemma between robustness and accuracy, we definitely favour the former over the latter. However, the disadvantage of dealing with input stripped of its visual context is countered by the symmetrical advantage that the user need not rely only on the translated text in order to understand its meaning: users can employ the visual context and story continuity to make sense of the translations. Therefore, translation acceptability to the end user can be improved by the additional content and context provided by the images and soundtrack.

In light of the remarks above, the traditional requirement of meaning equivalence between source and target sentence can be weakened, in the present context, to a requirement of meaning subsumption. As we have shown, the goal of achieving fully meaning-equivalent translations (i.e. target sentences that, ideally, can be retranslated to their source sentences) is hardly achievable in the present context, but fortunately it is also less necessary than in other MT domains. More modestly, our goal is to obtain an output consistent with the input, in the sense that its content, although less specific than the intended input meaning, must be coherent with it and hence with its context. If such a goal is achieved, the user is likely to recover from the context the more specific information not found in the translation.

Given this context, the traditional measure of accuracy of translation can be replaced by a measure of acceptability of translation. Sentences and phrases will be acceptable not only in the case where they are accurate, but also in the case where they are understandable by the viewer in the context of a simultaneous television broadcast. Acceptability is thus defined in terms of the end user [CH93]. Indeed, translations that contain one or two minor errors (involving, say, lexical choice or modifier placement) will often be acceptable to the end user in the final viewing context. Evaluation issues related to time-constrained MT are discussed in [TTP+98].
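For concreteness, corpus figures like those quoted above can be computed with a short script along the following lines. This is an illustrative sketch only; the naive whitespace tokenization and the file name `captions.txt` are our own stand-ins, not the tools used in the reported analysis.

```python
# Sketch: sentence-length statistics over a closed-caption corpus,
# assuming one caption sentence per line in a plain text file.
def length_stats(path):
    lengths = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            words = line.split()  # naive whitespace tokenization
            if words:
                lengths.append(len(words))
    n = len(lengths)
    average = sum(lengths) / n
    share = lambda k: 100.0 * sum(1 for w in lengths if w <= k) / n
    print(f"average sentence length: {average:.1f} words")
    print(f"7 words or less: {share(7):.0f}%, 10 or less: {share(10):.0f}%")

length_stats("captions.txt")  # hypothetical corpus file
```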

COME ON, CYNTHIA! CYNTHIA, PUSH! COME ON, CYNTHIA! GOOD GIRL. COME ON, CYNTHIA. GOOD GIRL. HE'S GOT THE HEAD. IT'S A BOY. OH! OH! OH, YOU'VE GOT A BOY! HERE, HONEY. HERE YOU GO. OKAY. OKAY. YEAH. SHOULD NOTIFY THE COUNTY, HUH? THANKS. ALL DONE. THANKS FOR WAITING. WHAT AN INCREDIBLE DAY. WE STARTED OUT WITH A FABULOUS RIDE ON THE BEACH AND A LITTLE PICNIC AND ALL OF A SUDDEN THERE'S A CAR IN THE WATER YOU'RE RESCUING THIS WOMAN AND WE DELIVER HER BABY TOGETHER.

Table 1: Script fragment

3. Use of Lexicalist MT. After a short introduction to lexicalist MT, we will consider the decisions which have been taken in designing and implementing our system. This will be followed by a detailed discussion of the specific system components, namely the lexicons and grammars.

3.1. Overview of Lexicalist MT. Contemporary MT systems are usually classified as either interlingua-based or transfer-based [HS92]. The lexicalist approach, or Shake and Bake (S&B) approach [Whi94, Bea92], is similar to the transfer-based approach in having distinct analysis, transfer, and generation stages. It differs, though, in the type of structures that these three stages use. Many applications in MT, and in natural language processing in general, make use of feature structures (also known as attribute value matrices) for representing linguistic information (in our case morphological, syntactic and semantic information) about different words and phrases [Car92]. A key aspect of feature structures is that information can be shared between features, and information can be initially left unspecified, to be specified or instantiated dynamically through unification.
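As an illustration of these ideas (and not of the system's actual encoding), feature structures can be modelled as nested dictionaries, with unification merging compatible information and failing on a clash. The sketch below omits reentrancy and typing, which the full formalism of [Car92] provides.

```python
# Simplified feature-structure unification: dictionaries unify recursively,
# atomic values must match, and absent features count as unspecified.
def unify(fs1, fs2):
    if fs1 is None: return fs2            # unspecified unifies with anything
    if fs2 is None: return fs1
    if isinstance(fs1, dict) and isinstance(fs2, dict):
        result = dict(fs1)
        for feat, val in fs2.items():
            merged = unify(result.get(feat), val)
            if merged == "FAIL": return "FAIL"
            result[feat] = merged
        return result
    return fs1 if fs1 == fs2 else "FAIL"  # atomic values

dog = {"cat": "noun", "agr": {"num": "sg"}}            # entry for 'dog'
det = {"cat": "noun", "agr": {"num": "sg", "per": 3}}  # constraint from a determiner
print(unify(dog, det))  # {'cat': 'noun', 'agr': {'num': 'sg', 'per': 3}}
```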

The goal of the analysis phase in the S&B approach is to produce a set of enriched lexical entries associated with the different words in the source language sentence. They are enriched in the sense that they contain additional syntactic and semantic information beyond what was present in the original lexical entries retrieved from the lexicon. During the analysis phase, lexical entries are combined according to the grammar rules, and through unification, values that were originally unspecified in the feature structures from the lexicon become instantiated in the feature structures in the parse tree. Among these values are the indices of the other lexical entries with which each entry has combined. After analysis, these enriched feature structures for the lexical entries are used as input to the transfer module, which, together with the lexical transfer rules contained in a bilingual (or multilingual) lexicon, produces a corresponding set of feature structures for the target language. The generator then uses this set of feature structures (in particular, the indexical information about relationships among the words) as input, together with the target language lexicon and grammar, to produce an output sentence.

Typically, a transfer-based system requires structural transfer rules as well as lexical transfer rules. With the lexicalist approach, only lexical transfer rules are needed. There is sometimes redundancy between lexical transfer rules and structural transfer rules, so by placing all the information in the lexical transfer rules, this redundancy can be eliminated. Although we have no structural transfer rules, we still consider our approach to be transfer-based, since the lexical transfer rules are specific to a given source and target language. The lexicalist approach is also attractive for translating idiomatic expressions, where the translation of complex phrasal structures can simply be specified in the transfer lexicon. All the information relevant for translation is contained in the source language lexicon and grammar, the transfer lexicon, and the target language lexicon and grammar.
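The flow just described can be caricatured in a few lines of code. This is a toy illustration under our own simplifications (signs as dictionaries, a word-for-word bilingual table, exhaustive search in generation), not the system's implementation.

```python
# Toy end-to-end Shake-and-Bake run for 'big dog' -> 'perro grande'.
# Analysis yields a bag of indexed signs; transfer swaps lexemes only;
# generation searches for an ordering licensed by the target grammar.
from itertools import permutations

BILINGUAL = {"big": "grande", "dog": "perro"}  # toy bilingual lexicon

def analyse(words):
    # Stand-in for parsing: each word becomes an indexed sign; the
    # adjective records the index of the noun it modifies.
    return [{"lemma": "big", "cat": "adj", "mod": 2},
            {"lemma": "dog", "cat": "noun", "index": 2}]

def transfer(bag):
    # Purely lexical: swap lemmas, carry indices and dependencies across.
    return [dict(s, lemma=BILINGUAL[s["lemma"]]) for s in bag]

def generate(bag):
    # Toy target (Spanish) grammar constraint: an adjective follows the
    # noun it modifies. Search the orderings of the bag for a licensed one.
    for order in permutations(bag):
        ok = all(order.index(s) > order.index(t)
                 for s in order if s["cat"] == "adj"
                 for t in order if t.get("index") == s["mod"])
        if ok:
            return " ".join(s["lemma"] for s in order)

print(generate(transfer(analyse("big dog".split()))))  # perro grande
```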

3.2. Why Lexicalist MT? A number of reasons suggested the use of an S&B lexicalist approach for the specific domain addressed. As already pointed out, colloquial text contains phrases of different sorts, unlike most written texts (books, newspapers, etc.), which are usually made up of complete sentences. In the short sample provided in Table 1, noun phrases, adjectival phrases, prepositional phrases, etc. can be found along with complete sentences as units to be translated. One of the key features of the S&B approach (and of other unification-based approaches to MT) is that grammars can be developed in a declarative fashion, regardless of the specific procedure for which they are going to be used. The purpose of a grammar of this kind is to specify all the well-formed phrases of a language by means of a set of constraints. Such a grammar works equally well for full sentences as for any other kind of phrase.

We have also pointed out that colloquial expressions tend to be poorly structured. Unlike written text, the nature of colloquial communication is such that a speaker often starts an utterance before having in mind a complete, structured sentence. As a consequence, colloquial expressions often take the form of sequences of unstructured, short phrases, rather than long sentences with complex subordination. An MT system relying on structural transfer (thus mapping ordered sequences onto ordered sequences) would have to enforce a structure even on such loosely structured source expressions, in order for structural transfer to apply to them properly. This would result in unnatural, artificially complex transfer mappings. In the S&B approach, although a structure is assigned in parsing, structural information is stripped away from source expressions. Transfer only maps lexical items. The only kind of 'structural' information transferred is a system of semantic dependencies expressed by means of indices. However, such dependencies can be specified in a partial way. They only need to be sufficiently specified to avoid an incorrect swapping of elements of the same category in generation. There is no need to assign and transfer dependencies when they are irrelevant to translation. This feature makes the translation process much easier when the input contains elements whose position is largely accidental. For instance, in the case of a parenthetical expression, which can appear in virtually any position in the input, the transfer procedure need not take into account all the possible positions where the expression can appear. The expression is simply assigned no dependency, leaving it free to appear in any position in the target sentence, provided that the target grammar constraints are satisfied.

Further support for a lexicalist approach comes from the observation that Colloquial English (and, more generally, colloquial text in any language) is richer in idiomatic expressions than written English. Idiomatic expressions resist any compositional analysis and can only be lexically translated as compound expressions. A lexicalist approach, which is explicitly centred around a bilingual lexicon as a means of performing lexical transfer, is therefore particularly suitable for expressing this sort of bilingual information.

3.3. English Lexicon. Our first English lexicon was partially derived from the 70,000 entries of the machine-readable Oxford Advanced Learner's Dictionary (OALD), which is rich in morphological and syntactic information. Since English words do not exhibit a great degree of inflectional variation, there is no need for distinct morphological rules to be stored in the system. Instead, the inflected forms of words can be stored explicitly in the lexicon, together with their associated morphological information. For example, our English lexicon contains distinct entries for 'dog' and 'dogs', one entry recording that it is a singular noun, and the other that it is a plural noun. At the cost of the related increase in storage requirements, we are able to avoid any English morphological analysis and thus decrease our analysis time.

Our English lexicon also contains verb pattern information. These patterns give detailed information about the subcategorization requirements of different verbs (descriptions of the kinds of phrases and constituents that verbs can combine with). Rich subcategorization information is precisely what is required by a lexical grammar formalism. The relationship between the codes used in the OALD and the feature structures used in our grammar is provided by a separate module. Thus, if one wanted to simplify the system to ignore information supplied in the OALD, or to use some machine-readable dictionary other than the OALD, only this mapping module would need to be modified.
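The storage scheme can be pictured as follows. This is illustrative only; the feature names and the 'Tn' verb-pattern code are simplified stand-ins for the OALD-derived encoding.

```python
# Fully inflected lexicon: no morphological analysis at run time; each
# surface form is a key, with its morphological features stored directly.
ENGLISH_LEXICON = {
    "dog":   [{"cat": "noun", "num": "sg", "lemma": "dog"}],
    "dogs":  [{"cat": "noun", "num": "pl", "lemma": "dog"}],
    "loves": [{"cat": "verb", "form": "pres3sg", "lemma": "love",
               "subcat": "Tn"}],  # OALD-style transitive verb pattern code
}

def lookup(word):
    # A plain dictionary access replaces morphological analysis entirely.
    return ENGLISH_LEXICON.get(word.lower(), [])

print(lookup("dogs"))  # [{'cat': 'noun', 'num': 'pl', 'lemma': 'dog'}]
```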

3.4. English Grammar. The initial use of the OALD has allowed us to implement an HPSG-style [PS94] English grammar with syntactic information coded into lexical entries, and constituency and order information contained in a small number of grammar rules, which gives us very broad coverage. The nature of colloquial text is such that the input to the machine translation system may cover a very wide semantic domain, rather than the more restricted domains often found in technical text. This necessitates a loosely constrained analysis of the input text, and a translation process that is predominantly based on morphological and syntactic information, with very little semantic information. In addition, there is the need for a Colloquial English grammar as opposed to a more formal English grammar.

A primitive notion of semantic roles is introduced into the grammar and lexicon to allow semantic dependencies between constituents to be represented, as is traditionally done in the S&B approach. Each word in a phrase or sentence is assigned an index, and this index can appear as one of the semantic arguments of another word. For example, in the sentence 'John loves Mary', the index of 'John' would be the first semantic argument of 'loves', while the index of 'Mary' would be its second semantic argument. The ordering of the arguments is determined according to their obliqueness, as is done in HPSG: the subject is the least oblique argument, followed by the direct object, followed by the indirect object, etc. It is also possible for two words or phrases to share the same index, in which case they are said to be coindexed. Part of the role of the analysis phase is to use the semantic indices to represent semantic dependencies, which play a key role in the generation and transfer phases.

The English grammar is used for loosely constrained parsing. It is meant to be as unrestrictive as possible, in order to accommodate colloquial expressions and constructions which are not strictly grammatical. It is just restrictive enough, though, to augment the input lexical entries with sufficient information for the subsequent transfer phase. Any additional information which can be recovered in some other way during transfer is superfluous and thus avoided. For instance, the English grammar does not enforce agreement between subject and verb, because the Spanish grammar still has the means (by reference to argument indices) to recognize the subject-verb relation and enforce proper agreement on the Spanish output. Although the grammar is purely declarative and thus bidirectional in principle, its unrestrictive nature makes it specifically suitable as a source language grammar.
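Schematically, the enriched entries for 'John loves Mary' would carry indices along these lines (our notation, not the system's actual feature geometry):

```python
# Enriched signs for 'John loves Mary': every word gets an index, and a
# word's semantic arguments list the indices of its dependents, ordered
# by obliqueness (subject first, then direct object, etc.).
john  = {"lemma": "John", "index": 1, "args": []}
loves = {"lemma": "love", "index": 2, "args": [1, 3]}  # arg1=subj, arg2=obj
mary  = {"lemma": "Mary", "index": 3, "args": []}

def coindexed(sign_a, sign_b):
    # Two signs sharing the same index are said to be coindexed.
    return sign_a["index"] == sign_b["index"]

print(loves["args"][0] == john["index"])  # True: 'John' is least oblique
```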

3.5. Bilingual Lexicon. The bilingual lexicon is responsible for the transfer aspect of an S&B system. Although S&B permits bidirectional translation, building a unidirectional dictionary lets us take advantage of constraints on the transfer rules and obtain better performance. Each entry in the bilingual lexicon establishes a relationship between a set of source language lexemes and a set of target language lexemes.

3.5.1. Bilingual entries. In our English-Spanish bilingual lexicon many of these entries are one-to-one: a single feature structure for an English lexeme is mapped to a single Spanish feature structure. Information from the source language feature structure may be related to information in the target language feature structure; for example, this information may include the value of agreement features such as person or number, tense, and semantic indices.

Perhaps more interesting are the bilingual entries which involve multiple lexemes on either side, e.g., 'loud and clear' → 'perfectamente' (many-to-one), 'trip' → 'hacer tropezar' (one-to-many), 'all over the place' → 'por todas partes' (many-to-many). Such entries allow the mapping of English phrases to Spanish phrases in the bilingual lexicon. This is important for colloquial speech, since idioms are commonly used. Idioms rarely translate compositionally, e.g., 'kick the bucket' → 'estirar la pata', 'to be bending over backwards' → 'hacer lo imposible'. Expressing collocations in the entries can also help reduce ambiguity (especially in the absence of deep semantic analysis), e.g., the translations of 'get' ('get lost', 'get on with', 'get away from', 'get away with', 'get drunk', ...) or of support verbs like 'make' ('make a phone call', 'make a decision', 'make a move', 'make trouble', ...). It is important to note that the English phrase contained in the bilingual entry need not be a single constituent in the English sentence, e.g., 'put up with' → 'aguantar', where 'with' is part of another constituent, a PP, the rest of which is transferred by other entries. So the presence of additional words within the phrase as it occurs in the English sentence will not interfere with the application of the rule, and the same rule can be used in a wide variety of contexts.

At a more detailed level, a bilingual entry (transfer rule) is made up of the following elements: a key-word, a source side, a target side, and associated transfer macros (t-macros). Each side (source or target) contains one or more lexical signs (or even zero, for the target side). The key-word corresponds to the base form of one of the lexical signs on the source side. A transfer macro is basically an additional transfer rule called by the transfer rule with which it is associated. Instead of a key-word, a transfer macro contains a description which the calling transfer rules must satisfy in order for the transfer macro to be triggered.
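In schematic form (with our own field names, and signs reduced to bare lemmas), a transfer rule and its lookup by key-word might look as follows:

```python
# Schematic bilingual (transfer) entries: a key-word, a source side, a
# target side, and optional transfer macros. Multi-word sides handle idioms.
TRANSFER_RULES = [
    {"key": "kick",                        # base form of one source sign
     "source": ["kick", "the", "bucket"],  # many-to-many mapping: an idiom
     "target": ["estirar", "la", "pata"],
     "macros": []},
    {"key": "trip",
     "source": ["trip"],                   # one-to-many
     "target": ["hacer", "tropezar"],
     "macros": []},
    {"key": "tell",
     "source": ["tell"],
     "target": ["decir"],
     "macros": ["dative_object"]},         # see transfer macros below
]

def candidate_rules(bag_lemmas):
    # A rule is a candidate only if all of its source-side lexemes are
    # present somewhere in the source bag (not necessarily contiguously).
    return [r for r in TRANSFER_RULES
            if all(w in bag_lemmas for w in r["source"])]

print([r["target"] for r in candidate_rules({"kick", "the", "bucket"})])
# [['estirar', 'la', 'pata']]
```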
3.5.2. Transfer macros. The use of transfer macros has proven useful in porting the system to new language pairs. The purpose of a transfer macro is to state some general relationship between structures in the source and target language. In this sense, its purpose is similar to that of transfer rules in traditional transfer approaches, with the crucial advantage that, in the case of transfer macros, no structural mapping is performed. The ability of transfer macros to express generalizations over pairs of languages in a compact form, and their systematic use in doing so, make it possible to port a bilingual lexicon to a new language pair without affecting the actual lexical entries (except their target sides, of course). Only the transfer macros need to be modified in order to accommodate the generalizations holding for the new language pair. To give an example, a generalization for the English-Spanish pair is that a nominal indirect object (e.g. 'I tell the man that ...') is mapped to a prepositional phrase plus a redundant personal pronoun ('Yo le digo al hombre que ...'). Such complex transfer is performed in the English-Spanish bilingual lexicon by means of a specific transfer macro associated with verbs taking a dative complement. In porting the bilingual lexicon to English-Brazilian Portuguese ('Eu digo ao homem que ...') it was sufficient to change this specific transfer macro, removing the redundant pronoun requirement, without modifying anything in the actual transfer rules apart from the right-hand side lexemes.
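The dative example can be sketched as follows; the macro bodies below are our own illustrative rendering of the mapping, not the system's rule language:

```python
# Sketch: the same transfer rule calls a 'dative object' macro; only the
# macro body differs between target languages.
def dative_object_es(target_signs):
    # English-Spanish: indirect object -> PP plus redundant clitic pronoun,
    # as in 'I tell the man ...' -> 'Yo le digo al hombre ...'
    return ([{"lemma": "le", "cat": "pron"},
             {"lemma": "a", "cat": "prep"}] + target_signs)

def dative_object_pt(target_signs):
    # English-Portuguese: the PP alone suffices, as in 'Eu digo ao homem ...';
    # porting the lexicon meant swapping this macro, not the lexical entries.
    return [{"lemma": "a", "cat": "prep"}] + target_signs

indirect_object = [{"lemma": "hombre", "cat": "noun"}]
print(dative_object_es(indirect_object))
```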

3.6. Spanish Lexicon. The Spanish lexicon was converted from a Spanish lexicon used in the METAL MT system [BS85], containing about 25,000 entries. In addition to syntactic information, the lexicon contains semantic features for nouns. Verb entries contain detailed subcategorization information, along with semantic selectional restrictions on their nominal complements.

3.7. Spanish Grammar. The Spanish grammar has been developed in the spirit of traditional Phrase Structure Grammars, since it makes use of neither a Subcategorization Principle nor a Head Feature Principle. No X-bar system is in use, and the relation between lexical signs and their phrasal projections is obtained by explicit stipulation in each rule. Subcategorization is accomplished in a GPSG style [GKPS85], with as many different rules as there are subcategorization frames, and an atomic-valued subcat feature which allows the match between specific verbs and the appropriate rules.
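The rule-to-verb matching can be pictured like this (the categories and subcat values are invented for illustration):

```python
# Sketch: GPSG-style subcategorization via an atomic 'subcat' feature.
# One phrase-structure rule per subcategorization frame; a verb matches a
# rule only when its subcat value equals the rule's.
SPANISH_RULES = [
    {"mother": "VP", "daughters": ["V"],       "subcat": "intrans"},
    {"mother": "VP", "daughters": ["V", "NP"], "subcat": "trans"},
    {"mother": "VP", "daughters": ["V", "PP"], "subcat": "prep_obj"},
]

def rules_for(verb_entry):
    # Atomic-valued matching: no unification of complex daughter signs.
    return [r for r in SPANISH_RULES if r["subcat"] == verb_entry["subcat"]]

amar = {"lemma": "amar", "cat": "V", "subcat": "trans"}
print(rules_for(amar))  # only the 'V NP' rule matches
```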

3.7.1. Robustness vs. restrictiveness. The source and target grammars have been developed using the same declarative formalism and can in principle be used for both parsing and generation. However, although there is no commitment to a specific use from a formal point of view, the bias towards a specific use has played a role in their development, suggesting different guidelines in view of different goals. While the source grammar has been developed with a view to robustness, the target grammar is meant to be highly constrained and to generate only strictly grammatical output. The generative power of the source grammar is actually a superset of strictly grammatical English, whereas the generative power of the target grammar is a subset of Spanish. A grammar for generation can be fully adequate without generating every possible sentence in the target language, provided that, for every sentence ruled out, there is at least one equivalent sentence that can be generated. Such an assumption makes things easier, for instance, when dealing with languages with a high degree of word order freedom.

3.7.2. Efficiency. The development of the target grammar has also followed the guideline of having many very specific rules rather than a few highly general rule schemata. This guideline rests upon the assumption that simpler lexical signs and highly specified rules reduce the amount of unification performed in the generation process and reduce the number of active edges in the chart. Grammar rules are very specific because the signs in rules are always specified for their category. Therefore, subcategorization and head feature percolation are obtained by means of explicit stipulation, rather than by unification between daughter signs and between the head daughter and mother signs.

3.7.3. Portability to new languages. Although the grammar rules are very specific, some effort has been made to distinguish between language-specific, idiosyncratic constraints and linguistic generalizations valid across different languages. The latter kind of information has been stored in the body of the rules themselves, while the former has been stored in goals associated with the rules. A consistent use of this distinction has made it easier to port many rules across different languages, namely Spanish and Portuguese. Many of the linguistic generalizations stated in the rules could be preserved across the two languages, while the replacement or modification of the associated goals made it possible to accommodate different language-specific constraints. Issues related to the development of bilingual lexicons are also discussed in [Tur98].

4. Performance. In this section we discuss various interrelated issues about the system performance. We first exemplify how some of the system features described above affect the system’s output. Then, we discuss some methodological issues concerning evaluation, and finally provide some results for the current system.

4.1. A sample run. Table 2 shows the system output for the short script fragment from Table 1 in section 2, along with evaluations (to be discussed later). The translation sample exemplifies some features of the system, along with some problems.

The acceptable translation of 'should notify the county' shows that the system is able to associate grammatical translations with ungrammatical input (in this case the subject was omitted, as is frequent in speech). A loosely constrained parse is performed, which accepts and assigns a structure to ill-formed input. However, the information added to the source bag by the parsing step is sufficient to enable a mapping onto a correct target bag in the transfer step.

The following translation is an example of how the visual context can contribute to increasing the acceptability of an incorrect translation: 'Cynthia, push! come on, Cynthia!' → '¡Cynthia, impulso! ¡vamos, Cynthia!' It is difficult to make sense of the Spanish translation if considered in isolation. However, in the visual context of a childbirth, it might take little or no effort for a user to make sense of the wrong translation of 'push'.

Finally, the sample in Table 2 shows how frequent parenthetical expressions and vocative forms are in colloquial text. It is the task of the target grammar to correctly place the translation of a parenthetical expression, according to purely monolingual constraints.

4.2. Evaluation methodology. One of the main distinctions in evaluating MT systems is between operational and declarative evaluation [ASH93]. Under different names and within different classifications, this distinction seems to be common to most evaluation taxonomies [HS92]. As [ASH93] put it (p. 4), operational evaluation focuses on the benefits "to be gained by a particular user from using a particular system in a particular situation", whereas declarative evaluation "attempts to move away from the situation of an individual user to give an evaluation based on more general, and widely applicable criteria". Although the distinction between the two kinds of evaluation remains valid in the specific domain of closed caption translation, there are remarkable differences in the way the operational context affects the evaluation of an MT system. The importance of an operational evaluation is usually associated with determining the cost-effectiveness of an MT system, in addition to assessing its quality. Given the quality of a system's output, independently determined by a declarative evaluation, an operational evaluation measures the amount of postediting required to obtain an adequate translation (in terms of the user's requirements) and compares the costs with those of a human translation. In our domain, there is a more radical sense in which an operational evaluation is important: unlike most other domains, where the translation quality can be independently assessed before taking into account the operational context, output quality in the domain of closed captions can only be fully assessed in an operational context. As previously pointed out, the operational context provides the user with additional sources of information: images and sound effects. Text, images, and sound effects combine to convey information to the user, who perceives them as a whole.

This scenario makes the evaluation task more problematic than in other domains. On one side, the results coming from declarative evaluations are objective and can be analyzed in detail, but provide only an indirect assessment of the system's output quality. On the other side, operational evaluations can provide a direct assessment of quality in terms of user satisfaction, but cannot provide analytical feedback for improving the system.

Declarative evaluations are performed on a regular basis on our system. Despite the limitations pointed out above, adopting a consistent evaluation method across different evaluations allows us to measure the progress of system development in a significant way. Tests are conducted by translating sections of scripts from randomly selected closed captioned TV programs. They are evaluated by a native Spanish speaker. Sentences are evaluated in terms of grammaticality and accuracy, as is customary in many of the evaluation methodologies proposed in the literature [Nag89], [ABH+94]. In consideration of our specific domain, in which the end users are exposed to translated closed captions for a very short time, we decided to adopt a very simple metric in evaluating translations. We refrained from adopting multi-point scales, which imply a detailed analysis in order to evaluate a sentence and are more suitable for the evaluation of written text translation. Instead, we adopted a three-point scale: translations are evaluated as either correct (yes), acceptable though not ideal (ok), or unacceptable (no).

We are also planning to conduct an operational evaluation of the system. The goal of this test is to evaluate the system output in its proper visual context and to compare the results with parallel results for human-translated closed captions. A further goal is to compare the results of the operational test with results from our declarative evaluations.

4.3. Results. Although the system is still under development, we have already obtained some interesting results concerning its performance.

English: COME ON, CYNTHIA! CYNTHIA, PUSH! COME ON, CYNTHIA! GOOD GIRL. COME ON, CYNTHIA. GOOD GIRL. HE'S GOT THE HEAD. IT'S A BOY. OH! OH! OH, YOU'VE GOT A BOY! HERE, HONEY. HERE YOU GO. OKAY. OKAY. YEAH. SHOULD NOTIFY THE COUNTY, HUH? THANKS. ALL DONE. THANKS FOR WAITING. WHAT AN INCREDIBLE DAY. WE STARTED OUT WITH A FABULOUS RIDE ON THE BEACH AND A LITTLE PICNIC AND ALL OF A SUDDEN THERE'S A CAR IN THE WATER YOU'RE RESCUING THIS WOMAN AND WE DELIVER HER BABY TOGETHER.

Spanish: ¡Vamos, Cynthia! ¡Cynthia, impulso! ¡vamos, Cynthia! Muchacha buena. Vamos, Cynthia. Muchacha buena. él recibió el cabeza. Es un chico. ¡Ay! ¡ay! ¡Ay, tú tienes un chico! Aquí, miel. Aquí tienes. Okey. okey. Sí. Debe notificar el condado, eh? gracias. Todo. gracias por esperar. Qué un día increíble. Comenzamos con un paseo fabuloso En la playa y un poco picnic Y todo de un repentino hay un carro en el agua Rescatas a esta mujer Y entregamos a su bebé junto.

Eval.: yes, no, ok, yes, no, yes, yes, yes, no, yes, yes, yes, yes, ok, ok, yes, ok, ok, yes, no

Table 2: English-Spanish translation sample

Table 3 shows the cumulative results of seven evaluation tests conducted in the first three months of 1998. Each test uses a randomly selected group of 100 sentences. The results are grouped in rows according to the three-point scale used in our evaluations, with a fourth row aggregating all acceptable sentences, i.e. those evaluated as either yes or ok. The fifth row shows the number of sentences that failed to produce any output. The table columns present a distinction based on the way the system produces its output. In addition to the regular translation flow, the system is equipped with a simple fall-back translation procedure which is activated when the regular flow fails or exceeds time thresholds. Results concerning the regular translation flow and the fall-back procedure are contained, respectively, in the second and third columns of results, while the first column contains the sum of the other two. The percentages in parentheses are relative to the total number of lines (700).

Note that if we were to remove the fall-back procedure and only provide translations for sentences going through the regular flow, then 70% of the translations would be ranked as correct or acceptable, with 41% being correct. These figures are obtained by dividing, respectively, the number of 'yes/ok' and 'yes' lines (421 and 250) by the total number of lines (604) in the 'Regular flow' column. We have experimentally estimated that, once bilingual lexical development alone is close to completion in terms of coverage, correct/acceptable rankings in the area of 80% can be expected.
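As a quick check of the arithmetic behind these figures:

```python
# Regular-flow figures from Table 3: 250 'yes' and 421 'yes/ok' lines
# out of 604 lines translated by the regular flow.
yes_lines, yes_ok_lines, total = 250, 421, 604
print(f"correct: {100 * yes_lines / total:.0f}%")                   # 41%
print(f"correct or acceptable: {100 * yes_ok_lines / total:.0f}%")  # 70%
```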

Eval.    Total         Regular flow   Fall-back
yes      253 (36%)     250 (36%)       3 (0%)
ok       214 (31%)     171 (24%)      43 (7%)
no       230 (33%)     183 (26%)      47 (7%)
yes/ok   467 (67%)     421 (60%)      46 (7%)
failed     3 (0%)
total    700 (100%)    604 (86%)      93 (14%)

Table 3: Accuracy results

5. Extension to Brazilian Portuguese. An English to Brazilian Portuguese prototype has also been developed, based on the system described above. In this section we briefly discuss the migration procedure, the main features of the prototype, and its performance. Our goal was to test the feasibility of a migration between different language pairs. To this end, we aimed at developing an English-Portuguese prototype with all the required components in place, capable of providing good quality output on a restricted lexical domain. Thus, the prototypical aspect mainly resides in the limited development of the lexical resources. The aim of lexical development in this phase was to provide sufficient coverage for a sample text used for testing.

In our lexicalist approach, porting a system to a new language pair (involving the same source language but a different target language) mainly involves the modification of the bilingual dictionary and the re-implementation of the target linguistic resources (grammar, lexicon, and morphological generator). The source side of the system (in this case, the English module) can be reused without any changes. The following linguistic components were developed for the English-Portuguese MT system:

Portuguese grammar. It is a full-fledged grammar, with an underlying theoretical approach similar to that of the Spanish grammar described above. In fact, the Spanish grammar could be reused for any syntactic structure in which the two languages overlap. The coverage is also comparable to that of the Spanish grammar, the main difference being that the implementation of rules relying on specific, idiosyncratic features was deferred until a large-scale lexicon becomes available.

Portuguese morphological generator. A pre-existing morphological generator for verbs has been incorporated. It inflects all regular verbs and a good number of irregular verbs. A morphological generator for nouns and adjectives has been implemented. It does not contain a morphological lexicon and can therefore only deal with regular inflection; a default inflection is provided for any irregular adjective or noun.



Portuguese monolingual lexicon. It provides lexical entries for the words necessary to translate the testing sample. It contains about 500 entries.

English-Portuguese bilingual lexicon. It provides bilingual lexical entries for the words necessary to translate the testing sample. It contains about 600 entries. The semi-automatic porting of a bilingual lexicon to a different language pair is discussed in some detail in [TPML98].

The system was tested on a sample of 173 lines randomly selected from a closed caption corpus. The input lines were manually segmented before being translated (segmentation is currently done automatically). According to the evaluation metric described above, 64% of the translations were acceptable (yes/ok). It should be noted that the prototype development and testing were done at an early stage of the whole MT project. Many improvements have since been made to the English-Spanish system. All such improvements concerning the overall architecture and the source module would automatically carry over to the English-Portuguese system as well. By way of example, Table 4 shows the translation of the first 20 lines from the testing sample. Each line was randomly selected from the corpus; therefore the lines are completely unrelated and do not make any sense if read in sequence. The lines with no translation were left untranslated by the system.

English: THE GIRL AT THE HOTEL. YOU SEE, WE ARE LIKE BLOCKS OF STONE AAH! MY, YES. OK IT'S CHU'S FIRST YEAR. FOR YOUR INFORMATION, HAIRDO MAKING A RESERVATION? THAT OUR FAITH HAS TO OFFER TO YOUR PEOPLE. TOM KIMBALL GRADUATES TODAY NEVER! TRYIN' TO KEEP MY SHIT QUICK. ABOUT ONCE A MONTH, WARDROBE, AND STYLE? I WANT IT BACK NOW. WILL YOU DO IT? OH. THOUGH GET BEATEN OR GORED YOU MIGHT MORGAN: YOU ARE IN TROUBLE. OKAY, FINE,

Portuguese: a garota no hotel. você vê, nós somos como blocos de pedra ah! meu, sim. por sua informação, penteado fazendo uma reserva? hoje [Tom] [kimball] forma-se nunca! tentando manter rápida minha merda. cêrca de uma vez por mês, guarda-roupa e estilo? agora eu o quero de volta. você o fará? [oh-ho]. [morgan]: você está em apuros. certo, ótimo,

Table 4: English-Portuguese translation sample

6. Conclusion. The S&B approach is very suitable for our specific domain. It allows a simple treatment of idioms and other multi-word constructions, it does not require a deep semantic analysis of expressions, it requires only lexical transfer rules, and it is highly modular, allowing rapid extension to new target languages. In addition to these advantages, the peculiarities of CE allow us to avoid the computational complexity problems associated with the S&B approach [Bre92]; a high percentage of our input consists of very short sentences, so the computation associated with generation is kept within acceptable limits [Pop96].

Church and Hovy have emphasized the importance of selecting an appropriate application for MT, and their case could be made for NLP in general [CH93]. Inflated expectations for MT can only result in disillusionment. Automatic translation of colloquial text seems appropriate for the current state of the technology in appropriate environments. These environments either augment the information source with other input signals (as in the translation of closed captions) or provide the user with time to acclimatize to the system and to decipher inappropriate translations (as in the translation of typed conversation over the Internet). The constraints on these applications (the robustness requirements and loosely constrained analysis) as well as the practical need for rapid development of new language modules are adequately served by the lexicalist approach. Plans are in place for the rapid development of new language modules for Italian, French, and German.

Acknowledgements. The machine translation system described in this paper was developed jointly by the authors and Olivier Laurens, Scott MacDonald, Patrick McGivern, Devlan Nicholson, and Gordon Tisher, along with lexicographers Maricela Corzo-Pena, Lisa Pidruchney, and Ron Schwarz. This paper is a modified and expanded version of F. Popowich et al. 1997, A Lexicalist Approach to the Translation of Colloquial Text. This research was supported by a Collaborative Research and Development Grant from the Natural Sciences and Engineering Research Council of Canada (NSERC), and by the Institute for Robotics and Intelligent Systems. The authors would like to thank Dan Fass and John Grayson for their comments on earlier versions of this paper.

References

[ABH+94] D. Arnold, L. Balkan, R. Lee Humphreys, S. Meijer, and L. Sadler. Machine Translation: An Introductory Guide. NCC Blackwell, Manchester-Oxford, 1994.

[ASH93] D. Arnold, L. Sadler, and R. Lee Humphreys. Evaluation: An assessment. Machine Translation, 8:1-24, 1993.

[Bea92] John L. Beaven. Shake and Bake Machine Translation. In Proceedings of the Fifteenth [sic] International Conference on Computational Linguistics, pages 603-609, Nantes, France, 1992.

[Bre92] Chris Brew. Letting the cat out of the bag: generation for Shake-and-Bake MT. In Proceedings of the Fifteenth [sic] International Conference on Computational Linguistics, pages 603-609, Nantes, France, 1992.

[BS85] Winfield S. Bennett and Jonathan Slocum. The LRC Machine Translation system. Computational Linguistics, 11:111-121, 1985.

[Car92] Bob Carpenter. The Logic of Typed Feature Structures. Cambridge University Press, Cambridge, 1992.

[CH93] Kenneth W. Church and Eduard H. Hovy. Good applications for crummy machine translation. Machine Translation, 8:239-258, 1993.

[GKPS85] Gerald Gazdar, Ewan Klein, Geoffrey K. Pullum, and Ivan Sag. Generalized Phrase Structure Grammar. Blackwell, Oxford, 1985.

[HS92] W. J. Hutchins and Harold L. Somers. An Introduction to Machine Translation. Academic Press, London, 1992.

[Nag89] Makoto Nagao. Machine Translation: How Far Can It Go? Oxford University Press, Oxford, 1989.

[Pop96] Fred Popowich. A chart generator for Shake and Bake Machine Translation. In G. McCalla, editor, Advances in Artificial Intelligence: 11th Biennial Conference of the Canadian Society for Computational Studies of Intelligence, pages 97-108. Springer, Berlin, 1996.

[PS94] Carl Pollard and Ivan Sag. Head-driven Phrase Structure Grammar. Centre for the Study of Language and Information, Stanford University, CA, 1994.

[TPML98] Davide Turcato, Fred Popowich, Paul McFetridge, and Olivier Laurens. Reuse of linguistic resources in MT. In Proceedings of the First International Conference on Language Resources and Evaluation, pages 1227-1233, Granada, Spain, 1998.

[TTP+98] Janine Toole, Davide Turcato, Fred Popowich, Dan Fass, and Paul McFetridge. Time-constrained Machine Translation. In Proceedings of the Third Conference of the Association for Machine Translation in the Americas, Langhorne, Pennsylvania, USA, 1998.

[Tur98] Davide Turcato. Automatically creating bilingual lexicons for machine translation from bilingual text. In Proceedings of the 17th International Conference on Computational Linguistics and 36th Annual Meeting of the Association for Computational Linguistics, Montreal, Canada, 1998.

[Whi94] Pete Whitelock. Shake and bake translation. In C.J. Rupp, M.A. Rosner, and R.L. Johnson, editors, Constraints, Language and Computation, pages 339-359. Academic Press, London, 1994.