SWIMMING IN WORDS: CORPORA, TRANSLATION, AND LANGUAGE LEARNING

Federico Zanettin

0. Introduction

Translation can help learners develop reading and writing skills and increase their cross-cultural and cross-linguistic awareness. Translating consists of interpreting a discourse in the language of the source text and re-interpreting it by creating another discourse in the language of the target text. By recasting discourse A into discourse B, learners manipulate language to a meaningful end: they transform a text originally created to fulfil a communicative function in the language of the source text into another text, whose similarity to the first varies with the function of the target text and may range from word-for-word transliteration to restatement. Seen from this perspective, translating between languages is in principle no different from translating from one language variety to another, or from one register to another: all involve a shift in perspective and in recipient design (see Newmark 1988, 1991; Snell-Hornby 1988; Hatim and Mason 1990; Bassnett and Lefevere 1990; Gentzler 1993).

Corpora consisting of texts in two languages which are similar in subject and purpose not only allow for contrastive analysis of individual expressions but also provide learners with a mapping of the structures and strategies employed by the two language communities for "building discourse in different linguistic and socio-cultural settings" (Marmadou 1990: 564). In reading a text in the L1 and trying to formulate a suitable "equivalent" in the L2, or vice versa, learners have to strive to find the most appropriate words for the new audience. This is not simply a matter of terminological accuracy, but involves comparing higher-level cultural codes concerning conceptual and rhetorical structures. This paper presents a specific example of the use in a translation task of such "comparable corpora": two collections of texts, one in the L1 and the other in the L2, selected on the basis of a criterion of equivalence and stored on computer.
I am distinguishing here between "comparable" and "parallel" corpora, two terms which overlap in much of the literature. The term "parallel corpus" is generally used to designate a collection of texts in language A and of their translations into language B: see for example Leech and Fligelstone (1992), Baker (1995), Marinai et al. (1991). The best known such collection is probably the proceedings of the Canadian Parliament (HANSARD), which are published in both French and English (the original text may be in either language).1 Corpora of this kind are generally aligned on a sentence-by-sentence or phrase-by-phrase basis, either through reference to a bilingual dictionary (Picchi 1991), through statistical elaboration (Langé and Bonnet 1994), or a combination of the two (Johansson et al. 1996), so that any textual string can be retrieved along with its equivalents in the parallel text. Such corpora have been extensively used as a basis for the creation of bi- or multilingual terminology databases and thesauri, and for developing machine translation software.2

The term "parallel corpus" has also been used, however, to refer to collections of texts which are not translations of each other, but are selected on the basis of analogous criteria. These may either be taken from different varieties of the same language (e.g. the various components of the ICE corpus, which are taken from different geographical varieties of English: Greenbaum 1992), or from different languages, for instance collections of laws in French and Danish (Dryber and Tournay 1990), collections of service encounters in British and Italian (Gavioli and Mansfield 1990), or collections of public signs from various English- and German-speaking countries (Snell-Hornby 1984). It is this latter type that I refer to as "comparable" corpora.3

In this paper I discuss the basic operations necessary to create and use small comparable corpora, outlining an experiment conducted with undergraduates to produce an English translation of an Italian newspaper article, and suggest ways in which the procedures involved may contribute to language learning. While in this case the translation was from the learners' native language (Italian) into English, the methodology would also seem appropriate to translation into the mother tongue. The objective was to write a text which would sound as if it had been taken from a British newspaper, with the aid of a corpus of comparable English and Italian newspaper texts and concordancing software. Example 1 shows the original Italian text and one student's English translation of it: while the final product was individually written, much of the research using the corpus involved interaction with other learners.

Example 1

In vasca. Sorvegliato speciale e' Matt Biondi, che cerca di vincere l'oro per la terza volta consecutiva ai Giochi, sul gradino piu' alto del podio ben cinque volte nell'edizione '88. Si esibisce nei 50 e 100 stile libero, oltre che nella 4x100 stile libero. Re del mezzofondo e' l'australiano Kieren Perkins, primatista mondiale dei 400, 800 e 1.500 stile libero.

Swimming. Matt Biondi, the defending champion, will be trying to win gold in his third successive Olympic Games. After gaining no less than five gold medals in 1988, this time he is back to contest the 50 and 100m freestyle, and 4x100m freestyle. Kieren Perkins of Australia, the world record holder for the 400m, 800m, and 1,500m freestyle, is top performer over the longer distances.

I will go through the steps followed in making this translation. Learners can contrast formal features which are similar in the two corpora but may differ functionally (false friends, loan words, near synonyms, metaphorical expressions, etc.), and compare functionally similar segments of text which may differ in their formal realisations (rhetorical structures, contextualising information, logical connectors, terminology, etc.). In this way they can use relatively small comparable corpora for a variety of activities which not only enhance the specific translation but also allow a wide range of learning to take place.

1. Making comparable corpora

Some of the most readily available sources of computerised text are newspapers, many of which are now available on the Internet, or sold on CD-ROM at an affordable price. A CD-ROM usually contains up to a year of issues (8 to 10 million words of text) from which selections can be downloaded to the user's hard disk. While not all CD-ROM and online newspaper services use the same search and retrieval software, there is a tendency towards standardisation and some basic operations are common to most of them. Any user (teacher or student) who is computer/network literate should be capable of creating collections of text from these sources.

The corpora used for the translation activity in question were derived from CD-ROMs of The Daily Telegraph, The Independent and Il Sole-24 Ore for 1992. The purpose was to create a corpus regarding one event (the '92 Olympics), from one domain (the sports section of these newspapers). The criteria for selecting the texts to be included in a comparable corpus depend on the purpose to which the corpora are to be put. If, for example, the user wants to investigate the use of a high-frequency word which supposedly serves the same function in the two languages, then probably any corpora of roughly similar materials would do. If, on the other hand, the purpose is to investigate how two different cultures treat similar topics within the same domain or genre, then the selection must accordingly target the same topic, domain or genre.

To retrieve articles from a newspaper CD-ROM, it is generally enough to specify keywords (a keyword being a string of characters, which may include wildcard characters such as "*" or "?"). In this case a first search was run using the keywords olympic* and olimp* in the English and Italian data respectively. This however found a number of articles which had little to do with the event itself: the words olympic and olympics also appeared in book reviews, and the adjective olimpico (which also means "calm") in a wide variety of other contexts.
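The wildcard keywords described above can be imitated in a few lines of Python. The sketch below is only an illustration, not the retrieval software supplied with the CD-ROMs: it assumes shell-style wildcards as implemented by Python's fnmatch module, and the function name keyword_matches and both sample sentences are invented for the purpose.

```python
import fnmatch
import re

def keyword_matches(keyword, text):
    """Return the words in `text` that match a wildcard keyword.

    '*' stands for any run of characters and '?' for exactly one
    character (fnmatch conventions; the CD-ROM software may differ).
    Matching is case-insensitive.
    """
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [w for w in words if fnmatch.fnmatchcase(w, keyword.lower())]

# Invented sample sentences, not taken from the corpora.
english = "The Olympic champion reviewed a book on the ancient Olympics."
italian = "Il campione olimpico ha mantenuto una calma olimpica."

print(keyword_matches("olympic*", english))  # ['olympic', 'olympics']
print(keyword_matches("olimp*", italian))    # ['olimpico', 'olimpica']
```

The Italian sample suggests why the first search over-retrieved: olimp* matches olimpica in its "calm" sense as readily as in its sporting sense, so a keyword alone cannot separate the two.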
Since most search systems of newspaper CD-ROMs allow for queries to be restricted to particular parts of articles (headlines or body), as well as by author, date, and section of the newspaper, the search was rerun specifying a date span (June 1st to September 1st, the period in which the Olympic Games took place), and the sports section of the paper. This yielded 150 articles from The Independent (about 95,000 words), 307 from The Telegraph (160,000 words), and 77 from Il Sole-24 Ore (65,000 words), which were saved as ASCII files. Overall, the English corpus thus consisted of about 250,000 words, while the Italian was roughly a quarter of this size. These figures were sufficiently small to allow learners to become familiar with the texts (100,000 words are the equivalent of 40 pages or so). In teaching applications too much data can confuse the learner, reducing understanding of the relationship between citations and their contexts and at the same time increasing the number of citations to be dealt with (Gavioli, this volume).

2. Swimming and navigating

The process of querying a corpus through a concordancer may be described as "navigation" - using a metaphor employed by Internet "surfers" - since each citation displayed in a concordance can prompt further searches and lead to the discovery of unexpected features. Because the next step (i.e. what to look up next in the corpus) depends on the intuition of the user, learners have to develop strategies for navigating through the data they are dealing with. Rather than trying to report on particular "routes" which were followed, I shall focus on the kinds of navigational strategies which learners adopted with respect to the translation activity, which was based on one of the articles in the Italian corpus.4

Given that a concordance is a set of different contexts for the same word, an obvious starting point is to examine the contexts of words in the source text which are likely to be present in both corpora, such as proper names. A concordance of Biondi in the English corpus produced 35 occurrences (see appendix A), many of which were followed by a phrase giving information about the Olympic champion. Learners looked at these citations for possible translations of the first sentence of the source text, which states who Matt Biondi is, and what he is trying to do. These included:

Example 2

Biondi expects to be back to his best
Biondi is the big man of swimming,
Matt Biondi: Swimmer. Won five golds
Matt Biondi, the defending Olympic champion,
Matt Biondi, the defending champion
Matt Biondi, the first man to win seven swimming
Biondi, who gained five golds in 1988,
Matt Biondi, winner of five gold medals
Biondi, with five golds last time
MATT BIONDI will try to slip into his `Superman' guise
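For readers unfamiliar with concordancers, the basic operation behind a search such as the one for Biondi can be sketched as follows. This is a minimal illustration in Python, not the concordancing software used in the experiment: the function name concordance, the width parameter, and the sample text (pieced together from phrases quoted in Example 2) are assumptions made for the sketch, and the search word is treated as a regular expression rather than a wildcard keyword.

```python
import re

def concordance(text, pattern, width=30):
    """Return key-word-in-context lines for every match of `pattern`.

    Each line holds `width` characters of left context (right-aligned,
    so that the search word starts in the same column on every line),
    the matched word, and up to `width` characters of right context.
    """
    lines = []
    for m in re.finditer(r"\b(?:%s)\b" % pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append(f"{left:>{width}}{m.group(0)}{right}")
    return lines

# Illustrative sample only; a real corpus would be read from the saved files.
sample = ("Matt Biondi, the defending champion, swims today. "
          "Biondi, who gained five golds in 1988, is back.")

for line in concordance(sample, "Biondi", width=20):
    print(line)
```

Cutting the context at a fixed number of characters, rather than at word boundaries, is what produces the truncated words at the edges of the citations in the examples reproduced in this paper.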

In the Italian article Matt Biondi is introduced as sorvegliato speciale. This is a phrase that belongs to the language of law, used to refer to a person under police surveillance. Here it is used metaphorically to convey the idea of Biondi, champion of the '88 Olympics, being under attack and defending his supremacy. Thus among the descriptions in the English corpus, "defending champion" seemed a feasible way of translating sorvegliato speciale.

The other proper name in the Italian text is that of Kieren Perkins, often referred to as "Australia's Kieren Perkins" or "Kieren Perkins of Australia". This surprised one learner, who had hypothesised using "the Australian Kieren Perkins" in his translation. By generating sample concordances to compare the use of adjectives of nationality, country names as possessives, and "of" followed by the country name, it was found that the third of these forms was by far the most frequent when referring to contestants in the English corpus. This form was therefore selected as a translation.

After proper names, a second strategy was to look for similar expressions and/or classes of expressions in the two corpora. Work with concordancing software favours an approach which starts from a relatively low level of text constituency - the behaviour of words (Brodine, this volume; Baker 1992). What the learner is typically looking for is something of the kind "how do you say this in English?" - the equivalent of a key word. For instance, in the first sentence of the Italian text, two more things are said about Biondi: che cerca di vincere l'oro per la terza volta consecutiva ai Giochi (lit. "who is trying to win the gold for the third consecutive time at the Games"), and sul gradino più alto del podio ben cinque volte nell'edizione '88 (lit. "on the highest step of the podium no less than five times in the '88 edition").
Concordances were therefore generated for the presumed English equivalents of the key words oro (gold), podio (podium), and consecutiva (consecutive). A concordance of gold* produced nearly 850 lines. Sorting these by the words to the left and right and skimming through them, learners noticed a number of patterns, for instance that one can "win/gain/earn/get the/a gold (medal)", or "win golds". By also generating a concordance of or* in the Italian corpus (109 lines) these expressions could be analysed contrastively. In Italian you can say "vincere/conquistare/prendere la/una medaglia d'oro" (lit. win/conquer/take the/a medal of gold), "vincere/conquistare/prendere un oro" (lit. win/conquer/take a gold), or "vincere/conquistare/prendere ori/medaglie d'oro" (lit. win/conquer/take golds/medals of gold). It was noted, however, that while in English you can "win gold" (ex. 3), Italian requires a definite article: "vincere l'oro" (ex. 4):

Example 3 (gold with win* or won immediately to the left, sorted by the word to the left of gold: every third citation)

 1  way ahead. I badly wanted to win gold but I accept I probably won't now.' F
 2  de open,' he said. 'Anyone can win gold.' Robb's Liverpool Harriers team-mate
 3  run she destroyed the field to win gold with one of the greatest track perform
 4  y, they sailed brilliantly to win gold. Windsurfers Barrie Edgington, 25, and
 5  ke status in Turkey after winning gold in the 60kg in Seoul in 1988, had bee
 6  heir tally, Romas Ubartas winning gold in the discus for Lithuania. When Atl
 7  ed strong men to tears by winning gold in Munich aged 33. Brasher (3,000m ste
 8  onze, and Ann Brightwell, who won gold and silver. So you can imagine how I f
 9  and, for a long time, better won gold; when he was collecting bronze medals
10  here else - East German women won gold and silver in all events at the 1986
11  0m and 400m medley while Hong won gold in the 100m butterfly. BRITISH SWIMM
12  Mike McIntyre and Bryn Vaile won gold medals in South Korea four years ago,
13  Pattison, a naval officer, won gold in the Flying Dutchman class in 1968 a

Example 4 (or* with vinc* within two words to the left, sorted by the second word to the left: all citations)

1  ct Velasco _ non solo non si vince l'oro, ma non si arriva alla finale>. Nella
2  ablo Morales, un ragazzo che vince l'oro nei 100 farfalla a 28 anni (per il nu
3  Matt Biondi, che cerca di vincere l'oro per la terza volta consecutiva ai Gio
4  di ridicolo. Due bulgari vincitori dell'oro erano risultati positivi agli ster
5  to, io prendo un sabbatico e vinco l'oro>. Spitz, forse il piu' grande nuotato
6  ", Maurizio Damilano, che ha vinto l'oro nella 20 chilometri di marcia addirit
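Examples 3 and 4 combine three operations: restricting a concordance to citations with a given collocate to the left of the search word, sorting the citations by their left context, and thinning the result to every n-th citation. A rough Python sketch of these steps is given below. It is not the concordancer used in the experiment: the function names, the token-based notion of "words to the left", the use of regular expressions instead of wildcard keywords, and the sample text are all assumptions made for illustration. How the original software chose which citation to start from when sampling is not known; here the first is kept.

```python
import re

def tokenize(text):
    """Split text into word tokens (letters, digits and apostrophes)."""
    return re.findall(r"[\w']+", text)

def node_hits(tokens, node, collocate, span=1):
    """Indices of tokens matching `node` that have a token matching
    `collocate` within `span` tokens to the left (both are regexes)."""
    node_re = re.compile(node, re.IGNORECASE)
    coll_re = re.compile(collocate, re.IGNORECASE)
    hits = []
    for i, tok in enumerate(tokens):
        if node_re.fullmatch(tok):
            window = tokens[max(0, i - span):i]
            if any(coll_re.fullmatch(w) for w in window):
                hits.append(i)
    return hits

def sort_by_left_word(tokens, hits):
    """Order hits alphabetically by the word immediately left of the node."""
    return sorted(hits, key=lambda i: tokens[i - 1].lower() if i > 0 else "")

# Invented sample text, not taken from the corpora.
sample = ("She won gold in Seoul. Anyone can win gold. "
          "He was winning gold at 33. They won a gold medal.")
tokens = tokenize(sample)

hits = node_hits(tokens, r"gold", r"win\w*|won", span=1)
ordered = sort_by_left_word(tokens, hits)
every_third = ordered[::3]  # thin the concordance: citations 1, 4, 7, ...

for i in ordered:
    print(" ".join(tokens[max(0, i - 3):i + 3]))
```

With span=1 the citation "won a gold medal" is excluded, as in Example 3; widening the window to span=2, as in Example 4, would admit it.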

Learners rapidly discovered that the next two key words in the first sentence of the source text, podio and consecutiva, had cognates in the English corpus, podium and consecutive. The question was: if there is a cognate form in English, is it a true or a false friend (Holmes and Guerra Ramos 1993; Partington 1995)? As can be seen from the following concordances (podium* in the English texts: ex. 5; podi* in the Italian texts: ex. 6) the sense of podium does in fact correspond to that of podio in this context:

Example 5 (every third citation)

1  ay Michael Carruth was standing on a podium in the boxing arena here, listening
2  t finish anywhere better than on the podium of an Olympic Games.' Ever since his
3  win. But I was proud to stand on the podium after a race like that. 'It's a gr
4  me away from a definite place on the podium after crushing Australia 98-65 in t
5  two Americans stood on the winner's podium to salute the anthem. True that duo
6  .60m. Despite his climbing on to the podium along with Zelezny and Raty, there w
7  as Skah stepped jauntily out to the podium matched that which accompanied his

Example 6 (every third citation)

1  e Abebe), conosceva la sua ascesa al podio (terzo posto) dopo un quarto d'ora qu
2  spettacolare autorita' la scalata al podio piu' alto del torneo a squadre costri
3  assullo e Bomprezzi sono lontani dal podio. Nella vela un avvio in sordina dopo
4  avoriti per il gradino piu' alto del podio Scarpa e Josefa Idem; outsider Rossi
5  era mista) sul gradino piu' alto del podio, la cinesina Zhang Shang. La seconda,
6  tiste azzurre che hanno sottratto il podio piu' alto alle tedesche. Da sinistra:
7  6 e 10,8 milioni. Il terzo posto sul podio equivale, quindi, a una vittoria in u
8  empio, se Michael Jordan salira' sul podio alla cerimonia di premiazione del tor
9  i di esilio, il Sud Africa torna sul podio: i tennisti Ferreira e Norval si sono

It was noticed, though, that there was a difference in the relative frequency with which the two terms occurred. There were 27 occurrences of podio in the Italian corpus as opposed to only 22 of podium in the English corpus, even though the latter was four times as large. Inspection of the concordances showed that the expression in the source text, il gradino più alto del podio ("the highest step of the podium"), is repeatedly used to mean "winning the gold medal" (6 occurrences: cf lines 4-5 in ex. 6), and that this figurative use did not occur in the English corpus. Consequently, the word podium was not used to translate this expression, with learners resorting instead to the less figurative but more adequately attested gaining five gold medals (ex. 7):

Example 7 (gain* with gold? within five words to the right, sorted by the word to the left of gain*: all citations)

1  king their first Olympic appearance, gained the women's gold medal for hockey a
2  omplete their convincing progress by gaining gold medals, but at least they are
3  olwill on how Spain and Germany made gains in the gold market in the hockey tou
4  lifting three times his body weight, gaining the gold medal for Turkey and then
5  d be in a week's time. Biondi, who gained five golds in 1988, will again be th
6  ll wrong.' Not so Tamas Darnyi, who gained Hungary's second swimming gold medal
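The comparison of podio and podium made above rests on relative rather than raw frequency, and the arithmetic can be made explicit. The sketch below normalises the two counts to occurrences per 100,000 words, using the approximate corpus sizes reported in section 1 (65,000 words of Italian; 95,000 + 160,000 = 255,000 words of English). The function name per_100k and the choice of 100,000 words as the base are assumptions made for illustration, and since the sizes are themselves rounded the results are only indicative.

```python
def per_100k(count, corpus_size):
    """Occurrences per 100,000 words of running text."""
    return count / corpus_size * 100_000

# Counts from the concordances; corpus sizes are the rounded figures
# given in section 1, so the rates are approximate.
podio_rate = per_100k(27, 65_000)     # Italian corpus
podium_rate = per_100k(22, 255_000)   # English corpus

print(f"podio:  {podio_rate:.1f} per 100,000 words")
print(f"podium: {podium_rate:.1f} per 100,000 words")
print(f"ratio:  {podio_rate / podium_rate:.1f}")
```

On these figures podio occurs roughly 41.5 times per 100,000 words against roughly 8.6 for podium, i.e. nearly five times as often, which is what prompted a closer look at how the Italian word was being used.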

The other cognate noted, consecutive, also posed the question of whether it was a false friend. One learner, who suspected that "successive" might be a better translation, compared the two Italian and the two English words (consecutivo/successivo vs consecutive/successive). The concordances showed that the English words were almost always preceded by a number, and seemed to be synonyms (ex. 8), while the two Italian words differed. Consecutivo (the citation form is the masculine singular) seems more or less equivalent to the English successive/consecutive, while successivo means something like "following/next", appearing two out of four times in the phrase gli anni successivi (lit. "the following years": ex. 9). (This led this student to also investigate the behaviour of following and next.)

Example 8 (every fifth citation)

 1  beat him after a run of 10 consecutive defeats at the distance aroused an unmi
 2  se we never knew existed. At successive Olympics, World Championships and Common
 3  l of the same event in four successive Olympics? 13 Who came second behind Seb
 4  rst time in 1986 and then in consecutive years, 1990 and 1991, but has had to w
 5  a final total for the second consecutive year. Twelve months after breaking his
 6  be if he is to win his third consecutive Olympic gold medal. He and his partner,
 7  emselves seeking their third consecutive Olympic gold in the coxed pairs today.
 8  owing Steve Redgrave's third successive rowing gold on Saturday, Johnny, 23, and
 9  xercise where she does three consecutive back-flips, in which her hands never to
10  in 1920, has collected three successive gold medals. Redgrave bridged 72 years o

Example 9 (every third citation)

1  rl Lewis, per la terza volta consecutiva campione d'Olimpia, l'ha vinta al pri
2  ultimo ostacolo. Per 12 gare consecutive e dieci anni, fra il 197 e il 1982,
3  ano la prua al mito, tre ori consecutivi alle Olimpiadi insieme al timoniere D
4  in termini di monetizzazione successiva della medaglia d'oro, sono piu' lontan
5  difiche apportate negli anni successivi, riusci' a rendere competitivo.