Language and publication in Cardiovascular Research articles


Cardiovascular Research 53 (2002) 279–285


R. Coates (a,*), B. Sturgeon (a), J. Bohannan (b), E. Pasini (c)

(a) Centro Linguistico dell’Università di Brescia, Contrada Santa Chiara, 25100 Brescia, Italy
(b) P.G.C.E. L’Università degli Studi di Bergamo, Bergamo, Italy
(c) Fondazione ‘Salvatore Maugeri’, IRCCS Gussago, Italy

Received 15 October 2001; accepted 5 November 2001

Abstract

Background: The acceptance rate for authors whose mother tongue is not English is generally much lower than for native English-speaking authors. Obviously the scientific quality of an article is the principal reason for publication. However, is editorial rejection purely on scientific grounds? English mother-tongue writers publish more than non-mother-tongue writers, so are editors discriminating linguistically? We therefore decided to survey language errors in manuscripts submitted for publication to Cardiovascular Research (CVR). Method: We surveyed language errors in 120 medical articles which had been submitted for publication in 1999 and 2000. The language ‘error’ categories were divided into three principal groups (grammatical, structural and lexical), which were then further subdivided into key areas. The articles were corrected without any knowledge of the authors’ nationality or of the corrections made by the other language researchers. After an initial correction, a sample of the papers was cross-checked to verify reliability. Results: The control groups of US and UK authors had almost identical acceptance rates and overall ‘error’ rates, indicating that the language categories could also be applied objectively to the other nationalities. Although there was no direct relationship between the acceptance rate and the number of language errors, there was a clear indication that badly written articles were associated with a high rejection rate. The US/UK acceptance rate of 30.4% was higher than that of all the other countries. The nationality with the lowest acceptance rate, 9% (Italian), also had the highest error rate. Discussion: Many factors could influence the rejection of an article. However, we found clear indications that carelessly written articles could often have either a direct or a subliminal influence on whether a paper was accepted or rejected. On equal scientific merit, a badly written article will have less chance of being accepted, even if the editor rejecting it does not necessarily identify language problems as a motive for rejection. A more detailed look at the types and categories of language errors is needed. Furthermore, we suggest the introduction of standardised guidelines in scientific writing.

© 2002 Elsevier Science B.V. All rights reserved.

1. Introduction

Various editors of important medical journals [1–3] have stressed the importance of well-written scientific research. Today, research written in English is the principal means of spreading scientific knowledge. The subject of publication and the nationality of authors has been touched on in this journal in the past [4], and physicians whose native language is not English face additional problems when presenting work for publication. Publications are available which look at the problem from a strictly medical approach to the IMRAD (Introduction, Materials and methods, Results and Discussion) structure [5], or which give English mother-tongue doctors an outline of how to write medical articles [6–9]. However, to our knowledge, the influence of language on publication in medical research has not been analysed. How can we define an article which is ‘well written’? Given the large number of physicians whose native language is not English, this question should be answered in two words: ‘simple’ and ‘clear’. Unfortunately, this is exactly the opposite of how many authors, even native English users, write medical research. We therefore decided to analyse the language problems which could affect the clarity of medical writing. In cooperation with Cardiovascular Research, we analysed 120 IMRAD articles submitted for publication by authors of eight different nationalities. Because we were looking for problems which made an article difficult to understand, we had to consider style as well as grammatical errors.

*Corresponding author: R. Coates.

0008-6363/02/$ – see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S0008-6363(01)00530-2 Downloaded from by guest on 12 December 2017


R. Coates et al. / Cardiovascular Research 53 (2002) 279 – 285

Some authors have indicated general language areas which could create problems for comprehension [10]. The problem with style, however, is objectivity: what is difficult to understand for one editor might be perfectly acceptable to another. It should also be noted that we give a strict definition of the ‘error’ categories in the Methods section, and that this definition applies only to this research.

2. Methods

2.1. Grammatical errors

2.1.1. Passives
Different sources [7–9] have indicated that high passive use makes a text obscure and difficult to understand. We generally agree with this statement, although in a medical text which follows an IMRAD structure there are times when the passive is necessary. Indeed, a very clear ratio of passive to active verbs has been shown for each of the IMRAD sections [11]. Thus, ‘A separate group of animals underwent coronary artery perfusion . . .’ would count as one active use, whereas ‘Coronary artery perfusion was performed on a separate group of animals . . .’ would count as one passive use. The verbs of each type were then counted and expressed as a single figure, i.e. passive verbs divided by active verbs. Generally, once a subject has been introduced, it is quite normal to use the passive. For this reason the Methods section usually has a high passive to active ratio, as one subject is used throughout, e.g. ‘We studied our patient group over a six-month period. The group was interviewed at the beginning of the study and informed consent was obtained according to guidelines from our ethics department. The group was sub-divided . . .’ etc. To simplify these ratios, we also looked at the overall average passive to active ratio. This method was somewhat simplistic, but it gave a clear indication of whenever an author used the passive more than native English-speaking authors did.

2.1.2. Tense
We simplified our definition of verb tense use as much as possible, using certain US sources [6,7] as the basic criteria. Day [7] underlined that in a scientific work any reference to previously published work, which has therefore been accepted by the scientific community, should be written in the present tense. Any reference to current research work (i.e. the work carried out and described by the author) should be in the simple past tense (i.e. what the authors did). The exceptions are the introduction of new concepts (present perfect tense), references to tables (simple present tense) and ‘reporting verbs’ such as said, found and discovered (simple past tense). This ‘neatly’ simplifies tense use in scientific work, especially in the Discussion section, where authors switch back and forth between their current research and the published scientific literature. However, it should be noted that this is a simplification of possible verb tense use. Indeed, the IMRAD structure is itself a simplification of how medical work could be presented, and many currently published texts (especially by UK authors) use a considerably more complex scheme.

2.1.3. General grammar problems
These were all grammatical errors which should have been eliminated, either by mother-tongue colleagues or by professional translators, before submission to an editor. They included, for example, third-person errors, plural errors and preposition errors.

2.2. Structural errors (syntax)
We again restricted our categories to countable and objectively verifiable groups.

2.2.1. Long sentences
We came across sentences which were often a paragraph long, or in which the original subject was lost by the time the end of the sentence was reached. For practical reasons, we placed in this category any sentence with more than one subordinate clause, unless there was a clearly defined reason for it (e.g. a list of procedures in an experiment).

2.2.2. Word order
Mixing up a simple subject + verb + complement structure very often made a sentence totally incomprehensible. We included in this category split infinitives, out-of-place subordinate clauses, etc. Often just one or two confused sentences could make a complete IMRAD section very difficult to understand, so these categories could be relatively more important than others.

2.3. Lexical errors (word choice)

2.3.1. Jargon
These were words (or groups of words) which were unnecessarily obscure or complicated for no apparent reason. Thus a child would become a ‘paediatric patient’, experimental mice were ‘sacrificed’ instead of killed, etc. Specific medical terminology was very seldom a problem for any nationality and was not included in this category. To be sure that the authors of this work considered the same words as jargon, each word was written down and agreed upon by consensus. Individual prepositions, articles, etc. were considered in the grammar category.

2.3.2. Noun misuse
A common specific lexical problem was the use of a group of words built around a noun when a verb, adjective or adverb would have been clearer and simpler [12]. Thus: ‘a recovery was achieved in a quick way . . .’ instead of ‘being a quick recovery . . .’; ‘we studied the data in a statistical manner . . .’ instead of ‘statistically’; ‘we made an analysis of the data . . .’ instead of ‘analysing the data . . .’, etc. Most lexical corrections were not necessarily ‘true errors’ as such. However, they were often pompous and unclear, thus obscuring the scientific data presented. Many of these items have been noted by various authors on this subject [6–9]. Used sparingly, many of these words would not even be noticed; in abundance, however, and together with other structural and grammatical errors, they could confuse the scientific message of an article.

One hundred and eighteen articles were surveyed (two were discounted for procedural reasons), and the general error counts were noted together with the acceptance percentages (see Table 1). Each article was read twice and the errors in each category were counted: in the first reading the passive–active ratio was counted, and in the second all other errors were counted. The research was ‘blind’: the editor of CVR had removed all references to country, hospital, etc., as well as the title and authors, so we had no idea which country an article came from. A 10% sample of the articles was double-checked by a second researcher who had no knowledge of the first researcher’s error counts. Finally, a third researcher (a medical doctor) was consulted to check the lexical items which had been considered errors and to give a third opinion on any of the categories.

Table 1 Publication and average ‘error’ rates
Country | % Acceptance | Overall error rate/article
United States | 31.8 | 21.9
United Kingdom | 29 | 23.1
France | 26.2 | 43.1
Germany | 23.6 | 41.1
Spain | 19.6 | 37.9
Japan | 16.7 | 36.9
Sweden | 11.6 | 35
Italy | 9 | 48.6

We divided the language categories first into the three principal linguistic areas, i.e. grammatical, structural and lexical. We then divided each of these groups into more specific areas, following both the recommendations on clarity in medical texts of leading authors Day [7], Zeiger [6] and O’Connor [8], and our own previous findings (Coates [12,13]). Given that nearly all articles should already have been checked for spelling errors, printing errors, etc., we had to define exactly what we considered ‘errors’ to be.

The number of errors per nationality can be seen in Table 2. The overall error count was deemed to be reliable, with a standard deviation of ±4.2 errors in articles with an average of 39 errors. However, the tense category was more difficult to agree upon. As stated above, the original criteria for tense errors were a simplification to make writing easier. In written English there are numerous ways of expressing scientific knowledge, and especially of discussing it. While some authors were confident with this usage, others (notably non-native speakers of English) were not. We were therefore unable to find a clear definition of when tense use constituted an error. This was interesting in our control groups (US and UK), as the only relevant difference between the two groups was in the tense category (apart from active–passive use). This suggests that US writers prefer the simpler definition given above, while British writers are more willing to use more complex forms. We were unable to judge, however, whether one form was clearer or ‘better’ than the other; we nevertheless included the data in the tables as a point of reference.

Table 2 General error frequencies (average no. of errors/article)
Nationality | Tense | General grammar errors | Long sentences | Word order | Jargon | Noun misuse
UK | 8.3 | 2.4 | 1.7 | 2 | 4 | 4.6
US | 4.4 | 2.1 | 2.3 | 0.6 | 6.4 | 6.1
Sweden | 5 | 8.1 | 1.1 | 2.7 | 9.7 | 8.4
Japan | 3.9 | 6 | 2.7 | 2.6 | 11.7 | 7.9
Germany | 7.3 | 6.7 | 5.5 | 3.8 | 10.7 | 7.1
France | 7.9 | 6 | 5.6 | 5.4 | 14.6 | 9.4
Spain | 7.9 | 8.8 | 2.7 | 2.6 | 10.1 | 6.4
Italy | 7.9 | 13.9 | 2.2 | 4.3 | 12.8 | 7.5

The fact that both the error rate and the acceptance rate were very similar for the two control groups suggests that they offered a good comparison. Furthermore, although some of these articles were far from perfect, they did represent a standard. Thus the average of the two control groups (22.5 errors per article) was considered the zero point, i.e. what a normal native English speaker would produce. The average of the other nationalities’ errors (40.3) was considered the ‘upper error limit’, i.e. beyond this number of errors the language would begin to make an article difficult to read, as can be seen in Table 3.

Table 3 Percentage of articles with low and high ‘error’ rates per nationality
Country | % of articles with fewer errors than the control average (22.5) | % of articles with more errors than the total average (40.3)
United States | 66% | 26%
United Kingdom | 66% | 13%
France | 26% | 53%
Germany | 26% | 33%
Spain | 0% | 27%
Japan | 26% | 53%
Sweden | 40% | 40%
Italy | 14% | 64%
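The two thresholds above divide articles into three bands. As an illustration only, the sketch below shows how the two percentage columns of Table 3 could be computed from per-article error totals. The function names and the sample data are hypothetical; only the thresholds 22.5 and 40.3 come from this study, and we assume the comparisons are strict (‘fewer than’ and ‘more than’), so an article exactly at a threshold falls in neither column.

```python
from collections import defaultdict

# Thresholds reported in the study (errors per article).
CONTROL_AVERAGE = 22.5    # mean of the US and UK control groups
UPPER_ERROR_LIMIT = 40.3  # mean of the other nationalities


def classify(error_count):
    """Place one article in a band; strict comparisons are an assumption."""
    if error_count < CONTROL_AVERAGE:
        return "low"
    if error_count > UPPER_ERROR_LIMIT:
        return "high"
    return "intermediate"


def band_percentages(articles):
    """articles: iterable of (nationality, total_error_count) pairs.

    Returns {nationality: (% below control average, % above upper limit)},
    i.e. the two columns of Table 3.
    """
    bands = defaultdict(lambda: {"low": 0, "intermediate": 0, "high": 0})
    for nationality, errors in articles:
        bands[nationality][classify(errors)] += 1
    result = {}
    for nationality, counts in bands.items():
        total = sum(counts.values())
        result[nationality] = (100 * counts["low"] / total,
                               100 * counts["high"] / total)
    return result


# Hypothetical error totals for three articles from one country.
sample = [("Country X", 18), ("Country X", 30), ("Country X", 47)]
print(band_percentages(sample))
```

With these made-up totals, one article falls in each band, so about 33.3% of the articles land in each of the two Table 3 columns.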

3. Results

As might be expected, there was no direct relationship between the average number of language errors and the acceptance rate (see Table 1). However, mother-tongue writers submitted a far higher proportion of articles with low error rates (below the control average) than authors from other countries. In these articles, language should (hopefully) have had no effect, either good or bad, on the perceived scientific value of the work. Furthermore, there were considerably more articles with a high error rate (above the total average) from countries whose publication rates were considerably lower than the controls’ (see Table 3). The exception was Spain, whose papers included only a small proportion of articles with either high or low error rates.

3.1. Grammatical errors

3.1.1. Passive–active ratio
This ratio varied greatly, both between individual authors and between national averages. For our controls, the normal ratios (passive verbs divided by active verbs) were: Abstract, 0.6; Introduction, 0.7; Materials and methods, 2.0; Results, 0.67; Discussion, 0.6. The average passive:active ratio for each nationality is shown in Table 4.

Table 4 Average passive/active use rate
Country | Average passive/active ratio
United States | 0.58
United Kingdom | 0.78
France | 0.84
Germany | 0.97
Spain | 0.65
Japan | 0.69
Sweden | 0.58
Italy | 0.79

An interesting point was that the British authors used the passive considerably more than the US authors. This fits the general observation that US authors tended to prefer a simpler style. However, there was no indication that this affected the acceptance rate at all. The high passive use by German authors reflected the naturally high passive use in the German language. In reality, active–passive use is a much more complex subject than mere frequency. When a subject is unequivocal (for example, when the subject of an experiment in the Materials section has been well introduced), passive use is perfectly acceptable. Problems for comprehension arose only when an author switched between different subjects.
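The ratio used throughout this section is simply passive verbs divided by active verbs, computed per IMRAD section. As an illustration only, the sketch below assumes each verb has already been labelled by hand as active or passive (as in our manual readings) and computes the ratio per section plus a pooled ‘overall’ ratio. The function name and the example counts are hypothetical, and pooling all verbs is only one possible reading of the overall ratio.

```python
from collections import Counter


def passive_active_ratios(verbs):
    """verbs: iterable of (section, voice) pairs, voice being "active" or "passive".

    Returns {section: passive / active}, plus an "overall" entry pooling all
    verbs. A section with no active verbs gets infinity instead of raising.
    """
    counts = Counter()
    for section, voice in verbs:
        if voice not in ("active", "passive"):
            raise ValueError(f"unknown voice: {voice!r}")
        counts[(section, voice)] += 1
        counts[("overall", voice)] += 1

    ratios = {}
    # Build the set of sections first; reading a missing Counter key returns 0.
    for section in {section for section, _ in counts}:
        active = counts[(section, "active")]
        passive = counts[(section, "passive")]
        ratios[section] = passive / active if active else float("inf")
    return ratios


# Hypothetical hand-labelled verbs from two sections of one article.
labels = ([("Methods", "passive")] * 4 + [("Methods", "active")] * 2
          + [("Discussion", "passive")] * 3 + [("Discussion", "active")] * 5)
print(passive_active_ratios(labels))
```

With these made-up counts, the Methods ratio is 2.0 and the Discussion ratio 0.6, close to the control values given in Section 3.1.1; the pooled overall ratio is 1.0.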

3.1.2. Tense
Given the procedural problems in agreeing on a definition of ‘tense’ errors in scientific work, we had to conclude that this category was not an accurate indicator of problems. We suggest that more research be carried out to define clearly what editors expect in this category. Indeed, there was no clear difference between the nationalities. Interestingly, the Japanese and Swedish authors had fewer corrections in this category than the controls, which probably reflects the simpler constructions these authors used.

3.1.3. General grammatical errors
Given that these errors should have been eliminated before submission, this category could be considered a ‘general sloppiness’ category. Moreover, most referees and editors would pick up such errors almost instantly. It is therefore not surprising that the group with the highest average in this category (Italian authors) also had the highest overall error count and the lowest acceptance rate (Tables 1 and 2).

3.2. Structural errors
Although it was the smallest group in numerical terms, the structural category was probably more important for overall comprehension. Just a few long, rambling sentences (often as long as a paragraph) could sometimes make a whole article incomprehensible, whereas a relatively large number of lexical ‘errors’ often had no effect on an otherwise well-written article.

3.2.1. Long sentences
These were possibly the single most obvious problem, especially as the controls wrote relatively short sentences. Indeed, several sentences seemed to try to hide rather than clarify medical data! The following was a typical example:
‘The soluble form of B2 micro-globulin (B2 m) HLA class I heavy chain (FHC) consists of three size variants, namely the intact lipid-soluble 43 kDa heavy chain (A variant), released through a shedding process; the truncated water soluble 39 kDa heavy chain (B variant), which lacks the trans-membrane segment and is produced by an alternative RNA splicing and the 34–36 kDa (C variant), which lacks the trans-membrane and intratoplasmatic portion of the molecules.’
Simply breaking such sentences into several shorter ones, or using suitable connectors, would have made them considerably easier to understand. German and French authors wrote twice as many long sentences as the other nationalities.

3.2.2. Word order
Given the importance of word order in English (which has no noun–adjective agreement, declensions, etc.), this category was very important for simple comprehension. Because the controls’ word order was very simple, this problem was even more evident. The French and Italian authors had the highest ‘error’ rates in this category. Some examples:
‘In all patients, bioptic material was taken and was studied in the period from December 1999 to May 2000.’ corrected to ‘Bioptic material was taken from all patients in the period between December 1999 and May 2000. It was then studied.’
‘Brown detected, after LSD-treatment, by in-situ hybridisation, striking regional and cellular differences in the rabbit spinal cord.’ (Was it Dr. Brown or the rabbits who had had the LSD treatment?) corrected to ‘Brown detected striking regional and cellular differences, by in-situ hybridisation, after LSD treatment in the rabbit spinal cord.’

3.3. Lexical

3.3.1. Jargon
Numerically, this group was the biggest source of ‘errors’, although, taken individually, many of the words considered here as unnecessarily complex could be perfectly acceptable. We therefore used the following criteria: (1) Did the word(s) appear in one of the numerous published lists of ‘words to avoid’ [6–9]? (2) Did a simpler word exist? (3) Did the word(s) unnecessarily complicate the text? We then subdivided these words into three groups: (a) confusing words, (b) unnecessary words and (c) inaccurate words. For the sake of uniformity, each lexical ‘error’ was written down independently by the two language specialists and further checked by the medical specialist. We therefore recorded these lexical items even in the ‘well-written’ articles (i.e. those with fewer errors than the control mean). In context, many of these articles could be considered to be without any real mistakes at all. Many lexical ‘errors’ only became a real problem for editors when they were combined with other errors, thus creating a ‘fog of words’.
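Our jargon counts were made by hand, using the three criteria above and consensus between the researchers. Purely as an illustration of criterion (2), the sketch below flags phrases for which Table 5 lists a simpler alternative. The dictionary holds only a handful of Table 5 entries and the function name is hypothetical. A match is not proof of an ‘error’, because context decides whether a word is jargon.

```python
import re

# A few entries from Table 5: wordy or confusing phrase -> simpler alternative.
SIMPLER = {
    "paediatric patient": "child",
    "sacrificed": "killed",
    "utilised": "used",
    "in recent years": "recently",
    "are in agreement with": "agree",
    "for the treatment of": "to treat",
}


def flag_wordy_phrases(text):
    """Return (phrase as written, suggested alternative) in order of appearance."""
    hits = []
    for phrase, simpler in SIMPLER.items():
        # Whole-word, case-insensitive match of the phrase.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((match.start(), match.group(0), simpler))
    hits.sort()  # order by position in the text
    return [(found, simpler) for _, found, simpler in hits]


text = "In recent years, mice were sacrificed and the data utilised."
for found, simpler in flag_wordy_phrases(text):
    print(f"{found!r} -> consider {simpler!r}")
```

A plain dictionary keeps the word list easy to extend from the published ‘words to avoid’ lists [6–9].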

3.3.2. Noun misuse
We separated this category from the jargon category because noun misuse was very widespread. It involved using nouns where a verb, adjective or adverb would have been simpler, easier to understand and less pretentious. Moreover, this ‘error’ mirrors constructions that are correct in the authors’ own languages. Thus:
‘are in agreement with . . .’ rather than ‘agree with’ (a noun used instead of a verb);
‘The care of the patient . . .’ rather than ‘patient care’ (a noun used instead of an adjective);
‘in recent years . . .’ rather than ‘recently’ (a noun used instead of an adverb).

For an indicative list of some examples see Table 5.

Table 5 Lexical categories
Categories: (a) confusing word use; (b) inaccurate word use; (c) unnecessary word use. Noun misuse: (a) instead of adjectives (participles); (b) instead of adverbs; (c) instead of a verb.
Word or phrase used | Simpler alternative
Paediatric patient | Child
Sacrificed | Killed
Utilised | Used
Advocate | Suggest
Evidenced | Showed
Actually | Now
Employed | Used
Represents | Is
Studies in the scientific literature | Studies
It could be hypothesised | Might
The above mentioned . . . | These
Experience a meaningful response | Benefit
Until healing occurs | Until healed
At variance with | Contrasting
The termination of | Finished
The number was fewer | Were fewer
In recent years | Recently
In a first step | Initially
Not in a specific way | Not specifically
By optical observation | Optically
Are in agreement with | Agree
The necessity of | Needed
Is the possibility to | Can
For the treatment of | To treat
In detecting the presence of | On detecting

4. Discussion

The purpose of this study was not to find a direct correlation between language errors and acceptance rates. Obviously all papers are accepted or rejected on scientific merit rather than literary skill. However, we wanted to pinpoint certain language problem areas which could either directly or subliminally affect the chance of an otherwise sound medical work being rejected. Summarising the language areas we looked at, we can say the following.

1. Passive use. Apart from the Materials (patients) section, the norm in medical articles was to have as high an active–passive ratio (i.e. as much use of the active voice) as possible. However, if the subject is clearly defined, the passive is acceptable.
2. Tense. We were unable to define an objective standard for tense use in this study, principally because US and UK authors used tense differently. However, we would recommend the simplest possible use of tense as outlined by Day [7], i.e. the past tense to refer to the current work being described and the present tense to describe other published work (with the exceptions given in Section 2.1.2). This does not cover all potential English usage, but it does considerably simplify the task of writing, especially if the author is not a native speaker of English.
3. General grammar errors. When a research paper is submitted for publication, there should be no general grammar errors. If there are, the work has not been checked by either mother-tongue colleagues or professional scientific writers. A computer spell checker alone is not enough!
4. Long sentences. Avoid sentences with more than one subordinate clause. Shorter sentences in English denote a simple style and clearer science.
5. Word order. In English, word order is fundamental for understanding because English lacks declensions and agreement between adjectives, nouns, etc. A simple word structure (in simple sentences), i.e. subject + verb + object, is therefore easier to understand.
6. Jargon. Given that only a relatively small circle of doctors will be comfortable with the precise vocabulary of any given specialisation, a lot of effort is already required to understand a text without complicating the general language as well. If a simpler alternative exists, use it.
7. Noun misuse. Because many European languages favour noun-based constructions, the ‘misuse’ of nouns in English was very common. In English, however, these structures tend to be overly complex, even if they are not necessarily errors. The verb is the strongest means of conveying meaning in English.
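The recommendation on long sentences above can be turned into a rough self-check before submission. The sketch below counts words that often open a subordinate clause and flags sentences containing more than one. It is a crude, hypothetical heuristic rather than the procedure used in this study, where every sentence was judged by a reader. The word list is an assumption, the sentence splitter is naive, and some listed words (e.g. ‘while’, ‘if’) do not always introduce a clause. Applied to the long sentence quoted in Section 3.2.1 and to the first sentence of the corrected example in Section 3.2.2, it flags only the former.

```python
import re

# Hypothetical, deliberately short list of words that often introduce a
# subordinate clause; a reliable check would need a syntactic parser.
SUBORDINATORS = {
    "which", "who", "whom", "whose", "because", "although",
    "though", "whereas", "while", "unless", "if",
}


def split_sentences(text):
    """Naive split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def count_subordinators(sentence):
    words = re.findall(r"[a-z]+", sentence.lower())
    return sum(1 for word in words if word in SUBORDINATORS)


def flag_long_sentences(text, max_subordinate=1):
    """Return sentences with more than `max_subordinate` subordinating words."""
    return [s for s in split_sentences(text)
            if count_subordinators(s) > max_subordinate]


example = (
    "The soluble form of B2 micro-globulin (B2 m) HLA class I heavy chain (FHC) "
    "consists of three size variants, namely the intact lipid-soluble 43 kDa heavy "
    "chain (A variant), released through a shedding process; the truncated water "
    "soluble 39 kDa heavy chain (B variant), which lacks the trans-membrane segment "
    "and is produced by an alternative RNA splicing and the 34-36 kDa (C variant), "
    "which lacks the trans-membrane and intratoplasmatic portion of the molecules. "
    "Bioptic material was taken from all patients in the period between "
    "December 1999 and May 2000."
)
for sentence in flag_long_sentences(example):
    print(sentence[:60] + "...")
```

The threshold `max_subordinate=1` mirrors the ‘more than one subordinate clause’ rule. Any flagged sentence is only a candidate for a human reader to split or reorder.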

Overall, we did not find a direct correlation between the number of ‘errors’ and the final acceptance rate of articles submitted for publication to Cardiovascular Research. There was, however, a closer relationship between the number of well-written articles and acceptance rates: a well-written article would be judged solely on its scientific merit, without any language interference. The partial exception was Spain: none of the Spanish articles was very well written, although a high proportion were reasonably well written. Thus it would seem that a well-written medical article was one with as little ‘language interference’ as possible, i.e. one that was as simple as possible. A large-scale, cross-referenced survey including data such as study design and data management is needed to show exactly how important language interference is in medical writing. We suggest further work on this subject to make these points clear to publishing doctors. Furthermore, we suggest that standardised, if not universal, guidelines be drawn up to make the work of both medical writers and editors easier.

References

[1] Smith R. The case for structuring the discussion of scientific papers. BMJ Educ 1999;318:1224–1225.
[2] Horton R. The rhetoric of research. Br Med J 1995;310:985–987.
[3] Rennie D. The present state of medical journals. Lancet 1998;352(s2):6.
[4] Opthof T. Submissions, publications and reviewers from Europe: focus on Spain. Cardiovasc Res 1999;43:265–267.
[5] Hall G, editor. How to write a paper, 2nd ed. BMJ Books, 1998.
[6] Zeiger M. Essentials of writing biomedical research papers, 2nd ed. 1999.
[7] Day R. How to write a scientific paper, 5th ed. 1998.
[8] O’Connor M. Writing successfully in science, 2nd ed. 1992.
[9] Goodman N, Edwards M. Medical writing: a prescription for clarity, 2nd ed. 1997.
[10] Kirkman J. Writing in English for an international readership. BMJ Educ Debate 1996;313:1321–1323.
[11] Heslop J. Tense and other indexical markers in the typology of scientific texts in English. In: Hoedt J et al., editors. Pragmatics and LSP. Copenhagen School of Economics, 1982, pp. 83–103.
[12] Coates R. The use of jargon in Italian scientific research. In: II National Congress A.N.C.E., 2000.
[13] Coates R. Errors in medical publications. In: 1st theoretical–practical course of ‘scientific writing’. Bari: IRCCS, 2001.
