MIXED MESSAGES: Referees’ Comments on the Manuscripts They Review

Von Bakanic*
University of Southern Mississippi

Clark McPhail
University of Illinois-Urbana

Rita J. Simon
American University

There are many criticisms of the manuscript review process but few systematic studies of the referees’ comments on the manuscripts they review. The authors examined comments on a sample of first-submission manuscripts reviewed by the American Sociological Review (ASR), 1977-1981. Positive and negative comments were classified into twelve categories: topic, theory, review of literature, design, data, sample, measurement, analysis, results, style, ad hominem, and general. No manuscripts received unequivocally favorable reviews, but some reviews were less negative than others. Referees were more critical of manuscripts in comments to the editor than in comments to authors. Authors did not see comments to the editor, nor did they know the referees’ recommendations concerning their manuscripts. Thus, editors had more information than authors, and authors may complain that they got mixed messages from two or more referees. However, there were surprisingly few actual contradictions in referees’ comments. While most comments (both positive and negative) were general, a substantial number were about theory and analysis. Reviews of the same manuscript by different referees yielded surprisingly few contradictions, most of those about theory. The biggest difference between the comments by referees recommending publication and by those recommending rejection was praise. Referees were equally critical of all manuscripts, but were more likely to praise those recommended for publication.

INTRODUCTION

Referees sometimes disagree about recommendations for manuscripts. Since rejection rates are higher in social science journals, many assume that social science referees disagree more often than others. This has given rise to criticisms of journal reviews (Yoels 1974), comparison of review practices (Cole and Cole 1979; Cicchelli 1980; Hargens 1988), and

*Direct all correspondence to: Von Bakanic, Department of Sociology, S.S. Box 5074, University of Southern Mississippi, Hattiesburg, MS 39406-5074.

The Sociological Quarterly, Volume 30, Number 4, pages 639-654. Copyright © 1989 by JAI Press, Inc. All rights of reproduction in any form reserved. ISSN: 0038-0253.


charges of referee prejudice or bias (Peters and Ceci 1982). Much attention has focused on referee recommendations and agreement among referees (cf. Tinsley and Weiss 1975; Whitehurst 1984). Critical inferences have been made about scholarship and scientific development in the social sciences (e.g., Kruskal 1981; Gergen 1982; Cole 1983), while other scholars have defended the social sciences on the basis of the importance and complexity of its subject matter (e.g., Hedges 1987; Van Blaaderen 1969). Acceptance rates are only part of the journal review process. Referees both evaluate a manuscript and make a recommendation for its disposition. These are two different judgments. For example, a referee may evaluate a manuscript’s quality highly, but recommend rejection because other journals have a more suitable audience. Or, two or more referees may find similar problems with a manuscript, but differ in their recommendations. By examining referees’ comments we can discern whether they disagree on the merits of the manuscript or merely about the recommended disposition. Unfortunately, there has been little examination of referees’ comments accompanying their recommendations. What reasons do referees give for their recommendations to publish, to reject, or to revise and resubmit? Do two or more referees evaluating the same manuscript provide similar comments? Is there a relationship between the evaluative content of a review and the referee’s recommendation?

REFEREE COMMENTS

While criticisms of manuscript reviews are numerous (cf. Gove 1978; Freese 1979; Kochen and Perkel 1978; Yoels 1973, 1974), few have systematically analyzed the review process or the comments that accompany referees’ recommendations (cf. McCartney 1979; Levinsohn and Schleiter 1978; McHugh 1974). Most research about the criteria used to evaluate scientific scholarship has employed survey methods (e.g., Lodahl and Gordon 1972; Lindsey and Lindsey 1978; Chase 1970). This literature reports considerable agreement within disciplines among editors and referees surveyed about the normative criteria used. But, just as there is disparity between attitudes and behaviors, there may be disparity between referees’ scientific ideals and their review practices. Several pertinent studies have been made on the peer review process. Smigel and Ross (1970) examined the editorial decisions and attendant correspondence for 193 manuscripts submitted to Social Problems between 1958 and 1961. They tested the hypothesis that quality is the major determinant in the decision to publish. However, quality was measured by the extent of consensus among referees’ publication recommendations. The authors reported a relationship between referee recommendations (i.e., “quality”) and the editor’s decision to publish or reject the manuscript. The study did not examine how referees evaluated the manuscripts or what criteria were important in their determination of manuscript quality. Bonjean and Hullum (1978) analyzed letters written to rejected authors by the editors of the Social Science Quarterly between 1973 and 1976. In constructing these letters, the editor (or associate editor) had read the reviews, skimmed the manuscript, and drafted a letter that summarized the reasons for rejection. Bonjean and Hullum organized the reasons for rejection into five categories: unimportant contributions; methodological shortcomings;


theoretical problems; poor presentations; and editorial discretion. These data were based on the response of the editor to the reviews and the manuscripts, but this included some indirect references to the criteria referees employed in their manuscript evaluations. McCartney (1979) examined the referees’ comments on 100 articles submitted to The Sociological Quarterly between 1974 and 1979, deriving nine categories: topic, analysis, writing, literature, data, design, theory, conclusions, and statistics. He found that referees frequently commented on methodological techniques and “procedural rules,” but that creative concerns such as choice of topic were also important. The notion that sociology is a low consensus discipline was challenged by the relatively high consistency among referees’ comments about topics, theories, and methods. McCartney reported an informative qualitative analysis of the referees’ comments, but the data had some acknowledged limitations. First, the manuscripts were purposively selected. Second, the measures of referee consensus were crude (e.g., whether referees made check marks on the same or adjacent categories on the face sheet summary that accompanied the manuscript). Third, there was no attempt to calculate the frequency with which comments in each category appeared. Although Snizek and colleagues (1981) did not examine journal reviews, their analysis of book reviews pertains to this study. They examined 219 book reviews of 79 major works over a thirty-year period, using Mullins’s (1973) theory group schema to classify books and reviews. Reviewers sharing the author’s theoretical perspective were found to be more critical than those with different perspectives. While Snizek and his colleagues reported the relative frequency of criticisms, they did not report their substance. We know of no published examinations of the connection between the evaluative content of reviews and the referees’ recommendations.
Rather than relying solely on the reports scholars provide about idealized criteria or focusing exclusively on recommendations, we examined the referee comments sent to the journal editor. This study reports a content analysis of the referees’ written evaluations accompanying their recommendation to publish or reject manuscripts submitted to the American Sociological Review (ASR) between 1977 and 1981. Our analysis was organized around three empirical questions. First, was there a difference between the evaluative content of reviews accompanying recommendations to publish versus those to reject? Second, what criteria did referees cite in their evaluations? Third, did referees of the same manuscript cite similar criteria or did they contradict one another?

DATA AND METHOD

Our data set was a stratified random sample drawn from the 2,337 manuscripts submitted to the ASR between 1977 and 1981. The volume of data, time, and financial constraints contributed to our decision to analyze only a sample of submissions.1 We drew the sample in two stages. First, the 2,337 submissions were divided into eventually published manuscripts and those rejected. Because ASR has a substantial rejection rate, we included all eventually published manuscripts (n = 393).2 A random sample of approximately equal number (n = 362) was drawn from all nonaccepted manuscripts (i.e., rejected, and revise and resubmit).
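The first sampling stage described above can be sketched in outline. The snippet below is a hypothetical illustration, not the authors’ actual procedure: the record layout (a "published" flag on each manuscript) and the choice of an exactly equal-sized random draw from the non-accepted pool are assumptions (the study drew n = 362, only approximately equal).

```python
import random

def draw_first_stage(manuscripts, seed=42):
    """First sampling stage: keep every eventually published manuscript
    and draw a simple random sample of equal size from the rest.
    (Hypothetical sketch; record layout and equal draw are assumptions.)"""
    rng = random.Random(seed)
    published = [m for m in manuscripts if m["published"]]
    others = [m for m in manuscripts if not m["published"]]
    return published + rng.sample(others, min(len(published), len(others)))
```

Applied to 2,337 records of which 393 were eventually published, this sketch keeps all 393 and samples 393 more, close to the 755-manuscript pool the study worked from.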


Content analysis of all reviews from the 755 manuscripts in this sample was not feasible (approximately 1,812 reviews), so we excluded from analysis manuscripts which were commentaries, rejoinders, or rejected without undergoing formal review. We further restricted our analysis to first-submission manuscripts. Revised and resubmitted manuscripts were not included because second reviews often were obtained from an original referee who referred to specific revisions called for in the initial review. From the manuscripts remaining after these exclusions we drew another stratified random sample.3 The basis for stratification was the number of submissions by substantive area, ranging across 64 different areas of specialization. Areas were grouped according to high, medium, or low submission.4 Our final sample may be described as a stratified sample of a data set that included the population of accepted manuscripts and a simple random sample of all other manuscripts.5 This sample included 323 manuscripts, averaging 2.4 reviews per manuscript (rejected n = 192, accepted n = 131).6 This sample size was required to insure statistical reliability while still reducing the volume of data to more manageable levels. ASR editors routinely asked each referee to complete a manuscript evaluation form which directed the referee to recommend one of five possible dispositions of the manuscript and included an area reserved for referee comments to the editor regarding the manuscript, with attached pages for comments to the author. A content analysis of comments to the editors and those to authors generated the information analyzed below. All illustrative quotations attributed to referees were verbatim items from this analysis. We began coding our data with categories used in previous work (e.g., McCartney 1979; Bonjean and Hullum 1978). As we proceeded, additional classifications and distinctions were introduced.
This resulted in twelve categories: GENERAL, TOPIC, LITERATURE REVIEW, THEORY, DESIGN, DATA, SAMPLE, MEASUREMENT, ANALYSIS, RESULTS, WRITING STYLE, and AD HOMINEM.7 Each category contained a list of subcategories which further specified the comment. For example, “atheoretical” and “theory not clear” were two subcategories for negative comments about theory. To refine the categories, several coders read and attempted to code reviews of randomly selected manuscripts. When we thought we had a workable code we conducted intercoder reliability tests. Disagreements among coders were examined and the code was adjusted to resolve them. The final three reliability tests yielded 76%, 77%, and 79% reliability.8 The value of each variable in the analysis reported below refers to the number of times a referee made a comment meeting our criteria for that category. We also calculated whether comments in each category were made by one, two, or three referees of the same manuscript. As indicated above, the referees’ manuscript evaluation form designated places for comments to the editor distinct from those to the author, and we analyzed separately comments to editor and author. Altogether, we coded four sets of comments: negative comments to the editor; positive comments to the editor; negative comments to the author; and positive comments to the author. We also report the frequencies for variables we call contradictions, calculated by comparing positive and negative statements made by the referees of a manuscript. For example, if referee “A” made a negative comment about the manuscript’s ANALYSIS and referee “B” made a positive comment about the same manuscript’s ANALYSIS, this was a contradiction. If referee “C” also made a positive comment about the manuscript’s ANALYSIS, it was a double contradiction. Here too, we analyzed comments to the editor and author separately.

RESULTS

No manuscript received a completely favorable review, although some reviews were less negative than others. There were just as many negative comments about accepted as about rejected manuscripts. The referees’ task is to be critical, and critical they were, whether to editors or authors. For each manuscript we summed all the negative comments addressed to the editor from all referees reviewing a manuscript.9 We also summed all the negative comments sent to authors, all the positive comments to editors, and all the positive comments to authors. These four quantities are summarized in Table 1. Negative comments far outnumbered positive ones. But there were some important differences between comments to the author and those to the editor. There were significantly more negative comments to the editor about rejected manuscripts than about those eventually published. While there were always more negative than positive comments, the ratio was 3:1 for a rejected manuscript. Accepted manuscripts were more likely to receive positive comments, but still the ratio of negative to positive comments was 1.5:1. Thus, referee comments were more frequently negative than positive for all manuscripts, but significantly more negative for rejected than accepted manuscripts. The pattern was in part different for comments addressed to the authors. The number of positive comments was consistently higher in reviews of accepted as compared with rejected manuscripts. Positive comments were the best indicators of a referee’s recommendation. When addressing the authors of eventually published manuscripts, referees were more likely to preface the inevitable criticisms with positive comments. The ratio of negative to positive comments to the author was approximately 4:1 for accepted manuscripts and 5:1 for rejected manuscripts. This made interpreting the referees’ comments more difficult for authors, since comments accompanying favorable recommendations were as critical as those with a recommendation to reject.
Table 1
Means and Standard Deviations for Positive and Negative Comments

                        COMMENTS TO EDITOR          COMMENTS TO AUTHORS
                        ACCEPTED      REJECTED      ACCEPTED     REJECTED
Negative Comments       3.4** (2.77)  4.5** (3.60)  15.3 (9.10)  15.5 (9.37)
Positive Comments       2.5** (2.29)  1.5** (1.86)  4.2* (3.32)  3.2* (3.27)

Notes: Standard deviations are in parentheses.
* t-test significant at the .05 level.
** t-test significant at the .01 level.

The editor saw comments intended for editors and authors, as well as the referees’


recommendations. The author did not see the referees’ correspondence with the editor or the referees’ recommendations to the editor. In addition, the referees were blunt in their comments to the editor. One referee wrote: To the editor: “This article is very muddled, it reads like an accident waiting to happen!” To the author: “This needs considerable editorial work.” Thus, the editor had more information and often a much clearer message than authors. But, this information could be a two-edged sword. It gave the editor an advantage in interpreting the referees’ comments, but it made the editor’s decision harder to justify when authors complained.

The Content of Positive Comments

To the Editor. Referee comments to the editor were often overall assessments of the paper’s merit. If positive comments were made, they were usually brief and general. Figure 1 shows their relative distribution by category. The most frequent positive comments made to the editors were general statements such as, “This is an excellent effort,” “The ASR should definitely publish this paper,” “This paper is clear and easily understood,” and “I found this to be the most interesting manuscript I have reviewed in a long time.” The most frequently used word of praise was “interesting” (cf. McCartney 1979).10 Fifty-three percent of all positive GENERAL comments were matched by a positive GENERAL comment from another referee.11 Comments about manuscript TOPIC were next in frequency. The word “interesting” also appeared here with some regularity. Positive comments about TOPIC were less frequent than GENERAL comments and less often matched (26%). Comments concerning STYLE and ANALYSIS were also common but contained fewer matches (17% and 13%). There were few positive comments in the remaining categories and no matches among those comments. Since there were fewer positive comments to the editor than negative comments, there were fewer opportunities for matches. Praise was rare and seldom specific.

Figure 1. Positive Comments to the Editor (Total Comments = 265)
[Pie chart not reproduced; legible slices include General, Topic 14.3%, Results 5.3%, Data 4.9%, Ad Hominem 1.5%, Review of Literature 1.5%, Design 1.1%.]

To the Author. As noted above, the big difference between reviews accompanying recommendations to accept versus those to reject was comments praising the manuscript. We also noted that praise was more frequent in comments to authors than to editors. Figure 2 shows this distribution. Most positive comments to authors were GENERAL. Statements coded in this category were often overall summations of the paper’s merit. Matches were prevalent among positive GENERAL comments to authors as well as editors (52.5%). The following example illustrates why matches were so prevalent in this category. Referee A: “On the whole this is a nice article.” Referee B: “Overall, I like this article.” Although each referee subsequently made quite different specific comments about the manuscript and different recommendations (one to accept, the other to revise and resubmit), they both liked the manuscript and found merit in the work. Positive comments to authors about their TOPIC were also common. These were less frequently matched (32%) by other referees, perhaps because some referees took the importance of the topic for granted. One referee wrote, “This is an important topic for studies of subjective class identification.” A more terse referee might deem such statements superfluous.

Figure 2. Positive Comments to the Author (Total Comments = 488)
[Pie chart not reproduced; legible slices include General, Review of Literature 5.1%, Data 5.1%, Design 1.6%, Measurement 1.6%, Sample .2%, Ad Hominem .3%.]


Positive comments to authors about THEORY were infrequent but often matched (43.4%). Although there were frequent comments about both ANALYSIS and STYLE, there were fewer positive matches among them. Positive GENERAL comments were correlated with comments about TOPIC (.426) and STYLE (.431),12 illustrating differences noted earlier in the uses of criticism and praise. Positive comments in all three of these categories were broad and vague. An exemplary comment from the TOPIC category read, “This is an interesting topic that deserves more attention in the literature.” Similarly, the STYLE category contained comments such as, “The paper is well written.” Our GENERAL category, by definition, contained encompassing statements. Thus, the correlations between these variables indicate the tendency for referees to be general in praise, in contrast to their specificity in criticisms, especially regarding methodological matters.

The Content of Negative Comments

To the Editor. The distribution of negative comments to the editor is summarized in Figure 3. Although GENERAL comments were most frequent (n = 170), they accounted for a smaller percentage of all negative comments. They included statements such as, “This is not sociology,” “. . . as it is the paper is severely flawed and not suitable for ASR,” and even, “Yuk!” This category also covered such statements as, “I wasn’t convinced by this research” and “I couldn’t follow the author’s argument.” Fifty-four percent of these comments were matched by similar negative comments from other referees. Comments about THEORY were also frequent (11.7%), including, “This (paper) is completely atheoretical” and “The theory is poorly developed.” There were 64 separate negative comments in this category; 37% were matched by a similar negative comment from another referee.

Figure 3. Negative Comments to the Editor (Total Comments = 551)


The only other category containing enough negative comments to mention was ANALYSIS; 35% of negative comments were matched by another referee. Due to the limited space on the form for referees’ comments to the editor, those remarks tended to be brief and to convey general impressions rather than detailed criticisms. Thus, there were few comments or matches in the remaining categories. GENERAL negative comments were most frequent and yielded the most matches. Specific negative comments were likely to be about THEORY and ANALYSIS. Many comments to the editor were made in other categories by a single referee, but matches were rare. To the Author. Whereas referees were general and blunt in their comments to the editor, they were specific but diplomatic to authors. Figure 4 summarizes those comments by category. Again, GENERAL comments were most frequent, such as, “nothing new,” “boring,” and “so what?” Sixty-seven percent of these negative comments were matched by other referees’ comments. Negative comments about THEORY were also quite frequent, 61% of which were matched by another referee. Of the eight subcategories of comments on THEORY, three appeared frequently. The modal complaint was that the manuscript was atheoretical; the “nothing new” phrase was often specifically applied to theoretical development; and referees often complained that theories were misinterpreted or were not clear. Manuscript treatments of data analysis and results frequently drew negative referee comments that were pedantic in tone and substance, citing statistical texts, technical articles, and even program manuals. Thus, flaws in analysis appeared to be easy targets for scholars recently trained in graduate programs with a quantitative emphasis. Not surprisingly, 62% of negative comments about ANALYSIS and approximately 60% of those about RESULTS were matched by other referees’ comments. Even such a high percentage of matches does not adequately convey the consensus observed among the referees’ comments.
Figure 4. Negative Comments to the Author
[Pie chart not reproduced; legible slices include General 16.2%, Analysis 12.3%, Results 11.7%, Review of Literature 11.1%, Topic 6.4%, Data 4.8%, Ad Hominem .5%.]

As noted earlier, we coded comments into twelve categories with numerous, very specific subcategories, so that the data could be reduced for analyses without losing their richness. What was extraordinary about the ANALYSIS and RESULTS categories was the incidence of almost identical statements by independent referees. For example: Referee A: “You ought to make clear what kind of factor analysis was used, assumptions involved (why wasn’t loglinear analysis used?).” Referee B: “Since you did use the factor analysis you should state the limitations and assumptions and consider whether other measures might be called for.”

In addition to analytic concerns, referees gave ample attention to sociological prose (cf. Becker 1986). The STYLE category contained 56% matches. There was equal concern about adequately reviewing appropriate literature; REVIEW OF LITERATURE also contained 56% matches. Comments about MEASUREMENT, TOPIC, DATA, and DESIGN were less frequent and were matched less than 50% of the time. Although comments about sampling were rare, referees agreed when the SAMPLE was not up to standards (92.5% of comments on SAMPLE were matched). Although fewer negative comments were addressed to the editor than to the author, there were some similarities. The most frequent negative comments to both editor and author were about GENERAL features of the manuscript. THEORY and ANALYSIS were also frequent targets. In addition, RESULTS, STYLE, and REVIEW OF LITERATURE frequently contained negative comments. While comments were overwhelmingly critical, referees were quite civil. AD HOMINEM remarks were rare. We performed t-tests to determine if there were any differences in the kinds of negative comments referees made to authors about rejected versus published manuscripts. There were few differences in the distributions of these two groups. In all categories except STYLE, eventually published manuscripts received as many criticisms as did rejected manuscripts. Comments on STYLE were even more frequent for eventually published than for rejected manuscripts (significant F and t values), perhaps reflecting referees’ efforts to improve potentially acceptable manuscripts. There were also several noteworthy correlations between categories of negative comments. The strongest correlation was between ANALYSIS and RESULTS (.540). As one would expect, criticisms of results were often accompanied by criticisms about how the analysis was conducted. There were substantial correlations between MEASUREMENT and ANALYSIS (.544) and MEASUREMENT and RESULTS (.525).
Thus, there may be consensus about some aspects of the discipline of sociology; two or more referees were more likely to agree about methodological matters (e.g., ANALYSIS, RESULTS, MEASUREMENT) than about other matters. However, such agreements did not necessarily correlate with their recommendations for the disposition of the manuscript.13

Contradictions

Sociology has been characterized as a low consensus discipline because of high manuscript rejection rates at journals and the high percentage of split decisions in peer review (Zuckerman and Merton 1971). Those situations could arise because referees explicitly disagree about the quality of a manuscript or because they simply attend to different specific matters within it. We calculated the number of contradictory comments among referees. By contradiction we mean that one referee made a negative comment in one of the twelve categories while another referee made a positive comment in that same category.14 For example: Referee A: “The theoretical discussions are somewhat unfocused.” Referee B: “The theory section is clear and concise.” An author reading these two reviews might be incredulous, if not angry, about the contradictory “expert” opinions. Fortunately, such contradictory comments were rare, to editor or author. To the Editor. There were only 10 contradictions in statements to the editor out of approximately 716 coded statements from the 230 reviews which included comments to the editor. This low incidence may have been due to the brevity of the comments in the limited space available to the referees and to the more generic tone of those comments.15 But the lack of contradictions is important nonetheless. It shows that even though referees disagreed about manuscript disposition, their comments to the editor more often complemented than contradicted one another. To the Author. Contradictory referee comments to an author are especially important since authors often complain that they receive mixed messages (Simon et al. 1986). When an author receives a negative decision about a manuscript and finds discrepancies among the comments of referees, the author often interprets these as evidence of a biased or incompetent review. We found, however, that there were surprisingly few contradictions in

Figure 5. Number of Contradictions to the Author (Total Comments = 37)
[Pie chart not reproduced; legible slices include Theory, Analysis, Style, Results, Measurement 8.1%, Review 5.4%, Sample 5.4%, Data 2.7%.]


the substantive content of comments sent to authors. About 11% of our sample manuscripts received contradictory referee comments. Figure 5 shows the distribution of contradictory comments by category for those thirty-seven manuscripts. Comments to the author were more frequent, more specific, and much longer than those addressed to the editor (positive = 3.71 and negative = 15.34). While the absolute number of comments was the greatest in the GENERAL category (n = 416), the rate of contradiction was relatively low (1.5%).16 Twenty-four percent of the contradictions were about THEORY. While there was a smaller number of actual contradictions in the STYLE category, the ratio of contradictions to comments was higher (THEORY 1:25, STYLE 1:20). That is, even though there were fewer comments about STYLE, these were more frequently contradicted. RESULTS were also frequently contradicted. Although disagreement in the areas of THEORY, STYLE, and RESULTS was important, the paucity of contradictions indicated more similarity than dissent among referee comments. Whether or not two or more referees made the same recommendation, their comments about the manuscript were more similar than not. During the period under examination (1977-1981), 40% of the manuscripts reviewed by the ASR received split reviews; that is, referees made divergent recommendations. Why, then, were there not more contradictions in referees’ comments on the manuscripts? What authors perceive as conflicting referee evaluations may not correspond to direct contradictions. Different referees attend to and discuss different aspects of manuscripts. Indeed, referees can be deliberately selected to do so. Our analyses of these data indicated much more similarity than disagreement among referee comments. Perhaps the issue of consensus in sociology needs to be re-examined. There may be more consensus than some have assumed (e.g., Lodahl and Gordon 1972; Gove 1978; Cole et al. 1988).
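The contradiction tally described in the preceding section can be sketched as follows. This is a hypothetical reconstruction under one plausible reading of the definition: each opposing positive-negative pair of referee judgments within a category counts once, so a third agreeing referee yields the "double contradiction" described earlier. The data structures are illustrative, not the study's actual coding instrument.

```python
def count_contradictions(reviews, categories):
    """Count category-level contradictions among the referees of one
    manuscript. Each review maps a category name to the set of comment
    valences ("pos", "neg") that referee expressed in the category."""
    counts = {}
    for cat in categories:
        pos = sum(1 for r in reviews if "pos" in r.get(cat, set()))
        neg = sum(1 for r in reviews if "neg" in r.get(cat, set()))
        if pos and neg:
            counts[cat] = pos * neg  # every positive-negative pair
    return counts

# Referee A criticizes ANALYSIS while referees B and C praise it:
# a contradiction plus a double contradiction, i.e., a count of 2.
reviews = [{"ANALYSIS": {"neg"}},
           {"ANALYSIS": {"pos"}},
           {"ANALYSIS": {"pos"}}]
```

Under this reading, categories in which all referees agree (or which only one referee mentions) contribute nothing, which matches the observation above that most comments were neither contradicted nor matched.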
Moreover, consensus did not result from referees’ reluctance to criticize. We found no evidence that referees were “soft.” Indeed, criticisms (often harsh) far exceeded praise. While there were few direct contradictions and numerous matches (especially among criticisms), most comments were neither contradicted nor matched. Instead, the referees simply observed and commented on different aspects of the manuscript. Our results also suggest much more agreement in some areas than others. For example, referees were quick to recognize and agree on procedural errors, but they did not always agree on theoretical discourse or substantive interpretation of results.

DISCUSSION

No manuscript received an unequivocally favorable review. Negative comments far outnumbered positive ones in both the comments to the editor and those to the author. Referees were equally critical of manuscripts they rejected and those they recommended for publication, but they were twice as likely to make positive comments about the latter. Although praise was important, negative comments were more specific and more frequent than positive ones. Most negative comments (excepting the GENERAL category) were about THEORY and statistical ANALYSIS. Referees more frequently criticized the author’s STYLE in manuscripts that were eventually published than in those rejected. Since the STYLE category

Mixed Messages

651

included comments dealing with editorial changes, it may be that acceptable manuscripts received these criticisms as refinements. For the most part, referees tried to be constructive. While we did not evaluate the usefulness or “quality” or comments, it is important to note that revision is an integral part of the manuscript review process. In earlier works (Bakanic et al. 1987, 1988) we argued that in the social sciences (where single authorship is the mode) the review process serves much the same function as research collaboration in the physical and biological sciences. That is, in lieu of research teams comprising different specialties,we use peer review as a source of expert feedback. Thus, regardless of wounded egos, negative comments can be constructive. We arrived at no firm conclusions about the objectivity of reviews based on their evaluative content. It may be that referees read through a manuscript, come to a decision, and then go through it again constructing justifications. The evaluations accompanying negative recommendationsreduce to, “I don’t like it. This is what’s wrong with it.” Those accompanying recommendations to publish reduce to, “I like it but this is what’s wrong with it.” Viewed this way, criticism need not be negative or threatening. Reviews were usually aimed at revision rather than abandonment.While referees did not always attend to the sameaspects of the manuscript,there were few differences in the contentof their reviews and fewer outright contradictions, regardless of their recommendations. ACKNOWLEDGMENTS

We wish to thank William Snizek, Charles Tucker, and several anonymous referees for comments on earlier drafts. We thank Debra Kelley for assistance with conducting intercoder reliability tests and Robert Partridge for help with the graphics.

APPENDIX A

Correlation Matrix: Positive Comments to the Author. Variables: TOPIC, REVIEW, THEORY, DESIGN, DATA, SAMPLE, MEASURE, ANALYSIS, RESULTS, STYLE, AD HOM, GENERAL, NO REV.

THE SOCIOLOGICAL QUARTERLY Vol. 30/No. 4/1989


APPENDIX B

Correlation Matrix: Negative Comments to the Author. Variables: TOPIC, REVIEW, THEORY, DESIGN, DATA, SAMPLE, MEASURE, ANALYSIS, RESULTS, STYLE, AD HOM, GENERAL, NO REV.

NOTES

1. A mean of 2.4 reviews per manuscript in this data set yielded approximately 5,708 reviews.
2. Included in the accepted manuscripts were all previous submissions of the manuscript prior to the decision to accept. This often involved multiple revised and resubmitted versions of the manuscript.
3. All analyses were weighted to compensate for the disproportionate number of accepted manuscripts included in the data.
4. The variable indicating substantive area was generated from the list of specialties recognized by the ASA. Ten more areas were added to this list during coding. Coding instructions and reliability test results are available (see Bakanic 1986). We used the frequency of submission among areas as a basis for stratification out of concern for representing a cross-section of manuscripts received.
5. The manuscripts selected by stratified random sampling for the content analysis included many that were missing some information. We did not exclude them from the sample, because we did not want to bias the random selection process. We selected 323 manuscripts, 142 of which had missing information. There were primarily two reasons for missing information. First, referees omitted comments to either author or editor, as was their option. It was also possible for the comments to be totally critical or totally positive. When there were no comments, all content variables were assigned a zero. Second, through secretarial error some of the correspondence accompanying manuscripts submitted in 1978 and the first half of 1979 was destroyed. To compensate, control variables were included in all correlational analyses. Files were cataloged alphabetically by first authors' surname. Year submitted and alphabetizing were the only two systematic influences among the affected files.
6. Manuscripts that received revise and resubmit decisions were classified as rejected for this analysis (but cf. Bakanic et al. 1987).
7.
Each of the twelve categories included subcategories of specific types of comments classified within the category. For example, a subcategory of GENERAL was GENERAL-Boring. If the comment included the word "boring" or "dull" in a criticism, it was assigned to the GENERAL category and the Boring subcategory.


8. Each reliability test was conducted on a new random sample from our data set. The manuscripts selected for the analyses reported here were included in the pool of manuscripts from which those test cases were drawn. The same coders were used throughout pretesting and coding of the manuscripts included in these analyses. For a complete description of coding instructions and reliability test results see Bakanic 1986.
9. To control for the number of referees, the t-test comparing the total numbers of negative and positive comments was repeated using the average number. The results were similar. The major difference was that the F value for the average number of negative comments to the editor was not statistically significant, while it was significant for the total number. The t value for this variable remained statistically significant.
10. Because the adjective "interesting" was used to refer to a variety of qualities, we used the noun it referred to in classifying the comment rather than the adjective itself. Thus, when it referred to the overall paper quality it was classified as GENERAL.
11. A match occurred when two referees reviewing the same manuscript made comments that were substantively the same (i.e., coded in the same category).
12. The correlation matrices are included in the appendices. Due to a clerical error some of the editorial correspondence, referee comments, and manuscripts were discarded. We included control variables when possible to reduce the effects of missing data.
13. Although the referee form indicates five possible recommendations, the editorial board did not consider recommendations to refer or reject a split decision. Neither were accept and conditional accept split decisions. Thus, while the actual inter-rater product moment correlation among the five categories was .15 for first submission articles, the condition identified by the editorial staff as a split decision occurred with about 40% of the manuscripts.
14.
If a third referee also made a positive comment in that same category we classified it as a double contradiction.
15. Comments to the editor were optional; not all manuscripts received any.
16. Because the GENERAL category included such a broad variety of statements, it was possible that positive and negative statements classified in the category were not contradictory. Therefore, we used the subcategories (e.g., boring, not sociology) to establish whether the statements were actually referring to the same quality. However, the categories with the most comments did not always contain the most contradictions. The number of contradictions was not solely a function of the frequency of positive and negative comments in each category. Contradictions were calculated by comparing the comments made by two or more referees of the same manuscript, not by comparing variables across different manuscripts.
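The counting rules described in notes 11 and 16 can be sketched in code. This is only an illustrative reconstruction, not the authors' original procedure: the data layout (one record per coded comment, carrying a manuscript identifier, referee identifier, comment category, and valence) and the sample records are assumptions for the example.

```python
from itertools import combinations

# One record per coded comment: (manuscript_id, referee_id, category, valence),
# where valence is "pos" or "neg". These records are hypothetical.
comments = [
    ("ms1", "ref1", "THEORY", "neg"),
    ("ms1", "ref2", "THEORY", "pos"),  # contradicts ref1 on THEORY
    ("ms1", "ref1", "STYLE", "neg"),
    ("ms1", "ref2", "STYLE", "neg"),   # matches ref1 on STYLE
    ("ms2", "ref3", "THEORY", "neg"),  # different manuscript: never compared
]

def count_contradictions_and_matches(comments):
    """Compare referees within the same manuscript only (note 16).

    A match is two different referees coding the same category with the
    same valence (note 11); a contradiction is the same category coded
    with opposite valence.
    """
    contradictions = matches = 0
    # Group the comment records by manuscript.
    by_ms = {}
    for ms, ref, cat, val in comments:
        by_ms.setdefault(ms, []).append((ref, cat, val))
    # Compare every pair of comments on the same manuscript.
    for recs in by_ms.values():
        for (r1, c1, v1), (r2, c2, v2) in combinations(recs, 2):
            if r1 == r2 or c1 != c2:
                continue  # only different referees, same category
            if v1 == v2:
                matches += 1
            else:
                contradictions += 1
    return contradictions, matches

print(count_contradictions_and_matches(comments))  # -> (1, 1)
```

Note 14's double contradiction would be a straightforward extension: flag a category when a third referee's positive comment opposes an already-contradicted negative one.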

REFERENCES

Bakanic, Von. 1986. "Tracing Social Science: The Manuscript Review Process." Unpublished doctoral dissertation, University of Illinois.
Bakanic, Von, Clark McPhail, and Rita Simon. 1987. "The Manuscript Review and Decision Making Process." American Sociological Review 52: 631-642.
Bakanic, Von, Clark McPhail, and Rita Simon. 1988. "Try, Try Again: The Role of Revision in Peer Review." Paper presented at the Southern Sociological Society meetings.
Becker, Howard S. 1986. Writing for Social Scientists. Chicago: University of Chicago Press.
Bonjean, Charles, and Jan Hullum. 1978. "Reasons for Journal Rejection: An Analysis of 600 Manuscripts." PS 11: 480-483.
Chase, Janet. 1970. "Normative Criteria for Scientific Publication." American Sociologist (September): 262-265.
Cicchelli, D.V. 1980. "Reliability of Reviews for the American Psychologist: A Biostatistical Assessment of the Data." American Psychologist 35: 300-303.
Cole, J.R., and Stephen Cole. 1979. "Which Researcher Will Get the Grant?" Nature 279: 575-576.
Cole, Stephen. 1983. "The Hierarchy of the Sciences?" American Journal of Sociology 89: 111-139.
Cole, Stephen, Gary Simon, and Jonathan Cole. 1988. "Do Journal Rejection Rates Index Consensus?" American Sociological Review 53: 152-156.
Freese, Lee. 1979. "On Changing Some Role Relationships in the Editorial Review Process." American Sociologist 14: 231-238.
Gove, Walter. 1978. "Sociology: The Issue of Publication." Paper presented at the Southern Sociological Society meetings, New Orleans.
Hargens, Lowell. 1988. "Scholarly Consensus and Journal Rejection Rates." American Sociological Review 53: 139-151.
Hedges, Larry. 1987. "How Hard Is Hard Science?" American Psychologist 42: 443-455.
Kochen, Manfred, and Barbara Perkel. 1978. "Improving Referee Selection and Manuscript Evaluations." Proceedings of the First International Conference of Scientific Editors, Jerusalem, April 24-29, 1977. Dordrecht: Reidel.
Kruskal, W. 1981. "Statistics in Society: Problems Unresolved and Unformulated." Journal of the American Statistical Association 76: 505-515.
Levinsohn, Florence, and Mary Schleiter. 1978. "Publication Decisions: Universalistic or Particularistic." Unpublished manuscript, University of Chicago.
Lindsey, Duncan, and T. Lindsey. 1978. "The Outlook of Journal Editors and Referees on the Normative Criteria of Scientific Craftsmanship." Quality and Quantity: European-American Journal of Methodology 12: 45-62.
Lodahl, Janice, and Gerald Gordon. 1972. "The Structure of Scientific Fields and the Functioning of University Graduate Departments." American Sociological Review 37: 57-72.
McCartney, James. 1979. "Verification and Imagination: Qualities Reviewers Seek in Sociological Papers." Paper presented at the American Sociological Association meetings, Boston.
McHugh, Peter. 1974. On the Beginning of Social Inquiry. London: Routledge and Kegan Paul.
Mullins, Nicholas C. 1973. Science: Some Sociological Perspectives. New York: Bobbs-Merrill.
Peters, Douglas, and Stephen Ceci. 1982. "Peer Review Practices of Psychological Journals: The Fate of Published Articles Submitted Again." The Behavioral and Brain Sciences 5: 187-255.
Simon, Rita J., Von Bakanic, and Clark McPhail. 1986. "Who Complains to Journal Editors and What Happens?" Sociological Inquiry 56: 259-271.
Smigel, Erwin, and H.L. Ross. 1970. "Factors in the Editorial Decision." American Sociologist 5: 19-21.
Snizek, William, E.R. Fuhrman, and M.R. Wood. 1981. "The Effect of Theory Group Association on the Evaluative Content of Book Reviews in Sociology." American Sociologist 16: 185-195.
Tinsley, H.E., and D.J. Weiss. 1975. "Inter-rater Reliability and Agreement of Subjective Judgments." Journal of Counseling Psychology 22: 358-376.
Van Blaaderen, Andreas. 1969. "Of Sociology and Science." American Sociologist 4: 147-149.
Whitehurst, Grover. 1984. "Inter-rater Agreement for Journal Manuscript Reviews." American Psychologist 39: 22-28.
Yoels, William. 1973. "On Publishing or Perishing: Fact or Fiction." American Sociologist 8: 128-130.
Yoels, William. 1974. "The Structure of Scientific Fields and the Allocation of Editorships on Scientific Journals: Some Observations on the Politics of Knowledge." Sociological Quarterly 15: 264-276.
Zuckerman, Harriet, and Robert Merton. 1971. "Patterns of Evaluation in Science." Minerva 9: 66-100.