Organization Science
Vol. 16, No. 2, March–April 2005, pp. 180–200
ISSN 1047-7039, EISSN 1526-5455, DOI 10.1287/orsc.1040.0107
© 2005 INFORMS

How Much Better Are the Most-Prestigious Journals?


The Statistics of Academic Publication
William H. Starbuck
Department of Management, Stern School of Business, New York University, 40 West Fourth Street, Tisch 7-22,
New York, New York 10012, wstarbuc@stern.nyu.edu

Articles in high-prestige journals receive more citations and more applause than articles in less-prestigious journals, but how much more do these articles contribute to knowledge?

This article uses a statistical theory of review processes to draw inferences about differences in value between articles in more-prestigious versus less-prestigious journals. This analysis indicates that there is much overlap in articles in different prestige strata. Indeed, theory implies that about half of the articles published are not among the best ones submitted to those journals, and some of the manuscripts that belong in the highest-value 20% have the misfortune to elicit rejections from as many as five journals.

Some social science departments and business schools strongly emphasize publication in prestigious journals. Although one can draw inferences about an author's average manuscript from the percentage in top-tier journals, the confidence limits for such inferences are wide. A focus on prestigious journals may benefit the most prestigious departments or schools but add randomness to the decisions of departments or schools that are not at the very top. Such a focus may also impede the development of knowledge when mediocre research receives the endorsement of high visibility.
Key words: citations; journals; knowledge; peer review; personnel evaluation; research; reviewing

Introduction

A few years ago, some deans of business schools announced new criteria for tenure and promotion that place extreme emphasis on publications in so-called A journals. The deans' statements implied that articles in A journals are essential and those in B or C journals are insignificant. To gain tenure or promotion to full professor, they said, a professor must have published at least N publications in A journals.

Even though many social science departments and business schools use such an approach to faculty evaluation, its apparent oversimplification and narrow focus is disturbing. Journals with higher prestige usually publish more high-value articles, but lower-prestige journals also publish excellent articles and high-prestige journals publish pedestrian articles. One can see substantial value overlap among journals with very different prestige. The most influential writings have included books and chapters in books, and distinguished social scientists have told me they refuse to submit their manuscripts to journals. When Gans and Shepherd (1994) asked 140 leading economists what they thought about review processes, 60% of them took time to reply, some wrote several blistering pages (p. 165), and many told stories about very influential articles that journals had rejected.

This situation has contradictory aspects for me. Personal experience tells me that it makes no sense to judge articles solely on the journals in which they appear. I have served on editorial boards of 15 journals in four fields, including journals with great prestige and journals with none; I have reviewed manuscripts for many other journals; and I have read debates about peer review. However, emphasis on A journals is widespread, and widespread practices almost always benefit someone. To move beyond philosophy and ad hoc opinion, I decided to review evidence about peer review and to analyze statistically the articles in different prestige strata.

One needs to analyze these issues in terms of fundamental statistics rather than only in terms of data because reputations and citations are strongly influenced by social construction. Journals' citation rates and reputations reflect other factors as well as the average value
of articles. Citation rates of journals correlate with such factors as fields' citation practices, journals' circulations, journals' languages, and the nations where journals are published. Journals' reputations have been rather inertial despite shifting editorial policies. The statistics help to distinguish the processes themselves from their social context.

This article draws on data about review processes to frame a statistical analysis of differences between the top 20%, the middle 40%, and the bottom 40% of journals. Although the theory generalizes to other fields, this article focuses on economics, psychology, sociology, and their business-school cousin, management. The next section reviews trends in citations to journals in these four fields. This review suggests that increasing emphasis on top-tier journals results from administrative choice rather than from adaptation to widespread trends in social sciences. Final sections of this article discuss how emphasis on top-tier publications affects personnel evaluations, the standings of departments and schools, and the development of knowledge. Because examination of these issues needs a theoretical framework, the middle sections of this article develop a statistical theory that describes how editorial acceptances and rejections vary with reviewers' abilities to assess manuscripts.

Some ideas in this article, especially those in the middle section, will be familiar to readers of the many articles about peer review. What this article contributes is a more integrated and systematic analysis rooted in a statistical theory. This theory helps to relate properties of review processes to citation practices, and it allows inferences about the usefulness for personnel evaluations of counts of publications in prestigious journals.

For Whom Have Very Prestigious Journals Become More Important?

One of my initial conjectures was that the deans' policy change might be part of a widespread trend. I read an article in which Ellison (2002, p. 978) remarked that articles in five top-tier economics journals had been attracting more and more citations, whereas articles in second- and third-tier economics journals had been attracting fewer and fewer citations. Ellison's observation raised the possibility that the new policies at some business schools might reflect trends in social sciences generally. The solid lines in Figure 1 graph impact factors of 21 economics journals Ellison studied. It uses his classifications of these journals as top tier, second tier, or third tier, but uses my own data about impact factors from 1981–2002.

[Figure 1. Average Impact Factors of Articles in Economics Journals. Line chart of average impact factor (vertical axis, 0 to 3) by year (horizontal axis, 1981 to 2001). Series: 5 top-tier, 11 second-tier, and 5 third-tier journals; top quintile (20 journals), second-third quintiles (41 journals), and fourth-fifth quintiles (41 journals).]

Impact factors are published by the Institute for Scientific Information (ISI), which surveys citations in thousands of academic journals (Amin and Mabe 2000). An impact factor is the average number of citations received in one year by articles published during two previous years:

$$\frac{\text{Citations in year } t \text{ to articles published in } t-1 \text{ or } t-2}{\text{Articles published in years } t-1 \text{ or } t-2}$$

ISI has said that two years are long enough to obtain high percentages of citations for nearly all articles, so impact factors accurately reflect visibility of journals even though they may not accurately reflect visibility of specific articles. Because this article aggregates impact data across many journals, it places no reliance on the representativeness of impact factors for single articles or even single journals.

Ellison's observation suggested two possibilities. First, trends in economics might be widespread across social sciences. Second, deans who are economists might have adopted a policy that reflected conditions peculiar to their own discipline.

To find out whether many business schools were increasing their emphasis on top-tier publications, I queried colleagues at 16 top North American business schools. Three respondents pointed out that because rankings by Business Week and Financial Times now take account of publications in journals that these magazines deem influential, such publications have become more salient. Colleagues at six schools replied that their schools have been placing strong emphasis on publications in prestigious journals for many years; these 6 schools received an average rank of 6.6 from Business Week in 2002, much higher than the average rank of 11.8 for the other 10 schools. Another six colleagues stated that their schools use broader criteria than publications in prestigious journals, and that they deem their own faculties to be more capable of evaluating research than journal reviewers are; Business Week assigned these six
schools heterogeneous ranks in 2002. Colleagues at only four schools said they had noticed an increased emphasis on publishing in prestigious journals during recent years; these 4 schools received an average rank of 15.8 from Business Week in 2002, much lower than the average rank of 7.6 for the other 12 schools.

To find out whether economics is distinctive among social sciences, I examined data about impact factors of journals related to business. These data include only journals that received at least six citations during 2001 from some combination of 150 business journals, and I have not retained data about journals that have ceased publication. Both biases imply an underrepresentation of journals having very low citation rates. I have data about 102 economics journals, 95 management journals (including industrial relations), 90 psychology journals, and 47 sociology journals.

My first discovery was that Ellison's finding is specific to economics, and especially to the 21 journals he examined and how he classified them. If one classifies journals into strata based on their citation rates during recent years, then top-strata journals do appear to have gained over lower-strata journals. Gains appear to have occurred in all fields. However, these gains are illusions created by retrospective sense making: Those that did better appear to have done better in retrospect. When I reclassified journals each year based on citation rates in that year, different trends appeared. The dashed lines in Figure 1 show the impact factors for 102 economics journals when these are divided into three broad categories: the most-cited quintile, the second and third quintiles, and the two least-cited quintiles. The average impact factor of the most-cited quintile of economics journals actually decreased about 10% over the two decades, and the increase Ellison observed was specific to the five journals on which he focused.

Figure 2 graphs two ratios for each of four fields:

$$\frac{\text{Three-year average impact factor for the most-cited quintile of journals}}{\text{Three-year average impact factor for the two least-cited quintiles of journals}}$$

and

$$\frac{\text{Three-year average impact factor for the second and third quintiles of journals}}{\text{Three-year average impact factor for the two least-cited quintiles of journals}}.$$

For example, the heavy solid line in Figure 2 is the ratio of citations to the most-cited quintile versus citations to the two least-cited quintiles for management journals, and the light solid line is the ratio of citations to the second and third quintiles versus citations to the two least-cited quintiles for management journals. I divide journals into three large categories partly because rankings of specific journals are unstable over time and partly because I want to discuss the overall stratification of academic journals and do not want to focus on a small number of highly prestigious journals.

[Figure 2. Ratios of Impact Factors: Ratio of Top Quintile to Fourth-Fifth Quintiles and Ratio of Second-Third Quintiles to Fourth-Fifth Quintiles. Line chart of the ratio of average impact factors (vertical axis, 0 to 14) by year (horizontal axis, 1981 to 2001). Series and means: Top management, 6.2; Top economics, 5.6; Top psychology, 8.6; Top sociology, 4.2; Second-third management, 2.7; Second-third economics, 2.1; Second-third psychology, 3.0; Second-third sociology, 2.1.]

Figure 2 shows that, in economics, there has been an increase in citation rates of higher-status journals relative to lower-status journals. The ratio of citations to the most cited versus the least cited rose 27%, and the ratio of citations to the second and third quintiles versus the least-cited rose 33% over the two decades. However, these changes are entirely due to decreasing citations of the least-cited journals. As Figure 2 shows, citation rates decreased in all three categories over the two decades, but the percentage decrease was larger for the least-cited journals.

The most striking trends in Figure 2 are decreases in the ratios for management and psychology journals, especially during the early 1980s. The ratio of top-to-bottom declined 12% in sociology, 35% in management, and 40% in psychology. These decreases are largely due to increases in citations of the least-cited journals. The average impact factor of the least-cited journals increased 4% in sociology, 36% in psychology, and 81% in management. Thus, stratification decreased in all three fields, but it decreased less in sociology, which was initially more egalitarian.

Figure 2 does not support the notion that social science researchers have been citing top-tier journals more often and bottom-tier journals less often. On the contrary, with the exception of economics, citation data mainly indicate that researchers have been citing bottom-tier journals more often. These data do suggest, however, that economists might have perceived an increased
emphasis on their top-tier journals, whereas psychologists, sociologists, and management researchers might have perceived a decreased emphasis on their top-tier journals.

Thus, an increasing emphasis on publishing in top-tier journals seems to be a consequence of administrative choice rather than adaptation to trends in social sciences in general. Some business schools that are very highly rated by business magazines have been emphasizing top-tier publications for a long time. Because those business schools that have only recently begun to place more emphasis on publishing in top-tier journals are not among the most highly rated schools, they may be pursuing greater legitimacy by imitating their more-prestigious competitors (DiMaggio and Powell 1983).

One issue this situation poses is whether an emphasis on top-tier publications may produce the changes in faculty that departments and schools seek. A second issue is whether overemphasis on top-tier publications may degrade research and slow development of knowledge. I can address these issues more intelligently after developing a theory about the review process and then reexamining data about impact factors. Although a theory has the disadvantage of requiring explicit assumptions about the review process, it clarifies issues regarding the dependability of editorial processes.

A Statistical Theory

Rousseeuw (1991, p. 41) commented, "It is commonly known and a constant source of frustration that even well-known refereed journals contain a large fraction of bad articles which are boring, repetitive, incorrect, redundant, and harmful to science in general. What is perhaps even worse, the same journals also stubbornly reject some brilliant and insightful articles (i.e., your own) for no good reason." Rousseeuw explained that because editorial decision makers are imperfect, errors occur; because bad papers are submitted in such vast quantities ... the small fraction of them that gets accepted may outnumber the good ones (1991, p. 43). Unfortunately, most academics may not appreciate the degree to which even well-known refereed journals contain a large fraction of bad articles and reject some brilliant and insightful articles. As Rousseeuw implied, we find it easier to believe that errors occurred when journals rejected our own manuscripts than when the same journals rejected other people's manuscripts (an example of the self-serving bias; Heider 1958).

Lindsey (1978) and Stinchcombe and Ofshe (1969) presented arguments similar to Rousseeuw's, and they and Rousseeuw supported their observations with statistical calculations that made hypothetical assumptions. The analysis below improves on their analyses by allowing for more complex review processes, by anchoring key assumptions in data about real reviewers' behavior, and by showing how inferences do or do not change with different assumptions. Although the analysis is algebraic at root, this article states no formulas because its formulas appear in every textbook on mathematical statistics. Instead of formulas, this article presents graphs showing the sensitivity of inferences to different assumptions. Some of these graphs are surprising in that they indicate review processes would act similarly for a wide range of assumptions about these processes or about reviewers' abilities.

What Is the Value of a Manuscript? This article talks about three kinds of values. One of these, discussed near the end of this article, is citation value: the value that researchers see when they choose published works to cite. Citation value is partly a self-fulfilling prophecy because articles in higher-status journals attract more attention and more citations (Gottfredson 1978, Gottfredson et al. 1977, Lindsey 1978, Palacios-Huerta and Volij 2004). This creates circular causality: Articles receive more citations because they appear in higher-prestige journals, and journals gain prestige because they publish articles that receive more citations. However, noncircular factors are also at work. Citations correlate strongly with each journal's circulation, publication by a professional association, and nationality and language. Fields have different norms about citation practices. Citations of individual articles correlate with articles' lengths, numbers of references they cite, numbers of coauthors, and nonuse of mathematics. Theoretical discussions attract more citations than do empirical studies, and general-interest articles attract more citations than articles on specialized topics (Stigler 1994). Undoubtedly, a published article ends up with a retrospective value that reflects what journal published it, who cited it, and so forth.

A second value is the manuscript's shared value: its value in the personal value systems of reviewers. Shared value may differ from citation value because reviewers differ from those who cite published articles. Bowen et al. (1972) suggested that shared value might sometimes increase agreement between reviewers for undesirable reasons, as when reviewers favor manuscripts written by members of their journal's editorial board.

The third value is the manuscript's inherent true value: its ability to serve as a basis for widespread social consensus (Calhoun and Starbuck 2003). True value is the hypothetical quality on which nearly all reviewers ought to be able to agree. Because the main purpose of social science is to produce beliefs with which nearly everyone can agree, even two reviewers who disagree strongly with each other about many issues should be trying to agree with each other about manuscripts' true values. Indeed, one advantage of editors choosing reviewers who might disagree would be that intersections of their judgments would approach
true values (Bailar 1991, Cicchetti 2003, Hargens and Herting 1990, Kiesler 1991). However, editors and referees may agree about properties that constitute true value but be unable to perceive true value accurately. Gottfredson (1978), Gottfredson and Gottfredson (1982), and Wolff (1970) found that reviewers for psychological journals agree rather strongly about properties that manuscripts ought to possess, but their judgments about whether specific manuscripts possess various properties agree much less strongly; correlations ranged from 0.16 to 0.50.

Assumptions About Perceptions. Because perception involves error, analysis of how reviewers perceive manuscripts needs to begin from assumptions about errors in perceptions. Psychological research suggests that human perceptions of differences between stimuli depend mainly on percentage differences between stimuli rather than on absolute differences, and Luce and Galanter (1963) showed that a logarithmic transformation of stimuli approximates the idea that people react similarly to equal percentage differences. That is, as a rough approximation, reviewers perceive the logarithms of manuscripts' properties. Psychological research also suggests that perceptions are less discriminating for extreme stimuli, partly because perceivers have less experience with extreme stimuli. Thus, reviewers probably make less sensitive distinctions among extremely good manuscripts or extremely poor ones than among manuscripts they see frequently. However, the analysis below does not rely strongly on the accuracy of extreme perceptions because it only divides manuscripts into three large categories: the most-cited quintile, the second and third quintiles, and the fourth and fifth quintiles. The analysis uses gross categories so that inferences will not depend on the precise accuracy of assumptions, which are approximate.

It also seems reasonable to suppose that manuscripts' true values have skewed distributions that generally resemble lognormals: many mediocre manuscripts and then fewer and fewer manuscripts as value increases. Many phenomena associated with publishing have skewed distributions that resemble lognormals, including impact factors, lengths of books and articles, circulations of journals, and numbers of authors who produce various numbers of articles (West and Shlesinger 1990). One rationale for such distributions is that intellectual contributions result from multiple elements. Few manuscripts incorporate almost no elements that constitute intellectual contributions, whereas most manuscripts incorporate several of these elements, but fewer and fewer manuscripts incorporate additional elements as the numbers of elements increase. I conjecture that the lognormal pattern of intellectual contributions may derive from humans' abilities to perceive. Because authors' abilities to recognize their own intellectual contributions depend on percentage differences, and readers' abilities to recognize intellectual contributions depend on percentage differences, the numbers of elements in articles and books may have distributions that reflect these perceptual tendencies. Thus, the analysis below assumes that the logarithms of true values in the population of manuscripts have a Normal distribution with a mean at zero and a standard deviation of one. Although the lognormal seems the best choice on balance, it is only one of several similar distributions.

Assumptions About Review Processes. Many social-science journals report that they publish around 15% to 25% of submissions, so the analysis assumes that editors publish 20% of the manuscripts they receive. The analysis also assumes that editors send manuscripts to two reviewers who determine the disposition of many manuscripts. If two reviewers cannot agree on acceptance or rejection, even after two revisions and three reviews, editors themselves intervene and accept a fraction of the manuscripts about which two reviewers disagreed. The analysis assumes that editors and reviewers have approximately equal abilities to assess manuscripts and that editors make independent judgments about manuscripts without being influenced by reviewers. However, based on reviews of editorial decisions, Cicchetti (2003) inferred that when faced with judgments from two reviewers, editors typically adopt the more negative judgment, and when faced with judgments from three reviewers, editors typically adopt the modal judgment.

These assumptions fall near the middle of actual editorial practices (Campanario 1998a, Schminke 2002). In economics and finance, editors commonly send a manuscript to only a single reviewer, who makes a final decision. A survey by Seidl et al. (2001) revealed that more-prestigious economics journals use fewer reviewers than do less-prestigious journals. At the other extreme, one prestigious psychology journal sometimes sends manuscripts to five reviewers. Lindsey (1978) documented substantial variance in qualifications of reviewers and editors. Some editors seek highly qualified reviewers, others choose reviewers who are likely to disagree with each other, and still others use inexperienced reviewers such as doctoral students. Some editors inject their personal evaluations, whereas others mainly try to implement reviewers' decisions. Although these variations would alter the analysis, they do not affect the basic logic and they would shift the numbers little: Adding more reviewers has weak effects because correlations between reviewers are low.

My experience says that journals that use several reviewers, including editors who offer personal opinions, place more emphasis on authors' conforming to social norms. More reviewers generate more advice, so
authors have to confront more demands for compliance. One reviewer explained without embarrassment: "The roles of the reviewers and editors are both gatekeepers and co-developers." Facing such attitudes, authors tend to believe that they must follow reviewers' advice even when they deem the advice ill-founded (Bedeian 2003, 2004; Frey 2003; Starbuck 2003).

Most journals ask reviewers to recommend acceptance, revision, or rejection. The analysis below assumes that each reviewer recommends rejecting 58% of the manuscripts, accepting 17%, and seeking revisions of 25%. These frequencies resemble ones that editors have reported, as listed in Table 1. Although the assumed frequencies of acceptance and rejection do affect the distributions discussed below, they affect actions taken after a first review more than they affect actions taken after two or three reviews.

The Correlation Between Reviewers' Judgments and Manuscripts' True Value

How accurately do reviewers appraise the true value of manuscripts? This is a key issue. Peters and Ceci (1982) conducted what may be the most discussed and controversial study of reviewing by journals. They resubmitted 12 articles to the very journals that had published them just 18 to 32 months earlier. All 12 were top-tier journals. Whereas authors from prestigious psychology departments originally had written the articles, Peters and Ceci gave the resubmissions fictitious authors with return addresses at obscure institutions. In all, 38 editors and reviewers saw the resubmissions. Three of these 38 recognized that submissions had already been published, which cut the sample to 9 articles that received 18 reviews. Out of 18 reviewers, 16 recommended rejection, and the 9 editors rejected 8 of the 9 articles.

Peters and Ceci's article drew 50 pages of commentary by others, and Cicchetti's (1991) report about low interrater agreement in 10 psychological and sociological journals drew 33 pages of commentary. Peer review elicits strong feelings and diverse attitudes (Baumeister 1990; Bedeian 1996a, b; Holbrook 1986). Gossip about inconsistent reviews and biased reviewers pervades academe, and studies have documented various biases of reviewers (Armstrong 1997; Bedeian 2003; Campanario 1996, 1998b; Ellison 2002; Hargens 1990; Horrobin 1990; Mahoney et al. 1978; Nylenna et al. 1994). One interesting finding is that reviewers criticize the methodology of studies that cast doubt on theories that the reviewers like and they applaud the methodology of studies that support theories that they like (Mahoney 1977, 1979).

There is no way to directly measure reviewers' abilities to discern manuscripts' true values, but two kinds of observations offer tangential evidence about these abilities. First, Gottfredson (1977, 1978) found that reviewers' forecasts of manuscripts' impacts correlated only 0.37 with later citations and their ratings of manuscript quality correlated only 0.24 with later citations. If true values are less visible and more difficult to discern than citation values, correlations of reviewers' judgments with manuscripts' true values are lower than these correlations.

Second, as detailed in Table 2, various studies have reported measures of agreement between reviewers that fall between 0.08 and 0.54 (Starbuck 2003). These measures present three issues, however. First, two of the measures are product-moment correlations, two are Kappas, and most are intraclass correlation coefficients.² These statistics have slightly different properties. Second, these measures were computed from data about reviewers' judgments that utilized three, four, or five categories (such as accept, revise, and reject), whereas the statistical models below assume continuous evaluation variables. Third, the measures in Table 2 indicate agreement about shared values rather than agreement about true values.

The next subsection of this article draws on the numbers in Tables 1 and 2 to infer confidence limits for reviewers' ability to assess a manuscript's true value. This inference process is rather complicated. However, I need to explain why my assumptions about these confidence limits are plausible, and I want to do this in a technically defensible way.

The Information Value of Agreement Between Reviewers. What might measures like those in Table 2 say about relations between reviewers' opinions and

Table 1. Frequencies of Accept, Revise, or Reject Recommendations by Reviewers (reviewers' judgments, %)

Journal (source) | Accept | Minor revise | Revise | Drastic revise | Reject
Administrative Science Quarterly (Starbuck 2003) | 25 | - | 25 | - | 50
Academy of Management Journal and Review¹ | 13 | - | 34 | - | 53
Journal of Personality and Social Psychology (Scott 1974) | 9 | - | 25 | - | 66
Personality and Social Psychology Bulletin (Hendrick 1976, 1977) | 14 | 28 | 15 | 26 | 18
American Psychologist (Cicchetti 1980, Scarr and Weber 1978) | 10 | 9 | 18 | 13 | 50
Journal of Abnormal Psychology (Cicchetti and Eron 1979) | 11 | - | 44 | - | 46
American Sociological Review (Hargens and Herting 1990) | 12 | 9 | 21 | - | 58
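
The assumptions stated in the text (log true values that are Normal with mean zero and standard deviation one, and reviewers who each recommend rejecting 58%, revising 25%, and accepting 17% of manuscripts) can be explored with a small simulation. The sketch below is illustrative only. The function name, the sample size, and above all the correlation r = 0.3 between a reviewer's perception and a manuscript's log true value are assumptions chosen for this example; they are not values or methods taken from the article, which develops its results algebraically.

```python
import random
from statistics import NormalDist


def simulate_first_round(n_manuscripts=20000, r=0.3, seed=0):
    """Simulate two reviewers' first-round recommendations.

    r is a HYPOTHETICAL correlation between one reviewer's perception and
    the log of a manuscript's true value; it is not taken from the article.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    # Perceptions are standard Normal, so these cut points give each
    # reviewer 58% reject, 25% revise, and 17% accept recommendations.
    reject_below = std_normal.inv_cdf(0.58)
    accept_above = std_normal.inv_cdf(0.83)
    top_cut = std_normal.inv_cdf(0.80)  # threshold for the top 20% of true values

    def recommend(log_value):
        # Perception = signal plus independent noise, scaled to unit variance.
        perceived = r * log_value + (1 - r * r) ** 0.5 * rng.gauss(0, 1)
        if perceived < reject_below:
            return "reject"
        if perceived > accept_above:
            return "accept"
        return "revise"

    counts = {"reject": 0, "revise": 0, "accept": 0}  # first reviewer only
    agree = 0
    top = 0
    top_double_reject = 0
    for _ in range(n_manuscripts):
        log_value = rng.gauss(0, 1)
        first, second = recommend(log_value), recommend(log_value)
        counts[first] += 1
        agree += (first == second)
        if log_value > top_cut:
            top += 1
            top_double_reject += (first == "reject" and second == "reject")
    return {
        "shares": {k: v / n_manuscripts for k, v in counts.items()},
        "agreement_rate": agree / n_manuscripts,
        "top20_rejected_by_both": top_double_reject / top,
    }


result = simulate_first_round()
print(result)
```

Rerunning the function with other values of r gives a rough sense of how strongly first-round outcomes depend on reviewers' accuracy under these assumed conditions.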
Table 2   Reported Measures of Agreement Between Reviewers

                                                                        Agreement                   Manuscripts reviewed       Categories

Reviewers' judgments, Administrative Science Quarterly                  pm = 0.12                   500                        3
  (Starbuck 2003)
Reviewers' judgments, American Psychologist (Cicchetti 1980,            R = 0.50 to 0.54 and        87                         2, 3, and 5
  Scarr and Weber 1978)                                                 K = 0.52 to 0.53
Reviewers' judgments, American Psychologist (Cicchetti 1991)            R = 0.38                    72                         5
Reviewers' judgments, American Sociological Review                      R = 0.29                    322                        4
  (Cicchetti 1991, Hargens and Herting 1990)
Reviewers' judgments, Developmental Review (Whitehurst 1984)            R = 0.27                    73                         4
Reviewers' judgments, Journal of Abnormal Psychology, 1973              R = 0.08, 0.17, and 0.16    215                        2, 3, and 4
  (Cicchetti and Eron 1979)
Reviewers' judgments, Journal of Abnormal Psychology, 1974              R = 0.29, 0.34, and 0.30    213                        2, 3, and 4
  (Cicchetti and Eron 1979)
Reviewers' judgments, Journal of Abnormal Psychology, 1975              R = 0.19, 0.16, and 0.15    191                        2, 3, and 4
  (Cicchetti and Eron 1979)
Reviewers' judgments, Journal of Abnormal Psychology, 1976              R = 0.07, 0.10, and 0.08    216                        2, 3, and 4
  (Cicchetti and Eron 1979)
Reviewers' judgments, Journal of Abnormal Psychology, 1977              R = 0.05, 0.21, and 0.20    232                        2, 3, and 4
  (Cicchetti and Eron 1979)
Reviewers' judgments, Journal of Educational Psychology                 R = 0.34                    325                        5
  (Marsh and Ball 1981)
Reviewers' judgments, Journal of Personality and Social Psychology      R = 0.26                    286                        3
  (Scott 1974)
Reviewers' judgments, Law and Society Review (Cicchetti 1991)           R = 0.23                    251
Reviewers' judgments, Personality and Social Psychology Bulletin        R = 0.22                    177                        5
  (Hendrick 1976, 1977)
Reviewers' judgments, Social Problems (Smigel and Ross 1970)            K = 0.40                    193                        4
Reviewers' judgments, Sociometry (Hendrick 1976, 1977)                  pm = 0.21                   140                        5
Best paper, Consumer Psychology Division, American Psychological        W = 0.11                    10 judges ranked 8 papers  8
  Association (Bowen et al. 1972)

Notes. pm = product-moment correlation; R = intraclass correlation; K = Kappa; W = coefficient of concordance.
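
As a rough illustration of two of the agreement measures named in the notes to Table 2, the following Python sketch computes a product-moment correlation and a Kappa for one pair of reviewers. The judgment data, the category coding (0 = reject, 1 = revise, 2 = accept), and the function names are hypothetical, and the Kappa shown is the unweighted Cohen form, which may differ from the variants used in the cited studies.

```python
import numpy as np

def product_moment(x, y):
    """Pearson product-moment correlation between two lists of coded judgments."""
    return float(np.corrcoef(x, y)[0, 1])

def cohen_kappa(x, y, n_categories):
    """Unweighted Cohen's kappa: agreement beyond what the marginals imply."""
    table = np.zeros((n_categories, n_categories))
    for a, b in zip(x, y):
        table[a, b] += 1              # cross-tabulate the two reviewers
    table /= table.sum()              # convert counts to proportions
    observed = np.trace(table)        # share of identical judgments
    expected = float(table.sum(axis=1) @ table.sum(axis=0))  # chance agreement
    return (observed - expected) / (1.0 - expected)

# Hypothetical judgments on ten manuscripts (0 = reject, 1 = revise, 2 = accept).
reviewer_1 = [0, 0, 1, 2, 0, 1, 0, 2, 0, 1]
reviewer_2 = [0, 1, 1, 2, 0, 0, 0, 1, 0, 2]

print("product-moment:", product_moment(reviewer_1, reviewer_2))
print("kappa:", cohen_kappa(reviewer_1, reviewer_2, 3))
```

Because Kappa subtracts the agreement expected from the marginal frequencies, it need not sit on the same scale as a correlation computed from the same judgments.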

manuscripts' true values? If two reviewers' opinions correlate very strongly with manuscripts' true values, the reviewers' opinions would correlate very strongly with each other. However, reviewers' opinions could correlate strongly with each other even if they correlate weakly with true values. For their opinions to correlate strongly, reviewers must share values, but these values might involve elements other than manuscripts' true value. For example, reviewers might value a manuscript's relevance to a faddish topic that will drift into obscurity, they might value manuscripts' conformity to social norms that do not support generation of valid knowledge, or they might have high regard for authors from prestigious schools and low regard for authors from little-known schools (Bowen et al. 1972, Lindsey 1978, Pfeffer et al. 1977). Although specialized journals might promote distinctive values, Gottfredson (1978) found no such differences among 299 editors and consulting editors for nine psychological journals.

Going from the measures in Table 2 to statements about reviewers' abilities to assess true value involves four steps. As would any measures of selection processes that are subject to error, the measures in Table 2 entail some uncertainty. Thus, the first step involves estimating standard deviations for the reported measures. The measures in Table 2 are of three different types: Kappas, product-moment correlations, and intraclass correlations. I found trivial differences between product-moment correlations and intraclass correlations, but Kappas have smaller absolute values than the other two measures. The second step transforms Kappas into numbers comparable with the other measures. The measures in Table 2 indicate correlations between reviewers' judgments, which fall into just three to five categories. The third step translates statements about correlations between reviewers' judgments into statements about theoretical correlations between reviewers' opinions, which the theory assumes to be finely measured evaluations symbolized by numbers accurate to decimal places. Both small numbers of judgment categories and random errors of classification cause judgment measures to differ from theoretical correlations between opinions. Thus, the third step draws inferences about correlation B in Figure 3 from measures of correlation A. The fourth step translates statements about correlations between reviewers' opinions into statements about correlations between reviewers' opinions and manuscripts' true values. That is, the fourth step attempts to draw inferences about correlation D from correlation B in Figure 3. Because conjectures are necessary at this point, I consider a range of possibilities that show how correlation D depends on assumptions. All four steps utilize simulations.³ The analyses could be

algebraic, but simulation seemed easier, and it produces scatter diagrams that I intended to use in this article.

Figure 3   Relations Discussed in this Subsection

[Figure 3: diagram relating true values, shared values, Reviewer 1's and Reviewer 2's opinions, and Reviewer 1's and Reviewer 2's judgments, with correlations labeled A, B, C, and D.]

Table 3 shows implications of the first three analytic steps. The third and fourth columns report 95% confidence limits for correlations between reviewers' judgments, results of the first two analytic steps. The three right-hand columns report 95% confidence limits for correlations between reviewers' opinions, results of the third analytic step.

The term "opinion" denotes a finely measured assessment, but reviewers do not report such opinions, so one must infer correlations between opinions from correlations between categorized judgments such as accept, revise, and reject. Potential distortions from reviewers' classifying manuscripts into small numbers of categories are not widely appreciated. For example, one reader of a draft of this article pointed out that reviewers often agree about whether manuscripts ought to be published. Possibly this reader, and many editors, overlooked the possibility that totally random judgments may agree strongly when the sample is small and high percentages of the judgments fall into one or two categories (reject, or drastic revise and reject). Cicchetti (1985) showed that if utterly random numbers are converted to the binary categories reject-or-not-reject, apparent agreement between two short lists of random numbers could range as high as 80%. Indeed, because rejects outnumber not-rejects, such random data would exhibit the oft-reported property that reviewers are more likely to agree about which articles to reject than about which articles to accept.

The fourth analytic step translates correlations between reviewers' opinions into statements about correlations between reviewers' opinions and manuscripts' true values. That is, the fourth step attempts to draw inferences about correlation D from correlation B in Figure 3. Because conjectures are necessary at this point, I consider a range of possibilities.

Reviewers' shared values intervene between reviewers' opinions and manuscripts' true values, so correlation D relates more directly to correlation C than to correlation B. Correlations between reviewers' opinions (as estimated in Table 3) do not arise solely from reviewers' mutual appreciation of manuscripts' true value. True value should be a quality on which almost all reviewers agree. If reviewers agree strongly on shared values that differ from the values of most other people, one can observe a strong correlation between reviewers that has only a weak component coming from a manuscript's true value. When data encompass hundreds of diverse reviewers, it seems very unlikely that the correlations between shared values and true values go as low as zero or as high as one. Would it be reasonable to assume that shared values correlate 0.5 with true values? Or 0.25? Or even 0.75? Editors who choose reviewers who tend to disagree with each other might be able to raise this correlation to higher values, but no journal receives numer-

Table 3   Confidence Limits for Observed Correlations Between Reviewers' Judgments and Opinions

95% Confidence limits for correlation    95% Confidence limits for correlation
between judgments                        between reviewers' opinions

Below Above Below Expected Above

Social Problems 0.38 0.59 048 0.63 0.78


American Psychologist 0.49 0.76 055 0.75 0.94
Administrative Science Quarterly 0.05 0.19 005 0.17 0.29
Sociometry 0.08 0.34 007 0.25 0.42
Journal of Abnormal Psychology, 1976 0.02 0.18 005 0.10 0.26
Journal of Abnormal Psychology, 1975 0.04 0.26 003 0.19 0.36
Journal of Abnormal Psychology, 1973 0.06 0.26 005 0.21 0.36
Journal of Abnormal Psychology, 1977 0.10 0.30 011 0.26 0.41
Personality and Social Psychology Bulletin 0.11 0.33 011 0.26 0.41
Journal of Personality and Social Psychology 0.16 0.36 022 0.36 0.51
Developmental Review 0.09 0.45 007 0.35 0.63
American Sociological Review 0.20 0.38 024 0.37 0.51
Journal of Abnormal Psychology, 1974 0.19 0.41 023 0.39 0.54
Journal of Educational Psychology 0.25 0.43 029 0.40 0.52
American Psychologist 0.20 0.56 021 0.45 0.69
American Psychologist 0.38 0.70 042 0.64 0.86

Notes. pm = product-moment correlation; R = intraclass correlation; K = Kappa.
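
The point attributed to Cicchetti (1985), that purely random judgments can agree often when most judgments fall into one category, can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a reproduction of Cicchetti's analysis: the 60% reject rate, the list length of 10 manuscripts, and the function names are hypothetical choices.

```python
import numpy as np

def expected_random_agreement(p_reject):
    """Chance that two independent random reviewers give the same binary judgment."""
    return p_reject**2 + (1.0 - p_reject)**2

def simulated_agreement(n_manuscripts, p_reject, n_trials=20_000, seed=0):
    """Share of manuscripts on which two random reviewers agree, one value per trial."""
    rng = np.random.default_rng(seed)
    first = rng.random((n_trials, n_manuscripts)) < p_reject    # True = reject
    second = rng.random((n_trials, n_manuscripts)) < p_reject   # independent of first
    return (first == second).mean(axis=1)

# Assumed inputs: short lists of 10 manuscripts, each rejected with probability 0.6.
rates = simulated_agreement(n_manuscripts=10, p_reject=0.6)
print("mean agreement between random reviewers:", rates.mean())
print("share of random pairs agreeing on 80% or more:", (rates >= 0.8).mean())
```

Under these assumptions the random reviewers also agree on rejections more often than on acceptances, simply because rejections are more common.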



ous citations to more than two thirds of its articles, and only a tiny fraction (0.6%) of journals receive numerous citations to more than 30% of their articles.⁴ The many reasons why people share values that are uncorrelated or weakly correlated with scientific principles suggest that true values ordinarily correlate less than 0.5 with shared values. However, there would be variation among pairs of reviewers and possibly variation among fields of study, so I have made computations for correlations of 0.25, 0.5, and 0.75.

Figure 4 shows scatter plots for ratios of correlations between individual reviewers' opinions and manuscripts' true value as functions of correlations between reviewers' opinions. The three distinct groups of data correspond to alternative assumptions about the correlations, denoted R, between shared values and true values; the horizontal lines describe R = 0.25, 0.5, or 0.75. Variances are similar for all three assumptions and variances are small only when correlations between reviewers' opinions are above 0.4. When correlations between reviewers' opinions fall below 0.3, variances become very large and the three assumptions about R become nearly equivalent. Variances are also smaller when there are more manuscripts in each sample and larger when there are fewer manuscripts in each sample.

Figure 4   Ratios of the Correlations Between Reviewers' Opinions and Manuscripts' True Values vs. the Correlations Between Reviewers' Opinions, 200 Manuscripts in Each Sample

[Figure 4: scatter plot. Vertical axis: Ratios. Horizontal axis: Correlations Between Reviewers' Opinions. Legend: R = 0.75; Expected when R = 0.75; R = 0.50; Expected when R = 0.50; R = 0.25; Expected when R = 0.25.]

Table 4 estimates correlations between reviewers' opinions and manuscripts' true values for the measures in Table 2. In the upper portion of the table, the six right-hand columns give 95% confidence limits for correlations of reviewers' opinions with manuscripts' true values. The confidence limits are very wide. According to the lower limits, many reported measures are consistent with the possibility that reviewers' opinions might have correlated not at all or slightly negatively with manuscripts' true value. The upper limits indicate that reviewers' opinions might have correlated between 0.2 and 0.5 with manuscripts' true value.

The three bottom rows of Table 4 give statistics for expected values across all 16 reported measures. For example, if one assumes that shared values correlate 0.5 with true values, the average across all 16 measures would be a correlation of 0.18 between reviewers' opin-

Table 4   95% Confidence Limits and Expected Values for Correlations of Reviewers' Opinions with True Values

If Correl(TV, SV) = 0.25      If Correl(TV, SV) = 0.5      If Correl(TV, SV) = 0.75

Lower Upper Lower Upper Lower Upper

Social Problems 004 0.36 011 0.52 027 0.67


American Psychologist 006 0.44 012 0.62 031 0.81
Administrative Science Quarterly 011 0.19 007 0.24 003 0.28
Sociometry 018 0.30 011 0.36 005 0.42
Journal of Abnormal Psychology, 1976 019 0.24 016 0.27 014 0.29
Journal of Abnormal Psychology, 1975 017 0.27 013 0.32 008 0.37
Journal of Abnormal Psychology, 1973 016 0.26 011 0.32 006 0.37
Journal of Abnormal Psychology, 1977 014 0.27 008 0.33 001 0.40
Personality and Social Psychology Bulletin 015 0.28 008 0.34 002 0.41
Journal of Personality and Social Psychology 010 0.28 000 0.37 009 0.46
Developmental Review 024 0.42 016 0.51 007 0.59
American Sociological Review 008 0.27 001 0.36 010 0.46
Journal of Abnormal Psychology, 1974 011 0.30 001 0.40 008 0.50
Journal of Educational Psychology 006 0.26 004 0.36 014 0.46
American Psychologist 019 0.41 007 0.52 004 0.64
American Psychologist 011 0.43 005 0.59 021 0.75
Minimum expected value among all reports 0.03 0.05 0.08
Average expected value over all reports 0.09 0.18 0.27
Maximum expected value among all reports 0.19 0.37 0.56

Notes. TV denotes true value, and SV denotes shared value.
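
The three bottom rows of Table 4 appear to be consistent, within rounding, with multiplying each "Expected" correlation between reviewers' opinions in Table 3 by the assumed correlation between true values and shared values. The Python sketch below checks that arithmetic. It is an observation about the reported numbers, not a description of the simulation the article used, and the function name is hypothetical.

```python
# "Expected" correlations between reviewers' opinions, copied from Table 3
# in row order (16 reported measures).
expected_opinion_corr = [
    0.63, 0.75, 0.17, 0.25, 0.10, 0.19, 0.21, 0.26,
    0.26, 0.36, 0.35, 0.37, 0.39, 0.40, 0.45, 0.64,
]

def summarize(tv_sv_corr, opinion_corrs=expected_opinion_corr):
    """Return (min, mean, max) of tv_sv_corr * opinion correlation.

    Assumption for illustration: the implied correlation between opinions and
    true values is the simple product of the two correlations.
    """
    implied = [tv_sv_corr * b for b in opinion_corrs]
    return min(implied), sum(implied) / len(implied), max(implied)

for r in (0.25, 0.5, 0.75):
    low, mean, high = summarize(r)
    print(f"Correl(TV, SV) = {r}: min {low:.3f}, mean {mean:.3f}, max {high:.3f}")
```

The products can be compared directly with the minimum, average, and maximum expected values reported in the bottom rows of Table 4.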



ions and manuscripts' true values. Also, if shared values correlate 0.5 with true values, the smallest reported measure implies a correlation of 0.05 between reviewers' opinions and manuscripts' true values, and the largest reported measure implies a correlation of 0.37.

One of my earlier reactions to the large variances in the foregoing relationships was to propose that journals should use more categories for reviewers' judgments and that they should compel reviewers to use categories rather equally. Simulation experiments have subsequently convinced me that these are not useful ideas. Five categories do yield more reliable data about agreement between reviewers than do three categories, and seven equal categories yield slightly more reliable data than five somewhat unequal categories. However, increasing the categories to seven does not reduce the variances associated with inferences about the agreement of reviewers' opinions with true values, which are dominated by uncertain relationships between shared values and true values. Three categories, five categories, and seven categories all have very similar implications for inferences about true values. If reviewers are poor judges of manuscript quality, more-refined measures of reviewers' opinions have insignificant utility, whereas small numbers of categories may remind editors that reviewers' judgments are crude.

All considered, it seems reasonable to infer that correlations between reviewers' opinions and manuscripts' true values are almost always less than 0.5 and usually below 0.3. Indeed, these correlations could fall below 0.1 in some journals or with some pairs of reviewers. Therefore, the computations to follow consider 0.15, 0.3, and 0.45, and implicitly the range between them. To reduce verbosity, I call reviewers' average correlation with true value "Rho."

The next two sections of this article show how editorial reviews vary as functions of Rho. The next section looks at a single review cycle in which two reviewers render independent judgments about each manuscript, and the ensuing section looks at the disposition of manuscripts after successive revisions and reevaluations by reviewers and a terminating judgment by an editor. Both parts of this process are graphed for both Rho = 0.15 and Rho = 0.45. If Rho is sometimes below 0.15 or above 0.45, these are very unusual events. Because 0.15 and 0.45 are supposed to be extremes, the discussion also reports implications of Rho = 0.3, which may approximate expected values.

Outcomes of First Reviews

Saying that a reviewer's evaluations correlate Rho with logarithms of true value implies that the evaluations have a distribution with a mean at Rho × log(true value). For example, if a reviewer's evaluations correlate 0.2 with logarithms of true value, manuscripts with a true value of 2 would have a mean evaluation of 0.2 × 0.3 = 0.06, because log(2) = 0.3. Likewise, if a reviewer's evaluations correlate 0.3 with logarithms of true value, manuscripts with a true value of 5 would have a mean evaluation of 0.3 × 0.7 = 0.21 because log(5) = 0.7.⁵

The gray area in Figure 5 shows the population of submissions. The gray lines show distributions of manuscripts that would receive rejects from both of two reviewers after first review, and the black lines show distributions for two accepts. The solid gray line assumes that Rho = 0.15; 12% of these rejected manuscripts represent editorial errors because they actually belong in the highest-value 20% of manuscripts. The dashed gray line assumes that Rho = 0.45; 6% of these rejected manuscripts represent editorial errors because they come from the highest-value 20%. For the in-between case where Rho = 0.3, 10% of the rejected manuscripts represent editorial errors. All distributions for Rho between 0.15 and 0.45 overlap the population distribution to considerable degrees, so rejected manuscripts have a distribution much like the population of submissions.

Figure 5   Percentages of Manuscripts that Receive Two Accepts or Two Rejects After First Review by Two Reviewers

[Figure 5: distribution curves. Vertical axis: Percent of manuscripts. Horizontal axis: Log of true value of manuscript. Legend: Population; Rho = 0.45, two rejects, 6% belong in highest-value 20% of manuscripts; Rho = 0.15, two rejects, 12% belong in highest-value 20% of manuscripts; Rho = 0.45, two accepts, 78% belong in highest-value 20% of manuscripts; Rho = 0.15, two accepts, 26% belong in highest-value 20% of manuscripts.]

Figure 5 also graphs distributions of manuscripts that would receive accepts from both reviewers after the first review. The solid black line assumes that, on average, Rho = 0.15, and the dashed black line assumes that, on average, Rho = 0.45. When Rho = 0.15, only one-fourth of the accepted manuscripts actually belong in

the highest-value 20% of submissions and three quarters represent errors because they should not have been accepted. When Rho = 0.45, these percentages reverse: 78% of the accepted manuscripts come from the highest-value 20% and 22% represent errors. When Rho = 0.3, half of the accepted manuscripts belong in the highest-value 20% and half represent errors. Acceptance errors occur often because four-fifths of the submissions, and hence four-fifths of the opportunities to make errors, are manuscripts that should not be accepted (Rousseeuw 1991, Stinchcombe and Ofshe 1969). However, the error rate is much lower when Rho is above 0.4.

Figure 6 shows what happens to the best 20% of manuscripts when Rho varies from 0.15 to 0.45. Manuscripts classified as "mixed evaluation" are any that did not receive accepts from both reviewers or rejects from both reviewers. The percentage of excellent manuscripts receiving two rejects after first review declines from 20% when Rho = 0.15 to 11% when Rho = 0.45. The percentage of excellent manuscripts receiving two accepts rises from 4% when Rho = 0.15 to 22% when Rho = 0.45. Unfortunately, it seems that Rho rarely goes as high as 0.45. For the in-between case when Rho = 0.3, just 9% of the excellent manuscripts receive two accepts.

The curvature in Figure 6 nearly disappears when calculations take all submissions into account. The percentage of all manuscripts receiving two rejects after the first review rises only slightly from 34% to 37% as Rho rises from 0.15 to 0.45, whereas the percentage of all manuscripts receiving two accepts rises from 3% to 6%. Obviously, for Rho in this range, the disposition of manuscripts depends very weakly on reviewers' abilities to discern true values. This selection process discriminates so poorly that the phrase "blind review" is apt in an unintended way.

How Editorial Decisions After Three Reviews Differ from Those After First Reviews

Many of my colleagues believe that reviewers will be more likely to accept their manuscripts if they comply with reviewers' suggestions and more likely to reject their manuscripts if they reject reviewers' suggestions (Bedeian 2003, Frey 2003). On the premise that reviewers do reward compliance, these calculations assume that reviewers become 50% more likely to issue two accepts and only 70% as likely to issue two rejects during the second and third reviews. Experiments disclose, however, that these moderate assumptions about reviewers' reactions have trivial effects.

Figure 7 shows the dispositions of all submissions after authors have revised some of them once or even twice. This calculation assumes authors revise and resubmit half of the mixed evaluations from the first review and 80% of the mixed evaluations from the second review. The percentage of manuscripts receiving two rejects is quite independent of Rho.

One can now estimate how often editors must be intervening. For Rho between 0.15 and 0.45, dual acceptances fall between 6% and 13%, being 8% when Rho = 0.3. Yet journals accept 15% to 25% of submissions. One explanation for this difference would be that editors are making final decisions about manuscripts with mixed evaluations.

Editorial intervention eliminates the blank area labeled "mixed evaluations" in Figure 7; all manuscripts are finally accepted or rejected. Indeed, because there is no reason to believe that Rho affects the percentage of manuscripts that journals publish, editorial actions turn Figure 7 into two regions separated by a horizontal line: 75% to 85% rejections and 15% to 25% acceptances. For lower values of Rho, manuscripts accepted through editors' interventions outnumber those accepted by review-

Figure 6   After First Review by Two Reviewers, Distributions of Actions on Manuscripts that Belong in the Highest-Value 20% of Manuscripts

[Figure 6: chart with regions for two accepts, mixed evaluations, and two rejects. Vertical axis: Percent of manuscripts (0 to 100). Horizontal axis: Rho, average correlation of reviewers with log of true value (0.15 to 0.45).]

Figure 7   After Third Review by Two Reviewers and Two Revisions, Distributions of Actions on All Manuscripts

[Figure 7: chart with regions for two accepts, mixed evaluations, and two rejects. Vertical axis: Percent of manuscripts (0 to 100). Horizontal axis: Rho, average correlation of reviewers with log of true value (0.15 to 0.45).]

ers, and for higher values of Rho, manuscripts accepted by reviewers outnumber those accepted through editors' interventions.

Figure 8   Selection of Manuscripts After Two Revisions, Third Review by Two Reviewers, and Evaluation by the Editor

[Figure 8: distribution curves. Vertical axis: Percent of manuscripts. Horizontal axis: Log of true value of manuscript. Legend: Rho = 0.45, rejects, 15% in highest-value 20% of manuscripts; Rho = 0.15, rejects, 14% in highest-value 20% of manuscripts; Rho = 0.45, accepts, 71% in highest-value 20% of manuscripts; Rho = 0.15, accepts, 23% in highest-value 20% of manuscripts.]

Figure 8 compares the distributions of manuscripts that ultimately receive rejects with those that receive accepts. The two solid lines assume that Rho = 0.15, and the two dashed lines assume that Rho = 0.45. The distributions of rejected manuscripts (gray lines) are very similar: About one-seventh of the manuscripts that receive rejects belong in the highest-value 20%, and these comprise more than half of the manuscripts in the highest-value 20%. Because journals reject four to seven times as many manuscripts as they accept, there are many more opportunities for erroneous judgments among rejected manuscripts than among accepted ones.

The largest effect of higher correlations between reviewers' judgments and true value is better selection of manuscripts for acceptance. For manuscripts in the highest-value 20%, accepts rise to 71% when Rho = 0.45, although only 43% receive accepts when Rho = 0.3. These acceptance percentages are slightly lower than after a single review because under the assumptions of this model, editors must agree only with themselves, whereas reviewers must agree with each other, so editors are more likely to accept the wrong manuscripts.

Figures 5 and 8 are very similar. Similarity results in part from assuming that authors are equally likely to revise all mixed-evaluation manuscripts and that reviewers evaluate revised manuscripts afresh. Obviously, editors and reviewers influence authors' decisions about whether to revise, and reviewers usually remember (or journals remind them) what they said earlier. However, encouragement or discouragement by editors or reviewers cannot have effects beyond those supported by the underlying Rho, because people who misperceive true value are very likely to offer advice or to demand changes that are uncorrelated with true value.

A few studies have found that reviewers agree more strongly about whether to publish a manuscript than about what is wrong with the manuscript. For instance, Mahoney (1977, 1979) obtained interrater correlations around 0.3 for recommendations about acceptance and for ratings of scientific contribution, but he found interrater correlations close to zero for ratings of methodology, relevance, and quality of discussion.

How Much Better Can the Most Prestigious Journals Be?

We now return to the topic that initiated this study: the stratification of journals. This section estimates the maximum advantages of high status by analyzing a review process that gives all of the advantages to the highest-status journals. Thus, the inferences drawn likely overstate quality differences among strata. This section also compares inferred differences in the value of articles published by different strata with reported citation rates.

The analysis assumes authors initially submit all of their manuscripts to journals in the first quintile, which choose 20% to publish. Disappointed authors then submit the remaining manuscripts to journals in the second quintile, which also choose 20% to publish. However, because the second-quintile journals receive only 80% as many manuscripts as first-quintile journals, they publish only 16% of the original submissions. Authors then submit the remaining manuscripts to journals in the third quintile, and so on through all five quintiles. If the highest-quintile journals were actually able to select the best 20% of the manuscripts, none of the best 20% would remain for second-quintile journals, and the articles appearing in the higher-ranking journals would have distinctly higher true value. However, if the highest-quintile journals are able to select only one third of the manuscripts in the highest-value 20%, two thirds remain for lower-status journals; in this case, a sequence of five journals would publish 67% of the best 20% of submissions, leaving 33% unpublished.

Figure 9 compares distributions of manuscripts that ultimately receive acceptances from journals in various strata when Rho = 0.15. The gray area shows the population of manuscripts when initially produced. The heaviest black line shows acceptances by journals in the first quintile to receive them; 23% of these manuscripts belong in the highest-value 20% of all manuscripts, and the average accepted article is one-fourth of a standard

deviation above the population mean. The medium black line shows acceptances by journals in the second and third quintiles; 20% of these manuscripts belong in the highest-value 20% of all manuscripts. The lightest black line shows acceptances by journals in the fourth and fifth quintiles; 15% of these manuscripts belong in the highest-value 20% of all manuscripts, and the average accepted article has a value near the population mean. Of the manuscripts in the highest-value 20%, a depressing 30% are rejected by five journals in sequence.

Figure 9   Manuscripts Accepted After Reviews by Journals in Different Strata when Rho = 0.15

[Figure 9: distribution curves. Vertical axis: Percent of manuscripts. Horizontal axis: Log of true value of manuscript. Legend: Population; First quintile, 23% in highest-value 20% of manuscripts; Second-third quintiles, 20% in highest-value 20% of manuscripts; Fourth-fifth quintiles, 15% in highest-value 20% of manuscripts.]

Figure 10 compares distributions of manuscripts that receive two accepts when Rho = 0.45. Again, the gray area shows the population. The heaviest black line shows acceptances by the first quintile; 71% of these manuscripts belong in the highest-value 20%, and the average accepted article is more than one standard deviation above the population mean. The medium black line shows acceptances by the second and third quintiles; 38% of these manuscripts belong in the highest-value 20%. The lightest black line shows acceptances by the fourth and fifth quintiles; only 2% of these manuscripts belong in the highest-value 20% of all manuscripts, and the average accepted article has a value slightly below the population mean. All manuscripts in the highest-value 20% receive acceptances from one journal or another.

Figure 10   Manuscripts Accepted After Reviews by Journals in Different Strata when Rho = 0.45

[Figure 10: distribution curves. Vertical axis: Percent of manuscripts. Horizontal axis: Log of true value of manuscript. Legend: Population; First quintile, 71% in highest-value 20% of manuscripts; Second-third quintiles, 38% in highest-value 20% of manuscripts; Fourth-fifth quintiles, 2% in highest-value 20% of manuscripts.]

Figure 11 shows the in-between case when Rho = 0.3: 43% of acceptances by the first quintile belong in the highest-value 20% of all manuscripts, as do 29% of acceptances by the second and third quintiles, and 13% of acceptances by the fourth and fifth quintiles. This figure probably represents a realistic average across the social sciences.

Figure 11   Manuscripts Accepted After Reviews by Journals in Different Strata when Rho = 0.30

[Figure 11: distribution curves. Vertical axis: Percent of manuscripts. Horizontal axis: Log of true value of manuscript. Legend: Population; First quintile, 43% in highest-value 20% of manuscripts; Second-third quintiles, 29% in highest-value 20% of manuscripts; Fourth-fifth quintiles, 13% in highest-value 20% of manuscripts.]

Table 5 summarizes the frequencies of articles with different true values in each quintile of journals. Each column sums to 1.0.

Table 5   Distributions of Manuscripts' True Values for Journals in Different Quintiles

                        Rho = 0.15                            Rho = 0.3                             Rho = 0.45
                        Journal    Journal in    Journal in   Journal    Journal in    Journal in   Journal    Journal in    Journal in
                        in first   second-third  fourth-fifth in first   second-third  fourth-fifth in first   second-third  fourth-fifth
                        quintile   quintiles     quintiles    quintile   quintiles     quintiles    quintile   quintiles     quintiles

Value in lowest 20%     0.09       0.11          0.14         0.05       0.08          0.14         0.01       0.03          0.10
Value in fourth 20%     0.18       0.20          0.23         0.10       0.15          0.23         0.03       0.09          0.28
Value in middle 20%     0.23       0.24          0.25         0.16       0.21          0.26         0.07       0.18          0.38
Value in second 20%     0.26       0.25          0.24         0.25       0.27          0.25         0.17       0.32          0.23
Value in highest 20%    0.23       0.20          0.15         0.43       0.29          0.13         0.71       0.38          0.02
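
The cascade of submissions through successive quintiles can be sketched as a small simulation. The Python code below is a simplified illustration, not the article's actual model: it assumes that log true values are standard normal, that each tier forms a single composite evaluation correlating Rho with log true value (rather than two reviewers plus an editor), that evaluation errors are independent across tiers, and that each tier accepts the top 20% of what it receives. All names and default parameters are hypothetical, so the output should not be expected to reproduce Table 5.

```python
import numpy as np

def simulate_tiers(rho, n_manuscripts=200_000, n_tiers=5, accept_rate=0.2, seed=0):
    """Return (share of each tier's acceptances in the top 20%, fraction published)."""
    rng = np.random.default_rng(seed)
    log_value = rng.standard_normal(n_manuscripts)     # assumed population
    top_cutoff = np.quantile(log_value, 0.8)           # highest-value 20%
    remaining = np.arange(n_manuscripts)               # manuscripts still unpublished
    shares = []
    for _ in range(n_tiers):
        noise = rng.standard_normal(remaining.size)    # fresh error at every tier
        evaluation = rho * log_value[remaining] + np.sqrt(1.0 - rho**2) * noise
        threshold = np.quantile(evaluation, 1.0 - accept_rate)
        accepted = evaluation >= threshold              # tier accepts its top 20%
        shares.append(float(np.mean(log_value[remaining[accepted]] >= top_cutoff)))
        remaining = remaining[~accepted]                # rejects move to the next tier
    published = 1.0 - remaining.size / n_manuscripts
    return shares, published

shares, published = simulate_tiers(rho=0.3)
print("share of acceptances in highest-value 20%, by tier:", shares)
print("fraction of all submissions published:", published)
```

Whatever the value of Rho, five tiers that each accept 20% of what they receive publish 20%, 16%, 12.8%, 10.2%, and 8.2% of the original submissions, or about 67% in total.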

This portrayal of academic publishing does not allow for specialization by journals or for authors selecting journals that focus on specific subtopics or specific methodologies. The analysis assumes that all journals are equally interested in all manuscripts and that all authors send their manuscripts to the highest-quintile journals first. Specialization enables less-prestigious journals to attract higher-value manuscripts, which could decrease differentiation between lower-status and higher-status journals. Less-prestigious journals have probably made gains over the last two decades, as graphed in Figure 2, by developing specialized niches. Many less-prestigious journals also take more risks when selecting manuscripts for publication, which increases the variance in what they publish but also makes them more welcoming to innovative topics or methodologies.

Of course, not all authors send all of their manuscripts to first-quintile journals initially, but it is debatable how much these deviations from the assumptions may affect overall patterns. Authors make misjudgments about their manuscripts at least as often as reviewers do, and because an author's decision is very categorical (whether to submit this manuscript to a journal in that quintile), it would be a noisy decision. Furthermore, authors' choices of where to submit manuscripts reflect factors other than their judgments about manuscripts' true values. Many authors submit to lower-status journals not because they deem their manuscripts to lack value, but because they are hoping for friendlier editors and less-intrusive reviewers.

How Citers' Judgments Correlate with Articles' Citation Values. Table 6 reinterprets average true values implicit in Figures 9 and 10 to consider the stratification of average citation values. The three center columns give the ratios when the correlation between citers' judgments and citation values is 0.15, 0.3, or 0.45. The right-hand column gives averages of the ratios graphed in Figure 2. The table shows that citation patterns of social scientists have a steeper status hierarchy than would be consistent with the belief that citers' judgments correlate 0.15 with articles' citation values, but citation patterns have a much flatter status hierarchy than would be consistent with the belief that citers' judgments correlate 0.45 with articles' citation values. Citation patterns resemble the belief that citers' judgments correlate 0.31 with articles' citation values.

The numbers in Table 6 cast doubt on the idea that Rho could be as high as 0.45. If Rho were that high, citations would be much more stratified than they actually are, because readers of journals would have higher confidence that the best articles appear in the most prestigious journals. Although there are reasons to cite articles other than their true values, it seems unlikely that these other reasons are sufficient to cut the ratios of citation rates for the first-quintile journals from 32.9 to 5.8.

Table 7 takes this notion of citation value further to examine differences among social science fields. Citations of psychology journals are consistent with the belief that citers' judgments correlate between 0.33 and 0.37 with citation values. By contrast, citations of sociology journals are consistent with the belief that this correlation falls between 0.28 and 0.31.

Although the numbers in Table 7 generally resemble those assumed in the analyses of editorial review, the

Table 6   Stratification of the First Quintile, Second and Third Quintiles, and Fourth and Fifth Quintiles of Journals

                                        Ratios of average     Ratios of average     Ratios of average     Observed average ratios
                                        citation values if    citation values if    citation values if    of impact factors in
                                        correlation = 0.15    correlation = 0.30    correlation = 0.45    social sciences

Ratio of first quintile to average      1.7                   4.9                   32.9                  5.8
  of fourth and fifth quintiles
Ratio of first quintile to average      1.2                   2.1                   5.8                   2.4
  of second and third quintiles
Ratio of average of second and third    1.3                   2.4                   5.6                   2.4
  quintiles to average of fourth and
  fifth quintiles
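
As a rough illustration of how an observed ratio can be mapped back to a correlation, the sketch below interpolates between tabulated ratios on a log scale. This is not presented as the author's procedure, which the text does not spell out: the function name `implied_correlation` and the choice of log-linear interpolation are assumptions made here. The tabulated values are the first-quintile mean true values that Table 8 lists for Rho = 0.15 to 0.45 (relative to the fourth and fifth quintiles), and 5.8 is the observed ratio from Table 6.

```python
import math

# Ratio of first-quintile to fourth-fifth-quintile average values, by assumed
# correlation (values as printed in Table 8 of the article).
RHO_GRID = [0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45]
FIRST_TO_BOTTOM = [1.7, 2.2, 3.1, 4.9, 8.7, 16.8, 32.9]


def implied_correlation(observed_ratio, rhos=RHO_GRID, ratios=FIRST_TO_BOTTOM):
    """Hypothetical helper: interpolate, on a log scale, the correlation whose
    tabulated ratio matches the observed ratio."""
    if not ratios[0] <= observed_ratio <= ratios[-1]:
        raise ValueError("observed ratio lies outside the tabulated range")
    target = math.log(observed_ratio)
    points = list(zip(rhos, ratios))
    # Walk through adjacent pairs of table entries and interpolate in the
    # interval that brackets the observed ratio.
    for (r0, q0), (r1, q1) in zip(points, points[1:]):
        lo, hi = math.log(q0), math.log(q1)
        if lo <= target <= hi:
            return r0 + (r1 - r0) * (target - lo) / (hi - lo)
    raise AssertionError("unreachable when the table is monotone")


if __name__ == "__main__":
    print(round(implied_correlation(5.8), 2))
```

Under these assumptions the observed ratio of 5.8 falls between the tabulated ratios for 0.30 and 0.35, and the interpolated value is close to the 0.31 quoted in the text.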

Table 7   Correlations of Citers' Judgments with Articles' Citation Values that Are Consistent with the Impact Factors of Journals in Different Fields

                        Ratios of first       Ratios of first       Ratios of second-
                        quintile to fourth-   quintile to second-   third quintiles to
                        fifth quintiles       third quintiles       fourth-fifth quintiles

Sociology journals      0.30                  0.31                  0.28
Economics journals      0.33                  0.35                  0.28
Management journals     0.33                  0.34                  0.33
Psychology journals     0.37                  0.33                  0.35

two processes are not equivalent. Citation values differ from true values, and the criteria that citers apply differ from the criteria that reviewers apply. However, citation values undoubtedly correlate positively with true value, both citing and reviewing rely on similar perception capabilities, and both processes operate on distributions that are approximately lognormal.

Implications for Faculty Evaluation

Expectations about true values should depend on Rho: The more confidence one has in reviewers' judgments, the higher value one should assign to publishing in prestigious journals. Using average true value of articles in the fourth and fifth quintiles of journals as a norm, the upper half of Table 8 lists implied average true values of articles in the first quintile and second and third quintiles of journals. Although one could use these averages as metrics with which to evaluate articles by individual faculty, every category of journals publishes articles with a very wide range of true values, so average true values for journals actually provide no useful information about any single article. The lower half of Table 8 demonstrates this dispersion by listing ranges of true values that one can expect to find in different quintiles of journals.

Departments and schools rarely base faculty evaluations on a single article, and the ambiguity indicated by Table 8 decreases as researchers produce more manuscripts. Figures 12 and 13 describe expected distributions of articles by a single author as a function of the average true value of that author's manuscripts. For example, when Rho = 0.15, authors whose average manuscripts are at the 90th percentile (1.28 standard deviations above the mean) would publish 32% of their manuscripts in the first quintile of journals, 46% in the second and third quintiles, and 23% in the fourth and fifth quintiles, and none of their manuscripts would remain unpublished. At the opposite extreme, when Rho = 0.45, authors at the 60th percentile would publish 25% of their manuscripts in the first quintile of journals, 34% in the second and third quintiles, and 20% in the fourth and fifth quintiles, and 20% would be rejected by five journals in sequence. These distributions assume that the standard deviation of a hypothetical author's manuscripts is one-fourth of the population standard deviation. Because Figures 12 and 13 describe expected percentages, which would be approximated only by very large samples, small samples of an author's work may depart noticeably from these expectations.

Higher values of Rho give an advantage to authors who consistently produce higher-value manuscripts and give a disadvantage to authors who produce lower-value manuscripts. When Rho is low, journals make erroneous decisions more often, and poorer manuscripts have higher probabilities of acceptance by higher-prestige journals. However, according to Figure 13, even when Rho = 0.45, an author whose manuscripts fall into the 90th percentile would publish only 77% in the first quintile of journals, and the other 23% would appear in the second and third quintiles. Under the more plausible assumption that Rho = 0.3, an author whose manuscripts fall into the 90th percentile would publish only half in the first quintile of journals, and would publish the other half in the second and third quintiles. Figures 12 and 13 imply that many authors meet repeated failure. In fact,

Table 8   Means and Probable Ranges of the Mean True Values of Journals as Functions of Rho

                          Rho = 0.15   Rho = 0.2    Rho = 0.25   Rho = 0.3    Rho = 0.35   Rho = 0.4    Rho = 0.45

Mean true values
First quintile            1.7          2.2          3.1          4.9          8.7          16.8         32.9
Second-third quintiles    1.3          1.5          1.9          2.4          3.1          4.2          5.6
Fourth-fifth quintiles    1.0          1.0          1.0          1.0          1.0          1.0          1.0

Ranges that encompass 80% of the true values
First quintile            0.1 to 19    0.2 to 27    0.2 to 41    0.3 to 68    0.4 to 121   0.7 to 228   1.3 to 446
Second-third 40%          0.1 to 14    0.1 to 18    0.1 to 23    0.2 to 29    0.2 to 36    0.3 to 45    0.5 to 56
Fourth-fifth quintiles    0.1 to 10    0.1 to 11    0.1 to 11    0.1 to 10    0.1 to 8.5   0.1 to 6.7   0.2 to 5.6
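
For readers who want to apply the upper half of Table 8 as weights, here is a minimal sketch. The dictionary `MEAN_TRUE_VALUE`, the function `weighted_record`, and the example publication counts are illustrative assumptions, not part of the article; only the weights come from Table 8, and only three of its seven Rho columns are included.

```python
# Mean true values from the upper half of Table 8, expressed relative to
# articles in fourth-fifth-quintile journals (three of the seven Rho columns).
MEAN_TRUE_VALUE = {
    0.15: {"first": 1.7, "second_third": 1.3, "fourth_fifth": 1.0},
    0.30: {"first": 4.9, "second_third": 2.4, "fourth_fifth": 1.0},
    0.45: {"first": 32.9, "second_third": 5.6, "fourth_fifth": 1.0},
}


def weighted_record(counts, rho):
    """Hypothetical helper: sum article counts weighted by the mean true value
    of the journal stratum in which each article appeared."""
    weights = MEAN_TRUE_VALUE[rho]
    return sum(weights[stratum] * n for stratum, n in counts.items())


if __name__ == "__main__":
    # Made-up record: 2 first-quintile, 3 second-third, 5 fourth-fifth articles.
    record = {"first": 2, "second_third": 3, "fourth_fifth": 5}
    for rho in sorted(MEAN_TRUE_VALUE):
        total = weighted_record(record, rho)
        print(f"Rho = {rho:.2f}: weighted total = {total:.1f}")
```

Because the lower half of Table 8 shows very wide ranges around each mean, such totals are only rough summaries, and the ordering of two publication records can change with the assumed Rho.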

Trieschmann et al. (2000) reported that authors from the top 50 business schools account for 70% of the articles in 20 prestigious business journals.

Figure 12   Expected Distributions of Numerous Manuscripts by an Author Whose SD = 0.25 when Rho = 0.15
(Chart. Horizontal axis: percentile of an author's average manuscript compared to all submissions by all authors. Vertical axis: percent of manuscripts. Series: top quintile, second-third quintiles, two bottom quintiles, unpublished.)

Figure 13   Expected Distributions of Numerous Manuscripts by an Author Whose SD = 0.25 when Rho = 0.45
(Chart. Horizontal axis: percentile of the author's average manuscript compared to all submissions by all authors. Vertical axis: percent of manuscripts. Series: top quintile, second-third quintiles, two bottom quintiles, unpublished.)

I have been unable to think of a rationale other than laziness or simplicity that would support actually ignoring publications that are not in top-tier journals. Any decision rule that discards potential information cannot perform better than an equally reasonable rule that utilizes all information. Furthermore, because even articles in the lowest quintile of journals may actually belong among the best 20% written, it makes no sense to dismiss these articles as valueless merely based on where they appeared. For those who insist on basing assessments solely on the journals in which articles appear, a more rational evaluation scheme would be to use the average true values associated with different categories of journals, as listed in Table 8.

Looking Only at Counts of Articles in "A" Journals. It is possible to draw some inferences about the average value of an author's manuscripts based solely on information about the percentage of publications in top-tier journals, and these inferences have some surprising characteristics. We will consider first inferences possible from very large numbers of articles and then inferences possible from just 20 articles.

Figure 14 shows how the percentiles of authors' average manuscripts vary with the percentage that appeared in the highest-quintile journals. The steepest lines made of small dashes represent the assumption that Rho = 0.15, the least-steep lines made of alternating short and long dashes represent the assumption that Rho = 0.45, and the solid lines represent the assumption that Rho = 0.3. Heavy black lines describe expected average values, and thinner gray lines describe 80% confidence intervals that assume the author has produced numerous manuscripts. For example, if Rho = 0.15, authors with 40% of their publications in the first quintile of journals have published manuscripts with an average value in the 86th to 90th percentiles. However, if Rho = 0.45, authors with 40% of their publications in the first quintile journals have published manuscripts with an average value in the 44th to 62nd percentiles. The curves for Rho = 0.15 terminate near the center of the chart because the randomness of review processes makes it impossible to place more than 60% of a large sample in the highest-quintile journals.

Figure 14   Percentile of an Author's Average Manuscript as a Function of the Percentage Published by Top-Quintile Journals, Numerous Manuscripts in Sample
(Chart. Horizontal axis: percentage of numerous manuscripts published by top-quintile journals. Vertical axis: percentile of author's average manuscript among all submissions by all authors. Series: 10% limit, mean, and 90% limit for Rho = 0.15, Rho = 0.3, and Rho = 0.45.)

Figure 14 shows an interesting effect of Rho. If one believes Rho = 0.3 or 0.45, one should have a higher opinion of authors who have published less than 20% in the highest-quintile journals than if one believes Rho = 0.15. If Rho = 0.15, placing a very small percentage of articles in the highest-quintile journals is a strong sign of

low average value because review processes are so random that mediocre articles get into the highest-quintile journals by accident. But conversely, if one believes Rho = 0.3 or 0.45, one should have a lower opinion of authors who have published more than 30% in the highest-quintile journals than if one believes Rho = 0.15. If Rho = 0.15, placing a high percentage of articles in the highest-quintile journals is a strong sign of high average value because work must have very high value to overcome the randomness of review processes. For instance, an author who places 40% to 60% of a very large sample in the first-quintile journals should be deemed in the 86th to 93rd percentile if Rho = 0.15, in the 59th to 84th percentile if Rho = 0.3, or in the 44th to 69th percentile if Rho = 0.45.

Its assumption of a very large sample limits the usefulness of Figure 14. A practically useful analysis must allow for small to moderately large samples. Actual publications are binomial processes with each author submitting a limited number of manuscripts and with probabilities of acceptance set by review processes. Figure 15 shows what happens to Figure 14 when authors have produced 20 articles. Again, the steepest lines made of small dashes represent the assumption that Rho = 0.15, the flattest lines made of alternating small and large dashes represent the assumption that Rho = 0.45, and the solid lines represent the assumption that Rho = 0.3. The heavy black lines describe expected average values, and the thinner gray lines describe 80% confidence intervals.

Figure 15   Percentile of an Author's Average Manuscript as a Function of the Percentage Published by Top-Quintile Journals, 20 Manuscripts in Sample
(Chart. Horizontal axis: number of manuscripts published by top-quintile journals, out of 20 manuscripts. Vertical axis: percentile of author's average manuscript among all submissions by all authors. Series: 10% limit, mean, and 90% limit for Rho = 0.15, Rho = 0.3, and Rho = 0.45.)

Although the small sample widens the confidence intervals, two regions have rather clear implications. The blank area at the upper left shows that an author who has published 20 articles with 3 or fewer in the first-quintile journals is very likely to be producing research that averages below the 60th percentile. Likewise, the blank area at the lower right shows that an author who has published 20 articles with 4 or more in the first-quintile journals is very unlikely to be producing research in the lowest 30% of submissions. The different values of Rho have similar implications when authors have published 3 to 7 articles in the first quintile out of 20. Authors who have published 5 articles out of 20 in the first quintile may have been producing manuscripts whose average value is as high as the 80th percentile or as low as the 44th percentile if Rho = 0.15, as high as the 71st percentile or as low as the 35th percentile if Rho = 0.3, and as high as the 65th percentile or as low as the 30th percentile if Rho = 0.45.

Because articles published in high-prestige journals receive significantly more citations, departments and schools that strongly emphasize publication in the most-prestigious journals may be pursuing visibility, which attracts students and faculty. However, actual citation behavior does not support an exclusive focus on high-prestige journals. On average, 2.4 articles in the second and third quintiles of journals draw as many citations as 1 article in the first quintile of journals, and 5.8 articles in the fourth and fifth quintiles of journals draw as many citations as 1 article in the first quintile of journals. These ratios vary, however, across academic fields. Top-tier publications attract higher percentages of citations in psychology and lower percentages in sociology. Moreover, if departments and schools want to measure individuals' visibility, in addition to using journals' prestige as a proxy, they can use citations of specific articles, appearances at conferences, and even solicited data about reputations.

Conclusion

Although higher-prestige journals publish more high-value articles, editorial selection involves considerable randomness. Highly prestigious journals publish quite a few low-value articles, low-prestige journals publish some excellent articles, and excellent manuscripts may receive successive rejections from several journals. Evaluating articles based primarily on which journals published them is more likely than not to yield incorrect assessments of articles' values. Yet personnel evaluations by many departments and schools seem to underestimate or even to ignore this randomness, and in extreme cases these evaluations focus on one myopic measure.

For most departments and schools, extreme emphasis on publication in top-tier journals has a significant probability of introducing randomness because the confidence intervals associated with such publications are very wide. Furthermore, an evaluation process that discards potentially useful information is less sensible than a process that takes advantage of all available information. However, departments and schools could use publication in top-tier journals as a reliable criterion for personnel who have published many articles (say,

more than 50) with high percentages of these articles in top journals. Indeed, widespread use of this criterion would help elite departments and schools to retain their statuses by enabling them to select and retain excellent researchers while adding randomness into the personnel decisions of most departments and schools. If lower-status departments and schools are adopting this criterion to imitate their more successful competitors, imitation is placing the imitators at a disadvantage relative to those they are imitating. Might it be possible to find evidence about whether departments and schools fare differently over time if they use different criteria in personnel decisions?

People who are dealing with unreliable data should attempt to cross-check their data by triangulating different kinds of data and data from different sources. Even departments and schools that could use publication in top-tier journals as a reliable criterion could give their decisions more reliable foundations by taking additional evidence into account and by analyzing all available data in some depth. Of course, it is debatable whether better data would produce better personnel decisions, but that is a topic for another article. Personnel decisions entail even more uncertainty than does peer review. For one thing, universities are making commitments for long-term future periods whereas data concern past activities. For another thing, faculty migration gives individuals opportunities to overcome adverse decisions and it undercuts universities' efforts to overtake higher-status competitors.

The simulations behind Figure 4 and Table 4 suggest some practical implications for editors. If the correlation between reviewers is below 0.4, which is very likely, editors should assume that the reviewers' judgments correlate weakly with manuscripts' true values. Only when there is a fairly high correlation between reviewers should editors consider reviewers' judgments to yield useful information about manuscripts' true values. Furthermore, editors are more likely to obtain useful information about manuscripts' true values if they choose reviewers who hold different values from each other.

The simulations and algebraic analyses in this article assume that reviewers' opinions have the same variance as the population of submissions. I also experimented with an alternative assumption that reviewers' opinions have only half the population variance, which amounts to saying that reviewers judge manuscripts as being more similar than they really are. This assumption greatly reduces agreement between reviewers and greatly reduces correlations between reviewers' opinions and manuscripts' shared values. Thus, editors receive less-useful information when they use reviewers whose judgments have restricted ranges.

Authors of research manuscripts can draw similar inferences from reviewers' judgments. That is, if reviewers disagree with each other, authors should infer that reviewers' judgments say little about the true values of manuscripts, but if reviewers agree fairly strongly, authors should infer that reviewers' judgments reflect the true values of manuscripts to some degree. When they agree with each other, reviewers who seem to espouse different values are probably offering especially informative judgments.

Of course, reviewers who agree strongly are not prevalent, and reviewers are especially prone to disagree about a specific manuscript: what does it do right; what does it need? In particular, reviewers' comments about research methodology may be cloaking agreement or disagreement of their personal values with studies' findings. Authors should expect to receive inconsistent reviews, and they dare not rely on editors and reviewers to tell them what to do. Because academic research elicits mainly negative judgments, authors dare not let negative reviews undermine their confidence in their work. Although it is useful to listen to what reviewers say (Starbuck 2003), authors need to base their ultimate decisions on their own expertise and their own values.

The belief that high-status journals publish excellent articles whereas low-status journals publish poor articles may be impeding the development of knowledge. The analysis in this article indicates that 29% to 77% of the articles in the first quintile of journals do not belong among the highest-value 20% of manuscripts, the in-between estimate being 57%. If publication in high-status journals leads social scientists to adopt less valuable articles as exemplars, mediocre articles are exerting as much or more influence on scientific values as are excellent articles. Like people making personnel decisions, researchers should triangulate different kinds of evidence about scientific contributions and draw evidence from diverse sources. Before one can deal effectively with randomness, one must acknowledge its existence.

Acknowledgments

This article has benefited from contributions from Linda Argote, Joel Baum, Tom Cummings, Joan Dunbar, Ned Elton, Moshe Farjoun, Bruno Frey, Ari Ginsberg, Paul Hirsch, Marshall Meyer, Wanda Orlikowski, Jeff Pfeffer, Karlene Roberts, Sandra Robinson, Josh Ronen, Ehsan Soofi, and Mike Tushman, and especially from extensive comments by Art Bedeian, Roger Dunbar, Joanne Martin, Bill McKelvey, Joe Porac, and Sim Sitkin.

Endnotes

1 Statistics for Academy of Management Journal and Academy of Management Review are averages of data reported by Angelo DeNisi and Susan Jackson to the Academy's Board of Governors in 1996.

2 Both intraclass correlation coefficients and Kappas characterize the consistency with which nominal variables were classified on multiple occasions or by multiple classifiers. Both measures are supposed to equal zero when there is no consistency, and both can take negative and positive values.

However, it may be impossible for either intraclass correlation coefficients or Kappas to have absolute values as large as 1.0. One can use different weighting schemes when calculating Kappas, and calculations in this article use linear weights.

3 The first set of simulations explored properties of the measures of agreement: intraclass correlations, Kappas, and product-moment correlations. The computer generated a first reviewer's opinion as a normally distributed variable with a mean of zero and variance of 1, and then generated a second reviewer's opinion as a normally distributed variable with a mean of R times the first reviewer's opinion, R being an assumed correlation between reviewers. Because this correlation transmits some variance from the first reviewer to the second reviewer, if the first reviewer has a variance of 1, the generated opinions of the second reviewer must have a variance of 1 − R-squared in order that both opinions will have the same variance. Both reviewers' opinions were then converted to the categories accept, revise, and reject or the categories accept, minor revise, revise, drastic revise, and reject. Categorization used values of the normal distribution that would hypothetically produce frequencies of 17%, 25%, and 58% or 17%, 12.5%, 12.5%, 29%, and 29%. For example, a reviewer who recommends acceptance if the normal deviate is above 0.954 would accept 17% of the manuscripts. After this process was repeated for a sample of 50 manuscripts, the classified judgments were used to compute an intraclass correlation, a Kappa, and a product-moment correlation. All of the foregoing was repeated 50 times for R = −0.95, 50 times for R = −0.90, and so forth up to 50 times for R = 0.95. Results were recorded, then all of the above was repeated for samples of 100, 200, 500, 1,500, and 15,000 manuscripts. These calculations allowed estimates of the standard deviations of the measures in Table 2, and they showed that intraclass correlations and product-moment correlations give very similar numbers whereas Kappas generally have smaller absolute values.

A second set of simulations looked at relations between reviewers' judgments, which place manuscripts in a few discrete categories, and their finely graded opinions of manuscripts. The computer generated a normally distributed shared value for each manuscript, and then two opinions that correlated with this shared value. Both reviewers' opinions were normally distributed variables with means of R × (shared value) and with variances of 1 − R-squared. Opinions were classified in three or five categories with the expected frequencies of 17%, 25%, and 58% or 17%, 12.5%, 12.5%, 29%, and 29%. A sample of 50 manuscripts was generated and product-moment correlations were computed between opinions and between judgments. This process was repeated 10 times for R = 0, R = 0.01, R = 0.02, and so on up to R = 0.99, then results were recorded. All of the above was repeated for sample sizes of 50, 100, 200, 500, 5,000, and 15,000. These calculations supported estimates of the ratios of the correlations between opinions versus the correlations between judgments as well as standard deviations for these ratios.

A third set of simulations investigated relations of reviewers' opinions and judgments with manuscripts' true values. It had a logical flow similar to Figure 3. The computer generated a normally distributed true value for each manuscript, then generated a normally distributed shared value that correlated with this true value, and then generated two opinions that correlated with this shared value. The shared value was normally distributed with a mean of R × (true value) and with a variance of 1 − R-squared, and the two reviewers' opinions were normally distributed variables with means of P × (shared value) and with variances of 1 − P-squared. Opinions were classified in three or five categories with the expected frequencies of 17%, 25%, and 58% or 17%, 12.5%, 12.5%, 29%, and 29%. A sample of 50 manuscripts was generated and product-moment correlations were computed between opinions and between judgments. This process was repeated 10 times for P = 0, P = 0.05, P = 0.1, and so on up to P = 0.95, then results were recorded. All of the above was repeated for sample sizes of 50, 100, 200, 500, and 1,000 and for R = 0.25, 0.5, 0.75, and 0.95. These calculations verified the conjecture that the assumed value of R affords a good basis for estimating the correlation of each reviewer's opinions with true value as well as regression estimates of the standard deviations of this relationship (Figure 4).

4 Citations are skewed and approximately lognormal. ISI cataloged citations to 1,705 social science journals that received citations during 1999. During 1999, these journals published over 70,000 articles, which received nearly 2 million citations, almost all of these citations to articles published before 1999. In fact, the articles published during 1997, 1998, and 1999 account for only about 300,000 of the 2 million citations, so the bulk of citations refer to articles that are more than two years old. On average, articles published by these journals during 1997 and 1998 received 0.5 citations during 1999; the most-cited quintile had an average impact of 1.93, the second and third quintiles had an average impact of 0.67, and the fourth and fifth quintiles had an average impact of 0.18.

In mid-2004, ISI's listing of most-cited articles included 600 articles published during 1999; these had received at least 10 citations from 1999 to mid-2004, and as many as 193 citations. For these 600 articles, the logarithm of the number of times they were cited correlated 0.44 with the logarithm of the 1999 impact ratings of the journals that published them. Among the 1,705 journals, 1,536 journals (90%) had no articles among the most-cited 600 and 169 had one or more articles among the most-cited 600. Only two journals (Behavioral and Brain Sciences and Journal of Economic Literature) had more than 50% of the articles they published among the most-cited 600, and just eight other journals had over 30% of the articles they published among the most-cited 600. The foregoing statistics only describe citations by journals to journals; citations to books or by books are not included.

5 The calculations assume that the editor and the two reviewers have slightly different correlations with true value: One reviewer has a correlation of 1.2 × Rho, the other has a correlation of 0.8 × Rho, and the editor has a correlation of Rho. For example, if the editor's correlation is 0.2, one reviewer's correlation is 0.24 and the other reviewer's correlation is 0.16. This assumption seems more realistic than three identical reviewers, but it has a negligible effect on the inferences.

References

Amin, M., M. Mabe. 2000. Impact factors: Use and abuse. Perspectives in Publishing. Elsevier Science, http://www.elsevier.com/framework_editors/pdfs/Perspectives1.pdf.
Armstrong, J. S. 1997. Peer review for journals: Evidence on quality control, fairness, and innovation. Sci. Engrg. Ethics 3 63–84.

Bailar, J. C. 1991. Reliability, fairness, objectivity and other inappropriate goals in peer review. Behavioral Brain Sci. 14 137–138.
Baumeister, R. E. 1990. Dear journal editor, it's me again: Sample cover letter for journal manuscript resubmissions. Dialogue 5(Fall) 16.
Bedeian, A. G. 1996a. Thoughts on the making and remaking of the management discipline. J. Management Inquiry 5 311–318.
Bedeian, A. G. 1996b. Improving the journal review process: The question of ghostwriting. Amer. Psychologist 51 1189.
Bedeian, A. G. 2003. The manuscript review process: The proper roles of authors, referees, and editors. J. Management Inquiry 12 331–338.
Bedeian, A. G. 2004. Peer review and the social construction of knowledge in the management discipline. Acad. Management Learn. Ed. 3 198–216.
Bowen, D. D., R. Perloff, J. Jacoby. 1972. Improving manuscript evaluation procedures. Amer. Psychologist 27 221–225.
Calhoun, M. A., W. H. Starbuck. 2003. Barriers to creating knowledge. M. Mark Easterby-Smith, M. A. Lyles, eds. Handbook of Organizational Learning and Knowledge Management. Blackwell, Oxford, UK, 473–492.
Campanario, J. M. 1996. Have referees rejected some of the most-cited papers of all times? J. Amer. Soc. Inform. Sci. 47 302–310.
Campanario, J. M. 1998a. Peer review for journals as it stands today, Part 1. Sci. Comm. 19 181–211.
Campanario, J. M. 1998b. Peer review for journals as it stands today, Part 2. Sci. Comm. 19 277–306.
Cicchetti, D. V. 1980. Reliability of reviews for the American Psychologist. Amer. Psychologist 35 300–303.
Cicchetti, D. V. 1985. A critique of Whitehurst's "Interrater agreement for journal manuscript reviews": De omnibus, disputandem est. Amer. Psychologist 40 563–568.
Cicchetti, D. V. 1991. The reliability of peer review for manuscript and grant submissions: A cross-disciplinary investigation. Behavioral Brain Sci. 14 119–186.
Gottfredson, D. M., S. D. Gottfredson. 1982. Criminal justice and (reviewer) behavior: How to get papers published. Criminal Justice Behavior 9(3) 259–272.
Gottfredson, S. D., W. D. Garvey, J. E. Goodnow. 1977. Quality indicators in the scientific journal article publication process. JSAS Catalog Selected Doc. Psych. 7 74.
Hargens, L. L. 1990. Variation in journal peer review systems: Possible causes and consequences. J. Amer. Medical Assoc. 263 1348–1352.
Hargens, L. L., J. R. Herting. 1990. Neglected considerations in the analysis of agreement among journal referees. Scientometrics 19 91–106.
Heider, F. 1958. The Psychology of Interpersonal Relations. Wiley, New York.
Hendrick, C. 1976. Editorial comment. Personality Soc. Psych. Bull. 2 207–208.
Hendrick, C. 1977. Editorial comment. Personality Soc. Psych. Bull. 3 12.
Holbrook, M. B. 1986. A note on sadomasochism in the review process: I hate when that happens. J. Marketing 50 104–106.
Horrobin, D. F. 1990. The philosophical basis of peer review and the suppression of innovation. J. Amer. Medical Assoc. 263(10) 1438–1441.
Kiesler, C. A. 1991. Confusion between reviewer reliability and wise editorial and funding decisions. Behavioral Brain Sci. 14 151–152.
Lindsey, D. 1978. The Scientific Publication System in Social Science: A Study of the Operation of Leading Professional Journals in Psychology, Sociology, and Social Work. Jossey-Bass, San Francisco, CA.
Luce, D. R., E. Galanter 1963. Discrimination. D. R. Luce, R. R. Bush, E. Galanter, eds. Handbook of Mathematical Psychology. John Wiley & Sons, New York, 191–243.
Mahoney, M. J. 1977. Publication prejudices: An experimental study of confirmatory bias in the peer review system. Cognitive Therapy Res. 1 161–175.
Mahoney, M. J. 1979. Psychology of the scientist: An evaluative review. Soc. Stud. Sci. 9(3) 349–375.
Cicchetti, D. V. 2003. The peer review of scientific documents: Suggestions for improvements. Presentation to the Committee on
Research in Education, National Research Council, Washington, Mahoney, M. J., A. E. Kazdin, M. Kenigsberg. 1978. Getting pub-
D.C. (February 26). lished. Cognitive Therapy Res. 2 6970.
Cicchetti, D. V., L. D. Eron. 1979. The reliability of manuscript Marsh, H. W., S. Ball. 1981. Interjudgmental reliability of review
reviewing for the Journal of Abnormal Psychology. Proc. Amer. for the Journal of Educational Psychology. J. Ed. Psych. 73
Statist. Assoc. 22 596600. 872880.
DiMaggio, P. J., W. W. Powell. 1983. The iron cage revisited: Insti- Nylenna, M., P. Riis, Y. Karlsson. 1994. Multiple blinded reviews of
tutional isomorphism and collective rationality in organizational the same two manuscripts: Effects of referee characteristics and
elds. Amer. Sociological Rev. 48 147160. publication language. J. Amer. Medical Assoc. 272 149151.
Ellison, G. 2002. The slowdown of the economics publishing process. Palacios-Huerta, I., O. Volij. 2004. The measurement of intellectual
J. Political Econom. 110 947993. inuence. Econometrica 72 963978.
Frey, B. S. 2003. Publishing as prostitution? Choosing between ones Peters, D. P., S. J. Ceci. 1982. Peer-review practices of psychological
own ideas and academic success. Public Choice 116 205223. journals: The fate of published articles, submitted again. Behav-
ioral Brain Sci. 5 187255.
Gans, J. S., G. B. Shepherd. 1994. How are the mighty fallen:
Rejected classic articles by leading economists. J. Econom. Pfeffer, J., A. Leong, K. Strehl. 1977. Paradigm development and
Perspect. 8 165179. particularism: Journal publication in three scientic disciplines.
Soc. Forces 55 938951.
Gottfredson, S. D. 1977. Scientic quality and peer-group consen-
sus. Dissertation Abstracts International 38 1950B, University Rousseeuw, P. J. 1991. Why the wrong papers get published. Chance:
Microlms No. 7719, Johns Hopkins University, Baltimore, New Directions Statist. Comput. 4(1) 4143.
MD, 588. Scarr, S., B. L. R. Weber. 1978. The reliability of reviews for the
Gottfredson, S. D. 1978. Evaluating psychological research reports: American Psychologist. Amer. Psych. 33 935.
Dimensions, reliability, and correlates of quality judgments. Schminke, M. 2002. From the editors. Acad. Management J. 45
Amer. Psychologist 33(10) 920934. 487490.

Scott, W. A. 1974. Interreferee agreement on some characteristics of manuscripts submitted to the Journal of Personality and Social Psychology. Amer. Psychologist 29 698–702.
Seidl, C., U. Schmidt, P. Grösche. 2001. A beauty contest of referee processes of economics journals. Manuscript, University of Kiel, Kiel.
Smigel, E. D., H. L. Ross. 1970. Factors in the editorial decision. Amer. Sociologist 5(February) 19–21.
Starbuck, W. H. 2003. Turning lemons into lemonade: Where is the value in peer reviews? J. Management Inquiry 12 344–351.
Stigler, S. M. 1994. Citation patterns in the journals of statistics and probability. Statist. Sci. 9 94–108.
Stinchcombe, A. L., R. Ofshe. 1969. On journal editing as a probabilistic process. Amer. Sociologist 4 116–117.
Trieschmann, J. S., A. R. Dennis, G. B. Northcraft, A. W. Niemi, Jr. 2000. Serving multiple constituencies in business schools: MBA program versus research performance. Acad. Management J. 43 1130–1141.
West, B., M. Shlesinger. 1990. The noise in natural phenomena. Amer. Scientist 78(January–February) 40–45.
Whitehurst, G. J. 1984. Interrater agreement for journal manuscript reviews. Amer. Psychologist 39 22–28.
Wolff, W. M. 1970. A study of criteria for journal manuscripts. Amer. Psychologist 25 636–639.
