Peer Review


Peer review is a term of art covering a set of practices that collect and apply the judgment of expert reviewers (identified as "expert," not just "knowledgeable"—so the designation is a political justification as well as a substantive one) to decisions about which manuscripts to publish, which proposals to fund, and which programs to sustain or trim. Peer review and its variants are preferred in science not only because they bring appropriate expertise to bear on decisions, but also because they assert the professional autonomy of scientists. The review of original ideas grounded in acceptable evidence certifies the accuracy, validity, and heuristic value of results. Peer reviewers are collegial critics who contribute uniquely to this competitive negotiation process by allocating scarce resources—money, time, space—and the career capital they help to generate. Outcomes based on peer review thus concentrate or disperse available resources over a pool of eligible competitors, advancing collective knowledge and practice, on the one hand, and individual careers, on the other.

Although peer review is a highly valued process, it nevertheless lacks careful or rigid definition. What constitutes a "peer" may be disputed, the factors to be considered by reviewers may vary, and the weights accorded their judgments are likely to be unequal. Moreover, there is probably an inverse relationship between knowledge and conflict of interest: the smaller a circle of peers, the more sound and nuanced their knowledge of an area, but the more likely that these peers are friends or maintain potentially compromising relations with those being reviewed. How those relations are restrained to preserve balanced judgment is a challenge for peer review procedures.

For example, there are no hard-and-fast rules about how a peer-reviewed journal, versus a non-peer-reviewed journal, will decide what to publish. The former, however, are valued for presumed standards of rigor and fairness that often carry scientific and academic prestige. Likewise, some peer review processes for grants are blind to reviewer and reviewed (proposer) alike; others only to the reviewed. Reviewers may vary widely in number and characteristics (demographic, intellectual, national, or organizational context) and may shade reviews in unanticipated ways. Finally, the collective judgment represented by peer review is sometimes deemed unassailable, often just advisory. Corporations and government agencies also employ peer review—internally or with external reviewers mixed in—to assess the quality of the science destined for reports, decisions, or policy recommendations. Peer review in scientific and technological contexts has been the most closely analyzed, but the level of detail available to analysts is limited by the anonymity and confidentiality promised by editors and agency stewards.


Origins and Purposes

Peer review of scientific manuscripts dates back to the Philosophical Transactions of the Royal Society of the mid-seventeenth century. The origins of peer review for grants are more recent and murky. The National Advisory Cancer Council, established in 1937, was authorized to review applications for funding and to "certify approval" to the surgeon general. The Office of Naval Research developed an informal variant of peer review, which may have been brought to the National Science Foundation (NSF) when Alan T. Waterman became its first director. Peer review is not mentioned in the NSF founding legislation, but the agency is known as its foremost practitioner (England 1983). The more widespread development and application of peer review processes has occurred episodically since the 1960s.

Understanding peer review requires reflection on both its purposes and values. Peer review circulates research ideas in their formative stages to key gatekeepers in a field. Sometimes this signals others to avoid duplication of effort. Other times it calls attention to a problem that is promising, attracting other researchers and setting off a race for priority (for example, work on cancer genes). Thus, by the time new research is finally published, aspects of its findings and methods may be generally familiar to many in the field, speeding its acceptance and utilization while drawing constructive criticism.

Peer review may also bring values beyond scientific or technical quality to research funding decisions. These values may be overriding or subtle, and they relate to ways in which peer review is grounded in a democratic context. History attests to the political contamination of science and other forms of malpractice, such as Nazi attempts to control science in Germany or the manipulation of genetics in the Soviet Union by Trofim Lysenko (1898–1976) (Chubin 1985). Indeed, after the Cold War, many postsocialist countries sought to replicate peer review practices used in the West. In most cases, they sought a system of government distribution of research funds that favored the quality of ideas over professional stature alone. In contrast, during the same period, the United States fine-tuned its peer review practices to achieve other goals. For example, NSF program officers try to balance their portfolios by taking account of geographic distribution, age, gender, or ethnicity of investigators; research participation of four-year colleges or historically black colleges and universities; or the hotness of a topic or method. At the National Institutes of Health (NIH), advisory councils are empowered to recommend some proposals for funding because they address urgent national needs (U.S. GAO 1999).

With the Government Performance and Results Act of 1993 requiring U.S. research agencies to show that their investments yield societal benefits, some wonder whether scientific experts are the best-qualified reviewers to render such judgments (NAPA 2001). At NSF, reviewers now must address two merit review criteria: scientific merit and broader social impact (two other criteria were dropped because they were routinely ignored or deemed too difficult to measure). The latter encompasses educational benefits ranging from precollege outreach, increased participation of students from underrepresented groups, and enlarged undergraduate research experiences to ways of enhancing public understanding of the scientific content of workaday processes and outcomes.

A relatively recent innovation allows more direct citizen participation in scientific and technical allocation decisions. The Dutch Technology Foundation, for example, has augmented traditional peer review with lay review by citizens. In the United States, activist and support groups for various diseases have applied similar pressure, especially at NIH (which uses a quantitative scoring system that leaves little room for study section or institute director discretion). Other federal agencies, such as the Office of Naval Research, the Defense Advanced Research Projects Agency, and parts of the Department of Energy and the Department of Agriculture, limit their use of external peer reviewers to the identification of riskier but potentially highly rewarding areas of research and development. In the end, who participates in the process redefines peer and alters the purpose of the review.

Peer review allows scientists to make recommendations in a privileged zone, apart from the general public. It creates the expectation that the principles of fair and ethical behavior embedded in professional culture will be observed. This may seem inconsistent with the principle of public participation, but should be understood as reflecting the role of peer review as a boundary process that demarcates the limits of authority based on credentials or power. When participation crosses borders, participants carry the distinctive characteristics of their professional region (Gieryn 1999, Guston 2000). A good review system thus preserves professional autonomy while permitting lay participation. This balances deference to expert evaluation against sensitivity to societal needs and extrascientific values (concerning research applications, risks and benefits to whom, and long- versus short-term consequences) (Atkinson and Blanpied 1985).


Ethical Dimensions

Precisely because peer review is a highly valued process that spans the boundaries of several social worlds—science and policy, research and practice, academe and bureaucracy, public and private—its purposes and meaning may be understood differently across communities and at different times in the history of a single community. Focusing primarily on peer review as a process for managing scientific publication and grant funding, what follows is a brief review of some of the value- and ethics-related dimensions that often manifest themselves as competing understandings and aspirations. (For elaboration, see Chubin and Hackett 1990.)


OPENNESS AND SECRECY. Peer review is in principle open to the community of qualified scientists as proposers or reviewers. The process of peer review (its procedures, criteria, rating scales, and the like) is knowable, transparent (or at least translucent), and held to account for its workings and outcomes. But the criteria themselves are seldom discussed.

Peer review is also secret. Confidentiality is sacrosanct, and anonymity is assured throughout much of the process. Meetings are typically closed, with proposals, reviews, and panel discussions deemed privileged information. To outsiders, who participates and how they are chosen can seem mysterious, and the identities of the reviewers—who represent the intellectual community-at-large—are generally not disclosed.

EFFECTIVENESS AND EFFICIENCY. Peer review is asked to be effective—to recommend projects that would advance knowledge and confer social benefit. But it is also asked to be efficient, to operate at low cost (e.g., for travel and reviewer compensation) and minimize the burden imposed on proposal writers and reviewers alike.

How realistic are these expectations? A thorough review might take half a day, but reviewers are usually not paid for their services. Of course, the reviewer is partly compensated by learning what constitutes a fundable proposal and gaining access to unpublished ideas and data.

Nonetheless, a low success rate—10 to 20 percent in many agencies these days—reduces the expected return (to proposers and agencies alike) on the investment of effort. Hence the invention of two-stage proposal processes, in which a short preliminary proposal is screened into or out of the more competitive second stage.
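The arithmetic behind this is easy to sketch. The following toy calculation (all numbers are invented for illustration; agencies publish their own figures) compares community-wide writing effort per funded award under one-stage and two-stage review, holding the overall success rate fixed.

```python
# A back-of-the-envelope sketch of why low success rates make two-stage
# review attractive. All numbers are illustrative assumptions, not
# figures from any agency.

FULL_PROPOSAL_HOURS = 120   # assumed effort to write a full proposal
PRE_PROPOSAL_HOURS = 20     # assumed effort for a short preliminary proposal
SUCCESS_RATE = 0.15         # 10-20 percent is typical, per the text
PRESCREEN_PASS_RATE = 0.4   # assumed fraction invited to submit full proposals

def hours_per_award_single_stage() -> float:
    """Community-wide writing effort expended per funded award."""
    return FULL_PROPOSAL_HOURS / SUCCESS_RATE

def hours_per_award_two_stage() -> float:
    """Every applicant writes a preproposal; only those screened in
    write a full proposal. Overall success rate is held fixed."""
    hours_per_applicant = (PRE_PROPOSAL_HOURS
                           + PRESCREEN_PASS_RATE * FULL_PROPOSAL_HOURS)
    return hours_per_applicant / SUCCESS_RATE

if __name__ == "__main__":
    print(f"single stage: {hours_per_award_single_stage():.0f} hours/award")
    print(f"two stage:    {hours_per_award_two_stage():.0f} hours/award")
```

Under these assumptions the preliminary screen cuts writing effort per award from roughly 800 hours to about 450, which is the efficiency argument for the two-stage design.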


SENSITIVITY AND SELECTIVITY. The peer review system is asked to be highly sensitive and highly selective of research projects at the same time. A sensitive review system would detect the merit in every worthwhile proposal, whereas a selective system would filter out all projects of dubious quality or significance.

But scientific research can be risky, and given the difficulties of communicating original ideas clearly and persuasively, even the existence of the phenomenon of interest may be in question (e.g., the Higgs process, the top quark, prions). A system acutely sensitive to scientific merit would probably support some projects that do not work out. One so selective that only projects beyond skepticism are chosen for funding would surely ignore some good ideas along with the rest. And inevitably, some researchers write better than others, and some construct better proposals than they do research once funded. What is the review rewarding?
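The tension can be made concrete by treating review as a noisy screening problem. The sketch below is a hypothetical model, not drawn from the article's sources: reviewer scores are modeled as true merit plus random error, and raising the funding threshold buys selectivity at the price of sensitivity.

```python
# Hedged illustration of the sensitivity/selectivity trade-off, treating
# proposal review as noisy classification. Base rate and noise level are
# invented for the example.

import random

random.seed(42)

N = 1000
proposals = []
for _ in range(N):
    worthwhile = random.random() < 0.3     # assumed base rate of good ideas
    merit = 1.0 if worthwhile else 0.0
    score = merit + random.gauss(0, 0.6)   # noisy reviewer judgment
    proposals.append((worthwhile, score))

for threshold in (0.2, 0.5, 0.8):
    funded = [(w, s) for w, s in proposals if s >= threshold]
    rejected = [(w, s) for w, s in proposals if s < threshold]
    # Sensitivity: share of worthwhile proposals that get funded.
    sensitivity = sum(w for w, _ in funded) / sum(w for w, _ in proposals)
    # Selectivity: share of weak proposals that get filtered out.
    selectivity = (sum(not w for w, _ in rejected)
                   / sum(not w for w, _ in proposals))
    print(f"threshold {threshold:.1f}: "
          f"sensitivity {sensitivity:.2f}, selectivity {selectivity:.2f}")
```

As long as reviewer judgment is noisy, no threshold achieves both goals at once; the choice of threshold is a value judgment about which error matters more.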


INNOVATION AND TRADITION. Peer review couples what Thomas Kuhn (1977) terms an "essential tension" between originality and tradition in science with what Robert Merton (1973 [1942]) defines as the norm of "organized skepticism." Promising new ideas are tested against the cumulative store of shared knowledge and established theory. Peer review challenges whether new ideas are truly novel and worth pursuing, and purports to distinguish between sound innovation and reckless speculation.

Reviewers defend tradition against claims of originality when they reject novel ideas as impractical, unworkable, or implausibly inconsistent with the established body of knowledge. Sharp disagreements among reviewers about the merits of an idea may indicate a promising but risky new research path. Consensus, in contrast, might indicate an insufficiency of important problems left to solve, the grip of a school of thought, an overbearing conservatism, or just plain risk-aversion.

An innovative review system would reward novelty and risk taking, whereas a traditional system would sustain the research trajectory established in the body of accepted knowledge by restraining bold excursions. Peer review is expected to identify, encourage, and support frontier work but to screen out fads and premature ideas (Stent 1972).


MERIT AND FAIRNESS. Peer review is expected to be meritocratic, judging proposals and manuscripts in accordance with the stated criteria. NIH instructs proposal reviewers to evaluate all the science, only the science, and nothing but the science. The rendered judgment is to extract the science from speculation, rhetoric, common sense, practical benefit, and whatever else the proposer orchestrated in the document.

Peer review is reputed to apply standards of fairness to ideas apart from consideration of a scientist's reputation, personal characteristics, or geographic or academic position; the economic potential of the proposed work; or its relevance to pressing national needs. Nevertheless, advantages accumulate over the course of a career, making it increasingly difficult to judge what one does apart from who one is (or has accomplished). In this way, the Matthew Effect prevails: In recognition and influence, the rich get richer and the poor get poorer (Merton 1973 [1968]).

It may thus be wrongheaded to assume that the best science simultaneously serves one's career, one's discipline, and the welfare of the nation. Just as the principle of equitable distribution might indicate that decisions at the margin should favor investigators who currently have inadequate funds, similar arguments could be advanced for criteria such as growing research capacity, increasing educational or economic investments, or making politically savvy allocations. Such decisions deviate from strictly meritocratic principles, yet are entertained by participants much of the time, leading to charges of earmarking, log-rolling, cronyism, and elitism (U.S. Congress 1991, Chubin 1990).

RELIABILITY AND VALIDITY. As an assessment tool, peer review must be both reliable and valid, that is, have little random error and measure what it is supposed to measure. To be reliable, ratings should show high levels of agreement between raters and consistency from one group of raters to another. To be valid, a measure must take account of the scientific merit of a proposal in all its complexity without becoming distorted by other properties of the proposal. But merit is both abstract and multifaceted. A valid evaluation of a proposal, therefore, is said to derive from the combined assessments of several diverse experts. How their reviews are weighed depends on the steward—the program manager or journal editor—and the mission that he or she serves.
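Reliability, at least, lends itself to measurement. A standard chance-corrected index of agreement between two raters is Cohen's kappa; the following minimal sketch computes it for invented fund/revise/reject recommendations (hypothetical data, not from the studies cited here).

```python
# Minimal sketch of Cohen's kappa for two reviewers rating the same
# proposals. Ratings below are invented for illustration.

from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same items.
    (Assumes the raters do not agree perfectly by chance alone.)"""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if each rater assigned categories independently
    # at their own marginal rates.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical recommendations on ten proposals.
a = ["fund", "fund", "reject", "revise", "fund",
     "reject", "revise", "fund", "reject", "fund"]
b = ["fund", "revise", "reject", "revise", "reject",
     "reject", "revise", "fund", "fund", "fund"]
print(f"kappa = {cohens_kappa(a, b):.2f}")  # ~0.54: moderate agreement
```

Studies of grant and manuscript review (e.g., Cicchetti 1991) report agreement levels that are often modest by such measures, which is part of what motivates the discussion of validity that follows.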

Evaluating a proposal or manuscript from several divergent perspectives, not surprisingly, may yield low inter-rater agreement; different experts reach different judgments about quality as seen through their particular set of cognitive lenses (Cicchetti 1991, Harnad 1982). In this sense, peer review builds sound inferences upon a broad foundation. Given the limited number of reviews that can be elicited for any one proposal and the range of reviewer backgrounds necessary to cover the intellectual content of the proposal, divergent recommendations can result. Stewards and editors act on those recommendations when they decide whether or not to fund or publish (or to defer a decision until a revision addresses criticisms).

Conclusions

Clearly, peer review does many things and serves many values, but it cannot simultaneously deliver on all things equally well. Which purposes and which values are most important for which sorts of science? Who is to decide?

Similarly, involving the best researchers in the review process probably leads to better and more legitimate reviews—those that will be accepted by the community. But such experts are also the most likely proposal writers. Because it is unwise to allow people to review proposals for a competition in which they are also contestants, strategies for handling such conflicts of interest must be accepted by the community, or the legitimacy of the process will erode.

Because peer review can sometimes straddle disciplines, it may also cross the boundaries of knowledge production and professional practice, of research and policy. At one extreme, it will be highly particularistic, restricting the competitors to those with certain characteristics (through what is known as set-asides by age, gender, discipline, prior accomplishment, or location at an institution with a track record or facility to conduct the research). At the other extreme, peer review will be highly universalistic, resembling a lottery in which the criteria of choice seem random and unrelated to properties of the chosen projects. In practice, review processes fall between these polar extremes, and competitors usually find them fair enough, and their outcomes justified enough, to try again even after an unsuccessful submission.

Developing a review process that has widespread legitimacy entails building responsibilities, relationships, and trust. Together, these qualities add research findings to a body of knowledge, introduce conjectures into theories, and socialize researchers into a community that has moral as well as intellectual authority. In the end, peer review is expected to demand rigor and integrity, while stimulating new knowledge that ultimately makes a difference in people's lives. To do so, it must be responsive to emerging needs and possibilities. Ultimately, the flexibility of human judgment and the quality of collective imagination will determine which values and purposes are served by peer review.


DARYL E. CHUBIN
EDWARD J. HACKETT

SEE ALSO Accountability in Research; Expertise.

BIBLIOGRAPHY

Atkinson, Richard C., and William A. Blanpied. (1985). "Peer Review and the Public Interest." Issues in Science and Technology 1(4): 101–114.

Chubin, Daryl E. (1985). "Open Science and Closed Science." Science, Technology, and Human Values 10 (Spring): 73–81.

Chubin, Daryl E. (1990). "Scientific Malpractice and the Contemporary Politics of Knowledge." In Theories of Science in Society, ed. Susan E. Cozzens and Thomas F. Gieryn. Bloomington: Indiana University Press.

Chubin, Daryl E., and Edward J. Hackett. (1990). Peerless Science: Peer Review and U.S. Science Policy. Albany: State University of New York Press.

Cicchetti, Dominic V. (1991). "The Reliability of Peer Review for Manuscript and Grant Submissions: A Cross-Disciplinary Investigation." Behavioral and Brain Sciences 14(1): 119–186.

England, J. Merton. (1983). A Patron for Pure Science: The National Science Foundation's Formative Years, 1945–57. Washington, DC: National Science Foundation.

Gieryn, Thomas F. (1999). Cultural Boundaries of Science: Credibility on the Line. Chicago: University of Chicago Press.

Guston, David H. (2000). Between Politics and Science: Assuring the Productivity and Integrity of Research. Cambridge, UK: Cambridge University Press.

Harnad, Stevan, ed. (1982). Peer Commentary on Peer Review: A Case Study in Scientific Quality Control. Cambridge, UK: Cambridge University Press.

Kuhn, Thomas S. (1977). The Essential Tension. Chicago: University of Chicago Press.

Merton, Robert K. (1973 [1942]). "The Norms of Science." Reprinted in The Sociology of Science, ed. Norman W. Storer. Chicago: University of Chicago Press.

Merton, Robert K. (1973 [1968]). "The Matthew Effect in Science." Reprinted in The Sociology of Science, ed. Norman W. Storer. Chicago: University of Chicago Press.

National Academy of Public Administration (NAPA). (2001). A Study of the National Science Foundation's Criteria for Project Selection. Washington, DC: Author.

Stent, Gunther S. (1972). "Prematurity and Uniqueness in Scientific Discovery." Scientific American 227(6): 84–93.

U.S. Congress. Office of Technology Assessment. (1991). Federally Funded Research: Decisions for a Decade. Washington, DC: U.S. Government Printing Office.

U.S. General Accounting Office (GAO). (1999). Federal Research: Peer Review Practices at Federal Science Agencies Vary. Washington, DC: Author.
