Prizes Signal Scientific Revolutions

Scientific revolutions affect funding, investments, and technological advances, yet predicting their onset and projected size and impact remains a puzzle. We investigated a possible signal predicting a topic's revolutionary growth – its association with a scientific prize. Our analysis used original data on nearly all recognized prizes awarded between 1960 and 2017, covering 11,539 scientific topics, to examine the link between prizes and a topic's unexpected growth in productivity, impact, and talent. Using difference-in-differences regressions and counterfactuals of matched prizewinning and non-prizewinning topics, we found that in the year following the receipt of a prize, a topic experiences an onset of extraordinary growth in impact and talent that continues into the future. Between five and 10 years after the prize year, prizewinning topics are 38% more productive, are 31% more impactful in citations, retain 53% more incumbents, and gain 35% more new entrants and 46% more star scientists than their non-prizewinning peer topics. While prizewinning topics grow unexpectedly fast in talent and impact, funding does not drive this growth; rather, growth is positively associated with the recency of work on the topic, discipline-specific rather than general awards, and prize money. These findings advance understanding of scientific revolutions and identify variations in prize characteristics that predict the timing and size of a topic's revolutionary growth. We discuss the implications of these findings for how funding agencies and universities make investments and how scientists commit time and resources to one topic versus another, as well as for the quality of research.


Introduction
Scientific revolutions involve the unexpectedly fast growth of a scientific topic's impact and talent and have been famous throughout history for inspiring scientific solutions to intellectual and technological challenges [1][2][3] . However, little is known about the factors predicting the genesis of a revolution 4 , despite the fact that knowledge about when and where revolutions occur can lead to better coordination of scientific talent and research investments [5][6][7] .
Revolutions follow the decisions of scientists to commit their resources to a topic perceived to have strong prospects for development 4,8,9 . However, for any single scientist, estimating a topic's prospective growth is difficult because the factors that predict a topic's growth require extensive data collection and discerning analysis [10][11][12][13][14][15][16] . Under such conditions of uncertainty and costly direct measurement, scientists frequently make decisions using signals 6,[17][18][19] . For example, journal ranking is taken as a signal of a paper's true quality 17,20,21 , order of authorship is taken as a signal of how much credit an author deserves on a paper 17,22 , and educational pedigree is taken by hiring committees as a signal of a candidate's fundamental abilities 8,21 .
Scientific prizes appear to play many roles in advancing science but are only now becoming a focus of theoretical and empirical study that suggests they may be associated with more than the recognition of individual achievement. Recent work on scientific prizes has revealed that prizes have proliferated in modern science and can inspire risk-taking, highlight overlooked ideas, and involve public rituals that reinforce norms of good scientific practice 24,[31][32][33][34][35] . Our work builds on research that shows that prizes can change perceptions of a scientist's work 23,34 . Analyses of the Howard Hughes Medical Investigator prize show that awardees' past papers gain more citations than expected 28 , while non-prizewinners' citations decline 36 . Relative to control groups, prizewinners switch research areas more frequently than expected 28 , while "inducement" prizes 37 tend to attract more diverse ideas to a topic than normal 27,29 . Building on the findings that prizes can affect an awardee's work, we investigate whether prizes signal the onset of topic-wide revolutionary changes in impact and talent. Our analysis uses new data on hundreds of recognized scientific prizes worldwide 24 as well as longitudinal data on the historical development of whole topics across the sciences. In this way, our work can contribute knowledge to understanding the dynamics of unexpected changes in scientific thinking and talent 6,38 , and investment funding 5,39,40 .
The analytical challenge in investigating a link between prizes and scientific revolutions focuses on demonstrating that the expected growth trajectory of a topic significantly differs before and after the prize and that non-prizewinning topics with levels of productivity, citations, and talent historically indistinguishable from those of the prizewinning topic do not experience extraordinary growth in the absence of a prize. We use difference-in-differences regression to test whether a prizewinning topic's growth trajectory after the prize is significantly greater than it was before the prize. To examine the growth dynamics of comparable non-prizewinning topics, we used matching to identify five non-prizewinning topics with 10 years of talent and impact comparable to each prizewinning topic, using proper adjustments that ensure valid standard errors and that parallel trend requirements are met 17,41 .
We collected data on scientific topics using Microsoft Academic Graph (MAG). MAG triangulates third-party expert opinions and NLP algorithms to identify meaningful distinctions among "scientific fields of study" (a.k.a. "topics") 42,43 . MAG covers the near universe of scientific works: over 172 million publications written by 209 million authors in 48,000 journals from 1800 to 2017. MAG classifies scientific knowledge hierarchically: disciplines > domains > topics. Discipline, domain, and topic labels are based on the classifications published on Wikipedia and created through crowdsourcing. The discipline of psychology, for example, has domains that include neuroscience, social psychology, and developmental psychology, and each domain has many topics. MAG uses NLP to assign papers to topics; the assignment algorithm is based on a paper's complete text in relation to other texts, not just keywords.
We collected original data on 458 scientific prizes conferred 5,327 times between 1960 and 2017 with respect to 11,539 scientific topics, covering nearly all disciplines and ranging from celebrated awards like the Wolf Prize or Breakthrough Prize to hundreds of lesser-known but recognized prizes listed on Wikipedia's "scientific prizes" pages 24,35 . To validate the crowdsourced Wikipedia data, we manually cross-checked it with data on prizes found on the web and in print media. Prizes were linked to topics by associating the prizewinning scientist with their topics. A scientist was defined as having worked on a particular topic if they published at least 10 papers on that topic. The 10-paper threshold was validated using Wikipedia's "known for" dataset, which lists the topics a scientist is associated with based on the crowdsourced opinions of other scientists. Consistent with our 10-paper threshold, a scientist published on average 10 papers on a topic if that topic is included in that scientist's known-for list on Wikipedia (see SI Appendix Sec. 1 for details on data collection and further robustness checks).

Our test for the link between prizewinning and a topic's extraordinary growth in impact and talent used difference-in-differences regression with control variables 17 (the Methods section presents model specification details). To establish a comparison set of non-prizewinning topics, we used a popular dynamic optimal matching method (DOM) [44][45][46][47] . DOM dynamically identifies, for each prizewinning topic, a group of non-prizewinning peer topics from the same discipline that have statistically indistinguishable annual growth and impact trends for the 10 years before the prize year, automatically ensuring parallel trends between the prizewinning and non-prizewinning topics 48,49 .
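As an illustration, the 10-paper linkage rule can be sketched in a few lines. This is a simplified sketch, not the authors' actual pipeline: the record format (one `(scientist, topic)` pair per paper) and the function name `link_scientists_to_topics` are assumptions for exposition.

```python
from collections import Counter

def link_scientists_to_topics(papers, min_papers=10):
    """Associate each scientist with the topics on which they have
    published at least `min_papers` papers (the 10-paper threshold).

    `papers` is an iterable of (scientist_id, topic_id) pairs,
    one pair per paper-topic assignment.
    """
    counts = Counter(papers)  # (scientist, topic) -> number of papers
    links = {}
    for (scientist, topic), n in counts.items():
        if n >= min_papers:
            links.setdefault(scientist, set()).add(topic)
    return links

# Toy example: "a" clears the threshold on one topic only.
records = ([("a", "graphene")] * 10 + [("a", "optics")] * 3
           + [("b", "optics")] * 12)
print(link_scientists_to_topics(records))
```

A prize is then linked to every topic in the awardee's set of threshold-passing topics.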
For each prizewinning topic, DOM identifies five peer topics that are matched on key measures of a topic's growth [10][11][12][13][14][15] (hereafter the big five growth measures): its yearly number of (a) publications, (b) citations, (c) incumbent scientists, (d) new scientists, and (e) citations by top scientists. Revolutionary change is defined as occurring when the historic growth pattern on a big five measure deviates significantly from its expected growth trend as well as from the expected growth computed for peer topics.

Results
Prizewinning topics display extraordinary growth in productivity, impact, and talent that begins the year following the prize year and continues for at least the next 10 years. Our first test of the link between prizewinning and a topic's revolutionary growth applies our difference-in-differences analysis net of control variables. Table S10 shows that scientific topics, after being associated with a prize, grow unexpectedly large in terms of productivity, citation impact, the entry of star scientists, entry of rookie scientists, and the retention of scientists already working on the topic relative to peer topics (p < 0.001).
Figures 3A-E plot the extraordinary magnitudes of revolutionary growth of our big five variables in prizewinning topics relative to peer topics using the DOM analysis.
Several broad findings are noteworthy. First, the DOM method shown in the plots confirms the DID analysis. Second, at five years after the prize year, prizewinning topics have grown an average of 16% to 30% larger than peer topics on the big five measures (all p-values < 0.0001); at 10 years after the prize, the growth gap increases to 22% to 53%. Third, Figure 3F shows that after the prize year, prizewinning topics also influence patents: patents cite papers on prizewinning topics significantly more than papers on peer topics (p < 0.0001).
Examining individual changes in more detail, we find that growth is sharpest for the "number of incumbent scientists" variable (Fig. 3D). After the year in which a topic becomes prizewinning, incumbent scientists continue to publish on that topic at a rate 53% higher (Δ_DD = 0.4279, e^Δ_DD − 1 = 0.5340) than incumbent scientists publish on peer topics. Relatedly, prizewinning topics gain over 35% more new entrants on average than peer topics do (Δ_DD = 0.3001, e^Δ_DD − 1 = 0.35, Fig. 3E). About half of new entrants (46.8%) are rookie scientists who make their first publication and a long-term commitment to a prizewinning topic 50 .
Star scientists are among the influx of new entrants who begin working on the topic for the first time after it is associated with a prize. Star scientists are the 5% most highly cited scholars at the level of their discipline (physics, chemistry, sociology, etc.) 51 .
Counting star scientists working on prizewinning topics before and after the prize, we found that prizewinning topics attract over 46% more star scientists on average than do peer topics.

Migrations of scientists to prizewinning topics correlate with increases in productivity, impact, and the topic's paradigmatic diversification. At 10 years, prizewinning topics are 38% more productive in terms of number of publications (Δ_DD = 0.3232, e^Δ_DD − 1 = 0.3815) than peer topics (Fig. 3A) and have 31% more yearly citations (Fig. 3B). Prizewinning topics also experience a 7% increase in citations per paper at year 10 relative to papers in peer topics (Fig. S7), and the impact of top-cited scientists who work on the prizewinning topic is 22% greater than the impact of top-cited scientists working on non-prizewinning peer topics (Fig. 3C), which indicates that the increase in citations is topic-wide rather than a redistribution of citations 36 .
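The paired values reported in this section (e.g., Δ_DD = 0.4279 alongside 53%) are consistent with exponentiating a log-scale difference-in-differences estimate. A quick check, assuming the coefficients are on a log scale as those pairings suggest:

```python
import math

def did_to_percent(delta_dd):
    """Convert a log-scale difference-in-differences estimate
    into a percentage growth differential: e^delta - 1."""
    return math.exp(delta_dd) - 1

# Coefficients reported in the text:
print(round(did_to_percent(0.4279), 3))  # incumbents   -> 0.534  (53%)
print(round(did_to_percent(0.3001), 2))  # new entrants -> 0.35   (35%)
print(round(did_to_percent(0.3232), 4))  # publications -> 0.3815 (38%)
```

Each reported percentage matches its coefficient under this transformation.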
Prizewinning topics become more paradigmatically diverse than matched peer topics.
Paradigmatic diversification refers to the heterogeneity of concepts scientists use to study a topic 1,52 . To measure paradigmatic diversification, we created a master list of all the topics new entrants in prizewinning or peer topics had published on before becoming a new entrant. Topics in the master list were defined as different from one another if they were associated with different disciplines (N = 19 disciplines). The topic diversity of the master list was then measured using Shannon entropy as H = −Σ_i p_i log p_i, where i represents a discipline and p_i measures the probability that a topic in the list belongs to discipline i.
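The entropy measure above is straightforward to compute; a minimal sketch (the function name and toy discipline labels are illustrative):

```python
import math
from collections import Counter

def paradigmatic_diversity(disciplines):
    """Shannon entropy H = -sum_i p_i * log(p_i), where p_i is the
    share of topics in the master list belonging to discipline i."""
    counts = Counter(disciplines)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())

# A list dominated by one discipline is less diverse than an even mix:
print(paradigmatic_diversity(["physics"] * 9 + ["biology"]))        # low entropy
print(paradigmatic_diversity(["physics", "biology", "sociology"]))  # ln(3)
```

An even spread over K disciplines attains the maximum value log K; concentration in one discipline gives 0.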
Robustness checks corroborate the results. First, we examined whether funding explains the results. Figure 5 indicates that it does not: available data show that funding levels are uncorrelated with the extraordinary growth of prizewinning topics, and NIH funding before and after the prize's conferral is flat.
Second, we ran "placebo" tests 49,53,54 that pretend each peer topic is a prizewinning topic at year zero and found that peer topics continued to grow at expected rates ( Fig. S8 in SI).
Third, we tested whether the results were driven by a few outlier topics. To conduct this test, we examined the growth of each prizewinning topic and its specific five peers separately for all 11,539 prizewinning topics. We found that more than 60% of the […] (see Tables S1–S8 for specific details). We find recency has the overall greatest signaling strength. A one standard deviation increase in recency is associated with a 14% increase in Δ_DD of new scientists and a 14.7% increase in Δ_DD of citations. Moneyed or field-specific prizes predict a Δ_DD in publications of an average 4.7% or 7.2% over peer topics, respectively. These findings indicate that prizes reliably signal a topic's impending extraordinary growth in impact and scientific talent, and a prize's signaling strength strongly predicts the magnitude of exceptional growth. Robustness checks further confirm that the findings generalize across time, prizes, and disciplines 59 (see Tables S3–S7).

Discussion
How scientists devote their research efforts affects their careers and collectively shapes scientific progress. Funding agencies and universities similarly endeavor to invest in prosperous research areas and avoid spending on topics that will ultimately prove faddish, yet knowledge of how scientists' collective actions impact science is nascent 5,60,61 .
We found that social signals strongly predict unexpectedly large changes in the migrations of scientists to specific research topics, as well as a topic's increased growth and impact, in a manner akin to Kuhn's scientific revolutions. The social signal we investigated is prizewinning 24,34 . Relative to peer topics that grew in the same way as prizewinning topics for the 10 years before the prize's conferment, prizewinning topics have significantly more incumbents recommitting their effort, rookie scientists making first-time investments, and star scientists directing their efforts toward the prizewinning topic.
Importantly, these demographic changes deepen and diversify the talent of the prizewinning topic, leading to more paradigmatic diversification and higher-impact papers than peer topics. The magnitude of these changes was not found to be related to funding but to the signal strength of the prize. Prizes associated with the award-winning scientist's recent work, with discipline-specific rather than general topics, and with award money strongly predict the magnitude of growth that follows the prize. Counterfactual and placebo tests of peer topics that displayed statistically indistinguishable growth trends as prizewinning topics for 10 years before the prize showed only expected levels of growth.

The study has broad implications for the science of science and science policy. This is among the first studies to reliably predict scientific revolutions and to conceptually link prizewinning to revolutions. For policy leaders, these findings demonstrate how scientists and institutions make risky research investment decisions, how investments link to tomorrow's innovative research topics, and how to predict the frontiers along which science evolves. Practicing scientists will find the information valuable in career planning and mentorship. Following our findings and methods, philosophers of science and researchers in the burgeoning field of the science of science can begin a whole range of studies linking scientific revolutions and paradigm shifts to vast amounts of quantifiable data that have not been previously available or analyzed.

Science policy implications are associated with funding. In an era when funding has been flat but the number of topics eligible for funding has continually increased, the efficacy of research spending is critical. The current funding system has limitations in several dimensions, including the weakness of grant scores for predicting funded research impact and researcher career success 39 .
Additionally, even when targeted funding is prescient in backing the next important topic, that funding alone appears to do little to move researchers' investment effort away from their current projects and toward the targeted topic 5 . Our findings on prizewinning as a driver of collective shifts in effort suggest prizes provide a novel approach for predicting high-potential research, complementary to other approaches, by harnessing the collective intelligence of diverse scientific talent.

Matching Design
To study growth patterns before and after a topic's prizewinning event, we find five topics in the same discipline whose growth patterns were statistically indistinguishable from each other and from the prizewinning topic for the ten years before the prize year. We focus on topics where their first prizewinning event occurred in

Dynamic Optimal Matching (DOM) Method for Defining Peer Topics:
To select the peer non-prizewinning topics, we use a Dynamic Optimal Matching method. Let X_{i,t,k} denote the quantity for topic i in terms of one of the K = 5 matched categories k (i.e., yearly publications, yearly citations, number of incumbent scientists, number of new scientists, top-scientist quality). The index t measures the number of years prior to the prizewinning year for topic i, where t* represents the prizewinning year for topic i, and T = 10 indicates that we traced the growth pattern for topics over an 11-year duration, which includes the 10 years prior to the prizewinning year.
Second, to ensure balance between the peer and prizewinning topics for the entire system and that the closest topics are selected (i.e., "fine balance" and "closeness"), we further identify 5 peer topics for each prizewinning topic from the candidate pool. Specifically, we want to 1) minimize the distances between the peer and prizewinning topics in terms of X_{i,t,k}, and 2) make sure the distributions of the peer and prizewinning topics are acceptably and simultaneously close for all 55 covariates (11 years × 5 categories). In other words, for each t (time) and k (matching category), we make sure the differences between the prizewinning group and the peer topics are small enough, and that the differences between each prizewinning topic and the expected growth of its peers at time t and category k are evenly distributed around zero. This method guarantees closeness between the peer and prizewinning topics and good balancing between the groups.
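The closeness criterion can be illustrated with a greedy nearest-neighbor sketch. This is only an approximation of the procedure described here: the full DOM jointly optimizes closeness and fine balance across all matched sets (an assignment problem), whereas this illustrative `match_peers` matches each prizewinning topic independently on summed squared distance over the flattened covariate grid.

```python
def match_peers(prize_series, candidate_series, n_peers=5):
    """Pick the n_peers candidate topics whose pre-prize trajectories
    are closest to the prizewinning topic's, by summed squared distance.

    Each series is a flat list of covariate values (in the paper's
    setting, 11 years x 5 growth measures = 55 values per topic).
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    ranked = sorted(candidate_series,
                    key=lambda t: dist(candidate_series[t], prize_series))
    return ranked[:n_peers]

# Toy example with 3-value trajectories instead of 55:
candidates = {"a": [1, 1, 1], "b": [5, 5, 5], "c": [2, 2, 2]}
print(match_peers([1, 1, 2], candidates, n_peers=2))  # -> ['a', 'c']
```

Greedy matching like this can reuse the same candidate for several prizewinning topics and ignores fine balance, which is why the paper's optimal-matching formulation is needed in practice.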

Robustness Checks: Several robustness checks validated the DOM method.
A Placebo Test was performed as a second statistical validation (Fig. S8). Fig. 1 already shows that there is no difference between the peer and prizewinning topics before the prizewinning year, but could the prizewinning topic's growth pattern be explained by chance? To answer this question, we performed a placebo test for the DOM method 49,53,54 . Specifically, for each prizewinning topic, we selected one of its peer topics as a "pretend winning topic". We repeated the same DOM process for each pretend topic, obtaining five new peer topics for each pretend topic. By comparing the growth pattern of the pretend topic and its peers, we can test whether growth patterns are alike for topics without a prizewinning event. Fig. S8 shows that there is no difference in the expected growth for pretend topics before and after the prizewinning year.
A Difference-in-Differences regression analysis was also used in our study. The outcomes can be expressed as follows:

Y_{i,t} = β_0 + β_1 · Prize_i + β_2 · Post_t + β_3 · (Prize_i × Post_t) + γ_{i,t} + ε_{i,t}

where Y_{i,t} is the outcome variable, quantifying the impact of topic i at time t using each of the big five growth variables defined in the main text. Prize_i is a dummy variable indicating whether topic i is a prizewinning topic or a non-prizewinning topic from a matched peer group as defined in the text. Post_t is a dummy variable measuring whether time t is before or after the prizewinning event 41 . If the topic belongs to the peer group, the prizewinning year of the related prizewinning topic is used as the reference point. Control variables, including fixed effects for discipline and the specific prizewinning year, are captured by the term γ_{i,t}. ε_{i,t} is an error term.
Table S10 shows the regression results of the analysis. We find that β_1 (the coefficient for Prize_i) is not significant (p > 0.05) for all five categories, indicating that there is no difference between the prizewinning topics and the peer topics before the prizewinning event, further corroborating the accuracy of the DOM method. By contrast, we find that both β_2 and β_3 are significantly larger than 0 for all five categories (p < 0.001), demonstrating that topics associated with a prize grow at unexpected rates after the prize relative to the matched peer-group topics. β_3 also provides an estimate of the average value of Δ_DD for each category.
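A minimal numpy sketch of this estimator on toy data shows how the interaction coefficient recovers the differential growth. This is a sketch, not the paper's specification: the discipline and prize-year fixed effects are omitted, and the toy panel is invented for illustration.

```python
import numpy as np

def did_estimate(y, prize, post):
    """OLS fit of y = b0 + b1*prize + b2*post + b3*(prize*post) + e.
    Returns (b0, b1, b2, b3); b3 is the difference-in-differences
    estimate."""
    X = np.column_stack([np.ones_like(y), prize, post, prize * post])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Toy panel: peer topics grow by 1 after year zero; prizewinning
# topics grow by 1 + 3.  b3 should recover the extra growth of 3.
prize = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
post  = np.array([0, 0, 1, 1, 0, 0, 1, 1], dtype=float)
y     = np.array([2, 2, 3, 3, 2, 2, 6, 6], dtype=float)
b0, b1, b2, b3 = did_estimate(y, prize, post)
print(round(b3, 6))  # -> 3.0
```

With matched groups, b1 should be near zero before the prize, mirroring the paper's pre-trend check on β_1.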

Acknowledgments
Fig. 3 caption: Prizewinning and peer topic groups have no statistical difference for all 10 years before the prize (p > 0.1 for the 5×11 = 55 covariate data points). Statistically significant growth differentials (gold minus black line) begin shortly after the prize (red circle) and compound yearly following the prize (red arrow). At 10 years, the growth rates of prizewinning topics exceed peer topics by 22% to 53%, depending on the growth variable (number of publications, citations, active incumbent scientists, new entrants, and citations by top scientists on the topic). Panel (f) shows the association between prizewinning and yearly growth in citations from patents. (SI Appendix Fig. S7 reports robustness checks.)

Paradigmatic diversity figure caption: The plot shows the percentage increase in paradigmatic diversification for prizewinning topics relative to peer topics (measure explained in the text). The inset shows that the cumulative distributions of paradigmatic diversity for prizewinning topics and their peer groups significantly differ (K-S test, p = 5.78 × 10^−…). The relationship between revolutionary growth and paradigmatic diversity shows that as Δt increases, relative paradigmatic diversity increases significantly (correlation 0.38, p = 1.89 × 10^−…). At Δt equal to two, a prizewinning topic is estimated to be 17% more diverse paradigmatically than its peer topics.