In 2020, the world and science changed. As a global pandemic shut down workplaces and places of entertainment, scientists began producing research at a rate never seen before. But sharing this research through traditional publishing mechanisms was too slow.
Although preprints have been used in science since the 1960s, the term “preprint” first entered the public consciousness during the COVID-19 pandemic. Scientists endeavoured to share relevant research findings as quickly as possible, while the media and governments rapidly reported on and acted on those findings [1]. The number of preprints grew exponentially during the COVID-19 pandemic.
But what is a preprint, and how does it differ from the peer-reviewed literature?
A published manuscript (“paper”) is a manuscript that has undergone editorial assessment and journal-organised peer review prior to being published by a select journal. The journal will usually also format the manuscript to match its house style. Papers are either free to publish but paid to access, or authors pay to publish them (article processing charges, APCs) and they are free to access. These are the so-called “traditional” and “gold” or “fully open access” models of article publication, respectively.
A preprint, in contrast, is a manuscript that has not undergone journal-organised peer review and has most often been deposited on a “preprint server” (like Research Square, the world’s largest). This kind of article is often described as the author version of the manuscript, that is, work that is shared when the authors believe it is ready for dissemination. Preprints are free for all to deposit and access. Preprints can be advantageous for researchers’ careers in a number of key ways.
A brief history of preprints in the biosciences
Scientific publishing continues to languish in a pre-internet era, with limits on word counts, pages and numbers of figures – a system designed for print. This model of publishing is also expensive – the academic publishing industry has profit margins greater than those of Google, Apple or Microsoft. Preprints are one potential solution to some of these issues.
Experiments with preprints go back to the 1960s. The National Institutes of Health (NIH) initiated the Information Exchange Groups in 1961 as a way to share biology-based preprints. However, by 1967, the Information Exchange Groups had shut down. Physics would go on to pioneer preprints instead. The physics preprint server, arXiv, was launched in 1991 and revolutionised how physicists, mathematicians, and computer scientists shared their findings.
It took more than another 20 years before a dedicated preprint server was launched, and persisted, in the biological sciences. In this field, bioRxiv was launched in 2013 by Cold Spring Harbor Laboratory (CSHL) and has become one of the biggest bioscience preprint servers today. Research Square launched in 2019 and rapidly became a significant preprint server, partnering with various journals.
Are preprints different from published versions?
As preprints are (normally) not peer reviewed and are free for all to access, a number of researchers have raised concerns over their quality and trustworthiness.
Thankfully, this is one area of metascience (the science of how we do and share science) that has resulted in a body of evidence which can be used to address this concern.
One way to assess the “quality” of science within a preprint is to compare the preprint with its published article. This is the method that I and others have used.
In one study, a group compared the quality of reporting in preprints and published articles [2]. This research focussed on reporting quality, as opposed to writing or findings, and included elements like data availability and conflict of interest statements. The authors found that although peer-reviewed articles had a higher quality of reporting than preprints, the difference was small.
A separate study more directly investigated the text of preprints compared to their published versions [3]. This study compared character differences, length and semantic changes. It found limited differences between preprints and published articles.
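To give a feel for what character-level and length comparisons of this kind can look like, here is a minimal Python sketch. It is not the method used in any of the studies cited here: the function name `compare_versions` and the two example abstracts are hypothetical, and it relies only on `difflib.SequenceMatcher` from the Python standard library. Semantic changes are not covered and would need a different approach.

```python
from difflib import SequenceMatcher


def compare_versions(preprint: str, published: str) -> dict:
    """Return simple character-level change measures between two texts.

    Hypothetical helper for illustration only.
    """
    # autojunk=False stops difflib from ignoring very frequent characters
    # when the texts are long.
    matcher = SequenceMatcher(None, preprint, published, autojunk=False)
    return {
        # 1.0 means identical texts, 0.0 means nothing in common.
        "similarity": matcher.ratio(),
        # Positive when the published version is longer.
        "length_change": len(published) - len(preprint),
    }


# Invented example abstracts, not taken from any real paper.
preprint_abstract = "We show that preprints change little upon publication."
published_abstract = "We show that preprints change very little upon publication."

result = compare_versions(preprint_abstract, published_abstract)
print(f"similarity: {result['similarity']:.2f}")
print(f"length change: {result['length_change']:+d} characters")
```

A similarity close to 1 together with a small length change would correspond to the kind of “limited differences” described above, although a real analysis would need to handle formatting, references and figures far more carefully.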
More recently, a group performed a comprehensive natural language processing analysis of bioRxiv preprints compared to their published versions [4]. Again, this study concluded that the changes upon publication are minimal.
One limitation of these studies is that they rely on automated methods or report on elements not directly linked to the conclusions or findings of a paper.
We investigated COVID-19 preprints and manually compared them to their published versions. We performed an automated text analysis and a manual analysis of abstracts, and also manually compared the figures in each preprint with those in its published version. Ultimately, we discovered that preprints undergo only very limited changes upon publication and that these changes very rarely affect the conclusions of a study.
This increasingly strong body of evidence supports the safe use of preprints by scientists and non-scientists, while calling into question the current role of peer review.
Indeed, my personal experience has been that peer review generally leads to small improvements but that these do not change the conclusions of a paper. Likewise, when I have acted as a peer reviewer, I have never suggested changes that would significantly alter a paper.
Why would a preprint be changed?
If preprints don’t generally change when published, what reasons are there for a preprint to change at all?
There are many reasons why a preprint might be updated. Perhaps the most common reason is that the authors post the journal accepted version immediately prior to publication. Authors may choose to do this to provide the most recent (and peer reviewed) version of their work in a free-to-access location.
Alternatively, some authors post preprints because they are works-in-progress. They will then update a preprint as the work moves forwards in incremental steps. However, not many authors do this.
Preprint posting enables work to be shared openly with the wider scientific community. This may lead to peer review that is not organised by a journal, or to comments from individuals. Authors might then update their preprint to reflect these comments. For example, our work was previously posted as a preprint and underwent open community peer review. Following this, we updated the preprint to reflect our responses to this feedback [5]. Alternatively, authors may notice small mistakes themselves and update the preprint to correct these.
A preprint may also be changed by the preprint server. In the case of serious error or fraud, a preprint server may choose to “withdraw” a preprint. This leaves the preprint available but adds a note to say that it has been withdrawn.
Ultimately, the benefit of preprints is that they put the decision on sharing work back into the hands of the authors. This means that authors are free to update or change their work as they deem appropriate.
Does the published version need to be different from the preprint?
The evidence suggests that in many cases the published version is not different from the preprint. So, do the two need to be different?
The simple answer, of course, is no. If the preprint is a solid piece of work, then why should it change? Reviewers shouldn’t be demanding more work unless it is truly warranted.
There are even some who argue that a preprint should be the final destination for a study and that journals should be skipped entirely [6]. It is my hope that the biosciences become more like physics, in which publication in a journal is a formality, and that preprints are fully accepted as the final outputs of a study.
1. Fraser, N. et al. The evolving role of preprints in the dissemination of COVID-19 research and their impact on the science communication landscape. PLOS Biol. 19, e3000959 (2021).
2. Carneiro, C. F. D. et al. Comparing quality of reporting between preprints and peer-reviewed articles in the biomedical literature. Res. Integr. Peer Rev. 5, 16 (2020).
3. Klein, M., Broadwell, P., Farb, S. E. & Grappone, T. Comparing published scientific journal articles to their pre-print versions. Int. J. Digit. Libr. 20, 335–350 (2019).
4. Nicholson, D. et al. Examining linguistic shifts between preprints and publications. PLOS Biol. 20, e3001470 (2022).
5. Fraser, N. et al. Preprinting the COVID-19 pandemic. bioRxiv 2020.05.22.111294 (2020) doi:10.1101/2020.05.22.111294.
6. Vianello, S. D. The “Pre” in [my] “Preprint” is for Pre-figurative. Commonplace (2021) doi:10.21428/6ffd8432.5de25622.