On Preprint Duplication

The issue of preprint duplication across servers is a hot topic in light of 2020's preprint explosion.

The messiness and confusion that duplication causes in the digital scholarly corpus have prompted growing debate about the policies platforms employ. The challenge is for servers to work together so that authors and readers have a seamless experience in developing and accessing high-quality research.

Like other high-volume servers, Research Square hosts duplicative preprints. Much of the volume on our platform originates from our integration, via In Review, with nearly 500 Springer Nature journals. Over 30% of authors submitting to those journals choose to post a preprint at the point of submission.

Why are preprints so popular?

Authors make this choice not only because they find the prospect of sharing their work early appealing, but also because they stand to benefit from the increased transparency and growing collection of features uniquely associated with the Research Square platform. For this reason, and because the In Review opt-in process is so easy, even authors who have already posted a preprint elsewhere often choose to post again when given the opportunity.

Because we know that unique features of our platform, such as a detailed peer review timeline and automated checks, are valued by authors, our policies do not preclude posting submissions whose titles match those of preprints on other servers. Additionally, authors may leverage the discrete features and audiences of preprint servers in the same way that people use different social media platforms to share the same updates and thoughts.

We are mindful that this cross-posting contributes to redundancy in the digital scholarly space and recognize that this outcome is not ideal. However, we expect the scale of this duplication and any resultant disorder to be mitigated by a number of factors.

Reducing internal duplication

At the scale of our operation - now over a thousand preprints per week - internal duplicates arise simply as a consequence of having multiple points of entry into our platform.

In December 2020, we completed the first phase of a project aimed at reducing the incidence of duplicate preprints within our own platform by means of automated detection and consolidation. We will continue to improve our operations and technology to minimize our internal preprint redundancy.
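To illustrate what automated detection can look like in its simplest form, here is a minimal sketch that groups records by normalized title. It is an illustrative example rather than a description of the system referred to above: the function names, the sample identifiers, and the rule of exact match after normalization are all assumptions, and a real pipeline would likely also compare authors, abstracts, or DOIs.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_title(title: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    cleaned = re.sub(r"[^a-z0-9]+", " ", ascii_only.lower())
    return cleaned.strip()


def group_duplicates(preprints):
    """Return lists of preprint IDs whose normalized titles are identical.

    `preprints` is an iterable of (preprint_id, title) pairs; both the
    shape of the input and the IDs used below are hypothetical.
    """
    groups = defaultdict(list)
    for preprint_id, title in preprints:
        groups[normalize_title(title)].append(preprint_id)
    # Only titles seen more than once are candidate duplicates.
    return [ids for ids in groups.values() if len(ids) > 1]


records = [
    ("rs-1", "A Study of  Preprint Duplication."),
    ("rs-2", "a study of preprint duplication"),
    ("rs-3", "Another paper"),
]
candidates = group_duplicates(records)
```

Groups flagged this way are only candidates for consolidation. Identical titles do not prove that two records are the same work, so a human or a stricter secondary check would still need to confirm each match.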

Author education and normalization

As awareness of preprints grows and the practice is increasingly adopted, encountering multiple versions of the same work will become normal. Duplication, after all, has been a feature of the shift to digital since indexers began offering full-text versions of journal articles. Aggregators have adapted this practice to preprints, pulling PDFs from the various servers to save clicks for their users.

These practices do not seem to have led to mass confusion yet. We suspect they will continue to be innocuous, as long as preprint servers commit to the convention of linking to the version of record. The incidence of duplication could also decrease over time as authors gain familiarity with the servers and their features and start defaulting to a preferred platform to the exclusion of others.

Improved consolidation by aggregators

Indexers and aggregators have consistently responded to the need for tidier and smarter curation of the versions associated with a given research object. These versions can be preprints, author accepted manuscripts (AAMs), or post-publication corrections. Each takes advantage of the internet's potential to accelerate and enrich research content, but they all contribute some noise and confusion to the system.

In response to the proliferation of discrete but related items, indexers have gotten better. Google Scholar continues to improve its grouping of versions, always privileging the version of record. Crossref encourages its members to cross-link DOIs across versions and related objects, and provides the data that allows them to do so. That said, as with the adoption of Crossmark, adherence is not as high as it should be, and so pressure to improve digital hygiene should be applied to publishers and preprint servers alike.

Cooperation between servers

Research Square is actively engaged in community-wide efforts - notably those led by ASAPbio - to confront challenges facing preprint servers and reify our commitment to ethical publishing practices. We closely monitor the sentiments of authors, readers, and the broader academic publishing community to ensure our impact is an overwhelmingly positive one.

If a more aggressive approach to preventing preprint duplication becomes clearly warranted, we will take the appropriate actions.

Final Thoughts

Experience the benefits of publishing your research early. Submit a preprint to our platform today.

For further reading

2020: The Year of the Preprint

What is a Preprint?

Most Influential Research Square Preprints of All Time

Putting Preprints Into Context

For more on our policies, please visit our dedicated Editorial Policies page.
