A Q&A interview with physicist Henrik Aratyn
Many individuals in scholarly publishing and higher education speak of preprints as a new concept, one that has the potential to shake up academia and the publishing world’s centuries-old peer review process.
In reality, preprints have been circulating since the 1940s, starting with researchers in high-energy physics, a discipline that (perhaps ironically for some in the publishing industry) was responsible for the world’s first atom bomb.
Joking aside, preprints were never so explosive in the physics world, at least not in the negative sense. Quite the contrary; they served a vital need and became an important step toward publication in peer-reviewed journals.
Dr. Henrik Aratyn, professor of high-energy physics and former associate dean in the College of Liberal Arts and Sciences at the University of Illinois at Chicago, has been fully immersed in the preprint culture since his research career took off in the early 1980s.
Aratyn recently shared the history of preprints, his own experiences producing and consuming them as he climbed the academic ladder, and some insights into the preprinting culture of his scientific discipline.
Physicists started a model culture where preprinting and peer-reviewed publishing have gone hand-in-hand, a culture where an entire research community has been actively involved in preprint reviews, and a culture that has used preprints to move their science forward more quickly through teamwork.
And as preprints continue to show their staying power in the scientific world, perhaps publishers and researchers alike could take pages from the physicists’ playbook to ultimately improve the quality and integrity of scientific information.
Preprints existed decades before arXiv was even created. How were they distributed before they went digital?
For me and for my discipline, which generally goes under the broad name of high-energy physics (HEP), preprints exist as a means of communicating scientific knowledge. Starting in 1974, an organization called SLAC (the Stanford Linear Accelerator Center) at Stanford University[1] ran a HEP database called SPIRES. Each week, every active institution on SPIRES’ list distributed and received paper preprints.
It did not always work perfectly. Researchers in third world countries would see preprints after delays, while some close by would receive them the next day. So clearly there were disadvantages in how our research was communicated, but it served the purpose of communicating knowledge, which is still the purpose of SPIRES (now called INSPIRE-HEP) even today.
How would you use these preprints in paper form?
You would start each day by taking a ream of preprints from the library, before someone else could take them away from you, and making photocopies of them before they disappeared. And of course, after a few days or weeks, you couldn’t really find them.
People also used postcards to request preprints. Each institution had professional postcards for researchers who wanted to request interesting preprints from other institutions. These were preprints they found in literature references. You filled out the postcard, and they’d send your materials in brown envelopes with cheap printed postage. We would cherish those postcards. They were useful to us, because you knew exactly who wanted to see your research.
During these ancient times, that was the structure of how we communicated and how we thrived. The idea was that preprints only served a purpose for six to 12 months. Then later, you would find the article in a journal.
Preprints went digital before arXiv was created. How were preprints digitally circulated before arXiv?
The technological revolutions in the mid-’80s made it possible to replace our primitive method of distribution with an advanced one. Everything started to be written in the LaTeX (originally TeX) typesetting system. TeX was designed and written by Donald Knuth, a visionary computer scientist who made the scientific communication revolution possible.
Around 1985, you would be able to send the TeX files of preprints by email, and eventually, at some well-equipped institutions, you could print them. That was the beginning of a revolution. You could suddenly write down your equations, include your drawings, and communicate more effectively by posting it all over the Internet. Only it was not the Internet back then. It was called BITNET.
Around ’89 or ’90, you could send requests by email. You initially sent a request, and then the person running the preprint server manually sent you a file. Then you could print it on your own machine. By then, we had a preexisting preprint structure, a culture, and expectations to communicate our knowledge.
This manual server structure was automated by arXiv. arXiv has a very effective search tool. With arXiv, you could just go and download the preprint you searched for.
How was arXiv perceived by the physics community at the time of its launch? Was there skepticism, jubilation, or perhaps feelings in between?
It further simplified and automated our means of communicating research. arXiv made it easier to find and read preprints. These papers wouldn’t disappear. It democratized and equalized the research communication process. It allowed more people to participate in discussions and contribute. It became easier to communicate research, but it changed the landscape completely in terms of publishing. arXiv became door number one to publication. Then, after posting in arXiv, we would discuss which journal to send the paper to. Sometimes people did the opposite, publishing first in a journal and then posting on arXiv, but it was common to do both.
Have the needs around preprint servers evolved among physicists since the 1990s? What’s changed?
Overall, arXiv filled the same need: access to knowledge as fast as possible. In research, you could search through many years of production, see what people did in the past, and retrieve it. Ease of retrieving and communicating information is the number-one need to this day.
What changed was the idea of open access. arXiv was so much ahead of the curve. We naturally accepted open publishing and open-access publishing. Now this culture is starting to spread across more disciplines in science.
We also now have a saturation of easily available knowledge. You used to have to grab as many preprints as possible to see what’s going on in the bigger world. Now you’re getting it all, and you have to be selective about what you read. You have to search for things that interest you. You don’t start your day looking through preprints in arXiv. There’s just too much. You need systems like arXiv to help you sort it all out.
When preprints first began circulating in arXiv, were there concerns about the need for peer review, or about researchers engaging in fraudulent activities?
So you’re asking how you know if this [preprinted] information is reliable, and what assurances we had that a crackpot was not writing them? First, not everyone could use arXiv. You had to be established at a university, so that is some assurance; but, of course, that’s not enough. When you publish in arXiv, you put your own name and reputation on the line. You will have critiques. If you go to arXiv, you see that there are often many versions of a paper because of these critiques. You may get a question. You may get statements, like “this is trivial” or “this is wrong,” but this is actually useful feedback.
It’s also possible to write another preprint referring to the “wrong” preprint, saying, “This is about this other paper, and I disagree.” And it happens quite often. And then what happens is that, in some cases, preprints are being withdrawn or changed; or a discussion emerges between experts and people from outside having a hard time making up their minds. So a process of corrections exists, and nobody wants to risk putting in various arguments they don’t believe in.
The review process is necessary from an academic standpoint. For us, this process of correction exists at the preprint stage so that you have the possibility of changing it before submission to a journal. Of course, everybody sends it to a journal, not only for reviewing, but also because it’s more established for tenure considerations.
Were there other concerns about arXiv, like plagiarizing?
We were always concerned about people reproducing papers. It’s easy to copy. So what arXiv introduced was a plagiarism-detection tool to make sure no one is copying text from one paper into another. These cases were an exception. Sometimes researchers were not comfortable with English. Some might rely on another paper’s text. Of course, that’s plagiarizing, and there was some concern about that. Those were valid but relatively minor concerns, and arXiv addressed them very well.
Preprints primarily served the purpose of communicating knowledge, and they were also an opportunity for people to retract information and correct what’s wrong. And there had to be a simple mechanism to fight intentional plagiarism, or even self-plagiarism; but these were relatively minor issues, and we could iron them out.
Physicists are relatively rarely exposed to cheating, because we have a community where everyone is watching. Your reputation can suffer. You cannot hide if you did something wrong. We avoid many issues as a community that is inspecting everything.
As an associate dean and department head at the University of Illinois - Chicago, how did you view preprints with regard to hiring and tenure?
When you make administrative decisions, you look at the quality of the publication, number of citations, and impact. So if you hire somebody, you look at recommendation letters and what the letter writers say about the candidates. And you do a literature search, plus you look at impact. For hires, the impact of a candidate’s individual work, whether preprints or journal articles, is of the highest importance. It’s somewhat important if they’re published in a well-established, high-impact journal.
When it comes to promoting people, preprints are listed, but you look more at their journal articles, their citations, and the impact factors of the journals. It’s important you publish in journals of high impact. So if this person in five years has published six physics papers in a journal, that would weigh more heavily than if this person published a similar number of papers on arXiv, because it is easier to make a quantifiable statement about the impact parameters of journal articles.
I publish in arXiv to communicate my research. When I publish in a journal, it’s to establish a record that I can place on my vita. This is a view that I’ve held for many years, since the advent of arXiv.
Are there any situations where your colleagues in physics would post their work only on a preprint server or a journal, not both?
People may just send their publication to a journal without publishing in arXiv. In those cases, there could be personal reasons or fear of competition. Maybe they don’t want to disclose their work until it’s published in a journal; then others can be surprised. But the standard is, you want to show your work as fast as possible with preprints by sending them to arXiv.
Maybe people would post only the preprint for a personal reason, but I would say the majority of preprints end up in both arXiv and journals sooner or later. Quite often, the referee makes decisions.
Why do you think preprints did not catch on with other disciplines until recent years?
I don’t know. One factor is that they don’t use TeX as a standard, and maybe they rely more on a different format. You could publish doc files. Maybe people were waiting for a publishing standard. TeX quickly became a standard in HEP, but it went through different versions.
There could also be psychological reasons. Scientific work owes a lot to collaborations, but also to the presence of competing projects. High-energy physics would rely on us standing on each other’s shoulders to move fast. Maybe some other disciplines are more secretive about how they do work. I don’t know. It’s an interesting sociological question. It should be a project for a sociology paper.
Where do you see preprints in five to ten years? Will they be a normal part of the scholarly publishing landscape and a regular step in publication?
If preprints are free and accessible to everybody, and if you can trust the institutions that maintain them, I’m pretty sure preprint servers in physics will play a major role, provided they remain reliable, transparent, and well supported. We need them. If one particular institution fails, the community will look for a different version of the same concept. It’s the concept which is important. arXiv only needed the Internet and a few algorithms to get it to work. The community existed. The expectation existed. The culture existed. Computers just came to fill the additional steps.