The OACA has been a major topic of discussion in the literature over the past twenty years, generally on the assumption that better accessibility fosters research impact. In fact, this assumption is far from obvious: studies of the OACA report contradictory findings and have often proven hard to compare, because results depend on many factors such as the disciplines covered, the OA statuses taken into account, the publication types, and the method and database used.
The first paper to analyze the question of the OACA is that of Steve Lawrence 15, published in Nature at the turn of the century. He compared the number of citations with the share of freely accessible articles and found a positive correlation between the two indicators. Since then, dozens of papers have been published on the topic. Overall, studies finding an OACA are more common in social sciences 16–18, medical and health sciences 19–21, and natural sciences 22–24. The OACA is weaker (and in several disciplines non-existent) in physical sciences and engineering 25. Conversely, some studies concluded that there is no OACA in social sciences 26, medical and health sciences 27–29, and natural sciences 30,31. Working on a large sample from the Web of Science database, Dorta-Gonzalez et al. (2017) 32 found no OACA across disciplines.
Recently, a review of the topic identified 134 publications dealing with the OACA 33. Applying the EBL critical appraisal method to assess the risk of bias on the basis of factors such as sample size, data collection and study design, the authors emphasized that most of these studies (131, about 98%) present a high risk of bias. Two of the three publications with a low risk relate to medical and natural science research 29,34, and the third 35 used relatively old data (2007-2011). Moreover, none of these articles used randomization techniques or a control group.
As several known confounding factors have limited the scope of the results of existing studies, addressing the question rigorously requires finding a way to "isolate" the OA effect. OA publications should thus be compared to a counterfactual sample of publications whose only difference is that they were published in subscription-based journals.
Data
We extracted the publication data from the in-house database of the French OST. It includes five indexes of the WoS available from Clarivate Analytics (Science Citation Index Expanded (SCIE), Social Sciences Citation Index (SSCI), Arts & Humanities Citation Index (AHCI), Conference Proceedings Citation Index - Social Science & Humanities (CPCI-SSH) and Conference Proceedings Citation Index - Science (CPCI-S)), and corresponds to WoS content indexed through the end of March 2021. The study focuses on three types of publications: articles, reviews and conference proceedings.
OA sample
We selected all the documents published with a Gold OA status from 2010 to 2020, distinguishing those published in fully OA journals from those published in hybrid journals, which represent 2,458,378 and 1,024,430 publications, respectively.
Control samples
We used the raking ratio method 36–38 to ensure comparability between the OA sample and the control sample. The method consists in constructing a control sample that is similar to the sample of interest except for the analyzed parameter, which in our case is the OA status.
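In survey statistics, raking (also called iterative proportional fitting) rescales unit weights until the weighted totals of each categorical variable match prescribed targets. As a minimal sketch of that generic procedure, and not necessarily the exact implementation used in this study, the Python below rakes the weights of a toy control sample on two variables. The function name `rake`, the field names and the target totals are all hypothetical.

```python
from collections import defaultdict


def rake(records, targets, n_iter=100, tol=1e-9):
    """Generic raking (iterative proportional fitting) on unit weights.

    records: list of dicts mapping variable name -> category
    targets: {variable: {category: desired weighted total}}
    Assumes every category seen in `records` has a strictly positive target.
    """
    weights = [1.0] * len(records)
    for _ in range(n_iter):
        max_shift = 0.0
        for var, target in targets.items():
            # Current weighted total of each category of this variable.
            totals = defaultdict(float)
            for rec, w in zip(records, weights):
                totals[rec[var]] += w
            # Rescale so that this variable's totals hit their targets.
            for i, rec in enumerate(records):
                factor = target[rec[var]] / totals[rec[var]]
                weights[i] *= factor
                max_shift = max(max_shift, abs(factor - 1.0))
        if max_shift < tol:  # every margin already matches: stop
            break
    return weights


# Hypothetical control records described by two categorical variables.
control = [
    {"year": 2010, "impact": "low"},
    {"year": 2010, "impact": "high"},
    {"year": 2011, "impact": "low"},
    {"year": 2011, "impact": "high"},
]
# Hypothetical margins of the sample of interest (illustrative numbers).
targets = {
    "year": {2010: 30, 2011: 70},
    "impact": {"low": 40, "high": 60},
}

weights = rake(control, targets)
print([round(w, 6) for w in weights])  # approximately [12.0, 18.0, 28.0, 42.0]
```

After raking, the weighted control sample reproduces both target margins at once (30/70 by year, 40/60 by impact), which is the sense in which the two samples become comparable on the chosen variables.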
The control sample is thus obtained by finding all the non-OA publications, referred to as doubles, that match OA publications on a set of features identified in the literature as having an effect on citation impact 39–42. The main publication characteristics used for the raking ratio (see method) are:
- the publication year (11 classes: 2010 to 2020),
- the discipline (OST classification in 27 ERC panels),
- the journal impact (5 classes: <0.8, [0.8, 1.2[, [1.2, 1.8[, [1.8, 2.2[, >=2.2; for the calculation method, see 40),
- the number of countries of contributors, based on WoS address information (5 classes: 1, 2, 3, 4, and 5 or more),
- the number of funding sources, based on WoS acknowledgment information (5 classes: 1, 2, 3, 4, and 5 or more),
- the presence of European Research Council (ERC) funding (2 classes: Yes or No),
- the presence of at least one European (EU27) address (2 classes: Yes or No),
- the presence of a patent citation (2 classes: Yes or No).
On this basis, we assigned each OA publication to one of 242,924 distinct clusters. From the same clusters, we then identified 12,088,681 candidate doubles, of which 10,310,342 and 11,533,001 are eligible for fully OA and hybrid OA publications, respectively.
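The clustering and candidate-identification steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the record fields (`id`, `oa`, `year`, `panel`, `impact`, `n_countries`, `n_fundings`, `erc`, `eu`, `patent`) and all helper names are hypothetical, and the toy records are invented for the example.

```python
from collections import defaultdict

IMPACT_BOUNDS = (0.8, 1.2, 1.8, 2.2)  # class limits listed in the text


def impact_class(impact):
    """Return 0..4 for <0.8, [0.8,1.2[, [1.2,1.8[, [1.8,2.2[, >=2.2."""
    return sum(impact >= b for b in IMPACT_BOUNDS)


def capped(n, top=5):
    """Collapse counts of `top` or more into a single class ('5 or more')."""
    return min(n, top)


def cluster_key(pub):
    """Tuple of the eight matching features for one publication."""
    return (
        pub["year"], pub["panel"], impact_class(pub["impact"]),
        capped(pub["n_countries"]), capped(pub["n_fundings"]),
        pub["erc"], pub["eu"], pub["patent"],
    )


def candidate_doubles(pubs):
    """Map each cluster holding OA publications to its non-OA candidates."""
    oa_clusters = {cluster_key(p) for p in pubs if p["oa"]}
    doubles = defaultdict(list)
    for p in pubs:
        key = cluster_key(p)
        if not p["oa"] and key in oa_clusters:
            doubles[key].append(p["id"])
    return dict(doubles)


def make_pub(id_, oa, impact, n_countries):
    # Hypothetical record; the other features are held fixed for brevity.
    return {"id": id_, "oa": oa, "year": 2015, "panel": "LS1",
            "impact": impact, "n_countries": n_countries, "n_fundings": 1,
            "erc": False, "eu": True, "patent": False}


pubs = [
    make_pub("A", True, 1.0, 2),
    make_pub("B", False, 1.1, 2),  # same cluster as A
    make_pub("C", False, 1.3, 2),  # different impact class
    make_pub("D", False, 0.9, 7),  # different country class
    make_pub("E", True, 2.5, 6),
    make_pub("F", False, 3.0, 9),  # same cluster as E
]
doubles = candidate_doubles(pubs)
print(sorted(doubles.values()))  # [['B'], ['F']]
```

In this toy sample only B and F are retained as candidate doubles, because C and D fall into clusters that contain no OA publication. The separate eligibility of candidates for fully OA and hybrid OA publications is not modeled here.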