Are the Lists of Questionable Journals Reasonable: A Case Study of Early Warning Journal Lists

The lists of questionable journals are regarded as a policy or tool to ensure research quality and integrity. However, because they lack clear criteria, they remain highly debated. Taking a typological perspective, we assess the reasonableness of lists of questionable journals by examining how well they reflect differences in bibliometric attributes among distinct groups when categorizing and labelling journals, and whether these differences are consistent. Using the Early Warning Journal Lists released by the National Science Library of the Chinese Academy of Sciences as an example, we grouped the listed journals by warning level and year. We then compared the groups to determine whether they differ in key academic indicators, thereby evaluating the reasonableness of the warning journal list. Our findings suggest that the Early Warning Journal Lists may have employed inconsistent criteria when assigning warning levels: across the key academic indicators, the differences between groups varied in degree or were absent altogether. Additionally, citation metrics such as the journal impact factor and the journal citation indicator do not appear to have been treated as grouping criteria in the Early Warning Journal Lists, yet the creators offer no detailed explanation for this choice. These results highlight the need for a more scientific and meticulous assessment of lists of questionable journals, along with a greater emphasis on sharing detailed standards and data. Furthermore, our study offers recommendations for institutions formulating such lists in the future.


Introduction
The lists of questionable journals gather academic journals with potential issues related to research quality and integrity onto a single list; examples include Beall's List [1], Cabell's Journal Blacklist [2], and the Early Warning Journal Lists [3]. Such lists are considered an effective method of research evaluation by universities and research institutions (Nature 2018). They serve as filters that allow research evaluation agencies to quickly identify low-quality publications. Moreover, they provide guidelines for researchers to shape their publishing behavior and attempt to ensure research integrity and quality. However, the lists of questionable journals are highly controversial. Different developers of such lists employ varying criteria, resulting in the inclusion of different academic journals, and the accuracy and trustworthiness of the lists have been questioned (Strinzel et al. 2019).
Previous studies have concluded that 28 criteria of Cabell's Journal Blacklist need clarification and revision, and 39 should be removed (Teixeira da Silva et al. 2023). Beall's List has also faced criticism for its vague criteria (Teixeira da Silva and Kimotho, 2021). Indeed, there appears to be a gray area between "good" and "bad" (Dunleavy, 2022), which casts doubt on the reasonableness and trustworthiness of the lists of questionable journals (Teixeira da Silva and Tsigaris, 2018). In contrast, some scholars have defended the lists of questionable journals and proposed ways to improve their criteria (Frandsen, 2019; Nelhans and Bodin, 2020; You et al. 2022).
Despite the criticism, the lists of questionable journals continue to be widely used in quality assurance. In December 2020, the National Science Library of the Chinese Academy of Sciences (NSLC) [4] published the "Early Warning Journal List 2020" (EWJL 2020), containing 65 English-language academic journals [5], all of which are indexed by the JCR (Journal Citation Reports). These journals were considered potentially lacking academic rigor. One year later, NSLC released the "Early Warning Journal List 2021" (EWJL 2021), which includes 35 English-language academic journals that are also indexed by the JCR [6]. The release of the EWJLs is seen as a response to China's research integrity policies. In May 2018, the State Council of the People's Republic of China issued "Several Opinions on Further Strengthening the Construction of Research Integrity" (State Council of the People's Republic of China 2018). This policy mentioned several possible measures to achieve this goal: "The Ministry of Science and Technology should establish an early warning mechanism for academic journals to support relevant institutions. An early warning list of domestic and international academic journals should be published, and dynamic tracking and timely adjustment should be implemented. Academic journals that neglect research quality, manage their operations in a disorderly manner, or put commercial interests first will be blacklisted."
In February 2020, the Ministry of Science and Technology of the People's Republic of China introduced the policy document "Measures to Eliminate the Bad Orientation of 'Paper Only' in Scientific and Technological Evaluation (Trial)", emphasizing the use of warning journal lists to ensure research quality (Ministry of Science and Technology of the People's Republic of China 2020). Within the context of "nurturing and enhancing high-quality scientific journals in China", it reiterated a focus on "enhancing the effectiveness of early warning mechanisms for academic journals, regularly releasing warning lists for both domestic and international academic journals, and implementing continuous monitoring and prompt adjustments." After the release of EWJL 2020, more than 40 Chinese universities and institutions promptly adopted EWJL 2020 and EWJL 2021 as a basis for their research quality assurance (Tang and Jia, 2022). Researchers who publish papers in journals listed in the EWJLs will not receive any material rewards or honors (Jiamusi University 2023). However, the release of EWJL 2020 sparked considerable controversy: the Multidisciplinary Digital Publishing Institute (MDPI) [7] had 22 of its academic journals included in EWJL 2020.
Subsequently, MDPI issued a statement expressing concern that the criteria for including journals in EWJL 2020 were not sufficiently clear (MDPI 2022).
Upon reviewing the literature and the various viewpoints, defending the reasonableness of lists of questionable journals remains challenging. As a tool or policy designed to ensure research quality and integrity, the lists of questionable journals not only have the potential to significantly impact researchers' publishing practices (Petrou 2021) but may also influence how institutions and universities ensure research quality and integrity. Only a reasonable policy or tool can help enhance research quality and ensure research integrity; evaluating the reasonableness of such lists therefore holds significant importance. The generation of lists of questionable journals stems from typological thinking, as it attempts to categorize academic journals based on certain attributes and assign them concepts or symbols. For instance, the EWJLs assign warning levels (low, middle, and high) to questionable journals in order to distinguish academic journals by their degree of research quality and integrity risk. However, when delineating specific entities, the reasonableness of the criteria used for categorization must be considered: there should be significant and consistent differences in attributes between these entities. This viewpoint can be illustrated by Aristotle's view in his work on zoology: "A species is defined through its genus and its differentia (what came to be known as definition per genus et differentiam). What a species is, is determined by the genus to which the species belongs and what differentiates that species from all other species of the same genus." (Sandford 2019, 6) If we view academic journals as an ecosystem, then questionable journals represent a "species" within this system. They should exhibit bibliometric attributes distinct from other "species" (e.g., regular journals), since both fall within the category of publications. Does a significant difference exist between questionable journals and other
regular journals? This question has amassed some evidence (Kulczycki et al. 2021). Building upon it, we raise another under-discussed issue: the reasonableness of the criteria used for categorizing within questionable journals. Considering the extensive influence of the EWJLs and their affiliation with a top-tier Chinese research institution, along with their internal division into warning levels and the groups created by annual updates (such as excluded, retained, and added groups), we use them as an example to explore the following two questions regarding the reasonableness of this list:
RQ1: Did the EWJLs consider consistent bibliometric attribute differences when assigning warning levels?
RQ2: Did the EWJLs consider consistent bibliometric attribute differences when excluding, adding, and retaining academic journals?
Our findings can offer more reliable evidence for ongoing scholarly debates and provide reference points for countries developing lists of questionable journals, ultimately fostering the creation of more dependable policies or tools to ensure research quality and integrity.
[1] Beall's List is a compilation of predatory journals. The original website of Beall's List has been removed, and the list has been archived at the following website: https://beallslist.net/.
[3] While some studies utilize the concept of "journal blacklists", we have opted for the alternative concept of "the lists of questionable journals" due to its more inclusive nature.
[4] NSLC is an affiliated institution of the Chinese Academy of Sciences, the highest-level academic institution for the natural sciences in China (https://www.cas.cn/zz/yk/201410/t20141016_4225142.shtml). NSLC is recognized as one of the most prolific and cited research organizations in the field of library and information sciences in China, possessing a strong representation in this domain (http://english.las.cas.cn/About/about/).

Sample Selection
We chose EWJL 2020 and EWJL 2021 as the primary samples (EWJL 2022 suspended publication for unknown reasons). EWJL 2021 comprises 35 journals, while EWJL 2020 comprises 65 journals; 9 journals overlap between the two lists. We then identified three journals with missing data and excluded them. After this exclusion, EWJL 2021 retains 35 journals and EWJL 2020 retains 62 journals.

General Information
We classified the research domains of the 88 warning journals according to the JCR classification criteria. The results are presented in Figure 1. As illustrated in Figure 1, 50% of the warning journals are categorized under medicine, followed by engineering (15.9%) and biology (14.7%). Moreover, there is one warning journal in each of the domains of multidisciplinary science and agricultural and forestry science.

Group Division
To address RQ1, we classified the journals according to their warning levels. EWJL 2020 contains 28 low-warning-level journals, 28 mid-warning-level journals, and 6 high-warning-level journals. EWJL 2021 contains 15 low-warning-level journals, 15 mid-warning-level journals, and 5 high-warning-level journals. Since the overlapping journals have different warning levels in 2020 and 2021, we counted them separately for statistical purposes. This process therefore resulted in 97 samples.
To address RQ2, we categorized the journals based on their flow status. Academic journals that appear in EWJL 2020 but not in EWJL 2021 form the "excluded group". Academic journals present in EWJL 2021 but not in EWJL 2020 form the "added group". Academic journals that appear in both EWJL 2020 and EWJL 2021 form the "retained group" (the overlapping journals). There are 26 journals in the added group, 9 in the retained group, and 53 in the excluded group. Because we need to compare their data before and after the flow, we gathered the relevant data for both 2020 and 2021 [8]. This process therefore resulted in a total of 176 samples (88 samples per year).
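The three flow-status groups are simple set operations over the two lists. A minimal sketch with hypothetical journal identifiers (the real lists contain journal names):

```python
# Hypothetical journal identifiers standing in for the actual EWJL entries.
ewjl_2020 = {"J01", "J02", "J03", "J04", "J05"}
ewjl_2021 = {"J04", "J05", "J06"}

excluded = ewjl_2020 - ewjl_2021   # warned in 2020 only: the "excluded group"
added    = ewjl_2021 - ewjl_2020   # warned in 2021 only: the "added group"
retained = ewjl_2020 & ewjl_2021   # warned in both years: the "retained group"

print(sorted(excluded), sorted(added), sorted(retained))
```

Applied to the real lists (65 and 35 journals, minus the three with missing data), these operations yield the 53, 26, and 9 journals reported above.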

The Key Academic Indicators
To measure the bibliometric attributes of different groups, we operationalized these attributes as key academic indicators. The NSLC provides only limited information about the EWJLs: the journal name, discipline, and warning level. No further details are available for individual journals. However, in late 2020, the NSLC released a list outlining seven bibliometric criteria used for journal selection: the number of articles in the journal, degree of internationalization, rejection rate, article processing charges (APCs), journal citation success index, self-citation rate, and retraction information (we quantify this last criterion using the number of retractions and the number of retractions by Chinese authors).
The bibliometric criteria (which can also be regarded as bibliometric attributes, since their purpose is to distinguish among questionable journals) were initially published in Chinese, but an English version was made available alongside EWJL 2021 (NSLC, 2021). Although the criteria remained unchanged from 2020, there are slight disparities between the original Chinese version and its English translation. Notably, the Chinese version includes the criterion of "degree of internationalization," while the English version omits it. Additionally, what the Chinese version describes as the "number of articles in the journal" appears in the English version as the "growth rate of productivity". Moreover, the English version explicitly mentions "paper mill," whereas the Chinese version does not.
Since the rejection rates of academic journals are usually not publicly available, we excluded this criterion from the key academic indicators used in the subsequent analysis. Furthermore, the NSLC has not disclosed the journal citation success index for the warning journals, and this information is unavailable to us. We therefore use the Journal Citation Indicator (JCI) as a substitute for this key academic indicator, since both indicators serve as supplements and enhancements to the JIF (Torres-Salinas, Valderrama-Baca, and Arroyo-Machado 2022). To investigate the reasonableness of the EWJLs, we treated these bibliometric criteria (key academic indicators) as the bibliometric attributes of questionable journals. This involved thorough scrutiny, taking prior scholarly work into consideration (Zhang et al., 2022). We also incorporated supplementary key academic indicators: the JIF, article influence, the number of open access (OA) articles, and the number of retractions by Chinese authors. Detailed information on the data sources and all indicators can be found in Table 1.
Table 1. Selection of key academic indicators, data sources, and their explanation

Degree of internationalization: the number of international collaborations (papers that contain one or more international co-authors) for an entity divided by the total number of documents for the same entity, expressed as a percentage.

Data Analysis
First, we used Prism 10.0 to conduct the D'Agostino-Pearson test to examine whether the key academic indicators follow a normal distribution. The D'Agostino-Pearson test assumes, under the null hypothesis, that all values are drawn from a population that follows a Gaussian distribution. If the obtained p-value is less than 0.05, we reject this hypothesis, indicating a departure from normality in the data.
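Outside Prism, the same D'Agostino-Pearson K² test is available as `scipy.stats.normaltest`. A minimal sketch on synthetic data (the samples are illustrative, not the paper's indicator values):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic stand-ins for two indicator distributions:
# one roughly Gaussian, one heavily right-skewed (as APC revenue tends to be).
gaussian_like = rng.normal(loc=3.5, scale=1.0, size=500)
skewed = rng.exponential(scale=2.0, size=500)

# D'Agostino-Pearson K^2 test: H0 = data come from a normal distribution.
for name, sample in [("gaussian-like", gaussian_like), ("skewed", skewed)]:
    k2, p = stats.normaltest(sample)
    branch = "parametric test" if p >= 0.05 else "non-parametric test"
    print(f"{name}: K2={k2:.2f}, p={p:.4f} -> {branch}")
```

The p-value then routes each indicator to the parametric or non-parametric branch, mirroring the decision rule described above.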
Second, to answer RQ1, we classified the journals into three distinct groups based on their warning levels and analyzed potential statistically significant differences among these groups. For key academic indicators that satisfied the assumption of normality and exhibited homogeneity of variances, we performed a parametric test, the Brown-Forsythe test, within the framework of one-way ANOVA (analysis of variance), and reported the F-value and significance level. Conversely, for key academic indicators that deviated from a normal distribution, we employed the nonparametric Kruskal-Wallis test in one-way analysis of variance and reported the Kruskal-Wallis statistic and significance level.
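Both branches of this three-group comparison can be sketched in scipy. The group values below are illustrative, not the paper's data; note that scipy offers no direct Brown-Forsythe ANOVA, so the classical one-way ANOVA (`scipy.stats.f_oneway`) stands in for the parametric branch here:

```python
from scipy import stats

# Illustrative (not the paper's) self-citation rates (%) for three warning levels.
low  = [4.1, 5.3, 6.0, 4.8, 5.5, 6.2, 4.9, 5.1]
mid  = [9.7, 11.2, 10.4, 12.1, 9.9, 11.8, 10.6, 10.9]
high = [18.3, 21.5, 19.8, 22.0, 20.7]

# Non-parametric branch: Kruskal-Wallis H test across the three groups.
h, p_kw = stats.kruskal(low, mid, high)
print(f"Kruskal-Wallis H={h:.2f}, p={p_kw:.4f}")

# Parametric branch stand-in: classical one-way ANOVA
# (Prism's Brown-Forsythe ANOVA additionally relaxes equal-variance assumptions).
f, p_anova = stats.f_oneway(low, mid, high)
print(f"one-way ANOVA F={f:.2f}, p={p_anova:.4f}")
```

With clearly separated groups like these, both tests reject the null hypothesis of no group difference.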
To address RQ2, we divided our samples into three distinct groups according to their flow status.
We then examined potential statistically significant variations between the years 2020 and 2021 (before and after the flow). For key academic indicators that adhered to the assumption of normality and exhibited homogeneity of variances, we conducted a parametric test, the ratio paired t-test, within the framework of the paired-sample t-test, and reported the t-value and significance level. For key academic indicators that deviated from a normal distribution, we employed the nonparametric Wilcoxon matched-pairs test and reported the sum of signed ranks (W) and significance level.
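Both paired-comparison branches can likewise be sketched with scipy. The before/after values are illustrative, and a plain paired t-test stands in for Prism's ratio paired t-test (which operates on log-transformed ratios):

```python
from scipy import stats

# Illustrative (not the paper's) paired values for the same nine journals
# in 2020 (before the flow) and 2021 (after the flow).
before = [120, 340, 95, 410, 200, 150, 510, 80, 260]
after  = [105, 300, 90, 360, 170, 140, 450, 70, 230]

# Parametric branch stand-in: plain paired t-test on the raw differences.
t, p_t = stats.ttest_rel(before, after)
print(f"paired t-test: t={t:.2f}, p={p_t:.4f}")

# Non-parametric branch: Wilcoxon matched-pairs signed-rank test.
w, p_w = stats.wilcoxon(before, after)
print(f"Wilcoxon: W={w:.1f}, p={p_w:.4f}")
```

Because every pair here moves in the same direction, both tests detect a significant before/after shift.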
Finally, based on the outcomes of the group comparisons, we discuss the reliability of the warning journal list formulated by NSLC and offer a valuable reference for countries or institutions considering the development of similar lists.
[8] Since the EWJLs have not disclosed any details about the data, and considering that they are updated once a year, we speculate that the EWJLs focus on dynamic, real-time data. Therefore, we used data on the status of each journal in 2020 (before the flow) and in 2021 (after the flow), without addressing long-term data variations.

Descriptive Analyses
To provide a comprehensive description of the sample, we conducted a descriptive statistical analysis of the data from the 88 journals (n=176) for the years 2020 and 2021. The results are presented in Table 2 and Figure 2. The descriptive statistics indicate that, compared to 2020, the mean values of the number of articles, degree of internationalization, and article influence decreased across all journals, whereas the mean values of APCs, self-citation rate, number of retractions, JIF, number of OA articles, and number of retractions by Chinese authors increased.
From 2020 to 2021, the mean JCI remained constant across all journals. Notably, the mean JIF for all journals in 2020 was 3.43 with a maximum of 8.89, while in 2021 the mean JIF was 3.66 with a maximum of 10.18. Some academic journals with a JIF greater than 5 are thus also treated as warning journals, despite being seen as high-impact or high-quality journals in certain fields (McGrath et al., 2020).
In 2020, the highest APCs value among all journals was 34.91 million USD; in 2021, it increased to 37.13 million USD. Correspondingly, however, the maximum number of OA articles in 2021 decreased by 3,831, while the average number increased by 376 compared to 2020. The retraction information is also noteworthy. In 2020, 44 journals had zero retractions, whereas in 2021 the figure was 35. In 2020, 3 journals had more than 50 retractions; in 2021, this number increased to 5. The number of retractions by Chinese authors is concerning: in 2020, 36 journals had retractions by Chinese authors accounting for more than 50% of their total retractions, and this number increased to 41 journals in 2021.

Normality Test Results
When conducting ANOVA and paired t-tests, it is essential to evaluate whether the data follow a normal distribution in order to determine the appropriate statistical tests. As shown in Table 3, we performed a normality test on the data of the 88 journals for the years 2020 and 2021 (n=176). The results indicate that the degree of internationalization in 2021 (P=0.2331) and the JIF in 2020 (P=0.1766) passed the D'Agostino-Pearson test, suggesting that these two variables follow a normal distribution; parametric tests were therefore used to compare inter-group differences for them. The remaining variables did not pass the D'Agostino-Pearson test (P<0.05), indicating that they do not follow a normal distribution; non-parametric tests were therefore used to compare inter-group differences for these variables.

Group Comparison Based on Warning Levels
(3) There were only marginal differences between the high-warning-level and mid-warning-level groups. In EWJL 2020, no statistically significant differences were found between the high-warning-level and mid-warning-level groups on any key academic indicator. In EWJL 2021, the high-warning-level and mid-warning-level groups differed only in the number of articles in the journal (P=0.0451) and APCs (P=0.0483), with no differences in the other key academic indicators. This suggests: (1) EWJL 2020 and EWJL 2021 can aid researchers or research institutions in swiftly distinguishing journals with potential quality risks from those with substantial quality risks. However, distinguishing journals with higher quality risks from those with significant quality risks may be unreasonable, since the mid-warning-level and high-warning-level groups exhibit minimal differences across many key academic indicators. (2) NSLC might have employed inconsistent criteria when assigning warning levels to academic journals. For instance, in EWJL 2020, the low-warning-level group differed from the mid-warning-level group in the number of articles in the journal, while it exhibited no difference from the high-warning-level group on the same indicator. (3) In scientometrics, the JIF and JCI, which serve as crucial metrics for assessing the quality and impact of academic journals (Ansari, Rahman, and Hashem Hussein Al-Attas, 2020), do not appear to have been adopted by the NSLC.

Group Comparison Based on Flow Status
In the context of the flow status analysis, a comparative grouping was conducted, and the results are presented in Table 5.
(1) The results reveal significant differences for the added group between the pre-warning period (2020) and the warning period (2021). Several key academic indicators, namely the number of articles in the journal (P<0.0001), degree of internationalization (P=0.0055), APCs (P<0.0001), article influence (P=0.0005), number of OA articles (P=0.0001), and the number of retractions by Chinese authors (P=0.0312), exhibit statistically significant variations between the pre-warning and warning periods.
(2) Over the two consecutive years of continuous warnings, the retained group demonstrated almost no significant differences in key academic indicators. Except for the number of retractions (P=0.0469), which showed a statistically significant distinction, the key academic indicators displayed no substantial variation between the two years. (3) The excluded group showed comparatively weaker disparities before and during the warning period. While the self-citation rate (P<0.0001) of the excluded group differed significantly between the pre-warning and warning periods, the differences in the number of articles in the journal (P=0.0112), APCs (P=0.0083), and the number of OA articles (P=0.0116) were comparatively weaker, and there were no statistically significant differences in the remaining key academic indicators between the two periods.
The findings shed light on the following aspects. (1) The criteria for including or excluding academic journals in the EWJLs may lack uniformity. Notably, indicators such as the degree of internationalization, self-citation rate, article influence, and the number of retractions by Chinese authors displayed varying degrees of disparity between the added and excluded academic journals during their flow.
(2) The selection of the retained group appears reasonable.
Academic journals that were retained in the warning journal list did not exhibit significant differences in key academic indicators of quality and misconduct over two consecutive years, reaffirming the importance of continued attention to these journals. (3) It is plausible that the JIF and JCI were not considered as criteria for assessing flow, as none of the three groups exhibited notable differences on these two key academic indicators during flow or retention.

Discussion
We analyzed the appropriateness of using EWJL 2020 and EWJL 2021, published by NSLC, as a means to ensure research quality and integrity. Previous studies have primarily focused on creating academic journal lists to identify mainstream academic journals (Safón and Docampo, 2023) or on analyzing the overlap, purposes, and researchers' perceptions of different academic journal lists (Chen, 2019; Adams and Johnson, 2008). There has been ongoing debate, especially concerning the so-called lists of questionable journals, but these debates lack empirical validation.
To address the limitations of existing studies, we examined whether groups with different warning levels within the EWJLs exhibit differences in key academic indicators. For RQ1, we studied the significant differences in specific key academic indicators among the following pairs of groups: low-warning-level vs. mid-warning-level, low-warning-level vs. high-warning-level, and high-warning-level vs. mid-warning-level. We found only a few differences between the mid-warning-level and high-warning-level groups. Citation indicators in scientometrics (JIF and JCI) did not show differences across the three groups in either EWJL 2020 or EWJL 2021. Indicators related to research integrity, such as the number of retractions by Chinese authors, exhibited significant differences between the low-warning-level and high-warning-level groups, and between the low-warning-level and mid-warning-level groups. This suggests that NSLC's grouping of warning journals may have considered the journals' performance in terms of research integrity. Even so, the assignment of different warning levels to academic journals by NSLC lacks persuasiveness and scientific rigor: the inconsistent differences in key academic indicators across warning levels raise doubts. As highlighted in a previous study (Tsigaris and Teixeira da Silva, 2021), despite the "ideal" nature of the lists of questionable journals, the biases in their formulation process, especially regarding standards, have been overlooked. Hence, if academic journals on the EWJLs fail to exhibit consistent differences in key academic indicators, maintaining the reasonableness of the EWJLs becomes significantly challenging.
We also examined whether different academic journals in the EWJLs exhibited differences in key academic indicators during their flow. For RQ2, we categorized the academic journals in EWJL 2020 and EWJL 2021 into the "added group", "excluded group", and "retained group" based on their flow status, and analyzed differences in their key academic indicators between 2020 and 2021. We found differences in six key academic indicators for the added group, one for the retained group, and four for the excluded group. Again, citation indicators in scientometrics (JIF and JCI) did not exhibit differences across the three groups. Conversely, a previous study has emphasized the necessity of citation content analysis in distinguishing questionable journals from non-questionable journals (Emmanuel et al., 2021). If the lists of questionable journals fail to reach consensus on inclusion and exclusion criteria, it becomes difficult to have confidence in their reasonableness: if academic journals do not exhibit differences in research quality or integrity indicators before and after inclusion or exclusion, then there is no clear reason to include or exclude them (they show no tendency to improve their research quality or alter their misconduct status). Furthermore, if the lists of questionable journals choose not to adopt traditional scientometric indicators, the reasons should be elucidated and explained, rather than merely presenting a final result to the academic community.
In some countries, the lists of questionable journals serve not only as a tool but also as a policy to ensure research quality and integrity (Cyranoski, 2018; Bagues, Sylos-Labini, and Zinovyeva, 2019). However, while using the lists of questionable journals to address issues such as low quality and research misconduct, the processes of creating these lists and establishing their standards should also undergo reflection and critical examination. As policies that can have a broad impact on the academic community, the formulation and promotion of the lists of questionable journals require consideration of various factors, including reasonableness, transparency, accountability, and potential ethical issues that the lists might introduce, such as stigmatization.
One key step in addressing these issues is to ensure that the standards for compiling the lists of questionable journals are reasonable. Using NSLC's published EWJLs as an example, our study found that the EWJLs may adopt inconsistent criteria for including, excluding, or retaining academic journals, as well as for assigning warning levels to them. Although the issues with the EWJLs might be isolated cases, they could also reflect broader problems with the lists of questionable journals. Therefore, we should approach the lists of questionable journals with greater skepticism and exercise caution regarding their reasonableness, as we cannot ensure consistent treatment for all academic journals warned or labeled. Our recommendations are as follows. First, those compiling the lists of questionable journals should assess whether academic journals exhibit differences in key academic indicators before and after inclusion, exclusion, or retention, in order to determine whether there is a tendency to improve research quality or address academic misconduct. When updating the status of questionable journals, list compilers might consider using longer-term data or even reducing the update frequency, to yield more robust outcomes. Second, the creators of the lists of questionable journals should openly share their criteria, detailed data, and assessment methods, thereby increasing the confidence of the academic community in such lists. Third, involving diverse expert groups in the creation process, including representatives from various disciplines and research institutions, will bring a broader range of critical perspectives, thereby enhancing the reasonableness of the lists' formulation. This will help reduce biases and ensure comprehensive evaluation.
Ultimately, our goal should be to establish a robust and reasonable system that supports the enhancement of research quality and integrity while minimizing potential negative impacts on researchers and academic journals due to inclusion or exclusion from the lists of questionable journals.
Limitations and Future Directions
Taking the EWJLs as an example, we examined the reasonableness of their internal grouping. However, attention to the reasonableness of their application remains lacking. In other words, while universities and research institutions hope that lists of questionable journals can play a constructive role in ensuring research quality and integrity, there is still a dearth of critical thinking and evidence regarding the potential ethical risks and possible negative impacts on researchers and academic publishers when implementing them. Future research could therefore focus on this aspect, exploring the potential ethical concerns of the lists of questionable journals through qualitative studies.

Conclusion
Based on the principles of typology, we believe that the compilers of the lists of questionable journals, when categorizing and labeling academic journals (which can be seen as endowing academic journals with new concepts or symbols), should carefully consider whether the bibliometric attributes (key academic indicators) of the designated categories exhibit inter-group differences, and whether these differences are consistent. This is a crucial factor in determining the reasonableness of such a list, because entities can only be classified into different categories when there are fundamental differences between them (Podsakoff, MacKenzie, and Podsakoff 2016). Building upon this perspective, we used the EWJLs published by NSLC as a case study to scrutinize the reasonableness of their warning-level categories and their decisions to exclude, add, and retain academic journals, aiming to provide more comprehensive evidence for research in this domain. Our findings suggest that the EWJLs may have employed inconsistent criteria when assigning warning levels: varied degrees of differences, or the absence of differences, were observed among groups across the key academic indicators. Consequently, the formulation of the EWJLs might lack reasonableness and requires careful reconsideration of the category criteria. Furthermore, widely adopted citation metrics such as the JIF and JCI seem to have been disregarded by the EWJLs as evaluation criteria, though the organization has not furnished any clarification or justification on this matter. In light of these findings, the creators of such lists must adopt a more proactive stance by openly sharing their evaluation standards and furnishing comprehensive data, enhancing the accessibility and transparency of list-related information. This can strengthen the rationale behind these lists and foster greater confidence in their reasonableness. As questionable journals continue to emerge, we are becoming increasingly aware of the imperative to embark on a path of transformation, adopting more responsible approaches to devising quality assurance and research integrity policies.

Declarations
Figures

Table 1 (continued). Key academic indicators, explanations, and data sources:
APCs: the number of OA articles published by an academic journal each year multiplied by the journal's APCs (converted to USD based on live exchange rates). Source: DOAJ and journals' websites.
JCI (A4): the average category-normalized citation impact for papers published in the prior three-year period. Source: InCites.
Self-citation rate (A5): an academic journal's self-citations divided by its total citations. Source: InCites.
Number of retractions (A6): the number of articles retracted by an academic journal each year. Source: Retraction Watch.
JIF (A7): all citations to the journal in the current Journal Citation Reports year to items published in the previous two years, divided by the total number of scholarly items published in the journal in the previous two years. Source: InCites.
Article influence (A8): the Eigenfactor multiplied by 0.01 and divided by the number of articles in the journal, normalized as a fraction of all articles in all publications. Source: InCites.
Number of OA articles (A9): the number of OA articles published by an academic journal each year. Source: InCites.
Number of retractions by Chinese authors (A10): the number of articles retracted each year by an academic journal that include Chinese authors. Source: Retraction Watch.
Note: as … et al. (2022) pointed out, "The geographic diversity of authors may reflect the widespread recognition of the journal in the world." Therefore, the degree of internationalization is considered an important academic metric in China.
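Two of the Table 1 definitions reduce to simple arithmetic; a minimal sketch with hypothetical figures (the function names and example numbers are ours, not from the lists):

```python
def journal_impact_factor(citations_to_prior_two_years: int,
                          items_published_prior_two_years: int) -> float:
    """JIF per Table 1: citations in the current JCR year to items from the
    previous two years, divided by the items published in those two years."""
    return citations_to_prior_two_years / items_published_prior_two_years

def apc_revenue_usd(n_oa_articles: int, apc_usd: float) -> float:
    """APCs indicator per Table 1: yearly OA output multiplied by the
    per-article charge (already converted to USD)."""
    return n_oa_articles * apc_usd

# Hypothetical journal: 1,200 citations to 400 items from the prior two
# years, and 2,000 OA articles at a 2,000 USD APC.
print(journal_impact_factor(1200, 400))   # -> 3.0
print(apc_revenue_usd(2000, 2000.0))      # -> 4000000.0
```

The second function makes explicit why the "APCs" values reported above reach tens of millions of USD: the indicator is total yearly APC revenue, not the per-article charge.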

Table 2. Descriptive statistical analysis divided by year

Table 5. Group difference test results based on flow status