The methodology of this study was adopted from an earlier protocol version published online [12]. Changes from the original protocol are listed in Appendices/Section 1. As far as applicable, this study is reported according to the PRISMA statement [13] (see Appendices/Section 2).
Systematic literature search of systematic review reports
Systematic review reports that met all of the following criteria were included:
• Systematic review of RCTs;
• RCT quality rated using the second version of Cochrane’s Risk of Bias (RoB) tool (as recommended by Sterne et al., 2019 [4]);
• Application of the RoB 2 tool stated in the report abstract;
• At least one included RCT rated as of overall “high bias” risk and at least one rated as of overall “low bias” risk.
PubMed was searched up to January 24, 2024, using the following search string:
(systematic review rob 2 OR Cochrane RoB 2.0 OR Cochrane RoB 2) AND systematic review
with the following set limits: Article type = Systematic review; Publication date = 1 year.
One reviewer (SM) conducted the search by screening citation titles and abstracts. Systematic review reports in line with the selection criteria were retrieved in full copy. A second reviewer (VY) independently verified the retrieved articles for eligibility. Disagreements were resolved via discussion and consensus.
Data extraction and management
The full references of each selected systematic review report were recorded and an ID number was assigned to each. The following data were extracted from each report and recorded in an MS Excel file:

Systematic review ID number;

Number of RCTs with an overall “low bias” rating;

Number of RCTs with an overall “high bias” rating;

Full references of all RCTs.
Systematic reviews identified during this process were excluded if they: did not include at least one RCT rated as of overall “high bias” risk and at least one rated as of overall “low bias” risk; did not report an overall bias risk rating for trials; reported the appraisal results of the first version of the RoB tool instead of RoB 2; did not clearly apply the RoB 2 tool or report the appraisal results in line with it; did not make supplementary material with details of the bias risk appraisal accessible online; appraised types of studies other than RCTs; gave no information about trial appraisal; did not publish a readable RoB 2 graph reporting the appraisal results for each RCT; or did not report the full references of the appraised trials.
One reviewer (SM) extracted and entered all data into a MS Excel sheet. A second reviewer (VY) verified all data entry for accuracy. Disagreements were resolved via discussion and consensus.
RCT extraction and test for selection bias risk
From the selected systematic reviews, all citations of the reviewed RCTs were extracted. With the help of an experienced librarian, an attempt was made to retrieve all of the identified RCTs in full copy. One reviewer (SM) reviewed the full RCT reports for eligibility, in line with the following selection criteria:

Trial reference reported by systematic review;

Full traceable clinical trial report;

Two separate treatment groups included;

Treatment groups were randomised and not matched by reported baseline variables;

Baseline variables reported per treatment group;

Mean (SD) values and precise sample size (N) reported per group;

Not a duplicate or a different report of an already included trial.
A second reviewer (VY) independently verified the reviewed reports for eligibility. Disagreements were resolved by discussion and consensus.
All selected RCTs were tested for selection bias risk following the test method suggested by Mickenautsch and Yengopal (2023) [11]. From each RCT, details of the baseline variable ‘age’ were extracted for one test group and one control group, including the mean value, standard deviation (SD) and number of subjects (N). If ‘age’ was not reported, another reported baseline variable was chosen. Where more than one test and/or control group was reported, data extraction was limited to the test and control groups with the most significant baseline variable differences. Where the standard error (SE) was reported instead of the SD, the SE was converted to SD using the formula SD = SE × √N. All trials that did not report mean (SD or SE) values were excluded from the main analysis; trials that reported median values with either a minimum-maximum range or an interquartile range (IQR) were later included in the sensitivity analysis.
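The SE-to-SD conversion is the standard relation between the standard error of a mean and the standard deviation. A minimal Python sketch follows. The function name `se_to_sd` and the example values are illustrative only.

```python
import math


def se_to_sd(se: float, n: int) -> float:
    """Convert a group's standard error of the mean to an SD: SD = SE x sqrt(N)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return se * math.sqrt(n)


# Hypothetical example: SE = 1.5 reported for a group of N = 36 subjects
print(se_to_sd(1.5, 36))  # 9.0
```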
For each trial, two “simulated comparator trials” (SCT) were generated (Appendices/Section 3). Each SCT consisted of two parallel data columns entered into an MS Excel sheet:

Column 1: Random allocation sequence for two groups, A and B;

Column 2: List of randomly selected values within the trial-specific age range, sorted in ascending order.
The sample size was set at N = 100 subjects per group (200 subjects for groups A and B combined). The random allocation sequence in column 1 was generated by block randomisation (block size = 4; two groups: A, B) using the “Sealed Envelope” online tool [14]. The ascending list of randomly selected values in column 2 was generated using an online random number generator [15]. The comprehensive version of the online generator was used to randomly select the values of the baseline variable for each subject with the following settings: Lower limit = 8; Upper limit = 80; Numbers to be generated = 200; Allow duplication of results? = Yes; Sort the results? = Yes/Ascend; Type of result to generate = Integer. The values in column 2 were then assigned to group A or B according to the allocation in column 1, using the sorting function in MS Excel. This process was repeated separately for each of the two SCTs, with separate sequences generated for columns 1 and 2.
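The SCT construction described above can be approximated in Python for illustration. This is a sketch under stated assumptions, not the procedure used in the study. Python's `random` module stands in for the “Sealed Envelope” tool and the online number generator. The block randomisation is assumed to use permuted blocks of four, each containing two A and two B allocations. The function names are hypothetical.

```python
import random
import statistics


def block_allocation(n_total: int, block_size: int, rng: random.Random) -> list:
    """Column 1 (assumed scheme): permuted blocks with equal numbers of 'A' and 'B'."""
    if block_size % 2 or n_total % block_size:
        raise ValueError("block size must be even and must divide n_total")
    sequence = []
    for _ in range(n_total // block_size):
        block = ["A"] * (block_size // 2) + ["B"] * (block_size // 2)
        rng.shuffle(block)  # random order within each block
        sequence.extend(block)
    return sequence


def simulate_sct(rng: random.Random, n_per_group: int = 100,
                 lower: int = 8, upper: int = 80, block_size: int = 4) -> dict:
    """One hypothetical simulated comparator trial; returns {arm: (mean, SD, N)}."""
    n_total = 2 * n_per_group
    allocation = block_allocation(n_total, block_size, rng)
    # Column 2: integers drawn with replacement from [lower, upper], sorted ascending
    values = sorted(rng.randint(lower, upper) for _ in range(n_total))
    groups = {"A": [], "B": []}
    for arm, value in zip(allocation, values):  # row-wise pairing of columns 1 and 2
        groups[arm].append(value)
    return {arm: (statistics.fmean(v), statistics.stdev(v), len(v))
            for arm, v in groups.items()}


rng = random.Random(2024)  # arbitrary seed, for reproducibility only
sct_1 = simulate_sct(rng)
sct_2 = simulate_sct(rng)  # fresh sequences for both columns
```

The values are sorted before the balanced block sequence is applied. Each run of four consecutive values is therefore split two and two between A and B. The two arms should end up with very similar means and SDs, which is consistent with expecting a zero I² when the two SCTs are pooled.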
After sorting, the mean (SD) values of groups A and B were calculated for each of the two SCTs. These values were entered, together with the sample size per group, into a fixed-effect meta-analysis (Review Manager, RevMan 5.0.24 software). The two SCTs were pooled using the inverse-variance method, and an I² point estimate of zero was confirmed (Appendices/Section 3).
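The inverse-variance fixed-effect pooling and the I² statistic follow standard formulas:

- Each trial contributes a mean difference MD = mean_A − mean_B with variance SD_A²/N_A + SD_B²/N_B.
- Each trial is weighted by 1/variance.
- Cochran's Q is the weighted sum of squared deviations from the pooled MD.
- I² = max(0, (Q − df)/Q) × 100%.

The sketch below reproduces these formulas in Python. It is not RevMan's code, and the example SCT values are hypothetical.

```python
def fixed_effect_i2(studies):
    """Inverse-variance fixed-effect pooling of mean differences.

    Each study is (mean_A, SD_A, N_A, mean_B, SD_B, N_B).
    Returns (pooled MD, Cochran's Q, I2 in %).
    """
    effects = [(m1 - m2, sd1 ** 2 / n1 + sd2 ** 2 / n2)
               for m1, sd1, n1, m2, sd2, n2 in studies]
    weights = [1.0 / var for _, var in effects]
    pooled = sum(w * md for w, (md, _) in zip(weights, effects)) / sum(weights)
    q = sum(w * (md - pooled) ** 2 for w, (md, _) in zip(weights, effects))
    df = len(studies) - 1
    i2 = 0.0 if q <= df else 100.0 * (q - df) / q  # truncated at zero
    return pooled, q, i2


sct_studies = [  # hypothetical summary data for two SCTs
    (44.1, 20.5, 100, 43.9, 21.0, 100),
    (43.8, 20.9, 100, 44.0, 20.7, 100),
]
pooled_md, q, i2 = fixed_effect_i2(sct_studies)
```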
To test a RCT for selection bias risk, the mean (SD) value of its baseline variable, together with the sample size (N) per group, was entered into the generated SCT meta-analysis and the analysis was repeated. The resulting new I² point estimate was recorded. This procedure was repeated separately for each individual RCT throughout the study (Appendices/Section 4).
If the I² point estimate of the repeated meta-analysis was also 0%, the test result was considered negative and the tested RCT was assumed to have no selection bias risk. If the point estimate was greater than 0%, the test result was considered positive and the tested RCT was assumed to be at high risk of selection bias.
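The test decision can be sketched by appending the RCT's baseline data as a third “study” to the two SCTs and re-pooling. The helper `fixed_effect_i2` repeats the standard inverse-variance formulas so the block runs on its own. All names and numbers are illustrative assumptions, not the study's data.

```python
def fixed_effect_i2(studies):
    """I2 (%) of an inverse-variance fixed-effect pooling of mean differences.

    Each study is (mean_1, SD_1, N_1, mean_2, SD_2, N_2).
    """
    effects = [(m1 - m2, sd1 ** 2 / n1 + sd2 ** 2 / n2)
               for m1, sd1, n1, m2, sd2, n2 in studies]
    weights = [1.0 / var for _, var in effects]
    pooled = sum(w * md for w, (md, _) in zip(weights, effects)) / sum(weights)
    q = sum(w * (md - pooled) ** 2 for w, (md, _) in zip(weights, effects))
    df = len(studies) - 1
    return 0.0 if q <= df else 100.0 * (q - df) / q


def selection_bias_test(sct_studies, rct):
    """Return ('positive' | 'negative', I2) after adding the RCT to the SCT meta-analysis."""
    if fixed_effect_i2(sct_studies) != 0.0:
        raise ValueError("the SCT meta-analysis must have I2 = 0% before testing")
    i2 = fixed_effect_i2(list(sct_studies) + [rct])
    return ("positive" if i2 > 0 else "negative"), i2


sct = [(44.1, 20.5, 100, 43.9, 21.0, 100), (43.8, 20.9, 100, 44.0, 20.7, 100)]
result, i2 = selection_bias_test(sct, (30.0, 5.0, 50, 45.0, 5.0, 50))  # hypothetical RCT
```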
Main statistical analysis
Based on the test results, each trial rated as of overall “low bias” risk (RoB 2) was classified as either true negative (TN) or false negative (FN). Each trial rated as of overall “high bias” risk was classified as either true positive (TP) or false positive (FP):

TP = “High bias” rated trials with positive test result (I² > 0%);

TN = “Low bias” rated trials with negative test result (I² = 0%);

FN = “Low bias” rated trials with positive test result (I² > 0%);

FP = “High bias” rated trials with negative test result (I² = 0%).
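The four definitions above amount to a simple lookup from the overall RoB 2 rating and the test result. A hypothetical Python tally follows. The ratings and I² values are invented for illustration.

```python
from collections import Counter


def classify(rob2_rating: str, i2: float) -> str:
    """Map an overall RoB 2 rating ('low'/'high') and the test's I2 (%) to TP/TN/FN/FP."""
    positive = i2 > 0  # positive test result = I2 > 0%
    if rob2_rating == "high":
        return "TP" if positive else "FP"
    if rob2_rating == "low":
        return "FN" if positive else "TN"
    raise ValueError(f"unexpected rating: {rob2_rating!r}")


# Hypothetical (rating, I2) pairs, not study data
trials = [("low", 0.0), ("low", 12.5), ("high", 40.0), ("high", 0.0), ("low", 0.0)]
counts = Counter(classify(rating, i2) for rating, i2 in trials)
```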
From these, the false omission rate (FOR) was computed with its 95% confidence interval (CI). The FOR was defined as the ratio between FN and the sum of FN + TN results and was reported in %. Within the context of this study, the FOR (95% CI) was considered as the probability for a RCT to have high selection bias risk, given an overall “low bias” risk rating using the second version of Cochrane’s risk of bias tool. A FOR of 0% indicates zero probability for a “low bias” risk (RoB 2) rated RCT to have high selection bias risk. It also indicates that no FN results were established.
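A sketch of the FOR calculation follows. It computes FN / (FN + TN), the share of “low bias” rated trials that tested positive. This matches the interpretation of the FOR as the probability of high selection bias risk given a “low bias” rating. The text does not state how the 95% CI was obtained. The Wilson score interval used here is an assumption, chosen because it stays within 0-100% even when FN = 0.

```python
import math
from statistics import NormalDist


def false_omission_rate(fn: int, tn: int, confidence: float = 0.95):
    """FOR = FN / (FN + TN) in %, with a Wilson score interval (assumed CI method)."""
    n = fn + tn
    if n == 0:
        raise ValueError("no 'low bias' rated trials")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = fn / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return 100 * p, 100 * max(0.0, centre - half_width), 100 * min(1.0, centre + half_width)
```

For example, `false_omission_rate(fn=0, tn=20)` (hypothetical counts) returns a FOR of 0% with an upper 95% bound of roughly 16%. A zero point estimate from a small sample can therefore still leave considerable uncertainty.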
Sensitivity analysis
Sensitivity analysis was conducted in two steps. First, data from RCTs reporting median values with either a minimum-maximum (min/max) range or an IQR were added. Second, data from RCTs with any baseline variable other than ‘age’ were excluded from the main analysis.
Median values with a min/max range or IQR were converted into mean (SD) estimates following the methods of Hozo et al. (2005) [16] and Wang et al. (2014) [17], respectively.
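The two conversions can be sketched as follows. The min/max function implements the Hozo et al. (2005) formulas, where a, m and b are the minimum, median and maximum:

- mean: (a + 2m + b)/4 + (a − 2m + b)/(4n)
- variance: [(a − 2m + b)²/4 + (b − a)²]/12

Hozo et al. also propose simpler range-based SD estimates for larger samples, and which variant was applied here is not stated. The IQR function assumes the estimators commonly used from the 2014 work:

- mean ≈ (q1 + m + q3)/3
- SD ≈ (q3 − q1) / [2Φ⁻¹((0.75n − 0.125)/(n + 0.25))]

Treat both functions as a hedged sketch rather than the study's exact procedure. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def minmax_to_mean_sd(median: float, minimum: float, maximum: float, n: int):
    """Mean and SD from median and min/max range (Hozo et al., 2005 general formulas)."""
    a, m, b = minimum, median, maximum
    mean = (a + 2 * m + b) / 4 + (a - 2 * m + b) / (4 * n)
    variance = ((a - 2 * m + b) ** 2 / 4 + (b - a) ** 2) / 12
    return mean, math.sqrt(variance)


def iqr_to_mean_sd(median: float, q1: float, q3: float, n: int):
    """Mean and SD from median and IQR (assumed 2014 estimators, normality assumed)."""
    mean = (q1 + median + q3) / 3
    z = NormalDist().inv_cdf((0.75 * n - 0.125) / (n + 0.25))
    return mean, (q3 - q1) / (2 * z)
```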
Subgroup analysis
Subgroup analysis was conducted for RCTs that were included in the main analysis and for which either a “low bias” or a “high bias” risk rating was reported for RoB 2 domain 1, “Bias arising from the randomisation process”. As in the main analysis, trials rated as of “low bias” risk were classified as TN or FN, and trials rated as of “high bias” risk as TP or FP. The same definitions related to the established I² values were used.
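From these data, the negative likelihood ratio (–LR) with 95% confidence interval (CI) was computed. In line with convention [18], the –LR in this study was adopted as the ratio of two proportions:

- the numerator is the proportion of trials with a positive test result (I² > 0%) that were rated as of “low bias” risk for RoB 2 domain 1, FN / (TP + FN);
- the denominator is the proportion of trials with a negative test result (I² = 0%) that were rated as of “low bias” risk, TN / (TN + FP).

–LR = [FN / (TP + FN)] / [TN / (TN + FP)]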
The computed –LR (95% CI) was interpreted as the likelihood that a RCT rated as of “low bias” risk for RoB 2 domain 1 actually has low selection bias risk: –LR < 0.1 = highly, 0.1–0.2 = moderately, 0.2–0.5 = little, and 0.5–1.0 = rarely likely [19].
A –LR close to 1.0 indicated that a RCT rated as of “low bias” risk for RoB 2 domain 1 was about as likely to have high selection bias risk as low selection bias risk. A –LR larger than 1.0 indicated that the selection bias risk of such a RCT was more likely to be high than low.
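A sketch of the –LR computation follows. It assumes the conventional form –LR = (1 − sensitivity) / specificity = [FN / (TP + FN)] / [TN / (TN + FP)], with the TP, FN, TN and FP definitions used in this study. The 95% CI uses the common log-scale (Wald) method, which is an assumption because the CI method is not stated. The mapping to the verbal categories of [19] assumes lower-bound-inclusive boundaries. Function names and counts are hypothetical.

```python
import math
from statistics import NormalDist


def negative_lr(tp: int, fn: int, tn: int, fp: int, confidence: float = 0.95):
    """-LR = [FN/(TP+FN)] / [TN/(TN+FP)] with a log-scale Wald CI (assumed method)."""
    if min(fn, tn) == 0:
        raise ValueError("zero FN or TN cell: log-scale CI undefined without a correction")
    lr = (fn / (tp + fn)) / (tn / (tn + fp))
    se_log = math.sqrt(1 / fn - 1 / (tp + fn) + 1 / tn - 1 / (tn + fp))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return lr, lr * math.exp(-z * se_log), lr * math.exp(z * se_log)


def interpret(lr: float) -> str:
    """Verbal category for a 'low bias' (domain 1) rated RCT having low selection bias risk."""
    if lr < 0.1:
        return "highly likely"
    if lr < 0.2:
        return "moderately likely"
    if lr < 0.5:
        return "little likely"
    if lr <= 1.0:
        return "rarely likely"
    return "more likely high than low selection bias risk"


lr, lower, upper = negative_lr(tp=10, fn=2, tn=8, fp=5)  # hypothetical counts
```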