We developed the blind review methodology as part of the quality assurance (QA) activities within human research protection programs (HRPPs) to analyze consistency among IRB panel decisions. In this process, one protocol selected for blind review was submitted to every IRB panel except the one that originally conducted the review. Three HRPP staff members with more than two years’ experience in the Human Research Protection Center (HPC) selected the protocol. Although there were no specific selection criteria, we considered current research trends and specific issues for which review guidelines are still being developed, such as artificial intelligence software studies, verbal consent, and germline mutation studies. Thus, we included various research types to compare the decisions made by different IRB panels, and did not exclude poorly designed protocols. Identical protocols were submitted to each panel simultaneously for review at a convened meeting, and IRB members were not notified about the blind review. After the blind review, HPC QA staff assessed the results and shared them in the IRB member workshop. However, the investigators were not notified about the blind review results and thus had no obligation to respond to any comments from the blind review. The IRB’s decision regarding the protocol was the one made by the original review panel. A detailed description of the blind review process is shown in Figure 1.
As a QA measure, the HPC at Severance Hospital, Yonsei University College of Medicine, Seoul, Korea, has conducted a blind review annually, according to the HRPP standard operating procedures (SOPs). We retrospectively analyzed the records of eight blind reviews conducted between January 2010 and December 2018. The first three, from 2010, were pilot tests across four panels; the other five, one for each year from 2014 to 2018, were conducted across seven panels in line with the SOP. No blind review was conducted between 2011 and 2013 because our institution was moving from a paper-based review process to an e-IRB system. We collected IRB member designation logs from the HPC database to assess IRB member factors, such as years of IRB experience and major specialty. We did not enroll any human research subjects or use their personal data. Therefore, IRB approval was not required for this study.
Review criteria specified by the Severance Hospital IRB
Severance Hospital follows domestic regulations for research involving human participants, such as the Pharmaceutical Affairs Act, Bioethics and Safety Act, Korean GCP guidelines, and the Medical Device Act. However, where issues are not specified in domestic laws, and where the following regulations are stricter than those specified in domestic laws, internal regulations are established by referring to the International Council for Harmonization (ICH) GCP guidelines, US Department of Health and Human Services regulations, federal and local laws, Food and Drug Administration regulations (21 CFR 50, 56, 312, and 812), etc. The criteria for IRB review and approval are as follows:
(1) Approval: The research activity conforms to the criteria for approval defined by the related law, and no modification of the research is recommended.
(2) Approved with modifications: The research activity meets the criteria for approval, and the modifications required by the IRB are only minor and of minimal risk.
(3) Deferred: The study does not meet the criteria for approval as defined in the relevant laws and regulations, lacks sufficient information to conduct an adequate review, and/or requires substantial revisions to the protocol.
(4) Disapproved: The study is found to be unethical, without scientific or scholarly merit, and/or does not meet the criteria for approval as defined in the relevant laws and regulations.
(5) Tabled: The study cannot be reviewed at the meeting because of lack of time, lack of quorum, and/or extenuating circumstances.
Scoring method for IRB review results
To assess review quality and standardization, HPC staff developed “essential review points,” based on GCP and HRPP guidelines, and “protocol review points” that IRB members should detect. These points were expected to be discussed during the review. As a first step, the QA team staff analyzed the IRB minutes and set out the discussion points. The second step was to collect the IRB minutes of all panels and confirm the details of the discussion results. Similar to conducting a systematic review or meta-analysis, we searched the minutes for each specific term and for similar terms to ensure that all relevant discussions were included. Then, the QA team discussed and reached a consensus on the content of each discussion and on whether it supported the blind review score. Thereafter, we developed a 3-point scale with the values 0, 5, and 10. A score of 10 indicated that both reviewers addressed the review point, and a score of 5 indicated that the primary reviewer addressed it but the secondary reviewer did not. The scale was applied even when an incorrect comment was made in the IRB review, because knowing what to point out is as important as making accurate comments. Thus, if any IRB member discussed the topic, we assigned 5 points for that review point even if the comment itself was not accurate. A score of 0 was assigned when none of the IRB members discussed or mentioned the issue. For GCP requirements, we identified “risk and benefit analysis” (ICH-GCP 2.2), “determining the continuing review interval” (ICH-GCP 3.3.4), and “research resources” (ICH-GCP 3.1.3, 3.1.9) as mandatory review points based on GCP regulations. In addition, we assessed whether the panels identified the protocol-specific issues that required discussion, given the information provided. The consent process, vulnerability of participants, waivers for stored specimens, concerns regarding the confidentiality of participants, randomization methods, and the appropriateness of placebo procedures for the control group formed the majority of these considerations.
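The scoring rule described above can be illustrated with a minimal Python sketch. It is not the QA team's actual tooling: the record type PointRecord, its field names, and the helper panel_total are hypothetical names introduced here, and summing the point scores into a panel total is an assumption made only for illustration. The sketch also assumes that a point raised only by the secondary reviewer or by another member falls under the "any IRB member discussed the topic" rule and therefore scores 5.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PointRecord:
    """Hypothetical record of how one panel handled one review point."""
    primary_addressed: bool      # primary reviewer raised the point
    secondary_addressed: bool    # secondary reviewer raised the point
    any_member_discussed: bool   # any IRB member discussed or mentioned it


def score_review_point(rec: PointRecord) -> int:
    """Apply the 0/5/10 scale; accuracy of the comment is not considered."""
    if rec.primary_addressed and rec.secondary_addressed:
        return 10  # both reviewers addressed the point
    if rec.primary_addressed or rec.secondary_addressed or rec.any_member_discussed:
        return 5   # raised by one reviewer or discussed by some member
    return 0       # nobody discussed or mentioned the issue


def panel_total(records: List[PointRecord]) -> int:
    """Illustrative aggregate (an assumption): sum of the point scores."""
    return sum(score_review_point(r) for r in records)


if __name__ == "__main__":
    # Made-up example data, not taken from the study records.
    example = [
        PointRecord(True, True, True),
        PointRecord(True, False, True),
        PointRecord(False, False, False),
    ]
    print([score_review_point(r) for r in example], panel_total(example))
```

Keeping the rule in a single function makes the tie between the minutes and the score explicit, so two QA staff members coding the same minutes would reach the same score once they agree on the three yes/no judgments.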
Descriptive data were expressed as mean ± standard deviation. Differences between review scores, according to member expertise and IRB experience, were analyzed using the Wilcoxon rank-sum test and the Kruskal-Wallis test [19, 20]. Because Fleiss’ kappa can take paradoxically low values despite a high proportion of agreement when the marginal totals of raters are imbalanced, the observed multiple rater agreement was used instead of Fleiss’ kappa to assess the agreement of IRB results among panels. The observed multiple rater agreement was estimated by summing all pairwise agreement tables from the panels, and a 95% confidence interval (CI) was obtained from 1,000 bootstrap resamples. All statistical analyses were conducted using SPSS statistical software version 23 (SPSS Inc., Chicago, IL, USA), SAS software version 9.4 (SAS Institute Inc., Cary, NC, USA), and R software version 3.6.3 (R Foundation for Statistical Computing, Vienna, Austria). A two-sided p-value <0.05 was considered statistically significant.
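As a rough illustration of the agreement statistic, the following Python sketch computes the observed multiple rater agreement as the share of agreeing panel pairs, pooled over protocols, and derives a percentile bootstrap interval. The analyses in this study were run in SPSS, SAS, and R, so this is not the code that was used. The function names, the choice to resample protocols, the percentile interval, the fixed seed, and the toy decisions are all assumptions made for the example.

```python
import random
from itertools import combinations
from typing import List, Sequence, Tuple


def observed_agreement(decisions: Sequence[Sequence[str]]) -> float:
    """Share of panel pairs that reached the same decision.

    `decisions[i]` lists the decisions of all panels for protocol i.
    Counting agreeing pairs over all protocols is equivalent to summing
    the pairwise agreement tables and dividing the diagonal by the total.
    Each protocol is assumed to have at least two panel decisions.
    """
    agree = 0
    total = 0
    for row in decisions:
        for a, b in combinations(row, 2):
            agree += int(a == b)
            total += 1
    return agree / total


def bootstrap_ci(decisions: Sequence[Sequence[str]], n_boot: int = 1000,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI, resampling protocols with replacement."""
    rng = random.Random(seed)
    n = len(decisions)
    stats: List[float] = []
    for _ in range(n_boot):
        sample = [decisions[rng.randrange(n)] for _ in range(n)]
        stats.append(observed_agreement(sample))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


if __name__ == "__main__":
    # Toy decisions for two protocols and three panels (made-up data).
    toy = [
        ["approved", "approved", "deferred"],
        ["deferred", "deferred", "deferred"],
    ]
    print(observed_agreement(toy))
    print(bootstrap_ci(toy))
```

Rows may have different lengths, which accommodates reviews that were run across different numbers of panels.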