Blind Review
We developed the blind review methodology as part of quality assurance (QA) activities within the human research protection program (HRPP) to analyze the consistency of decisions among IRB panels. Under this process, one protocol is selected for blind review and submitted to every IRB panel except the one that originally reviewed it. Three HRPP staff members, each with more than two years of experience at the Human Research Protection Center (HPC), selected the protocol. Although there were no specific selection criteria, we considered current research trends and specific issues, such as artificial intelligence software studies, verbal consent, and germline mutation studies, for which review guidelines are still being developed. Therefore, we included various research types to compare the decisions made by different IRB panels and did not exclude poorly designed protocols. Identical protocols were submitted to each panel simultaneously via a convened meeting, and IRB members were not notified of the blind review. Afterward, HPC QA staff assessed the results and shared them at the IRB member workshop. The investigators, however, were not notified of the blind review results and therefore had no obligation to respond to any comments arising from it. The IRB’s decision on the protocol was made by the original review panel. A detailed description of the blind review process is shown in Figure 1.
Materials
As a QA measure, the HPC at Severance Hospital, Yonsei University College of Medicine, Seoul, Korea, has conducted a blind review annually, according to the HRPP standard operating procedure (SOP). We retrospectively analyzed the blind review records of eight protocols reviewed between January 2010 and December 2018. The first three reviews, in 2010, were pilot tests across four panels; the other five, one per year from 2014 to 2018, were conducted across seven panels in accordance with the SOP. No blind review was conducted between 2011 and 2013, while our institution transitioned from paper-based review to an e-IRB system. We collected IRB member designation logs from the HPC database to assess IRB member factors, such as duration of IRB experience and major specialty. We did not enroll any human research subjects, nor use their personal data; therefore, IRB approval was not required for this study.
Review criteria specified by the Severance Hospital IRB
Severance Hospital follows domestic regulations for research involving human participants, such as the Pharmaceutical Affairs Act, the Bioethics and Safety Act, the Korean GCP guidelines, and the Medical Device Act. However, for issues not specified in domestic laws, or where the following regulations are stricter than domestic laws, internal regulations are established by reference to the ICH (International Council for Harmonisation) GCP guidelines, US Department of Health and Human Services (DHHS) regulations, federal and local laws, and Food and Drug Administration (FDA) regulations for the protection of research participants (21 CFR 50, 56, 312, and 812). The criteria for IRB review decisions are as follows:
(1) Approved: The research activity conforms to the approval criteria defined in the relevant laws, and no modifications to the research are required.
(2) Approved with modifications: The research activity meets the criteria for approval, and the modifications required by the IRB are only minor and would be considered of minimal risk.
(3) Deferred: The study does not meet the criteria for approval as defined in the relevant laws and regulations, lacks sufficient information for an adequate review, and/or requires substantial revisions to the protocol.
(4) Disapproved: The study is found to be unethical, without scientific or scholarly merit, and/or does not meet the criteria for approval as defined in the relevant laws and regulations.
(5) Tabled: The study cannot be reviewed at the meeting due to lack of time, lack of quorum, and/or extenuating circumstances.
Scoring method for IRB review results
To assess review quality and standardization, HPC staff developed “essential review points,” based on GCP and HRPP guidelines, and “protocol review points” that IRB members should detect and discuss during the review. We developed a 3-point scale with values 0, 5, and 10: a score of 10 denotes that both reviewers addressed a review point, and 5 denotes that the primary reviewer addressed it but the secondary reviewer did not. Points were awarded even when an inaccurate comment was made in the IRB review, because knowing what to point out is as important as making accurate comments. Thus, even if no accurate point was made, we marked 5 points for a question whenever any IRB member discussed the topic. A score of 0 was allocated when no IRB member discussed or mentioned the issue. For GCP requirements, we identified “risk and benefit analysis” (ICH-GCP 2.2), “determining the continuing review interval” (ICH-GCP 3.3.4), and “research resources” (ICH-GCP 3.1.3, 3.1.9) as mandatory review points based on GCP regulations [18]. In addition, we assessed whether the panels identified the issues to be discussed, as derived by the HPC staff. The consent process, vulnerability of participants, waivers for stored specimens, concerns regarding participant confidentiality, randomization methods, and the appropriateness of placebo procedures for the control group formed the majority of these considerations.
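The 0/5/10 scoring rule described above can be sketched as a small function. The function name, argument names, and data layout are illustrative assumptions for clarity, not part of the actual QA records:

```python
def score_review_point(primary_addressed, secondary_addressed, any_member_discussed):
    """Score one review point on the 0/5/10 scale.

    10: both the primary and secondary reviewers addressed the point.
     5: only the primary reviewer addressed it, or any IRB member
        discussed the topic (even if the comment itself was inaccurate).
     0: no IRB member discussed or mentioned the issue.
    """
    if primary_addressed and secondary_addressed:
        return 10
    if primary_addressed or any_member_discussed:
        return 5
    return 0
```

A protocol’s total score would then be the sum of this function over all essential and protocol review points.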
Statistical Analysis
Descriptive data are expressed as means ± standard deviations. Differences in review scores according to member expertise and IRB experience were analyzed using the Wilcoxon rank-sum test and the Kruskal-Wallis test [19, 20]. Because Fleiss’ kappa yields paradoxically low values despite a high proportion of agreement when the raters’ marginal totals are imbalanced, the observed multiple-rater agreement was used instead of Fleiss’ kappa to assess the agreement of IRB results among panels [21]. The observed multiple-rater agreement was estimated by summing all pairwise agreement tables from the panels, and a 95% confidence interval (CI) was obtained by bootstrap resampling with 1,000 replicates [22]. All statistical analyses were conducted using SPSS version 23 (SPSS Inc., Chicago, IL, USA), SAS version 9.4 (SAS Institute Inc., Cary, NC, USA), and R version 3.6.3 (R Foundation for Statistical Computing, Vienna, Austria). A two-sided p-value < 0.05 was considered statistically significant.
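The observed multiple-rater agreement and its bootstrap CI can be sketched as follows. This is a minimal illustration, assuming decisions are stored as one list of panel decisions per protocol; the function names and data layout are assumptions, not the authors’ actual code:

```python
import random
from itertools import combinations

def observed_agreement(decisions):
    """Proportion of agreeing panel pairs, pooled over all protocols.

    `decisions` is a list of protocols, each a list of panel decisions
    (e.g. "approved", "deferred"). Pooling all pairwise comparisons is
    equivalent to summing the pairwise agreement tables and taking the
    proportion on the diagonal.
    """
    agree = total = 0
    for row in decisions:
        for a, b in combinations(row, 2):  # every pair of panels
            agree += (a == b)
            total += 1
    return agree / total

def bootstrap_ci(decisions, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI, resampling protocols with replacement."""
    rng = random.Random(seed)
    stats = sorted(
        observed_agreement([rng.choice(decisions) for _ in decisions])
        for _ in range(n_boot)
    )
    lo = stats[int(n_boot * alpha / 2)]
    hi = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi
```

With eight protocols the bootstrap resamples whole protocols, so each replicate re-pools the pairwise comparisons from the resampled rows before recomputing the agreement statistic.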