Inconsistencies in IRB decisions have been observed since the 1980s, largely owing to the increase in multicenter clinical trials (Goldman and Katz 1982; Hotopf, Wessely, and Noah 1995) [9, 14]; the fact that IRB panels sometimes review the same protocol differently remains a major issue. IRBs aim to ensure that protocols are sound in research design, both scientifically and ethically. Thus, with respect to ethical considerations, some divergence of opinion is an inevitable consequence. Moreover, consistency among IRBs does not guarantee a correct decision. However, major inconsistencies can undermine the reliability of IRBs in the wider research environment. Therefore, HRPP QA activities must verify whether IRBs identify inappropriate designs or miss essential review points.
This study analyzed the IRB minutes of panels at the same institution, operating under the same IRB regulations, in order to observe the consistency of review results. Several studies have shown that IRBs are relatively consistent in their final decisions but often inconsistent in the reasons for those decisions. [11, 12, 20, 21] Taylor et al. [22] suggested a process for measuring the ethical quality of local IRBs. They defined seven points—scientific value, assessment of risk, assessment of benefit, an acceptable risk/benefit ratio, fair subject selection, an adequate informed consent process, and adequate mechanisms for respecting enrolled subjects—and confirmed how many points were satisfied when one protocol was reviewed by multiple IRBs. The revised Common Rule, announced recently, also promotes the use of a single IRB with the aim of reducing administrative costs. As of January 25, 2018, the National Institutes of Health (NIH) requires a single IRB of record for most domestic NIH-funded multisite research studies. [23] With growing interest in the single IRB model, recent research has developed processes for measuring IRB inconsistency, such as a "mystery shopper" approach and a scoring system based on discussion themes. [13]
Although many studies continue to find inconsistencies, it remains unclear why such inconsistencies occur across multiple IRBs. Our analysis sought to clarify this inconsistency by considering both the topics of discussion and the characteristics of IRB members. We assessed the consistency of seven IRB panel results within a single institution. Partial consistency was observed, but we analyzed the results only in an exploratory manner. To overcome the limitations of the scoring systems used in previous studies, HRPP staff in this study comprehensively discussed and determined the review points. The total score can be considered a direct measuring tool for IRB QA activity.

Based on our data, compliance with GCP requirements during IRB review improved gradually over time, whereas protocol-specific reviews were influenced by the characteristics of each protocol, making consistency or standardization relatively difficult to achieve. This indicates that IRB reviews are commonly affected by the specific protocol under review. However, both inter-rater agreement and review scores increased with the experience of panel members. Therefore, IRB members' length of service and compatibility are factors that affect IRB review processes, and active, systematic training of IRB members is recommended to improve the quality of IRB reviews. IRBs are not a fixed, permanent system; rather, they should be considered a regular space of interaction between experts and non-experts. There is a significant need to support IRBs at the institutional level, because laws change continuously and often become stricter. Our data show that the continued involvement of IRB members is essential to ensure and improve the quality of IRB reviews. Continuous monitoring of IRB reviews is necessary to reach an acceptable standard, excluding extreme or anomalous results rather than treating them as being of equal merit.
We recommend that the Blind Review process be integrated into the HRPP's activities for assessing the quality of IRB reviews.
This study has several limitations. First, it was a retrospective study based on IRB minutes from a single tertiary hospital, which limits the generalizability of our data to the wider HRPP environment. However, we report our experience as a preliminary study: we provide basic information on the Blind Review and monitoring process for IRB reviews, and hope that these are explored in further studies. In addition, blinding the reviewers can make the results more objective and less biased. Second, only a small number of protocols were included in the Blind Review. However, as our center has operated an HRPP since 2010 and has 8 years of experience in assessing IRB review quality, we believe that any bias was minimal. Third, we analyzed the review scores at our own discretion, and the review points agreed upon by consensus among HRPP staff were not based on validated scientific criteria. Further multicenter studies involving a larger number of protocols should investigate and validate this approach.