To the best of our knowledge, the FI has not previously been investigated in HCC trials. It has been evaluated in RCTs from other fields, such as emergency medicine17 and giant cell arteritis18, and in RCTs underlying Clinical Practice Guidelines19. These studies consistently show that many RCTs are fragile, and several researchers have recommended that the FI be adopted in reporting clinical trial outcomes18, 19. Our study showed that most results from the randomized trials were far more fragile.
This retrospective analysis demonstrated that over 70% of the phase 3 trials supporting HCC treatments had a low FI; that is, their results could lose statistical significance if the outcome of only a small number of events changed, often equating to < 1% of the sample size in an experimental group. Because clinical practice and the use of drugs approved by the Food and Drug Administration are based on the results of phase 3 clinical trials, the small number of event changes needed to overturn significance raises concerns about the statistical robustness of these results.
RCTs, particularly phase 3 clinical trials, are likely to remain an important evidence base for clinicians’ practice. Despite this, the statistical methodology used to establish significance in such clinical trials has barely evolved. In principle, the P value indicates how compatible the trial data are with the null hypothesis; a smaller P value implies a greater statistical incompatibility of the result with the null hypothesis (an assumption of no difference between the experimental and control groups20). However, this approach has been greatly criticized for being simplistic, and has frequently been misinterpreted21. The log-rank test used in survival data analysis has the advantage that it accounts for the timing of events, but it relies on the assumption that the hazard ratio of two treatments remains constant over time. Fisher’s exact test, which is used to calculate the FI, has the disadvantage of not accounting for the time-to-event22. Thus, although the FI is simple to apply and resolves some of the shortcomings of the P value, it does not capture time-to-event information.
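The way Fisher’s exact test feeds into the FI can be illustrated with a minimal sketch. It is not the code used in this study: the function names `fisher_exact_p` and `fragility_index` are hypothetical, the significance threshold of 0.05 is assumed, and the convention of converting non-events to events in the arm with fewer events is an assumption about how the FI is commonly computed.

```python
from math import comb


def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table at least as unlikely as the observed one
    # (the small tolerance guards against floating-point ties).
    p = sum(q for q in map(prob, range(lo, hi + 1)) if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def fragility_index(events_1, n_1, events_2, n_2, alpha=0.05):
    """Return the FI, or None if the result is not significant to begin with."""
    if fisher_exact_p(events_1, n_1 - events_1, events_2, n_2 - events_2) >= alpha:
        return None
    # Assumed convention: modify the arm with fewer events.
    if events_1 <= events_2:
        low, n_low, high, n_high = events_1, n_1, events_2, n_2
    else:
        low, n_low, high, n_high = events_2, n_2, events_1, n_1
    fi = 0
    while low < n_low:
        low += 1  # convert one non-event into an event
        fi += 1
        if fisher_exact_p(low, n_low - low, high, n_high - high) >= alpha:
            return fi
    return None  # significance was never lost


# Illustrative counts only, not data from any trial discussed here.
print(fragility_index(10, 100, 30, 100))
```

Note that the sketch uses only event counts and arm sizes; when each event occurred plays no role, which is the time-to-event limitation described above.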
Although the FI and FQ provide a relative wealth of information when considered alongside other metrics, this study again emphasizes the limitations of the FI itself. First, the FI can be calculated only for clinical trials that show a statistically significant effect in the treatment group. Many non-inferiority studies therefore cannot be included in this analysis, such as the E7080 trials of lenvatinib for HCC, which produced treatment results similar to those of sorafenib23. Second, because the FI relies on the P value, it is essentially an extension of the most commonly used approach to data analysis. In addition, because it is computed from dichotomous event counts, it cannot be applied to continuous outcomes. Third, although many time-to-event outcomes, such as mortality and survival, are dichotomous, the FI does not account for the difference in outcomes over time. Particularly in longer studies with variable follow-up periods, analyses that account for time (such as a Kaplan–Meier curve or a Cox proportional hazards model) are more appropriate than a simple binary outcome analysis. Finally, there is no specific cut-off value or lower limit of the FI to classify a study as either "fragile" or "robust".