To the best of our knowledge, the FI has not previously been investigated for HCC trials. The FI has been evaluated for RCTs in other fields, such as emergency medicine[12], giant cell arteritis, clinical practice guidelines[14], and cardiac surgery[15]. These studies consistently show that many RCTs are fragile, and several researchers have recommended that the FI be adopted in reporting clinical trial outcomes[13, 14]. Consistent with these reports, our study showed that most results from the randomized trials were fragile.
This analysis demonstrated that over 60% of the phase 3 trials supporting HCC treatments had a low FI; that is, they are vulnerable to losing statistical significance if the designation of only a small number of events changes, often equating to < 1% of the sample size in an experimental group. Because clinical practice and the use of drugs approved by the Food and Drug Administration are based on the results of phase 3 clinical trials, the small number of event changes needed to overturn significance raises concerns about the statistical robustness of these results.
RCTs, particularly phase 3 clinical trials, are likely to remain an important evidence base for clinicians’ practice. Despite this, the statistical methodology used to establish significance in such clinical trials has barely evolved. In principle, the P value indicates how compatible the trial data are with the null hypothesis; a smaller P value implies greater statistical incompatibility of the result with the null hypothesis (the assumption of no difference between the experimental and control groups[16]). However, this approach has been widely criticized as simplistic and has frequently been misinterpreted[17]. The log-rank test used in survival data analysis has the advantage of accounting for when events occur, but it relies on the assumption that the hazard ratio of the two treatments remains constant over time. Fisher’s exact test, which is used to calculate the FI, has the disadvantage of not accounting for the time-to-event[18]. Nevertheless, the FI is simple to apply and addresses some of the shortcomings of the P value.
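To make the calculation concrete, the following is a minimal Python sketch of how an FI can be derived from Fisher's exact test. It follows the commonly described procedure (converting non-events to events, one at a time, in the arm with fewer events until the two-sided P value reaches 0.05) and is not necessarily the exact implementation used in this study. The function names, the 0.05 threshold default, and the example counts are illustrative assumptions; the FQ is taken here as the FI divided by the total sample size.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x events in row 1, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is at least as unlikely as the observed one.
    tail = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, tail)


def fragility_index(events_a, total_a, events_b, total_b, alpha=0.05):
    """Return the FI, or None if the baseline result is not significant.

    Assumption: events are added to the arm with fewer events while the
    arm sizes stay fixed (each added event replaces one non-event).
    """
    p = fisher_exact_two_sided(events_a, total_a - events_a,
                               events_b, total_b - events_b)
    if p >= alpha:
        return None
    if events_a <= events_b:
        low_e, low_n, high_e, high_n = events_a, total_a, events_b, total_b
    else:
        low_e, low_n, high_e, high_n = events_b, total_b, events_a, total_a
    fi = 0
    while low_e < low_n:
        low_e += 1
        fi += 1
        p = fisher_exact_two_sided(low_e, low_n - low_e,
                                   high_e, high_n - high_e)
        if p >= alpha:
            return fi
    return None  # significance was never lost (possible with unequal arms)


def fragility_quotient(fi: int, total_sample_size: int) -> float:
    """FQ, assumed here to be the FI divided by the total sample size."""
    return fi / total_sample_size


# Hypothetical trial: 0/10 events in one arm versus 7/10 in the other.
fi = fragility_index(0, 10, 7, 10)
print(fi, fragility_quotient(fi, 20))  # → 2 0.1
```

In this hypothetical example, reclassifying only two patients in the arm with fewer events is enough to move the result from significant to non-significant, which illustrates how a small FI can coexist with a small P value.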
Although the FI and FQ provide a wealth of information when considered alongside other metrics, this study again highlights the limitations of the FI itself. First, the FI can be calculated only for trials that show a statistically significant effect in favor of the treatment group over the control group; only such trials could be included in our analysis. Many non-inferiority studies therefore cannot be included, such as the E7080 trials of lenvatinib for HCC, which produced treatment results comparable to those of sorafenib[22]. Second, because the FI relies on the P value, it is essentially an extension of the frequentist approach to data analysis. It also requires a dichotomous outcome and cannot be applied to continuous variables. Third, although many time-to-event outcomes, such as mortality and survival, are dichotomous, the FI does not account for differences in outcomes over time. Particularly in longer studies with variable follow-up periods, analyses that account for time (such as a Kaplan–Meier curve or a Cox proportional hazards model) are more appropriate than a simple binary outcome analysis. Fourth, our study showed a tendency toward an inverse correlation between the FI and the P value, which is similar to previous FI studies[19, 20]. This may be because the included RCTs enrolled small numbers of patients; the FI tends to be much higher as the sample size increases[21]. Finally, there is no specific cut-off value or lower limit of the FI to classify a study as either "fragile" or "robust".