Two case studies are described in which the heterogeneity assessment was applied to data sets of reconstructed patient-level data. In the first case, a high degree of heterogeneity, identified by the likelihood ratio test, was found to influence the results of the indirect comparisons. In the second case, the degree of heterogeneity was low, which supports the reliability of the results of the indirect comparisons.
Case study n°1: first-line treatments for triple-negative advanced breast cancer
Both atezolizumab and pembrolizumab are known to significantly improve overall survival (OS) in patients with PD-L1 positivity at CPS ≥ 10 [4–6]. The relative effectiveness of these two agents becomes controversial, however, when all PD-L1 positive patients are considered [7]. In this population, pembrolizumab did not produce any significant survival benefit, whereas atezolizumab significantly prolonged OS [4–6]. This result was confirmed by a patient-level pooled analysis of all PD-L1 positive patients enrolled in the KEYNOTE-355, IM-PASSION-130 and IM-PASSION-131 trials [7], which found a significant difference in OS in favor of atezolizumab vs pembrolizumab (hazard ratio [HR], 0.73; 95% confidence interval [CI], 0.61 to 0.87; p < 0.001; median, 20.4 vs 15.5 months). This result was obtained through an indirect comparison performed according to the Shiny method combined with the IPDfromKM tool [2].
One hypothesis to explain this finding in the absence of a true difference between the two agents is that the population given pembrolizumab had worse prognostic characteristics than that treated with atezolizumab (or vice-versa). To assess this hypothesis, one suitable method is to perform an indirect comparison across the three control groups of the three trials. In more detail, the controls of the KEYNOTE-355 trial received chemotherapy (N = 211) whereas, in the two trials on atezolizumab (IM-PASSION-130 and IM-PASSION-131), the controls received nab-paclitaxel (N = 184) and paclitaxel alone (N = 101), respectively.
In a preliminary assessment (detailed data not shown), we verified that the survival pattern was very similar between the controls treated with nab-paclitaxel in the IM-PASSION-130 trial (N = 184) and those treated with paclitaxel in the IM-PASSION-131 trial (N = 101) (HR in favor of the former control group, 0.91; 95%CI, 0.64 to 1.28; p = 0.57). Hence, these two patient groups were pooled into a single control group of 285 patients.
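A reported HR and its confidence interval can be checked for approximate internal consistency with the accompanying p-value. The sketch below assumes that the 95%CI is symmetric on the log scale and uses a Wald-type normal approximation; the function name `wald_p_from_hr_ci` is hypothetical and this is not the procedure used to produce the published values, which come from the fitted survival model.

```python
import math


def wald_p_from_hr_ci(hr, ci_low, ci_high, z_crit=1.959964):
    """Approximate SE of log(HR), z score and two-sided p-value for HR = 1.

    Assumption: the reported interval is a 95% CI that is symmetric
    around log(HR), so its width on the log scale equals 2 * z_crit * SE.
    """
    log_hr = math.log(hr)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = log_hr / se
    # Two-sided tail probability of a standard normal variable.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p


# Values reported in the text for the comparison of the two control groups.
se, z, p = wald_p_from_hr_ci(0.91, 0.64, 1.28)
print(f"SE(log HR) = {se:.3f}, z = {z:.2f}, p = {p:.2f}")
```

With the values reported above, the approximation gives a p-value of roughly 0.59, which is in line with the reported p = 0.57; small differences are expected because the inputs are rounded and the published test may not be a Wald test.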
Thereafter, in comparing the controls of the two atezolizumab trials (N = 285) vs those of the pembrolizumab trial (N = 211), the HR in favor of the former control group was estimated to be 0.67 (95%CI, 0.54 to 0.83; p < 0.001). Figure 1A shows the Kaplan-Meier curves of this indirect comparison. Figure 1B shows the heterogeneity assessment based on the analysis of the control groups. The likelihood ratio test statistic was 12.94 (df, 1; p = 0.0003); because the p-value is below the threshold of 0.05, a homogeneous model is inappropriate or, in other words, significant heterogeneity is present.
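The step from the likelihood ratio statistic to its p-value can be sketched as follows. The example assumes the usual large-sample result that the statistic, defined as twice the difference in log-likelihood between the model with a trial effect and the homogeneous model, follows a chi-square distribution with the stated degrees of freedom. The function names are hypothetical, and only the closed forms for 1 and 2 degrees of freedom are implemented so that the block needs nothing beyond the standard library.

```python
import math


def lrt_statistic(loglik_null, loglik_alt):
    """Likelihood ratio statistic: 2 * (logLik of larger model - logLik of null)."""
    return 2.0 * (loglik_alt - loglik_null)


def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square variable (df = 1 or 2 only)."""
    if stat < 0:
        raise ValueError("the statistic must be non-negative")
    if df == 1:
        # A chi-square with 1 df is the square of a standard normal variable.
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        # A chi-square with 2 df is exponential with mean 2.
        return math.exp(-stat / 2.0)
    raise NotImplementedError("only df = 1 and df = 2 are covered in this sketch")


# Statistic and degrees of freedom reported for the control-group comparison.
p_value = chi2_sf(12.94, 1)
print(f"p = {p_value:.4f}")
print("heterogeneity flagged" if p_value < 0.05 else "no heterogeneity flagged")
```

For other degrees of freedom, a general chi-square survival function from a statistics library would be the natural replacement for `chi2_sf`.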
Case study n°2: first-line treatments for advanced or metastatic prostate cancer
Numerous treatments have been developed for nonmetastatic castration-resistant prostate cancer [8–10]. Because direct comparisons between these treatments are not available, indirect comparisons can be of interest. The analysis conducted by Rivano et al. [11] evaluated the second-generation hormone treatments proposed for this disease condition (namely, apalutamide, darolutamide, and enzalutamide). Three phase III trials were included; details about these trials are reported in Supplementary Table 2.
As shown in Fig. 2A, apalutamide (HR, 0.75; 95%CI, 0.64 to 0.88), darolutamide (HR, 0.70; 95%CI, 0.58 to 0.84) and enzalutamide (HR, 0.77; 95%CI, 0.65 to 0.90) were all significantly more effective than the controls given placebo. Our results showed no significant difference in OS between any of these three active agents.
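As a rough cross-check of the absence of differences between the active agents, the three HRs vs placebo can be combined pairwise with an adjusted indirect comparison on the log scale (the Bucher method). This is a different and simpler technique than the analysis of reconstructed patient-level data used here; it assumes that the reported intervals are 95%CIs symmetric on the log scale and that the three trials are independent. The helper names are hypothetical.

```python
import math
from itertools import combinations

Z95 = 1.959964  # normal quantile for a two-sided 95% interval


def log_hr_and_se(hr, ci_low, ci_high):
    """Return log(HR) and its SE recovered from the width of the 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * Z95)
    return math.log(hr), se


def bucher(agent_a, agent_b):
    """Indirect HR of A vs B through the common placebo comparator."""
    log_a, se_a = log_hr_and_se(*agent_a)
    log_b, se_b = log_hr_and_se(*agent_b)
    diff = log_a - log_b
    # Variances add because the two trials are assumed independent.
    se = math.sqrt(se_a ** 2 + se_b ** 2)
    return math.exp(diff), math.exp(diff - Z95 * se), math.exp(diff + Z95 * se)


# HR vs placebo with 95% CI, as reported in the text.
trials = {
    "apalutamide": (0.75, 0.64, 0.88),
    "darolutamide": (0.70, 0.58, 0.84),
    "enzalutamide": (0.77, 0.65, 0.90),
}

results = {}
for (name_a, a), (name_b, b) in combinations(trials.items(), 2):
    results[(name_a, name_b)] = bucher(a, b)
    hr, low, high = results[(name_a, name_b)]
    print(f"{name_a} vs {name_b}: HR = {hr:.2f} (95%CI, {low:.2f} to {high:.2f})")
```

Under these assumptions, each of the three pairwise intervals includes 1, which agrees with the finding reported above.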
To assess heterogeneity, comparisons across the controls of the three included trials are shown in Fig. 2B. The likelihood ratio test statistic was 1.17 (df, 2; p = 0.60), which indicates that there is no significant heterogeneity in these data sets. Furthermore, using the controls of the apalutamide trial as the common comparator, the following HRs were estimated: i) controls of the darolutamide trial, HR = 1.09 (95%CI, 0.85 to 1.40; p = 0.48); ii) controls of the enzalutamide trial, HR = 1.13 (95%CI, 0.90 to 1.43; p = 0.29).