3.1 Study selection and characteristics
As detailed in Figure 1, the initial search yielded 913 results. After removal of duplicate records and ineligible studies, 36 remained and were reviewed in full against the inclusion criteria. Of these, 13 randomized controlled trials (RCTs) [25–38] were included, comprising 1,124 patients. A total of 687 (61.1%) patients were in the intervention group, of whom 27 (3.9%) received saroglitazar, 138 (20.1%) received elafibranor, 87 (12.6%) received fenofibrate, 331 (48.2%) received seladelpar, and 104 (15.1%) received bezafibrate. Study characteristics are reported in Table 1. Across the included studies, 1,039 (92.4%) participants were women. The mean age of participants was 55.6 years. The mean ALP level at baseline was 325.2 U/L. Thirteen studies administered UDCA or allowed its continuation in both groups. [25–29,31–38] In addition, ten studies included patients with no response or an inadequate biochemical response to UDCA. [25–31,33,34,36] Finally, follow-up duration varied considerably between studies (Table 1).
3.2 Pooled analysis of all studies
In patients receiving PPAR agonists, ALP (MD -130.93, 95% CI -156.44 to -105.42, p<0.01, I²=84%, Figure 2), GGT (MD -39.83, 95% CI -78.44 to -1.22, p=0.04, I²=94%, Figure 3), and total bilirubin levels (SMD -0.03, 95% CI -0.06 to -0.01, p<0.01, I²=69%, Figure 4) were significantly lower than in the control group. There was no statistically significant difference in direct bilirubin (SMD 0, 95% CI -0.05 to 0.04, p=0.91, I²=60%, Figure 5), AST (MD -1.85, 95% CI -5.72 to 2.02, p=0.35, I²=64%, Supplementary Figure 2), or ALT levels (MD -5.15, 95% CI -12.48 to 2.19, p=0.17, I²=85%, Supplementary Figure 3).
PPAR agonists significantly reduced the incidence of pruritus (RR 0.63, 95% CI 0.41 to 0.96, p=0.031, I²=9%, Figure 6) when compared to the control group. The rate of ALP normalization (RR 10.65, 95% CI 2.18 to 52.01, p=0.003, I²=83%, Supplementary Figure 4) was also significantly higher in the PPAR agonists group than in the control group. However, the incidences of abdominal pain (RR 1.91, 95% CI 1.04 to 3.53, p=0.038, I²=0%, Supplementary Figure 5) and laboratory abnormalities (RR 2.15, 95% CI 1.37 to 3.37, p<0.001, I²=0%, Supplementary Figure 6) were significantly higher in the PPAR agonists group.
There was no statistically significant difference in terms of nausea (RR 1.65, 95% CI 0.89 to 3.05, p=0.112, I²=0%, Supplementary Figure 7), headache (RR 1.78, 95% CI 0.67 to 4.72, p=0.244, I²=37%, Supplementary Figure 8), fatigue (RR 0.85, 95% CI 0.56 to 1.28, p=0.436, I²=19%, Supplementary Figure 9), myalgia (RR 1.59, 95% CI 0.68 to 3.72, p=0.282, I²=0%, Supplementary Figure 10), and diarrhea (RR 0.68, 95% CI 0.32 to 1.48, p=0.337, I²=5%, Supplementary Figure 11).
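As a reading aid, the z statistic and two-sided p-value implied by a reported risk ratio and its 95% CI can be recovered on the log scale. The sketch below assumes a Wald-type interval (normal approximation on log RR), which this section does not state explicitly; the function name `rr_z_and_p` is hypothetical. Small differences from the reported p-values are to be expected, because the published estimates are rounded.

```python
import math


def rr_z_and_p(rr, ci_low, ci_high, z_crit=1.959964):
    """Recover z and the two-sided p-value implied by an RR and its 95% CI.

    Assumes the interval was built as exp(log(RR) +/- z_crit * SE),
    i.e. a normal approximation on the log scale (an assumption).
    """
    log_rr = math.log(rr)
    # The CI width on the log scale equals 2 * z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_rr / se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Pruritus estimate reported in this section: RR 0.63, 95% CI 0.41 to 0.96.
z, p = rr_z_and_p(0.63, 0.41, 0.96)
print(f"z = {z:.2f}, p = {p:.3f}")
```

The same check applies to any of the risk ratios in this subsection; an interval that excludes 1 corresponds to p below 0.05 under this approximation.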
3.3 Subgroup analyses in selected populations
In subanalyses restricted to studies with 12 to 35 weeks (MD -152.67, 95% CI -164.99 to -140.35, p<0.01, I²=36%, Supplementary Figure 12) and 52 to 104 weeks (MD -111.20, 95% CI -139.91 to -82.49, p<0.01, I²=86%, Supplementary Figure 12) of follow-up, ALP levels were significantly reduced in the PPAR agonists group compared to the control group. In addition, in a subanalysis by type of PPAR agonist, ALP levels were significantly reduced in the intervention group compared to the control group for each agent: seladelpar (MD -104.05, 95% CI -122.63 to -85.47, p<0.01, I²=0%, Supplementary Figure 13), elafibranor (MD -128.78, 95% CI -161.09 to -96.47, p<0.01, I²=68%, Supplementary Figure 13), bezafibrate (MD -169.11, 95% CI -191.59 to -146.62, p<0.01, I²=31%, Supplementary Figure 13), and fenofibrate (MD -95.34, 95% CI -150.38 to -40.31, p<0.01, I²=83%, Supplementary Figure 13).
3.4 Sensitivity analyses
The leave-one-out analysis confirmed the robustness of the pooled results for ALP, direct bilirubin, and total bilirubin levels. The leave-one-out analysis for the normalization of ALP levels was also consistent with the pooled results. For these outcomes, the effect size did not vary significantly with the removal of any single study. For the continuous outcomes, no single study could be identified as responsible for the high heterogeneity, so for each outcome we identified the study whose removal yielded the lowest heterogeneity.
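The leave-one-out procedure described above can be sketched as follows. This is a minimal illustration, assuming an inverse-variance DerSimonian-Laird random-effects model (the section does not state which estimator was used); the function names and the four studies in the example are hypothetical and are not data from the included trials.

```python
def pool_random_effects(effects, variances):
    """DerSimonian-Laird random-effects pooling; returns (pooled, I2 in %)."""
    k = len(effects)
    if k < 2:
        # A single study has no between-study heterogeneity to estimate.
        return effects[0], 0.0
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q measures dispersion around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, i2


def leave_one_out(labels, effects, variances):
    """Re-pool the data with each study removed in turn."""
    results = []
    for i, label in enumerate(labels):
        rest_effects = effects[:i] + effects[i + 1:]
        rest_variances = variances[:i] + variances[i + 1:]
        pooled, i2 = pool_random_effects(rest_effects, rest_variances)
        results.append((label, pooled, i2))
    return results


# Hypothetical mean differences and variances, for illustration only.
labels = ["Study A", "Study B", "Study C", "Study D"]
effects = [-150.0, -120.0, -100.0, -60.0]
variances = [100.0, 144.0, 225.0, 400.0]

for label, pooled, i2 in leave_one_out(labels, effects, variances):
    print(f"without {label}: MD = {pooled:.2f}, I2 = {i2:.1f}%")
```

Sorting the returned rows by their I² value gives the study whose removal yields the lowest heterogeneity for a given outcome.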
Meta-regression analysis for the outcome of ALP levels showed that the heterogeneity (I²) remained high and the p-value for the test of significance of the model (QMp) was not significant, independently of the chosen predictor. For the outcome of GGT levels, the heterogeneity remained high independently of the chosen predictor. However, the value of ALP at baseline was a statistically significant predictor of GGT levels (QMp = 0.049) and explained 23.8% of the outcome’s variance. For the outcome of total bilirubin levels, the heterogeneity was reduced below the 25% threshold independently of the chosen predictor. The remaining heterogeneity ranged from 0.05% to 17.94%. The duration of follow-up and the PPAR agonist dosage were statistically significant predictors of total bilirubin levels (QMp = 0.029 and 0.009, respectively) (Supplementary Table 1).
3.5 Quality assessment
Individual bias assessment is reported in Supplementary Figure 1. RCTs were evaluated using RoB 2. Seven studies were downgraded in domains related to deviations from intended interventions or outcome measurement. [29–35] One crossover study was assessed using the RoB 2 tool for crossover trials and was considered at high risk of bias because of insufficient time for carryover effects to dissipate before outcome assessment in the second period and lack of blinding of participants and assessors. [38] The remaining studies were considered at low risk.
Publication bias was investigated for the outcomes of ALP, GGT, and total bilirubin levels, as at least 10 studies were available for each. Some outcomes showed asymmetry of the funnel plots, but in most cases a small-study effect was not supported by further analysis with contour-enhanced funnel plots and Egger’s test (Supplementary Table 2).
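For readers unfamiliar with Egger’s test, the idea is to regress each study’s standardized effect (effect divided by its standard error) on its precision (the reciprocal of the standard error); an intercept far from zero suggests funnel-plot asymmetry. The sketch below is a minimal, unweighted version of that regression using only the standard library. The function name `egger_intercept` and the input values are hypothetical, and this is not presented as the exact procedure used for Supplementary Table 2.

```python
import math


def egger_intercept(effects, std_errors):
    """Return (intercept, t statistic) of Egger's regression.

    Regresses effect/SE on 1/SE by ordinary least squares. The t statistic
    should be compared with a t distribution on n - 2 degrees of freedom.
    """
    n = len(effects)
    if n < 3:
        raise ValueError("Egger's regression needs at least three studies")
    x = [1.0 / se for se in std_errors]                      # precision
    y = [e / se for e, se in zip(effects, std_errors)]       # standardized effect
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance and the standard error of the intercept.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)
    se_intercept = math.sqrt(s2 * (1.0 / n + x_bar ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, t_stat


# Hypothetical effects and standard errors, for illustration only.
effects = [-150.0, -120.0, -100.0, -60.0, -90.0]
std_errors = [10.0, 12.0, 15.0, 20.0, 25.0]
intercept, t_stat = egger_intercept(effects, std_errors)
print(f"intercept = {intercept:.2f}, t = {t_stat:.2f}")
```

With few studies the test has low power, which is consistent with restricting it here to outcomes reported by at least 10 studies.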