Subgroup Analysis in Pulmonary Hypertension-specific Therapy Trials: a Systematic Review

Background. Pulmonary hypertension (PH) treatment decisions are driven by the results of randomized controlled trials (RCTs). Subgroup analyses are often performed to assess whether the intervention effect changes with the patient's characteristics. As subgroup claims may mislead clinicians' treatment decisions, there is a need for standards for such analyses. Objective. To evaluate the appropriateness and interpretation of subgroup analysis performed in pulmonary hypertension-specific therapy RCTs. Methods. A systematic review of the literature for pulmonary hypertension-specific therapy RCTs published between January 2000 and December 2020 was conducted. Claims of subgroup effects were evaluated with the criteria of Sun X et al., 2012. Results. Thirty RCTs were included. The evaluated subgroup analyses showed a high number of analyses reported, a lack of prespecification, and a lack of interaction tests. The trial protocol was not available for most RCTs; significant differences were found in those articles which published the protocol. Authors reported 13 claims of subgroup effect, with 12 claims meeting 4 or fewer Sun criteria. Conclusion. Subgroup analyses in pulmonary hypertension-specific therapies are of poor quality. The lack of published protocols limited our capability to assess whether the published results correspond to the initially predefined analyses. Most claims of subgroup effect did not meet critical criteria.


Introduction
Pulmonary hypertension (PH) is a relatively frequent complication of multiple clinical disorders [1]. Among other factors, the variety of aetiologies of PH makes it an extremely complex disease; for this reason, a clinical classification into 5 categories has been developed to group PH according to clinical presentation, findings, underlying conditions, and treatment [2]. As PH affects older patients disproportionately and may cause rapid deterioration and an increased risk of death, it is considered a major health issue, particularly in countries with older populations [3]. Several drugs with diverse pharmacological mechanisms have been developed for the treatment of PH. The choice of treatment varies according to the PH group to be treated, as therapies usually considered appropriate may even be harmful in certain subgroups of patients [1]. Treatment decisions in PH are driven by results from randomized controlled trials (RCTs). Usually, only average results are reported in RCTs, and trial participants are often recruited from heterogeneous populations. However, clinicians ideally want more specific information to assist them in applying trial results to individual patients. Researchers conducting RCTs usually perform subgroup analyses to assess whether the intervention effect changes with the patient's baseline characteristics, such as underlying pathologies, age, sex, or disease severity. Based on subgroup analysis results, researchers may report claims of subgroup effects. Nonetheless, subgroup claims should be interpreted cautiously, since misstatements about subgroup effects may result in patients being denied beneficial treatments or even receiving treatments that may be harmful or ineffective [4][5][6].
Standards for the interpretation of subgroup analysis are crucial for treatment decisions in medical practice. Explicit criteria have been developed for this purpose [7][8][9][10][11][12]. Recent tools to evaluate subgroup credibility have been published, such as Gil-Sierra MD et al. 2020 [8] and Schandelmeier S et al. 2020 [7]. However, in our view, the "10 criteria for assessing the credibility of a subgroup claim" [12] are the most reliable tool to assess confidence in subgroup analysis, as these criteria have been widely tested in several disciplines [13][14][15][16].
The central purpose of this study was to evaluate the appropriateness and interpretation of subgroup analysis performed in pulmonary hypertension-specific therapy RCTs. To achieve this goal, the following aspects were studied:
Description of subgroup analyses and claims of subgroup effects.
Research characteristics of subgroup analysis.
Analysis and interpretation of subgroup effects for primary outcomes.
Assessment of the credibility of subgroup claims using the "10 criteria for assessing the credibility of a subgroup claim" [12].

Literature search.
This systematic review aims to summarize the available data to answer the following research question, framed in the Population Intervention Comparator Outcome-Study (PICOS) design framework: Population, patients with pulmonary hypertension; Intervention, pulmonary hypertension-specific therapy; Comparison, studies with a comparator were considered; Outcomes, subgroup analysis; Study design, randomized clinical trials.
The following groups of drugs were considered pulmonary hypertension-specific therapy:
Calcium channel blockers.
Prostacyclin analogues and prostacyclin receptor agonists.

Guanylate cyclase stimulators.
A systematic search was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [17]. The systematic review protocol was registered with the prospective register for systematic review protocols (PROSPERO), registration number: CRD42021242265.
The search covered articles published between January 2000 and December 2020, using controlled vocabulary (MeSH terms) and keywords in the MEDLINE database, to identify RCTs assessing pulmonary hypertension-specific therapy in patients with pulmonary hypertension.
The search was performed in March 2021. The full literature search strategy is available in Additional file 1.
The following criteria were used for the trial selection: We considered all published pulmonary hypertension-specific therapy RCTs in adults with pulmonary hypertension that reported subgroup analysis.

Exclusion criteria.
Articles written in languages other than English, Spanish, and French.
Post-hoc analyses of a previously published RCT.
Articles that were not available.
Trials in which subgroup analysis credibility was impossible to evaluate due to missing data.

Study screening and selection.
Two investigators independently checked the titles and abstracts of the search results using predefined inclusion criteria. The full text was accessed for all titles that seemed to meet the inclusion criteria or raised uncertainties. Two reviewers, HRR and NBG, assessed whether the article met the selection criteria. Any disagreements were resolved through discussion or arbitration with the third reviewer, LAM.

Data extraction.
For data extraction, other sources related to each study were also used (i.e., trial registrations, published protocols, and online supplements). Data were extracted and entered in a structured Microsoft Excel (Redmond, WA, USA) database.
Eligible RCTs were evaluated to determine whether a subgroup analysis was reported. A subgroup analysis was defined as a statistical analysis that explored whether or not the effects of the intervention differed according to the status of a subgroup variable. A subgroup effect was defined as a difference in the magnitude of a treatment effect across subgroups of a study population [12]. For each RCT reporting subgroup analysis and subgroup claims, the following information was collected:
Trial characteristics: Information on the funding source, year and journal of publication, journal impact factor, pulmonary hypertension classification according to the clinical classification of pulmonary hypertension [2], updated by the European Society of Cardiology and the European Respiratory Society (ESC/ERS) Guidelines [1], centre (multicentre or single-centre), trial design (parallel, cross-over, or factorial), trial type (superiority, noninferiority, or equivalence), allocation concealment, blinding of patients, and the number of patients randomized. The primary endpoint was categorized according to whether the results were statistically significant and the type of outcome variable (time-to-event, binary, continuous, or count).
Reporting of subgroup analysis: Number of subgroup factors, type of subgroup factors (clinical factors or biomarkers), number of subgroup analyses and outcomes for subgroup analysis reported, forest plots used, prespecified or post hoc subgroup, and the statistical method used to assess the heterogeneity of the treatment effect (descriptive only, subgroup P values and confidence interval, or interaction test). When the trial protocol was available, the agreement between the journal publication and the trial protocol was assessed for the number of subgroup factors, the number of subgroup analyses, and the prespecification of such analyses.
A subgroup factor was defined as a study variable by which the population may be categorized into different subgroups, e.g., sex, age, or the presence of a mutation. A subgroup analysis was defined as a specific analysis performed to compare two categories within a subgroup factor; for example, within the age factor, the analysis that compares the subgroups > 65 years vs. < 65 years.
Claims of subgroup effects: The mode of presentation of subgroup claims (abstract or text only), the number of subgroup claims, the subgroup variable (primary or secondary outcome), and the number of outcomes for subgroup claims were recorded. A subgroup effect was considered to be claimed when the authors stated in the abstract or discussion that the intervention effect differed between the categories of the subgroup variable. The claims of subgroup effects were classified according to the strength of the claim into three categories: strong claim, claim of a likely effect, or suggestion of a possible effect, based on the classification of Sun et al. (Additional file 2). To evaluate the credibility of subgroup claims for primary outcomes, "the 10 criteria for assessing the credibility of a subgroup claim" were applied pair-wise (Additional file 3). If the subgroup claim met fewer than half the criteria, the credibility of this claim was considered low.

Assessment of risk of bias.
The risk of bias was assessed using the Cochrane Collaboration's tool for assessing the risk of bias in randomized trials [18]. The risk of bias was assessed by two independent reviewers. Possible disagreements between reviewers were resolved by discussion or arbitration by a third reviewer when consensus could not be reached.

Data analysis.
A descriptive analysis was performed. Continuous and categorical variables were presented as mean (range) and n (%), respectively.
For those RCTs that stated a subgroup effect without providing an interaction test, the P value for interaction was calculated using the Joaquin Primo calculator [19] to verify that there was indeed statistical significance.
The inter-reviewer agreement for assessing the credibility of the subgroup claims was estimated by Cohen's kappa coefficient.
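As an illustration of what such a calculation involves, the sketch below compares two subgroup estimates reported as ratios (for example, hazard ratios) with their 95% confidence intervals, using a z-test on the difference of the log estimates. This is a generic approach and is an assumption on our part; it is not necessarily the method implemented in the calculator cited above. The function names and the numbers in the usage example are hypothetical.

```python
import math

def se_from_ci(lower, upper, z_crit=1.959964):
    """Standard error of a log ratio estimate, recovered from its 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z_crit)

def interaction_p(est_a, ci_a, est_b, ci_b):
    """Two-sided P value for the difference between two ratio estimates.

    Assumes the two subgroups are independent and that the log estimates
    are approximately normally distributed.
    """
    diff = math.log(est_a) - math.log(est_b)
    se_diff = math.sqrt(se_from_ci(*ci_a) ** 2 + se_from_ci(*ci_b) ** 2)
    z = diff / se_diff
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p

# Hypothetical subgroup results: hazard ratio and 95% CI in each subgroup.
z, p = interaction_p(0.60, (0.40, 0.90), 0.95, (0.70, 1.29))
print(f"z = {z:.2f}, P for interaction = {p:.3f}")
```

A statistically significant effect in one subgroup and a non-significant effect in the other does not by itself imply a significant interaction; only the test on the difference addresses that question.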
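For readers unfamiliar with the statistic, a minimal sketch of Cohen's kappa for two reviewers follows. It compares the observed agreement with the agreement expected by chance from each reviewer's marginal frequencies. The function name and the example ratings are hypothetical and do not reproduce the data of this review.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who classified the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    if expected == 1.0:
        return 1.0  # both raters used one and the same category throughout
    return (observed - expected) / (1 - expected)

# Hypothetical judgements on whether a criterion was met for four claims.
reviewer_1 = ["met", "met", "not met", "not met"]
reviewer_2 = ["met", "not met", "not met", "not met"]
print(cohens_kappa(reviewer_1, reviewer_2))
```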

Results
The initial literature search identified 1837 studies. After the first review by title or abstract and the deletion of duplicates, 185 articles were selected for full-text review. Finally, 30 papers were included (Fig. 1). The excluded articles and the reasons for their exclusion are provided in the supplementary material (Additional file 4).

Trial characteristics.
The characteristics of the trials included in this study are listed in Table 1. The included publications reported data on 7765 randomized patients (median: 208; range: 52-1156).

Subgroup analyses.
Characteristics of the reported subgroup analyses are listed in Table 2. Subgroup analyses were mostly mentioned in the results (90%; n = 27) and the discussion (63.3%; n = 19) sections. Most trials, 56.7% (n = 17), did not clearly report the number of subgroup factors or subgroup analyses carried out. Among the remaining trials, at least 5 subgroup factors and at least 5 subgroup analyses were reported in 36.7% (n = 11) and 40% (n = 12) of the trials, respectively. Subgroup analysis for more than one outcome was reported in 16.7% (n = 5) of trials. Forest plots were used to report subgroup analysis data in 53.3% (n = 16) of the trials.
For 30% (n = 9) of trials, it was unclear whether subgroup analysis was pre-planned or post hoc; in 46.7% (n = 14) of trials the analyses were prespecified, and in 16.7% (n = 5) they were post hoc.
Only 36.7% (n = 11) of trials used an interaction test to assess heterogeneity of the treatment effect; 33.3% (n = 10) reported subgroup analysis without any statistical analysis.
The clinical trial protocol was available for 8 of the 30 RCTs included. Relevant differences were found for all 8 of the RCTs when comparing the trial protocol and the published manuscript:
Subgroup analyses: 6 RCTs reported fewer subgroup analyses than prespecified in the protocol; the remaining two RCTs reported subgroup analyses that were not prespecified in the protocol, and in both cases these analyses were characterized as prespecified in the published manuscript.
Subgroup factors: The number of subgroup factors reported differed between the protocol and the published manuscript in 7 cases: 5 RCTs reported fewer factors than those specified in the protocol, and the remaining two added several subgroup factors that were not previously defined.
Selective reports of subgroup analyses by outcome: There were differences in the number of subgroup analyses reported for the primary outcome in 7 RCTs. In addition, in 4 protocols, the authors specified that subgroup analysis would be carried out for primary and secondary endpoints; however, the published manuscript reported the subgroup analyses only for the primary endpoint in three of these RCTs.

Claims of subgroup effects.
Table 3 lists the characteristics of the subgroup claims identified. In 11 RCTs [20][21][22][23][24][25][26][27][28], the authors claimed heterogeneity of treatment effect in at least one subgroup of subjects. Two RCTs made two claims of subgroup differences [29,30]. Of the 11 RCTs with claims of subgroup effect, 4 reached the primary endpoint, 5 did not reach it, and for the rest, a clear primary endpoint was not defined. Only three (27.3%) RCTs provided interaction test results to prove a subgroup difference. A total of 13 subgroup differences were claimed in 11 trials. These claims were classified as three (23.1%) strong claims, one (7.7%) claim of a likely effect, and 9 (69.2%) suggestions of a possible effect.
Concerning the 10 criteria to assess the credibility of subgroups claims (

Secondary analyses.
Figure 2 shows the evolution of the quality of the subgroup analyses reported over 4 periods of time.
An improvement was observed for most methodological characteristics of pulmonary hypertension-specific therapy RCTs over time, except for the use of subgroup variables as a stratification factor at randomization.

Discussion
Subgroup analyses have the potential to generate investigation hypotheses, discover new treatments, and identify baseline factors that may influence treatment efficacy or toxicity. However, when misused, subgroup analyses may also lead to spurious findings and misleading interpretations [31][32][33]. The most frequent methodological limitations of subgroup analyses in RCTs have been reported extensively: multiple testing of hypotheses, inadequate statistical power, inappropriate a priori specification, and lack of biological rationale [4,5,33,34,35].
As a result of this review, we can observe that, generally, the subgroup analyses carried out in RCTs of pulmonary hypertension-specific therapy are of low quality, despite being published primarily in high-impact-factor journals. The lack of clarity in the reporting of allocation concealment is also notable. For most clinical trials, the study protocol is not available; therefore, it is challenging to verify aspects such as the prespecification of the subgroup analyses. Furthermore, of the 11 RCTs with subgroup effect claims, only one has a publicly available protocol. For those studies whose protocol was available, subgroup analyses reported in the manuscript lacked description and were significantly different from those planned in the protocol.
Other methodological shortcomings of the subgroup analyses stood out in this study: a high number of subgroup analyses reported, a high number of post hoc analyses, and the lack of interaction tests to confirm the existence of subgroup effects.
When multiple subgroup analyses are carried out, the results obtained should be interpreted with caution, since the probability of obtaining a false positive can be significantly increased [5]. This risk may be even greater if, in addition, the hypothesis of the subgroup analyses has not been prespecified [5,13,33]. The approximate risk of a false-positive result across 5 subgroup analyses is 25%, and it increases as the number of subgroup analyses rises. We identified a median of 6 subgroup analyses reported among the RCTs evaluated in this review.
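The arithmetic behind this kind of figure can be sketched as follows, under two assumptions that are ours and not stated in the text: each analysis is tested at a significance level of 0.05, and the analyses are independent. Under these assumptions the probability of at least one false positive among k analyses is 1 - (1 - 0.05)^k, and the simple upper bound k × 0.05 gives the approximate value of 25% for k = 5.

```python
def familywise_error(k, alpha=0.05):
    """Probability of at least one false positive among k independent tests."""
    return 1 - (1 - alpha) ** k

def bonferroni_bound(k, alpha=0.05):
    """Simple upper bound on that probability (k * alpha, capped at 1)."""
    return min(1.0, k * alpha)

for k in (1, 5, 6, 10):
    print(k, round(familywise_error(k), 3), round(bonferroni_bound(k), 3))
# For k = 5 this gives about 0.226 (exact under independence) and 0.25 (bound).
```

Real subgroup analyses within one trial are usually correlated, so these values are only a rough guide to how quickly the risk grows with the number of analyses.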
The prespecification of subgroup analysis is a parameter frequently measured in order to estimate methodological quality. For a subgroup analysis to be prespecified, it must be planned and documented before any examination of the data; this is based on the premise that a prespecified analysis usually follows a biological rationale. However, prespecification alone may not lead to solid subgroup analyses, as a prespecified analysis may be based on unlikely and poorly formulated hypotheses [36]. In pulmonary hypertension-specific therapy RCTs, subgroup analyses were prespecified in 46.7% (n = 14) of trials.
In addition to the prespecification of the subgroup analysis, the correct direction of subgroup hypotheses must also be specified. For those claims in which the direction of the effect has not been identified or has been wrongly identified, credibility could be reduced.
A common mistake among authors is to claim a subgroup difference when a statistically significant effect is found in one subgroup but not in the other. One of the essential criteria to appropriately establish a claim of subgroup effect is performing an interaction test [37]. The p-value of an interaction test indicates how likely a difference between subgroups at least as large as the one observed would be if chance alone were at work, that is, if there were no actual subgroup effect. In this review, we observed that only 36.7% of the RCTs performed an interaction test to confirm the existence of a subgroup claim.
Of the 9 claims of subgroup difference identified in this study, 44 [39].
Most of the studies included in this review were industry-funded (90%), which could have influenced our results. The source of funding of clinical trials may play a role in the quality of the reports of subgroup analyses; industry-funded RCTs are more likely to report subgroup analyses [40][41][42], even when an overall treatment effect for a primary outcome could not be proved [40]. Industry funding was also correlated with suboptimal reporting of subgroup effects; often, the subgroup hypotheses were not prespecified, and the use of an interaction test was rare [40,42]. This is consistent with our findings in this primarily industry-funded sample of RCTs as, among the articles that claimed a difference of subgroup effect, only 4 (36.4%) RCTs reached the primary endpoint.
Previous studies have found that the methodological quality reported in the methods sections of published articles is lacking compared to study protocols [43,44], with high-quality studies being poorly reported. Protocols provide a complete insight into the analysis methods used in an RCT. It is recommended that trial protocols be published together with the RCT report and in clinical trial registries, thus providing the reader with a transparent and complete description of the prespecified methods. However, several studies have found that RCT protocols are often not freely available [41,45]; this is consistent with our findings, as only 7 out of 30 RCTs provided the study protocol, and only modest growth in protocol publishing was observed during the studied period.
Similarly, high inconsistency between protocols and publications has been described in several methodological characteristics of subgroup analysis: omitted prespecified analyses [54], interaction test, prespecification of subgroup analyses, and minor differences for the anticipated direction of the effect [41]. Due to these prevalent discrepancies, the credibility of subgroup methods may be questionable if the study protocol is not accessible.
Our findings coincide with previous reports; few studies (23.3%) published the protocol either in the journal publication or in clinical trial registries. A prespecified subgroup analysis was reported by 46.7% (n = 14) of studies, with only half publishing the study's protocol. Furthermore, 30% (n = 9) of studies did not report clearly whether the subgroup analysis was prespecified or post hoc; in none of these cases was the protocol freely available.
Although the methodological limitations of subgroup analysis in RCTs are increasingly recognized, a review of 437 randomly selected RCTs published in high-impact journals found a decrease in the appropriateness of reporting subgroup analyses from 2007 to 2014 [42].
In contrast with these results, we observed an improvement in most methodological characteristics of pulmonary hypertension-specific therapy RCTs: a priori specification, forest plot utilization, and interaction test improved from 2002 to 2019. However, a decline in the use of subgroup variables as stratification factors during randomization was observed. This decrease adds to the hypothesis that most subgroup analyses, even when prespecified, are exploratory. When a particular characteristic is known to influence the trial outcome, it should be used as a stratification factor at randomization.
Claims of subgroup effect are common in RCT reports. Several systematic reviews and analyses have shown that authors believe and report a difference in treatment effects between patient subgroups in 40-60% of all RCTs reporting subgroup analyses [13,36,55]. A few systematic reviews have described a relatively low number of subgroup claims [14,39]. Our results were in line with the latter, as we found that pulmonary hypertension-specific therapy RCTs reported claims of subgroup effect in 26.7% (n = 9) of RCTs reporting subgroup analyses. Fewer subgroup claims may indicate that authors are cautious in their reporting, as these claims may result in changes in clinical practice.

Strengths.
To our knowledge, this is the first systematic review of the credibility of subgroup analysis and subgroup effect claims reported in pulmonary hypertension-specific therapy RCTs. A rigorous systematic method was employed. Standardized criteria were used in order to assess the credibility of subgroup claims.

Limitations.
This study has some limitations: First, although we used a scale to determine the credibility of the claims, the Sun criteria were not designed to provide a score; therefore, the subsequent interpretation of the results is not without subjectivity.
Secondly, when assessing the strength of a claim, there is an undeniable subjective element in interpreting what the authors state. However, the pair-wise work and the high agreement between the results of both researchers suggest that the limitation in this sense was not significant.
Third, in most of the studies, we were unable to find the study protocols. In many cases, we could not know whether the published results correspond to the initially defined objectives; this limits our capability to judge the credibility of subgroup claims. For this purpose, authors must provide detailed information about the conduct and results of subgroup analyses.

Proposals to improve the reporting of subgroup analyses.
Although the methodological limitations of subgroup analyses are consistently reported in the literature, similar mistakes are still made when conducting and reporting subgroup analyses in recent RCTs. As improvement measures to change the current state of subgroup analyses, we propose the following: Firstly, subgroup analysis should be prespecified and documented in trial registries. Secondly, scientific journals should request authors to make the study protocol accessible to reviewers and readers as a requirement for publishing the results of RCTs. Thirdly, the use of guidelines or tools for the correct publication of subgroup analyses should be enforced. Fourthly, researchers should be cautious when claiming subgroup differences, even when a robust methodology for subgroup analyses was followed.

Conclusions
Subgroup analysis in pulmonary hypertension-specific therapies is of poor quality; flaws identified in previous studies were common. Although the fulfilment of several criteria improved over time, most studies did not set subgroup variables as stratification factors at randomization, prespecify the subgroup analyses, or publish the study protocol.
The credibility of subgroup claims was low. Most claims did not meet critical criteria; therefore, clinicians should be sceptical of claims of subgroup effects if these differences are not confirmed in later RCTs.

Flowchart of screening of randomized clinical trials included in this analysis.