Reduced Intensity Conditioning Before Allogeneic Hematopoietic Stem Cell Transplantation for Acute Myeloid Leukemia in Complete Remission and Myelodysplastic Syndrome: A Meta-Analysis of Randomized Controlled Trials

Reduced intensity conditioning (RIC) before allogeneic hematopoietic stem cell transplantation (allo-HSCT) has been reported to yield the same overall survival (OS) as myeloablative conditioning (MAC) for acute myeloid leukemia (AML) in complete remission (CR) and myelodysplastic syndrome (MDS), but the results of different studies are contradictory. Therefore, we conducted a meta-analysis according to the PRISMA 2009 guidelines to confirm the efficacy and safety of RIC vs. MAC for AML in CR and for MDS. Methods and

follow-up data. The Glucksberg 15 and International Bone Marrow Transplant Registry 16 grading systems and the Seattle criteria 17 were used to grade aGVHD and cGVHD. The incidences of grade III-IV aGVHD, extensive cGVHD, graft failure (GF), overall organ toxicity, oral mucositis, specific organ toxicities and reported infections were the safety outcomes.
We electronically searched databases and hand-searched related articles published between Jan 1, 1980 and July 1, 2020. Supplement 2 shows the detailed search strategy. The Cochrane highly sensitive search filters were used to identify RCTs in Medline and Embase. 18 Yanzhi Song (YS) and Zhichao Yin (ZY) independently screened the retrieved records, extracted data on the characteristics of the included studies according to Table 1 and Supplement 3, and used the Cochrane Collaboration-recommended tool to assess the quality of the included studies (Table 2 and Supplement 3). 19 Only studies in the low-risk group were included. Any disagreement was resolved by discussion among YS, ZY and Jie Ding. We contacted the authors when the reported data were insufficient.
Table 1. Demographic characteristics of included studies.

Studies: Beelen et al. 8; Bornhäuser et al. 9; Kröger et al. 10; MC-FludT.14/L Trial I 7; Ringdén et al. 11; Scott et al. 12. Recruitment period (single recoverable entry): Jan 25th, 2013-November 16th, 2016.

We used the Cochrane Collaboration-recommended tool to assess the quality of the included studies. 19 The studies were classified into low-risk and high-risk groups. Studies reporting sufficient information to show a low risk of bias in sequence generation and allocation concealment were stratified into the low-risk group; otherwise, they were stratified into the high-risk group. Studies with a high risk of bias in any other domain were also stratified into the high-risk group. Funnel plots and meta-regression were planned to assess publication bias.
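The stratification rule above can be written as a small decision function. The sketch below is illustrative only: the `Risk` enum, the dictionary keys and `stratify` are hypothetical names, and the domain labels are assumed to follow the Cochrane risk-of-bias tool; it is not the procedure the reviewers actually ran.

```python
from enum import Enum


class Risk(Enum):
    LOW = "low"
    UNCLEAR = "unclear"
    HIGH = "high"


# Key domains named in the text; any other key stands for one of the remaining
# Cochrane domains (blinding, incomplete outcome data, selective reporting, ...).
KEY_DOMAINS = ("sequence_generation", "allocation_concealment")


def stratify(judgements: dict) -> str:
    """Classify a study as 'low-risk' or 'high-risk' under the rule described above."""
    # Both key domains must be explicitly judged low risk (a missing key counts as not low).
    if any(judgements.get(d) is not Risk.LOW for d in KEY_DOMAINS):
        return "high-risk"
    # A high-risk judgement in any other domain also moves the study to the high-risk group.
    if any(r is Risk.HIGH for d, r in judgements.items() if d not in KEY_DOMAINS):
        return "high-risk"
    return "low-risk"


example = {
    "sequence_generation": Risk.LOW,
    "allocation_concealment": Risk.LOW,
    "blinding_participants_personnel": Risk.UNCLEAR,  # unclear, not high
}
print(stratify(example))
```

Under this reading, an unclear judgement outside the two key domains does not change the classification; only a high-risk judgement does.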
RevMan software (Version 5.3; Copenhagen: The Nordic Cochrane Centre, The Cochrane Collaboration, 2012) was used. We combined hazard ratios (HRs) and their 95% confidence intervals (CIs) for OS, CIR, LFS, NRM, aGVHD and cGVHD with the generic inverse variance method. 20 Log HRs and their variances were calculated according to Parmar et al. 21 We used the Mantel-Haenszel 22 and DerSimonian-Laird 23 methods with relative risks (RRs) or odds ratios (ORs) and 95% CIs to combine dichotomous data. A two-sided P < 0·05 was considered statistically significant. Heterogeneity was assessed with the Q test and the I² statistic. A fixed effect model was used if heterogeneity was not significant (P > 0·10 and I² < 50%); if heterogeneity was significant (P ≤ 0·10 and/or I² ≥ 50%), we used a random effects model. Because treosulfan is less toxic than TBI/Bu, 8,24 we predefined three subgroups: RIC vs. TBI/Bu based MAC, RIC vs. treosulfan 30 g/m² based MAC and RIC vs. treosulfan 42 g/m² based MAC. In the NRM and aGVHD meta-analyses, we combined HRs only within each subgroup and did not pool a total HR across all included studies. For all other endpoints, we combined HRs both within the subgroups and across all included studies. Sensitivity analyses that removed included studies were used to evaluate whether study quality and clinical characteristics influenced the results. We had planned to use funnel plots and meta-regression to detect publication bias.
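To make the pooling and model-selection rules above concrete, the following is a minimal sketch in plain Python of generic inverse-variance pooling of log HRs, Cochran's Q, I², the DerSimonian-Laird estimate of τ² and the stated switch between fixed and random effects. It assumes each study contributes a log HR and its standard error; `pool`, `chi2_sf` and `Pooled` are hypothetical names, and the code is not RevMan's implementation, so results may differ slightly from RevMan output. The chi-square tail probability uses a closed-form recurrence for integer degrees of freedom to avoid a SciPy dependency.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

Z_975 = 1.959963984540054  # two-sided 95% standard normal quantile


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df."""
    y = x / 2.0
    # Recurrence Q(a+1, y) = Q(a, y) + y**a * e**-y / Gamma(a+1) on the regularized
    # upper incomplete gamma, started at Q(1, y) (even df) or Q(1/2, y) (odd df).
    if df % 2 == 0:
        a, sf = 1.0, math.exp(-y)
    else:
        a, sf = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        sf += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return sf


@dataclass
class Pooled:
    log_estimate: float  # pooled log HR
    se: float            # standard error of the pooled log HR
    q: float             # Cochran's Q
    q_p: float           # P value of the Q test (df = k - 1)
    i2: float            # I^2 in percent
    tau2: float          # DerSimonian-Laird between-study variance
    model: str           # "fixed" or "random"

    def ci95(self) -> tuple[float, float, float]:
        """Pooled HR with its 95% CI on the ratio scale."""
        lo = self.log_estimate - Z_975 * self.se
        hi = self.log_estimate + Z_975 * self.se
        return math.exp(self.log_estimate), math.exp(lo), math.exp(hi)


def pool(log_est: list[float], se: list[float]) -> Pooled:
    """Inverse-variance pooling with the fixed/random rule stated in the Methods."""
    w = [1.0 / s ** 2 for s in se]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_est)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_est))
    df = len(log_est) - 1
    q_p = chi2_sf(q, df) if df > 0 else 1.0
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    # DerSimonian-Laird method-of-moments estimate of tau^2.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Fixed effect only when P > 0.10 AND I^2 < 50%; otherwise random effects.
    if q_p > 0.10 and i2 < 50.0:
        model, weights = "fixed", w
    else:
        model, weights = "random", [1.0 / (s ** 2 + tau2) for s in se]
    swt = sum(weights)
    est = sum(wi * yi for wi, yi in zip(weights, log_est)) / swt
    return Pooled(est, math.sqrt(1.0 / swt), q, q_p, i2, tau2, model)
```

Keeping τ² in the result makes it easy to see why the random effects model widens the CI: each study's weight shrinks from 1/SE² to 1/(SE² + τ²).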
The quality of evidence for the main endpoints was evaluated with GRADE evidence profile tables. 25

Results
Our search retrieved 7770 references. We first screened the titles and abstracts and excluded 7751 records that were not relevant to RIC for AML in CR and MDS or were not RCTs. After examining the full texts of the remaining 19 records, we excluded 10 references that were not RCTs, were not relevant to RIC, did not compare RIC with MAC regimens, or were duplicate reports. In the end, we included 6 RCTs reported in 9 references in the meta-analyses. All authors agreed to include the six studies (Bornhäuser et al., 9 Kröger et al., 10 Ringdén et al., 11 Scott et al., 12 Beelen et al. 8 and MC-FludT.14/L Trial I; 7 for the flow diagram see Fig. 1). Bornhäuser et al., 9 Kröger et al., 10 Ringdén et al. 11 and Scott et al. 12 reported long-term follow-up data. 11,26−28 The six included studies, with 1413 participants (711 in the RIC group and 702 in the MAC group), all focused on the efficacy and safety of RIC (Table 1). All included studies had a low risk of bias. Details of the quality assessment of the included studies are shown in Table 2 and Supplement 3. All studies used the intention-to-treat principle to analyze OS, CIR and LFS. There was no selective reporting in any of the included studies. Because funnel plots and meta-regression should only be used with more than 10 studies, we did not use them to detect publication bias in our analysis. 29 OS did not differ significantly between RIC and MAC (HR = 0·95, 95% CI 0·64-1·4, P = 0·80), with significant heterogeneity (P = 0·003, I² = 72%) (Fig. 2A). The result was similar in the RIC vs. TBI/Bu based MAC subgroup analysis (HR = 0·84, 95% CI 0·5-1·4, P = 0·50), again with significant heterogeneity (P = 0·04, I² = 65%), but in the RIC vs. treosulfan 30 g/m² based MAC subgroup analysis, RIC was significantly inferior to the treosulfan-based MAC regimen (HR = 1·63, 95% CI 1·17-2·28, P = 0·004).
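The reported P values can be roughly checked from the reported HRs and 95% CIs, assuming, as in the approach of Parmar et al., that the log HR is approximately normal, so that its standard error is the width of the log-scale CI divided by 2 × 1·96. The helper names below are hypothetical; this is a sketch of a consistency check, not part of the authors' analysis.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


def log_hr_and_se(hr: float, lower: float, upper: float) -> tuple:
    """Recover log HR and its SE from a reported HR and 95% CI (symmetric on the log scale)."""
    return math.log(hr), (math.log(upper) - math.log(lower)) / (2 * Z_975)


def two_sided_p(log_hr: float, se: float) -> float:
    """Two-sided Wald P value for H0: log HR = 0."""
    z = log_hr / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


# Overall OS and the treosulfan 30 g/m^2 subgroup, using the figures reported above.
p_os = two_sided_p(*log_hr_and_se(0.95, 0.64, 1.4))
p_treo = two_sided_p(*log_hr_and_se(1.63, 1.17, 2.28))
print(f"OS: P = {p_os:.3f}; treosulfan 30 g/m^2 subgroup: P = {p_treo:.4f}")
```

This gives P ≈ 0·797 for overall OS and P ≈ 0·0041 for the treosulfan 30 g/m² subgroup, in line with the reported 0·80 and 0·004; small discrepancies are expected because the published CIs are rounded.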
The combined long-term follow-up data also showed no difference between RIC and MAC (HR = 0·86, 95% CI 0·53-1·41, P = 0·56), with significant heterogeneity (P = 0·01, I² = 73%) (Fig. 4).
RIC showed a trend toward increased GF (OR 2·19, 95% CI 0·96-5·03, P = 0·06) without heterogeneity (P = 0·34, I² = 12%). We repeated the meta-analyses of OS, CIR and long-term OS with the fixed-effect model because of their significant heterogeneity, and the results did not change the overall conclusions for these endpoints (Supplement 5).
In the sensitivity analysis, we removed one study at a time and repeated the meta-analysis.
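The leave-one-out procedure can be sketched as follows. This is a simplified, hypothetical illustration: it re-pools the remaining studies with a fixed-effect inverse-variance model only and reports the pooled HR, its 95% CI and I², whereas the actual analysis would re-apply the fixed/random-effects rule described in the Methods. The study labels and inputs are placeholders, not data from the included trials.

```python
import math


def iv_summary(log_est, se):
    """Fixed-effect inverse-variance pooled log HR, its SE, and I^2 (%)."""
    w = [1.0 / s ** 2 for s in se]
    pooled = sum(wi * yi for wi, yi in zip(w, log_est)) / sum(w)
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, log_est))
    df = len(log_est) - 1
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, math.sqrt(1.0 / sum(w)), i2


def leave_one_out(labels, log_est, se):
    """Re-pool after omitting each study in turn; keyed by the omitted study's label."""
    out = {}
    for i, label in enumerate(labels):
        keep = [j for j in range(len(labels)) if j != i]
        pooled, pooled_se, i2 = iv_summary([log_est[j] for j in keep], [se[j] for j in keep])
        out[label] = {
            "hr": math.exp(pooled),
            "ci": (math.exp(pooled - 1.96 * pooled_se), math.exp(pooled + 1.96 * pooled_se)),
            "i2": i2,
        }
    return out


# Hypothetical toy inputs: study "C" is an outlier on the log HR scale.
results = leave_one_out(["A", "B", "C"], [0.0, 0.0, 1.0], [0.1, 0.1, 0.1])
for omitted, r in results.items():
    print(omitted, round(r["hr"], 3), round(r["i2"], 1))
```

With these toy inputs, omitting the outlying study drives I² to 0, the same pattern the Discussion describes when a single study was the source of heterogeneity in CIR.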

Discussion
Retrospective studies and their meta-analyses cannot balance the baseline characteristics of patients across treatment arms. Most patients in the RIC arms of these studies were older or had a higher comorbidity burden, which might lead to an underestimation of the efficacy and safety of RIC.
Half of the finished RCTs (Bornhäuser et al., 9 Scott et al. 12 and Kröger et al. 10 ) did not enroll as many participants as planned, which limited their power to demonstrate a difference between RIC and MAC. None of the finished studies alone can provide reliable evidence to evaluate RIC for AML in CR and MDS, so a higher level of evidence on this question is needed. Our meta-analysis included six high-quality RCTs with 1413 participants; furthermore, we included both published and unpublished data, which limits the risk of publication bias. Therefore, it was more powerful and covered more patients than previous studies. As far as we know, our study is the first comprehensive meta-analysis of RCTs that combined HR values to clarify the efficacy and safety of RIC vs. MAC, and it provides the highest current level of evidence on this question.
The concern that RIC may increase CIR is the main reason physicians hesitate to prescribe these conditioning regimens. The Scott et al. 12 study demonstrated that RIC significantly increased relapse and suggested that physicians should choose MAC first for fit patients. However, when we combined data from all available RCTs, we found no difference in CIR between RIC and MAC. The heterogeneity was caused by the Scott et al. 12 study: after we removed it in the sensitivity analysis, there was no heterogeneity among the remaining five studies and the results did not change (Appendix F). The relapse rate is affected by many factors, for example the cytogenetic and molecular biological characteristics of the disease, minimal residual disease (MRD) before HSCT and the immunosuppressant adjustment protocol. [30][31][32][33] It was impossible for all pre-transplantation factors to be similar in every study, so CIR was expected to be heterogeneous between studies. In a large observational EBMT analysis that included 2974 middle-aged AML patients, relapse incidence in the RIC group was higher in intermediate- or high-risk patients but not in low-risk patients. 32,33 Most of our included studies did not assess MRD before HSCT to stratify participants, which might have substantially influenced the results, because MRD-positive patients would have a higher CIR after RIC than after MAC. 34,35 In the Scott et al. study, next-generation sequencing detected commonly mutated AML genes in nearly two-thirds of the AML participants; in these patients, RIC significantly increased CIR compared with MAC, whereas in the remaining third, in whom these genes were not detected, RIC had the same CIR as MAC. 36 In addition, all six included studies used the same GVHD prophylaxis in the RIC and MAC arms, but an immunosuppressant dose-adjustment protocol appropriate for MAC might have increased CIR after RIC.
Therefore, heterogeneity between the included studies was to be expected. Three RCTs demonstrated that RIC still did not increase CIR in long-term follow-up. 11,26,28 Because too few of the included studies reported long-term data, we could not combine long-term CIR. However, as most relapses after HSCT occur within two years, 35 we conclude that RIC regimens do not increase CIR compared with MAC for AML in CR and MDS.
A more intensive conditioning regimen causes more serious tissue damage, which may result in more severe aGVHD. 36 Therefore, RIC would be expected not only to decrease organ toxicity and tissue damage but also to cause less aGVHD and NRM than TBI/Bu based MAC. Our meta-analysis showed a trend for RIC to decrease aGVHD and grade III-IV aGVHD compared with TBI/Bu based MAC, but it was not statistically significant.
More high-quality studies are still needed to confirm whether RIC and MAC differ in the incidence of aGVHD and grade III-IV aGVHD. Our results indicated no difference in cGVHD between RIC and MAC and confirmed that the incidence of cGVHD was not related to conditioning intensity. 37 In retrospective studies, RIC reduced NRM, 4-6 but RCTs failed to demonstrate this reduction. Our meta-analysis confirmed that RIC significantly reduced NRM compared with TBI/Bu based MAC. There was no heterogeneity, and the quality of evidence was high (Appendix I). The individual RCTs had relatively small sample sizes, and some did not enroll as many participants as planned, so they might not have been powerful enough to demonstrate the difference. By including all the RCTs, we expanded the sample size and provided more powerful evidence to clarify the difference. Additionally, the four included studies in the RIC vs. TBI/Bu based MAC subgroup analysis involved relatively young and fit patients rather than old patients, and in this subgroup analysis RIC also caused less NRM. Consequently, RIC significantly reduces NRM compared with TBI/Bu based MAC for both young and old patients.
Moreover, our results showed that RIC significantly reduced some organ toxicities and infections compared with MAC, which indicates that RIC is more tolerable than MAC. On the other hand, contrary to general expectation, our results showed no difference in mucositis between RIC and MAC.
The heterogeneity of this meta-analysis was significant, so more studies are needed to clarify the issue. RIC showed a trend toward increased GF compared with MAC, but it was not significant. Only 18 GFs among 701 patients and 8 GFs among 690 patients were reported in the RIC and MAC groups, respectively, so GF was rare in both groups. According to the available evidence, we conclude that RIC also causes little GF.
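The Mantel-Haenszel odds ratio across studies, with the Robins-Breslow-Greenland variance of its logarithm, can be sketched as below. Each stratum is a hypothetical (a, b, c, d) tuple of events and non-events in the RIC and MAC arms of one study. Per-study GF counts are not given in this section, so the usage line treats the reported totals as a single stratum; that yields a crude OR of about 2·25, not the study-stratified estimate of 2·19 reported in the Results.

```python
import math


def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled OR and the SE of its log (Robins-Breslow-Greenland).

    strata: iterable of (a, b, c, d) = (events_RIC, non_events_RIC,
            events_MAC, non_events_MAC), one tuple per study (hypothetical layout).
    """
    r_sum = s_sum = 0.0       # sums of R_i = a*d/n and S_i = b*c/n
    pr = ps_qr = qs = 0.0     # variance components
    for a, b, c, d in strata:
        n = a + b + c + d
        r_i, s_i = a * d / n, b * c / n
        p_i, q_i = (a + d) / n, (b + c) / n
        r_sum += r_i
        s_sum += s_i
        pr += p_i * r_i
        ps_qr += p_i * s_i + q_i * r_i
        qs += q_i * s_i
    odds_ratio = r_sum / s_sum
    var_log = pr / (2 * r_sum ** 2) + ps_qr / (2 * r_sum * s_sum) + qs / (2 * s_sum ** 2)
    return odds_ratio, math.sqrt(var_log)


# Crude single-stratum illustration with the GF totals: 18/701 (RIC) vs 8/690 (MAC).
crude_or, se_log = mantel_haenszel_or([(18, 701 - 18, 8, 690 - 8)])
print(f"crude OR = {crude_or:.2f} (SE of log OR = {se_log:.3f})")
```

With a single stratum, the variance formula reduces to the familiar 1/a + 1/b + 1/c + 1/d, which is a convenient sanity check on the implementation.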
According to our results, RIC had the same OS as MAC, but heterogeneity was significant. In the HSCT procedure, the individualized prescriptions of different physicians inevitably interfere with the results; therefore, heterogeneity is common in clinical studies on HSCT. Both of the two studies also showed that RIC did not increase relapse. Our meta-analysis could not divide participants by age, but our results showed that RIC at least did not decrease OS compared with MAC. The RIC vs. TBI/Bu based MAC subgroup analysis included more young patients, and RIC also showed no difference from MAC in OS. Additionally, our meta-analysis of long-term follow-up OS data showed that RIC did not decrease long-term OS compared with TBI/Bu based MAC. Consequently, we concluded that RIC did not increase the cumulative incidence of relapse but decreased NRM compared with traditional MAC regimens; furthermore, it at least did not increase aGVHD and had the same cGVHD as MAC. As a result, RIC did not decrease OS. Therefore, we confirmed that there was no difference in OS between RIC and MAC for AML in CR and MDS.
In the RIC vs. treosulfan 30 g/m² based MAC subgroup analysis, treosulfan caused less NRM than RIC and also increased OS. 8 Treosulfan is a novel myeloablative agent with less toxicity than Bu, 24 and treosulfan-based MAC has been termed a reduced-toxicity conditioning regimen. 24 The subgroup analysis confirmed that treosulfan was less toxic than Bu and suggested that treosulfan 30 g/m² based MAC was better than Bu- or TBI-based RIC. This is a promising result that points to a new myeloablative agent superior to traditional Bu or TBI.
However, until recently only one RCT had been completed, and the RIC vs. treosulfan 42 g/m² based MAC subgroup analysis did not show that treosulfan caused less NRM or better OS than RIC; 7 hence, more high-quality studies are needed to confirm the result.
Our meta-analysis has some limitations. First, a relatively small number of clinical trials were included. Second, there was significant heterogeneity between the included studies in the OS, CIR and LFS meta-analyses. We suggest that this heterogeneity arose from differences in treatment details between transplantation centers and from the inevitable patient heterogeneity between the included studies. Third, not all the included studies blinded personnel and patients. Allo-HSCT is a treatment with high NRM 40
Consent for publication: Written informed consent for publication was obtained from all participants.
Availability of data and materials: No additional unpublished data are available.
Competing interests: The authors have no conflict of interest to declare.

Results of meta-analyses of NRM, aGVHD and cGVHD endpoints. The forest plots showed that RIC significantly decreased NRM compared with TBI/Bu based MAC (A). RIC showed a trend toward decreased aGVHD, but it was not statistically significant (B). RIC had the same cGVHD as MAC (C). Abbreviations: RIC, reduced intensity conditioning; MAC, myeloablative conditioning; TBI, total body irradiation; Bu, busulfan.

Figure 4
Result of the meta-analysis of long-term OS data. The forest plot showed that RIC had the same long-term OS as TBI/Bu based MAC. Abbreviations: OS, overall survival; RIC, reduced intensity conditioning; MAC, myeloablative conditioning; TBI, total body irradiation; Bu, busulfan.