Leukemia relapse via genetic immune escape after allogeneic hematopoietic cell transplantation

Graft-versus-leukemia (GvL) reactions are responsible for the effectiveness of allogeneic hematopoietic cell transplantation as a treatment modality for myeloid neoplasia, whereby donor T- effector cells recognize leukemia neoantigens. However, a substantial fraction of patients experience relapses because of the failure of the immunological responses to control leukemic outgrowth. Here, through a broad immunogenetic study, we demonstrate that germline and somatic reduction of human leucocyte antigen (HLA) heterogeneity enhances the risk of leukemic recurrence. We show that preexistent germline-encoded low evolutionary divergence of class II HLA genotypes constitutes an independent factor associated with disease relapse and that acquisition of clonal somatic defects in HLA alleles may lead to escape from GvL control. Both class I and II HLA genes are targeted by somatic mutations as clonal selection factors potentially impairing cellular immune reactions and response to immunomodulatory strategies. These findings define key molecular modes of post-transplant leukemia escape contributing to relapse.


Introduction
Hematopoietic cell transplantation (allo-HCT) holds a pivotal place in the therapeutic algorithm of myeloid neoplasms (MN). 1,2 Either as an upfront treatment in diseases with low blast burden, after disease control in myelodysplastic syndromes (MDS) or following consolidation therapy in intermediate and high-risk acute myeloid leukemia (AML), its therapeutic index is principally related to the immunogenic graft-versus-leukemia (GvL) effect, exerted by donor-derived immune effectors on leukemic cells. 3 Relapse following allo-HCT may be mechanistically multifactorial and includes re-expansion of the residual leukemia with or without acquisition of novel driver lesions along with the development of various immune escape mechanisms contributing to the evasion from GvL effect. 4,5 Post-transplant alloreactivity ideally acts via class I and II human leukocyte antigen (HLA)-restricted mechanisms. Therefore, one of the biological modes of relapse after allo-HCT is the loss of expression of disparate immunodominant HLA alleles via loss of heterozygosity (LOH) or epigenetic down-regulation, which have been demonstrated in haploidentical, mismatched unrelated and matched related settings. 6,7,8 While large genomic aberrations including the loss or the duplication of an entire haplotype or of a single allele have been noted across various studies, 6,9,10,11 more granular lesions of the HLA region including microdeletions and point mutations, remain underestimated in allo-HCT. Such lesions, ultimately resulting in phenotypes of immune resistance, have been reported in the context of clonal adaptive recovery in immune-mediated aplastic anemia (AA) or as immune escape drivers in solid tumors, together with up-modulation of negative immune checkpoint regulators. 12,13,14 HLA heterogeneity and the type of HLA polymorphisms in certain loci have also been found as factors in uencing the direction of alloreactive responses. 15,16,17,18,19 HLA evolutionary divergence (HED), a metric re ecting the immunopeptidome diversity, has been recently linked to post-immunotherapy outcomes in solid tumors. 20 Furthermore, we and others demonstrated that recipient HED correlates with the e ciency of immune reconstitution after allo-HCT. 21,22 Since this score mirrors the antigenic spectrum capacity, it may be considered a surrogate of the individuals' ability to present leukemia associated antigens (LAA) in transplant recipients. 20,23,24 In analogy with solid cancer immunotherapy, we hypothesized that, also in the context of allo-HCT, a reduced variability of HLA genotypes as well as genomic mechanisms of escape can converge in overlapping immune-escape phenotypes.
Here we investigate relapsed leukemia in allo-HCT recipients for the presence of non-driving genomic lesions in immune elements as part of the mutational landscape of recurrent leukemia. To unravel the genomic dysfunctions leading to immune-escape, we study HED and both genetic and transcriptomic inactivation of HLA via somatic mutations or down-modulation as determinants of leukemic relapse in the setting of allo-HCT.
We demonstrate how both germline and somatic dysfunction of HLA heterogeneity provides an immunogenetic framework prone to post-transplant leukemia immune escape and relapse.

Results
Dysfunction of germline HLA heterogeneity as immunogenetic driver of post allo-HCT relapse.
We started from the concept that a more structurally diverse HLA allele combination could widen the spectrum of antigen presentation, thereby enhancing the GvL effect. We thus analyzed a cohort of 494 patients (median follow up 27.5  months) who received allo-HCT for MN and had available highresolution HLA genotype (Table 1, Figure 1, Data Source 1). For each patient we calculated per-locus, perclass and global HED scores. This parameter quantitates the physiochemical divergence existing between two amino acids within the peptide binding cleft of two homologous HLA alleles. To investigate the impact of HED distribution on clinical outcomes, we categorized each value using a cutoff that could be standardized and clinically meaningful. Since speci c polymorphisms in different HLA loci have been previously associated with AML susceptibility or relapse 25,26,27 thereby potentially biasing global HED scores, we did not base our study on cutoff values computed on this patient population. Instead, for the assessment of this metric we used cutoffs de ned in healthy controls (HC) (Supplementary Figure 1, Supplementary Data 1) and considered as a benchmark the 50 th percentile of these HC HED scores (locus-speci c and global), as we previously described. 28 In the matched allo-HCT context (N=393), we found that patients with high class II HED had a lower relapse rate as compared to those with low class II HED values (hazard ratio [HR]: 0.65, 95%CI: 0.45 0.958, p= 0.021). A breakdown of the speci c contribution of the class II loci, showed that HLA-DQB1 and HLA-DPB1 impacted the most on this result (p=0.04, Figure 2 A,B). Among class I loci, the only impact was seen for HLA-C, for which high HED values were associated with lower cumulative incidence (CI) of relapse (p=0.041). Class II, but not class I HED, affected the probability of relapse also when considered in cause-speci c multivariable regression models (HR: 0.54, 95%CI: 0.36-0.81, p=0.003), adjusted for age, type of donor, conditioning regimen, disease type, stem cell source, HCT-CI score, disease risk and year of transplant ( To link immune control dysfunction with abnormal HLA heterogeneity, we performed deep targeted NGS for TCR Vb CDR3 after 3-6 months following allo-HCT (N=24) and characterized the speci city of TCR repertoire using previously described bioinformatic frameworks. 22 Overall, patients with higher class I and II HED were characterized by a less expanded repertoire together with increased cancer-related clonotypes and less clonally-polarized features compared to those with lower divergence. This phenomenon was also recorded for pathogen-recognizing clonotypes, underlying a general imprinting for a more diverse and variegated antigenic response (Supplementary Figure 4). We asked whether the differences in repertoire expansion could be related to a higher frequency of CMV reactivation in relation to HED values; but no difference in terms of CMV infection was observed between patients with low vs high class I and II HED (p=0.07 for class I and p=0.691 for class II, Supplementary Data 2). Longitudinal clonotype tracking analysis showed that some of these clones were present in pre-transplant or donor samples predating post-transplant hyper-expansion (Supplementary Figure 5).
HLA and myeloid landscape.
We then assessed the distribution and the frequency of HLA mutations in a subset of longitudinally genotyped patients relapsing after allo-HCT (N=48). Specimens at diagnosis (N=40), at postchemotherapy (N=9) and post-allo-HCT (N=44) relapse were sequenced for both HLA mutations and myeloid genes (Data Source 1, Table 2). We used a newly implemented pipeline designed to recognize somatic mutations and allelic losses within HLA loci. 29 Whereas somatic HLA hits were present in 22% (N=9/40) of patients at diagnosis and in 38% (N=17/49) of patients at post allo-HCT relapse, in cases with relapse following chemotherapy no HLA hits were found ( Figure 3A, Source Data 2). In the 2 cases harboring HLA aberrations at diagnosis, no hits were found at relapse after chemotherapy (N=2; Figure  3A).
HLA lesions were classi ed as missense, non-sense, frameshift, splicing or losses and were distributed across class I and II alleles (Source Data 2). Most affected loci were A, C, and DQB1 with allelic losses occurring predominantly in DRB1 and DQB1 ( Figure 3B). Mutations and losses were seen irrespectively of donor type and transplant setting (N=10/20, 50% in matched related; N=2/8, 25% in haploidentical; N=5/17, 29% in unrelated transplants, Figure 3C). Of note and in line with previous results, HLA hits occurred in the mismatched haplotype in the haploidentical setting. With a median time to posttransplant disease recurrence of 5.8 months (1.67-56.3), HLA aberrations were more frequent in patients with late (>6 months) relapses (42% vs 29% in early relapses; Figure 3D and E).
When we analyzed the frequency and distribution of most common somatic myeloid driver hits (for panel description see Supplementary Data 3) in relationship to HLA lesions, no difference in terms of myeloid landscape in samples prior to transplant vs. post-transplant relapses were noticed. Therefore, HLA aberrations accounted for the predominant genomic lesions in post-transplant relapses. However, reverse analysis indicated that relapses with newly acquired HLA lesions were enriched in RUNX1, DNMT3A, EZH2, EP300 mutations ( Figure 3G) as compared to those with wild type HLA. Of note is that the acquisition of HLA mutations was not related to HED. In fact, no differences were found with regard to the global, class I, II and locus speci c divergence between HLA wild type and mutant sub-cohorts, suggesting that HED was not indicative of the immune pressure or propensity for immune escape in this setting. Similarly, mutated HLA loci did not show higher HED as compared to respective loci in healthy controls (Supplementary Figure 6).
Notably, all patients sequenced at relapse showed a prevalent recipient chimerism, along with the fact that speci c HLA events were not found in recipient prior to transplant. These observations indicated that HLA mutations were somatic and occurred at relapse".
Donor lymphocyte infusion (DLI) for the treatment of relapse, was performed in half of the patients of this subgroup (18 AML and 7 MDS) at a median time of 7.8 (range 2.9-61.08) months after transplant. DLI were administered mostly in matched related context (14, 56%). A complete response was seen in 5 patients (20%). Nine patients of this subcohort harbored HLA aberrations both in class I and II alleles spanning across exonic, intronic or UTR regions. Of them 6 with exonic mutations or allelic losses did not achieve response to DLI, while 3 patients harboring UTR mutations in HLA-DPA1, -DPB1 and -A were still alive at >1 year post-relapse with controlled disease after DLI and other immunochemotherapy-based strategies (including 5-azacytidine and sorafenib).
HLA and non-HLA immune transcriptional dysregulation in relapses following allo-HCT.
In addition to somatic HLA hits, we also investigated the transcriptional changes of HLA and immune genes occurring from diagnosis to post allo-HCT relapses in a subgroup of 13 relapsed patients using deep RNA sequencing of longitudinal (pre and post-transplant) samples (Supplementary Figure 7, Supplementary Data 4). In post-transplant specimens, HLA genes were often down-regulated, speci cally class II loci HLA-DRB1, -DQB1 and -DPB1 ( Figure 4A, Source Data 3). In those with low HLA expression, somatic HLA aberrations were virtually absent, except for 1 case with a mutation in HLA-DRB1 ( Figure  4B). Overall, expression of classical immune checkpoint genes was not signi cantly altered at relapse, whereas mRNA levels of interferon-gamma response pathway and antigen presentation machinery genes were predominantly decreased ( Figure 4C). In addition, we also observed a down-modulation of Th1 and As an illustrative example, we performed single-cell RNA sequencing on pre-sorted CD34+ cells obtained at diagnosis and at post-transplant relapse of an AML patient with IDH2, DNMT3A and RUNX1 mutations (Supplementary Figure 9). CD34 enrichment was performed to mitigate donor cell contamination in this specimen, collected when donor chimerism was <30%. Although no changes in HLA expression and immune checkpoint regulators were found, the top represented down-regulated signatures included an enrichment of IL-1, IL-2, IL-5, TNF and TGF response pathways, indicating an impaired adaptive immune response.
Low contribution of non-HLA immune genomic aberrations in post-transplant leukemia relapses.
We then asked whether, as in other malignancies, AML/MDS relapses could be constellated by immunogenomic alterations in genes outside HLA loci. To answer this question, we performed whole exome sequencing (WES) on a subset of patients at diagnosis (N=9), post-chemotherapy relapse (N=2) and post-HCT relapse (N=8). These samples were investigated for the presence of mutations affecting antigen presentation and processing machinery and immune checkpoint regulators on target cells (Supplementary Data 5). Applying stringent criteria for variant selection (see methods) and excluding all possibly germline variants, as well as variants with unknown signi cance (reported in Supplementary Data 6), we did not nd a speci c enrichment in pathogenic or likely pathogenic somatic aberrations in post-transplant specimens, supporting the concept that direct immune evasion from HLA-related mechanisms represents the major contributor of GvL escape. Of note is that two of these patients (one at diagnosis and the other at post-transplant relapse) were carriers of HLA aberrations, con rmed in the respective WES specimens.

Evaluation of contribution of HED and KIR ligands on relapse
HLA-C alleles serve as KIR-ligand and can be classi ed in C1 (Asp80) and C2 groups (Lys80) according to their KIR speci city. 30,31,32 HLA-C2 homozygosity with or without donor activating KIR2DS2 genotype (or other KIR genotypes) has been shown to increase the risk of relapse in HLA-matched allo-HCT. 30 Thus, the effect of HED in C locus on propensity for relapse could depend on the presence of homozygous C1/C1 or C2/C2 vs heterozygous C1/C2 genotypes. When we classi ed all the recipient/donor HLA-C alleles of the matched cohort according to the genotypic groups and investigated the clinical outcomes, we did not observe any impact on OS, acute and chronic GvHD and relapse (Supplementary Figure 10A-D) in univariable models. Similarly, when C haplotype was tested in a multivariate setting (including also class I or C HED, age, donor and conditioning type), we did not observe any contributing effect on relapse, while HED-C continued to affect the risk of recurrence (Supplementary Figure 10E-F). To complete these observation KIR genotypes were characterized for 13 MRD donor/recipient pairs and described in Supplementary Data 7.

Discussion
The structural variability of HLA loci along with the presentation of a diversi ed spectrum of antigens cooperate with a highly heterogeneous TCR repertoire to maintain immune competence. 33 During tumor immunotherapy with immune checkpoint inhibitors, various modes of immune escape have been described including adaptive and genetically-encoded clonal escape. 20,34,35,36 Similarly, in immunemediated AA, somatic LOH, deletions and mutations of HLA loci have been described. 12,13,14 Such pathophysiologic mechanisms may also be operative as acquired resistance factors in allogeneic antileukemia surveillance following allo-HCT, wherein either down-modulation of HLA expression or clonal deletion of disparate alleles have been described at relapse. 6,7,8 Here, through a combination of NGSbased HLA and myeloid genotyping and immune-transcriptomic analyses, we de ned more granular molecular mechanisms culminating in immune-resistance phenotypes.
With a previously validated bioanalytic framework, 29,37 we were able to identify somatic single nucleotide mutations and small indels in HLA genes along with speci c allelic losses. We show that somatic HLA mutations can constitute a potential escape mechanism in patients with MN relapsing after allo-HSC in both matched related and unrelated settings. Indeed, both mutations and deletions of one of the HLA alleles could be detected in fully matched allo-HCT, whereas it is remarkable that in the mismatched setting HLA mutations affected both matched and mismatched alleles. We therefore conclude that the immune selection process leading to clonal escape targets the allele presenting the most immunodominant antigenic peptide, likely not limited to the mismatched ones. 6 Indeed, HLA LOH events often involve large genomic portions of the short arm of chromosome 6, thereby affecting a whole haplotype and not only the mismatched allele and resulting in a global impairment of antigen presentation. 4,38,39,40 The nding of HLA mutations in relapsed allo-HCT is further supported by the low response rate to DLI. The analysis of the available cohort showed that patients with HLA mutations (including non-UTR alterations) had decreased likelihood of responsiveness to this salvage strategy.
In contrast to tumor control post-immunotherapy, class II HLA genes more than class I were the major drivers of immunologic escape. This result underscores the role of tumor-surveillance of CD4 T-cell speci cities 41,42,43 and antigen-presenting cells, including leukemia-derived dendritic cells 44,45,46,47 in GvL effect. As to the function of HLA-C alleles, whose HED scores also in uenced the risk of relapse, one can speculate that their allelic heterogeneity determines the degree of natural killer (NK) alloreactivity in the genetic context of inhibitory and stimulatory KIR genes. 48,49,50,51 Indeed, our multivariate analysis showed that locus C HED could in uence, over the KIR ligand strati cation, the probability of disease recurrence.
A reduced immunopeptidome diversity due to low amino acid divergence at the antigen-binding site level between HLA alleles may impair the GvL effect, con guring a pathophysiological condition similar to the loss of heterozygosity. Indeed, the simple divergence of recipient germline HLA polymorphisms impacts on the risk of relapse. However, high divergence did not translate into selection pressure for the emergence of HLA escape mutants, thus generating an immunogenetic environment able to prevent relapse without favoring HLA aberrations. To that end, the impact of high HED is also demonstrated by its relationship to the TCR speci city spectrum. For instance, our study also showed that high global mean HED is indeed associated with a broad spectrum of anti-tumor and pathogen-directed TCR speci cities. Nevertheless, the locus-speci c frequencies of HLA mutations were in line with the ndings of the analysis on divergence. Class II loci constantly appeared more mutated than class I genes, with DQB1 and DRB1 alleles strongly compromised by losses and mutations.
Another important aspect of immune-escape relapse after allo-HCT is highlighted by the results of the transcriptional analysis of paired diagnostic and post-transplant relapse samples. We demonstrate that in some cases, HLA somatic aberrations culminated in lower locus-speci c HLA expression, particularly in DRB1. However, downregulation of HLA genes was in most of cases not associated with genomic alterations, underpinning the existence of other non-genomic mechanisms as suggested by previous observations. 7,52 While it is important to point out that baseline HLA expressions may be determined by the genetic subtype of AML (for instance, NPM1 mutant leukemias were reported by us and others to have relatively low HLA-DR expression), 53,54,41 our analysis illustrates the dynamic changes in HLA expression, drastically reduced in most of post-transplant relapses, independently of the molecular disease subgroup.
In a recent study, epigenetic silencing of class II HLA was shown to be regulated by the PRC2 complex, whose selective inhibition was able to restore HLA class II expression and thus antigen presentation to alloreactive CD4+ T-cells. 55 In addition to HLA down-modulation, our transcriptomic analysis of relapsed leukemia showed an increased spectrum of changes of immunoregulatory and immune response proteins, including those involved in HLA peptide presentation. These changes may be adaptive and potentially reversible, or instead be a result of transcriptional cascades affected by somatic mutations.
In sum, our study unravels the immune escape nature of post-transplant relapse in a signi cant proportion of patients and provides insights concerning the role in such a setting of the germline and somatic dysfunction of HLA heterogeneity. 33 In particular, we demonstrate that germline-determined class II HLA divergence and somatic class II HLA mutations, indels or losses can enable an environment of GvL resistance, immune evasion and unfavorable outcomes. Future studies exploring larger cohorts of patients are warranted to better establish how germline and somatic HLA dysfunction may inform daily clinical practice.  Table   1). We integrated immunogenomic and transcriptomic data from 494 patients ( Table 1, Data Source 1) who received allo-HCT for AML (N=294), MDS (N=125) and MPN (N=75) and followed in all three institutions: CCF (N=344), Nancy Hospital (N=143) and WashU (N=7). Pre-transplant clinical HLA genotyping was available in 487 patients and was used as benchmark for HED computation. An ad-hoc targeted high throughput sequencing for classical HLA loci and myeloid genes was performed on selected samples at disease onset (N=40), post-chemotherapy relapse (N=9) and post-transplant relapse (N=44). In addition, 13 patients presenting with disease recurrence were pro led by means of RNAseq on sequential diagnosis/post-transplant relapse samples (median 179 days post-HCT, IQR 129-297), whereas single cell transcriptomics was performed on paired specimens for a patient with early relapse (day +92).

Sample collection
Blood or bone marrow (BM) specimens at diagnosis, post-chemotherapy or post-HCT relapses were collected in ethylenediaminetetraacetic acid (EDTA) tubes, and cryopreserved until further use after Ficoll-Paque isolation and suspension in Dimethyl sulfoxide (DMSO) and Fetal bovine serum (FBS) containing media.For each sample we prioritized DNA extraction for HLA genotyping, and targeted myeloid high throughput sequencing. T-cell receptor immunosequencing, was performed on a subset of selected samples, taking into account recipients' HED scores. In case of residual cells, RNA extraction and sequencing were performed.BM specimens collected at the moment of diagnosis and post-transplant relapse for a patient experiencing early relapse were pro led for single-cell RNA sequencing, after owbased CD34+ cell separation.
Genotyping studies HLA sequencing and myeloid targeted sequencing were performed on patient samples included in the biorepository of the Translational Hematology and Oncology Research Department at CCF. Genomic DNA was isolated directly from cryopreserved unfractionated peripheral or BM blood mononuclear cells with the Nuclei Lysis Solution (Promega) according to manufacturer's instructions.
HLA targeted sequencing was performed with TruSight HLA v2 (Illumina, San Diego California) as indicated previously. 56 In brief, 11 HLA loci (Class I HLA-A, B, and C; Class II HLA-DRB1/3/4/5, HLA-DQA1, HLA-DQB1, HLA-DPA1, and HLA-DPB1) were ampli ed with a long-range polymerase chain reaction (PCR). After ampli cation, a transposon-based DNA tagmentation was applied to generate DNA amplicons, via DNA fragmentation and addition of adapter sequences. Additional PCR steps provided sequence adapters and indexing primers to generate sequencing-ready DNA libraries. Prepared libraries were then loaded directly onto a MiSeq System for sequencing. SSO-PCR methods were used to generate HLA genotypes for patients transplanted before 2013. HLA genotyping data used for HED computation were issued from the histocompatibility laboratories of the institutions involved and were coded based on a 2 or 4-eld (4-6 digits) nomenclature.
Myeloid targeted sequencing was performed on 94 specimens paralleled sequenced for HLA, using a custom panel for detection of myeloid somatic variants from three sequencing platforms: TruSight, TruSeq, and Nextera (Illumina, San Diego, CA, USA). Forty-one most frequent leukemia associated genes common to the three panels were considered for this study (Supplementary Data 3). Sequencing libraries were generated according to an Illumina paired-end library protocol. The enriched targets were sequenced using a HiSeq 2000 or MiSeq (Illumina, San Diego, CA, USA). Paired-end sequenced reads were aligned with the Burrows-Wheeler Aligner (BWA) 57 to GRCh37 reference and post-alignment processing included sorting, marking of duplicates, indexing, base recalibration, according to Genome Analysis Tool Kit v.4 best practices. 58 Variant calling was performed with HaplotypeCaller. 58 Variants were annotated using Annovar. Variants with minimum coverage less than 20 or number of high-quality reads less than 5 were ltered out. An in-house developed bio-analytic pipeline identi ed somatic/germline mutations using sequences derived from controls and mutational databases such as dbSNP138, 1000 Genomes or ESP 6500 database, and Exome Aggregation Consortium (ExAC) as previously published. 59 Whole exome sequencing was applied to a subset of samples studied for the presence of molecular events in other immune non-HLA genes, related to antigen presentation machinery and T cell activation and potentially involved in immune escape (supplementary appendix).

HLA mutational analysis
The bioinformatic approach to investigate somatic HLA mutational status is reported in the supplementary appendix and has been previously described. 56 KIR genotyping and HLA-C status assignment KIR genotyping was performed as previously described, using a sequence-speci c oligonucleotidic probe platform (One Lambda, West Hills, CA) and sequence-speci c primers (Thermo Fisher Scienti c, Waltham, MA). 67,68 Recipient HLA-C status was assigned according to the allelic C group con gurations: I) C1 homozygous: if patients carried only alleles belonging to the supertypes: C*01, C*03, C*07, C*08, C*09, C*10, C*12, C*14, C*16, C*17; C2 homozygous, in case of presence of supertypes C*02, C*04, C*05, C*06, C*15 and C1/C2 heterozygous in presence of a alleles of both groups. 30,31 TCRβ chain sequencing and analysis Immunosequencing of the CDR3 regions of human TCRβ chains was performed using the ImmunoSEQ Assay (Adaptive Biotechnologies, Seattle, WA), as previously described. 69,70,71 In brief, extracted genomic DNA was ampli ed in a bias-controlled multiplex PCR, with i) a rst PCR step consisting in forward and reverse ampli cation primers speci c for every V and J gene segment, to allow the ampli cation of the hypervariable CDR3 region, and ii) a second PCR adding a proprietary barcode sequence and Illumina adapter sequences. CDR3 libraries were sequenced on an Illumina MiSeq system according to the manufacturer's instructions. ImmunoSeq Analizer 3.0 suite was used for sample export and preliminary statistics and quality control steps while R Bioconductor 72 environment and Immunarch R 73 suite were used for all the downstream analyses as previously described (Supplementary Data 8). 21 HED computation High-quality, protein level (2 nd eld) HLA data in patient and control cohorts were used as input for HED computation.HED scores were generated for all the subjects and genotypes in the study using the algorithm published by Pierini and Lenz (https://sourceforge.net/projects/granthamdist/) for the calculation of the amino acid sequence divergence at the antigen binding sites of HLA molecules. Brie y, starting from a dictionary including all the protein sequences of exons 2 and 3 for class I alleles and exon 2 for class II alleles, assembled from the IPD-IMGT/HLA database v.3.41 we calculated HED for 6 class I (A, B, C) and II HLA loci (DRB1, DQB1, DPB1). The scores so computed were used for all the downstream analyses.
Bulk RNA sequencing and analysis Paired diagnosis/post-transplant relapsed samples obtained from 13 patients (7 previously reported 52 ) served for transcriptomic studies. For this last group of samples, total RNA was puri ed with the NucleoSpin RNA kit (Takara Bio USA, Inc.; Mountain View, CA, USA) according to the manufacturer's instruction. RNA quality check was performed with Agilent 2100 Bioanalyzer. mRNA was enriched using oligo(dT) beads and then was fragmented randomly by adding fragmentation buffer. The cDNA was synthesized using mRNA template and random hexamers primer, after which a custom second-strand synthesis buffer (Illumina, dNTPs, RNase H, and DNA polymerase I) was added to initiate the secondstrand synthesis. After a sequential process of terminal repair, a ligation, and sequencing adapter ligation, the double-stranded cDNA library was completed through size selection and PCR enrichment. Illumina technology (NovaSeq 6000) was used for sequencing, after pooling, and > 30 million reads were acquired for each sample. FastQC was used to check sequenced reads quality. After trimming and adapter removal, raw reads were mapped to the human hg19 reference genome using RNA STAR. 74 For the transcriptomic study, cohort assembly was performed starting from read counts for all the samples in study. Batch effect was removed through a two-way ANOVA algorithm 75 via limma package prior to further analyses, taking into account the design matrix used to describe comparisons between the samples (diagnosis vs post-HCT relapse), to mitigate the effects derived from different sequencing batches..
Genes that were lowly expressed across >80% of the samples were ltered out. For each sample a normalization factor was calculated through the trimmed mean of M values (TMM) method and nal logarithmic counts per million were calculated (log2CPM). Differential expression and gene set enrichment analysis were assessed using edgeR 3.32.1 and limma 3.46 with R computational environment (v. 4). 76 Benjamini-Hochberg procedure was used for multiple testing correction.

Single-cell RNA sequencing and analysis
Pre-sorted CD34+ cells BM cells from patient CCF#8 were processed for single-cell library preparation as previously described. 77 Library preparation was performed as per manufacturer instructions with the Illumina Nextera XT DNA sample preparation kit (Illumina, San Diego, California). Brie y a rst tagmentation reaction was carried out on 1 ng of cDNA at 55 C for 10 min. The ampli cation of adapterligated fragments was performed using Nextera PCR master mix, index primers and barcodes were added to identify each cell for 15 cycles. The PCR products from each cell were then pooled together and puri ed using Sera-mag SpeedBeads (0.6:1 ratio). A nal quality check of the cDNA library was performed using an Agilent high-sensitivity DNA chip, obtaining a broad peak between 300-800bp. Single cells were sequenced using 100bp paired end sequencing on the HiSeq2500. Raw reads were assessed using FastQC v0.11.5 and aligned to the hg19 human reference genome using the STAR aligner v2.5.3a, after trimming and adapter removal. Rsubread v1.32.4 was used to assemble read matrices from aligned bam les. Cells with fewer than 150,000 reads, along with < 65% unique mapping and <35% exonic region coverage, were excluded from further analysis. Genes expressed in less than 10% of cells per patient were removed from downstream analysis. Only those cells that passed quality control were included in the downstream analysis. Seurat R package was used on feature-barcoded matrices for clustering and differential analyses and for visualization purposes.

Statistical analyses
Median, interquartile ranges (IQR), mean and 95%CI intervals were used where appropriate. Frequency and distribution of categorical variables were expressed as percentage. For all relevant comparisons, after testing for normal distribution, comparative analyses between two groups were performed, by Wilcoxon matched-pair signed rank test at 95% CI. Fisher's exact test or Chi-square were applied for independent group comparisons.
Each HED variable was categorized in high and low HED based on the 50 th percentile cutoff of the distribution in healthy controls.
Probabilities of survival for overall survival (OS), de ned as the time from transplantation to death for any cause or last follow-up, was calculated using Kaplan-Meier estimates, 78 with differences between the curves based on log-rank tests for univariate comparisons. Cumulative incidence (CI) of acute and chronic GVHD and relapse were calculated in a competing risk setting, where death was considered the competing event. 79 Death was considered as a competing event for relapse. Multivariate cox regression models were built on cumulative incidence of relapse retaining class I and II HED considered as main effect variables.
All statistical tests were two-sided, and a P-value <0.05 was considered statistically signi cant.
All of the analyses and data visualization were performed using the statistical computing environment R (4.0.0 R Core Team, R Foundation for Statistical Computing, Vienna, Austria) and excel Microsoft 365.

Con ict-of-interest disclosure
The Authors declare no competing interests.

Data availability
All the data that support the ndings of this study are available within the Article and Supplementary Files.
Source data are also provided in this manuscript. To access sequencing data, researchers will need to apply to dbGaP controlled access portal and submit a project request [https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?page=login]. Data will be available under the accession number phs003235.v1 starting from May 2023. For any request, please contact maciejj@ccf.org.

Code availability
The pipeline used for the HLA mutational study has been deposited in the following repository: https://github.com/SMNPAG/HLA-mutations under the DOI: 10.5281/zenodo.7651570. assistance in the development of the HLA mutational pipeline. We thank all the reviewers who contributed to improving this work.   analyzed. In addition, bulk and single cell RNA sequencing were also performed. Of note is that 7 samples used for the bulk RNA sequencing were previously published (Christopher et al. NEJM 2018).

Figure 2
Impact of HED on cumulative incidence of relapse. A) Density distribution of class I and class II HED scores in post-HCT relapsed (red) and non-relapsed patients (blue). B) Impact of per-class and per-locus HED scores on cumulative incidence (CI) of relapse. Low (red) and high (blue) HED are de ned according to the 50 th percentile of the corresponding locus in healthy controls. Shaded bands represent 95% con dent interval. Non-adjusted p-values indicate the signi cance of the log-rank test C) Cause speci c multivariable cox regression analysis of relapse including class I HED as main effect variable. D) Cause speci c multivariable cox regression analysis of relapse including class I HED as main effect variable.
Black squares indicate the odd ratio and error bars the 95% con dent intervals. All the p-values were two sided.
Source data 1 Abbreviations