Ectopic Expression of a Combination of 5 Genes Detects High Risk Forms of Adult T-cell Acute Lymphoblastic Leukemia

Objectives: T-cell acute lymphoblastic leukemia (T-ALL) defines a group of hematological malignancies with heterogeneous aggressiveness and highly variable outcome, making therapeutic decisions a challenging task. We sought to develop a new predictive model for T-ALL that can be applied before treatment.
Methods: A specific pipeline designed to discover aberrantly active genes was applied to the RNA-sequencing data of 109 T-ALL patients from our multicenter clinical study. A prognostic classifying test, based on the detection of the combined expression of a subset of these genes, was designed and further validated by RT-qPCR in an additional cohort of 32 adult T-ALL patients.
Results: The expression of 18 genes was significantly associated with shorter survival, including ACTRT2, GOT1L1, SPATA45, TOPAZ1 and ZPBP (5-GEC), which were used as a basis to design a prognostic classifier for T-ALL patients. The molecular characterization of the 5-GEC positive T-ALL unveiled specific characteristics inherent to the most aggressive T leukemic cells, including a drastic shut-down of genes located on the mitochondrial genome and an upregulation of histone genes. These cases fail to respond to induction treatment, since 5-GEC positivity predicted either positive minimal residual disease (MRD) or short-term relapse in MRD negative patients.
Conclusion: Overall, our investigations led to the discovery of a homogeneous group of leukemic cells with profound alterations of their biology. They also resulted in an accurate predictive tool that could significantly improve the management of T-ALL patients.

Introduction
T-cell acute lymphoblastic leukemia (T-ALL) emerges from a malignant monoclonal proliferation of cells that exhibit developmental arrest at varying stages of differentiation. Although modern intensified chemotherapy has greatly improved survival, the long-term outcome of adult T-ALL patients remains unsatisfactory, with only 50% survival at 5 years [1,2].
Currently, T-ALL treatment strategy largely relies on post-treatment evaluation of minimal residual disease (MRD) [3]. Assessment of MRD is usually carried out either by PCR amplification of clonotypic IG/TCR gene rearrangements or by flow cytometric detection of leukemia-associated phenotypes. MRD has been confirmed as a powerful predictor of long-term survival in adult patients with ALL in many studies [4][5][6].
However, MRD is not available at the time of diagnosis. In addition, a proportion of T-ALL patients diagnosed as MRD negative after the induction treatment will relapse. Therefore, there is still a need to find reliable biomarkers that could guide treatment or predict prognosis at diagnosis.
A deep understanding of T-ALL pathogenesis, involving the expression of oncogenic transcription factors as well as genetic alterations, should contribute to the identification of relevant prognostic markers. So far, the expression of only a few transcription factors is used as a predictive biomarker or as an indicator to help treatment planning [1,7,8]. Since NOTCH1 signaling plays a central role in T-cell lineage specification and NOTCH1 mutations have been found in up to 70% of adult T-ALLs, studies of the relationship between gene alterations and prognosis have mainly focused on NOTCH1 signaling [9,10]. A number of studies have also evaluated the prognostic relevance of NOTCH1/FBXW7 mutations, but it remains controversial [11][12][13][14]. Later, Trinquand and colleagues proposed to use the combination of NOTCH1/FBXW7 mutations and RAS and PTEN abnormalities (NOTCH1/FBXW7/RAS/PTEN, NFRP) as a refined oncogenetic classifier [12]. The NOTCH1/FBXW7/RAS/PTEN classification approach has not yet been evaluated in a Chinese population.
Our previous work demonstrated that malignant tumors frequently reactivate a large number of genes whose expression is normally tissue-restricted [15,16]. There is emerging evidence that these aberrantly activated genes play pivotal roles in tumorigenesis and that they may serve as valuable cancer-specific biomarkers to predict prognosis as well as response to various treatments [17][18][19][20][21]. In particular, our investigations demonstrated that male germ cells express the largest number of tissue-restricted genes, and pointed to male-specific genes as a considerable reservoir of cancer biomarkers. Accordingly, we successfully identified the ectopic activation of a group of 26 male- and placental-specific genes as a predictor of poor prognosis in lung cancer [15]. Later, we found that the ectopic expression of six genes, which are normally expressed exclusively in embryonic stem cells, placenta or germ cells, could also predict prognosis in B-cell acute lymphoblastic leukemia [22]. Altogether, our observations demonstrated that these ectopic expressions of tissue-restricted genes are a potential source of new biomarkers to guide risk stratification and predict outcome [15,22] as well as to help design new therapeutic strategies [20,23]. However, these ectopic expressions are highly context-dependent, and the identification of the most relevant biomarkers requires an extensive analysis of their relationships and correlation with the clinical and biological data associated with each cancer type.
Here, we exploited genome-wide RNA sequencing of bone marrow samples obtained in a well characterized series of T-ALL patients, on which we applied our specifically designed strategy to detect the ectopic expression of tissue-restricted genes, and correlated these expressions with the survival probabilities of patients. This work led to the discovery of 18 genes whose ectopic expression is significantly correlated with prognosis in T-ALL patients. By combining 5 of these genes, we defined an optimal classification system which, compared with a full assessment of the existing mutational status of NOTCH1/FBXW7/RAS/PTEN, largely improves our ability to predict outcome in adult T-ALL patients.
Materials and Methods
Trial Registry, number ChiCTR-RNC-14004969 (for sample collection) and ChiCTR-ONRC-14004968 (for treatment)] as previously described [14]. All patients or guardians provided informed consent for sample collection and research in accordance with the Declaration of Helsinki.
Genomic DNA and total RNA of bone marrow were extracted using the AllPrep DNA/RNA/Protein Mini Kit (Qiagen) or TRIzol reagent (Invitrogen). Bone marrow minimal residual disease (MRD) was analyzed by flow cytometry at the end of the induction treatment. MRD negativity was defined as < 0.01% residual leukemia cells. MRD was not available in 3 patients, who all died before the end of induction treatment.

RNA-Seq data analysis
Raw RNA-seq data obtained from bone marrow samples of 109 pediatric (n = 55) and adult (n = 54) T-ALL patients enrolled in our center [7], as well as from 13 normal samples from the dataset PRJEB4337 available on the NCBI BioProject portal (https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJEB4337), were used for the detection of aberrant gene expression and its correlation with prognosis. Reads from FASTQ files were aligned to the UCSC hg19 reference genome using STAR 2.5.2b. The aligned reads were counted using the HTSeq framework (version 0.9.1). RPKM (reads per kilobase per million mapped reads) values were obtained by dividing the RPM (reads per million) values by the cumulated length of exons in kilobases, and were log-transformed by computing log2(1 + RPKM).
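The normalization described above can be illustrated with a minimal Python sketch. The function names and the example numbers are hypothetical and are not taken from the study pipeline; the sketch only restates the arithmetic (RPM divided by exon length in kilobases, followed by log2(1 + RPKM)).

```python
import math


def rpkm(read_count, total_mapped_reads, exon_length_bp):
    """Reads per kilobase per million mapped reads for one gene in one sample."""
    # RPM: reads assigned to the gene per million mapped reads in the sample.
    rpm = read_count / (total_mapped_reads / 1_000_000)
    # Divide by the cumulated exon length expressed in kilobases.
    return rpm / (exon_length_bp / 1_000)


def log_rpkm(read_count, total_mapped_reads, exon_length_bp):
    """Log-transformed expression value, log2(1 + RPKM)."""
    return math.log2(1 + rpkm(read_count, total_mapped_reads, exon_length_bp))


# Hypothetical gene: 500 reads, 20 million mapped reads, 2,000 bp of exons.
example_rpkm = rpkm(500, 20_000_000, 2_000)  # 25 RPM / 2 kb = 12.5
example_log = log_rpkm(500, 20_000_000, 2_000)
```

Adding 1 before taking the logarithm keeps genes with zero reads at a log-transformed value of exactly 0.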

Analysis of mutational profiles
Mutation calling from RNA-seq data of the training cohort has been reported previously [3]. Mutational hotspot regions of NOTCH1, FBXW7, PTEN, NRAS and KRAS were sequenced using Sanger sequencing in the 32 additional patients of the test cohort. The primer sequences used for NOTCH1 and PTEN were the same as previously described [15,16].

Identification of biomarkers of aggressive T-ALL based on ectopic expression of tissue-specific genes
A dedicated bioinformatic pipeline was applied first to identify genes with tissue-specific expression and second to detect their aberrant expression in T-ALL. Using RNA-seq expression data from different normal human tissues, we first identified 3195 transcripts whose expression was restricted to testis, placenta or embryonic stem cells. None of these genes is expressed in normal hematopoietic tissues. Second, for each tissue-restricted gene, we established a threshold of log-transformed RPKM values differentiating background noise from expression, and then compared the expression value of each T-ALL sample with this threshold. The expression data in T-ALL samples were binarized: positive if the expression value was above the threshold, and negative otherwise. The procedures for these two steps are described in the online supplementary methods.
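As an illustration of the binarization step only, the following Python sketch turns log-transformed RPKM values into positive or negative calls. The threshold is taken as an input, since its derivation is described in the supplementary methods and is not reproduced here; the function name, sample labels and values are hypothetical.

```python
def binarize_expression(log_rpkm_by_sample, threshold):
    """Return True (ectopic expression) when a sample is strictly above threshold.

    log_rpkm_by_sample: dict mapping a sample identifier to log2(1 + RPKM)
    threshold: gene-specific cut-off separating background noise from expression
    """
    return {
        sample: value > threshold
        for sample, value in log_rpkm_by_sample.items()
    }


# Hypothetical values for one tissue-restricted gene in three T-ALL samples.
calls = binarize_expression({"S1": 0.0, "S2": 2.3, "S3": 0.5}, threshold=0.5)
```

A value equal to the threshold is called negative here, following the "above the threshold" wording of the text.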
Analysis of association between ectopic expression and patient outcome
A Cox proportional hazards model was used to test whether the expression of each gene was significantly associated with overall survival (OS) and event-free survival (EFS). The ectopic expression of a gene was considered significantly associated with survival if the Cox model p-value was less than 0.05 and the hazard ratio was above 1.5. The statistics and bioinformatic pipelines for survival analysis and the design of optimal combinations of genes are detailed in the online supplementary methods.
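The selection rule stated above can be written as a short filter. This sketch assumes that the Cox p-values and hazard ratios have already been computed with a survival analysis package; the function names, the gene labels and the numbers are hypothetical and serve only to illustrate the two cut-offs.

```python
def is_prognostic(p_value, hazard_ratio, alpha=0.05, min_hazard_ratio=1.5):
    """Apply the two criteria from the text: p < 0.05 and hazard ratio > 1.5."""
    return p_value < alpha and hazard_ratio > min_hazard_ratio


def select_prognostic_genes(cox_results):
    """cox_results maps a gene name to a (p_value, hazard_ratio) pair."""
    return sorted(
        gene
        for gene, (p_value, hazard_ratio) in cox_results.items()
        if is_prognostic(p_value, hazard_ratio)
    )


# Hypothetical Cox results for three placeholder genes.
selected = select_prognostic_genes({
    "GENE_A": (0.003, 2.4),  # passes both criteria
    "GENE_B": (0.040, 1.2),  # significant, but hazard ratio too low
    "GENE_C": (0.200, 3.0),  # high hazard ratio, but not significant
})
```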
Real-time quantitative PCR (RT-qPCR) test of the aberrant expression of the 5 ectopic genes
cDNA was synthesized from total RNA using the SuperScript III First-Strand Synthesis SuperMix Kit (Invitrogen) according to the manufacturer's procedures. RT-qPCR reactions were performed using SYBR Green (TaKaRa) on a 7500 ABI RT-qPCR machine (Applied Biosystems, USA). The 2^(-ΔΔCt) method was used to estimate the fold induction of each gene as described in Rousseaux et al. [9]. In short, the expression value was calculated as 2^(Ct of the gene of interest in testis - Ct of the gene of interest in sample) / 2^(mean Ct of the 4 control genes in testis - mean Ct of the 4 control genes in sample), and expressed as the ratio of expression relative to testis. The four control genes were Actin, U6, RELA and AUP1. Assays were done in triplicate. Seven normal bone marrow samples and three cord-blood samples were used to determine a threshold of aberrant expression (corresponding to the mean expression value + two standard deviations of these 10 samples). A gene was considered positively expressed when its expression value was above this threshold.
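The calculation above can be sketched in Python as follows. The function names and all Ct values are hypothetical. The text does not state whether the sample or the population standard deviation was used for the threshold; the sample standard deviation (`statistics.stdev`) is assumed here.

```python
from statistics import mean, stdev


def relative_expression(ct_gene_testis, ct_gene_sample,
                        ct_controls_testis, ct_controls_sample):
    """Expression of a gene in a sample, as a ratio relative to testis."""
    numerator = 2 ** (ct_gene_testis - ct_gene_sample)
    # Normalization by the mean Ct of the control genes in each tissue.
    denominator = 2 ** (mean(ct_controls_testis) - mean(ct_controls_sample))
    return numerator / denominator


def aberrant_threshold(normal_values):
    """Mean + 2 standard deviations of the values in normal samples.

    Assumption: sample standard deviation, which the text does not specify.
    """
    return mean(normal_values) + 2 * stdev(normal_values)


def is_positive(expression_value, threshold):
    """A gene is called positive when its value is above the threshold."""
    return expression_value > threshold


# Hypothetical Ct values: the gene is detected 5 cycles later in the sample
# than in testis, while the four control genes behave identically.
ratio = relative_expression(25.0, 30.0, [20.0] * 4, [20.0] * 4)  # 2 ** -5
```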

Statistical Analysis
Pearson's chi-square or Fisher's exact tests were used to compare categorical variables. Overall survival (OS) and event-free survival (EFS) were measured from the date of diagnosis of T-ALL to the date of death (OS and EFS) or relapse (EFS), or to the date of last contact (censored). The log-rank test was used to compare OS or EFS between groups, and survival was illustrated by Kaplan-Meier curves. The last follow-up was carried out in September 2020. Multivariate analyses were performed using Cox regression models. P-values < 0.05 were considered statistically significant. We used open source packages available in R (version 3.3.0) and Python (version 3.7) to perform statistical analyses.
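The analyses in this study relied on existing R and Python packages. For readers unfamiliar with the Kaplan-Meier estimate, the following from-scratch Python sketch shows how a survival curve is built from follow-up times and censoring flags. It is an illustration under simplified assumptions, not the code used in the study, and the follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate.

    times:  follow-up time of each patient
    events: 1 if the event (death or relapse) was observed, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        # Patients with an event or censored at t leave the risk set.
        n_at_risk -= n_leaving
    return curve


# Hypothetical follow-up (months) for five patients; 0 marks censoring.
curve = kaplan_meier([5, 8, 12, 12, 20], [1, 0, 1, 1, 0])
```

In this example the estimate drops to 0.8 at 5 months (1 event among 5 patients at risk) and to about 0.27 at 12 months (2 events among the 3 patients still at risk).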

The association of NFRP classes with event-free survival (EFS) is of borderline significance in our T-ALL patients
A total of 86 newly diagnosed adult patients with T-cell acute lymphoblastic leukemia (T-ALL) were included in the present study (Table 1). For 54 of these patients, RNA-seq data were available from our previous work [7]. The present study included an additional 32 patients, for whom RNA-seq data were not available but detailed clinical information was.

A combination of ectopically expressed genes can be used to reliably predict prognosis of T-ALL patients at diagnosis
These observations prompted us to seek new biomarkers that could reliably stratify patients before treatment.
We applied a strategy specifically designed to identify the aberrant expression of genes which are normally silent in non-germline adult tissues and to test the association of these ectopic expressions with survival probabilities.
By using available RNA-seq data from large series of normal human tissues, we identified 3195 transcripts with an expression restricted to testis, placenta or embryonic stem cells, of which 448 were found ectopically expressed in at least 10% and no more than 90% of T-ALL samples. We then used a first cohort of T-ALL patients for whom RNA-seq as well as survival data were available. In addition to the 54 adult T-ALL patients, and in order to strengthen the power of the approach, RNA-seq data obtained from 55 samples of children with T-ALL were also included in the training cohort. Considering each of the 448 genes ectopically expressed in a subgroup of T-ALL, we compared the survival probabilities of the two groups of patients whose malignant cells respectively did or did not express the gene. A total of 18 different genes (Supplementary Table 2) were identified whose activation was significantly associated with OS and/or EFS in our T-ALL series.
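The frequency criterion used to retain candidate genes (ectopic expression in at least 10% and no more than 90% of samples) can be sketched as follows. The function names and the example calls are hypothetical; the input is assumed to be the list of binary expression calls of one gene across all T-ALL samples.

```python
def ectopic_frequency(calls):
    """Fraction of samples in which the gene is called positive."""
    return sum(calls) / len(calls)


def is_candidate(calls, min_freq=0.10, max_freq=0.90):
    """Keep genes expressed in at least 10% and no more than 90% of samples."""
    return min_freq <= ectopic_frequency(calls) <= max_freq


# Hypothetical genes across 10 samples.
rare_gene = [True] + [False] * 9   # 10% positive: retained
silent_gene = [False] * 10         # never expressed: discarded
ubiquitous_gene = [True] * 10      # expressed everywhere: discarded
```

Genes outside this range split the cohort into groups that are too unbalanced for a meaningful survival comparison.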
In order to assess the value of combinations of these genes as prognostic biomarkers, we then tested all possible combinations of the 18 genes for their ability to stratify T-ALL patients. Among them, the 5-gene set of ZPBP, GOT1L1, ACTRT2, SPATA45 and TOPAZ1 (all restricted to male germ cells) was identified as an optimal classifier for prognostic stratification in T-ALL patients (p < 10^-4 for OS and p < 10^-5 for EFS). All T-ALL patients were then assigned to 2 groups according to the ectopic activation of the 5 genes (Fig. 1A). Those expressing at least one of the 5 genes were assigned to the "5-gene expression classifier" (5-GEC) positive group. The other patients, expressing none of the five genes, were assigned to the 5-GEC negative group. As illustrated by Kaplan-Meier plots, this classification system separates patients into different risk groups, whether considering all T-ALL cases or the subsets of either children or adult T-ALL patients (Supplementary Fig. 2). In particular, 5-GEC positive and negative adult T-ALL patients showed significant differences in terms of survival probabilities (log-rank p = 0.01 for OS and p = 0.004 for EFS; Fig. 1A).
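The assignment rule of the 5-GEC can be written in a few lines of Python. The sketch assumes that binary expression calls are already available for each patient; the function name and the example patients are hypothetical, and treating a gene without a call as not expressed is an assumption made for the illustration.

```python
FIVE_GEC_GENES = ("ACTRT2", "GOT1L1", "SPATA45", "TOPAZ1", "ZPBP")


def five_gec_status(binary_calls):
    """Return 'positive' if at least one of the 5 genes is expressed.

    binary_calls: dict mapping a gene name to True (expressed) or False.
    Assumption: a gene absent from the dict is treated as not expressed.
    """
    expressed = any(binary_calls.get(gene, False) for gene in FIVE_GEC_GENES)
    return "positive" if expressed else "negative"


# Hypothetical patients.
patient_1 = {"ACTRT2": False, "GOT1L1": True, "SPATA45": False,
             "TOPAZ1": False, "ZPBP": False}
patient_2 = {gene: False for gene in FIVE_GEC_GENES}
```

Here `five_gec_status(patient_1)` returns "positive" because GOT1L1 is expressed, whereas `five_gec_status(patient_2)` returns "negative".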
In order to validate the predictive value of the 5-GEC, we measured the expression of the 5 genes by RT-qPCR in a second cohort, the test cohort of 32 adult T-ALL patients. Out of the 32 cases, 6 patients were assigned to the 5-GEC negative group, whereas the other 26 patients were 5-GEC positive. Kaplan-Meier plots also demonstrated significant differences in both OS and EFS (log-rank test p = 0.029 and p = 0.032, respectively; Fig. 1B).

A stratification based on 5-GEC predicts MRD status and identifies MRD negative patients with high risk of relapse
MRD status following induction therapy in patients with ALL has been routinely used to predict outcome, and has been reported to strongly and consistently associate with clinical outcomes in ALL [4]. In line with this, positive MRD was predictive of significantly inferior OS and EFS in our cohort (p < 0.001 for both OS and EFS, Fig. 2A). However, MRD status is not available at the time of diagnosis. Additionally, recurrence of the disease also occurs in patients with negative MRD, decreasing the probability of overall and event-free survival [1]. Interestingly, our newly identified 5-GEC classifier turned out to be a very efficient predictor of MRD positivity (Fig. 2B, chi-square test, p < 0.001). Moreover, within the MRD negative subgroup, 5-GEC positivity was significantly associated with shorter survival (p = 0.036 for EFS, Fig. 2C), thus differentiating patients who are likely to respond well to standard therapy from those who may benefit from more intensive therapy. This observation was further confirmed in our test cohort (Supplementary Fig. 3).
Interestingly, the GSEA profiles of these aggressive forms of T-ALL revealed a major down-regulation of most cellular activities. Gene sets constituted of genes involved in cell proliferation and mitosis, or RNA ribosomal and translation activities, as well as mitochondria and related metabolic activities, were among the most significantly downregulated in 5-GEC positive T-ALL (Fig. 3A), suggesting that these aggressive T-ALL forms were those enriched in "dormant" cells. Remarkably, the 5-GEC positive T-ALL cells do not express many of the genes normally expressed in hematopoietic stem cells (Fig. 3B). The GSEA signature of 5-GEC positive ALL within the MRD negative group clearly illustrates this specificity (Fig. 5).
One specific feature of the 5-GEC positive ALL signature is that it is highly enriched in mRNAs from genes encoding histones and chromatin proteins, as opposed to MRD positive ALL, which showed a depletion of these same mRNAs (Fig. 5: 1st and 2nd rows). Another striking characteristic of these 5-GEC positive ALL is the complete shutdown of mitochondria-encoded transcripts. Indeed, mitochondria-related genes are globally depleted in both MRD positive and 5-GEC positive cells, but the expression of the 13 genes located on the mitochondrial genome remains high in MRD positive ALL (as compared to MRD negative). In 5-GEC positive ALL, the situation is different since these same 13 genes are completely shut down (Fig. 5: 4th row), suggesting that a dramatic impairment of mitochondrial transcriptional activity is specifically associated with these 5-GEC positive ALL.

Discussion
In this study, we found that patients positive for N/F mutation only have a trend towards a more favorable outcome, whereas NFRP class I was significantly correlated with longer survival, in agreement with data reported by the GRAALL group [12]. However, this oncogenetic classifier based on NFRP classes remained of only marginal significance for the prediction of OS and EFS in the multivariate analysis.
Based on our previous work, we stratified patients according to the aberrant/ectopic expression of genes that are normally epigenetically repressed in most non-tumor adult somatic cell types. We found that a combination of a subset of 5 tissue-restricted genes (5-GEC) could efficiently stratify patients into groups with different prognosis. In addition, this new classification system could also predict prognosis in an independent group of patients. More importantly, this new classification system, implemented at the time of diagnosis, could predict MRD positivity with high efficiency, since nearly all MRD positive patients had been assigned to the 5-GEC positive group. Additionally, MRD negative patients who had been assigned to the 5-GEC negative group showed no event of relapse or death, whereas the MRD negative patients of the 5-GEC positive group were at significantly higher risk of death or relapse.
In particular, we adapted our approach to fully exploit RNA-seq data, which provide a more accurate and efficient technology to explore transcriptomes. This enabled the detection not only of ectopically activated protein-coding genes but also of tissue-specific non-coding sequences. Our results here suggest that these non-coding transcripts largely contribute to these ectopic activations. Indeed, among the 18 genes whose expression was associated with inferior survival in T-ALL, 11 were protein-coding genes, whereas 7 corresponded to non-coding sequences. The roles and functions of these non-coding transcribed RNAs in their normal context of expression or in cancer cells are entirely unknown, and their discovery opens a new field for future research.
The normal functions of the protein-coding genes themselves are also poorly known. Among them, GOT1L1 was reported to show L-aspartate aminotransferase activity and thus could be involved in the synthesis of D-aspartate, which serves as an agonist of the N-methyl-D-aspartate receptor (NMDAR) [28]. Leanne et al. [29] reported that low activity of NMDAR is significantly correlated with favorable patient prognosis in several cancer types, which may provide a possible explanation for our finding that high expression of GOT1L1 is associated with shorter survival, and a possible mechanism for GOT1L1 in leukemogenesis. TOPAZ1 contains an evolutionarily conserved domain named PAZ, which is involved in the specific recognition of siRNAs [30]. It has been suggested that the PAZ domain plays an important role in regulating the self-renewal of human embryonic stem cells and glioma stem cells [31,32]. These observations suggest potential mechanisms by which these genes could contribute to cancer development, but detailed investigations are required to fully understand their functions and the impact of their ectopic expression in cancer cells. Although the biological roles and functions of these five testis-specific genes remain to be discovered, the fact that their expression is restricted to male germ cells and cancer makes them very attractive therapeutic targets.
We also found that the aggressive 5-GEC positive T-ALL were mostly depleted in pathways that are essentially involved in active and proliferative cells, such as RNA and DNA synthesis, mitosis and DNA replication. Interestingly, these pathways were also downregulated in MRD positive as compared to MRD negative T-ALL. Based on recent reports on the role of ribosome and protein biogenesis in normal and leukemic stem cells [33][34][35], it is reasonable to speculate that these changes might be associated with metabolic changes and involved in T-ALL progression and treatment resistance. Moreover, E2F and MYC target genes were among the most depleted gene sets in both MRD positive and 5-GEC positive groups. These findings further reinforce the overwhelming importance of the proliferative status in the ability of cells to respond to chemotherapy in cancer [36,37]. Additionally, the RB-E2F pathway is also known to play a pivotal role in cell proliferation [38][39][40] and has recently been reported to play a critical role in controlling cell quiescence depth [41]. These reports suggest that the leukemic cells of MRD negative or 5-GEC negative ALL patients are more likely to be in a state of hyper-proliferation, and therefore prone to respond more efficiently to chemotherapy. This is also consistent with results from pediatric B-ALLs which showed that underexpression of genes promoting cell proliferation is associated with resistance to chemotherapy [42].
Although many features of the expression profiles are shared between MRD positive and 5-GEC positive T-ALLs, the latter still have unique features. Our 5-GEC positive group has a higher contingent of "dormant" cells that show extremely low translation, transcription and proliferation rates and low mitochondrial activity. Most strikingly, genes located on the mitochondrial genome are totally silenced in 5-GEC positive T-ALL, whereas they are still expressed in MRD positive T-ALL. On the basis of recent reports that mitochondrial and metabolic remodeling is a central feature of normal and leukemic stem cells [34,43] and that regulated mitochondrial metabolism is required to maintain stem cell self-renewal [44], our results further strengthen the notion that mitochondrial dormancy is an important characteristic of stem cells and could be involved in chemotherapy resistance and disease progression. However, although the transcriptomic signature of 5-GEC positive leukemia suggests a "dormant" phenotype, we have no additional evidence for a "stem cell-like" nature of the 5-GEC positive leukemic cells. Actually, as illustrated in Fig. 3B, the 5-GEC positive T-ALL signature is depleted in hematopoietic stem cell expression signatures. Thus, our new gene expression classifier is more likely to link prognosis with the pathogenesis of a specific form of aggressive T-ALL and may provide a lead to better explain malignant transformation and progression of these ALL.

Conclusions
T-cell acute lymphoblastic leukemia (T-ALL) is an aggressive hematologic disease associated with dismal survival in adult patients. Despite extensive exploration of the genetic and epigenetic landscapes of T-ALL, prognostic biomarkers that could guide treatment selection mostly rely on post-induction minimal residual disease. Identification of novel biomarkers that can stratify patients at diagnosis is still needed. Following a dedicated strategy to screen whole-genome expression data in T-ALL samples, we scored the out-of-context activation of silent tissue-restricted genes. By correlating these expressions with survival probabilities, we identified a set of 5 genes whose awakening predicted not only positive minimal residual disease but also a high risk of relapse in a subset of patients with apparently negative minimal residual disease. The 5-gene positive T-ALL also pointed to a particular metabolic state of the aggressive T-ALL group, which harbors a low mitochondrial genome activity.