Genomic signatures in pediatric advanced stage Burkitt lymphoma/leukemia in a Chinese population detected by next generation sequencing

Background: Burkitt lymphoma/leukemia (BL/BAL) is the most common lymphoma in children, and the sporadic subtype is dominant in Chinese populations. MYC gene translocations are essential for sporadic BL/BAL (sBL/BAL), but other gene mutations also play important roles in the development of sBL/BAL. Methods: The clinical data of ten Chinese children with sBL/BAL were collected, and next generation sequencing of their tumor tissues was conducted. Cases of BL and diffuse large B cell lymphoma (DLBCL) were also collected from a database, and bioinformatics analysis was conducted. Results: Nine boys and one girl were enrolled in the study, including six BL patients (stage III) and four BAL patients (stage IV). The average age at diagnosis was 100.10±13.39 months; overexpression of CD20 was detectable, and MYC rearrangements were confirmed. The patients received combination treatment of chemotherapy and rituximab. All patients achieved complete remission and survived. Germline causal gene mutations were detected in four (40%) patients by whole-exome sequencing (WES); ID3, BRCA2, ARID1A and SMARCA4 mutations, in addition to MYC mutations, were the most common somatic mutations. The gene functions in etiology were different between the BL and DLBCL datasets. The identified mutated genes were enriched and connected by GO or KEGG pathways, and it seemed that the PI3K-Akt signaling pathway or the EGFR-TKI resistance pathway played important roles in the etiology of sBL/BAL.


Background
Burkitt lymphoma/leukemia (BL/BAL) is the most common subtype of non-Hodgkin lymphoma (NHL) in children and adolescents, accounting for 30-50% of pediatric NHL cases. There are three reported variants of BL/BAL: endemic, sporadic and immunodeficiency-related BL/BAL [1]. Sporadic BL/BAL (sBL/BAL) is found throughout the world, and in China it is more common than the endemic and immunodeficiency-associated BL/BAL types [2]. The Murphy staging system and the revised International Pediatric Non-Hodgkin Lymphoma Staging System (IPNHLSS) have been proposed for pediatric BL cases, which are classified into localized stages (stage I or II) or advanced stages (stage III or IV) [3]. With intensive combination chemotherapy, the prognosis of sBL/BAL in children and adolescents has improved dramatically. Currently, 5-year event-free survival (EFS) of local and advanced stages of BL/BAL has reached 100% and 85-90%, respectively [1,2].
It is well known that t(8;14)(q24;q32) or its variants play a key role in BL/BAL. A translocation of the MYC gene, which is located at band 8q24, is detectable in over 95% of cases. Epstein-Barr virus (EBV) infection is also common in BL/BAL. The MYC translocation is the essential driver for overexpression of the MYC gene, and activation of the MYC gene leads to cell cycle progression, inhibition of differentiation and the promotion of cell proliferation and/or genomic instability [4]. Additional chromosomal abnormalities, recurrent abnormally expressed transcripts and/or gene mutations are also detected in BL/BAL patients and have roles in the initiation, progression and aggressiveness of the disease [5]. For instance, somatic mutations of TCF3 can activate the PI3K/MAPK/MTOR pathway, in part by increasing a tonic form of B cell receptor signaling, and patients with germline mutations in the SH2 domain protein 1A gene (SH2D1A) suffer from an increased risk of BL/BAL [6].
Next-generation sequencing (NGS) studies have provided valuable insight into the landscape of genomic alterations in malignancies by using whole-exome sequencing (WES) and/or RNA sequencing (RNAseq).
These approaches are helpful for researchers to explore the etiology, pathogenesis and mechanisms of such diseases [7]. Bioinformatics analysis has also been widely used in cancer research.
So far, WES or mRNAseq of sBL/BAL among Chinese children has not been reported. In this study, we performed WES and/or RNAseq for ten sBL/BAL patients, and we investigated the role of molecular alterations in BL/BAL and diffuse large B cell lymphoma (DLBCL) using large databases. Our objective was to reveal the signaling pathways involved in the pathogenesis of sBL/BAL and the relationship between clinical characteristics and gene mutations.

Patients
Ten patients with newly diagnosed sBL/BAL at the Children's Hospital of Chongqing Medical University (CHCMU) between February 2018 and October 2019 were enrolled in the study. The diagnosis was in accordance with the World Health Organization criteria of 2016, and patients were staged with the revised IPNHLSS [3]. Patients who were ≥18 yr at diagnosis, were diagnosed with Burkitt-like lymphoma with 11q aberration or secondary lymphoma, or had human immunodeficiency virus infection were excluded; patients who were classified into local stages (stage I or stage II) or had received chemotherapy before hospitalization were also excluded from the study. The patients received chemotherapy in accordance with the modified Non-Hodgkin Lymphoma Berlin-Frankfurt-Münster 1995 (BFM-95) protocol [8,9]. Intrathecal injections (IT) were administered as the protocol required, and cranial radiotherapy was carried out for the patients with central nervous system (CNS) involvement. The details of the risk groups, course of treatment and drugs used in the modified BFM-95 protocol are listed in the Supplementary material (S1-S4).
The Ethics Administration Office of CHCMU granted ethics approval for this research. Informed consent was obtained from the patients or their guardians. Clinical data, laboratory findings and the outcomes of the enrolled patients were collected from the medical record system and analyzed retrospectively.
Diagnosis and classification of the BAL patients used bone marrow (BM) samples. BM samples were subjected to FAB typing, flow cytometry (FCM), cytogenetic analysis of the chromosomal karyotype, and FISH of MYC, ETV6-RUNX1, MLL, BCR-ABL1, TCF3-PBX1 and PDGFRB. In addition, 43 fusion genes were assayed by a multiplex RT-PCR as described previously [11].

DNA, RNA isolation and sequencing
Tumor DNA samples from the BL or BAL patients were obtained from formalin-fixed specimens or BM samples at diagnosis, and germline samples were collected from the oral mucosa of the patients and the peripheral blood (PB) of their parents. Genomic DNA was extracted using a QIAamp DNA Mini Kit (QIAGEN, China). Exome capture was carried out on the genomic DNA (Agilent SureSelect Human All Exon V6). The PCR-amplified whole-exome libraries were then sequenced (Illumina HiSeq PE 150 bp).
BM samples at diagnosis were collected from BAL patients, and total RNA was extracted using a QIAamp RNA Blood Mini Kit (Qiagen, Cat.52304). An enriched and captured mRNA library was constructed and amplified by using a KAPA mRNA HyperPrep Kit (KAPA/Roche, Cat.KK8581). The PCR-amplified mRNAseq libraries were sequenced using PE150 (Illumina HiSeq X Ten).
All discovered variants were divided into the following four categories according to prior literature reports [12] and software analysis: 1) Pathogenic genotypes that were confirmed by literature reports; 2) Likely pathogenic genotypes that were reported in the literature and/or that affected the protein by functional prediction; 3) Indefinite variants; and 4) Single nucleotide polymorphisms (SNPs) or single nucleotide variants (SNVs). Pathogenic genotypes and likely pathogenic genotypes were recorded as causal gene mutations. All causal gene mutations in the tumor samples were confirmed by Sanger sequencing.
Samples in the control group were cross-checked and detected by Sanger sequencing, and somatic or germline causal gene mutations were identified.

PPI network construction and GO, KEGG pathway enrichment analysis
The mutations detected by WES were collected, and the R package wordcloud2 was used to visualize the frequency of mutations in these cases. We used STRING (STRING, http://string-db.org, RRID: SCR_005223) to find protein-protein interactions (PPI) and visualized the interactions using Cytoscape (version 3.7.1). Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed using the R package clusterProfiler and the Biological Networks Gene Ontology tool (BiNGO, RRID:SCR_005736).
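As a minimal sketch of the statistic that commonly underlies an over-representation analysis of this kind, the snippet below computes a one-sided hypergeometric p-value for a single GO term or KEGG pathway. It is written in Python purely for illustration and is not the clusterProfiler or BiNGO code used in the study; the function name, its arguments and the toy gene counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_term, n_query, n_overlap):
    """One-sided over-representation p-value, P(X >= n_overlap).

    n_universe: number of background genes (assumed annotation universe)
    n_term:     background genes annotated to the term or pathway
    n_query:    genes in the query list (e.g. mutated genes)
    n_overlap:  query genes annotated to the term or pathway
    """
    max_overlap = min(n_term, n_query)
    total = comb(n_universe, n_query)
    # Sum the hypergeometric probabilities of the observed overlap or larger.
    tail = sum(
        comb(n_term, k) * comb(n_universe - n_term, n_query - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Toy numbers only: 20 background genes, 5 in the pathway,
# 4 query genes of which 3 fall in the pathway.
p_value = hypergeom_enrichment_p(n_universe=20, n_term=5, n_query=4, n_overlap=3)
print(f"enrichment p-value: {p_value:.4f}")
```

In practice the p-values of all tested terms would additionally be corrected for multiple testing before a cutoff is applied.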

Identification of DEGs between the BL and DLBCL datasets
The gene chip datasets GSE4475, GSE10172, GSE43677, and GSE48435 were downloaded from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/). We selected the BL and DLBCL datasets for downstream analysis. After quality control (QC), we obtained 41 BL samples and 178 DLBCL samples. Differential expression analysis was performed with the R package limma. Differentially expressed genes (DEGs) were screened at a Benjamini and Hochberg false discovery rate-adjusted p-value cutoff of 0.05 and an absolute value of fold change greater than 1. A volcano plot showing the DEGs was constructed by using the ggplot2 package in R.
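The screening rule above can be sketched as follows. This is an illustrative Python version, not the limma workflow used in the study; the function names and toy values are hypothetical, and it assumes that the fold-change threshold is applied to the effect size reported by the differential expression model (for limma, the log2 fold change).

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(genes, effect_sizes, p_values, alpha=0.05, min_abs_effect=1.0):
    """Keep genes passing both the FDR cutoff and the effect-size cutoff."""
    adjusted = benjamini_hochberg(p_values)
    return [
        gene
        for gene, effect, q in zip(genes, effect_sizes, adjusted)
        if q < alpha and abs(effect) > min_abs_effect
    ]


# Toy input (hypothetical gene labels and statistics).
genes = ["A", "B", "C", "D"]
effects = [2.0, -1.5, 0.5, -3.0]
raw_p = [0.001, 0.2, 0.03, 0.004]
print(select_degs(genes, effects, raw_p))
```

Gene B fails the adjusted p-value cutoff and gene C fails the effect-size cutoff, so only the genes passing both filters are retained.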

WGCNA analysis of the BL and DLBCL dataset
We used the R package for weighted gene coexpression network analysis (WGCNA) to construct coexpression modules, and 219 samples were used to calculate the Pearson correlation matrices. A soft-thresholding power of 6 was selected. An unsigned hybrid coexpression network was then calculated using standard settings. We selected 5000 genes to construct a topological heatmap. We calculated Pearson's correlations between the module eigengenes and the trait data to identify module-trait relationships. Finally, we selected the blue module (related to BL) and the red module (related to DLBCL) genes to construct a gene regulatory network and performed GO enrichment analysis.
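To make the roles of the correlation matrix and the power of 6 concrete, the following Python sketch builds a soft-thresholded adjacency matrix and the corresponding topological overlap for a toy expression table. It is not the WGCNA package code used in the study: it assumes a plain unsigned adjacency, a_ij = |cor(i, j)|^6, and the function names and the four toy "genes" are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def unsigned_adjacency(expr, power=6):
    """expr holds one expression vector per gene (one value per sample)."""
    g = len(expr)
    adj = [[0.0] * g for _ in range(g)]  # diagonal kept at 0 for the sums below
    for i in range(g):
        for j in range(i + 1, g):
            adj[i][j] = adj[j][i] = abs(pearson(expr[i], expr[j])) ** power
    return adj


def topological_overlap(adj):
    """TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    g = len(adj)
    connectivity = [sum(row) for row in adj]
    tom = [[1.0] * g for _ in range(g)]
    for i in range(g):
        for j in range(i + 1, g):
            shared = sum(adj[i][u] * adj[u][j] for u in range(g))
            denom = min(connectivity[i], connectivity[j]) + 1.0 - adj[i][j]
            tom[i][j] = tom[j][i] = (shared + adj[i][j]) / denom
    return tom


# Toy data: four genes measured in four samples.
expr = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1], [1, 3, 2, 4]]
adj = unsigned_adjacency(expr, power=6)
tom = topological_overlap(adj)
```

Raising the absolute correlation to a power greater than 1 suppresses weak correlations while preserving strong ones, which is why the choice of power shapes the modules obtained from the topological overlap matrix.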

Statistical Analysis
Events were defined as any of the following situations: refractory or relapsed disease, disease progression, death or diagnosis with a secondary malignancy. With follow-up to April of 2020, data on clinical features, laboratory findings, WES or RNAseq, treatment responses, treatment-related mortality (TRM) and the event-free survival (EFS) rates of the patients were collected and analyzed.
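A minimal sketch of how these definitions translate into per-patient EFS records is given below, with EFS time running from diagnosis to the first qualifying event or, if none occurred, to the last follow-up. The code is illustrative Python, not the software used for the analysis; the event labels, function names, example dates and the conversion of days to months with an average month length of 30.44 days are all assumptions.

```python
from datetime import date

# Hypothetical labels for the event types listed in the definition above.
EVENT_TYPES = {
    "refractory", "relapse", "progression", "death", "secondary_malignancy",
}


def efs_record(diagnosis, last_follow_up, events):
    """Return (EFS time in months, event observed?) for one patient.

    events is a list of (date, kind) tuples; only kinds in EVENT_TYPES count.
    """
    qualifying = sorted(d for d, kind in events if kind in EVENT_TYPES)
    end = qualifying[0] if qualifying else last_follow_up
    months = (end - diagnosis).days / 30.44  # assumed average month length
    return months, bool(qualifying)


def event_free_fraction(records):
    """Share of patients without an event; with no events this is 1.0."""
    return sum(1 for _, observed in records if not observed) / len(records)


# Hypothetical patients: one censored at last follow-up, one with a relapse.
censored = efs_record(date(2018, 2, 1), date(2020, 4, 30), [])
relapsed = efs_record(date(2018, 2, 1), date(2020, 4, 30),
                      [(date(2019, 2, 1), "relapse")])
print(censored, relapsed)
```

When some patients have events and others are censored at different times, a Kaplan-Meier estimator would be needed instead of the simple fraction shown here.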
The EFS was calculated from the date of diagnosis to the last follow-up, loss to follow-up or the first event.

The four BAL patients were diagnosed with ALL-L3 by FAB morphology. Mature B-ALL phenotypes were visible and overexpression of CD20 was detectable. Restricted expression of Kappa or Lambda chains was detected in two patients each. t(8;14) and/or MYC rearrangements were confirmed by chromosomal karyotype and/or FISH, but the common fusion genes were undetectable. The six BL patients were confirmed by lymph node (LN) biopsy. Overexpression of CD20, Ki-67 and MYC was detected by immunohistochemical (IHC) staining. FISH of MYC and EBER was positive, while FISH of BCL-2, BCL-6 and MLL was negative.

Treatment and prognosis
Six BL patients (five patients were classified in the R3 group and one in the R2 group) and four BAL (R4 group) patients received chemotherapy in accordance with the modified BFM-95 protocol. The treatment effects were evaluated after one or two courses of chemotherapy for the BAL and BL patients, respectively, and all of them achieved complete remission (CR). Chemotherapy was completed after 5-6 months. After that, the patient with CNS involvement received cranial radiotherapy (total dose of 18 Gy, delivered in 10 fractions). With follow-up to May of 2020, all 10 patients were alive without events, the TRM rate was 0%, and the EFS rate was 100%.

Results of NGS sequencing
WES was conducted for the ten patients. Germline causal gene mutations were detected in four (40%) patients, and in addition, several somatic causal gene mutations were identified in the ten patients.
The data obtained by WES were used to visualize the frequency of candidate gene mutations (R package wordcloud2), construct PPI networks, and conduct GO enrichment and KEGG pathway analyses.
(1) The identified gene mutations were shown as a tag cloud in which word size reflects gene frequency (Fig-1a), which demonstrated that ID3, BRCA2, ARID1A and SMARCA4 mutations, in addition to the MYC mutations, were the most common mutations. The identified gene mutations were also subjected to PPI network analysis (Fig-1b), which showed the proteins that were modified by MYC, HRAS, TP53 and NOTCH1. The figure suggests that these identified gene mutations play important roles in the development of BL/BAL.
(2) The identified genes were enriched by GO analysis (Fig-2a). The P value threshold was set at 0.05, and the top gene functions in the etiology were: leukocyte differentiation and regulation of hemopoiesis in biological processes (BP), nuclear chromosomal parts and chromatin in cellular components (CC), and chromatin binding and transcription coregulator activity in molecular function (MF). The genes were also enriched and connected by KEGG pathways (Fig-2b), and it seemed that the PI3K-Akt signaling pathway played a key role in the etiology of BL/BAL. RNAseq was conducted and the gene transcripts were analyzed for the four BAL patients. The results are listed in the Supplementary material.

Identification of DEGs and WGCNA analysis
To further understand the development of BL/BAL, the upregulated genes were selected in the BL and DLBCL datasets. We identified the DEGs and performed WGCNA analysis comparing the BL and DLBCL datasets. Gene expression differed between the BL and DLBCL microarray data, as shown by the volcano plot (Fig-3). The upregulated genes in the BL and DLBCL datasets were enriched by GO and KEGG pathway analysis (Fig-4 and Fig-5), with the P value threshold also set at 0.05. The functions of the DEGs were distinct between the BL and DLBCL datasets.
The Topological Overlap Matrix (TOM) of the coexpressed genes in different modules among the top 5000 genes was shown in a heatmap (Fig-6), and an eigengene adjacency heatmap for the different modules was also presented for the module-trait relationships. The coexpression network of the significant genes related to BL (Fig-7a) or DLBCL (Fig-7b) was constructed, and GO enrichment of the significant genes related to BL (Fig-8a) and DLBCL (Fig-8b) was conducted. The two datasets also had different coexpression networks and GO enrichment genes.

Discussion
BL/BAL is an aggressive non-Hodgkin lymphoma derived from germinal center B cells, and BAL is regarded as the leukemic phase of BL [1]. The peak age of onset of sBL/BAL is 11 yr, and boys are affected much more frequently than girls. The enrolled ten patients included nine boys and one girl. The average age of these patients was 100.10±13.39 months, and thus the patients in this study are similar to those reported in the previous literature [1,2,10].
The prognosis of sBL/BAL was poor in past decades, but with short, intensive chemotherapy, the survival rate has improved steadily. Overexpression of CD20 and the restricted expression of Kappa or Lambda chains are remarkable in BL/BAL, and with combination treatment of intensive chemotherapy and a specific CD20 monoclonal antibody (rituximab), the survival rate of pediatric sBL/BAL has exceeded 90%. The EFS of the ten patients in the study was 100%, and the treatment, diagnosis, and results of these patients were consistent with the literature [13,14].
The molecular hallmark of BL/BAL is a translocation of the oncogenic MYC, and similar translocations are also found in other types of NHL [10]. Although a translocation of oncogenic MYC is detectable in these subtypes of NHL, their clinical manifestations and prognosis are diverse, and diversity of sBL/BAL also exists between children and adults [1,4,15]. This suggests that pediatric sBL/BAL can be distinguished from other types of NHL by gene expression profiling, and that differences in gene expression profiles potentially reflect distinct pathogenetic mechanisms.
Datasets of BL and DLBCL were collected, and differential expression analysis was performed. The differences were statistically significant and they revealed different gene expression profiles between BL and DLBCL, which indicates the pathogenetic mechanisms of BL and DLBCL are distinct.
WES data from the ten Chinese pediatric sBL/BAL patients were analyzed. ID3, BRCA2, ARID1A and SMARCA4 mutations, in addition to MYC mutations, were common (Fig-1). According to previous reports [10,15-18], NGS analysis has revealed the importance of the B-cell receptor signaling pathway in the pathogenesis of BL. Mutations of the transcription factor TCF3 or its negative regulator ID3 have been reported in around 70% of sBL cases. These mutations activate B-cell receptor signaling, which sustains BL cell survival by engaging the PI3K pathway. Other recurrent mutations in CCND3, TP53, RHOA, SMARCA4, and ARID1A occur in 5-40% of sBL/BAL cases.
Both the number of mutations overall and the proportion of cases with mutations in TCF3 or ID3 are lower in endemic than in sporadic BL. An inverse correlation between EBV infection and the number of mutations has been observed, suggesting that these mutations may take the place of the virus in the activation of B-cell receptor signaling. It has been predicted that these identified gene mutations play important roles in the development of BL/BAL. Genetically susceptible individuals, such as those with germline SH2D1A mutations, are at a greatly increased risk of developing BL. It is interesting that germline causal gene mutations were detected in four of our ten patients, but larger samples and multiple centers are needed to verify their exact detection rate.
These mutated genes were enriched and connected by GO or KEGG pathways (Fig-2), and it seemed that the PI3K-Akt signaling pathway has a key role in the etiology of BL/BAL. Similar findings were also found in the datasets and reported in previous studies [16,17]. BL/BAL may be inhibited by targeting this signaling pathway [18].
Epidermal growth factor receptor (EGFR) is a tyrosine kinase. EGFR gene mutations and overexpression of its protein are associated with cancer growth [19]. Tyrosine kinase inhibitors (TKI) against EGFR (EGFR-TKI) are used to treat cancer patients with EGFR mutations, such as those with lung adenocarcinoma [20]. However, an EGFR-TKI resistance pathway was found by KEGG enrichment in our BL/BAL patients, which suggests that EGFR-TKI treatment may be ineffective for BL/BAL. Further research is necessary to investigate this finding and its implications.

Conclusions
BL/BAL is a highly aggressive but curable subtype of lymphoma. With combination treatment of intensive chemotherapy and rituximab, the survival rate of BL/BAL has improved steadily, but additional research into its pathogenetic mechanisms is necessary. The molecular hallmark of sBL/BAL is MYC translocation, but additional chromosomal abnormalities and gene mutations also occur and play roles in the progression of the disease.
In this study, NGS was used for pediatric sBL/BAL patients from a Chinese population. Other recurrent mutations in addition to MYC mutations were detected, and possible signaling pathways were also demonstrated. Laboratory studies to validate these findings are necessary.

Trial registration
The study has been registered retrospectively at the Chinese Clinical Trial Registry (ChiCTR1900025690 and ChiCTR-IPR-14005706).

Declarations
Ethics approval and consent to participate
The Ethics Administration Office of CHCMU granted ethics approval for this research (No.2018-75). Informed consent was obtained from the patients or their guardians. The study has been registered at the Chinese Clinical Trial Registry (ChiCTR1900025690 and ChiCTR-IPR-14005706).

Consent for publication
Not applicable.

Availability of data and material
Most data generated or analysed during this study are included in this published article and its supplementary information files. The raw data from the current study are not publicly available because the study is unfinished, but they are available from the corresponding author on reasonable request.

Competing interests
The authors declare that there are no competing interests associated with this manuscript.

Funding
The study was supported by the National Natural Science Foundation of China (Project No. 81900162) and the Chongqing Science and Technology Commission of China PR (Project No. cstc2018jsyj-jsyjX0015).

Authors' contributions
JW X and J Z conceived and designed the study. L S and NG Y prepared the figures and tables. XY L analyzed and interpreted the data. XY L and JW X drafted the manuscript. JW X revised the manuscript. All authors read and approved the initial manuscript.

Fig-7. The coexpression network of the significant genes related to BL or DLBCL