Patients
Ten patients with newly diagnosed sBL/BAL at the Children’s Hospital of Chongqing Medical University (CHCMU) between February 2018 and October 2019 were enrolled in the study. Diagnoses were made in accordance with the 2016 World Health Organization criteria, and patients were staged with the revised IPNHLSS [3]. Patients were excluded if they were ≥18 years old at diagnosis, were diagnosed with Burkitt-like lymphoma with 11q aberration or with secondary lymphoma, or had human immunodeficiency virus infection. Patients with localized disease (stage I or stage II) and patients who had received chemotherapy before hospitalization were also excluded. The patients received chemotherapy in accordance with the modified Non-Hodgkin Lymphoma Berlin-Frankfurt-Münster 1995 (BFM-95) protocol [8, 9]. Intrathecal injections (iT) were administered as the protocol required, and cranial radiotherapy was given to patients with central nervous system (CNS) involvement. The details of the risk groups, course of treatment and drugs used in the modified BFM-95 protocol are listed in the Supplementary material (S1-S4).
The Ethics Administration Office of CHCMU granted ethics approval for this research. Informed consent was obtained from the patients or their guardians. Clinical data, laboratory findings and the outcomes of the enrolled patients were collected from the medical record system and analyzed retrospectively.
Pathologic diagnosis
Pathologic diagnoses of the BL patients were confirmed by lymph node (LN) biopsies. Immunohistochemical (IHC) staining was performed for markers including MYC, Ki-67, CD20, BCL-2, BCL-6, PAX-5, CD10, CD7, CD3, and MUM-1. Fluorescence in situ hybridization (FISH) of MYC, MLL, BCL-2, BCL-6 and EBER was also performed [10].
Diagnosis and classification of the BAL patients were based on bone marrow (BM) samples. BM samples were subjected to FAB typing, flow cytometry (FCM), cytogenetic analysis of the chromosomal karyotype, and FISH of MYC, ETV6-RUNX1, MLL, BCR-ABL1, TCF3-PBX1 and PDGFRB. In addition, 43 fusion genes were assayed by multiplex RT-PCR as described previously [11].
DNA, RNA isolation and sequencing
Tumor DNA samples from the BL or BAL patients were obtained from formalin-fixed specimens or BM samples at diagnosis, and germline samples were collected from the oral mucosa of the patients and from the peripheral blood (PB) of their parents. Genomic DNA was extracted using a QIAamp DNA Mini Kit (QIAGEN, China). Exome capture was performed on the genomic DNA (Agilent SureSelect Human All Exon V6), and the PCR-amplified whole-exome libraries were then sequenced (Illumina HiSeq, PE 150 bp).
BM samples at diagnosis were collected from BAL patients, and total RNA was extracted using a QIAamp RNA Blood Mini Kit (Qiagen, Cat.52304). An enriched and captured mRNA library was constructed and amplified using a KAPA mRNA HyperPrep Kit (KAPA/Roche, Cat.KK8581). The PCR-amplified mRNA-seq libraries were sequenced in PE150 mode (Illumina HiSeq X Ten).
The raw whole exome sequencing (WES) data were read using Illumina pipeline software (version 1.3.4), and the data were analyzed with reference to several databases, including dbSNP, 1000 Genomes Project, ClinVar, ESP6500, ExAC, Ensembl, HGMD and UCSC. Mutated genotypes were determined using GATK, LRT, MutationTaster and SAMtools.
All discovered variants were divided into the following four categories according to prior literature reports [12] and software analysis: 1) pathogenic genotypes that were confirmed by literature reports; 2) likely pathogenic genotypes that were reported in the literature and/or that affected the protein according to functional prediction; 3) indefinite variants; and 4) single nucleotide polymorphisms (SNPs) or single nucleotide variants (SNVs). Pathogenic and likely pathogenic genotypes were recorded as causal gene mutations. All causal gene mutations in the tumor samples were confirmed by Sanger sequencing. The control (germline) samples were also checked by Sanger sequencing to determine whether each causal gene mutation was somatic or germline.
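The four-category scheme above can be illustrated with a minimal Python sketch. The field names, the order in which the rules are checked, and the `Variant` class itself are assumptions made for illustration only; the actual classification combined literature review with software prediction and is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    # All fields are hypothetical flags, not outputs of any specific tool.
    gene: str
    confirmed_pathogenic: bool    # pathogenicity confirmed by literature reports
    reported_in_literature: bool  # reported, but pathogenicity not confirmed
    predicted_damaging: bool      # functional prediction suggests a protein effect
    is_polymorphism: bool         # known SNP/SNV without evidence of pathogenicity


def categorize(variant: Variant) -> str:
    """Assign one of the four categories; the rule order is an assumption."""
    if variant.confirmed_pathogenic:
        return "pathogenic"
    if variant.reported_in_literature or variant.predicted_damaging:
        return "likely pathogenic"
    if variant.is_polymorphism:
        return "SNP/SNV"
    return "indefinite"


def is_causal(variant: Variant) -> bool:
    """Pathogenic and likely pathogenic genotypes count as causal mutations."""
    return categorize(variant) in {"pathogenic", "likely pathogenic"}
```

Only variants for which `is_causal` returns true would proceed to Sanger confirmation in this simplified picture.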
PPI network construction and GO, KEGG pathway enrichment analysis
Mutations detected by WES were collected, and the R package wordcloud2 was used to visualize the frequency of mutations in these cases. We used STRING (STRING, http://string-db.org, RRID: SCR_005223) to identify protein-protein interactions (PPI) and visualized the interactions using Cytoscape (version 3.7.1). Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed using the R package clusterProfiler and the Biological Networks Gene Ontology tool (BiNGO, RRID:SCR_005736).
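Over-representation analyses of GO terms and KEGG pathways are commonly based on a hypergeometric test. The following Python sketch shows that calculation for a single term; it is an illustration of the general principle rather than the exact procedure or settings used in this study, and the function name and the numbers in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(universe: int, annotated: int, selected: int, overlap: int) -> float:
    """Upper-tail hypergeometric p-value for one GO term or pathway.

    universe:  number of background genes
    annotated: background genes annotated to the term
    selected:  genes in the list of interest (e.g. mutated genes)
    overlap:   selected genes that are annotated to the term
    """
    total = comb(universe, selected)
    upper = min(annotated, selected)
    # Probability of observing `overlap` or more annotated genes by chance.
    tail = sum(
        comb(annotated, i) * comb(universe - annotated, selected - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 10 background genes, 3 annotated, 4 selected, 2 overlapping.
p = enrichment_p_value(universe=10, annotated=3, selected=4, overlap=2)
```

In practice the p-values of all tested terms would then be adjusted for multiple testing.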
Identification of DEGs between the BL and DLBCL datasets
The gene expression microarray datasets GSE4475, GSE10172, GSE43677, and GSE48435 were downloaded from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/). We selected the BL and DLBCL datasets for downstream analysis. After quality control (QC), we obtained 41 BL samples and 178 DLBCL samples. Differential expression analysis was performed with the R package limma. Differentially expressed genes (DEGs) were screened using a Benjamini-Hochberg false discovery rate-adjusted p-value cutoff of 0.05 and an absolute value of fold change greater than 1. A volcano plot showing the DEGs was constructed using the ggplot2 package in R.
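The DEG screening criteria can be sketched in Python as follows. This is a simplified illustration and not the limma workflow itself: the moderated statistics are not computed, the function names are hypothetical, and the fold change is assumed here to be on the log2 scale, which the text does not state explicitly.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(genes, log2_fold_changes, p_values, alpha=0.05, min_abs_lfc=1.0):
    """Keep genes with adjusted p < alpha and |log2 fold change| > min_abs_lfc.

    Treating the fold change as log2-scaled is an assumption of this sketch.
    """
    adjusted = benjamini_hochberg(p_values)
    return [
        gene
        for gene, lfc, q in zip(genes, log2_fold_changes, adjusted)
        if q < alpha and abs(lfc) > min_abs_lfc
    ]
```

A gene therefore has to pass both the significance filter and the effect-size filter to be reported as a DEG.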
WGCNA analysis of the BL and DLBCL dataset
We used the R package for weighted gene coexpression network analysis (WGCNA) to construct coexpression modules, and 219 samples were used to calculate the Pearson correlation matrices. A soft-thresholding power of 6 was selected. An unsigned hybrid coexpression network was then calculated using standard settings. We selected 5000 genes to construct a topological heatmap. We calculated Pearson’s correlations between the module eigengenes and the trait data to identify module-trait relationships. Finally, we selected the blue module (related to BL) and red module (related to DLBCL) genes to construct a gene regulatory network and performed GO enrichment analysis.
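The role of the power of 6 can be illustrated with a short Python sketch of soft thresholding, in which the absolute Pearson correlation between two expression profiles is raised to the chosen power. The sketch assumes the plain unsigned adjacency and uses hypothetical function names and toy data; it does not reproduce the WGCNA package or the exact network type used in the study.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def unsigned_adjacency(profiles, power=6):
    """Soft-thresholded adjacency |cor|^power for every ordered pair of genes."""
    genes = list(profiles)
    return {
        (a, b): abs(pearson(profiles[a], profiles[b])) ** power
        for a in genes
        for b in genes
        if a != b
    }


# Toy profiles across three samples (hypothetical values).
toy = {"g1": [1, 2, 3], "g2": [1, 3, 2], "g3": [3, 2, 1]}
adjacency = unsigned_adjacency(toy, power=6)
```

Raising the correlation to a power suppresses weak correlations much more strongly than strong ones, which is why a moderate correlation of 0.5 contributes almost nothing to the network at a power of 6.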
Statistical Analysis
Events were defined as any of the following: refractory or relapsed disease, disease progression, death, or diagnosis of a secondary malignancy. Patients were followed up until April 2020, and data on clinical features, laboratory findings, WES or RNA-seq results, treatment responses, treatment-related mortality (TRM) and event-free survival (EFS) rates were collected and analyzed.
EFS was calculated from the date of diagnosis to the date of the first event, loss to follow-up, or the last follow-up, whichever came first. SPSS 19.0 (IBM Corp., Armonk, NY) was used for statistical analysis. Survival curves were estimated with the Kaplan-Meier method. Differences in proportions between patient groups were analyzed by Pearson’s chi-squared (χ2) tests or Fisher's exact tests. P values <0.05 were considered statistically significant.
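The Kaplan-Meier estimate of EFS can be sketched in Python as follows. The analysis in this study was run in SPSS; the function name and the follow-up values in the example are hypothetical and serve only to show how events and censored observations enter the estimate.

```python
def kaplan_meier(durations, events):
    """Return (time, survival probability) pairs at each time with an event.

    durations: follow-up time for each patient (e.g. months from diagnosis)
    events:    1 if an event occurred at that time, 0 if the patient was censored
    """
    pairs = sorted(zip(durations, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        n_events = 0
        n_removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(pairs) and pairs[i][0] == t:
            n_events += pairs[i][1]
            n_removed += 1
            i += 1
        if n_events > 0:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        # Patients with events and censored patients both leave the risk set.
        at_risk -= n_removed
    return curve


# Hypothetical follow-up data for five patients.
curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0])
```

Censored patients do not lower the curve themselves, but they reduce the number at risk for later events.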