Datasets
The Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/) database was searched to find the gene expression profiling datasets of patients with DLBCL. Only datasets containing clinical metadata (especially overall survival) (7 datasets) were retained, and the rest were excluded. Additionally, every effort was made to select expression datasets from all types of microarray chips such as Affymetrix and Illumina, if possible. The datasets were downloaded in SOFT file format and were subsequently transformed logarithmically using tools provided in geWorkbench 2.5.1 package [14], if necessary. More details on the clinical characteristics of the studied datasets are provided in Table 1. The datasets included GSE10846 (n= 420), GSE31312 (n=470), GSE32918 (n=172), GSE69051 (n=157), GSE4475 (n=123), GSE11318 (n=203), and GSE34171 (n=91). Since GSE32918 and GSE69051 have originated from a similar research study [15] and have some common samples, they were merged as a single data set and termed “GSE32918/69051”. The number of samples for these datasets was determined after corrections were made based on the common samples (172 samples for GSE32918 and 157 samples for GSE69051). In addition, the genetic aberrations of the desired gene(s) at the genome level were evaluated by employing 3 single nucleotide polymorphism (SNP) array data sets—namely GSE58718 (n=242), GSE57277 (n=148), and GSE34171 (n=180). GSE34171 contains both gene expression and SNP data (Table 1).
Identification of the common gene(s) associated with survival in the gene expression data sets
The association between gene expression and overall survival was examined using the univariate Cox proportional hazards analysis. In this analysis, the association between a group of covariates (genes) and the response variable (overall survival) was evaluated. The univariate Cox analysis was performed using BRB-Array tools, developed by Richard Simon and the BRB-ArrayTools Development Team. In this analysis, the findings were strengthened by employing rigorous pipeline criteria and retaining only genes with a P value <0.001 and a false discovery rate (FDR) <5%. Subsequently, the common gene(s) significantly associated with overall survival between all the data sets was/were extracted. For this purpose, only common gene(s) with consistent associations were selected, while genes with inconsistent associations (negatively associated with overall survival in a dataset and positively associated with overall survival in another) were excluded. Moreover, the patients were categorized into 2 risk groups (high-risk vs. low-risk) based on the median of the selected common gene expression values (> median value vs. < median value), and overall survival was compared between the groups using the Kaplan–Meier analysis and log-rank test at a P value <0.01. The Kaplan–Meier analysis and the log-rank test were performed in SPSS 16.0 package (Chicago, USA). Since RTN1 was the only gene that fulfilled the criteria and was selected as the final gene, the subsequent analyses were exclusively performed on this gene.
As a confirmatory step, we checked whether RTN1 was differentially expressed between the 2 predefined survival classes using the significance analysis of microarray (SAM) analysis. In this analysis, 2 classes (long survival [≥5 y] vs. short survival [<5 y]) were created and, thereafter, the genes that were differentially expressed were detected. The SAM analysis was performed using the method added in BRB-Array tools. In this analysis, the FDR and the number of permutations were set at 5% and 1000, respectively.
Prognostic efficacy of the RTN1 gene in a multivariate model
The prognostic efficacy of RTN1 was also evaluated in a multivariate Cox proportional-hazards regression analysis, where the RTN1 gene expression and all the individual components of the international prognostic index (IPI) (ie, age, stage, lactate dehydrogenase level, Eastern Cooperative Oncology Group [ECOG] performance status, and number of extranodal sites) [16] were entered as covariate variables. Additionally, the molecular subtype (ie, ABC-like, GCB-like, and type 3) and sex were incorporated as another 2 variables into the model. The multivariate analysis was performed solely on the datasets with the clinical IPI data (i.e., GSE10846 and GSE31312). This analysis was carried out using Survival package (http://cran.r-project.org/package=survival) and SPSS 16.0 package (Chicago, USA).
Prognostic efficacy of the RTN1 gene in molecular subtypes of DLBCL
We also checked the prognostic worth of RTN1 in molecular subtypes of DLBCL (i.e., ABC-like, GCB-like, and type 3) using similar strategy described above. In brief, in each subtypes two risk groups constituted based on median of the RTN1 expression values (> median value vs. < median value) and then overall survival was compared between the groups using the Kaplan–Meier analysis and log-rank test at a P value <0.01.
Correlation between RTN1 and BCL2L1 expressions
One of the main targets of RTN1 is BCL2L1 via the inhibition of its anti-apoptotic activity [5]. Moreover, BCL2L1 is a member of BCL-2 protein families deregulated in lymphoma tumors, especially DLBCL [17-19]. Accordingly, the associations between 2 probe-sets of RTN1 (i.e., 203485_at and 210222_s_at) and 4 probe-sets of BCL2L1 (i.e., 206665_s_at, 212312_at, 215037_s_at, and 231228_at) were evaluated using correlation analysis. The correlations were graded based on the classification proposed by Papasouliotis et al (2006) [20](i.e., r=0.93 to 0.100 as excellent, r=0.80 to 0.92 as good, r=0.59 to 0.79 as fair, and r<0.59 as poor correlations). The correlation analysis was performed using SPSS 16.0 package (Chicago, USA) in all the datasets, and a P value <0.05 was considered significant.
Evaluation of RTN1 at the genome level in the DLBCL samples
For the assessment of the chromosomal aberrations of the RTN1 gene, 3 datasets—namely GSE58718 (n=242), GSE57277 (n=148), and GSE34171 (n=180)—were used to extract copy number variations (CNVs) from the SNP data. GSE58718 was generated based on Illumina HumanCNV370-Duov1 DNA Analysis BeadChip, while GSE57277 and GSE34171 were generated using Affymetrix Mapping 250K SNP Arrays. In brief, PennCNV package [21] was used to call and analyze the CNV data. For the Illumina datasets, signal intensity data in the form of log R ratios (LRRs) and B allele frequencies (BAFs) were directly generated from the downloaded raw file. For the Affymetrix datasets, LRRS and BAFs were calculated by processing raw intensity (.CEL) files in Affymetrix Power Tools (https://www.affymetrix.com/support/developer/powertools/changelog/index.html), followed by PennCNV-Affy package. Finally, these LRRS and BAFs were used to generate CNV calls. CNVs with lengths <1 kb, confidence scores <10, or containing <5 SNPs were discarded. A CNV was considered to be a recurring acquired copy number alteration (rCNA) if it occurred in more than 2.5% of the patients and was not reported in the Database of Genomic Variants, build 36 (hg18) (DGV, http://projects.tcag.ca/variation/) [22]. The location of RTN1 was explored on chromosome 14 (14q23.1: Start 59,595,976/End 59,870,966) for chromosomal aberrations.
Verification of the results by quantitative real-time PCR (qRT-PCR)
Thirty patients (20 males and 10 females) with DLBCL were recruited. Formalin-fixed paraffin-embedded (FFPE) tissue specimens were used to assay RTN1 and BCL2L1 expressions via the qRT-PCR technique. Samples were retrieved from tissue banks at Shahid Bahonar Hospital and Shafa Hospital (Kerman University of Medical Sciences). The diagnosis of DLBCL was established by 2 expert pathologists according to the revised European/American lymphoma classification. The mean age of the patients was 60 years (range 22-81 years). The median follow-up was 49 months (range: 6–80 month). All patients were treated with a regimen that included an anthracycline (cyclophosphamide, doxorubicin, vincristine, and prednisone [CHOP] or CHOP-like regimens. The patients were divided into two survival groups (long survival [≥3 y] vs. short survival [<3 y]). The study protocol was approved by our institutional review board, and written informed consent was obtained from all the patients. We also used 7 normal lymph nodes as controls.
The qRT-PCR procedure was carried out as previously described [23-24] with some modifications. Total RNA was extracted from the FFPE tissues according to a method described elsewhere (https://bio-protocol.org/e161). In brief, paraffin was first removed from the FFPE tissues using 100% xylene and then total RNA was extracted using Tripure isolation reagent (Sigma-Aldrich, USA). The quality and the quantity of the isolated RNA were examined with a NanoDrop spectrophotometer (Roche, Germany). Complementary DNA (cDNA) was synthesized with a Maxime RT PreMix Kit (Parstous, Iran) according to the manufacturer’s instructions. The cDNA synthesis reaction was run at 47 °C for 60 minutes, followed by 85 °C for 5 minutes. SYBR green-based qRT-PCR was performed on the synthesized cDNA with the Roche Real-Time PCR System. Cycle conditions were 95 °C for 10 minutes, followed by 40 cycles of 95 °C for 15 seconds, 52 °C for 45 seconds, and 72 °C for 1 minute. Relative quantifications values were calculated as described previously, where for each gene, relative quantifications values > 1 and < 1 indicated the up-regulation and down-regulation of that gene, respectively [25]. The data are presented as fold changes in the gene expression level of the target gene. The fold changes in the gene expressions were compared between the 2 groups (long-survival vs. short-survival) using the Student t-test at a P value <0.05. The primers designed for RTN1, BCL2L1, and HPRT are presented in Additional Table 1. HPRT was employed as the reference gene for normalization.