Patient samples and inclusion criteria
This study was approved by the Harbin Medical University Ethics Review Board (Harbin, China). The study design and patient selection strategy have been published previously.8, 9 Briefly, a total of 168 patients from our initial prospective cohort of CRC patients were included according to the inclusion criteria. All patients provided written informed consent. The inclusion and exclusion criteria were as follows: (1) all patients were newly diagnosed with stage I-IV primary CRC, and the diagnosis was histologically confirmed by a senior pathologist (HL); (2) fresh frozen tumour tissues were collected from all patients; (3) patients with any other concurrent cancer were excluded (n = 3); (4) patients with a family history of CRC in first-degree relatives were excluded (n = 5); (5) patients who received anti-cancer therapy before surgery were excluded (n = 11).
All CRC patients were diagnosed and underwent surgery at the First Affiliated Hospital and the Third Affiliated Hospital of Harbin Medical University between May 2010 and December 2012. The tumour specimens were staged according to the seventh edition (2009) of the AJCC TNM staging system. The clinical characteristics and medical records were collected. The primary outcome was overall survival (OS), defined as the time from surgery to death from any cause. The secondary outcome was disease-free survival (DFS), defined as the time from surgery to local or regional relapse, distant metastasis, or CRC-specific death, whichever came first. Outcomes were observed during the follow-up period through March 15, 2018 via an established protocol. Postoperative patients were followed up at 3–6-month intervals for the first year and then annually. We used a telephone-delivered follow-up questionnaire to collect information on the date and cause of death of CRC patients. The recorded date and cause of death of each CRC patient were validated using the medical certification of death and the Harbin Death Registration system. Four cases lacked follow-up data and were excluded from this analysis. Among the remaining 164 eligible CRC patients, the median follow-up period was 61.1 months (range, 4.9 to 80.8 months), and 75 patients died.
RNA extraction and qRT-PCR assays
Fresh tumour tissue samples were collected and immediately stored at −80 °C. Total RNA was extracted from fresh frozen tissues (0.5 g) using TRIzol reagent (Invitrogen). cDNA was reverse transcribed from 2 µg of total RNA using MultiScribe™ reverse transcriptase (Applied Biosystems). The RNA and cDNA concentrations were measured using a NanoDrop 2000c (ThermoFisher, USA). cDNA was then amplified and quantified by quantitative reverse transcriptase polymerase chain reaction (qRT-PCR) with Fast SYBR® Green Master Mix (Applied Biosystems) on the LightCycler 480 platform (Roche). The housekeeping gene GAPDH was selected as an internal control. A no-template control was included in each batch, and all reactions were performed in triplicate. The primer sequences were as follows. MALAT1 (NR_002819.4): F (5’-GCTCTGTGGTGTGGGATTGA-3’), R (5’-GTGGCAAAATGGCGGACTTT-3’); GAPDH (NM_002046.7): F (5’-GGTGGTCTCCTCTGACTTCAACA-3’), R (5’-CCAAATTCGTTGTCATACCAGGAAATG-3’). Melting curve analysis was used to monitor the specificity of the PCR reactions. The resulting data were analysed using the Gene Scanning and Tm Calling modules (Roche). Two co-authors (HL and YXZ), who were blinded to outcomes, independently recorded the results. The relative expression level of MALAT1 was determined using the 2^(−ΔCt) method. The ΔCt value of each sample was calculated by subtracting the average Ct value of GAPDH from the average Ct value of MALAT1. According to the median value of 2^(−ΔCt), patients were categorized into higher or lower MALAT1 expression groups.
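As an illustration of the quantification step, the following minimal sketch computes 2^(−ΔCt) from triplicate Ct values and splits samples at the median. It assumes the conventional definition ΔCt = mean Ct(MALAT1) − mean Ct(GAPDH) and assigns values equal to the median to the higher group; the function names and the tie rule are illustrative assumptions, not part of the original protocol.

```python
from statistics import mean, median


def relative_expression(ct_target, ct_reference):
    """Return 2^(-dCt), where dCt = mean Ct(target) - mean Ct(reference).

    Assumption: conventional delta-Ct direction (target minus reference).
    """
    delta_ct = mean(ct_target) - mean(ct_reference)
    return 2.0 ** (-delta_ct)


def split_by_median(values):
    """Label each value 'higher' (>= median) or 'lower' (< median).

    Assumption: ties with the median go to the 'higher' group.
    """
    cutoff = median(values)
    return ["higher" if v >= cutoff else "lower" for v in values]


# Hypothetical triplicate Ct values for one sample (not study data).
example = relative_expression([24.0, 24.0, 24.0], [20.0, 20.0, 20.0])
print(example)  # 0.0625, i.e. 2^(-4)
```

A larger ΔCt means the target crosses the threshold later than the reference, so 2^(−ΔCt) decreases as relative expression falls.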
External validation dataset
The colorectal dataset (CORD) from TCGA was used as the external validation population. The MALAT1 expression profile data, the clinicopathologic information, and the survival data were downloaded from the TCGA database and the UCSC Xena resource.20, 21 After excluding patients without MALAT1 expression data (n = 102) or survival data (n = 30), a total of 596 patients were included in our analyses, comprising 475 patients with colon cancer and 121 with rectal cancer. The median follow-up period for these 596 patients was 22.5 months (range, 0.2 to 150.1 months), and 121 patients died.
The gene expression RNA-seq-HTSeq-FPKM-UQ dataset for TCGA colon and rectum adenocarcinoma was obtained using the UCSC Xena website tools and then used in our analyses. The relative MALAT1 expression level was presented as an N-fold difference, termed ‘NMalat1’, which was determined by dividing the MALAT1 expression value by the GAPDH expression value. Patients were then categorized into the higher (≥ median of NMalat1) or lower (< median of NMalat1) groups.
We used a Cox proportional hazards regression model to calculate the sample size. Given a pre-estimated overall survival rate of 50% in the initial cohort population, a sample size of 128 patients was required to achieve 90% power to detect an estimated hazard ratio (HR) of 1.5 at a two-sided significance level of 5%. We then included an additional 25% of patients and targeted a total sample size of 164 patients. The sample size was estimated using PASS software (version 11.0.7, NCSS LLC., USA).
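The sample size calculation can be sketched with the formula commonly used for Cox regression (Hsieh and Lavori), in which the required number of events is (z_(1−α/2) + z_(power))² / (σ² · (ln HR)² · (1 − R²)). The exact PASS settings are not stated in the text; the sketch below assumes a covariate standard deviation σ of 1 and no correlation with other covariates (R² = 0), and takes the event rate as 1 minus the pre-estimated survival rate. Under these assumptions it returns 128.

```python
from math import ceil, log
from statistics import NormalDist


def cox_sample_size(hr, event_rate, alpha=0.05, power=0.90, sd_x=1.0, r2=0.0):
    """Total sample size for a Cox model (Hsieh-Lavori style formula).

    Assumptions (not stated in the source): sd_x = 1 and r2 = 0.
    """
    normal = NormalDist()
    z_sum = normal.inv_cdf(1 - alpha / 2) + normal.inv_cdf(power)
    # Number of events needed to detect log(hr) with the requested power.
    events = z_sum ** 2 / (sd_x ** 2 * log(hr) ** 2 * (1 - r2))
    # Inflate by the expected proportion of patients who experience the event.
    return ceil(events / event_rate)


# 50% pre-estimated overall survival -> 50% expected event rate.
print(cox_sample_size(hr=1.5, event_rate=0.5))  # 128
```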
We reported means (standard deviations) and counts (frequencies) for continuous and categorical variables, respectively. To minimise covariate differences between groups, we performed a propensity score (PS)-based analysis.22 Group differences were compared using the standardised differences method, with a standardised difference ≥ 25% indicating a significant imbalance. The PS value was calculated using a multivariable logistic regression model with MALAT1 expression level as the dependent variable and demographic factors and clinical/pathological characteristics as covariates. We used the PS-adjustment method in order to incorporate all patients in our analysis.23
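The balance check can be sketched as follows, using the usual definitions of the standardised difference for continuous variables (difference in means divided by the pooled standard deviation) and binary variables (difference in proportions divided by the pooled binomial standard deviation), expressed as percentages. The function names and example inputs are illustrative assumptions.

```python
from math import sqrt
from statistics import mean, variance


def std_diff_continuous(group_a, group_b):
    """Standardised difference (%) for a continuous covariate."""
    pooled_sd = sqrt((variance(group_a) + variance(group_b)) / 2)
    return 100 * (mean(group_a) - mean(group_b)) / pooled_sd


def std_diff_binary(p_a, p_b):
    """Standardised difference (%) for a binary covariate, given proportions."""
    pooled_sd = sqrt((p_a * (1 - p_a) + p_b * (1 - p_b)) / 2)
    return 100 * (p_a - p_b) / pooled_sd


def is_imbalanced(std_diff_percent, threshold=25.0):
    """Flag imbalance using the 25% threshold named in the text."""
    return abs(std_diff_percent) >= threshold


# Hypothetical proportions of a covariate in the two expression groups.
d = std_diff_binary(0.6, 0.4)
print(round(d, 1), is_imbalanced(d))
```

Unlike a P value, the standardised difference does not depend on sample size, which is why it is often preferred for judging covariate balance.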
Survival curves were estimated by the Kaplan-Meier method, and differences in survival between groups were examined with log-rank tests. Univariate and PS-adjusted multivariable Cox proportional hazards regression models were used to assess prognostic significance, and the results were reported as hazard ratios (HRs) and 95% confidence intervals (CIs). The associations between MALAT1 expression status and clinical/pathological covariates were reported as odds ratios (ORs) and 95% CIs. Statistical significance was defined as a two-sided P < 0.05. All statistical analyses were conducted with SPSS Statistics (v.23.0, IBM, USA).
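For readers unfamiliar with the Kaplan-Meier estimator, a minimal pure-Python sketch is given below. It is not the SPSS procedure used in the study; it only illustrates the product-limit calculation, in which survival is multiplied by (1 − deaths / number at risk) at each distinct event time. The example data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times:  follow-up time per patient
    events: 1 if the patient died at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after time t.
        at_risk -= leaving
    return curve


# Hypothetical follow-up (months) and death indicators.
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
# [(1, 0.75), (3, 0.375), (4, 0.0)]
```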
Several predesigned sensitivity analyses were performed to explore the robustness of the results. Firstly, we compared the univariate HR and the PS-adjusted HR using the confounding RR,24 which was calculated to evaluate the relative impact of the PS adjustment on the results. Secondly, we performed a conventional multivariable Cox regression analysis as a sensitivity analysis. Additionally, for the external cohort population, we performed a post hoc sensitivity analysis by excluding patients with a short follow-up duration (≤ 1 or ≤ 3 months) in order to explore the potential confounding impact. Finally, we performed extensive post hoc subgroup analyses according to clinical/pathological factors. In the post hoc subgroup analyses, we used the Bonferroni adjustment method to correct the level of statistical significance.
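Two of these checks reduce to simple arithmetic, sketched below. The sketch assumes that the confounding RR is defined as the ratio of the unadjusted to the adjusted estimate (the text does not spell out the formula), and that the Bonferroni correction divides the nominal significance level by the number of subgroup tests. All numbers are hypothetical.

```python
def confounding_rr(crude_hr, adjusted_hr):
    """Ratio of crude to adjusted HR (assumed definition of the confounding RR).

    Values near 1 suggest the adjustment changed the estimate very little.
    """
    return crude_hr / adjusted_hr


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical estimates, not results from the study.
print(confounding_rr(2.0, 1.6))    # 1.25
print(bonferroni_alpha(0.05, 10))  # threshold for 10 subgroup tests
```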
In order to better understand the current evidence for the association between MALAT1 expression and CRC prognosis, we systematically reviewed the relevant studies and performed a meta-analysis. We searched PubMed, Embase, and ProQuest through May 25, 2020 for eligible studies assessing the prognostic significance of MALAT1 expression for CRC patient outcomes. The inclusion criteria were as follows: (1) prospective cohort studies addressing the prognostic associations between MALAT1 and CRC outcomes; (2) studies that reported effect estimates, including HRs with corresponding CIs; (3) studies with a sample size of more than 50 participants; (4) no restriction on language, race, or any other participant characteristics. Data extraction was conducted independently by two co-authors (HL and YXZ). The maximally adjusted effect sizes and 95% CIs were extracted and summarised using random-effects models. The Q test and the I² statistic were used to assess between-study heterogeneity. The pooled effect estimates were presented as forest plots. We performed an E-value analysis,25 as a post hoc sensitivity analysis, to explore whether an unmeasured confounding factor could explain the observed associations.
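The pooling step can be illustrated with a short sketch. The text specifies random-effects models but not the estimator, so the DerSimonian-Laird method is assumed here; standard errors are recovered from the reported 95% CIs on the log scale. The E-value uses the standard formula RR + sqrt(RR × (RR − 1)) and treats the HR as an approximate risk ratio, which is a further assumption. Inputs are hypothetical.

```python
from math import exp, log, sqrt


def se_from_ci(lower, upper, z=1.959964):
    """Standard error of log(HR) recovered from a 95% CI."""
    return (log(upper) - log(lower)) / (2 * z)


def random_effects_pool(hrs, ses):
    """DerSimonian-Laird pooling (assumed estimator) of hazard ratios."""
    y = [log(h) for h in hrs]
    w = [1 / s ** 2 for s in ses]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(y) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # I-squared (%)
    w_re = [1 / (s ** 2 + tau2) for s in ses]
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = sqrt(1 / sum(w_re))
    return {
        "hr": exp(pooled),
        "ci": (exp(pooled - 1.959964 * se), exp(pooled + 1.959964 * se)),
        "q": q,
        "i2": i2,
        "tau2": tau2,
    }


def e_value(rr):
    """E-value for a ratio estimate; estimates below 1 are inverted first."""
    rr = rr if rr >= 1 else 1 / rr
    return rr + sqrt(rr * (rr - 1))


# Hypothetical study-level HRs with 95% CIs (not extracted data).
studies = [(2.1, 1.2, 3.7), (1.6, 1.0, 2.6), (2.8, 1.3, 6.0)]
result = random_effects_pool(
    [hr for hr, _, _ in studies],
    [se_from_ci(lo, hi) for _, lo, hi in studies],
)
print(result["hr"], result["i2"], e_value(result["hr"]))
```

When the estimated between-study variance is zero, the random-effects result coincides with the fixed-effect (inverse-variance) estimate.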