Comparison of gene expression profile by LNM in CESC
The expression profiles of CESC patients with LNM were compared to those without metastasis to identify LNM-related differential expressed genes (DEGs). Standardizations and correction processing of the GSE26511 cohort were applied and illustrated with a boxplot diagram (Figure 1A, 1B). Samples from TCGA and GEO were stratified separately into lymph node positive (LN+) or lymph node negative (LN-) groups. As shown in the volcanic plot and heatmaps, a total of 899 DEGs and 875 DEGs were identified to be LNM - relevant DEGs from TCGA cohort and GSE26511 cohort, respectively. Twenty-two common DEGs were identified as LNM - relevant DEGs in LN+ and LN- groups (Figure 1C, D, E, F). ∣log2Fold change (FC)∣> - 4 and adjusted p-value < 0.001 were set as the cutoff criteria. The expression profiles of LNM-related DEGs were visualized, respectively, on the volcanic plot and heatmaps (Figure 1C, D, E, F). Unsupervised hierarchical clustering analyses revealed that the identified DEGs possess distinct expression patterns among analyzed CESC patients, and these DEGs could be used to effectively distinguish patients with or without LNM.
Functional enrichment revealed a significant correlation between DEGs and immuno-terms. Biological process (BP) terms enriched in the Gene Ontology (GO) analysis included “cell - cell adhesion via plasma-membrane adhesion molecules” and “keratinization” and “hormone metabolic process” in CESC cohort (Figure 2A, B), and “morphogenesis of a branching epithelium,” “epidermis development and epithelial cell proliferation-related” in the GSE26511 cohort (Figure 2E, F). The “TGF-beta learning pathway,” “Wnt signaling pathway,” and “Neuroactive ligand-receptor interaction” were enriched in the Kyoto encyclopedia of genes and genomes (KEGG) analysis in both cohorts (Figure 2C, G). Single-sample Gene Set Enrichment Analysis (GSEA) revealed immune-terms such as “the adaptive immune response,” “epidermal cell differentiation and keratinization epithelial-mesenchymal transition pathway” in both cohorts (Figure 2D, 2H). A significance value of p < 0.01 and false discovery rate (FDR) < 0.05 were set as thresholds. See supplementary information for detailed enrichment results.
Construction of lncRNA-mRNA co-expression modules of CESC
To construct the lncRNA-mRNA based molecular modules for LNM of CESC, we first compared Affymetrix microarray data from GSE26511 cohort and RNA-seq data from TCGA cohort. Venn diagrams showed 22 DEGs (DUOXA2, PLA2G2A, LRRC17, EMX2, PENK, RCAN1, NLRP7, SOX17, TAC1, GPC4, KRT4, ERP27, MUCL1, NLGN1, GCNT3, NRCAM, SOSTDC1, EFNA3, PIP5K1, NDP, BEX5, KRT75) from the comparison of GSE26511 and CESC cohorts (Figure 2A). Then, the significant differentially expressed lncRNAs (DE-lncRNAs) with a t-test p-value <0.01 and FDR < 0.15 between LN+ and LN- groups from the CESC cohort were determined using an unpaired two-tailed Student’s t-test. The multiple testing correction was conducted using FDR defined by Benjamini - Hochberg. Volcano plots and heatmaps showed distinct gene expression profiles of cases belonging to LN + vs. LN – groups (Figure 3A, 3B). Subsequently, for comparison - based investigation on LNM, 480 genes were upregulated and 127 genes downregulated in the LN+ compared with LN- group. Fold change >1.5 and p < 0.05 were set as cutoff criteria. Finally, we compared the 22 differentially expressed genes and DE-lncRNAs obtained from TCGA cohort, extracted, and established an mRNAs-lncRNAs interaction network of DEGs and DE-lncRNAs to better understand the interplay among the identified DE-lncRNAs((Figure 3C.). Finally, we identified 106 LNM-relevant lncRNAs which could effectively distinguish patients with or without LNM.
Construction of lncRNA-mRNA co-expression modules of CESC
Using a cross-validation approach with the TCGA cohort, 12 DE-lncRNAs were identified as showing a significant correlation between lncRNA expression and OS. LASSO regression analysis was performed to select genes with the best prognostic value and to build an IPS in the TCGA cohort.
To determine which DE-lncRNAs might be a significant OS biomarker for patients with CESC at an early stage, LASSO logistic regression analysis was performed to construct the prognosis-predicting models in the training set at a 20-fold cross validation and DE-lncRNAs with the best prognostic values were selected (Figure 4A and 4B). The dashed perpendicular line delineated the first-rank value of logλ with the minimum segment likelihood bias and LNM-relevant lncRNAs were obtained (Figures 4A). The most significant differential expression lncRNAs(DE-lncRNAs) showing the survival status and survival time of patients with CESC in the early stage between high-risk and low-risk score groups are shown in Figure 4B. The relative expression standards of the LNM-relevant lncRNAs for each patient are displayed in Figure 4B. The survival analysis demonstrated that the OS of the low-risk score group was longer than that of the high-risk score group (p < 0.001) (Figure 4C). The areas under the time-dependent ROC curves (AUCs) for 1-, 2-, 3‐year OS were 0.73, 0.83, 0.83, respectively, indicating a good predictive performance of this prognostic model (Figure 4D).
Immune cell infiltration assessment and its correlation analysis with prognostic markers
The immune cell infiltration between the LNM sample in early cervical cancer and the control sample was investigated (Figure 5A). The relationship between 64 immune cells and matrix cell is illustrated in Figure 5B. Patients in the high-risk score group had lower ratios of B-cell, CD8+ T cell and CD4 + T cell (p<0.05) compared with those in the low-risk score group (p < 0.05). Moreover, we determined that the expression levels of class-switched memory B-cells, memory B-cells and naive B-cells in the high-risk score group were clearly lower than those in the low-risk score group (p < 0.05) (Figure 5C).
Tumor mutation spectrum and CNV analysis
We analyzed the somatic variations in 138 cervical cancer samples from the TCGA cohort. Risk scores were calculated and the mutation spectrum was investigated through a waterfall map of the high-risk and low-risk score groups, and various colors with annotations represented different types of mutations (Figure 6A, B). We performed a Genomic Identification of Significant Targets in Cancer (GISTIC) analysis to calculate CNV in different risk score groups. The CNV including chromosome arms 2q, 11q and 18q deletion, 11, 19q and 17q amplification were the most common form in the high-risk score group. The common extension regions were 11q22.1 and 17q12, and the common deletion region was2q37.3 and 11q23.3 (Figure 6C). The CNV including chromosome arms 11q, 19q and 14q deletion, 11q amplification were the most common form in the low-risk score group. The common extension regions were 11q22.1 and 3q28, and the common deletion regions were 11q24.2 and 2q37.3 (Figure 6D). Moreover, we calculated the tumor mutational burden (TMB), TNB, MSI and mutant-allele tumor heterogeneity (MATH), and the level of MSI showed significant difference between the two groups (p = 0.026) (Figure 6E – H).
Prediction of chemotherapy and targeted treatment responses and their correlation with prognostic markers
We attempted to evaluate the response of the two risk scoring systems to all chemotherapeutic drugs in the Genomics of Drug Sensitivity in Cancer (GDSC) database and identify the applicable drugs that triggered significant inhibition of LNM in early CESC. The LASSO regression analysis was applied to train the prediction model and the structure of the network prediction model in GDSC cell line data of the GDSC database; the prediction accuracy was verified 10 times through cross validation. PF4708671and GW4417556 were significantly different than the IC50 predictive value between the high-risk and low-risk score groups (Figure 7A). Next, we performed a Tumor Immune Dysfunction and Exclusion (TIDE) algorithm to predict the possibility of immune response, however, there was no significant difference between high-risk and low-risk score groups (p = 0.62) (Figure 7B). However, when we compared the profiles of the two risk scoring systems self-defined with published profiles of 47 melanoma patients who responded to immunotherapy[22] by subclass mapping, we observed that the low-risk score group was more promising for responding to the PD-1 treatment (Bonferroni correction p = 0.02) (Figure 7C).
Risk coefficient of LNM - relevant lncRNAs and the correlation with clinical features
The LNM- relevant lncRNAs may precisely predict OS of patients with early-stage cervical cancer from TCGA cohort through the univariate and multivariate regression method. A comparison of different clinical variants between high-risk and low-risk score groups is demonstrated in Figure 8A. In addition, we performed risk stratification with age, body mass index, history of tumors, pregnancy, oral contraception, pathological grade, and TMN stage and explored the clinical significance for patients with cervical cancer (Figure 8B, 8C). The LNM- relevant lncRNAs provide important prognosis predictions for patients with CESC. In the Cox regression analysis, tumor stage, pregnancy history, and the number of pregnancies were confirmed to be independent prognostic factors of patients with CESC. For the quantitative prediction of prognosis, we constructed a prediction model containing the independent factors by the training set and developed a nomographic chart (Figure 8D). The calibration curve verified that a good agreement was obtained between the predictive model and desirable model for the prognosis of patients with CESC (Figure 8E, F) between the high-risk and low-risk score groups.