Identication of a 12-LncRNA long Lymph Node Metastasis Prognostic Signature for Cervical Cancer

Background: Despite advances in treatment strategies, the prognosis in early-stage patients with lymph node metastasis (LNM) remains poor, and the 5-year overall survival is ~30%. Therefore, in this study, we aimed to explore the LNM-related long noncoding RNAs (lncRNAs) for the prognosis of cervical cancer patients. We obtained LNM-relevant mRNA and lncRNA expression proles, copy number variant (CNV), and clinical follow-up data of patients with early-stage cervical cancer from The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO) databases. Results: Patients in the training set were divided into a high-risk score group with shorter overall survival (OS) or a low-risk score group with longer OS (p < 0.001); similar results were also determined for the validation set (p < 0.001). In addition, the CNV including chromosome arms 2q, 11q and 18q deletion, 11, 19q and 17q amplication were the most common forms in the high-risk score group. PF4708671 and GW4417556 might be sensitive drugs in chemotherapy, and the programmed death-ligand 1 expression of the low-risk score group was higher than that of the high-risk score group. Conclusions: The prognostic model based on LNM-relevant lncRNAs could help determining the immune status of cervical cancer patients at an early stage; this is conducive to accurate clinical diagnosis and individual treatment.


Background
Cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC) occur in cervical epithelial cells and is a global public health problem with over 500,000 women diagnosed each year. On an average, over 300,000 patients die annually due to this disease. Lymph node metastasis (LNM) status is not only an important prognostic factor [1] but is also a critical reason for disease recurrence [2]. Consequently, prognosis remains extremely poor for early-stage patients with LNM compared to those without LNM, and the 5-year overall survival (OS) rate decreases by approximately 30% [3]. Despite therapeutic strategy (i.e., surgery, chemoradiotherapy, and immunotherapy) improvements, the human papillomavirus (HPV) vaccine, and introduction of targeted drug development, some patients have continued to demonstrate a poor response to treatment strategies and experience recurrence or metastases; ultimately dying from their disease [4]. Therefore, predictive biomarkers based on high risk of poor prognosis and unfavorable therapeutic e cacy of drugs may be crucial for investigating a novel strati cation of patients with CESC and creating individualized identi cation and treatment strategies of risk factors to further improve OS.
Tumor cells propagate to lymph nodes, which are the rst stations involved in avoiding immune-systemmediated clearance, and generate appropriate immune responses as an essential immunological hub [5] in various cancers. Lymph nodes also function as an important hub for the metastatic cell arrest, cell growth, and secondary distant metastasis. The tumor cells evolved in immune evasion mechanisms that decrease the immunogenicity may also result in LNM progression and metastasis of CESC [6]. Tumor cells evade the host immune system through immunosuppressive mechanisms, including initiating immune checkpoint inhibitory molecules such as cytotoxic T lymphocyte-associated antigen 4(CTLA4) and PD-L1 on tumor cells and inducing immune suppressive cells, for example myeloids, in the tumor environment (TME) [7,8]. Immunotherapy targeted PD-1, PD-L1, and CTLA4 protect tumor cells from immune clearance and provides a novel strategy to improve the clinical outcome of CESC patients [9]. During clinical application of immune checkpoint inhibitors on recurrent patients with CESC, various degrees of heterogeneity such as copy number variation (CNV) in intratumor heterogeneity tumor cells, have been demonstrated to engage in invalid therapy, acquired resistance, and lethal outcomes [3]. CNV is recognized as a structurally varying fragment in which copy number differences increase or decrease by 1 kB-3 Mb, and the susceptibility of CNVs results in gains or losses of genomic DNA. Somatic CNVs have provided insight into a series of tumors and can be used to distinguish regions of the genome implied in tumor phenotypes including LNM. However, the roles of the tumor in ltration lymphocytes(TILs) [10] and CNV [11] [5]during these processes remain unclear.
Long non-coding RNAs (lncRNAs), a group of RNA molecules longer than 200 nucleotides in length, are extremely important in epigenetics modulation, and lncRNAs dysregulation is crucial for multiple genetic changes of tumor cells and closely correlated with carcinogenesis, invasion, and metastasis [12]. Several previous studies demonstrated that lncRNAs are key modulators that regulate LNM via regulation of epithelial-mesenchymal transition, metastasis-relevant genes, signal pathways and interaction with TME [13,14]. LncRNA CCAT1, which is located in the vicinity of c-Myc, promotes the epithelialmesenchymal transition (EMT) of cervical cancer cells by sponging miR-181a-5p/ matrix metalloproteinase -14 (MMP -14) and heparin-binding EGF-like growth factor (HB-EGF) [15]axis. Moreover, further mechanisms have shown that CCAT1 binds to c-Myc, which is an oncogene, and thus, enhances tumor progression via Wnt/β-catenin signal pathway [16]. LncRNAs also act as a focal point to further understand the pathogenesis of CESC and offer additional treatment strategies. SNHG14 involved in PD-L1 expression, SNHG14/miR-5590-3p/ZEB, promotes the expression as a positive feedback loop to diffuse large B-cell lymphoma progression which results in the activity restriction of CD8 + T cells and implicates the immune evasion [17].
Recent advances have been made to investigate the correlation between lncRNA dysfunction and genomic and biological changes of tumor cells, tumor antigen presentation, activation of TILs and modulation of immunosuppressive cells to discriminate new CESC classi cation systems with different prognostic outcomes, as well as to determine the e cacy of chemotherapy and potential therapeutic targets [18,19], however, these have not achieved molecular taxonomic consensus. Our study made an innovative improvement on LNM-relevant classi cation which strati ed CESC patients at an early clinical stage into high-and low-risk score groups, and revealed distinct differences in patient prognosis, tumor in ltration lymphocytes, somatic CNV, immunotherapy e ciency, and chemotherapy e ciency. Then, a scoring system was constructed to evaluate the CESC patient comprehensive LNM pattern and LNMrelevant lncRNAs were used as an independent prognostic biomarker for evaluating clinical prognosis, and acted as predictors of immunotherapy and chemotherapy to provide important information for comprehensive therapy in CESC. Our study revealed a wide spectrum of interactive effects between the LNM signature and tumor phenotypes in CESC.
As tumor invasion and metastasis occur, tumor cells disseminate from primary sites, invade adjacent lymphatic vessels, extravasate into the peripheral circulation and transfer via hematogenous systems to supply vessels of instant organs [20]. LNM is a nonrandom multi-step procession which implies a tumorhost interaction and has high organ speci city [21].
In this current study, we obtained expression pro les and clinical data from patients with early-stage CESC from The Cancer Genome Atlas database (TCGA) and extracted lymph node relevant lncRNAs. Then, we constructed a lncRNAs LNM-relevant prognostic signature using LASSO Cox regression to predict the clinical outcome of patients with CESC. Twelve lncRNAs, which correlated with TILs, were established as a signature and we further veri ed the accuracy of the model in different cohorts to provide insight into prognosis predictions and responses to chemotherapy and immunotherapy in CESC.

Comparison of gene expression pro le by LNM in CESC
The expression pro les of CESC patients with LNM were compared to those without metastasis to identify LNM-related differential expressed genes (DEGs). Standardizations and correction processing of the GSE26511 cohort were applied and illustrated with a boxplot diagram ( Figure 1A, 1B). Samples from TCGA and GEO were strati ed separately into lymph node positive (LN+) or lymph node negative (LN-) groups. As shown in the volcanic plot and heatmaps, a total of 899 DEGs and 875 DEGs were identi ed to be LNM -relevant DEGs from TCGA cohort and GSE26511 cohort, respectively. Twenty-two common DEGs were identi ed as LNM -relevant DEGs in LN+ and LN-groups ( Figure 1C, D, E, F). |log 2 Fold change (FC)|> -4 and adjusted p-value < 0.001 were set as the cutoff criteria. The expression pro les of LNMrelated DEGs were visualized, respectively, on the volcanic plot and heatmaps ( Figure 1C, D, E, F). Unsupervised hierarchical clustering analyses revealed that the identi ed DEGs possess distinct expression patterns among analyzed CESC patients, and these DEGs could be used to effectively distinguish patients with or without LNM.
Functional enrichment revealed a signi cant correlation between DEGs and immuno-terms. Biological process (BP) terms enriched in the Gene Ontology (GO) analysis included "cell -cell adhesion via plasmamembrane adhesion molecules" and "keratinization" and "hormone metabolic process" in CESC cohort (Figure 2A, B), and "morphogenesis of a branching epithelium," "epidermis development and epithelial cell proliferation-related" in the GSE26511 cohort ( Figure 2E, F). The "TGF-beta learning pathway," "Wnt signaling pathway," and "Neuroactive ligand-receptor interaction" were enriched in the Kyoto encyclopedia of genes and genomes (KEGG) analysis in both cohorts ( Figure 2C, G). Single-sample Gene Set Enrichment Analysis (GSEA) revealed immune-terms such as "the adaptive immune response," "epidermal cell differentiation and keratinization epithelial-mesenchymal transition pathway" in both cohorts ( Figure  2D, 2H). A signi cance value of p < 0.01 and false discovery rate (FDR) < 0.05 were set as thresholds. See supplementary information for detailed enrichment results.

Construction of lncRNA-mRNA co-expression modules of CESC
To construct the lncRNA-mRNA based molecular modules for LNM of CESC, we rst compared Affymetrix microarray data from GSE26511 cohort and RNA-seq data from TCGA cohort. Venn diagrams showed 22 DEGs (DUOXA2, PLA2G2A, LRRC17, EMX2, PENK, RCAN1, NLRP7, SOX17, TAC1, GPC4, KRT4, ERP27, MUCL1, NLGN1, GCNT3, NRCAM, SOSTDC1, EFNA3, PIP5K1, NDP, BEX5, KRT75) from the comparison of GSE26511 and CESC cohorts ( Figure 2A). Then, the signi cant differentially expressed lncRNAs (DE-lncRNAs) with a t-test p-value <0.01 and FDR < 0.15 between LN+ and LN-groups from the CESC cohort were determined using an unpaired two-tailed Student's t-test. The multiple testing correction was conducted using FDR de ned by Benjamini -Hochberg. Volcano plots and heatmaps showed distinct gene expression pro les of cases belonging to LN + vs. LN -groups ( Figure  Construction of lncRNA-mRNA co-expression modules of CESC Using a cross-validation approach with the TCGA cohort, 12 DE-lncRNAs were identi ed as showing a signi cant correlation between lncRNA expression and OS. LASSO regression analysis was performed to select genes with the best prognostic value and to build an IPS in the TCGA cohort. To determine which DE-lncRNAs might be a signi cant OS biomarker for patients with CESC at an early stage, LASSO logistic regression analysis was performed to construct the prognosis-predicting models in the training set at a 20-fold cross validation and DE-lncRNAs with the best prognostic values were selected ( Figure 4A and 4B). The dashed perpendicular line delineated the rst-rank value of log λ with the minimum segment likelihood bias and LNM-relevant lncRNAs were obtained (Figures 4A). The most signi cant differential expression lncRNAs(DE-lncRNAs) showing the survival status and survival time of patients with CESC in the early stage between high-risk and low-risk score groups are shown in Figure 4B. The relative expression standards of the LNM-relevant lncRNAs for each patient are displayed in Figure   4B. The survival analysis demonstrated that the OS of the low-risk score group was longer than that of the high-risk score group (p < 0.001) ( Figure 4C). The areas under the time-dependent ROC curves (AUCs) for 1-, 2-, 3-year OS were 0.73, 0.83, 0.83, respectively, indicating a good predictive performance of this prognostic model ( Figure 4D).

Immune cell in ltration assessment and its correlation analysis with prognostic markers
The immune cell in ltration between the LNM sample in early cervical cancer and the control sample was investigated ( Figure 5A). The relationship between 64 immune cells and matrix cell is illustrated in Figure   5B. Patients in the high-risk score group had lower ratios of B-cell, CD8 + T cell and CD4 + T cell (p<0.05) compared with those in the low-risk score group (p < 0.05). Moreover, we determined that the expression levels of class-switched memory B-cells, memory B-cells and naive B-cells in the high-risk score group were clearly lower than those in the low-risk score group (p < 0.05) ( Figure 5C).

Tumor mutation spectrum and CNV analysis
We analyzed the somatic variations in 138 cervical cancer samples from the TCGA cohort. Risk scores were calculated and the mutation spectrum was investigated through a waterfall map of the high-risk and low-risk score groups, and various colors with annotations represented different types of mutations ( Figure 6A, B). We performed a Genomic Identi cation of Signi cant Targets in Cancer (GISTIC) analysis to calculate CNV in different risk score groups. The CNV including chromosome arms 2q, 11q and 18q deletion, 11, 19q and 17q ampli cation were the most common form in the high-risk score group. The common extension regions were 11q22.1 and 17q12, and the common deletion region was2q37.3 and 11q23.3 ( Figure 6C). The CNV including chromosome arms 11q, 19q and 14q deletion, 11q ampli cation were the most common form in the low-risk score group. The common extension regions were 11q22.1 and 3q28, and the common deletion regions were 11q24.2 and 2q37.3 ( Figure 6D). Moreover, we calculated the tumor mutational burden (TMB), TNB, MSI and mutant-allele tumor heterogeneity (MATH), and the level of MSI showed signi cant difference between the two groups (p = 0.026) ( Figure 6E -H).

Prediction of chemotherapy and targeted treatment responses and their correlation with prognostic markers
We attempted to evaluate the response of the two risk scoring systems to all chemotherapeutic drugs in the Genomics of Drug Sensitivity in Cancer (GDSC) database and identify the applicable drugs that triggered signi cant inhibition of LNM in early CESC. The LASSO regression analysis was applied to train the prediction model and the structure of the network prediction model in GDSC cell line data of the GDSC database; the prediction accuracy was veri ed 10 times through cross validation. PF4708671and GW4417556 were signi cantly different than the IC50 predictive value between the high-risk and low-risk score groups ( Figure 7A). Next, we performed a Tumor Immune Dysfunction and Exclusion (TIDE) algorithm to predict the possibility of immune response, however, there was no signi cant difference between high-risk and low-risk score groups (p = 0.62) ( Figure 7B). However, when we compared the pro les of the two risk scoring systems self-de ned with published pro les of 47 melanoma patients who responded to immunotherapy [22] by subclass mapping, we observed that the low-risk score group was more promising for responding to the PD-1 treatment (Bonferroni correction p = 0.02) ( Figure 7C).

Risk coe cient of LNM -relevant lncRNAs and the correlation with clinical features
The LNM-relevant lncRNAs may precisely predict OS of patients with early-stage cervical cancer from TCGA cohort through the univariate and multivariate regression method. A comparison of different clinical variants between high-risk and low-risk score groups is demonstrated in Figure 8A. In addition, we performed risk strati cation with age, body mass index, history of tumors, pregnancy, oral contraception, pathological grade, and TMN stage and explored the clinical signi cance for patients with cervical cancer ( Figure 8B, 8C). The LNM-relevant lncRNAs provide important prognosis predictions for patients with CESC. In the Cox regression analysis, tumor stage, pregnancy history, and the number of pregnancies were con rmed to be independent prognostic factors of patients with CESC. For the quantitative prediction of prognosis, we constructed a prediction model containing the independent factors by the training set and developed a nomographic chart ( Figure 8D). The calibration curve veri ed that a good agreement was obtained between the predictive model and desirable model for the prognosis of patients with CESC ( Figure 8E, F) between the high-risk and low-risk score groups.

Discussion
CESC remains the second leading cause of female death worldwide and is a growing concern to the public. Despite the advances in immune checkpoint inhibitor treatment, over three-fourths of patients suffered from recurrences within the rst 2 -3 years, even with the standard treatments [23]. LNM, the most important route of metastasis in CESC, showed a greater resistance for chemotherapy and should be critically determined by the prognosis [24,25]. Recently, accumulating evidence supports the contribution of lncRNAs to a wide range of tumor cell metastasis by affecting TME, EMT [26,27]immune cell in ltration initiation [28][29][30], and tumor antigen-presentation [31]; remarkably interacting with immune cell in ltration [32] and genetic change [33] in CESC. However, LNM -relevant lncRNAs -based tools for the prognosis of CESC remain undetermined. Our study proposed a LNM-relevant lncRNAs signature which exhibited prediction accuracy in both the training and validation set. It can also be utilized as an independent prognosis predictor for survival surveillance and a therapeutic indication for more precise treatment for early-stage CESC. Recently, studies on lncRNAs identi ed that the LncRNA CCHE1 in CESC patients was enhanced and associated with large tumor size, advanced Federation of Gynecology and Obstetrics (FIGO) stage, and indicated poor prognosis [34]. LncRNA H19 acts as a molecular sponge of miR-138-5p upregulation and exhibited signi cant association with advanced FIGO stage and LNM, tumor differentiation and poor prognosis of cervical cancer patients through H19/miR-138-5p/SIRT1 axis [35]. This evidence veri ed the importance of lncRNAs in LNM and its wide application prospects on the prognostic prediction of CESC, and emphasized the importance of more detailed investigations on lncRNAs dysfunction.
Hence, the samples were grouped according to risk score or the gene expression using LASSO logistic regression analysis and immune related RNAs analysis and we built a twelve lncRNA signature. The AUC value was higher than 0.7 in OS for 1 year, 3 years, and 5 years, which supported our twelve lncRNA signature as having better prediction power than other clinical factors.
The correlation between lncRNAs themselves and TIL types and numbers was a primary emphasis of our study. The number and composition of TILs re ected the process in the host anti-tumor immunity and interaction with tumor cells and the TME, and provided a marker for prognosis and treatment e cacy [36]. The functions during the tumor immunoediting procession revealed that the cytotoxic immune response was characterized by CD4+, CD8+, antigen -presenting cells, and the in ltration of other lymphoid elements [37]. The presence of CD8+ and CD4+ T cells is a negative metastasis index for a better outcome of the CESC patients, with systemic tumor-speci c immune response [38]. CD8+ T cells, the major population of cytotoxic T lymphocytes (CTLs), are responsible for killing tumor cells in an MHC class Imatched manner, while CD4+ cells mainly exert to CD8+ cells [39,40]. CD8+ T cells promoted the progression of HPV-mediated CESC with the immunosuppressive cytokine IL-10 [41]. In recent years, immune therapy has emerged as a new strategy for activating the immune system and enhancing the capability of killing tumors. A few studies have demonstrated that in the isolated mononuclear cells from tumor tissue, lymphocytes consisted of higher proportions of B-cells (CD19+) and T cells 53 compared with normal cells, but their role in CESC remained unclear [42]. In our ndings, risk score and the abundance of immune cells in CESC were investigated. The twelve lncRNA signature had a negative relationship with B-cells, CD8+ and CD4+ T cells. Class-switched memory B-cells, memory B-cells and naive B-cells showed a negative relationship with risk score. Our results might offer a promising direction in this eld, as previous studies on B-cells in CESC are limited.
We also investigated the predictive value of immunotherapy and chemotherapy. Copy number ampli cation or deletion often results in abnormal expression of oncogene and tumor suppressor genes, as well as tumor cells abnormalities, including cell adhesion and recognition [43]. Prior evidence has con rmed that an increase in the copy number of the PD-L1 gene was closely associated with the anti-PD-1 / PD-L1 treatment of locally advanced cervical cancer [44]. Therefore, we analyzed the copy number pro le of 138 samples from TCGA cohort and grouped samples according to risk score after the removal of germline CNV data. We screened the effective chemotherapeutic drugs and discovered a new target of PF4708671 and GW4417556.
Based on the two different prognostic models, we validated the superior ability in dividing high-and lowrisk score groups and the strong effect on prediction for chemotherapy response. However, a difference in immunotherapy response was not identi ed. The patients in the low-risk score group showed a favorable prognosis, which indicated that the two models might be capable of risk classi cation. However, when compared at depth with the expression pro le of a melanoma dataset, patients in the low-risk score group presented with a higher level of immune checkpoint molecules and showed higher immunogenicity. This implied that the two models might be an index to evaluate chemotherapy e cacy. In addition, the prognostic risk model based on the twelve lncRNA signature exhibited accurate prognostic performance of patients with lymph nodes and could provide insights in prognostic prediction for OS in patients with CESC.
There were some limitations in this study. Firstly, we lacked information on normal tissue, precancerous lesions, and advanced stage cells, thus, factors of other conditions were not considered during biomarker identi cation. Secondly, the results obtained using the bioinformatics analysis alone were not su ciently credible and as a result require further experimental validation. Therefore, genetic screening in a multicenter, large-sample clinical study is needed.

Conclusions
We identi ed LNM-relevant hub genes and co-expression lncRNAs of CESC, then established a twelve lncRNA signature to stratify patients with CESC into a high-risk score or low-risk score group to evaluate the prognosis of CESC patients. The underlying mechanisms of the lncRNAs signature in the LNM of CESC were preliminarily distinguished. This prognostic model may offer a promising approach for investigating genetic instability and immunotherapy, but its application value requires further investigation.

Data retrieval and preprocessing
The RNA-seq data, CNV and clinical follow-up data of 138 patients with CESC were downloaded from TCGA database (https://portal.gdc.cancer.gov/). Standardized expression pro le data GSE26511 of 39 patients with CESC were downloaded from the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/), with the Affymetrix Human Genome U133 Plus 2.0 Array expression platform used for an external validation. The inclusion criterion ensured that patients were followed up for more than 30 days and The International FIGO stage was at stage I-II. A total of 177 patients were included in our study and divided into a lymph node metastasis group (LNM+) and lymph node without metastasis group (LNM -). The CNV data of 177 samples were acquired after the removal of germline CNV data. All data were obtained from public datasets, and hence ethics approval and informed consent were not required.
Enrichment scores for two cohorts were calculated separately for each sample with the Single-sample Gene Set Enrichment Analysis (ssGSEA) algorithm. The "clusterPro ler" package [47] was used for both the functional enrichment analysis of GO and KEGG analysis to describe the potential function of differential expression mRNAs and identify GO categories and relevant pathways of protein networks, chemical information, and genomic information. FDR < 0.25 and adjusted p-value < 0.05 were set as the cutoff criteria and "msigdb_v7.0_GMTs" was set as the reference genes. Finally, DEGs were strati ed into LN+ and LN-groups by median risk score to annotate biological functions (1,000 permutations), and the co-expression of DEGs in two cohorts was set as LNM -relevant DEGs.

Construction of lncRNAs classi ers for LNM and function analysis
To identify the DE-lncRNAs in the TCGA-CESC cohort with | log2 FC | > 1, adjusted p < 0.05 volcano plots were performed using "ggplot2" package (R package) and 1.3 construction of lncRNAs classi ers was done for LNM and function analysis. DEGs associated with LNM and DE-lncRNAs in CESC cohort were screened and removed based on the co-expression to explore the lncRNAs that correlated with LNM. The Pearson correlation coe cient (PCC) > 0.3 and p < 0.05 were set as the threshold and the mRNA-lncRNA co-expression network was generated using Cytoscape software (version 3.8.0) Identi cation of lncRNA biomarkers associated with LNM and prognosis The primary endpoint for prognostic analysis in the study was OS, de ned as the time from the date of the rst diagnosis to the date of the death. According to the inclusion criteria: (i) primary tissues; (ii) patients' OS information was complete; and (iii) without neoadjuvant chemotherapy. Then, the LASSO logistic analysis, a statistical method, was employed to identify the most important DE-lncRNAs which may serve as prognosis indices of early clinical stages. Good prognosis Kaplan -Meier analysis and logarithmic-rank tests were applied to analyze the OS based on TCGA-CESC cohort which was set as the training cohort using the "survival" package (R package).
A ROC curve was generated to estimate the prediction of patient survival. The sensitivity and speci city of the prognostic models were analyzed by ROC curves using the " ROCR[48] " package (R package) and the accuracy and sensitivity were estimated by the area under the curve (AUC). In addition, OS was established based on the coe cients of LASSO logistic regression in the training cohort, which could differentiate between LN+ and LN-. The LASSO regression was applied to ensure that the lncRNAs selected were from well distinguished samples with different risk score levels, and that PCA was performed on the expression pro les of all lncRNAs prior to LASSO treatment and lncRNAs with a risk score value after LASSO treatment.
Immune in ltration analysis xCell (https://xcell.ucsf.edu/ ) is a novel gene signature-based method used to estimate cell subpopulations from the tissue expression pro le based on gene characteristics [49]. xCell performs an ssGSEA to estimate the component and abundance of 64 immune and matrix cell types, including 14 stromal cells. The "GSVA" R package (version 1.34.0) was applied to conduct an ssT analysis. The abundance scores were calculated separately for stromal and immune cells in 138 samples from TCGA cohort using the xCell algorithm on bulk tumor gene expression data and output samples (p < 0.05), and we then calculated the stromal and immune in ltration of 64 cells. We visualized the distribution of 64 cell in ltrations in each sample using the "pheatmap" package. The correlations between 22 immune cells were displayed on a heatmap using the "corrplot" package (p <0.05). Spearman correlations between LNM-relevant lncRNAs and in ltration of immune cells across human cancers were analyzed and visualized using the "ggplot2" package. The TIDE algorithm [50] and subclass mapping algorithm [51] were used to predict the response to immune checkpoint blockade.

Immune in ltration analysis
The somatic mutations of 138 CESC samples from TCGA were calculated and visualized by the "Maftools" package [52]. Variant annotations were performed and various plotting functions were analyzed to generate intricate publication-quality images. Then, TMB, MATH [53] and Microsatellite Instability (MSI) were calculated. The frequent changing regions in CESC samples with a q-value < 0.05 were identi ed using the GISTIC [54], including chromosomal arm level CNV and minimal common regions among the samples. The number of new antigens in the TCGA cohort were derived from the Cancer Immune Atlas (https://tcia.at/home).
Prediction of the LNM-relevant lncRNA according to chemotherapy GDSC[55] is a public pharmacology portal, accessible for sensitivity prediction of 138 chemotherapeutics. To determine effectiveness of chemotherapy, we calculated the half-maximal inhibitory concentration (IC50) with the Ridge Regression and veri ed 10-fold cross validation using the "pRRophetic" package[56]. All parameters were set, the batch effect and pathological type were removed, and gene duplication was summarized as the average.

Risk characteristics and prognosis
To evaluate the correlation between LNM-relevant lncRNAs and clinical characteristics, CESC samples were divided into high-risk and low-risk groups according to clinical characteristics such as age, T stage and AJCC stage. AJCC stage, based on survival time and risk score in subgroups, was calculated to con rm whether the ability to predict OS in various subgroups (including age, gender, T stage and AJCC stage) was retained. Next, CESC samples were grouped according to clinical characteristics and scored in subgroups in the validation and testing cohorts. Univariate and multivariate Cox regression analyses were utilized to evaluate the independent prognostic value of risk factor and risk scores in terms of OS. A nomogram was generated using the "rms" package and evaluated via a correction curve. The differential expression genes of 39 samples in GSE26511 cohort and 138 samples in TCGA-CESC cohort.
(A, B) Column diagram of the differential expression genes (DEGs) before and after sample correction in GSE 26511 cohort. (C, D) Volcano plot of the DEGs in GSE26511 cohort and TCGA cohort, the red represents upregulated DEGs. The blue represents downregulated DEGs, and the grey represents nondifferential genes. (E, F) The heatmap plot of DEGs in TCGA cohort and GSE26511 cohort. DEGs: Differential expression genes. The color from red to blue represents the P value from large to small  blue circle represents DE-lncRNAs. The color from red to blue represents the P value from large to small, the size of the circle represents the number of genes in the enrichment pathway, with a larger circle representing more gene data. Red connection between nodes represents positive correlation, and blue represents negative correlation.

Figure 4
Identi cation and validation of Lymph node metastasis relevant lncRNAs based on risk sore in TCGA cohort (A,) Lasso logistic regression analysis and principal component analysis. LASSO coe cient pro les of the 12-lncRNAs signature and plots of the ten-fold cross-validation error rates. (B) Distribution of risk score, survival status of each patient, and heat map expression of 12-lncRNAs in risk score high and risk score low groups. (C) Kaplan-Meier survival curves for overall survival (OS) from risk score high and risk score low groups strati ed by 12-lncRNA signature. (D) Time-dependent receiver operating characteristic (ROC) curve for predicting OS of 12-lncRNA signature in 1 year, 3 year, 5year.  axis: riskscore. Vertical axis: in ltration abundances of immune cell. Correlations between the risk score based on 12 lncRNAs signature and immune score of immnue cells. P < 0.05, **P < 0.01, ***P < 0.001. Figure 6 Genomic features of different risk score in CESC patients at Early stage A, B. The mutation distribution of frequency mutation genes in different risk score groups. The horizontal is molecular subtypes, ordinate (left) is the mutation frequency of the gene in each sample, ordinate (right) is mutation genes and colors in heat-map are different mutation types. (C, D) List of the most frequently altered genes. (E -H). The expressions of lncRNAs in different groups with copy ampli cation are signi cantly higher than that in the samples with normal copies. Differential expression of the TMB, math, MSI and TNB with copy ampli cation n different risk group.