Identification of a prognostic three-long noncoding RNA signature in lung squamous cell carcinoma via bioinformatic analysis

Background
Lung squamous cell carcinoma (LSCC) is a form of cancer that is associated with high rates of relapse, poor responsiveness to therapy, and a relatively poor prognosis. The relationship between long noncoding RNA (lncRNA) expression and LSCC patient prognosis remains to be established. In the present study, we identified lncRNAs that were differentially expressed in LSCC tumor tissues relative to normal control tissues, and we explored the prognostic relevance of these lncRNA expression patterns using data from The Cancer Genome Atlas (TCGA).


Results
These multidimensional data were analyzed in order to identify lncRNA signatures that were associated with LSCC patient survival outcomes. Kaplan-Meier survival curves revealed prognostic capabilities for three of these lncRNAs (LINC02555, APCDD1L-DT and OTX2-AS1). A Cox regression analysis revealed this three-lncRNA signature to be significantly associated with patient survival. Further GO and KEGG analyses revealed that the predicted target genes of these three lncRNAs were also potentially involved in cancer-associated pathways.

Conclusions
Together, these results indicate that this novel three-lncRNA signature can be used to predict LSCC patient prognosis.

Background
Lung cancer is a highly heterogeneous disease, with genetic, epigenetic, and environmental factors all acting to shape its development and progression. Lung cancer mortality rates are the highest of all forms of cancer, accounting for 25% and 30% of all cancer-associated deaths in the USA and China, respectively [1][2]. In 2015 alone, 733,000 new cases of lung cancer were diagnosed in China (69% in males and 31% in females), while 218,527 new cases were diagnosed in the USA during this same period (52% in males and 48% in females). SEER data indicate that lung cancer patients exhibit a 5-year survival rate of just 18.1% [3]. Lung squamous cell carcinoma (LSCC) cases account for a significant fraction of overall lung cancer cases [4]. LSCC occurs more often in men, is related to tobacco smoking, and is often associated with high rates of relapse, poor responsiveness to therapeutic intervention, and a generally poor patient prognosis [5][6]. While there have been many advances in the field of clinical oncology as a whole in recent years, rates of 5-year overall survival (OS) for LSCC patients remain low. As such, it is vital that novel approaches be identified that can be used to predict the prognosis of LSCC patients so as to guide clinical decision making and treatment efforts in these individuals.
Long non-coding RNAs (lncRNAs) are RNA molecules > 200 nucleotides in length that lack coding potential [7]. Emerging evidence indicates that some lncRNAs do encode proteins and play roles in transcriptional and epigenetic gene regulation and in cancer [8][9]. These lncRNAs have been shown to frequently be dysregulated in cancer, with their altered expression patterns having a direct impact on tumor cell gene expression at the post-transcriptional and epigenetic levels, as well as on the proliferation, survival, invasion, and metastasis of these cells [10][11][12]. However, relatively few studies to date have specifically examined the relationship between lncRNA expression and LSCC patient prognosis. In the present study, we therefore explored patterns of differential lncRNA expression in LSCC tumor tissues and normal control tissue samples in an effort to assess the prognostic relevance of such lncRNA expression patterns. Through this approach we were able to develop a three-lncRNA signature that was found to be significantly associated with the survival of LSCC patients.

LSCC patient datasets
Level 3 expression and clinical data pertaining to 432 LSCC patients and 49 control samples were downloaded from The Cancer Genome Atlas (TCGA, https://tcga-data.nci.nih.gov/tcga/). Datasets and patient records were used to assess both patterns of lncRNA expression and clinicopathological and demographic variables including gender, age at time of diagnosis, and TNM staging (Table 1). The Ethics Committee of the Institutional Review Board of Ningbo Yinzhou Second Hospital and Cixi People's Hospital in Zhejiang Province approved this study. Samples were included in the present analysis if they were from patients with an OS > 1 month for whom lncRNA differential expression data and information pertaining to clinical details and prognosis were available. The language package in R was used to interpret the lncRNA sequencing data, while the limma package was used to assess differential lncRNA expression between LSCC and control samples, with differential expression quantified as fold change (FC) values. Those lncRNAs with |log2 FC| > 1.0 and p < 0.05 were considered to be significantly differentially expressed.
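The selection thresholds described above can be illustrated with a minimal Python sketch. This is not the limma workflow used in the study. The function name `select_differential`, the lncRNA labels, and all fold changes and p-values below are hypothetical and serve only to show how the two cutoffs are applied together.

```python
import math

def select_differential(records, lfc_cutoff=1.0, p_cutoff=0.05):
    """Split lncRNAs into up- and downregulated lists.

    records: iterable of (name, tumor/normal fold change, p-value).
    A lncRNA is kept only if |log2 FC| > lfc_cutoff AND p < p_cutoff.
    """
    up, down = [], []
    for name, fold_change, p_value in records:
        log2_fc = math.log2(fold_change)
        if abs(log2_fc) > lfc_cutoff and p_value < p_cutoff:
            (up if log2_fc > 0 else down).append(name)
    return up, down

# Hypothetical values, for illustration only (not data from the study).
records = [
    ("lncRNA_A", 4.0, 0.001),  # log2 FC = 2.0, significant: upregulated
    ("lncRNA_B", 0.2, 0.010),  # log2 FC is about -2.32, significant: downregulated
    ("lncRNA_C", 1.5, 0.001),  # log2 FC is about 0.58: fails the fold-change cutoff
    ("lncRNA_D", 8.0, 0.200),  # large change, but p >= 0.05
]
up, down = select_differential(records)
print("up:", up, "down:", down)
```

Requiring both conditions means that a large fold change with weak statistical support, or a significant but small change, is excluded.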

Statistical analysis
The prognostic relevance of differentially expressed lncRNAs in LSCC was assessed using Kaplan-Meier curves and log-rank tests. We ultimately constructed a signature as a linear combination of the expression levels of the three prognostic lncRNAs, weighted by the estimated regression coefficients from the multivariate Cox regression analysis. This three-lncRNA signature-derived risk score was then used to stratify patients into high- and low-risk groups, using the median risk score in this cohort as the cutoff point for stratification. Kaplan-Meier curves and log-rank tests were then used to compare survival outcomes between these high- and low-risk patients. In addition, receiver operating characteristic (ROC) analyses were used to assess the sensitivity and specificity of this three-lncRNA risk score as a means of predicting patient survival outcomes. P < 0.05 was the significance threshold. R version 3.5.1 (http://www.r-project.org/) was used for all statistical testing [13].
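The risk score and median-based stratification can be sketched as follows. This is a minimal Python illustration, not the R code used for the analysis. The coefficients and expression values are hypothetical placeholders, since the fitted regression coefficients are not listed here, and the assignment of scores equal to the median to the low-risk group is an assumption.

```python
from statistics import median

def risk_scores(expression, coefficients):
    """Risk score per patient: sum of (Cox coefficient x expression level)."""
    return {
        patient: sum(coefficients[g] * levels[g] for g in coefficients)
        for patient, levels in expression.items()
    }

def stratify(scores):
    """Label each patient high- or low-risk using the cohort median as cutoff.

    Assumption: scores exactly equal to the median are labeled low-risk.
    """
    cutoff = median(scores.values())
    groups = {p: ("high" if s > cutoff else "low") for p, s in scores.items()}
    return cutoff, groups

# Hypothetical coefficients and expression levels (not values from the study).
coefficients = {"LINC02555": 0.08, "APCDD1L-DT": 0.10, "OTX2-AS1": -0.12}
expression = {
    "P1": {"LINC02555": 5.0, "APCDD1L-DT": 4.0, "OTX2-AS1": 1.0},
    "P2": {"LINC02555": 1.0, "APCDD1L-DT": 1.0, "OTX2-AS1": 6.0},
    "P3": {"LINC02555": 3.0, "APCDD1L-DT": 2.0, "OTX2-AS1": 2.0},
    "P4": {"LINC02555": 2.0, "APCDD1L-DT": 2.0, "OTX2-AS1": 4.0},
}
scores = risk_scores(expression, coefficients)
cutoff, groups = stratify(scores)
print(groups)
```

A positive coefficient raises the score as expression increases, whereas a negative coefficient lowers it, so a protective lncRNA pulls patients toward the low-risk group.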

Functional analysis
Genes correlated with the differentially expressed lncRNAs were identified using a co-expression approach. Pearson correlation coefficients between the expression profiles of the three prognostic lncRNAs and those of protein-coding genes (PCGs) were calculated to determine their relationships. Those PCGs with a Pearson's R > 0.40 and p < 0.05 were considered to be lncRNA-related. These putative lncRNA targets were then subjected to Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) functional enrichment analyses. Furthermore, these target genes were incorporated into a protein-protein interaction (PPI) network using the STRING database [14], with Cytoscape being used for network visualization [15]. Protein pairs from this network with > 10 nodes were selected as the outputs of this analysis.
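The co-expression filter can be illustrated with a short Python sketch. The gene names and expression profiles are hypothetical. Only the correlation cutoff (R > 0.40) is applied here. The accompanying significance test (p < 0.05) is omitted for brevity and would in practice be obtained from a statistics library.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def coexpressed_genes(lncrna_profile, gene_profiles, r_cutoff=0.40):
    """Return {gene: r} for genes whose correlation with the lncRNA exceeds r_cutoff."""
    hits = {}
    for gene, profile in gene_profiles.items():
        r = pearson_r(lncrna_profile, profile)
        if r > r_cutoff:
            hits[gene] = r
    return hits

# Hypothetical expression profiles across five samples (illustration only).
lncrna = [1.0, 2.0, 3.0, 4.0, 5.0]
genes = {
    "GENE_X": [2.0, 4.1, 5.9, 8.2, 10.0],  # rises with the lncRNA
    "GENE_Y": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls as the lncRNA rises
    "GENE_Z": [3.0, 1.0, 4.0, 1.0, 5.0],   # weak relationship
}
hits = coexpressed_genes(lncrna, genes)
print(sorted(hits))
```

Because the criterion is stated as R > 0.40 rather than |R| > 0.40, negatively correlated genes such as the hypothetical GENE_Y are not retained in this sketch.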

Patient characteristics
The study included 481 samples, comprising 432 LSCC and 49 normal tissue samples. Table 1 lists detailed clinical characteristics, including gender, race, age at diagnosis, and disease stage. Of the enrolled patients, 26.2% were female, and 78.4% were older than 60 years. The most common tumor grades were I (49.3%) and II (32.4%). A total of 936 differentially expressed lncRNAs, including 687 upregulated and 249 downregulated lncRNAs, were identified between LSCC and normal tissues (Fig. 1).
The relationship between lncRNA expression and LSCC patient OS
We began by using univariate and multivariate Cox proportional hazard regression models to identify those lncRNAs that were associated with LSCC patient prognosis. In total, we identified three candidate lncRNAs in these LSCC patients (p < 0.01; Fig. 2). A multivariate model confirmed that expression of the lncRNAs LINC02555 (HR = 1.08, p = 0.025), APCDD1L-DT (HR = 1.10, p = 0.004), and OTX2-AS1 (HR = 0.89, p = 0.006) was independently associated with LSCC patient OS. Kaplan-Meier survival curves and log-rank tests were further used to examine the relationship between these lncRNAs and patient survival. We found that two of the tested lncRNAs (LINC02555 and APCDD1L-DT) were negatively associated with LSCC patient OS, whereas the lncRNA OTX2-AS1 was positively correlated with OS (Fig. 3).
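For readers less familiar with the Kaplan-Meier method referred to above, the product-limit estimator can be sketched in a few lines of Python. This is a minimal illustration rather than the R routine used in the study, and the follow-up times and event indicators are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(time, survival probability)] at each event time.

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving past time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after time t.
        at_risk -= removed
    return curve

# Hypothetical follow-up times in months (illustration only).
times = [6, 12, 12, 20, 30]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
print([(t, round(s, 3)) for t, s in curve])
```

Censored patients contribute to the number at risk up to their last follow-up but do not themselves produce a drop in the curve, which is why the estimate changes only at observed event times.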
The prognostic utility of the three-lncRNA signature
Using this three-lncRNA signature, we assigned risk scores to patient samples, after which these samples were separated into high- and low-risk groups based upon the median risk score value (n = 216 samples/group). We found that patients in the high-risk group had a significantly shorter OS than did patients in the low-risk group (p < 0.001) (Fig. 4a). We then used an ROC analysis to assess the prognostic utility of this three-lncRNA signature. The AUC values for these curves as predictors of LSCC patient 3- and 5-year survival were 0.67 and 0.62, respectively, indicating that the signature had predictive value for survival (Fig. 4b). Patients in the high-risk group expressed higher levels of the lncRNAs LINC02555 and APCDD1L-DT on average relative to low-risk patients, whereas low-risk patients expressed higher levels of the lncRNA OTX2-AS1.

Functional enrichment analysis and PPI networks
In order to identify potential targets for these three lncRNAs which were associated with LSCC patient prognosis, we conducted a co-expression analysis as detailed in the Materials and Methods section. We then performed GO and KEGG pathway analyses on these co-expressed genes in order to unravel their potential physiological roles (Fig. 5). These co-expressed genes were primarily enriched in GO terms such as cell adhesion molecule binding, ubiquitin-like protein ligase binding, protein serine/threonine kinase activity, ubiquitin protein ligase binding, actin binding, ATPase activity, phospholipid binding, and phosphoric ester hydrolase activity. These genes were additionally significantly enriched in KEGG pathways including endocytosis, focal adhesion, MAPK signaling, lysosomes, neurotrophin signaling, ubiquitin mediated proteolysis, axon guidance, herpes simplex virus 1 infection, and the cell cycle. Furthermore, PPI networks were also obtained using the STRING tool.

Discussion
Lung cancer currently ranks as the deadliest form of cancer, and as such it is a primary focus for many cancer research efforts [16]. While LSCC patient prognosis has improved significantly in recent years owing to improvements in multidisciplinary treatment strategies and chemotherapeutic/radiotherapeutic treatment regimens, LSCC recurrence rates remain high, and this disease can therefore impose a heavy burden upon patients, their families, and medical institutions [17][18]. Difficulties in accurately diagnosing LSCC and in predicting patient outcomes have led to low 5-year survival rates in affected patients [2]. Recent work suggests that the ability to more reliably predict LSCC patient prognosis at time of initial diagnosis is associated with a significant improvement in patient outcomes. As such, it is vital that novel biomarkers that can reliably predict LSCC patient outcomes be identified. It is similarly important that the molecular mechanisms governing the development and progression of LSCC be fully elucidated.
Many studies have shown that the development of LSCC can be driven by interactions between genetic, transcriptomic, and proteomic factors [19][20]. Changes in lncRNA expression patterns can also influence all stages of the oncogenic process, yet the prognostic relevance of these lncRNAs has not been sufficiently studied to date. In the present study we therefore examined lncRNA expression patterns in LSCC and identified three lncRNAs that were significantly linked with LSCC patient OS. These three lncRNAs were then subjected to additional analyses aimed at identifying their putative target genes and potential biological roles through the use of pathway enrichment analyses. These results indicated that these three lncRNAs may play roles in regulating LSCC molecular pathogenesis, clinical progression, and patient prognosis, supporting the prognostic relevance of lncRNA expression patterns in LSCC patients in a clinical setting.
Multiple studies [21][22] have demonstrated that functional lncRNA expression can modulate oncogenesis via altered regulation of gene expression and signaling within tumor cells. Indeed, certain lncRNAs are able to promote the development, progression, and metastasis of tumors through their ability to regulate the proliferation, differentiation, migration, and survival of cancerous cells [23]. Huang et al. [24] found that increasing the expression of the downregulated lncRNA LINC00961 resulted in increased Bax expression and the corresponding apoptotic death of NSCLC cells. Xu et al. further provided evidence suggesting that the lncRNA HULC is able to promote LSCC cell proliferation through PTPRO-dependent phosphorylation and activation of NF-κB [25]. Similarly, Wang et al. found that increased expression of the lncRNA MIR31HG in NSCLC led to enhanced tumor cell gefitinib resistance owing to associated activation of the EGFR/PI3K/AKT signaling pathway. In this report, we analyzed high-throughput data and were thereby able to identify two upregulated lncRNAs (LINC02555 and APCDD1L-DT) and one downregulated lncRNA (OTX2-AS1) in LSCC patients, all three of which were significantly associated with patient clinical outcomes.
We further sought to gain insight into the functional importance of the three lncRNAs identified in this study by using a co-expression analysis-based approach to identify putative lncRNA target genes, which were then subjected to GO and KEGG enrichment analyses. This approach revealed the lncRNA-associated target genes to be enriched for functionality in the context of endocytosis, focal adhesion, MAPK signaling, and lysosomal activity, all of which are closely linked with oncogenesis and tumor progression [27][28]. To date, no studies have specifically examined LINC02555, APCDD1L-DT, or OTX2-AS1 in the context of LSCC. As such, future in-depth molecular analyses will be needed to confirm the findings of our predictive co-expression analysis.
There are multiple limitations to the present study. For one, these results are derived solely from bioinformatics analyses and therefore require additional functional validation. Furthermore, we did not explore the molecular mechanisms linking the expression of these three lncRNAs to LSCC patient prognosis, and future experimental studies will be required in order to elucidate these mechanisms. Finally, large-scale multi-center trials will be essential in order to validate and expand upon our findings.

Conclusions
In summary, in the present study we identified three lncRNAs that could be used to predict survival outcomes in patients with LSCC. Further large-scale multi-center trials will be needed to confirm our findings and to explore the molecular mechanisms linking these lncRNAs to clinical outcomes in LSCC patients. While much work is still required before this lncRNA signature can be implemented in a clinical setting, we nonetheless feel that our findings may have significant value as a future diagnostic or prognostic tool in the context of LSCC patient identification and care.

Authors' contributions
RJZ and MLH carried out the study design, analysis and interpretation of data. RJZ, MMW and MDZ drafted the manuscript. All authors read and approved the final manuscript.

Figure 1 Volcano plot of differentially expressed lncRNAs. Red and green dots represent upregulated and downregulated lncRNAs, respectively.