Inference of Molecular Subtypes of Uterine Corpus Endometrioid Carcinoma with Different Survival Based on Cancer Hallmarks-associated Long Non-Coding RNAs and Gene Modules

Background: Uterine corpus endometrioid carcinoma (UCEC), a common gynecological malignancy with high incidence, affects the mental and physical health of women. It is increasingly evident that long non-coding RNAs (lncRNAs) have causative roles in cancers, including UCEC. However, very little is known about cancer hallmark-related risks for UCEC based on lncRNAs and genes. Methods: In this study, an integrated computational pipeline was developed to evaluate the cancer hallmark-related risk of each UCEC patient based on gene and lncRNA expression. Results: UCEC-specific cancer hallmark-related genes and lncRNAs were identified. Core modules were extracted from the co-expression networks of UCEC-specific lncRNAs and genes for each cancer hallmark. Some core modules showed specific features and functions. Using a multi-dimensional rank approach and the core modules, each UCEC sample was given an integrated cancer hallmark-related risk score. We divided the UCEC patients into hallmark risk groups, including multi-hallmark, media-hallmark and single-hallmark groups, which showed different prognoses. We also identified key lncRNAs that participated in multiple cancer hallmarks. These key lncRNAs were associated with essential pathways such as estrogen receptor signaling. Conclusions: The present study provides novel insights into the classification and treatment of UCEC.


Background
Uterine corpus endometrioid carcinoma (UCEC) is the sixth most common cause of cancer-related death among women in the United States [1,2], with an estimated 65,620 new cases and 12,590 deaths in 2020 [3]. The number of newly diagnosed UCEC cases in the United States has steadily increased over time, and this trend is expected to continue [4]. Most women diagnosed with early-stage UCEC have favorable outcomes. However, unexpected recurrence and adverse outcomes may occur in some women with low-grade, early-stage, well-differentiated UCEC. UCEC patients also differ widely in survival and drug response. Moreover, UCEC is one of the few human malignant tumors for which mortality is increasing, which underscores the urgent need for novel and effective strategies to distinguish UCEC subtypes with different survival and drug response.
In recent years, many kinds of non-coding RNAs have been discovered and studied in multiple diseases, including cancer [5,6]. Long non-coding RNAs (lncRNAs) are a major class of non-coding RNAs that are >200 nt in length [7,8]. Accumulating evidence suggests that lncRNAs play important roles in the physiological and pathological processes of various cancers [9,10]. Numerous studies have reported that the occurrence, development, prognosis and treatment of UCEC are closely related to the abnormal expression of a large number of lncRNAs. For example, lncRNA SNHG16 induced by TFAP2A modulates glycolysis and proliferation of UCEC and is associated with a poor survival rate [11].
Yao et al. reported that lncRNA LSINCT5 is up-regulated and significantly inhibited cell proliferation, cell cycle progression, and induced apoptosis in UCEC. Thus, a comprehensive and systematic exploration of the characteristics of lncRNAs in UCEC could help identify novel molecular associations of potential mechanistic significance in the development of UCEC.
It is generally known that the biology of cancer is extremely complex, individualized and varied. However, some key traits that reduce this complexity have been revealed during the past decade. These traits can be represented by a few distinctive and complementary capabilities (referred to as "cancer hallmarks") that promote tumor growth and metastasis. Cancer hallmarks provide a logical framework for understanding the significant diversity of multiple kinds of cancers [12].
Hanahan and Weinberg proposed six hallmarks of cancer in 2000 and ten hallmarks in 2011 [13,14]. Both studies suggest that multiple kinds of cancers share common hallmarks that dominate the conversion of normal cells into cancer cells. These ten cancer hallmarks are effective principles for depicting the characteristics of cancers. However, we have little insight into how to use these hallmarks to depict the roles of lncRNAs and to distinguish UCEC patients.
In the present study, we developed an integrated algorithm for evaluating the hallmark-related risks of UCEC patients and dividing them into groups based on gene and lncRNA expression profiles (Figure 1). UCEC-specific lncRNAs and genes in ten cancer hallmarks were identified. Co-expression networks of UCEC-specific lncRNAs and genes were constructed for the cancer hallmarks, and core modules were extracted from these networks. Each UCEC patient was given a comprehensive score for a specific core module in each cancer hallmark based on a multi-dimensional rank approach. The UCEC patients were divided into hallmark risk groups, including multi-hallmark, media-hallmark and single-hallmark groups. These hallmark risk groups were associated with different survival. Some key lncRNAs that were related to multiple cancer hallmarks were also discovered and showed specific functions. Collectively, this study clarified the roles of cancer hallmark-related lncRNAs in UCEC and divided UCEC patients into risk groups.

Methods
Collection of gene and lncRNA expression profiles of UCEC patients
The lncRNA expression and gene expression (level 3) data, as well as the clinical data of UCEC patients, were obtained from The Cancer Genome Atlas (TCGA, Release: 2019-07-21). The data were downloaded from https://gdc.xenahubs.net/download/TCGA-UCEC.htseq_fpkm.tsv.gz. We also obtained genome annotation data, including the genomic positions and symbols of genes and lncRNAs, from GENCODE 31 (19.06.19). The dataset includes 545 UCEC tumor samples and 38 control samples. All lncRNAs and genes that were not expressed in all samples were excluded. Any remaining expression values of 0 were replaced with the minimum value of all samples. All expression values were then transformed as log2(value+1).
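The filtering and transformation steps above can be sketched as follows. This is a minimal illustration under two assumptions that the text does not state explicitly: a feature is dropped only when it is zero in every sample, and the "minimum value" is the smallest non-zero value left in the matrix. The function name `preprocess` and the dictionary layout are hypothetical.

```python
import math

def preprocess(expr):
    """expr maps a gene/lncRNA name to its expression values, one per sample."""
    # Assumption: drop features whose expression is zero in every sample.
    kept = {name: vals for name, vals in expr.items() if any(v > 0 for v in vals)}
    if not kept:
        return {}
    # Assumption: the replacement value is the smallest non-zero value
    # among the remaining features.
    floor = min(v for vals in kept.values() for v in vals if v > 0)
    # Replace remaining zeros with the floor, then apply log2(value + 1).
    return {
        name: [math.log2((v if v > 0 else floor) + 1) for v in vals]
        for name, vals in kept.items()
    }
```

With this layout, a feature that is zero everywhere disappears from the result, and every other zero becomes log2(floor + 1) rather than 0.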
Gathering cancer hallmark-related GO terms and genes
All cancer hallmark-related GO terms were obtained from a previous study [15]. We downloaded all genes annotated to these cancer hallmark-related GO terms from Gene Ontology using AmiGO (version: 2.5.12; http://amigo.geneontology.org/amigo) [16]. These cancer hallmark-related genes were used in the subsequent analyses.
Identification of UCEC-specific cancer hallmark-related genes and lncRNAs
A statistical test was applied to identify cancer hallmark-related genes and lncRNAs that were differentially expressed between UCEC and normal control samples based on the expression profiles. lncRNAs and cancer hallmark-related genes were considered UCEC-specific if they were differentially expressed (P < 0.01). All UCEC-specific lncRNAs and cancer hallmark-related genes were divided into up- and down-regulated groups based on their fold-change values.
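As a rough illustration of this selection step, the sketch below labels one feature as up-regulated, down-regulated or not significant. The specific test used in the study is not recoverable from the text, so a two-sided permutation test on the difference in group means is swapped in purely as a stand-in. Because the expression values are log2-transformed, that difference plays the role of a log2 fold change. The names `perm_test_diff` and `classify`, the number of permutations and the seed are all hypothetical.

```python
import random
from statistics import mean

def perm_test_diff(tumor, normal, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the difference in group means."""
    rng = random.Random(seed)
    observed = mean(tumor) - mean(normal)
    pooled = list(tumor) + list(normal)
    k = len(tumor)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign samples to the two groups at random
        diff = mean(pooled[:k]) - mean(pooled[k:])
        if abs(diff) >= abs(observed):
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)

def classify(tumor, normal, alpha=0.01):
    """Return 'up', 'down' or 'not significant' for one gene or lncRNA."""
    diff, p = perm_test_diff(tumor, normal)
    if p >= alpha:
        return "not significant"
    return "up" if diff > 0 else "down"
```

In practice a parametric test from a statistics library would usually be faster for thousands of features; the permutation version is used here only because it needs no external dependency.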
Construction of UCEC-specific cancer hallmark-related gene and lncRNA co-expression networks and identification of core modules
We calculated Pearson's correlation coefficients (PCCs) for each pair of a UCEC-specific cancer hallmark-related gene and a UCEC-specific lncRNA in all cancer hallmarks. Pairs with a PCC higher than 0.3 or lower than -0.3 and a p-value smaller than 0.01 were considered co-expressed UCEC-specific gene and lncRNA pairs. Co-expression networks of UCEC-specific cancer hallmark-related genes and lncRNAs were constructed with Cytoscape 3.3.0 (http://www.cytoscape.org/) from the co-expressed pairs with a PCC higher than 0.5 or lower than -0.5. To identify cancer hallmark-related gene and lncRNA pairs in UCEC more accurately, we extracted core modules from the co-expression networks using the MCODE module in Cytoscape. In this way, a number of core modules were identified for each cancer hallmark in UCEC.
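The pair selection described above can be sketched as follows. The correlation formula is the standard Pearson coefficient. The p-value uses a normal approximation to the t statistic, which is an assumption made here to avoid external libraries and is only reasonable for large sample sizes such as the 545 tumors in this dataset. The function names and the dictionary layout are hypothetical, and the module detection performed with MCODE is not reproduced.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return sxy / math.sqrt(sxx * syy)

def approx_p(r, n):
    """Two-sided p-value for H0: rho = 0 (normal approximation, large n only)."""
    if 1 - r * r <= 0:
        return 0.0
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return math.erfc(abs(t) / math.sqrt(2))

def coexpressed_edges(genes, lncrnas, r_cut=0.5, p_cut=0.01):
    """Return (gene, lncRNA, PCC) for pairs passing both thresholds."""
    edges = []
    for g, g_vals in genes.items():
        for l, l_vals in lncrnas.items():
            r = pearson(g_vals, l_vals)
            if abs(r) > r_cut and approx_p(r, len(g_vals)) < p_cut:
                edges.append((g, l, r))
    return edges
```

Calling `coexpressed_edges` with `r_cut=0.3` gives the looser set of co-expressed pairs, while the default of 0.5 gives the edges used for network construction.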
Evaluating hallmark-related risk for UCEC patients and dividing them into risk groups based on core modules
For each UCEC patient, a risk score was calculated for each cancer hallmark. A multi-dimensional rank approach was applied to obtain the risk score for each cancer hallmark based on the expression values of the genes and lncRNAs in the core modules. Up- and down-regulated genes and lncRNAs were ranked in positive and negative order, respectively. A permutation test with 1000 permutations was performed by randomly perturbing the cancer samples. A UCEC patient was considered a risk sample for a specific cancer hallmark if the permutation P value was smaller than 0.05 in any core module. All UCEC patients were then divided into non-hallmark (0 risk hallmarks), media-hallmark (1-5 risk hallmarks) and multi-hallmark (6-8 risk hallmarks) groups.
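One possible reading of this scoring scheme is sketched below. It is an interpretation, not the exact procedure of the study: each feature in a core module ranks the samples (ascending for up-regulated features, descending for down-regulated ones), the ranks are summed and normalized per sample, and the null distribution is built by shuffling sample order independently for each feature. The function names, the normalization and the add-one p-value correction are assumptions; only the group boundaries (0, 1-5, 6-8) come from the text.

```python
import random

def rank_scores(module, directions):
    """module: feature -> values per sample; directions: feature -> 'up'/'down'."""
    n = len(next(iter(module.values())))
    scores = [0.0] * n
    for feat, values in module.items():
        # 'up': highest expression gets the highest rank; 'down': the reverse.
        order = sorted(range(n), key=lambda i: values[i],
                       reverse=(directions[feat] == "down"))
        for rank, i in enumerate(order, start=1):
            scores[i] += rank
    # Normalize so that a sample ranked top in every feature scores 1.0.
    return [s / (n * len(module)) for s in scores]

def permutation_pvalues(module, directions, n_perm=1000, seed=0):
    """Per-sample p-value: how often a shuffled score reaches the observed one."""
    rng = random.Random(seed)
    observed = rank_scores(module, directions)
    exceed = [0] * len(observed)
    for _ in range(n_perm):
        shuffled = {f: rng.sample(v, len(v)) for f, v in module.items()}
        null = rank_scores(shuffled, directions)
        for i, obs in enumerate(observed):
            if null[i] >= obs:
                exceed[i] += 1
    return [(e + 1) / (n_perm + 1) for e in exceed]

def hallmark_group(n_risk_hallmarks):
    """Group boundaries as stated in the text."""
    if n_risk_hallmarks == 0:
        return "non-hallmark"
    if n_risk_hallmarks <= 5:
        return "media-hallmark"
    return "multi-hallmark"
```

A patient would count as a risk sample for a hallmark when `permutation_pvalues` returns a value below 0.05 for at least one core module of that hallmark, and the number of such hallmarks is then passed to `hallmark_group`.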
Survival analysis for cancer hallmark risk groups and functional analysis for UCEC-specific genes and lncRNAs
Kaplan-Meier survival analysis was performed for the cancer hallmark risk groups. The log-rank test was used to assess statistical significance (P < 0.05). All analyses were performed in R 3.6.2. Enrichr (http://amp.pharm.mssm.edu/Enrichr) with default parameters was used for functional analysis based on the UCEC-specific genes in each cancer hallmark [17]. Significantly enriched pathways (P < 0.05) were obtained for each cancer hallmark in UCEC.
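For readers unfamiliar with the two survival methods named above, the following sketch shows a Kaplan-Meier estimate and a two-group log-rank test. The study ran its analyses in R, so this standalone Python version is only an illustration of the calculations, not the authors' code. Function names are hypothetical. An event flag of 1 means death was observed and 0 means the patient was censored.

```python
import math

def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv, curve = 1.0, []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(1 for tt, e in data if tt == t and e)
        leaving = sum(1 for tt, _ in data if tt == t)
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving  # deaths and censored samples leave the risk set
        i += leaving
    return curve

def logrank_p(times_a, events_a, times_b, events_b):
    """Two-sided log-rank p-value for two groups (chi-square, 1 df)."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    o_minus_e, var = 0.0, 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        o_minus_e += d_a - d * n_a / n  # observed minus expected deaths in A
        if n > 1:
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if var == 0:
        return 1.0
    chi2 = o_minus_e ** 2 / var
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))
```

Comparing three risk groups, as in the study, would require either pairwise calls or the multi-group form of the test, which is not shown here.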

Results
Some lncRNAs and cancer hallmark-related genes were specific and co-expressed in UCEC
To depict the roles of cancer hallmarks in UCEC, we identified UCEC-specific lncRNAs and cancer hallmark-related genes by differential expression. A total of 1880 (12.43%) UCEC-specific lncRNAs were identified between UCEC and control samples (Figure 2A). These included 891 up-regulated and 989 down-regulated lncRNAs (Figure 2B). UCEC-specific genes were also identified for each cancer hallmark (Figure 2C). In every cancer hallmark, more than 50% of the genes were UCEC-specific. In particular, 70.57% of the genes in the hallmark genome instability and mutation were UCEC-specific.
These results indicated that cancer hallmark-related genes may play essential roles in UCEC. We assumed that lncRNAs and cancer hallmark-related genes could function cooperatively in UCEC.
Thus, co-expressed pairs of lncRNAs and cancer hallmark-related genes were identified in UCEC for each cancer hallmark. Most PCCs of the co-expressed pairs were concentrated between 0.3 and 0.5 (Figure 2D). There were 8631 co-expressed pairs with PCCs higher than 0.5 (Figure 2E). The numbers of pairs, lncRNAs and genes differed among cancer hallmarks (Figure 2F). For example, there were 2525 pairs, 365 genes and 384 lncRNAs in the hallmark self-sufficiency in growth signals.
However, there were only 84 pairs, 47 genes and 12 lncRNAs in the hallmark reprogramming energy metabolism. These results indicated that different cancer hallmarks play different roles in UCEC. Taken together, the results above indicated that cooperative pairs of lncRNAs and cancer hallmark-related genes are important in UCEC.
Core modules were extracted from the co-expression networks of lncRNAs and cancer hallmark-related genes in each cancer hallmark
For each cancer hallmark, co-expressed lncRNA and cancer hallmark-related gene pairs with PCCs higher than 0.5 were extracted to construct co-expression networks. For the hallmark evading apoptosis, a co-expression network of lncRNAs and cancer hallmark-related genes was constructed (Figure 3A). This network contained 380 nodes (234 UCEC-specific lncRNAs and 147 cancer hallmark-related genes) and 906 edges. We found that some cancer hallmark-related genes and lncRNAs, such as SLC25A27, AC005288 and GD5-AS1, played core roles in this network. Most cancer hallmark-related genes and lncRNAs showed positive correlations in UCEC. Different numbers of core modules were identified in eight cancer hallmarks (Figure 3B). The hallmarks insensitivity to antigrowth signals, self-sufficiency in growth signals, and tissue invasion and metastasis had the most core modules. The numbers of cancer hallmark-related genes and lncRNAs in the modules also differed (Figure 3C). For example, there were more lncRNAs in core module 1 of the hallmark insensitivity to antigrowth signals. These core modules may have specific functions. For example, one core module in the hallmark evading apoptosis contained three lncRNAs and three genes (Figure 3D). PSMB8, PSMB10, PSME2 and PSMB8-AS1 are all proteasome-related genes or lncRNAs. The proteasome is a multicatalytic proteinase complex characterized by its ability to cleave peptides. In particular, the cancer hallmark-related gene PSMB8 and the lncRNA PSMB8-AS1 showed a strong positive correlation (P < 0.001, PCC = 0.79). The genes and lncRNAs in this core module showed close interactions. Another core module also showed close interactions (Figure 3E). Together, these results suggested that the core modules in the co-expression networks could be functional and serve as specific biomarkers in UCEC.
Specific cancer hallmark-related risks were evaluated for each UCEC patient based on core modules
We inferred that UCEC patients may differ in their hallmark-related risk. Thus, we calculated risk scores for each UCEC patient based on the core modules of each cancer hallmark. Only the eight cancer hallmarks with core modules were used for calculating risk scores. The density distributions of risk scores were similar across all core modules of each cancer hallmark (Figure 4A). Only a small number of UCEC patients showed higher risk scores. The average risk scores of the top 20 core modules also differed (Figure 4B). These 20 core modules were significantly associated with more UCEC patients (Figure 4C). For example, 32.03% of samples were significant in core module 1 of the hallmark insensitivity to antigrowth signals. This key module had the most related samples and was also described in the results above. We also examined the percentage of significant risk-related samples among the corresponding top-ranked samples. In most core modules, 70% of UCEC samples ranked before the corresponding orders (Figure 4D). These results indicated that UCEC patients differed in their cancer hallmark-related risk scores.
UCEC groups with different cancer hallmark risk showed specific features
Using the pipeline above, each UCEC patient could be associated with different cancer hallmarks. The numbers of UCEC samples associated with each cancer hallmark differed (Figure 5A). For example, more than 350 UCEC samples were associated with the hallmark tissue invasion and metastasis. The hallmark reprogramming energy metabolism was related to 300 UCEC samples. Notably, some UCEC patients were associated with multiple cancer hallmarks. 13.59% of UCEC patients had some relationships with any kinds of cancer hallmarks (Figure 5B). 11.55% of UCEC patients were related to only one cancer hallmark. Thus, we could divide the UCEC patients into groups with different numbers of cancer hallmarks (Figure 5C). The three groups were the non-, media- and multi-hallmark groups. 70.76% of samples belonged to the media-hallmark group. UCEC patients with more cancer hallmarks usually had better survival (Figure 5D). In addition, we divided the UCEC patients into another three cancer hallmark-related risk groups based on hierarchical clustering (Figure 5E). These cancer hallmark-related risk groups also showed different prognoses (Figure 5F, G). Group 2 had significantly better survival than groups 1 and 3 (P = 0.014 and 0.023). Together, these results suggested that the hallmark-related risk groups had their own features and prognoses.
Some key lncRNAs could participate in multiple cancer hallmarks and showed specific functions
To further depict the roles of lncRNAs in cancer hallmarks for UCEC, we extracted key lncRNAs that participate in multiple cancer hallmarks. Eleven lncRNAs were associated with more than three cancer hallmarks (Figure 6A). The lncRNAs AL590764.1, ANKRD10-IT1, NORD and AP000766.1 participated in four cancer hallmarks (Figure 6B). However, the four hallmarks differed among these lncRNAs. We inferred that these lncRNAs may play essential roles in UCEC. Thus, we further performed functional analysis for these lncRNAs in each cancer hallmark. The lncRNAs were enriched in some functions essential for cancer development (Figure 6C). For example, lncRNAs associated with evading apoptosis were enriched in hormone-related pathways such as negative regulation of intracellular estrogen receptor signaling, negative regulation of intracellular steroid hormone receptor signaling and regulation of intracellular estrogen receptor signaling. Estrogen receptor status is reported to be an important marker of UCEC [18,19]. In addition, we found that most lncRNAs were associated with the gap junction assembly pathway. Nishimura M et al. reported that gap junctional intercellular communication was suppressed via 5' CpG island methylation in the promoter region of the E-cadherin gene in UCEC cells [20]. These results indicated that cancer hallmark-related lncRNAs could play essential roles in UCEC.

Discussion
In this study, an integrated computational approach was developed to evaluate the cancer hallmark-related risk of each UCEC patient based on gene and lncRNA expression. Core modules were extracted from the co-expression networks of UCEC-specific lncRNAs and genes in the different cancer hallmarks. The UCEC patients were divided into hallmark risk groups based on their comprehensive risk scores. These hallmark risk groups showed different survival. Some key lncRNAs were related to multiple cancer hallmarks and showed specific functions.
Dividing cancer samples into groups with different molecular characteristics could assist cancer diagnosis and individualized treatment. Wright GW et al. described an algorithm that determines the probability that a patient's lymphoma belongs to one of seven genetic subtypes based on its genetic features [21]. Lee S et al. presented an explainable deep learning model with an attention mechanism and network propagation for cancer subtype classification [22]. Most of these classification methods focused only on coding genes. In our work, lncRNAs were treated as key factors in constructing the computational approach for classifying cancer patients. The approach not only divided UCEC patients into hallmark risk groups but also discovered key lncRNAs that play important roles in UCEC. These lncRNAs were significantly enriched in functions essential for cancer development.
Analysis of cancer hallmarks could greatly improve our understanding of the occurrence, development and metastasis of many cancer types. In the present study, we applied cancer hallmarks to evaluate and distinguish UCEC patients. We discovered that most cancer hallmark-related genes were differentially expressed in UCEC, which indicated that these genes may have essential functions in UCEC. In particular, we found that UCEC patients showed obvious differences in their risk for cancer hallmarks. Although cancer hallmarks are considered common properties of cancer, each UCEC patient still showed personalized features. This result supports the view that cancer patients differ greatly from one another and that personalized medicine is very important in cancer treatment. Thus, our study could help to evaluate risk for UCEC patients and to establish personalized therapy.

Conclusions
In summary, the present study developed a standardized computational procedure to evaluate the hallmark-related risk of cancer patients based on gene and lncRNA expression. The UCEC patients were divided into cancer hallmark-related risk groups that showed different prognoses. Some key cancer hallmark-related lncRNAs were significantly associated with UCEC development. Collectively, our study provides a novel starting point for future functional explorations, the identification of biomarkers, and lncRNA-based targeted therapy for UCEC.

Disclosure of interest
The authors declare that they have no competing interest.

The workflow of evaluating cancer hallmark-related risk for UCEC patients based on gene and lncRNA expression. Step 1. Identification of UCEC-specific cancer hallmark-related genes and lncRNAs and calculation of co-expression pairs based on gene and lncRNA expression profiles. Step 2. Construction of co-expression networks of UCEC-specific cancer hallmark-related genes and lncRNAs and extraction of core modules. Step 3. Evaluation of cancer hallmark-related risks for UCEC patients and division of the patients into groups.

two cancer hallmark-related risk groups. The difference between the two curves was evaluated by a two-sided log-rank test.