1. Pan-cancer analysis of PTEN mutation
To further confirm the mutation frequency of PTEN across cancers, we used the cBioPortal database to analyze the mutational landscape of PTEN in different tumors. As expected, PTEN had the highest mutation frequency in EC (65%), followed by glioblastoma multiforme (34%) and uterine carcinosarcoma (19%) (Figure 1A). The mutation sites of PTEN were scattered across the whole gene region, and arginine mutations were the most frequent (Figure 1B). We also compared PTEN mutations with TP53 mutations in EC; the mutation frequency of PTEN (64%) was higher than that of TP53 (37%) (Figure 1C).
2. Clinical significance of PTEN mutation in EC
Kaplan-Meier survival analysis showed that PTEN mutation was significantly associated with a better prognosis in patients with EC (P<0.01, Figure 2A). To investigate whether mutation type was associated with prognosis, patients with EC were divided into seven groups according to PTEN mutation type. However, the Kaplan-Meier curves showed no significant difference in overall survival among the seven groups (P=0.97, Figure 2B). The correlation between PTEN mutation status and EC clinical stage was also analyzed (Figure 2C). In stage I, the number and proportion of survivors were higher among PTEN-mutant patients than among wild-type patients.
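The survival curves referred to here rest on the Kaplan-Meier product-limit estimator. As a minimal sketch of how such a curve is computed (not the authors' pipeline; the function name and the toy follow-up data are hypothetical), survival is multiplied by (1 - deaths / patients at risk) at each observed death time:

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct death time.

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if the patient was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)   # patients leaving the risk set (death or censoring)
    n_at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # Product-limit step: multiply by the fraction surviving this time point.
            survival *= 1 - d / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving[t]
    return curve

# Toy data (hypothetical): five patients, two of them censored.
toy_times = [1, 2, 2, 3, 4]
toy_events = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(toy_times, toy_events):
    print(t, round(s, 3))
```

Comparing groups (for example mutant versus wild type) would additionally require a log-rank test, which is not shown here.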
3. GO enrichment and KEGG pathway analyses for differentially expressed genes
PTEN mutation status may be a potential biomarker for guiding treatment in EC. We therefore identified the differentially expressed genes (DEGs) between the PTEN-mutant and PTEN-wild-type groups. A total of 66 DEGs were identified, including 40 upregulated and 26 downregulated genes (Supplementary Table S1). These DEGs were plotted in a volcano plot, in which red and blue dots represent the up- and downregulated genes, respectively (Figure 3A). The expression of these genes in each sample is shown in the heat map, in which red indicates high relative expression and green indicates low relative expression (Figure 3B). A previous study suggested that PTEN may drive cancer progression and influence cancer treatment, and thereby affect the prognosis of patients. To better understand how PTEN may affect the prognosis of EC patients, we next performed enrichment analysis of the DEGs. The top 10 enriched GO biological process terms are shown in Figure 3C. The DEGs enriched in GO biological process categories were closely related to development (Figure 3C, Supplementary Table S2), including ‘DIGESTIVE SYSTEM DEVELOPMENT’, ‘EPITHELIAL CELL DEVELOPMENT’ and ‘DIGESTIVE TRACT DEVELOPMENT’. Several morphogenesis terms were also enriched, such as ‘EMBRYONIC LIMB MORPHOGENESIS’, ‘EMBRYONIC APPENDAGE MORPHOGENESIS’ and ‘EMBRYONIC FORELIMB MORPHOGENESIS’. In addition, the metabolic and endocrine terms ‘RESPONSE TO ESTRADIOL’ and ‘RESPONSE TO STEROID HORMONE’ were enriched. In the molecular function categories, the DEGs were closely related to enzyme activation and inhibition activity (Figure 3C, Supplementary Table S3).
Furthermore, KEGG pathway enrichment analysis suggested that several key pathways were relevant to the tumorigenesis and development of EC, such as ‘PROTEOGLYCANS IN CANCER’, ‘ENDOCRINE RESISTANCE’ and ‘HUMAN T-CELL LEUKEMIA VIRUS 1 INFECTION’ (Figure 3D, Supplementary Table S4). Interestingly, other cancer-related and disease-related pathways were also enriched, such as ‘BREAST CANCER’, ‘HEPATOCELLULAR CARCINOMA’ and ‘CUSHING SYNDROME’.
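Over-representation analyses of GO terms and KEGG pathways are often based on a one-sided hypergeometric test. The text does not state which test or tool was used, so the following is only a sketch of that common approach; the function name, argument names and all counts other than the 66 DEGs are hypothetical.

```python
from math import comb

def enrichment_p_value(n_background, n_in_term, n_degs, n_overlap):
    """One-sided hypergeometric p-value: P(overlap >= n_overlap).

    n_background : number of genes in the background set
    n_in_term    : background genes annotated to the term or pathway
    n_degs       : number of differentially expressed genes tested
    n_overlap    : DEGs annotated to the term or pathway
    """
    upper = min(n_in_term, n_degs)
    total = comb(n_background, n_degs)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_degs - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Made-up illustration: 66 DEGs, 5 of which fall in a term of 150 genes,
# against an assumed background of 20000 genes.
p = enrichment_p_value(20000, 150, 66, 5)
print(f"{p:.3g}")
```

In practice the resulting p-values would also be corrected for multiple testing across all terms, which this sketch omits.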
4. Ingenuity pathway analysis (IPA)
IPA (Ingenuity Pathway Analysis) is cloud-based bioinformatics software with a graphical interface. It analyzes, integrates and interprets omics data from the perspective of biological pathways, and is suitable both for large-scale data such as transcriptomics, proteomics and metabolomics [27] and for small-scale experiments that generate lists of genes and chemical substances. Its output mainly comprises five types of result: canonical pathways, upstream regulators, downstream regulator effects, diseases and functions, and molecular interaction networks. Its main advantage is that it can predict the activation or inhibition of a pathway from the up- and downregulation of the molecules in the uploaded data, and then predict how the entire pathway changes once it is activated [32]. IPA identified 115 enriched canonical pathways in EC (P<0.05), among which the most important were ‘MSP-RON Signaling in Macrophages Pathway’, ‘Molecular Mechanisms of Cancer’, ‘MSP-RON Signaling in Cancer Cells Pathway’, ‘SPINK1 Pancreatic Cancer Pathway’ and ‘Glioblastoma multiforme signal transduction’; of these, ‘Glioblastoma multiforme signal transduction’ was predicted to be inhibited (Figure 4A, Supplementary Table S5). In addition, disease and biological function enrichment analysis of the DEGs was performed using IPA. Taken together, the GO, KEGG and IPA results showed that the DEGs were mainly related to endocrine and reproductive pathways, including ‘cancer’ and ‘endocrine system diseases’ (Figure 4B, Supplementary Table S6).
5. Establishment of the prognostic signature for EC patients
Identifying prognostic biomarkers for EC that may also serve as therapeutic targets remains an important clinical issue. We therefore investigated the relationship between PTEN-related genes and the prognosis of patients with EC. The 526 TCGA-EC samples were divided into a training set (N=368) and a validation set (N=158). A total of 18 genes associated with overall survival were then identified by univariate Cox regression analysis of the DEGs (P<0.05, Figure 5A, Supplementary Table S7). In the subsequent LASSO regression analysis of these survival-related genes, 10 genes (CLDN9, UCHL1, BEX2, SLC47A1, PGR, SLC25A35, SCGB2A1, MSX1, CRABP2 and MAL) were selected as independent prognostic factors for survival at the optimal lambda of 0.0130152 (Figure 5B-C, Supplementary Table S8). Finally, a 10-gene prognostic signature was established.
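A signature of this kind is commonly summarized for each patient as a weighted sum of gene expression values, with the LASSO-Cox coefficients as weights. The sketch below assumes that construction, which the text does not state explicitly; the coefficients and expression values are placeholders for illustration, not the fitted values reported in Supplementary Table S8, and only three of the ten genes are used to keep it short.

```python
def risk_score(expression, coefficients):
    """Weighted sum over the signature genes: sum(coef_g * expression_g)."""
    missing = [gene for gene in coefficients if gene not in expression]
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return sum(coef * expression[gene] for gene, coef in coefficients.items())

# Placeholder coefficients (hypothetical, NOT the fitted model).
example_coefficients = {"CLDN9": 0.10, "PGR": -0.20, "MSX1": -0.05}
# Placeholder expression values for one hypothetical patient.
example_patient = {"CLDN9": 2.0, "PGR": 1.5, "MSX1": 4.0}

score = risk_score(example_patient, example_coefficients)
print(round(score, 3))
```

Under this construction, a positive coefficient raises the score as expression increases, and a negative coefficient lowers it.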
6. Evaluation and validation of prognostic value of the PTEN-related prognostic gene signature
The EC patients were divided into a high-risk group and a low-risk group according to the median risk score (Figure 6A). More red dots were found in the high-risk group, indicating that the death rate in the training set increased with the risk score (Figure 6A). The expression heat map of the signature genes showed that MSX1, SLC47A1, PGR, SLC25A35 and SCGB2A1 were downregulated, whereas CLDN9, CRABP2, MAL, UCHL1 and BEX2 were upregulated, in the high-risk group (Figure 6A). Kaplan-Meier analysis showed that patients in the high-risk group had significantly poorer overall survival (OS) than patients in the low-risk group in the training set (P<0.001, Figure 6B). Time-dependent receiver operating characteristic (ROC) curves and the corresponding area under the curve (AUC) values were used to assess the prediction accuracy of the prognostic signature. The AUCs for 1-year, 3-year and 5-year survival in the training set were 0.658, 0.680 and 0.720, respectively (Figure 6C).
To validate the predictive ability of this 10-gene prognostic signature, the same procedures were carried out in the validation set (Figure 7A). According to the median risk score, the 158 EC samples were divided into high- and low-risk groups (Figure 7A). The survival status scatter plot showed that more deceased patients were in the high-risk group. The gene expression heat map showed results similar to those in the training set (Figure 7A). Moreover, Kaplan-Meier analysis also showed a significant difference in survival between the high- and low-risk groups (P=0.04, Figure 7B). The AUCs for 1-year, 3-year and 5-year survival were 0.780, 0.729 and 0.697, respectively (Figure 7C). Collectively, these results indicated that the prognostic signature performed well in predicting survival.
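The median-based grouping can be sketched as follows. The patient identifiers and scores are made up, and assigning scores exactly equal to the median to the low-risk group is an assumption, since the text does not say how ties were handled.

```python
from statistics import median

def stratify_by_median(risk_scores):
    """Label each patient 'high' or 'low' relative to the cohort median risk score."""
    cutoff = median(risk_scores.values())
    # Assumption: a score equal to the median is assigned to the low-risk group.
    return {
        patient: ("high" if score > cutoff else "low")
        for patient, score in risk_scores.items()
    }

# Hypothetical risk scores for four patients.
toy_scores = {"P1": 0.2, "P2": 1.5, "P3": -0.7, "P4": 0.9}
print(stratify_by_median(toy_scores))
```

Because the cutoff is computed within each cohort, the training and validation sets each get their own median in this sketch.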
7. Correlation between risk score and clinical features
To analyze the correlation between risk score and clinical characteristics, 526 patients with clinical information, including age and FIGO stage, were included in further analysis. The risk score differed significantly between age groups (≤50 vs. >50) and among FIGO stages (Stage I vs. Stage II vs. Stage III vs. Stage IV) (Figure 8A, 8B). The risk score also differed significantly between the PTEN-wild-type and PTEN-mutant groups (P<0.0001, Figure 8C). Finally, a heat map of the expression of the prognostic signature genes was drawn, which showed that high levels of CLDN9, CRABP2, MAL, UCHL1 and BEX2 were positively correlated with the risk score (Figure 8D).
8. Establishment and validation of survival prediction nomogram
To further assess whether the 10-gene prognostic signature was an independent prognostic factor, univariate and multivariate Cox regression analyses were performed. Age, FIGO stage, PTEN mutation status and risk score, each with P<0.005, were included in the multivariate Cox regression analysis (Figure 9A). Age (P=0.005), FIGO stage (P<0.001) and risk score (P<0.001) were then identified as independent prognostic factors and used to establish a nomogram for predicting survival (Figure 9B). In the nomogram, the 1-, 3- and 5-year survival probabilities were estimated from the patient's age, FIGO stage and risk score (Figure 9C). Calibration curves for 1-, 3- and 5-year survival were then drawn to validate the accuracy of the predictions. According to these curves, the predicted 1-year survival rate was more reliable than the predicted 3-year survival rate, whereas the 5-year prediction was not reliable (Figure 9D-F).
9. Interaction of prognostic signature genes
Finally, we returned to the analysis of gene interactions in the tumorigenesis and development of EC. The top six diseases and functions categories were ‘Dental Disease, Developmental Disorder, Gastrointestinal Disease’, ‘Cancer, Cellular Movement, Endocrine System Disorders’, ‘Cancer, Organismal Injury and Abnormalities, Reproductive System Disease’, ‘Dermatological Diseases and Conditions, Inflammatory Disease, Organismal Injury and Abnormalities’, ‘Cellular Movement, Digestive System Development and Function, Gastrointestinal Disease’ and ‘Cancer, Cell Death and Survival, Organismal Injury and Abnormalities’ (Supplementary Table S9). The genes in all of these categories were then used to construct an interaction network. The results indicated that the prognostic signature genes PGR, CRABP2, BEX2 and MSX1 interacted with more genes than SLC47A1, SCGB2A1, SLC25A35, UCHL1, MAL and CLDN9 did (Figure 10). We speculate that PGR, CRABP2, BEX2 and MSX1 may be related to the tumorigenesis and progression of EC.