A prognostic risk model constructed by proteins in predicting prognosis for ovarian cancer

Background Ovarian cancer is the most lethal gynecological malignancy worldwide. Approximately 70% of patients have a poor prognosis, experiencing progression or recurrence within 5 years. This study aimed to screen for potential prognosis-related proteins and to establish a prognostic risk model for patients with ovarian cancer.
Methods Data were obtained from The Cancer Proteome Atlas (TCPA) and The Cancer Genome Atlas (TCGA). Proteins significantly related to survival risk in ovarian cancer patients were screened by Kaplan-Meier analysis and Cox regression analysis. A prognostic risk model was constructed from the optimal proteins selected by multivariate Cox analysis. The model was validated across different clinical characteristics. A Sankey diagram was used to visualize the relationship between the prognosis-related proteins and their co-expressed proteins.
Results A prognostic risk model consisting of seven proteins significantly related to prognosis was established. Patients with a high risk score had poorer survival, and the risk score was related to the expression of the model proteins. In multivariate Cox regression analysis, only age and the risk score were independent prognostic factors. The AUC of the ROC curve for the risk score was 0.721 in patients under 70 years old. Pearson's correlation analysis showed that 25 co-expressed proteins correlated with the prognosis-related proteins.


Introduction
Ovarian cancer is the most lethal gynecological cancer worldwide [1]. Approximately 75% of patients are diagnosed at an advanced stage and have a 5-year survival rate of 29% [2]. Currently, survival is strongly influenced by the extent of disease and the amount of residual tumor following primary debulking surgery. However, considerable variation in outcome has been observed among patients matched for stage and debulking status, indicating that other determinants of survival are at play [3]. With the availability of high-throughput expression data and the reverse-phase protein array (RPPA) platform, it has become feasible to analyze proteomic predictors of prognosis, with the goal of developing new treatment strategies that improve overall survival. In this study, we analyzed proteomic data from The Cancer Proteome Atlas (TCPA) and survival data of ovarian cancer patients from The Cancer Genome Atlas (TCGA) database. Using survival analysis and Cox regression analysis of proteins associated with prognosis, we identified seven potential prognosis-related proteins and constructed a model to predict prognostic risk in ovarian cancer patients.

Data collection
Proteomic data available up to September 2, 2020 were downloaded from TCPA, an open-access bioinformatics resource. Clinical data of ovarian cancer patients were collected from TCGA.

Data collation
The "impute" package of R was used to fill in missing proteomic data. Perl was used to extract the survival time and survival status of patients and to merge them with the corresponding proteomic data.

Screening of prognosis-related protein
For each protein, patients were divided into high- and low-expression groups at the median expression value, and Kaplan-Meier survival analysis was used to compare survival between the two groups. Univariate Cox regression analysis was used to compute the hazard ratio of each protein. A volcano plot showed the relationship between risk and protein expression and highlighted the significant proteins, defined as those with a p value below 0.05 in both analyses.
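The screening rule described above can be sketched in a few lines. This is a minimal Python illustration, not the R workflow used in the study: it assumes the Kaplan-Meier and univariate Cox p values and the hazard ratios have already been computed elsewhere, and the function names, sample IDs, protein names, and numbers are all hypothetical.

```python
from statistics import median


def split_by_median(expression):
    """Label each sample 'high' or 'low' relative to the median expression."""
    cutoff = median(expression.values())
    return {sample: ("high" if value > cutoff else "low")
            for sample, value in expression.items()}


def screen_proteins(stats, alpha=0.05):
    """Keep proteins significant in BOTH tests and label their risk direction.

    `stats` maps protein -> (km_p, cox_p, hazard_ratio).
    """
    selected = {}
    for protein, (km_p, cox_p, hazard_ratio) in stats.items():
        if km_p < alpha and cox_p < alpha:
            selected[protein] = "high-risk" if hazard_ratio > 1 else "low-risk"
    return selected


# Hypothetical values for illustration only (not data from the study).
example_expression = {"s1": 0.2, "s2": -0.5, "s3": 1.1, "s4": 0.0}
example_stats = {
    "PROTEIN_A": (0.010, 0.020, 1.80),  # significant in both, HR > 1
    "PROTEIN_B": (0.030, 0.200, 0.70),  # fails the Cox threshold
    "PROTEIN_C": (0.001, 0.004, 0.55),  # significant in both, HR < 1
}

groups = split_by_median(example_expression)
selected = screen_proteins(example_stats)
```

Requiring significance in both analyses is stricter than either test alone, which is why PROTEIN_B is dropped in this toy example.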

Construction of a prognostic risk model with the risk score
Multivariate Cox analysis was performed to select the proteins with statistically significant effects for constructing the prognostic risk model. The coefficient and p value of each protein were obtained with the "survival" package of R. The risk score of each sample was calculated as the sum of the expression level of each selected protein multiplied by its coefficient. Patients were divided into a low-risk group or a high-risk group according to the median risk score.
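The risk score calculation can be illustrated as follows. This is a Python sketch rather than the authors' R code; the coefficients are those reported in the Results of this paper, while the patient IDs and expression values are hypothetical, and the handling of scores exactly equal to the median (assigned to the low-risk group here) is an assumption.

```python
from statistics import median

# Coefficients as reported in the Results section of this paper.
COEFFICIENTS = {
    "MAPK_pT202Y204": 0.452,
    "RAB11": -0.994,
    "P38MAPK": -1.108,
    "AR": -0.349,
    "CMET_pY1235": -1.502,
    "GSK3αβ_pS21S9": -0.305,
    "NF-κB_pS536": 0.184,
}


def risk_score(expression):
    """Sum of coefficient x expression level over the seven model proteins."""
    return sum(coef * expression[protein]
               for protein, coef in COEFFICIENTS.items())


def assign_groups(scores):
    """Split patients at the median risk score (ties go to 'low': an assumption)."""
    cutoff = median(scores.values())
    return {patient: ("high" if score > cutoff else "low")
            for patient, score in scores.items()}


# Hypothetical patients: all proteins at 0 except the one named.
baseline = dict.fromkeys(COEFFICIENTS, 0.0)
patients = {
    "patient_1": {**baseline, "MAPK_pT202Y204": 1.0},
    "patient_2": {**baseline, "CMET_pY1235": 1.0},
    "patient_3": dict(baseline),
    "patient_4": {**baseline, "RAB11": 1.0},
}

scores = {pid: risk_score(expr) for pid, expr in patients.items()}
groups = assign_groups(scores)
```

Because the cutoff is the cohort median, group membership is relative to the cohort being scored; applying the model to a new cohort would require either reusing the original cutoff or recomputing it.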

Validation of the performance of the prognostic risk model
Kaplan-Meier analysis was used to compare the survival of patients in the low-risk and high-risk groups of the model. The risk scores and survival times of patients in the two groups were visualized with the "pheatmap" package of R. A heat map showed the expression levels of the prognosis-related proteins in the two groups. A forest plot, generated with the "survival" package of R, was used to assess whether the model could serve as a prognostic predictor alongside other clinicopathological characteristics. The AUC of the ROC curve, computed with the "survivalROC" package, was used to evaluate the prognostic accuracy of the risk score.
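To make the AUC concrete, the sketch below computes it as the probability that a randomly chosen patient with the event has a higher risk score than a randomly chosen patient without it. This is a deliberate simplification in Python: the "survivalROC" package estimates a time-dependent ROC curve that accounts for censoring, whereas this sketch assumes a binary outcome at a fixed time point with no censored patients. The function name and the example values are hypothetical.

```python
def auc(scores, events):
    """Area under the ROC curve for a binary outcome.

    Counts, over all (event, non-event) pairs, how often the event case has
    the higher score; tied scores count as half a win.
    """
    positives = [s for s, e in zip(scores, events) if e]
    negatives = [s for s, e in zip(scores, events) if not e]
    if not positives or not negatives:
        raise ValueError("need at least one case in each outcome class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical risk scores and outcomes (1 = event by the chosen time point).
example_scores = [0.9, 0.4, 0.7, 0.2, 0.4]
example_events = [1, 0, 1, 0, 1]
example_auc = auc(example_scores, example_events)
```

An AUC of 0.5 corresponds to a score with no discriminative ability and 1.0 to perfect separation, which is the scale on which the reported values should be read.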

Co-expression protein analysis
Co-expressed proteins were filtered in R using a correlation coefficient threshold of > 0.6 and a P value threshold of < 0.001. A Sankey diagram was then used to visualize the relationship between the prognosis-related proteins and their co-expressed proteins.
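The co-expression filter can be sketched as below. This Python illustration is not the R procedure used in the study. Two points are assumptions: the threshold is applied to the signed correlation coefficient (the text does not say whether absolute values were used), and the p value is approximated with the Fisher z-transformation and a normal distribution instead of the exact t-test. Protein names and values are hypothetical.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def approx_p_value(r, n):
    """Two-sided p value via the Fisher z-transformation (approximate, n > 3)."""
    if abs(r) >= 1.0:
        return 0.0  # perfect correlation; also guards atanh against rounding
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def coexpressed(target, candidates, r_min=0.6, p_max=0.001):
    """Return candidates whose correlation with `target` passes both thresholds."""
    n = len(target)
    hits = {}
    for name, values in candidates.items():
        r = pearson_r(target, values)
        if r > r_min and approx_p_value(r, n) < p_max:
            hits[name] = r
    return hits


# Hypothetical expression vectors for 30 samples.
target = [i / 10 for i in range(30)]
candidates = {
    "PROTEIN_X": [2 * t + 0.1 * (-1) ** i for i, t in enumerate(target)],
    "PROTEIN_Y": [float((-1) ** i) for i in range(30)],
}
hits = coexpressed(target, candidates)
```

Applying both thresholds together keeps only pairs that are strongly and reliably correlated; with several hundred samples, the correlation threshold is usually the binding constraint.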

Results
The proteomic landscape of ovarian cancer patients
To analyze proteomic correlates of prognosis in epithelial ovarian cancer (EOC) patients, 587 level 4 clinical samples were obtained from TCPA, and the corresponding protein profiles of 411 patients were obtained from TCGA. By Kaplan-Meier and univariate Cox regression analyses, we found that overexpression of 21 proteins was correlated with higher risk, whereas overexpression of another 20 proteins predicted lower risk. A volcano plot was used to display all proteins and to highlight those significantly related to survival risk (Fig. 1).

Prognosis-related protein screening
Multivariate Cox regression analysis was then used to select the optimal prognosis-related proteins. We finally identified seven proteins significantly correlated with prognosis in EOC patients: MAPK_pT202Y204, RAB11, P38MAPK, AR, CMET_pY1235, GSK3αβ_pS21S9, and NF-κB_pS536. The coefficients and hazard ratios are shown in Table 1. Kaplan-Meier survival curves were plotted for the high-risk and low-risk groups of each prognosis-related protein (Fig. 2).

Construction of a mathematical model with the risk score
We next established a proteomic prognostic model that included the seven proteins. The risk score in this model represented the prognosis of patients and was calculated as:
Risk score = 0.452 × MAPK_pT202Y204 − 0.994 × RAB11 − 1.108 × P38MAPK − 0.349 × AR − 1.502 × CMET_pY1235 − 0.305 × GSK3αβ_pS21S9 + 0.184 × NF-κB_pS536
Kaplan-Meier and Wilcoxon test analyses showed that patients with a high risk score had poorer OS than patients with a low risk score (p = 5.129 × 10⁻⁸) (Fig. 3A). The risk plots showed that a high risk score was associated with poor survival and with the expression of the model proteins (Fig. 3B-D). Univariate and multivariate Cox regression analyses were performed to identify independent prognostic factors in EOC patients. Only age and the risk score were independent prognostic factors (Fig. 4A-B). The AUC of the ROC curve for the risk score in predicting prognosis was 0.721 in patients under 70 years old (Fig. 5A). However, in patients older than 70 years, the AUC was below 0.6 for all clinical characteristics (Fig. 5B).

The co-expression proteins of the prognosis-related proteins
Pearson's correlation analysis showed that 25 co-expressed proteins correlated with the prognosis-related proteins. The cutoff for the correlation coefficient was set at 0.6 (Fig. 6). A Sankey diagram was used to visualize the relationship between the co-expressed proteins and the prognosis-related proteins (Fig. 7).

Discussion
With the development of high-throughput next-generation sequencing (NGS), many studies have developed novel risk models that identify distinguishing patterns for cancer diagnosis or predict prognosis [4][5][6][7]. However, limitations of NGS, such as high cost and poor reproducibility, have restricted its application. Here, we present a prognostic risk model consisting of seven proteins for predicting the outcome of EOC patients.
In this study, we included a total of 411 patients with protein expression data from the TCPA platform and clinical data from the TCGA database. Using Kaplan-Meier analysis and univariate Cox regression analysis of the protein expression profiles of these patients, we identified 21 proteins whose overexpression was associated with higher risk and 20 proteins whose overexpression was associated with lower risk. We then performed multivariate Cox regression analysis and identified seven independent prognostic factors.
Based on these seven proteins, a proteomic prognostic risk model for predicting patient risk was constructed. The coefficients of RAB11, P38MAPK, AR, CMET_pY1235, and GSK3αβ_pS21S9 were less than zero, suggesting that patients with high expression of these proteins had a better prognosis than patients with low expression. The coefficients of MAPK_pT202Y204 and NF-κB_pS536 were greater than zero, suggesting that patients with high expression of these two proteins had a poorer prognosis than patients with low expression. After the risk score was calculated for the 411 EOC patients, the patients were divided into a high-risk score group and a low-risk score group and compared by Kaplan-Meier survival analysis. Overall survival was significantly worse in the high-risk score group than in the low-risk score group. In the univariate and multivariate Cox regression analyses, a high risk score predicted a poor prognosis (HR = 2.064, 95% CI: 1.707-2.497, and HR = 2.172, 95% CI: 1.805-2.613, respectively). We also used ROC curves and the AUC to evaluate the performance of the prognostic risk model. In addition, we screened for proteins co-expressed with the seven model proteins to find further related proteins. Our findings indicate that the prognostic risk model could act as an effective biomarker for predicting ovarian cancer prognosis. The model may help clinicians make more accurate decisions, avoiding unnecessary medication and drug side effects in low-risk patients while strongly recommending treatment for high-risk patients.
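The link between the sign of a Cox coefficient and the direction of risk follows from the hazard ratio being the exponential of the coefficient. The Python sketch below applies that standard relationship to the coefficients reported in the Results. It is an illustration only: the hazard ratios it prints are computed here from the rounded coefficients and may differ slightly from the values in Table 1.

```python
import math

# Coefficients as reported in the Results section of this paper.
COEFFICIENTS = {
    "MAPK_pT202Y204": 0.452,
    "RAB11": -0.994,
    "P38MAPK": -1.108,
    "AR": -0.349,
    "CMET_pY1235": -1.502,
    "GSK3αβ_pS21S9": -0.305,
    "NF-κB_pS536": 0.184,
}


def hazard_ratio(coefficient):
    """Hazard ratio per one-unit increase in expression: HR = exp(coefficient)."""
    return math.exp(coefficient)


def risk_direction(coefficient):
    """A negative coefficient gives HR < 1 (lower risk); positive gives HR > 1."""
    return "higher risk" if hazard_ratio(coefficient) > 1 else "lower risk"


directions = {protein: risk_direction(coef)
              for protein, coef in COEFFICIENTS.items()}

for protein, coef in COEFFICIENTS.items():
    print(f"{protein}: HR = {hazard_ratio(coef):.3f} ({directions[protein]})")
```

The hazard ratio applies per unit of expression on the scale used to fit the model, so the magnitudes of different proteins are only comparable if their expression values share a scale.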
Among the proteins in the prognostic risk model, Rab11 belongs to an important subfamily of the Rab GTPase family that contains three members (Rab11a, Rab11b, and Rab11c/Rab25) [8]. Rab11 proteins are key regulators of intracellular membrane trafficking. In addition, Rab11 proteins are implicated in many diseases, including cancers [9, 10], neurodegenerative diseases [11], and type 2 diabetes [12]. Emerging evidence shows that Rab11, together with its interacting proteins, plays critical roles in cancer. In non-small cell lung cancer (NSCLC) and gastric cancer, the expression of Rab11-FIP2 was much lower in tumor tissue than in the corresponding adjacent normal tissue [13,14]. Upregulation of Rab11-FIP2 inhibited lung cancer cell growth both in vitro and in vivo [13].
Mitogen-activated protein kinases (MAPKs) are a family of widely conserved serine/threonine protein kinases involved in multiple cellular processes such as cell proliferation, differentiation, motility, and death. The MAPK pathway has four main branches: JNK, p38MAPK, ERK1/2, and ERK5. p38MAPK (p38 mitogen-activated protein kinase) is a member of the MAPK signaling pathway that is related to the initiation of apoptosis, cell cycle arrest, and immune regulation [15,16]. It is found in a variety of tumors, including cervical cancer, ovarian cancer, liver cancer, and lymphoma. The function of p38MAPK is cell-type specific: it has different, and sometimes completely opposite, effects in different tumor cells. Phosphorylation of p38 was significantly upregulated in gastric cancer cells treated with alantolactone, which promoted apoptosis and reduced migration and invasion [17]. Likewise, the p38MAPK pathway regulates apoptosis in osteosarcoma cells [18,19] and esophageal cancer cells [20]. In the osteosarcoma cell line MG63, apoptosis was significantly induced by increasing the phosphorylation of p38MAPK and decreasing phospho-ERK1/2 (MAPK_pT202Y204) [19]. In ovarian cancer, activation of p38MAPK enabled drug-resistant ovarian cancer cells to overcome their resistance to paclitaxel [21]. However, the opposite conclusion has also been reported: the p38MAPK family isoform p38α was significantly correlated with disease severity and poor outcome in patients with ovarian cancer [22].
AR (androgen receptor) is a member of the nuclear receptor superfamily, and its structure is similar to that of the estrogen receptor, progesterone receptor, glucocorticoid receptor, and thyroid hormone receptor. The AR gene is located on the X chromosome (Xq11-12) and consists of 8 exons, which encode a 110 kDa protein. AR is found in many malignant tumors, including those of the prostate, bladder, kidney, lung, breast, and ovary [23,24]. The expression rate of AR in primary ovarian cancer samples was reported to be 43.7% and was highest in serous cancer (47.5%) [25]. One study reported that AR negativity was associated with high-grade carcinoma and poor survival [26]. Nodin et al. found that AR expression correlated with prolonged disease-specific survival in the serous subtype of ovarian cancer [27]. Matins et al. demonstrated that low AR expression was associated with shorter overall survival [28]. These results suggest that increased AR expression tends to predict a better prognosis in epithelial ovarian cancer, especially ovarian serous cancer.
c-Met is the tyrosine kinase receptor for hepatocyte growth factor (HGF) and is encoded by the proto-oncogene MET, located at chromosome 7q31.2. The interaction of c-Met with HGF can cause autophosphorylation of multiple tyrosines, which in turn recruit multiple downstream signal transduction components, including Gab1, c-Cbl, and PI3 kinase [29]. In our study, high expression of phosphorylated c-Met (Tyr1234/1235) was associated with better survival in epithelial ovarian cancer (p = 0.030). In contrast, a study of patients with ovarian clear cell carcinoma demonstrated that c-MET overexpression and copy number alterations were associated with higher tumor grades [30].
GSK-3 (glycogen synthase kinase-3) is a widely expressed serine/threonine protein kinase that can phosphorylate glycogen synthase and inhibit its activity. It exists in two isoforms, GSK3α and GSK3β. GSK-3 is an important downstream component of the PI3K/Akt cell survival pathway, and its activity can be inhibited by Akt-mediated phosphorylation of GSK-3α at Ser21 and GSK-3β at Ser9. The role of GSK3 in tumor formation and promotion is controversial [21,31,32], because it affects many cellular processes and pathways and its role may differ among cell types.
Nuclear transcription factor κB (NF-κB) is a protein complex of transcription factors, consisting of RelA (p65), RelB, c-Rel, NF-κB1 (p105/p50), and NF-κB2 (p100/p52), which plays an important role in inflammation and the immune response. One study reported that phospho-NF-κB p65 (Ser536) expression in muscle-invasive bladder cancer may serve as a reliable prognosticator [33]. In that study, phospho-NF-κB positivity, but not NF-κB positivity, was strongly associated with a higher risk of disease progression and with cancer-specific survival. Likewise, the survival analysis in this study indicated that overexpression of phospho-NF-κB p65 (Ser536) correlated with poor prognosis.
However, this study has limitations. The proteomic signature was constructed from public data sets and still needs further verification in laboratory studies or clinical trials. Furthermore, the constructed model requires an external validation set, because expression levels differ between patients, which may affect the reliability of the final model.

Conclusion
In conclusion, this study demonstrated that a novel signature constructed from proteins could predict prognosis for EOC patients. If verified prospectively in a larger cohort, this protein combination could identify EOC patients with poor prognosis after standard treatment, who may benefit from in-depth follow-up, maintenance treatment, or inclusion in treatment research. Therefore, further clinical evidence is essential to verify the prognostic value of the risk score model.

Figure 1 Proteins significantly related to survival risk shown in a volcano plot. Red dots represent high-risk proteins, green dots represent low-risk proteins, and black dots represent proteins without statistical significance.
Figure 4 The forest plot showed that the prognostic risk model was an independent prognostic factor for overall survival. (A) Univariate Cox regression analysis. (B) Multivariate Cox regression analysis.