A Valuable Plasma Membrane Protein Prognostic Model in Clear Cell Renal Cell Carcinoma

Background: Renal cell carcinoma (RCC) accounts for 3% of human cancers, and clear cell renal cell carcinoma (ccRCC) is its most common pathological type. Cell surface proteins have been shown to play an important role in the occurrence and progression of various cancers. In this study, we focused on plasma membrane proteins (PMPs) to explore their potential value in ccRCC. Methods: PMP expression profiles and clinical information of ccRCC patients were downloaded from the TCGA database. Using a series of bioinformatic methods, we established a plasma membrane protein prognostic model and verified its value in multiple ways. Results: Multivariate Cox regression analysis and the area under the receiver operating characteristic curve indicated that this model was an effective independent predictor of ccRCC clinical outcomes. It retained good prognostic value across groups defined by different clinical features. Combined with two other clinical characteristics, the model was used to construct a nomogram predicting patient survival at 1, 3, and 5 years. Conclusions: Our study is the first to explore the prognostic value of plasma membrane proteins in clear cell renal cell carcinoma. We hope our work provides a new viewpoint on ccRCC prognosis and draws attention to plasma membrane proteins in this disease.


Background
Renal cell carcinoma represents around 3% of all cancers, with the highest incidence in Western countries [1]. Over the last two decades until recently, its incidence rose by about 2% per year both worldwide and in Europe, leading to approximately 99,200 new RCC cases and 39,100 kidney cancer-related deaths in the European Union in 2018 [2]. As the burden of cancer continues to grow, valuable prognostic tools to support clinical decisions are urgently needed. Proteomic analysis of tumor tissue samples and the identification of candidate protein biomarkers in serum or plasma form an evolving field.
Direct analysis of proteins has several advantages over indirect approaches such as transcriptome analysis, although it requires more tissue and more time.
In recent years, cell surface proteins have come into focus because they are readily accessible and have the potential to become new drug targets. Plasma membrane proteins (PMPs) account for about 50% of the weight of the cell membrane, and their functions are complex and varied [3]. Studies show that PMPs mediate or initiate phenotypic changes associated with malignant transformation, such as cell proliferation, adhesion, and migration [4][5]. HER-2, a receptor protein that is highly expressed in many types of cancer, promotes the proliferation and invasion of tumor cells when activated [6]. Some PMPs are differentially expressed between tumor and normal tissue and may therefore be potential therapeutic targets or biomarkers. After analyzing VHL-associated changes in plasma membrane proteins, Aggelis et al. identified 19 differentially expressed proteins that were potential biomarkers for ccRCC [7]. These studies suggest that PMP dysregulation may be closely related to the occurrence and development of cancer. However, to our knowledge, large-scale gene expression signatures have rarely been used to investigate the association between PMPs and ccRCC. A more comprehensive understanding of the effects of PMPs on tumors could aid the clinical diagnosis of renal cancer and even provide a new, precise direction for treatment.
In this study, we tried to clarify the possible role of PMPs in ccRCC and to explore their potential value for prognosis and targeted therapy. PMP expression profiles and clinical information of ccRCC patients were downloaded from the TCGA database. We then identified differentially expressed PMPs computationally and applied a number of bioinformatic analyses to study the underlying regulatory mechanisms. Finally, we built a prognostic model to predict the overall survival of ccRCC patients and to provide a new viewpoint for precise therapy of ccRCC. To ensure data completeness, patients meeting either of the following criteria were excluded from subsequent analysis: (1) survival time of less than 30 days, or (2) missing information on stage, grade, age, or gender. In total, 482 tumor samples from different individuals and 68 paracancerous samples were included in the training set.
Meanwhile, three microarray datasets (GSE29609, GSE22541, and E-MTAB-3267) were downloaded from the GEO and ArrayExpress databases. Together they include 116 KIRC patients with corresponding clinical information and served as the testing set for external validation. The "sva" R package was used to remove batch effects.
The list of plasma membrane proteins was obtained from The Human Protein Atlas database (http://www.proteinatlas.org/). Because all data were downloaded from public databases, ethical approval was not required.
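The source names only the sva package. The most commonly used batch-correction routine in that package is ComBat, which estimates a batch-specific shift and scale for each gene and then shrinks those estimates with empirical Bayes. As a rough illustration only, the Python sketch below applies the plain location-scale adjustment to a single gene. It omits both the empirical Bayes shrinkage and the biological covariates that ComBat can protect. The function name `location_scale_adjust` is hypothetical and is not part of sva.

```python
from math import sqrt
from statistics import mean, pstdev


def location_scale_adjust(values, batches):
    """Naive per-batch location/scale adjustment for ONE gene (no empirical Bayes).

    values:  expression values of the gene across all samples
    batches: batch label of each sample, same length as values
    """
    n = len(values)
    grand_mean = mean(values)
    groups = {}
    for i, label in enumerate(batches):
        groups.setdefault(label, []).append(i)
    batch_mean = {b: mean([values[i] for i in idx]) for b, idx in groups.items()}
    # Pooled within-batch SD: the spread left after removing each batch's own mean
    sigma = sqrt(sum((values[i] - batch_mean[batches[i]]) ** 2 for i in range(n)) / n)
    adjusted = [0.0] * n
    for b, idx in groups.items():
        # Batch-specific scale relative to the pooled SD (1.0 when undefined)
        spread = pstdev([values[i] for i in idx])
        delta = spread / sigma if sigma > 0 and spread > 0 else 1.0
        for i in idx:
            # Remove the batch shift and scale, then restore the overall mean
            adjusted[i] = (values[i] - batch_mean[b]) / delta + grand_mean
    return adjusted
```

For example, take two batches that differ only by a 10-unit offset: `location_scale_adjust([1, 2, 3, 11, 12, 13], ["A"] * 3 + ["B"] * 3)`. Both batches are mapped to approximately `[6, 7, 8]`.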

Differential gene analysis
To identify differentially expressed genes (DEGs) and differentially expressed PMPs (DEPMPs), the "limma" R package was used to normalize the expression matrix and then compare tumor with paracancerous tissues. DEGs were identified using thresholds of |log2 fold change| > 1 and p value < 0.05. DEPMPs were then extracted from the DEGs, and GO and KEGG pathway enrichment analyses were used to investigate their molecular functions.
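The thresholding step can be sketched in Python as follows. The sketch assumes log2-transformed expression values, so the log2 fold change is the difference of group means. This matches how limma reports logFC for a simple two-group comparison. The p values are taken as given (for example, from limma's moderated t-test) rather than recomputed. The gene names and both helper functions are hypothetical.

```python
from statistics import mean


def classify_degs(expr, pvals, lfc_cutoff=1.0, p_cutoff=0.05):
    """Split genes into up- and down-regulated DEGs.

    expr:  gene -> (tumor_values, normal_values), log2-scale expression
    pvals: gene -> p value (assumed to be computed upstream, e.g. by limma)
    """
    up, down = [], []
    for gene, (tumor, normal) in expr.items():
        lfc = mean(tumor) - mean(normal)  # log2 fold change on the log scale
        if pvals[gene] < p_cutoff and abs(lfc) > lfc_cutoff:
            (up if lfc > 0 else down).append(gene)
    return sorted(up), sorted(down)


def extract_depmps(degs, pmp_list):
    """DEPMPs are the DEGs that also appear in the plasma membrane protein list."""
    return sorted(set(degs) & set(pmp_list))
```

In the invented example below, `GENE_A` is called up-regulated and `GENE_B` down-regulated. `GENE_C` is unchanged and is not called.

```python
expr = {"GENE_A": ([8, 9, 10], [2, 3, 4]),
        "GENE_B": ([1, 2, 3], [9, 10, 11]),
        "GENE_C": ([5, 5, 5], [5, 5, 5])}
up, down = classify_degs(expr, {"GENE_A": 0.001, "GENE_B": 0.01, "GENE_C": 0.5})
```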

Survival related DEPMPs' molecular characteristics
We screened survival-related DEPMPs through univariate Cox proportional hazards regression analysis. To explore the clinical value of these survival-related DEPMPs comprehensively, several public databases were used. A protein-protein interaction (PPI) network was constructed by submitting the gene list to the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database (https://string-db.org/). Because transcription factors (TFs) play an important role in initiating gene expression, we also built a TF regulatory network using the Cistrome Cancer database (http://cistrome.org/CistromeCancer/). This database contains over 23,000 ChIP-seq and chromatin accessibility profiles from human and mouse genomes and provides binding information for 318 TFs [8]. The interaction network between the DEPMPs and their corresponding TFs was visualized with Cytoscape software.

Plasma membrane protein prognostic model (PMPPM)
We then used these survival-related PMPs to construct a prognostic model by multivariate Cox proportional hazards regression analysis. Each patient's Risk Score was obtained by multiplying the expression level of each PMP in the model by its Cox regression coefficient and summing the products. The median Risk Score was used as the cutoff to divide patients into high-risk and low-risk groups, and survival was compared with the Kaplan-Meier (K-M) method. In addition, the predictive value of the model was evaluated in two ways. First, we computed the area under the curve (AUC) of the receiver operating characteristic (ROC) curve with the "survivalROC" R package. Second, we ran a multivariate Cox proportional hazards regression analysis.
We further explored the relationship between the PMPPM and clinical characteristics, including age, gender, grade, TNM stage, T stage, and metastasis. The characteristics of each PMP in the model were obtained from the UALCAN database (http://ualcan.path.uab.edu/index.html). In addition, copy number variation information was obtained from the cBioPortal database (http://www.cbioportal.org/) [9]. Finally, a nomogram was established to predict patients' 1-year, 2-year, and 3-year overall survival.
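The Risk Score is a linear predictor: Risk Score = Σ_k β_k × expr_k, summed over the PMPs in the model. The Python sketch below computes it and applies the median split. The optional `cutoff` argument lets a fixed threshold, such as a training-set median, be reused on another cohort. The helper functions are hypothetical and not code from this study, and the gene names and coefficients in the example are made up.

```python
from statistics import median


def risk_scores(expression, coefficients):
    """expression: patient -> {gene: value}; coefficients: gene -> Cox coefficient."""
    return {
        patient: sum(coef * genes[g] for g, coef in coefficients.items())
        for patient, genes in expression.items()
    }


def split_by_median(scores, cutoff=None):
    """Label patients 'high' or 'low' risk; default cutoff is the median score."""
    if cutoff is None:
        cutoff = median(scores.values())
    groups = {p: ("high" if s > cutoff else "low") for p, s in scores.items()}
    return groups, cutoff
```

A usage example with invented values:

```python
coefs = {"GENE_A": 0.5, "GENE_B": -0.3}
expr = {"p1": {"GENE_A": 2, "GENE_B": 1}, "p2": {"GENE_A": 1, "GENE_B": 2},
        "p3": {"GENE_A": 4, "GENE_B": 0}, "p4": {"GENE_A": 0, "GENE_B": 0}}
groups, cutoff = split_by_median(risk_scores(expr, coefs))  # cutoff is about 0.35
```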

Statistical analysis
Heatmaps of DEGs and DEPMPs were plotted using the "pheatmap" R package with zero-mean normalization.
Differences between two groups shown in boxplots were assessed with the Wilcoxon test. The "clusterProfiler" R package was used for GO and KEGG pathway enrichment analyses. The area under the ROC curve was calculated with the "survivalROC" R package. For Kaplan-Meier curves, p values were generated by log-rank tests, and hazard ratios (HRs) with 95% confidence intervals (CIs) by univariate Cox proportional hazards regression. All analyses were performed in R version 3.6.1 (The R Foundation for Statistical Computing, 2019). All statistical tests were two-sided, and P < 0.05 was considered statistically significant.
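The log-rank p values reported with the Kaplan-Meier curves compare observed and expected deaths in one group at every distinct event time. A minimal Python sketch of the two-group test follows. It assumes `group` holds 1 for high-risk and 0 for low-risk patients. It converts the chi-square statistic (1 degree of freedom) to a p value with `math.erfc`. This is an illustrative re-implementation with a hypothetical function name, not the code of R's `survival::survdiff`.

```python
import math


def logrank_test(time, event, group):
    """Two-group log-rank test; returns (chi-square statistic, p value), 1 df."""
    obs_minus_exp = 0.0
    variance = 0.0
    for t in sorted({ti for ti, e in zip(time, event) if e}):
        at_risk = [i for i in range(len(time)) if time[i] >= t]
        n = len(at_risk)
        n1 = sum(group[i] for i in at_risk)  # group-1 patients at risk
        d = sum(1 for i in at_risk if time[i] == t and event[i])  # all deaths at t
        d1 = sum(1 for i in at_risk if time[i] == t and event[i] and group[i])
        obs_minus_exp += d1 - d * n1 / n  # observed minus expected in group 1
        if n > 1:
            # Hypergeometric variance of d1 given the margins at time t
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("both groups are needed in the risk sets")
    chi2 = obs_minus_exp ** 2 / variance
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p
```

If every patient in the high-risk group dies before every low-risk patient, the test rejects clearly. For example, `logrank_test(list(range(1, 9)), [1] * 8, [1, 1, 1, 1, 0, 0, 0, 0])` gives a statistic of roughly 7.3 and p < 0.01.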

Identification of DEPMPs
The patients' clinical information is shown in Table 1. A total of 7369 DEGs were screened, including 5467 upregulated and 1902 downregulated genes (Fig 1A). Among these DEGs, 98 upregulated and 61 downregulated DEPMPs were found (Fig 1B). GO enrichment analysis indicated the following enrichments for these DEPMPs (Fig 2A):
- in the biological process category, mainly actin filament organization;
- in the cellular component (CC) category, membrane region;
- in the molecular function (MF) category, mostly cell adhesion molecule binding.
KEGG pathway enrichment analysis showed that DEPMPs were mainly enriched in actin filament organization as well as regulation of actin filament organization (Fig 2B).
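Over-representation analysis of the kind clusterProfiler performs asks whether a GO term or pathway contains more of the query genes than expected by chance, using a one-sided hypergeometric test. By default, clusterProfiler's enrichment functions then adjust the p values with the Benjamini-Hochberg method. The Python sketch below shows both steps. The function names are hypothetical and the example counts are invented.

```python
from math import comb


def hypergeom_enrichment_p(k, n, M, N):
    """P(X >= k) for X ~ Hypergeometric.

    N: background genes, M: background genes annotated to the term,
    n: genes in the query list (e.g. DEPMPs), k: query genes annotated to the term.
    """
    hits = sum(comb(M, i) * comb(N - M, n - i) for i in range(k, min(n, M) + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity and a cap of 1
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For instance, with 20 background genes, 5 of them annotated to a term, and a 4-gene query list that hits all 4 annotated slots, `hypergeom_enrichment_p(4, 4, 5, 20)` equals 5/4845.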
Characteristics of survival related PMPs
Univariate Cox regression analysis identified 101 survival-related DEPMPs: 45 with HR > 1 and 56 with HR < 1. The PPI network shows that these proteins interact with each other in ccRCC (Fig S1). In addition, the expression profiles of 318 transcription factors (TFs) were examined, and 60 of them were differentially expressed between ccRCC and normal tissues (Fig S2A). We then established a regulatory network between the 101 survival-related DEPMPs and these 60 TFs, using cut-offs of correlation score > 0.4 and P < 0.01. The resulting diagram illustrates the regulatory relationships between the TFs and these PMPs (Fig S2B).
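The TF-PMP edges come from thresholding the correlation between each TF's expression and each PMP's expression. The Python sketch below uses Pearson correlation. Because the Python standard library has no t distribution, it approximates the two-sided p value with the Fisher z-transformation, which is reasonable for cohorts of hundreds of samples. The source states only "correlation score > 0.4". Applying the cut-off to the absolute correlation, so that negative regulation is kept, is an assumption. Function and gene names are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def tf_pmp_edges(tf_expr, pmp_expr, r_cutoff=0.4, p_cutoff=0.01):
    """Return (tf, pmp, r) edges with |r| > r_cutoff and approximate p < p_cutoff."""
    edges = []
    for tf, tf_vals in tf_expr.items():
        for pmp, pmp_vals in pmp_expr.items():
            r = pearson(tf_vals, pmp_vals)
            n = len(tf_vals)
            # Fisher z-transform; clamp r so atanh stays finite when |r| = 1
            r_c = max(min(r, 0.999999), -0.999999)
            z = math.atanh(r_c) * math.sqrt(n - 3)
            p = math.erfc(abs(z) / math.sqrt(2))  # approximate two-sided p value
            if abs(r) > r_cutoff and p < p_cutoff:
                edges.append((tf, pmp, r))
    return edges
```

The resulting edge list can be written out as a table and imported into Cytoscape for visualization.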

Construction and analysis of the PMP prognostic model
We constructed a prognostic model based on the results of multivariate Cox regression analysis (Table 2). The model divided ccRCC patients into two groups with different clinical outcomes (Fig 3). Given these differing outcomes, the PMPPM may be a useful tool for stratifying ccRCC patients (Fig 4A). The AUC was 0.758, indicating that the model has some potential for survival monitoring (Fig 4B). We adjusted for age, gender, tumor grade, tumor stage, tumor size, distant metastasis status, and other parameters. After this adjustment, multivariate Cox regression analysis showed that the PMPPM was an independent predictor (Fig 4C, 4D). The Risk Score was significantly higher in patients with advanced grade, advanced stage, or distant metastasis (Fig S3). For the nine genes themselves, box plots show their differential expression between tumor and normal tissue (Fig S4A). Among the alterations of these genes, amplification was the most common type, and CYFIP2 was the most frequently mutated gene (Fig S4B). In addition, their protein and pan-cancer mRNA expression levels and survival analyses were obtained from the UALCAN database as a supplement (Fig S5-7).
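The survival curves compared between the risk groups are Kaplan-Meier (product-limit) estimates. At each distinct event time t, survival is multiplied by 1 - d/n, where d is the number of deaths at t and n is the number of patients still at risk. A minimal Python sketch with a hypothetical function name is shown below. Censored patients leave the risk set after their last follow-up without lowering the curve. A patient censored at t is still counted as at risk at t.

```python
def kaplan_meier(time, event):
    """Product-limit estimate; returns [(t, S(t))] at each distinct event time."""
    surv = 1.0
    curve = []
    for t in sorted({ti for ti, e in zip(time, event) if e}):
        n_at_risk = sum(1 for ti in time if ti >= t)
        deaths = sum(1 for ti, e in zip(time, event) if ti == t and e)
        surv *= 1 - deaths / n_at_risk
        curve.append((t, surv))
    return curve
```

For example, `kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])` steps down to approximately 0.8, 0.6, and 0.3 at times 1, 2, and 3. Running it separately on the high-risk and low-risk groups yields the two curves being compared.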

Validation of PMP prognostic model
After removing batch effects with the R package sva, we used the expression data from the GEO and ArrayExpress databases to validate the PMP prognostic model. The risk score of each patient in the testing set was calculated as described above. Patients were then divided into high-risk and low-risk groups based on the median risk score of the training set. The high-risk group again had a clearly worse prognosis than the low-risk group (Fig 5A). Moreover, the AUC of the ROC curve for the risk score was 0.741, indicating good performance in assessing and predicting the prognosis of patients with ccRCC (Fig 5B). Across different groups of clinical characteristics, the low-risk group also had a clearly better prognosis than the high-risk group (Fig 6, 7). Taken together, these results suggest that the PMP prognostic model is reasonably effective and has credible clinical application value.
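survivalROC computes a time-dependent ROC curve. At a chosen horizon t, patients who died by t are cases and patients still alive after t are controls. The package corrects for censoring with either a Kaplan-Meier or a nearest-neighbour estimator. The simplified Python sketch below (hypothetical function name) instead drops patients censored before t. That shortcut is only a fair approximation when early censoring is rare. The sketch computes the AUC as the probability that a case has a higher risk score than a control, counting ties as one half.

```python
def naive_time_auc(scores, time, event, t):
    """Cumulative/dynamic AUC at horizon t, ignoring patients censored before t."""
    cases = [s for s, ti, e in zip(scores, time, event) if ti <= t and e]
    controls = [s for s, ti in zip(scores, time) if ti > t]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at time t")
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0  # case ranked above control
            elif c == k:
                concordant += 0.5  # ties count as half
    return concordant / (len(cases) * len(controls))
```

The horizon t must be in the same units as follow-up time, for example 3 × 365 for a 3-year AUC when time is recorded in days.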

Predictive nomogram
All independent prognostic factors identified by multivariate Cox regression analysis, including the PMPPM, were used to establish a nomogram to predict patients' overall survival at 1, 3, and 5 years (Fig 8).

Discussion
Proteins are the material basis of life, the basic organic constituents of cells, and the main executors of life activities. Membrane proteins play an important role in many biological processes, such as cell proliferation and differentiation, energy conversion, signal transduction, and material transport. It is estimated that about 60% of drug targets are membrane proteins. Abnormal membrane protein expression contributes to a variety of diseases, including cancer. In recent years, research on the structure and function of membrane proteins has become a hot topic [10]. However, the exact mechanism and role of plasma membrane proteins in renal cell carcinoma remain unclear. In this study, we downloaded a large amount of ccRCC data from TCGA, which enabled a comprehensive analysis of plasma membrane proteins in ccRCC patients. By comparing gene expression between ccRCC and normal tissues, we identified 159 DEPMPs. GO and KEGG pathway enrichment analyses showed that these PMPs were mainly enriched in actin filament organization. Actin filament organization has been found to play a role in a variety of tumors, such as prostate cancer, head and neck cancers, and melanomas [11]. However, to our knowledge, there are no reports on actin filament organization in renal cancer, and further experimental exploration is needed.
Univariate Cox regression analysis identified 101 survival-related PMPs. With the help of online resources, we explored the molecular characteristics and the internal and external relationships of these survival-related PMPs. First, the PPI network showed that these PMPs interact closely with each other. Second, the TF-PMP network showed that the transcription factors FOXM1, NCAPG, CENPA, MYBL2, EOMES, IRF4, IKZF1, and BATF are closely related to these PMPs. Previous studies have linked these TFs to the occurrence and progression of ccRCC [12][13][14][15][16][17][18]. Based on these findings, we believe these PMPs may, as a whole, play a significant role in ccRCC.
To explore whether these survival-related PMPs have prognostic value in ccRCC, we constructed a prognostic model by multivariate Cox regression analysis. Survival and ROC analyses indicated that the model has considerable prognostic value, and these results were confirmed with external data. We also analyzed the relationship between the model and clinical parameters comprehensively: the risk score was higher in patients with advanced grade, advanced stage, and distant metastasis. In addition, the nine PMPs in the model were closely related to tumor grade, stage, and distant metastasis.
Using an online database, we explored these PMPs further. Mutations were common in these genes, and CYFIP2 was the most frequently mutated. CYFIP2 (cytoplasmic FMR1-interacting protein 2) has been reported to be a candidate p53 target gene, and CYFIP2-induced apoptosis is part of a coordinated p53-dependent response in cancer cells [20]. Nevertheless, studies on CYFIP2 in kidney cancer are rare. We analyzed CYFIP2 survival in subgroups defined by tumor grade, race, and gender. This analysis showed that CYFIP2 was closely related to the overall survival of ccRCC patients. Given its high mutation frequency, we think its specific mechanism in ccRCC deserves more attention. Current research shows that the remaining eight PMPs are also closely related to cancer. ULBP2 expression was reported to be increased in ovarian cancer cells, and high ULBP2 expression indicates poor prognosis in ovarian cancer [21]. EPB41L4A is a target gene of the Wnt/β-catenin pathway, and its high expression indicates good survival in multiple myeloma [22]. Gene fusions are frequent early genomic rearrangements in prostate cancer, and the unique in-frame MPP5-FAM71D fusion product is important for the proliferation of prostate cancer cells [23]. CASP4, a member of the inflammatory caspases, has been shown to promote the proliferation of many kinds of cancer cells [24]. Xie et al. found a novel SNP of ARHGEF12 that may involve ARHGEF12-RhoA-p38 signaling in erythroid regeneration in ALL patients after chemotherapy [25]. Recent advances have demonstrated that kinetochore-associated proteins are upregulated and play significant roles in the carcinogenesis of numerous types of cancer.
KNTC1 may have an essential role in mediating cell viability and apoptosis in human esophageal squamous cell carcinoma (ESCC) cells and may serve as a novel therapeutic target for this cancer [26]. Knockdown of ubiquitin-conjugating enzyme E2S (UBE2S) suppressed malignant characteristics of breast cancer cells, such as migration, invasion, and anchorage-independent growth [27]. In this study, we found that these genes were associated with overall survival in patients with ccRCC. However, existing research cannot yet fully explain the precise mechanisms of these genes. Larger prospective studies and basic experiments are therefore needed to further define the relationship between plasma membrane proteins and kidney cancer.
It should be noted that this study has some limitations. First, the molecular mechanisms behind the key PMPs remain unclear, and our findings need to be validated by further experiments. Second, other studies may reach different results because of differences in experimental variation and statistical methods.
Despite these limitations, this study explored the potential molecular mechanisms and clinical significance of PMPs. We hope this prognostic model can inspire researchers working on ccRCC prognosis and precise therapy.
The nomogram was built based on three independent prognostic factors in ccRCC.
