Establishment of a prognostic model based on immune infiltration-related genes and clinical information in ovarian cancer

Background: Ovarian cancer (OV) is one of the most lethal gynecological cancers and is among the top five cancer types associated with death in women. Immune infiltration of ovarian cancer is a critical factor in determining a patient's prognosis. In addition to immune infiltration, key mutations also have a considerable impact on the development and prognosis of OV.
Methods: Using data from the TCGA and GTEx databases combined with the WGCNA and ESTIMATE methods, genes related to OV occurrence and immune infiltration were identified. Lasso and multivariate Cox regression were applied to define a prognostic score (IGCI score) based on immune genes and clinical information. The IGCI score was verified on the test set using K-M curves, ROC curves and the C-index. Based on mutation data from TCGA, we identified key mutations related to immune infiltration using Chi-squared tests and also performed survival analysis of these mutations.
Results: We found 46 genes related to disease occurrence and immune infiltration. The IGCI score was established based on six factors: Age, White, Pharmaceutical, FGF7, CCR1 and CD14. In the test set, the IGCI score (C-index = 0.630) is significantly better than AJCC stage (C-index = 0.541, P < 0.05) and CIN25 (C-index = 0.571, P < 0.05). Chi-squared tests revealed that 6 mutations are significantly (P < 0.05) related to immune infiltration: BRCA1, ZNF462, VWF, RBAK, RB1, and ADGRV1. In the mutation survival analysis, we found another 5 key mutations significantly related to patient prognosis (P < 0.05): CSMD3, FLG2, HMCN1, TOP2A and TRRAP. RB1 and CSMD3 mutations had small P-values (P < 0.1) in both the Chi-squared tests and the survival analysis. We then conducted a drug sensitivity analysis of the key mutations and found that the efficacy of six anti-tumor drugs changed significantly (P < 0.05) when the RB1 mutation was present.


Background
Ovarian cancer is one of the most lethal gynecological cancers, with an overall 5-year survival rate of 48%, and is among the top five cancer types associated with death in women 1. Because the symptoms are unobtrusive, nearly 75% of patients present at an advanced stage, where the 5-year survival rate is only 29%. The first-line therapy is tumor-debulking surgery followed by platinum-based chemotherapy; however, recurrence occurs in nearly 75% of patients 2. Moreover, there is no confirmed effective drug for patients following platinum-resistant recurrence. Although recent clinical trials have shown that poly ADP-ribose polymerase (PARP) inhibitors extend progression-free survival (PFS) in patients with BRCA1/2 mutations 3,4, homologous recombination deficiency (HRD) 5, or platinum-sensitive recurrence regardless of BRCA mutation 6, the overall survival (OS) rate is still low.
Immunotherapy features prominently in the search for new maintenance therapy strategies in OV. Both active immunotherapy (e.g. ovarian cancer vaccines) and passive immunotherapy (e.g. adoptive T cell therapy) have been studied, although the response rates are not ideal 7. To identify novel drug targets and select patients who respond effectively to immunotherapy, it is essential to establish methods to predict the basic immune status of patients. The first step in this process is to analyze differentially expressed genes (DEGs) and immune-related pathways associated with prognosis. Because different types of statistical analysis identify different gene clusters, it is necessary to select an appropriate scoring strategy in relation to tumor immunity. The tumor microenvironment, which consists of immune cells, tumor cells and other components 8, is a promising target for the exploration of novel biomarkers.

ESTIMATE (Estimation of Stromal and Immune cells in Malignant Tumor tissues using Expression data) is an algorithm designed by Yoshihara et al. to calculate an immune score based on differential gene signatures between stromal cells and tumor cells 9. This tool has been used to identify microenvironment-related markers in several solid tumors 10-12.
In the present study, we combined the ESTIMATE algorithm with other bioinformatic analysis tools to build an immune microenvironment-related prognostic model (IGCI score) for OV. Two mutated genes, RB1 and CSMD3, which are closely related to immune infiltration and prognosis, were identified in OV, and we found that sensitivity to some drugs changed in patients with RB1 mutation.

Materials And Methods
Data processing
Fragments per kilobase of exon model per million reads mapped (FPKM) standardized RNA sequencing data (379 ovarian cancer samples) and single-nucleotide polymorphism (SNP) data (436 ovarian cancer samples) were obtained from The Cancer Genome Atlas (TCGA, https://portal.gdc.cancer.gov) database. Because normal tissue samples were lacking, we downloaded RNA sequencing data for 88 normal ovarian tissues from the Genotype-Tissue Expression (GTEx 13, http://commonfund.nih.gov/GTEx/) database and applied FPKM standardization. When combining the sequencing data from GTEx and TCGA, batch effects were corrected using the "limma" package in R. Where a gene had multiple measurements, the mean value was used as its final expression level.
Screening of differentially expressed genes
The R package "limma" was used to screen DEGs between cancerous and normal tissues. Genes with expression less than 0.3 in all samples were removed before DEG screening. The screening used the Wilcoxon signed rank test 14. Genes with a false discovery rate (FDR) < 0.05 and |log2 fold change| > 1 were identified as DEGs.
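The FDR and fold-change thresholds can be illustrated with a minimal Python sketch. It assumes that per-gene P-values have already been computed and that the FDR is obtained with the Benjamini-Hochberg procedure; the source does not name the adjustment method, and the gene names, group means and pseudocount below are hypothetical.

```python
import math

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so monotonicity can be enforced.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

def log2_fold_change(tumor_mean, normal_mean, pseudocount=1e-3):
    """log2(tumor / normal); the pseudocount (an assumption) avoids division by zero."""
    return math.log2((tumor_mean + pseudocount) / (normal_mean + pseudocount))

def select_degs(genes, fdr_cutoff=0.05, lfc_cutoff=1.0):
    """genes: list of dicts with keys name, p, tumor_mean, normal_mean."""
    fdr = benjamini_hochberg([g["p"] for g in genes])
    selected = []
    for gene, q in zip(genes, fdr):
        lfc = log2_fold_change(gene["tumor_mean"], gene["normal_mean"])
        if q < fdr_cutoff and abs(lfc) > lfc_cutoff:
            selected.append((gene["name"], "up" if lfc > 0 else "down"))
    return selected

# Hypothetical toy input, not data from the study.
toy_genes = [
    {"name": "GENE_A", "p": 0.001, "tumor_mean": 8.0, "normal_mean": 2.0},
    {"name": "GENE_B", "p": 0.020, "tumor_mean": 1.0, "normal_mean": 4.5},
    {"name": "GENE_C", "p": 0.030, "tumor_mean": 3.0, "normal_mean": 2.9},
    {"name": "GENE_D", "p": 0.600, "tumor_mean": 9.0, "normal_mean": 1.0},
]
degs = select_degs(toy_genes)
```

In this toy input GENE_C passes the FDR cutoff but not the fold-change cutoff, and GENE_D shows the reverse, so only GENE_A and GENE_B are retained.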
Weighted gene correlation network analysis (WGCNA) of DEGs
WGCNA assumes that the gene regulatory network in the organism follows the basic structure of a scale-free network. To achieve this, the correlation coefficients between genes were weighted as follows:
$$a_{ij} = |\mathrm{cor}(i, j)|^{\beta}$$
where $a_{ij}$ is the adjacency between genes $i$ and $j$ and $\beta$ is the soft-threshold power. A topological overlap matrix (TOM) was then constructed, in which the distance between two genes is defined by also considering the other genes related to both of them.
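A minimal sketch of the weighting and topological overlap steps follows. It assumes an unsigned network in which the adjacency is |cor(i, j)| raised to a power beta, and the usual unsigned TOM definition; the expression vectors are hypothetical, and the actual analysis would rely on the WGCNA package in R rather than code like this.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def adjacency(expr, beta):
    """expr: one expression vector per gene; a_ij = |cor(i, j)| ** beta."""
    n = len(expr)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            adj[i][j] = adj[j][i] = abs(pearson(expr[i], expr[j])) ** beta
    return adj  # diagonal stays 0 so it does not count toward connectivity

def topological_overlap(adj):
    """Unsigned TOM: (shared neighbours + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    n = len(adj)
    connectivity = [sum(row) for row in adj]
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
            denom = min(connectivity[i], connectivity[j]) + 1.0 - adj[i][j]
            tom[i][j] = tom[j][i] = (shared + adj[i][j]) / denom
    return tom

# Hypothetical expression vectors for three genes across four samples.
expr = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [1.0, 3.0, 2.0, 4.0]]
adj = adjacency(expr, beta=8)
tom = topological_overlap(adj)
# Clustering would then use the dissimilarity 1 - TOM.
dissimilarity = [[1.0 - value for value in row] for row in tom]
```

Raising the correlation to a power suppresses weak correlations much more than strong ones, which is what pushes the connectivity distribution toward a scale-free shape.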
Dynamic clustering methods were used to determine the final gene modules. Genes clustering within the same module often have similar functions. Correlation analysis was performed between the first principal component of each gene module and the tumor phenotype (for discrete variables, 0 represents no occurrence and 1 represents occurrence) to obtain the gene modules closely related to the occurrence of cancer. The parameters in this process were: maxBlockSize = 7,000, deepSplit = 2, minModuleSize = 40, and mergeCutHeight = 0.30. To obtain the key genes in the key modules, we obtained the

DEGs related to immune infiltration
The ESTIMATE algorithm 9 is a gene set analysis method for evaluating the purity of tumor tissue. It first compares expression data from known immune cells and tumor cells and screens for DEGs, which are then used as the background gene signature. Expression data from other tumor tissues can then be analyzed by GSEA 17 against this signature, and the score based on the degree of enrichment (ImmuneScore) can be used to evaluate the immune cell content of the tumor tissue. The StromalScore is calculated via a similar process. In this study, the ESTIMATE algorithm was used to calculate the StromalScore and ImmuneScore values in all tumor samples to clarify the degree of immune infiltration. Tumor samples were divided into high- and low-score groups according to the median ImmuneScore. DEGs between the low-score and high-score groups were identified using the criteria FDR < 0.05 and |log2 fold change| > 1; DEGs related to StromalScore were identified in the same way. The DEGs at the intersection of these two groups were considered to be key genes related to immunity during the development of ovarian cancer.
These genes and key genes from WGCNA were used in the construction of subsequent patient prognosis models. The ESTIMATE algorithm was applied using the "estimate" package in R.
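The median split and the intersection of the two DEG sets can be sketched in Python as follows. This is an illustration under stated assumptions: sample IDs, scores and gene names are hypothetical, samples exactly at the median are placed in the low group (the source does not say how ties were handled), and a gene is kept only when it changes in the same direction in both comparisons.

```python
from statistics import median

def split_by_median(scores):
    """Split sample IDs into (high, low) groups around the median score."""
    cutoff = median(scores.values())
    high = {sample for sample, value in scores.items() if value > cutoff}
    low = set(scores) - high  # samples equal to the median fall in the low group
    return high, low

def shared_degs(immune_degs, stromal_degs):
    """Keep genes that are DEGs with the same direction in both comparisons."""
    return {gene: direction for gene, direction in immune_degs.items()
            if stromal_degs.get(gene) == direction}

# Hypothetical ImmuneScore values per sample.
immune_scores = {"S1": 10.0, "S2": 30.0, "S3": 20.0, "S4": 40.0}
high, low = split_by_median(immune_scores)

# Hypothetical DEG calls from the ImmuneScore and StromalScore comparisons.
immune_degs = {"GENE_A": "up", "GENE_B": "down", "GENE_C": "up"}
stromal_degs = {"GENE_A": "up", "GENE_B": "up", "GENE_D": "down"}
key_genes = shared_degs(immune_degs, stromal_degs)
```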

Survival analysis and SNP analysis
The genes at the intersection of the DEG sets obtained by WGCNA and ESTIMATE were regarded as closely related to disease development and immune processes. We used patient clinical information and these DEGs to build a prognostic model. We excluded samples lacking clinical or gene expression information in the TCGA database, leaving data for 258 patients. Patients were randomly divided at a 1:1 ratio into a training set (130 samples) and a test set (128 samples), and the prognostic model was built on the training set. Lasso regression 18 was used to eliminate collinearity between factors, with 10-fold cross-validation performed 1,000 times. The penalty coefficient lambda was chosen as the value giving the smallest partial likelihood deviance. Multivariate Cox proportional hazards regression analysis 19 with forward-backward variable selection was then used to build the prognostic model. Weighting each factor by its coefficient, we obtained the risk score for each patient according to the following formula:
$$\text{Risk score} = \sum_{i} \beta_i \times \text{Value}_i$$
where $\beta_i$ is the coefficient of factor $i$ in the Cox regression model and $\text{Value}_i$ is the level of that factor. We divided patients into high- and low-risk groups according to the median risk score in the training set, and plotted Kaplan-Meier survival curves with a log-rank test in the training and test sets. In addition, we predicted the survival of patients 1, 3, and 5 years after the onset of disease, and plotted receiver operating characteristic (ROC) curves. To make the prognostic model more practical, we established the corresponding nomogram and calculated the C-index and calibration curves in the training and test sets. We compared the prognostic performance of AJCC stage and CIN25 20 with our results to demonstrate the rationality of our model.
The "glmnet", "survival", "survminer", "caret", "survivalROC", "rms" and "GenVisR" packages in R were used in these analyses.
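As a minimal sketch of the scoring step, the risk score is the sum of each factor's Cox coefficient multiplied by its level, and the cutoff is the median score in the training set, reused unchanged for the test set. The coefficients and patient values below are hypothetical placeholders, not the fitted IGCI model.

```python
from statistics import median

# Hypothetical coefficients for illustration only; they are NOT the fitted
# values of the IGCI score.
COEFS = {"Age": 0.02, "FGF7": 0.4, "CCR1": -0.3}

def risk_score(patient, coefs=COEFS):
    """Sum of coefficient * factor level over all factors in the model."""
    return sum(beta * patient[name] for name, beta in coefs.items())

def assign_groups(train, test, coefs=COEFS):
    """Label patients high/low risk using the training-set median as cutoff."""
    cutoff = median(risk_score(p, coefs) for p in train)

    def label(patient):
        return "high" if risk_score(patient, coefs) > cutoff else "low"

    return cutoff, [label(p) for p in train], [label(p) for p in test]

# Hypothetical patients.
train_set = [
    {"Age": 50, "FGF7": 1.0, "CCR1": 2.0},
    {"Age": 60, "FGF7": 2.0, "CCR1": 1.0},
    {"Age": 70, "FGF7": 3.0, "CCR1": 0.0},
]
test_set = [
    {"Age": 65, "FGF7": 2.5, "CCR1": 0.5},
    {"Age": 40, "FGF7": 0.5, "CCR1": 3.0},
]
cutoff, train_groups, test_groups = assign_groups(train_set, test_set)
```

Taking the cutoff from the training set only keeps the test set independent of any threshold tuning.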
We then performed a survival analysis in relation to the presence or absence of SNPs. After excluding patients lacking SNP, mRNA or clinical data, data from 272 patients in the TCGA database were analyzed. Survival analysis using log-rank tests 21 was performed for all genes with mutations identified in at least 15 patients. In addition, we performed Chi-squared tests between SNP status and ImmuneScore group to identify the key gene mutations related to the immune process of ovarian cancer. Mutations present in at least 5 patients were included in the Chi-squared test.
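The Chi-squared test between mutation status and ImmuneScore group operates on a 2x2 contingency table, which can be sketched as below. This sketch assumes Pearson's statistic without continuity correction; the source does not state whether a correction was applied, and the counts are hypothetical. For one degree of freedom the P-value equals erfc(sqrt(statistic / 2)).

```python
import math

def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for [[a, b], [c, d]].

    Rows: mutated / wild type. Columns: high / low ImmuneScore group.
    Returns (statistic, p_value) with one degree of freedom.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must contain at least one sample")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-squared distribution with 1 df.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value

# Hypothetical counts: 10 mutated/high, 20 mutated/low, 30 wild/high, 40 wild/low.
statistic, p_value = chi_squared_2x2(10, 20, 30, 40)
```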
SNPs with a P-value of less than 0.1 in both the survival analysis and the Chi-squared test were regarded as critical SNPs in ovarian cancer. We used the Genomics of Drug Sensitivity in Cancer (GDSC, https://www.cancerrxgene.org/) database for drug sensitivity analysis of key SNPs.
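The log-rank comparison of survival between mutated and wild-type patients in this section can be sketched as follows. This is a minimal two-group implementation written for illustration, assuming right-censored data with event indicators of 1 (death observed) and 0 (censored); the survival times are hypothetical.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (statistic, p_value) with 1 df."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        at_risk_a = sum(1 for u, _, g in data if u >= t and g == 0)
        at_risk_b = sum(1 for u, _, g in data if u >= t and g == 1)
        deaths_a = sum(1 for u, e, g in data if u == t and e == 1 and g == 0)
        deaths_b = sum(1 for u, e, g in data if u == t and e == 1 and g == 1)
        n, d = at_risk_a + at_risk_b, deaths_a + deaths_b
        observed_a += deaths_a
        expected_a += d * at_risk_a / n  # expected deaths in group A under H0
        if n > 1:
            # Hypergeometric variance of the group A death count at time t.
            variance += d * (at_risk_a / n) * (at_risk_b / n) * (n - d) / (n - 1)
    if variance == 0.0:
        raise ValueError("variance is zero; the groups cannot be compared")
    statistic = (observed_a - expected_a) ** 2 / variance
    return statistic, math.erfc(math.sqrt(statistic / 2.0))

# Hypothetical survival times (all events observed) for two groups.
mutated_times, mutated_events = [1, 2, 3, 4, 5], [1, 1, 1, 1, 1]
wild_times, wild_events = [10, 11, 12, 13, 14], [1, 1, 1, 1, 1]
statistic, p_value = logrank_test(mutated_times, mutated_events,
                                  wild_times, wild_events)
```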

Functional and pathway enrichment analysis
Gene ontology (GO) analysis 22 identifies the GO terms enriched in a set of genes, given a defined gene background and species. In the absence of enrichment, the number of genes in the set annotated to a given term should follow the hypergeometric distribution. For the genes of interest, we also performed enrichment analysis in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database to identify the key gene regulatory pathways. We focused on terms that were significantly enriched in GO and KEGG and ranked them by ascending P-value. In the GO analysis, we identified the top five terms (P < 0.05). In the KEGG analysis, we identified the top three pathways (all P < 0.05). The "clusterProfiler", "org.Hs.eg.db", "GOplot", and "digest" packages in R were used in these analyses.
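The hypergeometric reasoning behind the enrichment P-value can be made concrete with a short sketch. It computes the probability of seeing at least k annotated genes in a list of n genes drawn from a background of N genes, K of which carry the annotation; the numbers are hypothetical, and enrichment tools typically add multiple-testing correction on top of this.

```python
from math import comb

def enrichment_p_value(background, annotated, drawn, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(background, annotated, drawn)."""
    upper = min(annotated, drawn)
    total = comb(background, drawn)
    tail = sum(comb(annotated, i) * comb(background - annotated, drawn - i)
               for i in range(overlap, upper + 1))
    return tail / total

# Hypothetical example: 20 background genes, 5 annotated to the term,
# 4 genes of interest, 2 of which carry the annotation.
p_value = enrichment_p_value(background=20, annotated=5, drawn=4, overlap=2)
```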
The overall workflow of this study is shown in Figure 1A.

Results
Identification of DEGs in the ovarian cancer process and immune invasion process
We aimed to identify DEGs that are closely related to the occurrence of ovarian cancer and the immune infiltration process. Compared with 88 normal samples, 2,908 genes were up-regulated and 3,162 genes were down-regulated in 379 tumor samples (Supplementary Fig. 1A). These 6,070 (2,908 + 3,162) genes could be regarded as closely related to the occurrence of OV, and were used in the subsequent WGCNA analysis.
It is generally believed that the degree of tumor immune infiltration is closely related to the content of stromal cells and immune cells in tumor samples. Compared with the low StromalScore group, a total of 734 genes were up-regulated and 398 genes were down-regulated in the high StromalScore group (Supplementary Fig. 1B). Compared with the low ImmuneScore group, 629 genes were up-regulated and 520 genes were down-regulated in the high ImmuneScore group (Supplementary Fig. 1C). From the intersection of the StromalScore and ImmuneScore results, the shared up-regulated (420 genes, Fig. 1B) and down-regulated (263 genes, Fig. 1C) genes were identified. These genes can be regarded as playing a key role in the progress of tumor immune infiltration.

Results of DEGs GO and KEGG gene enrichment analysis
To verify that grouping by the scores assigned by the ESTIMATE algorithm was applicable to our investigation, and to better understand the roles of the 683 identified DEGs (420 + 263) in ovarian cancer, GO and KEGG pathway enrichment analyses were performed. The top five GO terms were: T cell activation; regulation of lymphocyte activation; leukocyte cell-cell adhesion; lymphocyte differentiation; and regulation of T cell activation. These terms showed that the DEGs obtained according to the ESTIMATE algorithm are closely related to the immune process in tumor tissues and confirmed the effectiveness of the StromalScore and ImmuneScore (Fig. 2A, 2B). Many genes are involved in these five GO terms. CCL19, ZNF683, PLA2G2D, and CD2 not only had the largest fold-changes in expression, but also appeared in all of the top five GO terms, indicating that these genes may be more critical in the immune process. The top three KEGG terms were: viral protein interaction with cytokine and cytokine receptor (interactions between viral proteins and cytokines form the basis of viral infection and pathogenicity); cytokine-cytokine receptor interaction; and chemokine signaling pathway (Fig. 2C, 2D). About 15% of human cancers can be attributed to virus infection 23. In addition to their association with tumor metastasis and inflammation, chemokines are also closely related to regulation of the immune system. Chemokines not only affect the migration and differentiation of lymphocytes 24, but are also closely related to the maturation, differentiation and functional effects of T and B lymphocytes 25. PF4, CXCL9, CXCL13, and CCL19 were enriched in all of the top three KEGG pathways and had the largest fold-changes in expression.

WGCNA of DEGs related to ovarian cancer
To make the connectivity of the gene regulatory network obey a power law distribution, we exponentially weighted the correlation coefficients of genes. A soft threshold (weight) of beta = 8 better meets the requirements of scale-free networks (Fig. 3A). We obtained 14 gene modules through dynamic clustering and then performed a correlation analysis between the gene modules and the occurrence of tumors (Fig. 3B). Based on previous reports 26, we assumed that a module with a correlation coefficient > 0.65 was a key gene module in the disease process, and the hub genes were selected from these modules.
As shown in Fig. 3B, the red, blue, turquoise, black, green, purple, pink, brown, magenta, and yellow modules had a coefficient > 0.65 and were included in the subsequent analysis. In addition, the correlation analysis between modules (Fig. 3C) showed that the similarity among the red, blue, and turquoise gene modules was high; the similarity among the brown, magenta, and yellow gene modules was high; and the similarity among the tan, black and pink gene modules was high. The scatter plot of gene importance is shown in Fig. 3D and Supplementary Fig. 1D-1F. A total of 2,526 genes were obtained by selecting key genes in the upper quartile of the horizontal and vertical coordinates of these gene modules 27. A total of 46 genes were obtained from the intersection of these genes with the DEGs obtained using the ESTIMATE algorithm (Fig. 3E). These genes were regarded as related both to the occurrence of ovarian cancer and to the degree of immune infiltration.
The establishment of the IGCI score
The degree of immune infiltration is often closely related to tumor recurrence and the amount of tumor stem cells, and tumor recurrence is related to the patient's final survival status. Therefore, Lasso regression was performed in the training set on the 46 genes from the intersection of the WGCNA key genes and the ESTIMATE DEGs, together with 7 clinical factors (Age; Asian, 1 = yes, 0 = no; Black, 1 = yes, 0 = no; White, 1 = yes, 0 = no; Stage; Pharmaceutical.Therapy, 1 = yes, 0 = no; Radiation.Therapy, 1 = yes, 0 = no). A summary of the patients' clinical information is shown in Table 1. To show that our random grouping of patients is reasonable, we used the t-test for continuous variables and the chi-square test for discrete variables to compare the clinical information of the training set and the test set. As shown in Table 1, no clinical variable differs significantly between the training set and the test set (P > 0.05), indicating that this grouping can be used in subsequent studies. In the Lasso regression, we selected the penalty coefficient lambda by minimizing the partial likelihood deviance (Supplementary Fig. 2A, 2B); the remaining 10 factors were included in the subsequent study. The levels of these 10 factors in the training set are shown in Supplementary Table 1. Through multivariate Cox proportional hazards regression analysis, we constructed a prognostic score based on immune genes and clinical information (IGCI) for patients with ovarian cancer. High levels of Age, FGF7 and CD14 were found to be unfavorable for patients' prognosis, while high levels of White, Pharmaceutical.Therapy and CCR1 were favorable. We calculated the hazard ratio (HR) of each factor and the corresponding 95% confidence interval, as shown in Figure 4A.
More details of the multivariate Cox proportional hazards regression are shown in Supplementary Table 2.
The patient groups were subdivided according to median IGCI score in the training set. As the IGCI score gradually increased, the survival time decreased, and the proportion of patients' deaths gradually increased in both training and test set (Supplementary Figure 2C,2D), which indicated the accuracy of IGCI score.
In addition, survival analysis by IGCI score group showed significant differences (Figure 4B, 4C). The P-value is less than 0.001 in the training set (Figure 4B) and less than 0.05 in the test set (Figure 4C). In the training set, the 1-  Figure 4I). The above results further verify that the genes in the IGCI score are closely related to the immune infiltration process of OV, and may be effective prognostic markers.
To examine the prognostic genes at the protein level, we used immunohistochemistry (IHC) data from The Human Protein Atlas database (http://www.proteinatlas.org/) to compare the protein levels of FGF7, CCR1, and CD14 between ovarian cancer and control tissues. The results are shown in Figure 5A, 5B, and the corresponding sample information is shown in Supplementary Table 3. The IHC database lacks data for CCR1. FGF7 (antibody: HPA043605) and CD14 (antibody: HPA001887) show higher protein content in the tissues of ovarian cancer patients, with deeper staining of the IHC sections; the difference is more pronounced for CD14. Both genes are unfavorable factors in our prognostic model, so the IHC results are consistent with the model.
To assist clinical work on ovarian cancer, we established a nomogram based on the IGCI score (Figure 5C). At present, AJCC stage is often used to predict the prognosis of OV patients. In addition, Carter et al. 20 showed that a chromosomal instability signature containing 25 genes (CIN25) can also effectively predict the prognosis of OV patients. To verify the validity of the IGCI score and the nomogram, we calculated the C-index of the IGCI score and compared it with the C-index of AJCC stage and CIN25. The results are shown in Table 2. The C-index of the IGCI score was 0.701 (95% CI: 0.625, 0.779) in the training set and 0.630 (95% CI: 0.542, 0.719) in the test set, both significantly higher than the results for AJCC stage (P < 0.05) and CIN25 (P < 0.05). In addition, we plotted 3-year and 5-year survival rate calibration curves to evaluate the predictive power of the IGCI score. In the training (Supplementary Figure 3D, 3E) and test (Figure 5D, 5E) sets, our predictions are close to the ideal results (red lines), and the errors are within the standard error range. This shows that the predictions of the IGCI score are accurate.
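For readers unfamiliar with the C-index, the following sketch shows one common definition (Harrell's concordance index): among all comparable patient pairs, the fraction in which the patient with the higher risk score has the shorter survival time. The implementation and the toy data are illustrative assumptions; the source does not specify how its C-index was computed beyond naming the R packages used.

```python
def concordance_index(times, events, scores):
    """Harrell's C-index; a higher score is expected to mean shorter survival.

    times: follow-up times; events: 1 if death observed, 0 if censored;
    scores: predicted risk scores.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5  # ties in score count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical follow-up data for four patients.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
scores = [0.9, 0.4, 0.5, 0.1]
c_index = concordance_index(times, events, scores)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported values of 0.701 and 0.630 should be read.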

Genetic mutation analysis
To understand the types of gene mutations that are closely related to ovarian cancer, we first generated a waterfall map of the types of genetic mutation (Figure 6). From the TCGA database, we obtained 409 samples with complete mutation type and clinical data, and the 10 most frequently mutated genes were selected for display (Figure 6). The waterfall map showed no clear relationships among cancer stage, patient age and gene mutations (Figure 6). Compared to other genes, TP53 had a variety of mutation types, while TTN and DST were found to be more prone to mutations in the We performed a Chi-squared test (Table 3)   while the efficacy of AZD7762 and RO-3306 in RB1 mutant samples decreased (P < 0.05). The above results indicate that RB1 mutation may substantially change a patient's response to drugs. Compared with the wild type, the prognosis of patients with RB1 mutation is better (Figure 7B). On this basis, using drugs with enhanced efficacy in these patients may yield better treatment results.

Discussion
Key factors in the IGCI score
In the IGCI score, age is a poor prognostic factor, which is likely related to the decline in physical function in elderly patients. Interestingly, race is also an effective prognostic factor, and White patients tend to have lower scores. "Pharmaceutical.Therapy" has the largest absolute coefficient in the prognostic model (3.6304), which indicates that timely drug treatment effectively improves the prognosis of patients with ovarian cancer. "Radiation.Therapy" is not included in the score, which also provides a reference for the clinical treatment of ovarian cancer.
The FGF7 gene has the largest absolute coefficient (0.4089, Supplementary Table 2) among the genes in the IGCI score, indicating that the change in although it can be speculated that FGF7-related regulation may also be a potential target in ovarian cancer. The absolute value of the coefficient of FGF7 in the IGCI score was the largest, indicating that FGF7 can also be used as a prognostic predictor of ovarian cancer.
As for CCR1, tumor cells secrete chemokines, which act on stromal cells through CCR1 to induce chemotaxis and, in cooperation with stromal cells, promote invasion and transfer into the blood circulation or lymphatic system 31. Chemokines participate in the formation, invasion, and metastasis of tumors such as epithelial cell carcinoma, squamous cell carcinoma, and mesenchymal cell carcinoma 32. Interestingly, in the IGCI score, the coefficient of valid information for the patient's prognosis. Alternatively, CCR1 may have regulatory functions in the body that are not yet understood.
The CD14 antigen is a 365-amino acid phosphatidylinositol-binding glycoprotein, which is mainly expressed on monocyte and macrophage membranes in body tissues 33. In tumor tissues, macrophages mainly infiltrate the pericarcinoma and cancer interstitial tissues 34. The detection rate of macrophages represents the immune status of local tumor tissues. Therefore, CD14 can effectively reflect the level of immune infiltration in patients with ovarian cancer. In our prognostic model, the HR of CD14 is 0.879 < 1 (Fig. 4A), which is a favorable factor for prognosis. Cancer patients with a large degree of immune infiltration often have a better prognosis 35, which is consistent with the results of our model.

Key genes in the gene mutation analysis
Six mutations differ significantly between the high and low ImmuneScore groups according to the Chi-squared test: BRCA1, ZNF462, VWF, RBAK, RB1 and ADGRV1. According to the SNP survival analysis, mutations of five genes significantly affect the prognosis of patients: CSMD3, FLG2, HMCN1, TOP2A and TRRAP. TRRAP is a subunit of histone acetyltransferase and a key cofactor for c-Myc, which is an oncogenic DNA-binding transcription activator. By recruiting TRRAP, c-Myc activates RNA polymerases I and III to control ribosome biogenesis and cell growth. It has been confirmed that TRRAP positively regulates the accumulation of mutant p53 in lymphoma, and that TRRAP inhibition by histone deacetylases decreases mutant p53 levels 45. In addition, TRRAP depletion leads to down-regulation of TOP2A, which is consistent with our results and indicates that the association between these two genes is worth exploring in ovarian cancer.
The SNP of RB1 is closely related to the immune process (Table 3). This evidence strongly illustrates the potential of RB1 as an immune and prognostic marker in ovarian cancer. A previously reported model indicated that RB1 functions as an essential tumor suppressor, which physically interacts with RBAK 46. In ovarian cancer, the concurrent inactivation of P53 and RB1 is adequate for carcinogenesis 47, and in addition, RB1 may promote chemotherapy resistance 48. Based on comprehensive studies on the BRCA state and DNA repair, a model has been proposed to illustrate the relationship between P53 and RB1, as well as homologous repair deficiency 49. RBAK was computationally predicted as a downstream target of miR-155 in lymphoma 50, although the function of RBAK in solid tumors remains poorly understood. Our analysis revealed differential regulation of RBAK and RB1 in patients, which implies the importance of interactions between these two genes in ovarian cancer and their potential as drug targets. Further studies are required to confirm the involvement of RBAK in this process following its interaction with RB1.
CSMD3 is a member of the CSMD gene family. A nonsynonymous mutation of CSMD3 has been identified in familial colorectal cancer but not in healthy controls 51. Whole-exome sequencing has revealed that CSMD3 is the second most frequently mutated gene after TP53 in non-small cell lung carcinoma, and loss of CSMD3 causes proliferation of airway epithelial cells 52. Additionally, CRISPR/Cas9-mediated knockout of CSMD3 inhibits the death of PDX tumor cells, which also suggests that CSMD3 is an important tumor suppressor 53.
HMCN1, a conserved extracellular member of the immunoglobulin superfamily, manages epithelial cell attachments. As a cell polarity regulatory gene, HMCN1 is significantly up-regulated in gastric carcinoma 54. Moreover, mutation of HMCN1 is associated with metastasis in breast cancer 55. In ovarian cancer, HMCN1 may promote invasiveness by regulating cancer-associated fibroblasts 36. Newly generated fibrocytes act as a "wall" that prevents the entry of immune cells into the ovarian cancer site.
As a plasma glycoprotein, von Willebrand factor (vWF) mediates the attachment of platelets to damaged endothelium 56. A large population-based study demonstrated the association between coagulation, inflammation and survival of cancer patients 57, indicating that increased mortality in cancer survivors is dependent on high vWF levels. Our data also show that vWF expression is related to immune scores and patient survival, suggesting that vWF is a novel biomarker of the basic immune state that predicts long-term survival in cancer patients.
For the first time, we report the immune-related mutations of ZNF462, ADGRV1 and FLG2 in ovarian cancer. As a zinc-finger protein, ZNF462 may take part in embryonic development and is associated with neurodevelopmental abnormalities 58. ZNF462 may be the target of miR-210 59, which could be induced by hypoxia-inducible factor-1alpha in pancreatic cancer. ADGRV1 is one of the biallelic pathogenic identification markers of Usher syndrome 60, which is characterized by congenital bilateral sensorineural hearing loss 61. Deficiency of FLG2 causes defective adhesion between cornified cells, which leads to peeling skin syndrome 62. However, the roles of these genes in cancer remain unclear, and further studies are required.
In the present study, data from the TCGA and GTEx databases were combined with the ESTIMATE algorithm to identify 46 genes closely related to OV occurrence and the immune infiltration process. Using genes and clinical information together, we established the IGCI score containing six essential factors. The IGCI score and the corresponding nomogram were verified by ROC curves, the C-index, and calibration curves on the test set. The predictive ability of the IGCI score is better than that of AJCC stage (P < 0.05) and CIN25 (P < 0.05). In addition, by analyzing the gene mutations related to the process of ovarian cancer, we identified the gene mutations closely related to patient prognosis and the degree of immune infiltration. We conducted drug sensitivity analysis on the key gene mutation, which provides a reference for subsequent research.
Supplementary Figure 1. B. Heatmap of DEGs screened after grouping patients according to the StromalScore median. Cyan represents the high-score (H) group and pink represents the low-score (L) group. C. Heatmap of DEGs screened after grouping patients according to the ImmuneScore median. Cyan represents the high-score (H) group and pink represents the low-score (L) group.
Figure 6. Waterfall chart for genetic mutation analysis. Each column in the figure represents a sample. The legend above shows the density of synonymous and nonsynonymous mutations in each sample, and the legend on the left shows the ratio of gene mutations in 409 samples. The legend on the right shows the type of genetic mutations and the legend below shows the clinical information of the samples.