Identification of Core Prediction-Related Candidate Genes in Ovarian Cancer Based on Integrated Bioinformatics and Experiments

Background: Ovarian cancer is one of the deadliest and most common gynecological malignancies. This study used comprehensive bioinformatics analysis to identify core candidate genes related to the prediction of ovarian cancer, with a view to its early diagnosis and prognosis. Methods: Expression profiles were obtained from the Gene Expression Omnibus database, and differentially expressed genes (DEGs) were identified with P < 0.05 and |logFC| > 1.5. Functional enrichment, protein-protein interaction (PPI) network construction, functional module analysis, survival analysis, and correlation analysis were performed to obtain the target gene, and immunohistochemical staining and clinicopathological feature analysis were used to verify the expression and clinical significance of TTK. Results: 1. A total of 135 DEGs with consistent expression changes were identified. The 33 up-regulated DEGs were mainly enriched in the mitotic spindle assembly checkpoint, regulation of chromosome segregation, and related processes; the 102 down-regulated DEGs were mainly enriched in regulation of neurotransmitter levels, regulation of protein serine/threonine kinase activity, and related processes. A PPI network was then constructed to screen 20 hub genes, which were subjected to survival analysis and expression correlation analysis. In parallel, modules meeting the selection criteria were screened and their genes analyzed by pathway enrichment. TTK was highly expressed in ovarian cancer and associated with a poor prognosis. 2. Distant metastasis, lymph node metastasis, clinical stage (III-IV), and poor differentiation were independent risk factors for high TTK expression (P < 0.05). 3. The three biological indicators TTK, CA125, and HE4 showed excellent diagnostic value in the joint monitoring of ovarian cancer. Conclusions: TTK plays a vital role in the tumorigenesis, aggressiveness, and malignant biological behavior of epithelial ovarian cancer (EOC) and may serve as a potential biomarker and therapeutic target for the early diagnosis and predictive evaluation of EOC.


Introduction
Ovarian cancer (OC) is one of the most common malignancies of the female reproductive system worldwide and is characterized by frequent metastasis, chemotherapy resistance, and postoperative recurrence (1,2). The 5-year survival rate of early-stage (I-II) ovarian cancer is about 90%, whereas only 20-40% of patients with advanced (III-IV) disease survive more than 5 years (3,4). More than 70% of patients already have advanced disease at diagnosis, and the morbidity and mortality of OC have increased significantly in recent years. Despite advances in treatment, the 5-year survival rate of OC patients remains below 40% (5). Epithelial ovarian cancer (EOC) has the highest mortality rate among gynecological malignancies and remains the deadliest type threatening women's lives and health (6). Although the disease is partly understood, treatment and survival trends have not changed significantly because early diagnosis remains a challenge. This is partly due to the lack of clear screening tools and to vague signs and symptoms that can be mistaken for other, nonmalignant diseases (7).
Given the lack of specific diagnostic and prognostic molecular markers for EOC, many studies have confirmed the value of serum human epididymis secretory protein 4 (HE4) in the preoperative diagnosis of patients with ovarian tumors and verified its specificity: HE4 and carbohydrate antigen 125 (CA125) had the same sensitivity (79%), but HE4 showed significantly higher specificity than CA125 (93% vs. 78%), supporting the superiority of HE4 over CA125 in the diagnosis of ovarian cancer. Although HE4 outperforms CA125 at the diagnostic stage, combining the two markers appears to be beneficial (8). Nevertheless, any single indicator used to evaluate EOC is strongly affected by individual differences, so finding new indicators and combining them with existing ones to predict the development and outcome of EOC is of considerable clinical significance. In addition, gene dysregulation has been shown to play a key role in the occurrence of EOC (7). In the era of targeted therapy, mutation analysis of a cancer is a key input to treatment decisions. It is therefore vital to find a sensitive and specific biomarker that supports early diagnosis and predictive evaluation of EOC and that may also serve as a therapeutic target. Bioinformatics analysis is now widely used to identify potential biomarkers that affect disease development.
In this study, we downloaded four original microarray data sets (GSE54388, GSE27651, GSE18520, and GSE26712) from the NCBI Gene Expression Omnibus (GEO) database, comprising 329 samples in total (297 epithelial ovarian cancer samples and 32 normal ovarian samples). We used R to identify differentially expressed genes (DEGs) between epithelial ovarian cancer and normal controls and performed functional enrichment analysis. We then built a PPI network of the 135 DEGs, identified key modules, and performed module, survival, and correlation analyses. Combined with a literature review, this analysis identified TTK as an important gene related to the prediction of epithelial ovarian cancer.
Threonine and tyrosine kinase (TTK) is a dual-specificity protein kinase that can phosphorylate serine/threonine and tyrosine residues (9). It is the core component and main regulator of the spindle assembly checkpoint (SAC): it recruits and coordinates other SAC components at the kinetochore, thereby ensuring faithful chromosome segregation and maintaining genome stability (10,11). Elevated TTK levels are found in many types of human tumor, such as glioblastoma, thyroid cancer, breast cancer, hepatocellular carcinoma, pancreatic cancer, and prostate cancer (12-18). This differential expression suggests that TTK could be used as a molecular biomarker for clinical diagnosis. We reviewed relevant clinical studies and trials of TTK in several human cancers (19) but found no experimental studies of TTK expression in EOC patients. In this study, we therefore used immunohistochemistry to detect TTK expression in ovarian epithelial tumor specimens and analyzed the relationship between TTK expression and the clinicopathological parameters of EOC patients.
Results

Gene expression profile data
This study included four GEO data sets comprising a total of 297 ovarian cancer samples and 32 healthy control samples (Table 2). The data were normalized with the limma package in R/Bioconductor (Fig. 1), and 812, 2820, 1495, and 536 DEGs were screened out from the four data sets, respectively (P < 0.05, |logFC| > 1.5). The DEGs in each of the four data sets are shown in Fig. 2. The VennDiagram package was used to integrate the DEGs meeting the criteria. In total, 135 genes (33 up-regulated and 102 down-regulated) were identified as DEGs in OC tissue samples compared with normal ovarian tissue (Table 3).

Enrichment analysis
The clusterProfiler package in R was used to annotate the 33 up-regulated and 102 down-regulated integrated DEGs, and GO terms enriched with P < 0.05 were retained. The significant GO enrichment results were as follows (Table 4 and Fig. 3):
1. Cellular component: up-regulated DEGs were mainly enriched in the bicellular tight junction, anaphase-promoting complex, apical junction complex, tight junction, and nuclear ubiquitin ligase complex; down-regulated DEGs were mainly enriched in the extracellular matrix, collagen-containing extracellular matrix, and blood microparticle.
2. Biological process: up-regulated DEGs were enriched in the mitotic spindle assembly checkpoint, regulation of chromosome separation, regulation of the metaphase/anaphase transition of the cell cycle, positive regulation of ubiquitin-protein ligase activity, and signal transduction involved in the regulation of gene expression and chromosome separation, among others. Down-regulated DEGs were clearly enriched in regulation of neurotransmitter levels, regulation of blood coagulation, regulation of protein serine/threonine kinase activity, mucopolysaccharide metabolic process, and the Wnt signaling pathway.
3. Molecular function: down-regulated DEGs were mainly enriched in heparin binding and frizzled binding, whereas up-regulated DEGs showed no enrichment meeting the threshold.

PPI network and module analysis
The STRING database was used to build a PPI network, yielding 152 interacting protein pairs. The network was constructed with a combined score > 0.4 after removing 29 isolated nodes (Fig. 4). Importing the gene data into Cytoscape produced a PPI network containing 29 up-regulated and 77 down-regulated DEGs. MCODE detected 4 modules in total, and the higher-scoring modules were selected for further analysis. CytoHubba was used to obtain the top 20 hub genes: KDR, SOX9, EPCAM, WNT5A, FGF13, PDGFRA, CP, ALDH1A1, KLF4, CDC20, UBE2C, FGF9, SOX17, TTK, TRIP13, CKS2, RACGAP1, CD24, CHGB, and LAMB1.

KEGG pathway enrichment of module genes
To understand the functions of the modules, we performed KEGG enrichment analysis for each module (Table 5). In module 1, TRIP13, RACGAP1, CKS2, UBE2C, TTK, and CDC20 are all up-regulated DEGs, mainly enriched in the cell cycle and ubiquitin-mediated proteolysis pathways. Module 2 contains four genes, ALDH1A1, CD24, EPCAM, and SOX9; except for ALDH1A1, which is down-regulated, all are up-regulated DEGs, and this module showed no obvious pathway enrichment. Module 3 contains four genes, CP, LAMB1, CHRDL1, and CHGB; except for CP, which is up-regulated, the rest are down-regulated DEGs. After enrichment, CP mapped to the ferroptosis and the porphyrin and chlorophyll metabolism pathways, and LAMB1 to ECM-receptor interaction, small cell lung cancer, and other pathways.
Survival analysis and expression level analysis of hub genes
We used the Kaplan-Meier Plotter online tool to analyze the survival associations of the 20 hub genes and found that 13 genes were significantly associated with poor prognosis in ovarian cancer (P < 0.05, Fig. 6). We then used the GEPIA online database to compare the expression of these 13 genes between ovarian cancer and normal tissue. Compared with normal ovarian samples, 7 of the 13 prognosis-related genes (SOX9, EPCAM, CP, UBE2C, TTK, RACGAP1, and CD24) showed high expression in ovarian cancer samples (P < 0.01, Fig. 7).

Clinicopathological characteristics and TTK expression
The mean age of all patients at surgery was 52 years, and the median age was 53 years. Among patients with epithelial ovarian cancer, 59.1% had lymph node metastasis, 43.0% had moderately-to-well differentiated tumors, and 56.9% had distant metastases. TTK expression was negative in normal ovarian tissues. In tumor tissues, all specimens showed positive cytoplasmic staining, and expression increased progressively from the benign to the borderline to the malignant group. The H score of TTK expression in tumor tissues was 180 (10-220) (Table 6 and Fig. 8).

Correlation between TTK expression and clinicopathological factors
TTK expression showed no significant correlation with age or fertility level. However, it was significantly positively correlated with tumor differentiation, CA125 level, HE4 level, clinical stage, lymph node metastasis, and distant metastasis.
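Kaplan-Meier Plotter reports a log-rank P value for each gene, comparing high- and low-expression groups. The Python sketch below shows what that two-group log-rank statistic computes; it is not the tool's own code, the follow-up times are hypothetical, and it does not compute the hazard ratio (which needs a Cox model).

```python
from math import erfc, sqrt


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    events: 1 = event (e.g. death) observed, 0 = censored.
    Returns (chi-square statistic with 1 df, P value).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_minus_expected = 0.0  # summed over event times, for group A
    variance = 0.0
    for t in event_times:
        at_risk_groups = [g for tt, _, g in data if tt >= t]
        n = len(at_risk_groups)
        n_a = at_risk_groups.count(0)
        deaths = sum(1 for tt, e, _ in data if tt == t and e)
        deaths_a = sum(1 for tt, e, g in data if tt == t and e and g == 0)
        observed_minus_expected += deaths_a - deaths * n_a / n
        if n > 1:
            # hypergeometric variance of the group-A death count at time t
            variance += deaths * (n_a / n) * (1 - n_a / n) * (n - deaths) / (n - 1)

    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = observed_minus_expected ** 2 / variance
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


# Hypothetical follow-up (months): group A all die early, group B later
chi2, p = logrank_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
print(round(chi2, 3))  # 5.052
```

A gene would pass the screen described above when this P value is below 0.05.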
Compared with patients with normal CA125, normal HE4, moderately-to-well differentiated tumors, stage I-II disease, no lymph node metastasis, and no distant metastasis, the TTK expression rate was higher in patients with elevated CA125, elevated HE4, poor differentiation, stage III-IV disease, lymph node metastasis, and distant metastasis. Multivariate logistic regression, using the clinicopathological factors that were statistically significant in univariate analysis as independent variables, showed that distant metastasis, lymph node metastasis, clinical stage (III-IV), and poor differentiation were independent risk factors for high TTK expression (P < 0.05) (Tables 7 and 8).

ROC curve analysis of TTK, CA125, and HE4 alone and in combination for the diagnosis of ovarian cancer
ROC curves were drawn with the benign ovarian tumor group as the reference, and the AUC was calculated. In the joint monitoring of ovarian cancer, the AUCs of TTK, CA125, and HE4 were 0.927, 0.899, and 0.882, respectively, significantly higher than when each index was tested separately. Thus, TTK, CA125, and HE4 showed excellent diagnostic value in the joint monitoring of ovarian cancer (Fig. 9).
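The AUC values above summarize how well a marker separates cancers from the benign reference group. As a minimal Python sketch (not the SPSS procedure itself), the AUC equals the probability that a randomly chosen case scores higher than a randomly chosen control, counting ties as one half. The marker values below are hypothetical; a combined panel would first be reduced to a single score, for example the predicted probability from a logistic regression, before applying the same function.

```python
def roc_auc(case_scores, control_scores):
    """AUC via the Mann-Whitney formulation: P(case > control) + 0.5 * P(tie)."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical marker values: ovarian cancer cases vs. benign tumour controls
cases = [0.9, 0.8, 0.4]
controls = [0.5, 0.4, 0.1]
print(round(roc_auc(cases, controls), 3))  # 0.833
```

The pairwise loop is quadratic, which is fine at the cohort sizes in this study; a rank-based version gives the same value for larger samples.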

Discussion
Genetic instability is a hallmark of cancer cells. It manifests as aneuploidy, an abnormal genome structure with an abnormal number of chromosomes, and is closely related to chromosomal instability (CIN) (34). The SAC is a key surveillance mechanism: it prevents chromosome missegregation by delaying mitosis until all chromosomes are correctly attached to spindle microtubules, thereby ensuring accurate chromosome separation. Inactivation of the spindle assembly checkpoint leads to premature mitotic exit, which can ultimately cause chromosomal instability, aneuploidy, and even cell death. The SAC thus ensures healthy cell growth and precise division. TTK is a core component of the SAC, and SAC function depends on TTK activity (35-37). Because TTK plays a vital role in maintaining chromosome stability, researchers have increasingly examined the relationship between TTK expression and tumor development. Although basic and clinical studies of TTK have been conducted in many human malignancies, we found no similar studies in ovarian cancer. This study found that TTK expression was significantly elevated in tumor tissues and negative in normal ovarian tissues. This is consistent with earlier Northern blot analyses in which TTK transcripts were barely detectable in normal organs other than the testis and placenta. By contrast, high TTK levels are readily found in many types of human malignancy, and abnormal TTK expression is likely to affect SAC function (38). TTK is overexpressed in many malignant tumors, and patients with TTK overexpression have a poor prognosis. We therefore speculate that TTK may play a role in the development of ovarian cancer.
It is widely recognized that chromosomal instability is related to tumor heterogeneity, chromosomal abnormalities, and aneuploidy. Aneuploidy can be found at the earliest stage of tumor formation, and chromosomal instability is a fundamental process in tumorigenesis (39). In gastric and colorectal cancers with microsatellite instability, tumor-related frameshift mutations in TTK have been found that can cause premature termination of TTK synthesis (40). Whether a similar process occurs in ovarian cancer remains to be studied.
Beyond the difference in TTK expression between epithelial ovarian tumors and normal ovarian tissues, we used the chi-square test to analyze the relationship between TTK expression in EOC tissues and clinicopathological factors, and found a significant positive correlation between TTK expression and both tumor differentiation and clinical stage. High TTK expression may contribute to tumor invasion, lymph node metastasis, and distant metastasis. Among patients at the same clinical stage, those with low TTK expression are likely to survive longer than those with high TTK expression. TTK expression may therefore help predict the development and outcome of EOC, and TTK could serve as a prognostic biomarker that informs clinical decision-making. For example, ovarian cancer patients with low TTK expression may be more likely than those with high TTK expression to benefit from adjuvant chemotherapy. Clinically, if high TTK expression is found in specimens from patients with early-stage ovarian cancer, doctors and patients should pay closer attention, because these patients are at potentially high risk of future lymphatic or distant metastasis. Combining conventional pathology with TTK expression may improve the survival prognosis of these patients.
Our research may also provide a theoretical basis for TTK-directed immunotherapy and targeted therapy in different tumors. TTK can act as an immune epitope antigen that induces strong peptide-specific cytotoxic T lymphocyte activity against tumor cells, and its safety, immunogenicity, and clinical responsiveness have been confirmed in clinical trials in lung cancer, esophageal cancer, and cholangiocarcinoma (41-47). TTK has also been proposed as a new therapeutic target in several tumors, including glioblastoma, breast cancer, hepatocellular carcinoma, lung cancer, and pancreatic cancer (48-52). Studies in glioblastoma, breast cancer, and lung cancer (53-55) found that combining TTK inhibitors with chemotherapeutic drugs can improve their efficacy and reduce their adverse reactions. Across these studies, TTK inhibitors not only weakened the invasiveness and viability of tumor cells and increased autophagy and apoptosis, but also enhanced the efficacy of chemotherapy drugs when combined with them. In these tumors, TTK overexpression has been exploited and TTK-targeted therapy or immunotherapy appears feasible; whether TTK inhibitors can become a new therapeutic option for EOC remains to be confirmed.
Ovarian cancer is one of the most common malignant tumors of the female reproductive system. Its early clinical features are inconspicuous and the disease is highly occult, so in most patients it is difficult to detect, diagnose, and treat in time. By the time of clinical diagnosis, patients are already in the middle or advanced stages and outcomes are poor, so improving the diagnostic accuracy of ovarian cancer has important clinical value (56). Tumor markers are substances that are characteristically present in malignant tumor cells, are abnormally expressed and secreted by them, or are produced by host cells in response to the tumor; they reflect the occurrence and development of malignant tumors and are used to evaluate treatment and prognosis. Tumor markers can be present in tumor tissue, blood, milk, bile, and other body fluids, as well as in excreta such as urine and feces (57). Detecting relevant tumor markers by immunological, biological, or chemical methods can therefore help assess disease severity and treatment outcome in patients with malignant tumors, and testing suspected patients for representative markers can also improve the accuracy of existing diagnostic techniques (58).
So far, effective methods for the early detection of ovarian cancer are still lacking. To improve the prognosis and quality of life of patients, the early diagnosis of ovarian cancer has long attracted attention. The traditional tumor marker CA125 is widely used in the diagnosis, treatment, and monitoring of ovarian cancer; it has high sensitivity but is easily affected by physiological factors such as menstruation. By comparison, HE4 has strong specificity in the diagnosis of ovarian cancer, but it also has certain limitations. Therefore, the joint measurement of multiple indicators

Materials and methods

Gene expression profiling data
The Gene Expression Omnibus (GEO) is a public repository created by the National Center for Biotechnology Information (NCBI) to store various forms of high-throughput genomics data (14). The gene expression profile data (GSE54388, GSE27651, GSE18520, and GSE26712) were all obtained from GEO. Each data set contains gene expression profiles from at least 20 cancer and healthy control samples and has been used in previously published studies (15-19).

Comprehensive analysis of microarray data sets
Bioconductor, built on the R programming language, provides hundreds of software packages for the analysis of high-throughput genomic data (20). The hgu133plus2.db (version 3.2.3) and hgu133a.db (version 3.2.3) annotation packages were used to convert probe IDs to gene names. The limma package (version 3.40.2) (21) was used to normalize and log2-transform the data in each data set, and DEGs between ovarian cancer tissues and controls in the four data sets were identified with a linear model. DEGs were considered statistically significant at |logFC| > 1.5 and P < 0.05; genes with logFC > 1.5 were classed as up-regulated and those with logFC < -1.5 as down-regulated.
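The thresholding step above can be illustrated with a short Python sketch. It assumes the limma fit has already produced a log2 fold change and a P value for each gene; the gene names and values are hypothetical, and `classify` is an illustrative helper, not the authors' R code.

```python
from dataclasses import dataclass

# Cut-offs stated in the text
LOGFC_CUTOFF = 1.5
P_CUTOFF = 0.05


@dataclass
class GeneResult:
    gene: str
    log_fc: float   # log2 fold change, tumour vs. control
    p_value: float


def classify(result: GeneResult) -> str:
    """Return 'up', 'down' or 'ns' using |logFC| > 1.5 and P < 0.05."""
    if result.p_value >= P_CUTOFF or abs(result.log_fc) <= LOGFC_CUTOFF:
        return "ns"
    return "up" if result.log_fc > 0 else "down"


# Hypothetical limma output for four genes
results = [
    GeneResult("GENE_A", 2.1, 0.001),   # large increase, significant
    GeneResult("GENE_B", -1.9, 0.01),   # large decrease, significant
    GeneResult("GENE_C", 1.2, 0.001),   # significant but below the fold-change cut-off
    GeneResult("GENE_D", 2.5, 0.2),     # large change but not significant
]
calls = {r.gene: classify(r) for r in results}
print(calls)  # {'GENE_A': 'up', 'GENE_B': 'down', 'GENE_C': 'ns', 'GENE_D': 'ns'}
```

Both conditions must hold, so a gene with a large fold change but a non-significant P value is not called a DEG.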
The VennDiagram package (version 1.6.20) (22) was used to integrate the genes meeting the criteria in the four data sets and to visualize their overlap as a Venn diagram.
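The integration step amounts to taking the central region of a four-set Venn diagram. The Python sketch below assumes that a gene is retained only when it is called a DEG in every data set with the same direction of change; that consistency rule, the data-set labels, and the gene calls are all hypothetical illustrations rather than the authors' exact procedure.

```python
def consistent_degs(calls_by_dataset):
    """Return genes called DEGs in every data set, with one shared direction."""
    gene_sets = [set(calls) for calls in calls_by_dataset.values()]
    shared = set.intersection(*gene_sets)   # central region of the Venn diagram
    consensus = {}
    for gene in sorted(shared):
        directions = {calls[gene] for calls in calls_by_dataset.values()}
        if len(directions) == 1:            # same direction in all data sets
            consensus[gene] = directions.pop()
    return consensus


# Hypothetical DEG calls for four data sets (gene -> "up" or "down")
calls_by_dataset = {
    "set1": {"G1": "up", "G2": "down", "G3": "up", "G4": "up"},
    "set2": {"G1": "up", "G2": "down", "G3": "down", "G4": "up"},
    "set3": {"G1": "up", "G2": "down", "G3": "up"},
    "set4": {"G1": "up", "G2": "down", "G3": "up", "G4": "up"},
}
integrated = consistent_degs(calls_by_dataset)
print(integrated)  # {'G1': 'up', 'G2': 'down'}
```

Here G3 is dropped because its direction flips in one data set, and G4 because it is missing from one data set.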
Functional and pathway enrichment analysis
The clusterProfiler package (version 3.10.2) was used to perform functional and pathway enrichment analysis on the DEGs to explore their biological significance, with the significance threshold set at P < 0.05. Gene Ontology (GO) enrichment describes the functions of genes and gene products across organisms in three aspects: cellular component (CC), biological process (BP), and molecular function (MF) (23). Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis interprets genes in terms of the biochemical and regulatory pathways in which they participate (24).
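Over-representation analysis of this kind, including clusterProfiler's GO and KEGG enrichment, is based on the hypergeometric distribution. The Python sketch below computes the one-sided P value for a single GO term or KEGG pathway; the gene counts are hypothetical, and multiple-testing correction is omitted.

```python
from math import comb


def hypergeom_enrichment_p(overlap, n_query, n_term, n_background):
    """P(X >= overlap) for X ~ Hypergeometric(n_background, n_term, n_query).

    overlap      -- DEGs annotated to the term
    n_query      -- DEGs tested
    n_term       -- background genes annotated to the term
    n_background -- all background genes
    """
    total = comb(n_background, n_query)
    upper = min(n_query, n_term)
    hits = sum(
        comb(n_term, i) * comb(n_background - n_term, n_query - i)
        for i in range(overlap, upper + 1)
    )
    return hits / total


# Hypothetical: 20 background genes, 5 on the term, 4 DEGs of which 3 are on the term
p = hypergeom_enrichment_p(3, 4, 5, 20)
print(round(p, 4))  # 0.032
```

With the text's threshold, a term with this P value would be reported as enriched.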
PPI network construction and module screening
To evaluate functional interactions among the DEGs, the STRING database was searched for protein interactions, and the DEGs were mapped to a PPI network with a combined score cut-off of ≥ 0.4. Cytoscape (version 3.7.2) (25) was used to visualize the PPI network and to analyze the interactions between DEG-encoded proteins in ovarian cancer. To improve sensitivity and specificity, the MCC algorithm of the Cytoscape plug-in cytoHubba (26) was used to select 20 hub genes for further biological analysis. In addition, the Cytoscape plug-in Molecular Complex Detection (MCODE) was used to detect densely connected regions of the PPI network (degree cutoff = 2, node score cutoff = 0.2, K-core = 2, and maximum depth = 100 as advanced options) (27). Modules with both an MCODE score > 3 and more than 4 nodes were selected, and KEGG enrichment analysis was performed on their DEGs.
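The network filtering and hub ranking can be sketched in Python under stated assumptions. Edges below the combined-score cut-off are dropped, so nodes left without edges never enter the network. The MCC score of a node follows our reading of the cytoHubba definition: the sum of (|C| - 1)! over all maximal cliques C containing the node, with cliques found by Bron-Kerbosch with pivoting. This is an illustrative re-implementation, not the plug-in's code, and the protein pairs and scores are hypothetical.

```python
from math import factorial


def build_network(scored_pairs, cutoff=0.4):
    """Keep interactions with combined score >= cutoff; isolated nodes drop out."""
    adj = {}
    for a, b, score in scored_pairs:
        if score >= cutoff:
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with the Bron-Kerbosch algorithm (pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def mcc_scores(adj):
    """MCC(v) = sum of (|C| - 1)! over maximal cliques C that contain v."""
    scores = {v: 0 for v in adj}
    for clique in maximal_cliques(adj):
        for v in clique:
            scores[v] += factorial(len(clique) - 1)
    return scores


def top_hubs(adj, k):
    """Rank nodes by MCC (ties broken alphabetically) and return the top k."""
    scores = mcc_scores(adj)
    return sorted(scores, key=lambda v: (-scores[v], v))[:k]


# Hypothetical STRING-style scored pairs
pairs = [("A", "B", 0.9), ("A", "C", 0.8), ("B", "C", 0.7),
         ("C", "D", 0.5), ("D", "E", 0.3)]
network = build_network(pairs)
print(top_hubs(network, 2))  # ['C', 'A']
```

Node E is removed because its only interaction scores below 0.4, and C ranks first because it belongs to both maximal cliques, {A, B, C} and {C, D}.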
Survival analysis of hub genes
Kaplan-Meier Plotter is an online database containing gene expression data and clinical data from a large number of patients with ovarian cancer (28). We used it to evaluate the prognostic value of the 20 selected hub genes. The plots directly display the log-rank P value and the hazard ratio (HR) with its 95% confidence interval, and genes with a log-rank P value < 0.05 were selected.

Analysis of hub gene expression levels
To verify the expression of the hub genes in ovarian cancer, GEPIA (Gene Expression Profiling Interactive Analysis) (29) was used to compare the prognosis-related hub genes against matched normal data from TCGA (The Cancer Genome Atlas) and GTEx (Genotype-Tissue Expression). P < 0.01 was considered statistically significant, and gene expression levels are displayed as box plots.

Patients and paraffin-embedded tissue samples
This study was approved by the Ethics Committee of the Affiliated People's Hospital of Shanxi Medical University. From 2015 to 2020, the Department of Obstetrics and Gynecology of the Affiliated People's Hospital of Shanxi Medical University collected postoperative paraffin-embedded specimens from 150 patients. The pathological diagnosis of all tissue sections was confirmed by in-house experts as follows: malignant group, n = 93; borderline group, n = 27; benign group, n = 15; normal group, n = 15. The ovarian cancers comprised 65 serous adenocarcinomas, 13 mucinous adenocarcinomas, 10 endometrioid carcinomas, and 5 clear cell carcinomas. In the malignant group, 40 cases were well differentiated and 53 were poorly differentiated. Pathological stage was assigned according to the International Federation of Gynecology and Obstetrics (FIGO, 2009) criteria: FIGO stage I-II (36 cases) and FIGO stage III-IV (57 cases).
Lymph node metastasis was classified as follows: no metastasis (48 cases) and metastasis (55 cases). All patients had primary ovarian cancer with complete clinical and pathological data. Patients who had received chemotherapy, radiotherapy, or hormone therapy before surgery were excluded (Table 1).

TTK immunohistochemistry and H score
All fresh tissue specimens were collected immediately after surgical resection, fixed in 10% neutral buffered formalin, and embedded in paraffin. Paraffin-embedded sections 5 μm thick were stained by immunohistochemistry. Staining was performed manually, and each slide was processed strictly according to the immunohistochemistry protocol, using a TTK polyclonal antibody (ab219068, Abcam, UK, 1:100). Sections were deparaffinized in xylene and graded alcohol, washed with PBS, subjected to heat-induced antigen retrieval, treated with 3% hydrogen peroxide to block endogenous peroxidase, and incubated with rabbit TTK antibody (1:500) at 4°C overnight. Reaction enhancement solution was added dropwise for 30 minutes, followed by incubation with an enzyme-labeled anti-rabbit secondary antibody for 30 minutes. Color was developed with diaminobenzidine, and the sections were counterstained with hematoxylin, dehydrated, and mounted in neutral resin. Testicular tissue, as recommended by the manufacturer, served as the positive control. Two experienced pathologists independently evaluated the sections in a blinded manner. TTK staining was assessed with the H-score system, which multiplies each staining intensity by the percentage of cells stained at that intensity and sums the products (30-33). Staining intensity was graded from 0 to 3 (0 = negative, 1 = weak, 2 = moderate, 3 = strong), and the percentage of positive cells ranged from 0 to 100, so the final H score ranges from 0 to 300.
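The H-score arithmetic can be written out as a small Python function. It assumes the percentages for weak, moderate, and strong staining are each expressed relative to all tumour cells, so unstained cells contribute nothing; the example percentages are hypothetical.

```python
def h_score(pct_weak, pct_moderate, pct_strong):
    """H score = 1 x %weak + 2 x %moderate + 3 x %strong, range 0-300.

    Percentages are of all tumour cells; cells scored 0 add nothing.
    """
    pcts = (pct_weak, pct_moderate, pct_strong)
    if any(p < 0 for p in pcts) or sum(pcts) > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    return 1 * pct_weak + 2 * pct_moderate + 3 * pct_strong


print(h_score(20, 30, 40))  # 200
print(h_score(0, 0, 100))   # 300, the theoretical maximum
```

Weighting each intensity by its own percentage lets a section with many weakly stained cells score differently from one with a few strongly stained cells, which a single intensity-times-percentage product would not distinguish.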

Statistical analysis
Data were analyzed with SPSS 26.0. Univariate chi-square tests were performed for the clinicopathological factors of epithelial ovarian cancer (α = 0.05). A combined predictor was obtained with a logistic regression model, and the predictive value of the study factors was evaluated with receiver operating characteristic (ROC) curves (α = 0.05).

Figure caption (PPI network): Circles represent genes, lines represent protein-protein interactions between genes, and the images inside the circles represent protein structures. Line color indicates the type of evidence for each interaction.
Figure caption (survival analysis): The prognostic value of the 20 hub genes was analyzed with Kaplan-Meier Plotter; 13 genes were associated with significantly poorer survival (P < 0.05).
Figure caption (immunohistochemistry): Cytoplasmic staining with 1+, 2+, and 3+ intensity, respectively.
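For a binary clinicopathological factor against high or low TTK expression, the univariate chi-square test reduces to a 2×2 table. The Python sketch below computes the uncorrected Pearson statistic (SPSS also reports a continuity-corrected version, which is not reproduced here); the counts are hypothetical.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no Yates correction) for the table [[a, b], [c, d]].

    Returns (statistic, P value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


# Hypothetical table: rows = stage III-IV / stage I-II, columns = TTK high / TTK low
chi2, p = chi_square_2x2(30, 10, 10, 30)
print(chi2)  # 20.0
```

A P value below α = 0.05 would mark the factor as significant in the univariate step and make it a candidate for the multivariate logistic regression.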