DDR-related genes can classify the tumor data in TCGA into two types
The workflow of this study is shown in Figure 1. The maximum number of clusters (k) was set to 9, and the typing requirement was met at k = 2, indicating that DDR-related genes could classify the TCGA tumor data into two subtypes (Figure 1A). Survival differed significantly between the two subtypes (p = 0.02) (Figure 1B), and PCA plots showed that the two-subtype partition separated the samples well (Figure 1C). To verify functional differences between the subtypes, ssGSEA was performed and showed that multiple immune cell types differed significantly between subtypes (Figure 1D). GSVA of the differentially enriched pathways (Figure 1E) showed that the differences between subtypes were concentrated in three areas: neuromuscular-related pathways (Parkinson's disease, Huntington's disease, and myocardial contraction), DNA damage repair (base excision repair, DNA replication, and nucleotide excision repair), and metabolism and synthesis of related compounds (folate biosynthesis, glyoxylate and dicarboxylic acid metabolism, and pyruvate metabolism).
Extraction of prognosis-related differential genes between subtypes and typing of tumor data in TCGA
Using |logFC| = 3 and adj. P value = 0.001 as the screening thresholds, 9323 ITD-Genes were identified (Figure 2A). Clustering was performed with the same settings as above. The typing requirement was met at k = 2, indicating that the ITPD-Genes could classify the TCGA tumor data into two subtypes (Figure 2B). Survival differed significantly between the two subtypes (p < 0.001) (Figure 2C). With p < 0.05 as the screening criterion, univariate Cox analysis of the ITD-Genes yielded 1014 ITPD-Genes (Annex 1). The ITPD-Genes were then subjected to GO functional annotation (Figure 2D) and KEGG pathway enrichment analysis (Figure 2E).
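As an illustration of the screening step described above, the following minimal Python sketch filters a differential expression table by fold change and adjusted p-value. The text gives the cutoffs as "|logFC|=3, adj.Pvalue=0.001" without stating the direction of the comparison, so the sketch assumes |logFC| >= 3 and adj. P < 0.001. The function name `filter_deg` and the tuple layout are hypothetical and are not taken from the original analysis.

```python
def filter_deg(rows, logfc_cutoff=3.0, adj_p_cutoff=0.001):
    """Return the genes that pass both screening thresholds.

    rows: iterable of (gene, logFC, adjusted_p) tuples.
    Assumption (not stated in the text): a gene passes when
    |logFC| >= logfc_cutoff and adjusted_p < adj_p_cutoff.
    """
    selected = []
    for gene, logfc, adj_p in rows:
        if abs(logfc) >= logfc_cutoff and adj_p < adj_p_cutoff:
            selected.append(gene)
    return selected
```

Both up-regulated and down-regulated genes are retained because the fold-change test is applied to the absolute value.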
Prognostic model construction and validation
The 349 IP-Genes were further analyzed using LASSO regression and multivariate Cox regression, yielding 16 key genes and their corresponding coefficients (CH25H, CCR7, CACNA1C, SLC4A8, CXCL11, UBD, TRPV4, RPS6KA2, FCGBP, TOMM20L, STX18, PI3, CMBL, ISG20, AKAP12, PIGS). The modeling equation was: Risk score = (0.001247 * CH25H) + (-0.003417 * CCR7) + (0.017031 * CACNA1C) + (-0.008297 * SLC4A8) + (-0.003082 * CXCL11) + (-0.003726 * UBD) + (0.004255 * TRPV4) + (0.003507 * RPS6KA2) + (0.005019 * FCGBP) + (-0.004344 * TOMM20L) + (-0.001889 * STX18) + (0.002769 * PI3) + (-0.000619 * CMBL) + (-0.000359 * ISG20) + (0.000582 * AKAP12) + (0.000396 * PIGS). A risk score was calculated for each case according to this formula, and patients were divided into high-risk and low-risk groups using the median risk score as the cutoff. Survival differed significantly between the high- and low-risk groups in both the TCGA training set and the GEO validation set (p < 0.05) (Figure 3A-B). The risk heat maps of the model showed that the number of deaths gradually increased and the number of survivors gradually decreased as the risk score increased (Figure 3C-D). A nomogram predicting 1-, 2-, and 3-year survival was constructed from the TCGA training set, in which patient age, stage, and risk score were significant factors (Figure 4A). The calibration curves lay close to the diagonal reference line, indicating that the nomogram predicted survival reliably (Figure 4B). The DCA curve indicated that the predictive ability of our model was similar to that of the nomogram (Figure 4C). The area under the ROC curve of our model (AUC = 0.72) was smaller than that of the nomogram but larger than those of the other clinicopathological features (Figure 4D).
Both the univariate and multivariate independent prognostic analyses of the TCGA training set showed that the risk model had the smallest p-value (p < 0.001) (Figure 4F-G). A nomogram predicting 1-, 2-, and 3-year survival was also constructed from the GEO validation set, in which patient age and risk score were significant factors (Figure 5A). The calibration curves lay close to the diagonal reference line, indicating that the nomogram predicted survival reliably (Figure 5B), and the DCA curve indicated that the predictive ability of our model was similar to that of the nomogram (Figure 5C). Both the univariate and multivariate independent prognostic analyses showed that the risk model had the smallest p-value (p < 0.001) (Figure 5D-E).
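The modeling equation and median split described in this section can be sketched in Python as follows. The coefficients are those reported in the equation. The function names, the dictionary-based input format, and the handling of ties (a score exactly equal to the median is assigned to the low-risk group) are assumptions made for illustration, not details taken from the original analysis.

```python
from statistics import median

# Coefficients of the 16 key genes, as reported in the modeling equation.
COEFFICIENTS = {
    "CH25H": 0.001247,
    "CCR7": -0.003417,
    "CACNA1C": 0.017031,
    "SLC4A8": -0.008297,
    "CXCL11": -0.003082,
    "UBD": -0.003726,
    "TRPV4": 0.004255,
    "RPS6KA2": 0.003507,
    "FCGBP": 0.005019,
    "TOMM20L": -0.004344,
    "STX18": -0.001889,
    "PI3": 0.002769,
    "CMBL": -0.000619,
    "ISG20": -0.000359,
    "AKAP12": 0.000582,
    "PIGS": 0.000396,
}


def risk_score(expression):
    """Weighted sum of expression values for one patient.

    expression: dict mapping each of the 16 gene symbols to its
    expression value (hypothetical input format).
    """
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def split_by_median(scores):
    """Assign each patient to the high- or low-risk group.

    scores: dict mapping patient ID to risk score.
    Assumption: scores strictly above the median are 'high';
    scores equal to the median are 'low'.
    """
    cutoff = median(scores.values())
    groups = {
        patient: ("high" if score > cutoff else "low")
        for patient, score in scores.items()
    }
    return groups, cutoff
```

In use, `risk_score` would be applied to each patient's expression profile, and the resulting dictionary of scores passed to `split_by_median` to obtain the two groups compared in the survival analysis.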
Immune, stem cell correlation, and drug sensitivity analyses of the risk model constructed from DDR-related genes
In both the typing of TCGA tumors based on DDR-related genes and the typing based on ITPD-Genes, the differences between the high- and low-risk groups were statistically significant (p < 0.001) (Figure 6A-B). The correlation heat map of risk model genes and immune cells showed that the risk genes were significantly correlated with multiple immune cell types (Figure 6C). Stem cell correlation analysis showed that the risk score was positively correlated with the stem cell index (R = 0.14, p = 0.013) (Figure 6D). Immune scoring of the tumor microenvironment showed significant differences in the tumor stromal environment between the high- and low-risk groups (Figure 6E). Drug sensitivity analysis revealed significant differences between the high- and low-risk groups in sensitivity to anti-OC drugs such as methotrexate, mitomycin C, and cisplatin (Figure 6F).
Risk gene study
Mutation frequency analysis of the 16 risk genes using the cBioPortal online website showed the following mutation rates: FCGBP, 12%; CACNA1C, 10%; CMBL, 8%; UBD, 5%; PI3, 4%; ISG20, AKAP12, STX18, and RPS6KA2, 3% each; CXCL11 and PIGS, 2.1% each; TRPV4, 1.9%; CH25H, 1.6%; CCR7, 1.4%; SLC4A8, 1.3%; and TOMM20L, 0.4%. The alterations were mainly amplifications (Figure 7A). The five risk genes with the highest mutation frequencies were further examined in the HPA database for differences in protein expression between normal ovarian epithelium and OC tissue; among these, UBD had no relevant data for ovarian epithelium or tumor tissue in the HPA database (Figure 7B).