Immune Characteristics-Related Typing of Colorectal Cancer and the Establishment of a 14-Gene Prognostic Model

This study aimed to establish a nonnegative matrix factorization (NMF) typing related to the tumor microenvironment (TME) of colorectal cancer (CRC) and to construct a prognosis-related gene model that estimates the prognosis of CRC patients more accurately. The NMF algorithm was used to classify samples, merged with clinical data, on the basis of TME-related differentially expressed genes (DEGs) from The Cancer Genome Atlas (TCGA) that were shared between the TCGA and Gene Expression Omnibus (GEO) datasets, and survival differences between subtype groups were compared. Using the createDataPartition command, the TCGA samples were randomly divided into a train group and a test group. Univariate Cox analysis, Lasso regression, and multivariate Cox regression were then used to obtain the risk model formula, which was used to score the samples in the train group, the test group, and the GEO database and to divide the samples of each group into high-risk and low-risk groups according to the median score of the train group. The model was then validated. Patients with CRC were divided into 2, 3, and 5 subtypes, respectively. Comparison of overall survival (OS) and progression-free survival (PFS) showed that typing with the rank set to 5 was the most statistically significant (p=0.007 and p<0.001, respectively). Moreover, the constructed model containing 14 immune-related genes (PPARGC1A, CXCL11, PCOLCE2, GABRD, TRAF5, FOXD1, NXPH4, ALPK3, KCNJ11, NPR1, F2RL2, CD36, CCNF, DUSP14) can be used as an independent prognostic factor and was superior to some previous models in terms of patient prognosis. The 5-subtype typing of CRC patients and the 14 immune-related gene model constructed by us can accurately estimate the prognosis of patients with CRC.


Introduction
Colorectal cancer (CRC) is one of the most common cancers worldwide, with high morbidity and mortality. There are approximately 1.8 million new cases of CRC and 900,000 CRC-related deaths each year, accounting for about 10% of all diagnosed cancers and cancer-related deaths 1-3 and ranking fourth among 36 cancers. 4 How to diagnose CRC at an early stage and accurately estimate the prognosis of patients still requires further research and optimization. 5,6 In recent years, cancer immunotherapy has become an important means of cancer treatment and one of the hot spots in the field of cancer research. [7][8][9] However, the complex composition and heterogeneity of the tumor microenvironment (TME) pose a huge challenge to tumor immunotherapy. 10 CRC is one of the tumors that respond poorly to immunotherapy. An open question is how to perform immune-related typing of patients with CRC and use immune-related genes to construct prognostic models, so that patients with different subtypes and risk levels can be given reasonable treatment and their survival can be assessed accurately. Many researchers have worked on this problem and put forward innovative viewpoints. Jun Jun Wang, Xiao Wang, and others [11][12][13][14][15] have previously constructed gene models for the prognosis of CRC, which are of high value in evaluating the prognosis and recurrence of CRC. However, no one has compared the evaluation accuracy of these models or the statistical differences among them. Although some scholars have also used the NMF algorithm to classify CRC samples in TCGA, most of them used a rank value of 2 or 3, and they did not compare the statistical differences in OS and PFS of patients produced by typing with different rank values.
To better classify patients with CRC, a total of 540 CRC samples meeting the requirements were screened from the TCGA (TCGA-COAD, TCGA-READ) and GEO (GSE39582, GPL570) databases and typed with the NMF algorithm (method=brunet). The relatively optimal typing method, with a rank value of 5, was determined by comparing the impact of the three typing methods (rank values of 2, 3, and 5) on the OS and PFS of patients in each subtype. This typing method was then compared with others. In addition, to predict the survival of patients with CRC, a risk score model containing 14 immune-related genes was established from the TCGA (TCGA-COAD, TCGA-READ) and GEO (GSE39582, GPL570) databases. To obtain a more accurate risk score model, the immune-related risk score model constructed by us was compared with the models of some other scholars. The results showed that our risk score model was superior to some scholars' models in terms of patient prognosis, and univariate and multivariate independent prognostic analyses showed that it could be used as an independent prognostic factor.

Methods And Materials
Data acquisition. The CRC-related gene expression data obtained from the TCGA (https://tcga-data.nci.nih.gov) database contain 56461 genes and their expression levels from 544 original samples, and the clinical data from the TCGA database include 548 original samples and their clinical traits. The CRC expression data obtained from the GEO (http://www.ncbi.nlm.nih.gov/geo/) database (GSE39582, GPL570) contain 21654 genes and their expression levels, and the clinical data from the GEO database contain 585 clinical samples and their clinical traits. Using |logFC|>1.5 and FDR<0.05 as filter conditions, DEGs were identified in TCGA, and the 559 TME-related TCGA DEGs shared by the TCGA and GEO databases were obtained by intersection. Finally, the samples with these DEGs were intersected and merged with the clinical samples of the TCGA and GEO databases, yielding 540 and 579 samples, respectively, for subsequent typing and modeling. The genes of some CRC prognostic risk models constructed by predecessors 11-19 were obtained from their articles.
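The filtering and intersection steps described above can be sketched as follows. This is a minimal illustration in Python rather than the R workflow used in the study; the function name filter_degs, the placeholder genes GENE_X and GENE_Y, and all numeric values are hypothetical and serve only to show the thresholds |logFC|>1.5 and FDR<0.05 followed by set intersection.

```python
def filter_degs(stats, lfc_cutoff=1.5, fdr_cutoff=0.05):
    """Return the genes with |logFC| > lfc_cutoff and FDR < fdr_cutoff.

    stats maps gene name -> (logFC, FDR).
    """
    return {
        gene
        for gene, (log_fc, fdr) in stats.items()
        if abs(log_fc) > lfc_cutoff and fdr < fdr_cutoff
    }


# Hypothetical toy values, not results from the study.
tcga_stats = {
    "CXCL11": (2.1, 0.001),
    "CD36": (-1.8, 0.010),
    "GENE_X": (0.4, 0.200),   # fails the logFC threshold
    "GENE_Y": (3.0, 0.300),   # fails the FDR threshold
}
geo_genes = {"CXCL11", "CD36", "GENE_Y"}   # genes measured in the GEO dataset
tme_genes = {"CXCL11", "CD36", "GENE_X"}   # genes annotated as TME-related

degs = filter_degs(tcga_stats)
# Keep only TCGA DEGs that are TME-related and also present in GEO.
shared = sorted(degs & geo_genes & tme_genes)
print(shared)
```

Using sets keeps the intersection step explicit and independent of the order in which genes appear in either dataset.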
NMF algorithm immune-related typing. Univariate Cox survival analysis was performed on the genes of the 540 TCGA samples, and survival-related differential genes were obtained with p<0.01 as the filter condition. The NMF command (rank=2:10, method=brunet, nrun=10, seed=123456) was used to classify the samples on the basis of these differential genes (using the "survival" and "NMF" packages), which yielded the graph of the cophenetic coefficient and 9 heatmaps. The optimal rank value for typing was determined by comparing the graphical results and the subsequent analysis of OS and PFS in patients with CRC. There were three criteria for selecting the rank value from the graphical results: firstly, select the point at the front of the maximum-slope segment in the cophenetic graph; secondly, in the typing heatmap, the color within groups should be dark red and the color between groups dark blue; thirdly, the numbers of samples in the groups should not differ greatly. Our typing was compared with previous pan-cancer typing methods to identify the differences between them, and Sankey plots were used for visualization.
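For readers unfamiliar with the brunet method, it is based on multiplicative updates that minimize the Kullback-Leibler divergence between the expression matrix V and the product WH. The sketch below is a plain-Python illustration of these update rules on a toy matrix, not the implementation of the R "NMF" package; the function names nmf_brunet and assign_subtypes, the toy matrix, and the iteration count are assumptions made for illustration. Each sample is assigned to the component with the largest coefficient in its column of H.

```python
import random


def nmf_brunet(V, rank, n_iter=500, seed=123456, eps=1e-9):
    """Factor V (genes x samples) as W x H with KL-divergence multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n)]
    H = [[rng.uniform(0.1, 1.0) for _ in range(m)] for _ in range(rank)]

    def ratio():
        # Element-wise V / (W H), the quantity shared by both update rules.
        return [
            [V[i][j] / (sum(W[i][a] * H[a][j] for a in range(rank)) + eps)
             for j in range(m)]
            for i in range(n)
        ]

    for _ in range(n_iter):
        R = ratio()
        for a in range(rank):  # update H with W fixed
            col_sum = sum(W[i][a] for i in range(n)) + eps
            for j in range(m):
                H[a][j] *= sum(W[i][a] * R[i][j] for i in range(n)) / col_sum
        R = ratio()
        for a in range(rank):  # update W with H fixed
            row_sum = sum(H[a][j] for j in range(m)) + eps
            for i in range(n):
                W[i][a] *= sum(H[a][j] * R[i][j] for j in range(m)) / row_sum
    return W, H


def assign_subtypes(H):
    """Assign each sample (column of H) to its dominant component."""
    return [max(range(len(H)), key=lambda a: H[a][j]) for j in range(len(H[0]))]


# Hypothetical toy matrix: 4 genes x 6 samples with two obvious sample groups.
V = [
    [5.0, 4.8, 5.2, 0.1, 0.2, 0.1],
    [4.9, 5.1, 5.0, 0.2, 0.1, 0.1],
    [0.1, 0.2, 0.1, 5.0, 5.2, 4.9],
    [0.2, 0.1, 0.1, 4.8, 5.1, 5.0],
]
W, H = nmf_brunet(V, rank=2)
labels = assign_subtypes(H)
print(labels)
```

In practice the factorization is repeated over several random starts (the nrun argument in the study) and over a range of ranks, and the rank is chosen from the stability of the resulting sample assignments; the sketch shows a single run at a single rank.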
MCPcounter immune cell infiltration analysis. To explore whether immune cell infiltration differed between the subtypes in our typing method, an immune infiltration analysis was performed. The MCPcounter.estimate command (using the "MCPcounter" and "limma" packages) was used to calculate the immune cell infiltration of each sample from the expression data. The differences in the infiltration of ten cell populations (CD8 T cells, Endothelial cells, Fibroblasts, Monocytic lineage, Myeloid dendritic cells, Neutrophils, NK cells, Cytotoxic lymphocytes, B lineage, and T cells) between subtype groups were then analyzed.
Construction of prognostic risk model. The 540 TCGA samples containing expression and clinical data were randomly divided into a train group and a test group using the createDataPartition (p=0.7) command (using the "survival", "caret", "glmnet", "survminer", and "timeROC" packages). Univariate Cox survival analysis was performed on the genes in the train group, with p<0.05 as the filter condition, to obtain the genes significantly associated with survival. A Lasso regression model was then constructed with the expression levels of these significant genes as the predictors (x) and the survival time and survival status of the samples as the response (y). The genes corresponding to the minimum cross-validation error were selected, and the model genes and their coefficients were obtained by fitting a multivariate Cox model. The model formula was as follows: risk score = expression of gene A × coefficient of gene A + expression of gene B × coefficient of gene B + ... + expression of gene N × coefficient of gene N. The formula was applied to score the samples from the train group, the test group, and the GEO database (579 samples), and the median risk score of the train group was used for risk grouping (above the median is high risk, and below the median is low risk). Survival analysis was then performed on the high-risk and low-risk samples in each group, and ROC curve analysis was used to test the accuracy of the model.
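The scoring and grouping rule described above can be sketched as follows. This is a minimal Python illustration, not the code used in the study; the gene names GENE_A and GENE_B, their coefficients, and the expression values are hypothetical. The text does not state how a score exactly equal to the median is handled, so assigning such ties to the low-risk group is an assumption of this sketch.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Sum of (gene expression x Cox coefficient) over the model genes."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


def assign_groups(scores, cutoff):
    """Label scores above the cutoff 'high'; ties and lower scores 'low' (assumption)."""
    return ["high" if score > cutoff else "low" for score in scores]


# Hypothetical two-gene model: one risk gene and one protective gene.
coefficients = {"GENE_A": 0.5, "GENE_B": -0.25}

train = [
    {"GENE_A": 2.0, "GENE_B": 2.0},
    {"GENE_A": 4.0, "GENE_B": 0.0},
    {"GENE_A": 1.0, "GENE_B": 2.0},
]
test = [
    {"GENE_A": 3.0, "GENE_B": 0.0},
    {"GENE_A": 0.0, "GENE_B": 4.0},
]

train_scores = [risk_score(sample, coefficients) for sample in train]
test_scores = [risk_score(sample, coefficients) for sample in test]

# The cutoff comes from the train group only and is reused for every other cohort.
cutoff = median(train_scores)
train_groups = assign_groups(train_scores, cutoff)
test_groups = assign_groups(test_scores, cutoff)
print(cutoff, train_groups, test_groups)
```

Reusing the train-group median for the test and external cohorts keeps the grouping rule fixed before validation, so the validation cohorts do not influence the threshold.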
Decision Curve Analysis (DCA) and ROC curve analysis. To compare the prognostic ability and accuracy of the risk score model, the nomogram 23, and various clinical traits, DCA and ROC curve analysis were performed. The "survival", "survminer", "timeROC", and "ggDCA" packages were used to perform Cox survival analysis on the prognostic risk score model, the nomogram, and the clinical traits, and DCA was then performed for all factors to identify the optimal prognostic factor. Then, with predictTime set to 1, ROC curve analysis was conducted for each factor, and the prediction accuracy of each factor was judged by comparing the resulting Area Under the Curve (AUC) values.
Gene Set Enrichment Analysis (GSEA). To identify the functions and pathways active in the high-risk and low-risk groups, GSEA was conducted on the 540 samples in the TCGA database, with p<0.05 as the filter condition (using the "limma", "org.Hs.eg.db", "clusterProfiler", and "enrichplot" packages). Active functions and pathways in the high-risk and low-risk groups were obtained from the GO and KEGG annotated gene set files and the expression matrix, and the results were finally visualized.

Results
Research flow chart. The data from the TCGA and GEO databases were screened for subsequent NMF typing and construction of the prognostic risk model. For illustration, a flowchart of the main steps of this research (Fig. 1) has been plotted.
Comparison of NMF immune-related typing. To find the optimal grouping method, the rank value was set to 2:10 to obtain the cophenetic graph and 9 heatmaps. Candidate rank values were chosen according to three selection conditions: the point lies at the front of the segment with the largest slope; in the heatmap, the color is deep red within groups and deep blue between groups; and the numbers of samples in the groups are as close as possible. The optimal typing method was determined by analyzing and comparing the three typings with the rank value set to 2, 3, and 5, respectively. After comparing the OS and PFS of the samples in the subtypes of each typing method, it was found that the typing method with a rank value of 5 was the most statistically significant (p=0.007 and p<0.001, respectively; Fig. 2A-J). The three immune-related typing methods with the rank set to 2, 3, and 5 were compared with previous pan-cancer typing methods, and the differences between the typing method with the rank set to 5 and the others were the most obvious among the three (Fig. 2K-M).

Results of the MCPcounter immune cell infiltration. The MCPcounter immune cell infiltration analysis was performed on CD8 T cells, Endothelial cells, Fibroblasts, Monocytic lineage, Myeloid dendritic cells, Neutrophils, NK cells, Cytotoxic lymphocytes, B lineage, and T cells, and the differences between subtypes were obtained (Fig. 3). The results showed that the infiltration of Monocytic lineage gradually increased across the subtype groups from c1 to c5, and the differences were statistically significant; the difference between the c1 and c5 subtypes was the most significant. For Fibroblasts, there were significant differences between all subtypes except c1 and c3, and the overall infiltration gradually increased from subtype c1 to c5. The infiltration of Myeloid dendritic cells, Cytotoxic lymphocytes, Neutrophils, NK cells, and B lineage each increased gradually from subtype c1 to the other subtypes, and the differences were statistically significant. However, the infiltration of CD8 T cells and T cells showed no obvious trend between subtypes.
Results of the construction of the prognostic risk model. Of the 14 model genes, 5 (PPARGC1A, CXCL11, NPR1, F2RL2, CCNF) were protective genes with coefficients less than 0, while the remaining 9 genes were risk genes with coefficients greater than 0.
The model formula was used to score the samples of the train group, the test group, and the GEO group, and the samples of each group were divided into high-risk and low-risk groups according to the median score of the train group. Survival analysis was performed on the high-risk and low-risk groups of the train group, the test group, all TCGA samples, and the GEO group to obtain the survival differences between the high-risk and low-risk groups (Fig. 4C-F). Moreover, the accuracy of the risk model for 1-year, 3-year, and 5-year prognosis was tested by ROC curve analysis (Fig. 4G-J). The results showed significant differences in survival between the risk groups (p<0.001). Prediction accuracy was judged by whether the AUC value was greater than 0.65, and the results showed that the prediction effect of the risk model was good.
Cox survival analysis of clinical traits by group. The 540 patients in TCGA were divided into groups according to their clinical traits, and Cox survival analysis between the high-risk and low-risk groups was carried out within each group (Fig. 5A-J). The results showed that survival differed significantly between the high-risk and low-risk groups in every group (p<0.001, p=0.005) except the T1-T2 group, in which there was no statistical difference. This suggested that the survival of CRC patients whose tumors invade only the submucosa or muscularis propria was not associated with the risk.
Nomogram scoring forecast. A nomogram model was built from the clinical traits (gender, age, stage, T, N) and the risk score, and the model was visualized to obtain the scoring chart and the calibration curve (Fig. 5K-L). From the nomogram, the total score of a patient's clinical traits and risk score could be obtained and used to predict the 1-year, 3-year, and 5-year survival rates (Fig. 5K). In addition, the calibration curves for 1, 3, and 5 years were close to the gray line segment, which indicated that the nomogram was accurate in predicting the survival rate of patients with CRC (Fig. 5L).
Decision Curve Analysis (DCA). DCA was used to analyze the ability of the nomogram, the risk model, and the clinical traits to predict the survival of patients with CRC. The results showed that the nomogram had the strongest predictive ability, followed by the risk model, while gender had the weakest (Fig. 5M). The ROC curve analysis likewise showed that the nomogram had the highest prediction accuracy, followed by the risk model, while gender had the lowest (Fig. 5N). The nomogram had the strongest predictive ability because it combines multiple factors, including the risk model and the clinical traits.
Independent prognostic analysis. Univariate Cox independent prognostic analysis of patients' age, gender, stage, and risk score showed that age, stage, and risk score were statistically significant prognostic factors (p<0.05; Table 1). Multivariate independent prognostic analysis was then performed on the factors that were significant in the univariate analysis. The p-values of age, stage, and risk score remained below 0.05 (Table 1), indicating that the constructed risk score model could be used as an independent prognostic factor.
Comparison with other prognostic risk models. Survival curves plotted for each model showed that the curves of the high-risk and low-risk groups distinguished by our model were more widely separated (p<0.001; Fig. 6A-F), indicating that our model could better assess the level of patient risk. The AUC values at 1, 3, and 5 years were 0.768, 0.758, and 0.801, respectively, which suggested that our model had a higher prognostic accuracy (Fig. 6G-L). In the C-index analysis, a higher value indicates a better prognostic effect of the model. As shown in the figure, the C-index of our risk model was 0.744, the best among the compared models (Fig. 6M). Moreover, the RMS graph showed that our risk model had good accuracy (HR: 1.035, 95% CI: 1.025-1.045; p<0.001), which was statistically significant (Fig. 6N).
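To make the C-index comparison concrete, the sketch below computes Harrell's concordance index, the usual definition of the C-index for censored survival data. The text does not state which estimator or software was used, so this choice is an assumption; the function name concordance_index and the toy data are hypothetical. A pair of patients is comparable when the patient with the shorter follow-up had an observed event, and it is concordant when that patient also has the higher risk score.

```python
def concordance_index(times, events, scores):
    """Harrell's C-index: fraction of comparable pairs ranked correctly by risk score.

    times  : follow-up time of each patient
    events : 1 if the event (death) was observed, 0 if censored
    scores : risk score of each patient (higher means higher predicted risk)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Patient i must have an observed event strictly before patient j's time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: the third patient is censored at time 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
scores = [0.9, 0.4, 0.5, 0.1]

c_index = concordance_index(times, events, scores)
print(c_index)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why a higher C-index is read as better prognostic discrimination.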
Results of GSEA. GSEA was used to identify the functions and pathways that were active in the high-risk and low-risk groups. The functional analysis showed that cornification, epidermal cell differentiation, and epidermis development were active in the high-risk group, while in the low-risk group, spliceosomal tri-snRNP complex assembly, DNA packaging complex, protein-DNA complex, Sm-like protein family complex, and small nuclear ribonucleoprotein complex were the top 5 active functions (Fig. 7A-B). Moreover, the pathway analysis showed that complement and coagulation cascades, ECM-receptor interaction, PPAR signaling pathway, ribosome, and Wnt signaling pathway were the top 5 active pathways in the high-risk group, while there was no obviously active pathway in the low-risk group (Fig. 7C-D).
Distribution of risk scores for clinical groups. The samples were grouped by each clinical trait (Age, Gender, T, N, M, and Stage), and the distribution of sample risk scores in each group was analyzed (Fig. 7E-J). The results showed no significant difference in the distribution of risk scores after grouping by age (≤65 and >65, p>0.05) or gender (Male and Female, p>0.05). After grouping by T, there was no significant difference between T1 and T2 (p>0.05) or between T2 and T3 (p>0.05), which indicated that the risk did not differ significantly between patients with CRC invading the submucosa and those with CRC invading the muscularis propria, or between patients with CRC invading the muscularis propria and those with CRC invading the subserosa. After grouping by N, there was no significant difference between N0 and N1 (p>0.05), but the differences between each of them and N2 were significant (p<0.001), which indicated that the risk did not differ significantly when the number of lymph node metastases in patients with CRC did not exceed 3, whereas the risk increased significantly when it exceeded 3. In general, after grouping by M and Stage, the average risk score of each group gradually increased. (Fig. 7M).

Discussion
The tumor microenvironment (TME) is a component of cancer. It is a complex ecosystem that supports tumor growth and metastasis while weakening immune surveillance, thereby affecting the disease progression and prognosis of patients with tumors. The TME contains stromal cells and immune cells, which accumulate at different stages of tumor development. The immune cells mainly include macrophages, natural killer (NK) cells, dendritic cells (DC), and lymphocytes, which are essential for the early invasion of tumors. The stromal cells mainly include endothelial cells, fibroblasts, mesenchymal cells, etc. [29][30][31] TME-related molecular targets can be sought to enhance the host's anti-tumor immune response, thereby inhibiting tumor growth and metastasis to achieve tumor treatment. Therefore, by studying TME-related cells and molecules, the prognosis of patients with tumors could be more accurately assessed.
CRC is one of the tumors that respond poorly to immunotherapy 10. Classifying patients with CRC more accurately and selecting appropriate treatment according to subtype are very important. At the molecular level, it is worth considering how to find immune-related targets and construct immune-related molecular prognostic models to improve the therapeutic effect and assess the prognosis of patients more accurately. Many scholars have made considerable efforts to solve this problem and have found many effective methods. Hu and other researchers used the NMF algorithm with the rank value set to 3 to classify patients with CRC, but they did not compare the typing results obtained with rank values of 2, 3, and 5. Some scholars [32][33][34][35][36][37][38][39][40] built risk models to predict the survival of patients with CRC, but no one compared the accuracy of their respective models.
In this study, the NMF algorithm was used to perform immune-related typing of the CRC samples on the basis of the TCGA DEGs shared by TCGA and GEO. The differences in OS and PFS between subtypes were compared with the rank value set to 2, 3, and 5, and the typing method with a rank value of 5 was the most statistically significant. Analysis of immune cell infiltration between subtypes showed that the infiltration of Monocytic lineage differed statistically between the subtype groups, and that Fibroblasts differed significantly between all subtypes except c1 and c3. The overall infiltration of both gradually increased from subtype c1 to c5, which suggested that our typing method could distinguish CRC patients with different infiltration of Monocytic lineage and Fibroblasts. In our model of 14 immune-related genes (PPARGC1A, CXCL11, PCOLCE2, GABRD, TRAF5, FOXD1, NXPH4, ALPK3, KCNJ11, NPR1, F2RL2, CD36, CCNF, DUSP14), 5 genes (PPARGC1A, CXCL11, NPR1, F2RL2, CCNF) had coefficients less than 0 and were protective genes, and the remaining 9 genes (PCOLCE2, GABRD, TRAF5, FOXD1, NXPH4, ALPK3, KCNJ11, CD36, DUSP14) had coefficients greater than 0 and were risk genes. Additionally, univariate and multivariate independent prognostic analyses showed that our immune-related risk model could serve as an independent prognostic factor. DCA, ROC curve analysis, and comparison with models constructed by some predecessors showed that our risk model had a high prognostic accuracy.
In terms of function, the results of GSEA suggested that cornification, epidermal cell differentiation, epidermis development, etc., were active in the high-risk group, while functions such as spliceosomal tri-snRNP complex assembly were active in the low-risk group, which suggested that functions such as cornification might promote the progression of CRC, while spliceosomal tri-snRNP complex assembly might inhibit it. In terms of pathways, the results of GSEA showed that pathways such as complement and coagulation cascades and ECM-receptor interaction were active in the high-risk group, suggesting that the activation of these pathways might promote the development of CRC. The risk score distribution across clinical groups showed that whether the tumor had metastasized, whether the number of lymph node metastases exceeded 3, and whether the tumor penetrated the visceral peritoneum, etc., were associated with the risk score of patients with CRC, while age and gender were not. In addition, there was no significant difference in the risk scores of patients with CRC between Stage and Stage as well as between Stage and Stage . Moreover, PDCD1, SERPINE1, CCDC80, etc., were significantly positively correlated with the model (p<0.05), indicating that these genes might promote the progression of CRC and increase the risk of patients with CRC. Besides, T cells, B lineage, CD8 T cells, Endothelial cells, Monocytic lineage, etc., had a significant positive correlation with the model (p<0.05), suggesting that the infiltration of these cells might promote the development of CRC. The specific expression, mechanisms of action, and specific pathways of the 14 model genes in CRC need to be further verified and studied in vitro and in vivo. The correlation of the genes and immune cells related to the risk model with CRC also needs to be verified in vivo and in vitro.

Conclusion
In conclusion, typing with the NMF algorithm at a rank value of 5 was more suitable for patients with CRC, and the prognostic risk model of 14 immune-related genes constructed by us could assess the prognosis of patients with CRC more accurately and could be used as an independent prognostic factor.

Consent for publication
All individuals involved in this study provided consent for publication.

Data availability
The clinical and expression data of this study were all from the TCGA and GEO databases, and the patients' information in these databases was obtained with the patients' knowledge and consent. We can provide raw data as supplementary files.

Competing interests
The author(s) declare no competing interests.

Figure 1
Flow chart of the main steps of this study. This study mainly consists of three parts: the blue part, mainly for data screening; the green part, mainly for NMF typing and the typing comparison; and the red part, mainly for the construction, analysis, and comparison of prognostic risk models.

Figure 2
A-J. Comparison of typing methods with the rank value set to 2, 3, and 5, respectively: A. The cophenetic graph with the rank value set to 2:10 showed that the 5-6 and 3-4 segments have larger slopes; combining this with the results of the 9 heatmaps, the typing methods with the rank value set to 2, 3, and 5 were selected for comparison; B-D. The heatmap with the rank value set to 2, the OS survival analysis between the 2 subtypes, and the PFS survival analysis between the 2 subtypes; E-G. The heatmap with the rank value set to 3, the OS survival analysis among the 3 subtypes, and the PFS survival analysis among the 3 subtypes; H-J. The heatmap with the rank value set to 5, the OS survival analysis among the 5 subtypes, and the PFS survival analysis among the 5 subtypes. K-M. Comparison between each typing method and the predecessor's pan-cancer typing (the right side is the predecessor's typing): K. The comparison between typing with the rank set to 2 and others; L. The comparison between typing with the rank set to 3 and others; M. The comparison between typing with the rank set to 5 and others.

Figure 4
A-B. Graphs of Lasso regression and cross-validation: A. The change trajectory of each variable, where the abscissa represents the logarithm of the penalty parameter lambda and the ordinate represents the coefficient of the independent variable; B. The cross-validation graph, where the confidence intervals for different log(λ) are shown. C-J. The survival analysis and ROC curve analysis between the high-risk and low-risk groups in the train group, the test group, all TCGA samples, and the GEO group: C-F. The survival curves of the train group, the test group, all TCGA samples, and the GEO group, in sequence; G-J. The ROC curves of the train group, the test group, all TCGA samples, and the GEO group, in sequence. The results showed that after grouping according to the risk model, there were significant differences in the survival of patients between the high-risk and low-risk groups. Additionally, the AUC values of the ROC curves showed that the prediction effect of the risk score model was good.

Figure 5
A-J. Survival analysis between high-risk and low-risk groups after grouping by clinical traits. The patients were grouped according to gender, T, N, M, and Stage, and the Cox survival analysis was performed on the high-risk and low-risk groups in each group. K-L. Nomogram and calibration curve: K. The nomogram. It could be used to score each clinical trait and high or low risk to obtain the total score, and according to the total score, the survival rate of patients at 1, 3, and 5 years could be found; L. The calibration curve. The closer each calibration curve is to the gray line in the figure, the higher the accuracy of the prediction is. M-N. Decision Curve Analysis (DCA) and ROC curve analysis: M. DCA. In the figure, "All" represented the curve assuming that all samples were positive, "None" represented the curve assuming that all samples were negative, and "Risk" represented the curve of the risk model. The further away from the purple curve, the stronger the ability of survival prediction; N. ROC curve analysis. "Risk" represented the curve of the risk model. The larger the Area Under the Curve (AUC), the higher the accuracy of survival prediction.

Figure 6
A-L. Survival analysis and ROC curve analysis of each prognostic risk model: A-F. The survival analysis grouped by our risk model as well as the risk models of Huiqi, Jun, Libo, Xiaolong, and Xiao, respectively; G-L. The ROC curve analysis performed on our risk model as well as the risk models of Huiqi, Jun, Libo, Xiaolong, and Xiao, respectively. M-N. C-index analysis results of the respective risk models and the RMS graph: M. C-index analysis of the respective risk models. The higher the score, the higher the prognostic accuracy of the model; N. The RMS graph of the respective risk models. The results showed that the prognostic accuracy of our risk model was high, which was statistically significant (p<0.001).