The overall workflow of this study is summarized in the flowchart in Fig. 8.
2.1 Data collection and processing results
By searching the TCGA and GEO databases, we obtained the TCGA colon cancer cohort, comprising 39 normal and 398 tumor samples, and the GEO dataset GSE39582, comprising 566 tumor samples. The data from the two databases were merged, differential expression analysis was performed with the R package "limma" to identify colon cancer-related genes, and the results were visualized with the R package "pheatmap". A total of 156 differentially expressed genes were identified (Fig. 1), of which 69% were up-regulated and 31% were down-regulated.
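The cut-offs used to call the 156 differentially expressed genes are not stated. The following Python sketch stands in for the R workflow. It assumes the common thresholds |log2FC| > 1 and Benjamini-Hochberg adjusted P < 0.05, which are hypothetical here. The code adjusts per-gene P values and tallies the up- and down-regulated shares. It is a simplified stand-in, not limma: it assumes limma's moderated statistics have already produced each gene's log2 fold change and raw P value.

```python
from typing import Dict, List, Tuple


def bh_adjust(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify_degs(
    stats: Dict[str, Tuple[float, float]],  # gene -> (log2 fold change, raw P)
    lfc_cutoff: float = 1.0,    # hypothetical threshold
    padj_cutoff: float = 0.05,  # hypothetical threshold
) -> Dict[str, str]:
    """Label each gene 'up', 'down', or 'ns' (not significant)."""
    genes = list(stats)
    padj = bh_adjust([stats[g][1] for g in genes])
    labels = {}
    for gene, q in zip(genes, padj):
        lfc = stats[gene][0]
        if q < padj_cutoff and lfc > lfc_cutoff:
            labels[gene] = "up"
        elif q < padj_cutoff and lfc < -lfc_cutoff:
            labels[gene] = "down"
        else:
            labels[gene] = "ns"
    return labels


def direction_shares(labels: Dict[str, str]) -> Tuple[float, float]:
    """Fractions of significant genes that are up- and down-regulated."""
    sig = [v for v in labels.values() if v != "ns"]
    up = sig.count("up") / len(sig)
    return up, 1.0 - up
```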
2.2 Processing, copy number variation, differential expression, survival, and prognostic analyses of anoikis-related genes in colon cancer
Anoikis-related genes were retrieved by searching for "anoikis" in the GeneCards and Harmonizome databases. Merging the two lists yielded 516 anoikis-related genes. Intersecting these with the colon cancer-related genes obtained in Section 2.1 yielded 12 genes. Figure 2-A shows that most of these anoikis-related genes exhibit copy number variation, with copy number gains slightly more frequent than deletions. Survival analysis was performed on the combined TCGA and GEO data (Table 1). The results are presented as a boxplot in Figure 2-B, where blue denotes low-risk genes and red denotes high-risk genes. Genes with a hazard ratio (HR) > 1 were labeled high-risk, and P < 0.05 indicated a significant association with prognosis (Fig. 2-C). The survival analysis in Table 1 identified the following prognosis-related genes: NAT1, CDKN2A, IGF1, TIMP1, HOTAIR, SNAI1, UCHL1, MUC4, NOTCH3, PAK3, INHBB, and LTB4R2. Most of these are high-risk genes. We then constructed a prognostic network (Fig. 2-C). In it, purple nodes represent high-risk genes and green nodes represent low-risk genes. Node size reflects the P value, with larger nodes indicating genes more likely to be prognostic. Red edges represent positive regulatory relationships between gene pairs and blue edges represent negative ones. The dense connections show that the expression of these genes is closely interrelated. In addition, immunohistochemical staining data from the Human Protein Atlas (HPA) showed significantly stronger staining in COAD tumor tissue than in normal tissue (Fig. 2-D).
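The screening in this paragraph reduces to set operations plus a labeling rule: HR > 1 means high-risk, and P < 0.05 means prognostic. The Python sketch below assumes the univariate Cox hazard ratios and P values were already computed elsewhere, for example with R's survival package. It does not fit Cox models itself, and the gene names and numbers in any call are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def candidate_genes(genecards: Iterable[str], harmonizome: Iterable[str],
                    colon_degs: Iterable[str]) -> Set[str]:
    """Union the two anoikis gene lists, then intersect with colon cancer DEGs."""
    anoikis = set(genecards) | set(harmonizome)
    return anoikis & set(colon_degs)


def label_prognostic(cox: Dict[str, Tuple[float, float]],
                     alpha: float = 0.05) -> Dict[str, str]:
    """cox maps gene -> (hazard ratio, P value) from univariate Cox models.

    Genes with P >= alpha are dropped; the rest are labeled by their HR
    (HR > 1: high-risk, otherwise low-risk).
    """
    labels = {}
    for gene, (hr, p) in cox.items():
        if p < alpha:
            labels[gene] = "high-risk" if hr > 1 else "low-risk"
    return labels
```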
2.3 Anoikis-based subtyping of colon cancer, survival analysis of the subtypes, and GSVA, ssGSEA, and PCA results
By performing cluster analysis on the expression data of anoikis-related genes, we divided the samples into two subtypes, A and B (Fig. 3-A). Principal component analysis (PCA; Fig. 3-B) showed that subtypes A and B were clearly separated by gene expression. Survival analysis of the two subtypes revealed a statistically significant difference (P ≤ 0.05; Fig. 3-C), with subtype B having a better prognosis. As shown in Fig. 3-D, the anoikis-related genes were mostly highly expressed in subtype A and lowly expressed in subtype B. GSVA showed that the two subtypes differed mainly in the activity of immune-related pathways (Fig. 3-E). ssGSEA (Fig. 3-F) identified 18 immune cell types whose infiltration differed significantly between the two subtypes. Eleven were more abundant in subtype A: CD56bright natural killer cells, CD56dim natural killer cells, eosinophils, immature dendritic cells, macrophages, mast cells, natural killer T cells, natural killer cells, regulatory T cells, T follicular helper cells, and type 1 T helper cells. The remaining seven were more abundant in subtype B: activated B cells, activated CD4 T cells, activated CD8 T cells, neutrophils, plasmacytoid dendritic cells, type 17 T helper cells, and type 2 T helper cells.
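The text does not name the test behind the subtype survival comparison (Fig. 3-C). Survival curves of two groups are usually compared with a log-rank test. Below is a minimal Python sketch of the standard two-group log-rank statistic, assuming right-censored data coded as 1 = death and 0 = censored. The function name and any input data are hypothetical.

```python
import math
from typing import List, Tuple


def logrank_test(times: List[float], events: List[int],
                 groups: List[int]) -> Tuple[float, float]:
    """Two-group log-rank test.

    times:  follow-up time per patient
    events: 1 if death was observed, 0 if censored
    groups: 0 or 1 (e.g. subtype A = 0, subtype B = 1)
    Returns (chi-square statistic, P value from chi-square with 1 df).
    """
    data = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in data if e == 1})
    observed1 = expected1 = variance = 0.0
    for t in event_times:
        # Patients still at risk just before time t, and deaths at t.
        at_risk = [g for ti, _, g in data if ti >= t]
        n, n1 = len(at_risk), sum(at_risk)
        deaths = [g for ti, e, g in data if ti == t and e == 1]
        d, d1 = len(deaths), sum(deaths)
        observed1 += d1
        expected1 += d * n1 / n  # expected group-1 deaths under H0
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = (observed1 - expected1) ** 2 / variance
    # For 1 df, P(chi2_1 > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

The same function applies unchanged to the high- versus low-risk survival comparisons, with the risk group as `groups`.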
2.4 Construction of the prognostic model, prognostic analysis, and visualization results
(1) The LASSO regression and cross-validation results for the prognostic model are shown in Fig. 4-A and B. The Cox regression model then yielded risk scores for the training and validation groups and for the overall cohort. The model was constructed as follows (Fig. 4-C). First, the samples were divided into subtypes A and B according to the expression of anoikis-related genes, and the genes differentially expressed between the subtypes were identified. Next, survival analysis of these genes identified the prognosis-related ones, from which the prognostic model was built and used to divide patients into high- and low-risk groups. Finally, patient survival was predicted with the model. Most subtype A samples were low-risk and surviving, whereas only a small number of subtype B samples were low-risk and surviving (Fig. 4-C).
(2) Comparison of risk scores between the anoikis subtypes showed a statistically significant difference, with subtype B having higher risk scores than subtype A (Fig. 4-D).
(3) The survival curves (Fig. 4E-G) showed significant differences in survival time between the high- and low-risk groups in the training group, the validation group, and the overall group (training plus validation). This indicates that the prognostic model can distinguish high- from low-risk patients. In the ROC curves (Fig. 4H-J), the area under the curve (AUC) was largest for the training group and smallest for the validation group. The model therefore predicted survival most accurately in the training group and least accurately in the validation group. In all three groups the AUC exceeded 0.5, indicating that the model predicts survival better than chance. In the independent prognostic analysis with other clinical traits, the risk score had P < 0.05, showing that it is a prognostic factor independent of those traits (Fig. 4-N). The nomogram combines the risk score with clinical characteristics to predict the survival of individual patients. For example, Fig. 4-O shows that a 65-year-old woman with a high risk score has a poor prognosis. Her predicted 1-, 3-, and 5-year survival rates are 94.3%, 87.2%, and 77.5%, respectively. Figure 4-P shows the accuracy of the nomogram's survival predictions. In Fig. 4-Q, red denotes high risk and blue denotes low risk, so genes such as TIMP1, SNAI1, and PAK3 are high-risk genes. Decision curve analysis (DCA; Fig. 4K-M) showed that the nomogram and the risk score outperformed other clinical traits in assessing disease risk.
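The model coefficients and the high/low cutoff are not given in this section. The Python sketch below assumes the usual LASSO-Cox form of the risk score, risk = Σ β_i x_i over the model genes, with patients split at the median score. The median cutoff is a common convention, not a stated detail, and the gene names and coefficients are hypothetical. The AUC function simplifies time-dependent ROC analysis to a binary outcome at a fixed horizon, such as 1, 3, or 5 years. It drops patients censored before the horizon, whereas dedicated time-dependent ROC methods weight for censoring instead. An AUC of 0.5 corresponds to chance-level discrimination.

```python
import statistics
from typing import Dict, List


def risk_scores(expr: List[Dict[str, float]],
                coef: Dict[str, float]) -> List[float]:
    """Cox linear predictor per patient: sum of coefficient * expression."""
    return [sum(c * sample[g] for g, c in coef.items()) for sample in expr]


def split_by_median(scores: List[float]) -> List[str]:
    """'high' if a score is above the median, otherwise 'low'."""
    cutoff = statistics.median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


def auc_at_horizon(times: List[float], events: List[int],
                   scores: List[float], horizon: float) -> float:
    """Simplified AUC at a fixed horizon.

    Cases died by the horizon; controls were still followed beyond it.
    Patients censored before the horizon are dropped (a simplification).
    AUC = probability a case outscores a control, ties counted as 1/2.
    """
    cases, controls = [], []
    for t, e, s in zip(times, events, scores):
        if t <= horizon and e == 1:
            cases.append(s)
        elif t > horizon:
            controls.append(s)
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))
```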
2.5 Analysis results of immune cell infiltration and immune cell content
Using CIBERSORT, we estimated the relative proportion of each immune cell type in each sample as the measure of immune cell infiltration. We then correlated these proportions with the risk score. The risk score was significantly negatively correlated with resting memory T cells, memory B cells, plasma cells, and activated mast cells. It was significantly positively correlated with M0 macrophages, macrophages, and resting mast cells (Fig. 5-A). The correlations between the model genes and immune cells are shown in Fig. 5-B. Most of the genes were significantly correlated with immune cells (P < 0.05), with more negative than positive correlations.
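The correlation method is not named. The Python sketch below assumes Spearman's rank correlation, which is often used for CIBERSORT proportions because they are bounded and rarely normally distributed. It computes the coefficient as the Pearson correlation of average ranks, with ties sharing their mean rank. The inputs would be the per-sample risk scores and one cell type's CIBERSORT fractions. P values are not computed here.

```python
import math
from typing import List


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```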
2.6 Results of drug sensitivity analysis and tumor microenvironment analysis
(1) Drug sensitivity analysis: we first obtained drug sensitivity scores and compared the sensitivity to each drug between the high- and low-risk groups. As shown in Fig. 6-A, of the 42 drugs, 12 (28.57%) showed higher sensitivity in the high-risk group: Dasatinib, BMS-754807, AZD8055, AZD1332, AZ960, IGF1R-3801, Tozasertib, Sepantronium, NU7441, Luminespib, JQ1, and UMI-77. A further 28 drugs (66.67%) showed higher sensitivity in the low-risk group.
(2) Tumor microenvironment analysis: the tumor microenvironment of the clinical samples from the TCGA and GEO databases was first scored in three aspects: stromal score, immune score, and total score. These scores were then compared between the high- and low-risk groups. As shown in Fig. 6-B, the stromal, immune, and total scores all differed significantly between the two groups (P < 0.05), with higher scores in the high-risk group.
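This section does not name the drug-sensitivity scoring tool or the statistical test. A common setup compares estimated IC50 values between the risk groups, taking a lower IC50 as higher sensitivity. It compares the stromal, immune, and total scores the same way, with a Wilcoxon rank-sum (Mann-Whitney) test. The Python sketch below assumes exactly that. The drug names, values, and the 0.05 cutoff are hypothetical. The test uses a large-sample normal approximation without tie or continuity correction; SciPy's `scipy.stats.mannwhitneyu` offers exact and corrected variants. `percent_of_all` divides by all screened drugs, which matches the reported figures: 12/42 = 28.57% and 28/42 = 66.67%, so 40 of the 42 drugs fall into the two groups.

```python
import math
import statistics
from typing import Dict, List, Tuple


def rank_sum_test(a: List[float], b: List[float]) -> Tuple[float, float]:
    """Mann-Whitney U of a versus b, normal approximation (no tie or
    continuity correction). Returns (U, two-sided P value)."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    n1, n2 = len(a), len(b)
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, math.erfc(abs(z) / math.sqrt(2))


def more_sensitive_group(ic50: Dict[str, Tuple[List[float], List[float]]],
                         alpha: float = 0.05) -> Dict[str, str]:
    """ic50 maps drug -> (high-risk IC50 estimates, low-risk IC50 estimates).

    If the groups differ significantly, the group with the lower median
    IC50 is labeled more sensitive; otherwise the drug is 'ns'.
    """
    labels = {}
    for drug, (high, low) in ic50.items():
        _, p = rank_sum_test(high, low)
        mh, ml = statistics.median(high), statistics.median(low)
        if p < alpha and mh < ml:
            labels[drug] = "high-risk"
        elif p < alpha and ml < mh:
            labels[drug] = "low-risk"
        else:
            labels[drug] = "ns"
    return labels


def percent_of_all(labels: Dict[str, str], group: str) -> float:
    """Share of all screened drugs assigned to group, in percent."""
    return 100 * sum(v == group for v in labels.values()) / len(labels)
```

For the microenvironment comparison, `rank_sum_test(high_scores, low_scores)` would be called once each for the stromal, immune, and total scores.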
2.7 Results of single cell analysis
Using the TISCH2 database, we generated Fig. 7 from the expression of colon cancer anoikis-related genes in each immune cell type. Figure 7-A shows the cell clusters, and Fig. 7-B annotates the clusters with their corresponding immune cell types. Figure 7-C summarizes the overall expression across immune cell types as pie charts and histograms. Figure 7-D shows the expression of the significantly differentially expressed anoikis-related genes in different immune cell types. PAK3 was most highly expressed in Treg cells, HOTAIR in CD8Tex cells, MUC4 in CD4Tconv cells, and SNAI1 in CD8T cells.