1. Differential expression and co-expression analysis of DOKs between tumor and normal samples. Fig. 1A shows the expression levels of the DOK gene family in tumors. Overall, DOK1, DOK2, DOK3 and DOK4 were expressed at higher levels than DOK5, DOK6 and DOK7.
We used the Wilcoxon test to compare the expression of the seven DOK family genes between tumor samples and paracancerous samples. The DOK genes were highly expressed in most tumors. However, DOK expression was generally low in LUAD, LUSC and KICH, with the exception of DOK5, which was highly expressed in LUAD. Unlike the other DOK genes, DOK6 was significantly under-expressed in GBM. In addition, the DOK gene family was highly expressed in gastric cancer (Fig. 1B).
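The tumor-versus-normal comparison described above can be illustrated with a minimal sketch of a two-sided Wilcoxon rank-sum test. This is not the authors' pipeline: the function names and the expression values are hypothetical, the unpaired rank-sum form of the test is an assumption (the text does not say which variant was used), and the p-value relies on a normal approximation without tie correction.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def rank_sum_test(tumor, normal):
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie correction)."""
    n1, n2 = len(tumor), len(normal)
    ranks = average_ranks(list(tumor) + list(normal))
    u_tumor = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U statistic of the tumor group
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_tumor - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the normal
    return u_tumor, p_value

# Hypothetical expression values for one gene in one tumor type.
tumor_expr = [5.1, 6.3, 7.2, 6.8, 7.9]
normal_expr = [2.1, 3.0, 2.7, 3.4, 4.0]
u_stat, p = rank_sum_test(tumor_expr, normal_expr)
print(f"U = {u_stat}, p = {p:.4f}")
```

In a pan-cancer setting the same test would be repeated for each gene and tumor type, which is why a rank-based test that makes no normality assumption is a common choice.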
Co-expression analysis showed the expression associations among the DOK family genes. DOK3 and DOK2 had the strongest positive co-expression (correlation coefficient = 0.66, P < 0.001). There were also significant positive correlations between DOK6 and DOK5, DOK1 and DOK2, and DOK1 and DOK3, with correlation coefficients of 0.3, 0.37 and 0.45, respectively. In contrast, DOK2 was negatively correlated with DOK5 and DOK6 (correlation coefficients of -0.13 and -0.18, respectively), and DOK6 was negatively correlated with DOK7 (correlation coefficient = -0.19, P < 0.001) (Fig. 1C).
Almost all DOK family genes were significantly under-expressed in LUSC and KICH tumors. In CHOL tumors, all DOK genes were highly expressed, but only DOK1, DOK3, DOK4 and DOK7 showed statistically significant differences. With the exception of DOK5 and DOK7, most DOK genes were significantly overexpressed in KIRC tumors. DOK6 stands out within the family: its expression was the lowest among the DOK genes, and it was the only one significantly under-expressed in GBM (Fig. S1-S2).
2. We performed Kaplan-Meier analysis of DOK1, DOK2, DOK3, DOK4, DOK5, DOK6 and DOK7 expression and overall survival time in 33 TCGA tumor types (Fig. 2). Patients were divided into a high-expression group and a low-expression group at the median expression value, and survival time was compared between the two groups. High DOK1 expression was significantly associated with poor prognosis in KIRC patients (P = 0.001). In LUAD patients, DOK1, DOK2 and DOK6 were expressed at low levels relative to adjacent or normal tissues, which was consistent with the survival curves (P = 0.032, P = 0.012 and P = 0.049, respectively). High expression of DOK2 and DOK3 also indicated poorer prognosis in GBM patients (P = 0.014 and P = 0.035, respectively). DOK3 was highly expressed in KIRC patients, and higher expression was associated with shorter survival time (P = 0.01); the same was true of DOK5 in STAD patients (P = 0.044).
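The median split and survival estimate described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function names and patient data are hypothetical, patients whose expression equals the median are assigned to the low group (the text does not specify tie handling), and the significance test between the two curves is omitted.

```python
from statistics import median

def split_by_median(expression, times, events):
    """Split patients into high/low groups at the median expression value."""
    cutoff = median(expression)
    groups = {"high": ([], []), "low": ([], [])}
    for x, t, e in zip(expression, times, events):
        name = "high" if x > cutoff else "low"  # ties go to "low" (assumption)
        groups[name][0].append(t)
        groups[name][1].append(e)
    return groups

def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with a death.

    times  -- follow-up time per patient
    events -- 1 if the patient died at that time, 0 if censored
    """
    survival, curve = 1.0, []
    for t in sorted(set(times)):
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
    return curve

# Hypothetical cohort: expression, follow-up in months, death indicator.
expression = [1.0, 4.0, 2.0, 3.0, 5.5, 0.5]
months = [5, 8, 12, 20, 6, 30]
died = [1, 0, 1, 1, 1, 0]

groups = split_by_median(expression, months, died)
for name, (t, e) in groups.items():
    print(name, kaplan_meier(t, e))
```

Censored patients still count in the at-risk denominator until their last follow-up, which is what distinguishes the Kaplan-Meier estimate from a simple proportion of survivors.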
3. We drew forest plots to show the association between DOK family gene expression and prognosis in the 33 TCGA tumor types (Fig. 3). Cox proportional hazards regression was used to assess the prognostic role of DOK1, DOK2, DOK3, DOK4, DOK5, DOK6 and DOK7, with a hazard ratio (HR) > 1 taken to indicate a risk factor. Fig. 3 shows that DOK1, DOK2 and DOK3 were significant in most cancers. In STAD patients, DOK5, DOK6 and DOK7 were all associated with prognosis (P = 0.04, P = 0.01 and P = 0.01, respectively).
4. We then focused on the significance of the DOK gene family in gastric cancer and examined whether its expression is associated with pathological stage. DOK2, DOK3 and DOK5 expression differed significantly across the pathological stages of gastric cancer (Fig. 4). Interestingly, expression of these genes was lowest in stages I and IV and relatively high in stages II and III. These differences in expression may help predict the clinical course of tumors.
5. Immune subtype analysis. More than 10,000 tumor samples from 33 TCGA cancer types have been classified into six immune subtypes: wound healing (C1), IFN-γ dominant (C2), inflammatory (C3), lymphocyte depleted (C4), immunologically quiet (C5) and TGF-β dominant (C6). We downloaded immune-related data from TCGA and used the Kruskal-Wallis test to analyze the mRNA expression of the seven DOK family genes across these subtypes. As shown in Fig. 5A, all DOK genes were significantly associated with the C1-C6 immune subtypes (P < 0.001). DOK4 had the highest overall expression across C1-C6, while DOK6 had the lowest. Interestingly, DOK6 expression was unusually high in C5, as was DOK5 expression, whereas DOK2 and DOK7 expression was lowest in C5. We then analyzed the association between DOK family genes and immune subtypes in STAD patients. As shown in Fig. 5B, DOK2-DOK6 were significantly associated with immune subtype in gastric cancer (P < 0.001), whereas DOK7 showed no statistically significant association with the six immune subtypes. The average expression of DOK4 was the highest among the six immune subtypes. As in the pan-cancer analysis, expression in the C6 subtype was relatively high for all DOK genes.
6. The stemness index based on DNA methylation (DNAss) and the stemness index based on mRNA expression (RNAss) were used to assess the correlation between DOK genes and tumor stem cells. To investigate the association between pan-cancer stemness features and DOK gene expression, we calculated the stemness indices of TCGA tumor samples with a one-class logistic regression (OCLR) algorithm and performed Spearman correlation analysis between gene expression and stemness scores.
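The Spearman step can be illustrated with a short sketch that ranks both variables and then computes a Pearson correlation on the ranks. The stemness scores are taken as given here (the OCLR calculation is not reproduced), and the function names and sample values are hypothetical.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical per-sample gene expression and RNAss values for one tumor type.
gene_expr = [1.2, 2.5, 3.1, 4.8, 5.0]
rnass = [0.9, 0.7, 0.6, 0.3, 0.2]
print(round(spearman(gene_expr, rnass), 3))
```

Because only the ordering of samples matters, the coefficient is insensitive to the scale of the expression values, which suits comparisons across tumor types with different expression ranges.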
The correlation between DOK family genes and DNAss was generally weak. DOK6 showed a strong negative correlation in OV and TGCT, with correlation coefficients of -0.68 and -0.78, respectively. DOK4 also showed a significant negative correlation in TGCT (correlation coefficient = -0.73), suggesting that tumors with high expression of these genes contain fewer tumor stem cells (Fig. 6A). In the RNAss analysis, DOK family genes were negatively correlated with stemness in most tumors, especially DOK5 and DOK6. The exceptions were LGG and THYM: DOK4 and DOK6 were positively correlated with RNAss in LGG (correlation coefficients of 0.59), and DOK1 and DOK2 were positively correlated with RNAss in THYM (correlation coefficients of 0.5) (Fig. 6B).
The occurrence, growth and metastasis of tumors are closely related to the surrounding environment. In addition to tumor cells, solid tumor tissues contain normal cells such as stromal cells and immune cells. Tumor cells can alter this environment through autocrine or paracrine signaling, and the body can in turn limit the occurrence and development of tumors through changes in metabolism, secretion, immunity, structure and other functions. Together these constitute the tumor microenvironment (TME). The proportion of stromal and immune cells in a solid tumor reflects tumor purity, which can help guide subsequent treatment.
We downloaded the relevant data from the TCGA database and used ESTIMATE (Estimation of STromal and Immune cells in MAlignant Tumor tissues using Expression data) to calculate the stromal scores (Fig. S3) and immune scores (Fig. S4) of the tumors. Spearman correlation analysis was then used to describe the relationship between DOK family gene expression and tumor purity. As shown in Fig. 6C, DOK2 and DOK3 were strongly correlated with the stromal and immune scores of almost all tumors, which means that tumor purity is lower when DOK2 and DOK3 are highly expressed. In LGG, DOK4 and DOK6 expression was positively correlated with tumor purity: the higher the gene expression, the lower the proportion of stromal and immune cells in the tumor (correlation coefficients of -0.5 and -0.6, respectively).
7. We used scatter plots to analyze the correlation of DOK gene expression with DNAss, RNAss and TME scores in STAD tumors. With the exception of DOK4 and DOK7, the DOK family genes were negatively correlated with DNAss and RNAss. DOK2 and DOK3 were highly correlated with the immune score (correlation coefficients of 0.91 and 0.82, respectively; P < 0.001). DOK5 and DOK6 were more strongly correlated with the stromal score (correlation coefficients of 0.79 and 0.76, respectively; P < 0.001) (Fig. 7).
8. Drug sensitivity analysis in pan-cancer. We downloaded and processed the transcript expression of DOK family genes in the NCI-60 cancer cell lines and the activity of 263 antitumor drugs from the CellMiner database, and used Pearson correlation analysis to assess the potential influence of DOK family genes on drug response. DOK2 expression was associated with sensitivity to a variety of anticancer drugs. DOK2 was significantly positively correlated with sensitivity to nelarabine and chelerythrine (correlation coefficients of 0.725 and 0.706, respectively; P < 0.001). DOK4 was negatively correlated with okadaic acid (correlation coefficient = -0.488, P < 0.001), and DOK6 was positively correlated with estramustine (correlation coefficient = 0.547, P < 0.001) (Fig. 8). This analysis of gene expression and drug sensitivity may provide new ideas for clinical treatment and subsequent basic research.
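The gene-drug screen can be sketched as a Pearson correlation between one gene's expression and each drug's activity across the same cell lines, followed by sorting the drugs by coefficient. The drug names, values and function names below are hypothetical placeholders, not CellMiner data or a CellMiner API.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def rank_drugs(gene_expr, drug_activity):
    """Correlate one gene with every drug and sort by coefficient, highest first.

    gene_expr     -- expression of the gene in each cell line
    drug_activity -- {drug name: activity values in the same cell-line order}
    """
    results = [(drug, pearson(gene_expr, activity))
               for drug, activity in drug_activity.items()]
    return sorted(results, key=lambda item: item[1], reverse=True)

# Hypothetical expression of one gene in four cell lines and three drugs.
expr = [1.0, 2.0, 3.0, 4.0]
activity = {
    "drug_a": [2.0, 4.0, 6.0, 8.0],
    "drug_b": [8.0, 6.0, 4.0, 2.0],
    "drug_c": [1.0, 3.0, 2.0, 4.0],
}
for drug, r in rank_drugs(expr, activity):
    print(f"{drug}: r = {r:.3f}")
```

A positive coefficient means that cell lines with higher expression tend to show higher drug activity, and a negative coefficient the reverse.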
9. Tumor mutation burden (TMB). With the rapid development of immunotherapy, measuring the tumor mutation burden has become increasingly important. TMB is the number of somatic mutations in the tumor genome after germline (inherited) variants have been excluded, so that only mutations specific to tumor cells are counted. The higher the TMB, the more neoantigens the tumor produces, the more easily the tumor cells are recognized by the body's immune cells, and the more effective immunotherapy is likely to be. In the radar chart analysis, DOK4 showed its highest correlation with TMB in STAD (correlation coefficient = 0.28, P < 0.001; Fig. 9A). DOK6, in contrast, showed a negative correlation (correlation coefficient = -0.42, P < 0.001; Fig. 9B). The correlation of DOK family genes with TMB may provide a reference for tumor immunotherapy.
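The TMB definition above amounts to removing germline variants from the tumor calls and counting what remains. The sketch below illustrates that idea only: the variant tuples and function name are hypothetical, and both the per-megabase normalization and the default of 38 Mb of covered sequence are assumptions that do not come from this study.

```python
def tumor_mutation_burden(tumor_variants, germline_variants, covered_mb=38.0):
    """Count somatic mutations and normalize by the sequenced region size.

    tumor_variants    -- variants called in the tumor, as (chrom, pos, ref, alt)
    germline_variants -- variants also present in the matched normal sample
    covered_mb        -- size of the covered region in megabases (assumed value)
    """
    somatic = set(tumor_variants) - set(germline_variants)  # drop inherited variants
    return len(somatic), len(somatic) / covered_mb

# Hypothetical calls: one of the four tumor variants is inherited.
tumor_calls = [
    ("chr1", 100, "A", "T"),
    ("chr2", 200, "G", "C"),
    ("chr3", 300, "C", "T"),
    ("chr7", 50, "T", "G"),
]
germline_calls = [("chr2", 200, "G", "C")]

count, per_mb = tumor_mutation_burden(tumor_calls, germline_calls)
print(count, round(per_mb, 3))
```

The resulting per-sample value is what would then be correlated with gene expression in each tumor type.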
Detection of microsatellite instability (MSI). Microsatellites are short tandem repeats found throughout the human genome. In tumor cells, microsatellites can change in length relative to normal cells because repeat units are inserted or deleted; this phenomenon, called microsatellite instability, contributes to the occurrence and development of tumors. Clinically, microsatellite instability is closely associated with colorectal cancer and is present in about 15% of cases, so we analyzed the correlation between DOK family genes and MSI and show the radar charts for the genes most strongly related to MSI in COAD. DOK2 was positively correlated with MSI (correlation coefficient = 0.22, P < 0.001), while DOK4 was the gene most significantly negatively correlated with MSI (P < 0.001) (Fig. 9C-D). Further experiments are needed to determine whether patients with COAD can benefit from the expression of these two genes. Additional radar charts are shown in Fig. S5-S6.
10. KEGG pathway analysis showed that DOK family genes were enriched in multiple pathways related to STAD. In STAD, DOK3 and DOK6 were enriched in the autophagy pathway (Fig. 10).