Screening for differential prognostic genes
A total of 1,549 differential genes in HCT116 were obtained from the RBP database provided in the literature[22]. In the TCGA-COADREAD dataset, 2,929 significantly differentially expressed genes were identified using the thresholds |log2 fold change| (|logFC|) > 1 and p.adj < 0.05. Prognostic genes related to colorectal cancer were identified by univariate Cox regression on the same dataset; setting HR > 1 and p.adj < 0.05 yielded 631 prognostic genes. Intersecting the RBP library, the TCGA-COADREAD differential genes, and the prognostic genes in a Venn diagram yielded three shared genes (Fig. 1a): TSFM, EIF4A1, and ARL6IP4. These differential genes were visualized in a volcano plot of the TCGA-COADREAD data (Fig. 1b). Using TCGA-COADREAD, expression of the three genes was then examined across pathologic T, N, and M stages. TSFM, EIF4A1, and ARL6IP4 showed higher expression in pathologic stage T3 and were predominantly expressed in pathologic stages N0 and M0 (Fig. 1c). We focused on the eukaryotic initiation factor EIF4A1, which is associated with translation initiation, as the key molecule for the remainder of this study.
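The screening logic described above (threshold filtering followed by a three-way intersection) can be sketched in a few lines. This is only an illustration in Python, not the authors' pipeline: the table layout, the helper names, every numeric value, and the genes GENE_A and GENE_B are invented for the example. Only the thresholds and the three reported gene names come from the text.

```python
# Minimal sketch of the three-way screening, using invented toy values.
# Only the thresholds (|logFC| > 1, p.adj < 0.05, HR > 1) and the names
# TSFM, EIF4A1, ARL6IP4 come from the text; everything else is hypothetical.

def significant_degs(rows, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Genes passing |logFC| > cutoff and adjusted P < cutoff."""
    return {r["gene"] for r in rows
            if abs(r["logFC"]) > lfc_cutoff and r["padj"] < padj_cutoff}

def risk_genes(rows, padj_cutoff=0.05):
    """Genes with univariate Cox HR > 1 and adjusted P < cutoff."""
    return {r["gene"] for r in rows
            if r["HR"] > 1 and r["padj"] < padj_cutoff}

rbp_genes = {"TSFM", "EIF4A1", "ARL6IP4", "GENE_A"}

deg_table = [
    {"gene": "TSFM", "logFC": 1.3, "padj": 0.001},
    {"gene": "EIF4A1", "logFC": 1.6, "padj": 0.0004},
    {"gene": "ARL6IP4", "logFC": 1.1, "padj": 0.01},
    {"gene": "GENE_A", "logFC": 0.4, "padj": 0.001},  # fails |logFC| > 1
    {"gene": "GENE_B", "logFC": -2.0, "padj": 0.2},   # fails p.adj < 0.05
]

cox_table = [
    {"gene": "TSFM", "HR": 1.4, "padj": 0.03},
    {"gene": "EIF4A1", "HR": 1.5, "padj": 0.02},
    {"gene": "ARL6IP4", "HR": 1.3, "padj": 0.04},
    {"gene": "GENE_A", "HR": 0.7, "padj": 0.01},      # fails HR > 1
]

# Venn-style intersection of the three gene lists.
candidates = sorted(rbp_genes
                    & significant_degs(deg_table)
                    & risk_genes(cox_table))
print(candidates)
```

With real data, each table would hold one row per gene from the corresponding analysis, and the intersection would be taken in exactly the same way.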
Expression landscape of EIF4A1
EIF4A1 mRNA and protein are widely expressed across organs and tissues (Fig. 2a). EIF4A1 mRNA is predominantly found in the proximal digestive tract, kidney, urinary bladder, connective and soft tissue, and bone marrow and lymphoid tissues. According to the HPA database, high protein levels were noted primarily in the stomach, colon, rectum, placenta, tonsil, and bone marrow (Fig. 2b). IHC-stained sections from the HPA database comparing normal colorectal tissue with colorectal cancer (Fig. 2d-g) showed that normal colon tissue stained more darkly, indicating higher expression, than rectal tissue. Additionally, colorectal cancer tissues showed higher expression than normal tissues. Analysis of the UALCAN database indicated that EIF4A1 protein expression is significantly higher in cancerous tissues (Fig. 2h). HPA protein expression and localization data for CACO-2 cells show that EIF4A1 is primarily localized in the cytoplasm (Fig. 2i).
Analysis using the TIMER 2.0 database (Fig. 3a) showed that EIF4A1 expression was higher than in normal tissues across many human cancers, including CHOL, COAD, ESCA, GBM, HNSC, KIRC, KIRP, LIHC, LUAD, LUSC, PRAD, STAD, and UCEC (P < 0.001 for each), BRCA (P < 0.01), and READ (P < 0.05). Conversely, its expression was lower in KICH (P < 0.001) and PCPG (P < 0.05) than in normal tissues. Pairwise tissue analysis based on TCGA database 11123 (Fig. 3b) showed that EIF4A1 is highly expressed in COAD (P < 0.001), CHOL (P < 0.05), KIRC (P < 0.001), and READ (P < 0.01), and shows low expression in BRCA (P < 0.001) and THCA (P < 0.05). For colorectal cancer, RNA-seq data processed with the STAR pipeline for the TCGA-COAD and TCGA-READ projects were downloaded from the TCGA database ( https://portal.gdc.cancer.gov ), organized, and extracted in TPM format. A difference analysis of the TCGA-COADREAD data (698 samples, 51 paracancerous cases) showed that EIF4A1 was highly expressed in COADREAD (P < 0.001; Fig. 3c). A paired-samples t-test on the matched cancer and paracancerous samples of the same dataset likewise showed significantly higher EIF4A1 expression in cancer tissues than in paracancerous tissues (P < 0.001; Fig. 3c). The TCGA-COAD and TCGA-READ datasets were downloaded and organized in the same way, the data were log2(value + 1) transformed, and the cancer versus paracancerous difference analysis and the paired analysis were performed separately for each dataset.
Both the difference analysis and the paired analysis showed that EIF4A1 expression in COAD and READ tissues was significantly higher than in the paracancerous tissues. The difference reached P < 0.001 in COAD but P < 0.01 in READ, which may suggest that the differential expression of EIF4A1 is greater in colon than in rectal tissue (Fig. 3d-e).
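As an illustration of the paired comparison, the sketch below applies the log2(value + 1) transform and computes the paired-samples t statistic for matched tumor and paracancerous values. It is a minimal Python sketch under stated assumptions: the TPM values are invented, the function names are hypothetical, and the P value (which requires the t distribution with n - 1 degrees of freedom) is left to a statistics library.

```python
import math
from statistics import mean, stdev

def log2p1(values):
    """log2(value + 1) transform, as applied to the expression data."""
    return [math.log2(v + 1) for v in values]

def paired_t(tumor, normal):
    """Paired-samples t statistic and degrees of freedom.

    Each position i must hold the tumor and paracancerous value
    from the same patient.
    """
    if len(tumor) != len(normal):
        raise ValueError("paired samples must have equal length")
    diffs = [t - n for t, n in zip(tumor, normal)]
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t_stat, n - 1

# Invented TPM values for four hypothetical matched pairs.
tumor_tpm = [7, 15, 31, 63]
normal_tpm = [3, 7, 7, 15]

t_stat, df = paired_t(log2p1(tumor_tpm), log2p1(normal_tpm))
print(round(t_stat, 3), df)
```

A positive t statistic here means higher transformed expression in tumor than in the matched paracancerous sample.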
Clinical diagnosis and prognostic value evaluation of EIF4A1
We performed Cox proportional hazards regression using the 'survival' package to fit the survival models, and visualized the results with the 'survminer' and 'ggplot2' packages. Survival analyses covered the COADREAD, COAD, and READ datasets and evaluated three endpoints: overall survival (OS), disease-specific survival (DSS), and progression-free interval (PFI). For OS, low EIF4A1 expression was associated with a significantly better prognosis in COADREAD (P = 0.045) and COAD (P = 0.007) (Fig. 4b-c), whereas in READ high EIF4A1 expression was associated with a better prognosis (P = 0.041) (Fig. 4d). For DSS, high EIF4A1 expression was associated with a poor prognosis in COADREAD (P = 0.024) and COAD (P = 0.012) (Fig. 4f-g), but the difference was not statistically significant in READ (P = 0.778) (Fig. 4h). For PFI (Fig. 4j-l), high EIF4A1 expression was associated with poorer outcomes in COADREAD (P < 0.001) and COAD (P = 0.001), whereas the difference in READ was not statistically significant (P = 0.256). To summarize the prognostic significance of EIF4A1, we used 'ggplot2' to visualize the Kaplan-Meier estimates for OS, DSS, and PFI alongside the Cox regression results, including hazard ratios (HR) and P values (Fig. 4a, e, i).
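The Kaplan-Meier estimates mentioned above follow the product-limit rule: at each event time, the running survival probability is multiplied by one minus the fraction of at-risk patients who had the event. The sketch below is a plain Python illustration of that rule, not the R 'survival' code used in the study; the function name and the follow-up data are invented.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time for each patient
    events : 1 if the event (e.g. death) was observed, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set.
        at_risk -= removed
    return curve

# Invented follow-up data for five hypothetical patients.
curve = kaplan_meier(times=[1, 2, 2, 3, 4], events=[1, 1, 0, 1, 0])
for t, s in curve:
    print(t, round(s, 3))
```

Running the function separately on the high and low EIF4A1 expression groups would give the two curves that a log-rank test or a Cox model then compares.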
Using clinical data from the TCGA-COADREAD, TCGA-COAD, and TCGA-READ datasets, ROC analysis was conducted with the 'pROC' package, and the ROC curves were visualized using 'ggplot2'. The area under the curve (AUC) for EIF4A1 was 0.827 in COADREAD, 0.836 in COAD, and 0.797 in READ (Fig. 4m-o). These results suggest that EIF4A1 has potential diagnostic value in COADREAD, COAD, and READ. The AUC was higher in COAD than in READ, suggesting that EIF4A1 has greater diagnostic potential in colon cancer.
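For readers less familiar with the AUC, it equals the probability that a randomly chosen tumor sample has a higher expression value than a randomly chosen normal sample, with ties counted as one half. The sketch below computes it that way in plain Python. It assumes that higher EIF4A1 expression indicates tumor, the expression values are invented, and it is not the 'pROC' implementation.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties contribute 0.5. Assumes higher scores indicate the positive class.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Invented log-scale expression values for hypothetical samples.
tumor_expr = [6.1, 5.4, 7.0, 4.8]
normal_expr = [4.0, 5.0, 4.9]

auc = roc_auc(tumor_expr, normal_expr)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of tumor from normal samples.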
To elucidate the impact of EIF4A1 expression and clinicopathological factors on survival, we compared colon and rectal cancers using both univariate and multivariate Cox regression analyses. In the univariate analysis of colon cancer, pathological T stage (especially T4), pathological N stage (especially N1), advanced pathological stage, age, and EIF4A1 expression were significantly associated with mortality risk. In rectal cancer, higher mortality risk was associated with T4 stage, N2 stage, and older age (Tables 1 and 2; P < 0.05). Multivariate analysis in colon cancer indicated that patients at T4 stage, with advanced pathological stage, and of older age retained an increased mortality risk after adjustment for confounding variables. Similarly, in rectal cancer, patients at N2 stage and those of older age had a significantly elevated mortality risk after adjustment for other variables. Pathological stage emerged as a critical prognostic indicator in colon cancer, consistently associated with increased mortality risk in univariate analyses. In rectal cancer, some values were missing and the data were insufficient once pathological stage was included in the multivariate analysis, so this variable was excluded. Age over 65 years was consistently associated with increased mortality risk in both univariate and multivariate analyses. One notable difference between the two cancers was that the T3 stage was significant in the univariate analysis of colon cancer but not in the multivariate analysis. 
In contrast, the T3 stage in rectal cancer demonstrated no significant difference in either analysis. For colon cancer, the hazard ratio for the N1 stage decreased and lost significance in the multivariate analysis, whereas the N2 stage retained its significance. In rectal cancer, both N1 and N2 stages were significant predictors in univariate analysis; however, only the N2 stage maintained its significance in the multivariate analysis. The M1 stage was a significant factor in both univariate and multivariate analyses for colon cancer, but it was significant only in the univariate analysis for rectal cancer. Elevated EIF4A1 expression was linked with an increased risk of mortality in the univariate analysis of colon cancer, though this association did not hold in the multivariate analysis. Conversely, in rectal cancer, high EIF4A1 expression correlated with a reduced mortality risk in univariate analysis, a trend that did not reach significance in the multivariate analysis (Tables 1 and 2).
Correlation analysis of EIF4A1 transcript expression and clinical parameters
We analyzed the correlation of EIF4A1 mRNA and protein expression with clinical parameters using the UALCAN database; the results are as follows.
Staging: The transcript level of EIF4A1 was generally higher in COAD than in normal tissues, and its mRNA expression was significantly increased in all cancer stages (P < 1e-12 to 1.27e-08). Among COAD stages, the transcript level of EIF4A1 differed significantly between stage I and stage III (P < 4.858400e-2), stage I and stage IV (P < 3.003000e-03), and stage II and stage IV (P < 1.476810e-03) (Supplementary Fig. 1a). Compared with COAD, the expression of EIF4A1 in READ was higher in stages I and IV, while the expression difference in stages II and III was not statistically significant. In addition, EIF4A1 expression in READ differed significantly between stages I and II (P < 1.479030e-03) and between stages I and III (P < 2.148000e-02) (Supplementary Fig. 1b). Overall, EIF4A1 mRNA expression was concentrated mainly in stages I and II in COAD and in stage I in READ (Supplementary Fig. 1a, b).
Age analysis in COAD demonstrated that EIF4A1 expression was significantly elevated across all age groups compared to the normal group, particularly notable in the 41–60 (P < 5.55e-16), 61–80 (P < 1e-12), and 81–100 year olds (P < 8.89e-11) (Supplementary Fig. 1c). Conversely, in READ, EIF4A1 expression was significantly elevated in the 21–40 (P < 0.000956) and 41–60 year olds (P < 0.008163) compared to the normal group. Notably, expression in the 41–60 and 61–80 age groups was significantly higher than in the 81–100 year olds, all with statistical significance. However, no significant differences were observed in EIF4A1 expression between the 61–80 and 81–100 year old groups compared to the normal group (Supplementary Fig. 1d). Compared to READ (Supplementary Fig. 1c, d), elevated EIF4A1 expression in COAD predominantly occurred in the 81–100 age group, whereas in READ, it was primarily observed in the 21–40 age range.
In COAD, EIF4A1 expression was significantly elevated in both the adenocarcinoma and mucinous adenocarcinoma subtypes compared to normal tissues. Furthermore, EIF4A1 expression in mucinous adenocarcinoma was significantly higher than in adenocarcinoma (P = 0.04076), indicative of potential molecular differences between these subtypes (Supplementary Fig. 1e). Conversely, in READ, EIF4A1 expression in mucinous adenocarcinoma exceeded that in normal tissues, with statistical significance (P = 0.009957) (Supplementary Fig. 1f). In both COAD and READ, EIF4A1 expression was notably high in mucinous adenocarcinoma subtypes (Supplementary Fig. 1e, f).
Lymph node metastasis analysis revealed that EIF4A1 expression in each nodal state (N0, N1, N2) in COAD was markedly higher than in normal tissues, with statistically significant differences (P values ranging from 1.62e-12 to 1.00e-9) (Supplementary Fig. 1g). No statistically significant correlation was found between EIF4A1 expression and lymph node metastasis state in READ (Supplementary Fig. 1h). In COAD, the highest levels of EIF4A1 expression were predominantly observed in the N0 stage, whereas in READ, elevated EIF4A1 expression was primarily noted in the N1 stage (Supplementary Fig. 1g, h).
Gender analysis in COAD indicated that EIF4A1 expression was significantly elevated in both male and female patients compared to normal tissues, with P-values of < 1e-12 and 1.62e-12, respectively. However, the differences between genders were not statistically significant (P = 0.22666) (Supplementary Fig. 1i). In READ, EIF4A1 expression in male patients was notably higher than in normal tissues (P = 0.01266), whereas in female patients, this elevation did not reach statistical significance (P = 0.09802) (Supplementary Fig. 1j). In READ, the incidence of cancer among male patients may be higher than in female patients.
Regarding weight in COAD, EIF4A1 expression was significantly up-regulated across various weight categories: Normal Weight (P = 3.64e-12), Extreme Weight (P = 1.62e-12), Obese (P = 3.56e-10), and Extremely Obese (P = 0.007) (Supplementary Fig. 1k). In READ, EIF4A1 expression was up-regulated in Normal Weight, Extreme Weight, and Obese categories, with only Normal Weight reaching statistical significance (P = 0.00918). Additionally, EIF4A1 expression in Normal Weight was notably higher than in Extreme Weight (P < 0.032) (Supplementary Fig. 1l). Notably, in both COAD and READ, the highest EIF4A1 expression was observed in the Normal Weight category, with a general trend of decreasing expression as body weight increased (Supplementary Fig. 1k, l).
Correlation analysis of EIF4A1 protein expression and clinical parameters
Because the protein database contains clinical data only for colon cancer, our analysis was confined to the relationships between EIF4A1 protein expression and clinical parameters in colon cancer, including cancer stage, patient age, tumor histology, chromatin modifier status, patient gender, and weight.
In colon cancer, EIF4A1 protein expression varied markedly across pathological stages. Compared to the normal cohort, EIF4A1 expression was substantially elevated in all cancer stages, peaking in stages II (P = 8.77e-12) and III (P = 1.99e-12), and these differences were statistically significant (Supplementary Fig. 2a). However, there was no significant difference in EIF4A1 expression between cancer stages. This suggests that once colon cancer develops, EIF4A1 expression remains consistently high across advanced stages, indicating minimal correlation with cancer progression.
Regarding patient age, EIF4A1 expression was significantly up-regulated across all age groups compared to the normal cohort, with the most pronounced increase observed in the 61–80 year old group (P = 9.92e-16) (Supplementary Fig. 2b). These findings indicate that variations in EIF4A1 protein expression are potentially age-related during cancer progression, particularly among middle-aged and elderly individuals.
Tumor histology: Analysis revealed that EIF4A1 protein expression was significantly elevated in both mucinous and non-mucinous tumors compared to normal samples (Supplementary Fig. 2c). The increase was more pronounced in non-mucinous tumors (P = 2.19e-21), although the difference between mucinous and non-mucinous tumors was not statistically significant (P = 0.527). This suggests that elevated EIF4A1 expression is a characteristic common among cancer patients, particularly in non-mucinous tumors.
Chromatin modifier status: EIF4A1 protein expression was significantly up-regulated in samples with modified chromatin compared to normal samples (P = 2.82e-22) (Supplementary Fig. 2d). This suggests a significant correlation between elevated EIF4A1 expression and chromatin modification status, which may facilitate the up-regulation of EIF4A1 protein expression.
Patient Gender: Correlation analysis revealed that EIF4A1 protein expression was significantly higher in both males (P = 6.50e-13) and females (P = 1.75e-15) compared to the normal group (Supplementary Fig. 2e). However, there was no significant difference in EIF4A1 expression between males and females (P = 0.36).
Patient weight: Compared to the normal group, EIF4A1 protein expression was significantly elevated in cancer patients across all weight categories: Normal Weight (P = 2.55e-11), Overweight (P = 3.32e-07), Obese (P = 4.30e-09), and Extremely Obese (P = 0.027) (Supplementary Fig. 2f). EIF4A1 protein expression in the Normal Weight category was markedly higher than in the other weight categories (Supplementary Fig. 2f), and the difference from the Overweight category was statistically significant (P = 0.0148). Body weight therefore appears to be a contributing factor to variations in EIF4A1 protein expression levels.
EIF4A1 interaction analysis
The EIF4A1 interaction network was constructed utilizing the GeneMANIA database (Fig. 5a). Eleven genes closely associated with EIF4A1 were identified via the STRING database, and the PPI network was constructed at a specified threshold, subsequently visualized with Cytoscape software (Fig. 5b). Key genes in this network include EIF4G1, EIF4A1, EIF4E, EIF4B, EIF3B, EIF1, EIF4G2, EIF4H, EIF4G3, PABPC1, and PDCD4. We intersected the molecular datasets from GeneMANIA and STRING to identify the most relevant gene (Fig. 5c). Intersected interaction molecules were visualized and analyzed for expression in COADREAD using the ComplexHeatmap package (Fig. 5d). Intriguingly, PDCD4 exhibited significantly elevated expression in paracancerous tissues.
Functional enrichment analysis of EIF4A1
The DESeq2 package was used for differential expression analysis of the raw counts matrices of the TCGA-COAD and TCGA-READ datasets. Samples in COAD and READ were divided into high and low EIF4A1 expression groups for the comparison. Results were visualized using ggplot2 (version 3.3.6), with thresholds of |logFC| > 1 and P < 0.05. In COAD, 1038 genes were up-regulated and 13 were down-regulated; in READ, 117 genes were up-regulated and 150 were down-regulated. These findings were visualized in volcano plots (Fig. 6a-b). The up-regulated genes from COAD and READ were sorted by logFC, and the top 100 were selected for GO/KEGG enrichment analysis. GO analysis of the EIF4A1-associated up-regulated genes in COAD revealed associations mainly with DNA, protein, and nucleosome processes (Fig. 6c). KEGG analysis identified links primarily to neutrophil extracellular trap (NET) formation, atherosclerosis, and systemic lupus erythematosus. GO analysis of the differential gene set in READ indicated relations to intermediary metabolism, receptor ligands, and hormonal activity (Fig. 6d). KEGG enrichment analysis highlighted a primary association with the MAPK signaling pathway.
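The thresholding step that produces the up-regulated and down-regulated counts, followed by the logFC-ranked selection for enrichment, can be sketched as follows. This is an illustrative Python sketch rather than the DESeq2 workflow itself: gene names G1 to G5 and all values are invented, the function name is hypothetical, and the cut-offs are read here as |logFC| > 1 and P < 0.05.

```python
from collections import Counter

def classify(log_fc, p_value, lfc_cut=1.0, p_cut=0.05):
    """Label a gene as 'up', 'down', or 'ns' (not significant)."""
    if p_value < p_cut and log_fc > lfc_cut:
        return "up"
    if p_value < p_cut and log_fc < -lfc_cut:
        return "down"
    return "ns"

# Invented (gene, logFC, P value) rows for the high vs. low comparison.
toy_results = [
    ("G1", 2.3, 0.001),
    ("G2", -1.4, 0.02),
    ("G3", 0.4, 0.001),   # significant but below the fold-change cut-off
    ("G4", 1.8, 0.2),     # large fold change but not significant
    ("G5", 1.2, 0.04),
]

labels = {gene: classify(lfc, p) for gene, lfc, p in toy_results}
counts = Counter(labels.values())

# Rank up-regulated genes by logFC; the study kept the top 100.
top_n = 100
up_ranked = [gene for gene, lfc, _ in
             sorted(toy_results, key=lambda r: r[1], reverse=True)
             if labels[gene] == "up"][:top_n]

print(counts["up"], counts["down"], up_ranked)
```

The same labels are what a volcano plot colors: logFC on the horizontal axis and -log10(P) on the vertical axis.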
Differential gene sets between the EIF4A1 high and low expression groups in COAD and READ were identified using the DESeq2 package with a threshold of P < 0.05 and then analyzed for functional enrichment using GSEA. Enrichment results were considered significant at P < 0.05 and q-value < 0.25. The five pathways with the highest positive NES and the five with the most negative NES were selected for further analysis. In COAD tissues, GSEA revealed significant enrichment of differential genes in DNA replication, signal transduction, gene expression regulation, epigenetic regulation, cell cycle, chromosome dynamics, and gene silencing (Fig. 6e-f). Notably, modifications such as methylation and acetylation were closely linked to cancer progression. Conversely, the five pathways with the most negative NES predominantly involved energy metabolism, viral host responses, and mitochondrial function. In READ tissues (Fig. 6g-h), pathways with positive NES were predominantly enriched in functions pertaining to immune response recognition, neural signal transmission, and sensory perception, whereas pathways with negative NES were chiefly enriched in energy metabolism, immune and inflammatory responses, and cellular stress. The complement system and complement cascade are critical for immune surveillance and response, and their inhibition may contribute to the development and progression of cancer.
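The pathway selection rule described above (filter by P and q-value, then take the extremes of the NES ranking) can be written compactly. The sketch below is a Python illustration under stated assumptions: the result records, the pathway names PATHWAY_A to PATHWAY_G, and all values are invented, and k is set to 2 only to keep the toy example short (the study used five).

```python
def top_pathways(results, k=5, p_cut=0.05, q_cut=0.25):
    """Return the k pathways with the highest positive NES and the k
    with the most negative NES among those passing the P and q cut-offs."""
    kept = [r for r in results if r["p"] < p_cut and r["q"] < q_cut]
    positive = sorted((r for r in kept if r["nes"] > 0),
                      key=lambda r: r["nes"], reverse=True)[:k]
    negative = sorted((r for r in kept if r["nes"] < 0),
                      key=lambda r: r["nes"])[:k]
    return ([r["name"] for r in positive],
            [r["name"] for r in negative])

# Invented GSEA-style result records.
toy_gsea = [
    {"name": "PATHWAY_A", "nes": 2.1, "p": 0.001, "q": 0.01},
    {"name": "PATHWAY_B", "nes": 1.7, "p": 0.01, "q": 0.10},
    {"name": "PATHWAY_C", "nes": 2.5, "p": 0.20, "q": 0.40},   # fails P and q
    {"name": "PATHWAY_D", "nes": -1.9, "p": 0.004, "q": 0.05},
    {"name": "PATHWAY_E", "nes": -2.4, "p": 0.002, "q": 0.03},
    {"name": "PATHWAY_F", "nes": -1.2, "p": 0.03, "q": 0.30},  # fails q
    {"name": "PATHWAY_G", "nes": 1.2, "p": 0.04, "q": 0.20},
]

pos, neg = top_pathways(toy_gsea, k=2)
print(pos, neg)
```

Filtering before ranking matters: a pathway with a large NES but a non-significant P or q-value, such as PATHWAY_C here, never enters the top list.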
Immune infiltration analysis
In COAD, EIF4A1 expression was significantly inversely correlated with tumor purity (correlation coefficient: cor = -0.163, P = 9.79e-04), indicating that EIF4A1 expression increased as tumor purity decreased, which suggests enhanced immune cell infiltration. Furthermore, EIF4A1 expression in COAD was significantly positively correlated with the infiltration levels of CD8+ T cells (partial correlation: cor = 0.327, P = 1.34e-11), neutrophils (partial correlation: cor = 0.269, P = 4.38e-08), and dendritic cells (partial correlation: cor = 0.282, P = 8.80e-09) (Fig. 7a, c). In READ, a similar inverse correlation was observed between EIF4A1 expression and tumor purity (correlation coefficient: cor = -0.249, P = 3.02e-03). In contrast to COAD, EIF4A1 expression in READ was negatively correlated with CD4+ T cell infiltration (partial correlation: cor = -0.21, P = 1.31e-02) and positively correlated with neutrophil infiltration (partial correlation: cor = 0.181, P = 3.39e-02), highlighting the diverse regulatory roles of different immune cells in the colorectal cancer microenvironment (Fig. 7b, c).
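A partial correlation of the kind reported above measures the association between expression and an immune infiltration estimate after removing the part explained by a third variable, here tumor purity. The sketch below is a minimal Python illustration that assumes a rank-based (Spearman) correlation and a single adjusting variable; that choice, the function names, and the toy values are assumptions for illustration and are not taken from the text.

```python
import math

def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def partial_spearman(x, y, z):
    """Spearman correlation of x and y, adjusted for z."""
    rx, ry, rz = ranks(x), ranks(y), ranks(z)
    r_xy, r_xz, r_yz = pearson(rx, ry), pearson(rx, rz), pearson(ry, rz)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Invented values for five hypothetical samples.
expression = [1, 2, 3, 4, 5]
infiltration = [2, 1, 4, 3, 5]
purity = [5, 3, 1, 4, 2]

rho = partial_spearman(expression, infiltration, purity)
print(round(rho, 4))
```

In this toy example the unadjusted rank correlation is 0.8, and adjusting for purity lowers it to about 0.72, which shows how part of an apparent association can be attributable to the adjusting variable.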
Correlation of EIF4A1 expression with immune checkpoint regulation and its implications for immunotherapy efficacy
Given the critical link between immune checkpoint gene expression and the efficacy of immunotherapy, we first investigated the association between EIF4A1 expression and immune response-related checkpoint genes using the TIMER 2.0 database[30]. In COAD, EIF4A1 correlated strongly with most immune checkpoints, whereas in READ the correlations were comparatively weaker (Fig. 8a). Key immunosuppressive genes, including PD-L1 (CD274) and CTLA-4 (CTLA4), exhibited significant positive correlations with EIF4A1 across colon, colorectal, and rectal cancers (P < 0.001) (Fig. 8a). We then examined the correlations between EIF4A1 and several genes targeted by immune checkpoint blockade, including PD-1, PD-L1, CTLA4, and LAG-3. In both READ and COAD, EIF4A1 showed significant positive correlations with CTLA4 and PD-L1, and these associations were more pronounced in colon cancer. Furthermore, in colon cancer, EIF4A1 expression was significantly positively correlated with PD-1, PD-L1, CTLA4, and LAG-3, all of which were up-regulated. Conversely, in rectal cancer, the correlations of EIF4A1 with LAG-3 and PD-1 were not statistically significant (Fig. 8b-e).
Recent research has demonstrated that high microsatellite instability (MSI-H) and tumor mutation burden (TMB) are promising predictive biomarkers for the efficacy of immunotherapy, with MSI identified in a diverse range of cancers[31, 32]. The presence of MSI signifies an unstable intracellular genomic state with the potential to precipitate cancer development. Specifically, MSI-H tumors exhibit extensive genomic instability and are associated with genetic disorders such as Lynch syndrome, a hereditary form of colorectal cancer. We investigated the correlation of EIF4A1 expression with TMB and MSI in COADREAD, COAD, and READ. EIF4A1 expression in COADREAD, COAD, and READ correlated positively with both TMB and MSI, and this correlation was more pronounced in COAD than in READ (Fig. 8f, g).