Identification of CRC ESPs based on integration of CRISPR screen and scRNA-seq datasets
Eight genome-wide CRISPR/Cas9 loss-of-function screen datasets of CRC were collected, including a total of 49 CRC cell line types and 301 samples with replicates (Fig. 1a, Supplementary Table 1). To evaluate the essential degree of each gene, the sgRNAs depletion score for each gene in each sample was calculated by CERES algorithm and defined as CERES score 11. A lower CERES score indicates a higher level of gene essentiality as all the screen datasets are negative selection in this study. Considering multiple datasets were involved, we integrated CERES scores for each gene across various datasets by utilizing Robust Rank Aggregation (RRA) algorithm19. The optimal cutoff to obtain ESPs was determined by calculating the degree of overlap between RRA results and other pan-cancer ESP sets previously identified, including Achilles’ ESPs, Hart’s ESPs, and Zhang’s ESPs. As a result, 1521 candidates of CRC ESPs were obtained (Fig. 1b). We further collected full-length scRNA-seq data of CRC (GSE81861) to analyze the detection fraction of candidate ESPs in tumor epithelial cells20. We found 396 ESPs were expressed in more than 50% of tumor epithelial cells and were thus considered as final CRC ESPs (Supplementary Table 2). Among them, 74 ESPs were found to be overexpressed in tumor epithelial cells (Fig. 1c, Supplementary Table 2). These ESPs were principally linked to cancer-related functions, such as MYC TARGETS V1/V2, positive regulation of signal transduction by p53 class mediator, and response to interferon-gamma (Fig. 1d). Notably, we found ESPs exhibit higher expression than other genes in both tumor cells and normal cells (Supplementary Fig. a), which corroborated the findings from a previous study 9.
Then we compared the CERES scores of CRC ESPs with those of pan-cancer ESPs and Hart’s non-ESPs to assess the quality of our results. As anticipated, Hart’s non-ESPs showed the highest CERES scores and indicated the lowest necessity, while the CRC ESPs were lower, implying their essential roles in CRC (Fig. 1e). Additionally, we observed larger overlap between CRC ESPs and other pan-cancer ESP sets than that of randomly selected genes, further reflecting the high quality of CRC ESPs (Fig.1f). Among 396 ESPs, 155 ESPs co-exist in all ESP sets, indicating the homogeneity of different tumors (Fig. 1g). Notably, we found three novel CRC ESPs, CTNNB1, NDUFB9, and SCNM1 that were not found in any other ESP sets (Fig. 1g). To determine whether these novel CRC ESPs are functionally important in the survival of CRC cancer cell lines, we detected the effect of CTNNB1 and NDUFB9 knockout on HCT116 cell line by CRISPR/Cas9. Remarkably, HCT116 cell line lost viability while CTNNB1 and NDUFB9 were knockout, respectively (Fig. 1h). CTNNB1 was reported as a driver gene in CRC according to the Pan-Cancer Analysis of Whole Genomes (PCAWG) study 21. NDUFB9, an important gene involved in mitochondrial function, has been identified as a prognostic biomarker for endometrial cancer 22. However, the role of SCNM1 in CRC is needed to further investigate.
Prediction of CRC ESLs
LncRNAs play an important regulatory role in various cellular processes of CRC, including cell apoptosis, proliferation, and epithelial-mesenchymal transition 23. To elucidate a comprehensive landscape of essential genes in CRC, we employed HT and PPR methods, which we proposed in our previous study, to predict essential lncRNAs 12. These methods required the construction of a two-color network comprising of both protein-coding genes and lncRNAs, where lncRNAs essentiality were predicted based on their closeness to ESPs 12. In order to look for the best two-color network, we built four different co-expression networks, consisting of an unweighted co-expression network and a topological overlap weighted co-expression network based on the RNA sequencing (RNA-seq) profiles of colorectal adenocarcinoma from The Cancer Genome Atlas (TCGA, referred to as unWCN-TCGA and WCN-TCGA, respectively), as well as an unweighted co-expression network and a topological overlap weighted co-expression network based on expression profiles of tumor epithelial cells from scRNA-seq dataset (referred to as unWCN-TEC and WCN-TEC, respectively) (Fig. 2a, see method). Then we calculated the adjusted p-value of each gene by HT to evaluate whether the directly related genes of a query gene are enriched in ESPs based on unWCN-TCGA and unWCN-TEC (Fig. 2a). We observed that the adjusted p-values of CRC ESPs, Achilles’ ESPs, Zhang’s ESPs, and Hart’s ESPs showed significantly smaller than those of Hart’s non-ESPs (Fig. 2b). Importantly, the adjusted p-values of ESPs based on unWCN-TEC were significantly lower than those based on unWCN-TCGA (Fig. 2b), suggesting the higher quality of co-expression network constructed from scRNA-seq datasets than that from TCGA RNA-seq datasets. Consequently, we identified 26 lncRNAs as potential candidates of CRC ESLs by using the cutoff of adjusted p-value lower than 0.01 based on unWCN-TEC.
Next, we utilized CRC ESPs as original seeds to calculate the score of all genes by PPR method based on WCN-TCGA and WCN-TEC, respectively (Fig. 2a). The higher score represents the higher probability of a gene being essential. As expected, the PPR scores of CRC ESPs, Achilles’ ESPs, Zhang’s ESPs, and Hart’s ESPs were higher, and WCN-TEC also showed greater advantage than WCN-TCGA as higher PPR scores of ESPs in WCN-TEC were observed (Fig. 2c). We then selected the top 5% lncRNAs in PPR results based on WCN-TEC as CRC ESL candidates (n=39). Results yielded 41 ESLs after combing the results of the HT and PPR methods, and 24 lncRNAs were found to be same (Fig. 2a). Finally, we retained the ESLs detected in more than 30% of CRC tumor epithelial cells (GSE81861) and defined them as CRC ESLs (n=29, Supplementary Table 3). Expression analysis revealed that 11 CRC ESLs were significantly overexpressed in CRC tumor epithelial cells compared to normal (Fig.2d, Supplementary Table 3). We selected 6 overexpressed ESLs, including H19, MIF-AS1, NORAD, SNHG17, SNHG6, and SNHG8, for CRISPR/Cas9 knockout experiments. The results demonstrated that knockout of these ESLs in HCT116 cell line will result in the decreasing of survival cells (Fig. 2e), implying the dependent role of those ESLs in CRC.
In our results, some ESLs were known as CRC associated lncRNAs according to previous reports, such as PVT1, H19, and GAS5 (Fig. 2f, Supplementary Table 3). By analyzing the functions and clinical association of CRC ESLs in Lnc2Cancer 3.0 database24, we found that most CRC ESLs are involved in cell apoptosis, growth, survival, and metastasis (Fig. 2f). Among these ESLs, 11 lncRNAs were previously identified as part of Zhang’s pan-cancer ESLs (n=97, Fig. 2f).. To identify potential Gene Ontology (GO) biological process functions of CRC ESLs, we conducted enrichment analysis on the top 200 associated genes of each ESL based on WCN-TEC. The results revealed that a significant number of CRC ESLs are involved in ribosome/ncRNA/rRNA/DNA processing, regulation of translation, telomere maintenance, and signal transduction by p53 class mediator (Supplementary Table 4), providing additional evidence for the crucial roles of CRC ESLs in fundamental metabolic processes of tumor cells.
Expression, DNA mutation, and DNA methylation patterns of CRC essential genes
To examine the expression characteristics of essential genes in different cell types of CRC, we utilized CRC ESPs and ESLs as gene sets to calculate enrichment scores in each cell type based on the 10X genomics scRNA-seq dataset of CRC (GSE132465) 15. As anticipated, both CRC ESPs and ESLs gene set enrichment scores were higher in epithelial cells of tumor samples compared with normal samples (Fig. 3a). Interestingly, the gene set scores of CRC ESPs were particularly elevated in tumor epithelial cells compared to other pan-cancer ESP sets (Fig. 3a), indicating the specificity of CRC ESPs for CRC tumor epithelial cells. We also observed the expressions of ESPs and ESLs in other cell lineages, such as T cells and B cells (Fig. 3a), were lower than tumor epithelial cells, implying that target essential genes may have side effect on cell lineages in TME. Notably, 4 ESPs (KRT8, MYC, PRELID3B, and SCD) were specifically expressed in tumor epithelial cells (Fig. 3b, see methods). Three of them, KRT8, MYC, and SCD are well-known biomarkers or therapeutic targets for cancer25-27, while few research has performed on PRELID3B in cancer. CRISPR/Cas9 knockout experiments confirmed that function loss of these genes affects the survival of HCT116 cell line (Fig. 3c).
Mutation of essential genes has been shown to disrupt normal cell development and promote cancer, such as MYC. We conducted mutation analysis using whole genome sequencing (WGS) datasets of CRC in PCAWG and whole exome sequencing (WES) datasets of CRC in TCGA to explore the genomic alteration of CRC ESPs and ESLs. In general, the mutation rates of ESPs and ESLs in WGS dataset ranged from 0% to 25% and 0% to 33%, respectively (Fig. 3d-e), while the mutation rates of ESPs were relatively low in WES dataset, ranging from 0% to 8% (Supplementary Fig. b). Among the mutated genes in WGS study, DYNC1H1 (25%), CHD4 (23%), CNOT1 (21%), and CTNNB1 (19%) exhibited the highest mutation frequency (Fig. 3d). Similarly, these genes also showed relatively high frequencies and ranked within the top 5 mutated genes in the TCGA CRC dataset (Supplementary Fig. b). Importantly, most CRC ESPs mutated in a co-occurrence manner in both the WGS and WES datasets (Supplementary Fig. c-d), suggesting that the mutation of ESPs typically activates synergistic oncogenic pathways.
DNA methylation also plays a crucial role in influencing gene expression. Through analysis of differentially methylated positions (DMPs) in ESPs and ESLs in TCGA-COAD, we identified a total of 71 DMPs with 45 unique genes (adjusted p-value < 0.05, absolute log2FC > 0.2, Supplementary Fig. e). Of these DMPs, 26 showed DNA hypermethylation (log2FC > 0.2), while 45 exhibited DNA hypomethylation (log2FC < -0.2). Subsequently, we calculated the correlation coefficients between the expression and DNA methylation levels of each gene. The results revealed three ESPs (MYC, SCD, EIF6) and one ESLs (PVT1) had negative correlation coefficients (Supplementary Table 5). We observed hypomethylation of two probes cg08526705 and cg00163372 in the gene body may activate MYC expression in CRC (Supplementary Fig. f-g). These two probes were found to be related with chemotherapy drug resistance previously28. In PVT1, three probes cg23898497, cg00780520, and cg10202727 were found to be associated with PVT1 expression (Fig. 3f), which were also observed in TCGA-READ (Supplementary Table 5). PVT1 is an adjacent gene of MYC on chromosome 8q24.21 and has been shown to regulate MYC expression and contribute to cancer development 29. Our findings indicated that the hypomethylation of MYC and PVT1 lead to the upregulation of their expression, and thus promote the development of CRC.
TF regulatory network of essential genes in CRC tumor epithelial cells
Based on 10X genomics scRNA-seq dataset of CRC epithelial cells (GSE81861), we generated a TF regulatory network of CRC by using SCENIC algorithm30. A total of 30,743 TF-target relationships associated with CRC ESPs and ESLs were established, including 177 TFs and 9,837 genes (Supplementary Table 6). These relationships were further categorized into high- or low-confidence annotations 30(see methods). Interestingly, 9 TFs from ESPs related to 80.63% (n=24,788) of identified TF-target relationships. Among them, 3 TFs had a highest number of high-confidence relationships compared with others essential TFs, including BCLAF1, MYC, and YY1 (Fig. 4a). Area under the curve (AUC) scores from SCENIC were used to quantify the activity of TF targets (regulon). Differential AUC scores of regulons revealed the significant activation of 79 regulons in tumor epithelial cells, including BCLAF1 regulon, YY1 regulon, and MYC regulon (Fig. 4b-c, Supplementary Table 7). Furthermore, three novel CRC ESPs (CTNNB1, NDUFB9 and SCNM1) that mentioned above were also regulated by BCLAF1, YY1, and MYC (Fig. 4d), suggesting that these TFs may act as oncogenes in CRC. Previously, knockdown of the L isoform of BCLAF1 in mouse tumor model inhibited the tumor growth, confirming the carcinogenic characteristics of BCLAF131. Above all, our findings demonstrated the intricate regulatory networks between TFs and CRC ESPs, and the TFs arise from CRC ESPs may play significant roles in tumorigenesis.
In the regulatory network, 424 relationships were linked between TFs (n=108) and ESLs (n=28, Fig. 4e, Supplementary Table 6). Notably, the lncRNAs such as FGD5 - AS1, GAS5, and PVT1 were regulated by more than 20 TFs. Among these, FGD5 - AS1, has been emerged as a crucial regulator in CRC and other types of cancer via promoting cell proliferation, drug resistance, and epithelial-mesenchymal transition32,33. In our study, 14 TFs were found to regulate the expression of FGD5 - AS1 (Fig.4f), some of them were known well associated with CRC, such as EGR134 and XBP135. Collectively, our findings provided a reliably regulatory network connecting TFs and CRC essential genes based on scRNA-seq data.
Crosstalk between CRC essential genes and TME
The CCN between cancer cells and cells in TME mediated by ligand-receptor interactions, plays a crucial role in shaping tumor behavior and TME remodeling. In our study, only three CRC ESPs, including COPA, RPS19, and TFRC, were known to act as ligands or receptors. The majority of CRC ESPs may participate in the CCN by regulating the expression of ligands or receptors in tumor epithelial cells (Fig. 5a). To verify such relationships, we first constructed a CCN associated with tumor epithelial cells using CellphoneDB36. The CCN consists of 169 connections targeting tumor epithelial cells and 141 connections source from tumor epithelial cells (Supplementary Table 8). The CCN involves 161 ligands/receptors, including JAG1 ligand, NOTCH receptors, and 11 protein complex such as the integrin complex and VEGFR complex (Supplementary Table 8). On the whole, CRC tumor epithelial cells interacted more frequently with endothelial cells, fibroblast cells and myeloid cells (Fig. 5b), which are known to play important roles in angiogenesis, tumor invasion, and immune response.
We then integrated the WCN-TEC, TF regulatory network, and CCN, resulting in a total of 1734 relationships (Supplementary Table 9), where 356 ESPs (including 9 TFs) and 28 ESLs affect the expression of 85 ligands/receptors of tumor epithelial cells, suggesting that CRC ESPs and TFs have a widespread influence on CCN (Supplementary Fig. h). We observed that six ligands/receptors of tumor epithelial cells, including GPI, CDH1, MIF, RPS19, CXADR, and CD47, are associated with more than 100 essential genes (Supplementary Table 9), indicating these ligands/receptors may act as the critical mediators between essential genes and TME. Among them, MIF secreted from tumor epithelial cells could interact with CD74, TNFRSF10D, and TNFRSF14 (Fig. 5c), the relationships may responsible for the suppression of antitumor immune response 37. Additionally, it has been reported that the CD47-SIRPα/γ interaction protects tumor cells from killing by suppressing both macrophage phagocytosis and antigen presentation of dendritic cells38. Consistent with these findings, we observed a significant activation of CD47-SIRPα/γ interactions between tumor epithelial cells and myeloid cells (Fig. 5c). Clinically, CRC patients from TCGA with high expression levels of CD47 and MIF exhibited a significantly lower overall survival rate compared to those patients with low expression levels of CD47 and MIF (Fig. 5d). Our results supported the notion that essential genes can facilitate tumor cells evasion from immune system by regulating the expression of of receptors/ligands such as CD47 and MIF in CRC.
Next, we aimed to identify existing drugs that could potentially repress CRC essential genes and TFs associated with CD47 and MIF (n=304, Fig. 5a). We utilized an integrative web platform, iLINCS 39, to analyze the expression pattern of 304 targets in pre-computed signatures of 10 anti-cancer agents in 5 CRC cell lines (GSE116439)40. We found that drugs such as sunitinib, dasatinib, topotecan, and lapatinib were sensitive to cancer cells, as they can effectively inhibit the activity of target genes at appropriate concentration and treatment time (Fig. 5e, Supplementary Table 10). For instance, sunitinib acts as an inhibitor of CSF1R, VEGF receptor, c-kit, PDGF receptor, and RET, and has been widely used in cancer therapy41. Treatment with sunitinib resulted in the down-regulation of MYC, KRT8, CDC37, and other genes in the CRC cell line KM12 (Fig. 5e, signature ID: PG_4150). In addition, the patients with high CD47 expression benefit from sunitinib monotherapy in clear cell renal cell carcinoma42, further supporting sunitinib may act as CD47 inhibitor in CRC. On the other hand, CRC cell lines treated with cisplatin or vorinostat exhibited over-expression of target genes such as MYC and TAF7 (Fig. 5e), indicating that CRC cells may develop resistance to these drugs. In conclusion, our results suggest that CRC essential genes may impact the TME through CCN and provides valuable insights into potential therapeutic strategies for CRC.