DOI: https://doi.org/10.21203/rs.3.rs-49921/v1
Background
Cancer stem cells (CSCs) play an important role in drug resistance, recurrence, and metastasis of tumors. Considering the heterogeneity of tumors, this study aimed to explore the key genes regulating stem cells in intestinal-type and diffuse-type gastric cancer (GC).
Methods
RNA-seq data and related clinical information were downloaded from The Cancer Genome Atlas (TCGA). Based on the mRNA expression-based stemness index (mRNAsi), important modules and key genes were identified by weighted co-expression network analysis (WGCNA). The expression of key genes were further verified by Oncomine database.
Results
MRNAsi scores of GC were significantly higher than that of normal tissue. Besides mRNAsi scores of intestinal-type GC (IGC) were significantly higher than that of diffuse-type GC (DGC). WGCNA showed that the brown modules for both types of GC were the most significantly associated with mRNAsi. We screened out 7 and 43 key genes for IGC and DGC, and found that these genes were closely related, respectively. Functional analysis showed the relationship between the key genes confirmed in Oncomine database and fate of cells.
Conclusions
In this study, 7 and 43 genes related to the characteristics of CSCs were identified in IGC and DGC, respectively. These genes were associated with cell cycle and homologous recombination, which could serve as therapeutic targets for the inhibition of stem cells from both types of GC.
As a common neoplasm worldwide, gastric cancer (GC) is the main cause of cancer-related deaths [1]. According to cancer statistics released by the International Agency for Research on Cancer (IARC), there were 952,000 new cases of GC worldwide in 2012, accounting for 68% of cancer patients, of which723,000 die, about 88% of all cancer patients [2]. Approximately 90% of GC are adenocarcinoma originated from gastric mucosa. According to the Lauren classification, GC can be separated into the diffuse-type (DGC) and intestinal-type (IGC), which are significantly different in terms of tissue morphology, pathogenesis, biological behavior, prognosis, and survival [3, 4]. Therefore, in-depth exploration of the molecular characteristics of IGC and DGC is of great significance for clinical diagnosis and treatment.
Recently, cancer stem cells (CSCs) have attracted more and more attention. And they are characterized as cells in the tumor which have the capacity of self-renewing and causing the heterogeneous cancer cell lineages. Many studies have shown that CSCs play a crucial role in the metastasis, differentiation, and drug-resistance of cancer [5-7]. At present, the recognition markers of hematologic tumor stem cells have been well known, but the research on solid tumor stem cells has not been very clear [8]. Concerning GC, Takaishi et al. initially identified CD44 as the first CSCs marker in 2009. CD44+ GC cells possess properties of radio- and chemo- resistance, which may explain why patients with GC was resistant to standard therapies [9]. Besides, the combination of CD44 and epithelial cell adhesion molecule (Epcam) has been found as putative gastric CSCs markers [10]. However, further studies showed that the frequency of CD44 expression in IGC was significantly higher than that in DGC [11]. ALDH1, another marker of CSCs in several types of tumors, has been detected in DGC in 2012. And ALDH+ GC cells have strong tumorigenic and self-renewal ability. However, ALDH1 mRNA expression cannot be detected in IGC[12].[11] Some scholars suggested that the high heterogeneity of GC may be the main reason for the unidentified CSCs markers. Hence, exploring the differences between IGC and DGC from the perspective of stem cells is of clinical importance. Despite there has been increasingly attention, the characteristics of CSCs in IGC and DGC remain to be further explored.
To better describe the characteristics of CSCs, Malta et al. compared the genetic maps of tumor cells and embryonic stem cells and proposed a new index, stem cell index. The stem cell index is a measurement method, which is considered as a quantitative indicator of CSCs to show how many tumor cells will be the same as stem cells. This method identifies the typical molecular characteristics of stem cells by analyzing thousands of cells at different stages of cell differentiation. According to the different analysis of the transcriptome and methylation group, it can be divided into DNA methylation-based stemness index (mDNAsi) and mRNA expression-based stemness index (mRNAsi), among which bigger mRNAsi value, which varies between -1 and 1, suggests stronger properties of cancer stem cells. Previous researches have also revealed that stem cell index scores are associated with biological processes and tumor heterogeneity, which provide novel insights for further cancer research [13]. Based on the principle of the stemness index, many scholars have found stem cell-associated key genes and possible signal pathways in bladder cancer and lung adenocarcinoma by using mRNAsi [14, 15]. This undoubtedly provides a direction for further research of tumor stem cells, yet the research of mRNAsi in GC has not been realized.
Given the close relationship between tumor stem cells and tumors, in this study we explored the characteristics of GC histological subtype by the stemness index and WGCNA model was used to determine the most relevant module of mRNAsi index, and furthermore the key genes were identified in the module. Our study intends to explore markers associated with the stem cell characteristics of IGC and DGC, which would help clarify the biological characteristics and progression of GC subtypes and inform the future diagnosis and therapy of GC.
Data collection and pre-processing
The RNA-seq data with correlative clinical data of 30 normal and 343 human gastric adenocarcinoma (GAC) specimens were downloaded from the TCGA database (https://portal.gdc.cancer.gov) in March 2020. The mRNAsi index of GAC in TCGA was downloaded from previous studies [13]. After screening the complete clinical information , 323 cases were chose, including 139 cases of IGC and 58 cases of DGC. Next, the RNA-seq data of every specimen was integrated by Perl language (http://www.perl.org), and gene names were converted into the corresponding gene symbols through the Ensemble database (http://asia.ensembol.org/index.html).
Screening of differentially expressed genes (DEGs)
DEGs between tumor and normal tissues were identified by the limma package in R 3.6.1:false discovery rate (FDR) < 0.05, P <0.01, and |log2-fold change| > 1.
WGCNA
The R package WGCNA was performed to generate module clustering [16]. As previously reported, through further analysis of genes and gene modules with similar expression profiles, the hierarchical clustering tree was constructed [14].
Identification of significant module
The correlation between genes and mRNAsi in each module was called gene significance (GS). At the same time, module membership (MM) was applied to represent the significant correlation between genes and modules in a specific module, and it was highly related to mRNAsi. A cutoff (<0.5) was applied in the mergence of highly similar modules. As for the modules we are interested in, we carried out GS and MM values for the screening of the key genes, and the threshold values of IGC and DGC were set: cor. GS>0.5, cor. MM>0.8; cor. GS>0.8, cor. MM>0.8, respectively.
Functional enrichment analysis
Gene ontology (GO) functional annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis were performed to study the biological functions of the key DEGs .And these analyses were conducted by the cluster profiler package in R [17]. P-value < 0.05 was chosen as a criteria in this section..
Co-expression analysis and protein-protein interaction (PPI) network construction
R package corrplot was selected to estimate the Pearson correlation to illuminate the co-expression relationship of key genes at the transcriptional level [18]. The 11.0 version of STRING (https://www.string-db.org) was chose to investigate and generate the PPI network among key genes.
Oncomine database validation
Oncomine (https://www.oncomine.org) was investigated the mRNA expression of key genes in the brown module between histological subtypes of GC and normal samples. The threshold was set as: gene rank=top10%; fold change=2; P-value =1E-4.
Statistical analysIs
The difference of mRNAsi scores between the normal and the tumor group was calculated
using Wilcox test in R. We performed survival analysis by the survival package to evaluate the prognostic value of mRNAsi, and a two-side log-rank test was employed to determine statistical significance. The Kruskal test was selected to clarify the interrelationships between the scores of mRNAsi and clinical characteristics by the R package beeswarm. P-value of no more than 0.05 was deemed as statistically significant.
Correlation between mRNAsi and clinical characteristics in GAC
First, we explored the association between mRNAsi and clinical features in GC. As shown, when compared with normal samples, mRNAsi of tumor samples was significantly higher (Fig. 1a). Next, the results of survival analysis indicated that overall survival of patients with lower scores was worse than those with higher mRNAsi index (Fig. 1b). Further, in terms of clinical features, patients with gastric adenocarcinoma were classified into groups by age, gender, TNM stage, tumor stage, histopathological types, respectively; for which, mRNAsi was not associated with age (P = 0.133), gender (P = 0.780), T (P = 0.140), N (P = 0.579), M (P = 0.598) stage, but had a decline trend with the improved tumor stage (Fig. 1c-h). However, in terms of histological subtype, mRNAsi in intestinal-type was significantly higher than that in diffuse-type (Fig. 1i).
Screening of DEGs and stemness-related modules and key genes in IGC and DGC
The above results indicated that there may exist specific DEGs which control the stemness of IGC and DGC, respectively. Therefore, we further filtered the modules and key genes with stemness properties. First, we compared the DEGs between IGC or DGC and normal samples, respectively. In IGC, 7494 DEGs were discerned, among which 5946 were over-expressed and 1548 were down-regulated (Fig. 2a and Additional file 1:Table S1); in DGC, 5424 DEGs were discerned, among which 4647 were up-regulated and 777 were down-regulated (Fig. 3a and Additional file 1: Table S2).
Furthermore, WGCNA was used to construct DEG coexpression network with β = 5 (scale-free R2 = 0.90) and β = 8 (scale-free R2 = 0.90) as soft thresholds, and 14 and 10 gene modules were obtained for subsequent analysis (Fig. 2b,c; Fig. 3b,c). And then, the gene expression status of relevant modules were explored to elucidate the relationship between the corresponding modules and mRNAsi in IGC and DGC samples(Fig. 2d-f; Fig. 3d-f). Here, we noted that the brown module of intestinal-type (R2=0.73, P= 3E-18) and the brown module of diffuse-type (R2=0.9, P=9e-20) both showed a positive correlation between stemness properties and gene expression. Then, we applied all the brown module to screen the key genes related with mRNAsi, with the selection criteria of cor. GS> 0.5, cor. MM> 0.8 and cor. GS> 0.8, cor. MM> 0.8 respectively. Finally, seven stemness-related genes including BUB1, RAD54L, EXO1, NCAPH, DDIAS, XRCC2, RACGAP1 were obtained for intestinal-type; 43 key genes for the diffuse-type, including DLGAP5, DTL, NCAPG2, KIF11, NCAPH, EZH2, ORC6, MTHFD2, BUB1, RAD54L, XRCC2, BUB1B, HMMR, KIF18A, KIF2C, NCAPG,EME1,PLK4,GINS1,PARPBP,SPAG5,ZWILCH, RAD51AP1, RACGAP1, CCDC138, TPX2, SPC25, MAD2L1, CCNB2,DKC1,KNSTRN,ZWINT,RAD51, NUSAP1, CHAF1A, SGO1, FEN1, CCNB1, RRM2, CDCA8, MND1, CCT6A, DBF4. Then we plotted their expression tendency in normal and tumor samples and discovered that the candidate genes were overexpressed in IGC and DGC, respectively (Fig. 4a, 4b). However, taking the intersection of key genes involved in IGC and DGC, BUB1, NCAPH, XRCC2 and 2QA were left, which were more highly expressed in IGC.
Validation of stemness-related key genes in Oncomine database
Next, the mRNAsi-related key genes were verified in Oncomine database. As shown in Fig. 5a-g, 7 key genes in IGC were highly expressed. Except for CHAFIA, 42 key genes showed higher expression in DGC. Fig. 5h-n showed 7 representative genes in DGC compared with normal tissues, and the remaining were shown in Supplementary Figure S1.
The cellular functions and pathway analysis of stemness-related key genes
In IGC, the GO analysis results indicated that key genes participated in biological process of meiotic cell cycle. The main cellular component manifested enrichment mainly at condensed nuclear density. In addition, the main molecular function enriched these genes in DNA-dependent ATPase activity, and the KEGG analysis demonstrated that the major pathway was homologous recombination (HR) (Fig. 6a, 6c). In DGC, GO analysis results indicated that key gene participated in the biological process of nuclear division. The major cellular component suggested enrichment mainly at the chromosomal region. And the main molecular function was catalytic activity. KEGG analysis showed that the main pathway was the cell cycle (Fig. 6b,6d).
Correlation between stemness-related key genes at transcription and protein levels
We explored the interactive relation of the above key genes using Pearson correlation and found that the candidate genes in the brown module of IGC or DGC had strong correlation, with the minimum correlation coefficient of 0.57 and 0.58, respectively (Figure 7a, 8). Next, we built the protein-protein interaction network using the STRING online tool. As shown in Fig. 7b and Fig. 9a, the key genes of the two types of GC formed a close interaction relationship respectively. In IGC, the most crucial key proteins were EXO1 (5 edges) and RAD54L (5 edges) (Figure 7c). The most important key proteins in DGC were CCNB1 (37 edges) and RAD51AP1 (37 edges) (Figure 9b). And BUB1 had high connectivity in both.
GAC, as the main type of GC, can be classified into the diffuse and intestinal type. IGC occupies the majority with a better prognosis, while DGC is more malignant with a worse prognosis. Although numerous researches have spotted on the diagnosis and treatment of these two types of GC, the molecular characteristics of them are not clear. In this study, using mRNAsi, we screened out 7 genes related to stem cell characteristics of IGC and 43 genes of DGC. Further analysis showed that these key genes had strong interaction and had a high co-expression relationship at the transcription and protein levels. These results suggested that key genes selected based on mRNAsi may indicate different molecular characteristics of IGC and DGC. Our study extended the knowledge of GC molecular characteristic and provided new insight into the clinical treatment and prognosis assessment of GC histological subtypes.
We initially analyzed the relationship between clinical characteristics and mRNAsi scores in GAC, and the results showed that tumor samples had higher stemness index when compared with normal samples, which was in accordance with former studies [19]. In survival analysis, overall survival of patients with lower scores was worse than those with higher mRNAsi index. Meanwhile, in terms of clinical features, mRNAsi was not correlated with age, gender, TNM stages, but only showed a downward trend with the increase of tumor stage. These differed from the properties of stem cells as we commonly understand them [20]. However, there were significant differences in mRNAsi between IGC and DGC, so we speculated that there may be some key genes differentially expressed that control the stemness properties of different subtype of GC, respectively.
Furthermore, we used WGCNA to distingsh the relationship between DEGs and stemness phenotype, to identify stemness-related key genes. Both in IGC and DGC, the brown module of both showed significant positive correlations with mRNAsi, indicating that the key genes in this color module had higher stem cell characteristics. Based on GS and MM, 7 and 43 stemness-associated genes were obtained from IGC and DGC, respectively, and in two types of GC, these key genes were highly expressed. Further functional analysis revealed that the key genes in IGC were primarily concentrated in the HR pathway, while those in DGC were principally concentrated in the cell cycle pathway. Among them, HR is a ubiquitous cellular pathway, which is essential for DNA damage tolerance as well as repair, and the recovery of replication forks withbreakage and stagnation. Studies have shown that HR was involved in the repair of lesions caused by many anticancer drugs, so this may explain the cause of drug resistance in the process of tumor treatment [21]. The cell cycle pathway is not only associated with the proliferation of GC cells but also related to the prognosis of the tumor. For example, cyclin D1 was one of the markers of poor prognosis in GC patients [22]. Thus, it may be related to the poor prognosis of DGC.
Moreover, among the key genes obtained above, BUB1 was highly expressed in both IGC and DGC and has higher connectivity in PPI network, but it was more expressed in IGC. As a common candidate key gene, BUB1 had a mitotic serine/threonine kinase checkpoint that was up-regulated in numerous cancers and associated with tumorigenesis, proliferation, and metastasis [23-25]. In the cell cycle, BUB1 gradually accumulated, peaked at G2 / M phases, and played a key part in cell division [26, 27]. Inhibition of BUB1 can further prevent tumor proliferation and increase cell apoptosis by regulating TGF β / Smad signaling pathways [28]. The different expression levels of BUB1 in IGC and DGC may provide new ideas for the diagnosis and treatment of GC.
The Oncomine validation indicated that all key genes were highly expressed in IGC, among which NCAPH gene was an important component of the condensed protein complex. In colon cancer, NCAPH has been proved to be highly expressed and its knockout can inhibit tumor proliferation by inhibiting tumor cell cycle transition and inducing apoptosis [29]. This demonstrated that NCAPH may be a possible therapeutic target for IGC. DDIAS gene was an inhibitor of apoptosis induced by DNA damage. In non-small cell lung cancer and hepatocellular carcinoma, it has been proved that this gene has the function of the anticancer cell apoptosis , further study found that DDIAS had dual functions of inhibiting the formation of the death-inducing complex and destabilizing caspase-8, thus inhibiting the apoptosis of cancer cells mediated by apoptosis-inducing ligand (TRAIL) [30]. Therefore, this gene may play the similiar role in IGC. In DGC, MAD2L1, RAD51AP1, TPX2, NCAPG, GINS1, HMMR, BUB1 were highly expressed in 43 stemness-related key genes, among which MAD2L1 was an important component of the mitosis checkpoint protein. Previous studies showed that MAD2L1 was up-regulated as a proto-oncogene which could promote cell proliferation in GC [31]. RAD51AP1 was a key protein in HR, and its upregulation in intrahepatic cholangiocarcinoma and lymphoma could promote the development of cancer [32, 33]. Therefore, this gene may also contribute to the development of DGC. TPX2, a microtubule-associated protein, was associated with the malignant behavior of GC and the overall survival in patients with GC [34]. In addition, HMRR (hyaluronic acid-mediated motor receptor) was closely related to tumor recurrence and could induce epithelial-mesenchymal transition and promote the characteristics of GC stem cells. In summary, our studies suggested that the key genes in the brown module were associated with the development of the above two types of GC, respectively.
In conclusion, we identified genes that maintained the characteristics of IGC and DGC. These genes could become therapeutic targets to inhibit the properties of both stem cells. However, our conclusions were based on retrospective data, so further biological studies are in need to verify these findings.
Taken together, 7 and 43 genes correlated to the characteristics of CSCs were identified in IGC and DGC, respectively. These genes were related with cell cycle or HR, which may serve as therapeutic targets for the inhibition of the stem cells from both types of GC. However, our conclusions derived from bioinformatic analysis still need further basic studies to validate.
GC gastric cancer
IGC intestinal-type gastric cancer
DGC diffuse-type gastric cancer
GAC gastric adenocarcinoma cancer
PPI protein-protein interaction
IARC International Agency for Research on Cancer
CSCs Cancer sten cells
mDNAsi DNA methylation-based stemness index
mRNAsi mRNA expression-based stemness index
TCGA The Cancer Genome Altas
WGCNA Weighted gene co-expression network analysis
DEGs Differentially expressed genes
GS Gene significance
MM Module eigengenes
PCA Principal component analysis
GO Gene ontology
KEGG Kyoto Encyclopedia of Genes and Genomes
STRING Search Tool for the Retrieval of Interacting Genes
PPI Protein–protein interaction
EpCAM epithelial cell adhesion molecule
FDR false discovery rate
HR homologous recombination
TRAIL inhibiting apoptosis-inducing ligand
HMRR hyaluronic acid-mediated motor receptor
Acknowledgements
Not applicable.
Authors’ contribution
Y. Gong and R. Guo designed the research; R. Guo analyzed the data; R. Guo and Y. Gong wrote the paper; Y. Gong , Y. Yuan and A. Chu revised the language of the paper; and all authors read and approved the final manuscript.
Funding
This work was supported by the National Natural Science Foundation of China (Award No.81970501), National Key R&D Program of China (Grant No.2017YFC0908300) and National Natural Science Foundation of Liaoning Province (2019-MS-388).
Avaliability and data and materials
The datasets supporting conclusions of this article were available in the TCGA (https://portal.gdc.cancer.gov), Oncomine (http://www.oncomine.org). The datasets supporting the conclusions of this article are included within the article and its Additional files.
Ethics approval and consent to participate
Not applicable
Consent for publication
Not applicable
Competing interests
There are no competing interests.