Identification of Key Genes for Controlling Cancer Stem Cell Characteristics in Gastric Cancer


 Background

Self-renewal of gastric cancer stem cells (GCSCs) is considered to be the underlying cause of the metastasis, drug resistance and recurrence of gastric cancer (GC). The purpose of this study was to characterize the expression of stem cell-related genes in GC.
Methods

The RNA sequencing results and clinical data for gastric adenoma and adenocarcinoma samples were obtained from the Cancer Genome Atlas (TCGA) database, and the results of the GC mRNA expression-based stemness index (mRNAsi) were analyzed. Weighted gene coexpression network analysis (WGCNA) was then used to find modules of interest and their key genes. Survival analysis of key genes was performed using the online tool Kaplan-Meier Plotter, and the online database Oncomine was used to assess the expression of key genes in GC.
Results

mRNAsi was significantly upregulated in GC tissues compared to normal gastric tissues (p < 0.0001). A total of 16 modules were obtained from the gene coexpression network; the brown module was most positively correlated with mRNAsi. Sixteen key genes (BUB1, BUB1B, NCAPH, KIF14, RACGAP1, RAD54L, TPX2, KIF15, KIF18B, CENPF, TTK, KIF4A, SGOL2, PLK4, XRCC2 and C1orf112) were identified in the brown module. Survival analysis and Oncomine analysis revealed that the prognosis of patients with GC and the expression of three genes (RAD54L, TPX2 and XRCC2) were consistently related.
Conclusion

Sixteen key genes play an important role in the maintenance of GCSCs. RAD54L, TPX2 and XRCC2 are the most likely therapeutic targets for inhibiting the stemness characteristics of GC cells.


Abstract
Background Self-renewal of gastric cancer stem cells (GCSCs) is considered to be the underlying cause of the metastasis, drug resistance and recurrence of gastric cancer (GC). The purpose of this study was to characterize the expression of stem cell-related genes in GC.

Methods
The RNA sequencing results and clinical data for gastric adenoma and adenocarcinoma samples were obtained from the Cancer Genome Atlas (TCGA) database, and the results of the GC mRNA expressionbased stemness index (mRNAsi) were analyzed. Weighted gene coexpression network analysis (WGCNA) was then used to nd modules of interest and their key genes. Survival analysis of key genes was performed using the online tool Kaplan-Meier Plotter, and the online database Oncomine was used to assess the expression of key genes in GC.
Results mRNAsi was signi cantly upregulated in GC tissues compared to normal gastric tissues (p < 0.0001). A total of 16 modules were obtained from the gene coexpression network; the brown module was most positively correlated with mRNAsi. Sixteen key genes (BUB1, BUB1B, NCAPH, KIF14, RACGAP1, RAD54L, TPX2, KIF15, KIF18B, CENPF, TTK, KIF4A, SGOL2, PLK4, XRCC2 and C1orf112) were identi ed in the brown module. Survival analysis and Oncomine analysis revealed that the prognosis of patients with GC and the expression of three genes (RAD54L, TPX2 and XRCC2) were consistently related.

Conclusion
Sixteen key genes play an important role in the maintenance of GCSCs. RAD54L, TPX2 and XRCC2 are the most likely therapeutic targets for inhibiting the stemness characteristics of GC cells.

Background
Gastric cancer (GC) is a common malignant tumor, and its mortality rate ranks third among cancers globally [1]. In China, GC is also the major cause of cancer-related death, and patients with advanced disease have lower survival rates and higher recurrence rates [2]. Self-renewal of gastric cancer stem cells (GCSCs) is considered to be the underlying cause of the metastasis, drug resistance and recurrence of GC [3][4]. The identi cation and targeted therapy of cancer stem cells was of great signi cance in the treatment of GC [5]. The mRNAsi, which can be used as a quantitative representation of cancer cell stemness, is an indicator describing the degree of similarity between tumor cells and stem cells.
Moreover, inhibiting DNA methyltransferase can reduce the accumulation of tumorigenic ability of GCSCs [6]. The transcriptomes and methylomes were analyzed on multiple platforms to quantify stemness, and the DNA methylation based stemness index (mDNAsi) and mRNAsi were obtained; this analysis was also applied to the TCGA dataset to calculate their scores of those samples [7]. Weighted gene coexpression network analysis (WGCNA) determines the correlation between genes based on systematic biological methods; it is used not only to build gene networks and detect modules but also to identify key genes [8][9]. WGCNA has been widely used to screen key genes related to the clinical characteristics of many tumour types, such as pancreatic cancer and kidney cancer [10][11]. In addition, the genes that regulate the maintenance and proliferation of GCSCs remain unknown. This study aimed to identify key genes related to stemness by combining WGCNA with GC mRNAsi in TCGA, thus providing new ideas for the treatment of GC.

Materials And Methods
Data download and processing RNA sequencing results of 373 tissues and 348 human gastric adenomas and adenocarcinoma samples were obtained from TCGA database (https://portal.gdc.cancer.gov). The RNA-seq results of 30 normal samples and 343 cancer samples were merged into a matrix le using a script in the Perl language (http://www.perl.org/). We then converted the Ensembl ID in the matrix le to the gene name using the Ensembl database (http://www.ensembl.org/index.html) and the Perl language script. In addition, 406 pieces of clinical data were downloaded, and the relevant clinical data were collated and extracted using scripts in the Perl language. The calculation of mRNAsi was performed from the molecular spectrum of normal cells with different degrees of stemness [7].

Screening of differentially expressed genes (DEG)
The statistical software R (version 3.6.0, https://www.r-project.org/), limma package and Wilcoxon test were used to screen DEG expression data between normal gastric tissue and GC samples. The values of the genes with the same name were averaged and the genes with expression levels < 0.2 were deleted.
The false discovery rate (FDR) < 0.05 and |log 2 fold change| > 1 were used as screening criteria.

Construction of WGCNA
The WGCNA R package was used to perform the WGCNA. The normal sample was deleted rst, and the data were checked next for missing values. If a value was missing, the data was deleted. First, the similarity matrix was constructed by calculating the correlation of all genes. Second, the WGCNA software package was used to select appropriate soft-thresholding power β to improve coexpression similarity and achieve a scale-free topology. Third, the adjacency of genes was transformed into a topological overlap matrix (TOM), and then the corresponding dissimilarity was calculated. Average linkage hierarchical clustering was conducted with TOM-based dissimilarity measurements, and the minimum size of the gene dendrogram was 30. Finally, the dissimilarity of the genes was calculated and module dendrograms were built.

Identi cation of important modules and key genes
Gene signi cance (GS) refers to the correlation between genes and sample traits. Statistical signi cance was determined using the associated p values. To reduce the number of modules, we set a threshold (< 0.25) to merge some modules that were highly similar. The gene modules were then analyzed in combination with the clinical phenotype. Module membership (MM) refers to the correlation between a module's own genes and the gene expression pro le. After selecting the module of interest, GS and MM for each key gene were calculated. We de ned cor. gene MM > 0.8 and cor. gene GS > 0.5 as the threshold for the key genes of the module.

Differential expression and correlation of key genes
We used the ggpubr package of the R language to draw a box plot of differential gene expression to understand the differential expression of key genes. The pheatmap package was used to create heat maps of key genes. Correlation analysis of key genes was performed using the corrplot package.

Protein-protein Interaction (PPI) network analysis of key genes
To explore the interaction between key genes, we imported the key genes into the Search Tool for the Retrieval of Interacting Genes (STRING, http://string.embl.de/) database [12]. The PPI network of key genes, which included genes with a combined score of > 0.4, was constructed using STRING.

Functional and pathway enrichment analyses of key genes
To study the biological functions of the module genes and key genes, we used the clusterPro ler package for Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses [13]. P < 0.05 and FDR < 0.05 were used as thresholds for identifying signi cant GO terms and pathways. A histogram of GO enrichment results for the key genes was plotted using the ggplot2 package. GO includes cellular component (CC), biological process (BP) and molecular function (MF).

Prognostic analysis of key genes
To understand the relationship between the expression of key genes and prognosis, we used the online tool Kaplan-Meier Plotter (http://kmplot.com/) for survival analysis. A log-rank P < 0.05 was considered statistically signi cant. In addition, we used the online database Oncomine (http://www.oncomine.com) to assess the expression of key genes in GC.

Differential analysis of mRNAsi and screening of DEG
The Wilcoxon test showed a signi cant difference in the mRNAsi of normal gastric tissue and that of gastric tumor tissue (p < 0.0001, Fig. 1A). A total of 6,685 DEGs were identi ed by Wilcoxon test analysis, among which 5,479 were upregulated and 1,206 were downregulated (Supplementary Table 1). The 20 with the most signi cant upregulation and the 20 with the most signi cant downregulated were mapped using the pheatmap package (Fig. 1B).

WGCNA: Identi cation of the modules of interest
To identify biologically important gene modules and better understand genes associated with gastric cancer stemness, we used the WGCNA package to construct a gene coexpression network. Ultimately, we obtained 16 modules (Fig. 1C); the brown module was most positively correlated with mRNAsi with a correlation coe cient of 0.75. In addition, the turquoise module exhibited a high negative correlation with mRNAsi with a correlation coe cient of -0.77 (Fig. 1D). Therefore, the brown module was used as the most interesting module, so we further analyzed this module.

Differential expression and correlation analysis of key genes
The expression of key genes differed signi cantly between normal gastric tissue and tumor tissue. The expression of key genes in gastric tumors was signi cantly higher than that in normal gastric tissues (p all < 0.001, Fig. 2A). There was a clear correlation between the expressions of key genes, and CENPF had the highest correlation with KIF14 (Fig. 2B).
GO and KEGG analyses of key genes GO and KEGG pathway enrichment analysis was performed using the clusterPro ler package. The most signi cantly enriched GO terms were spindle (ontology: CC), sister chromatid segregation (ontology: BP) and motor activity (ontology: MF) (Fig. 3). The signi cantly enriched KEGG pathways have cell cycle and homologous recombination (Table 1). Prognostic analysis of key genes Survival analysis of key genes was performed using Kaplan-Meier curves, which showed that all key genes except KIF14 and KIF18B were related to prognosis. GC patients with high expression of RAD54L, TPX2 and XRCC2 had short survival times, while GC patients with high expression of other key genes had long survival times (Fig. 4). However, differential analysis showed that 16 key genes were highly expressed in GC tissues. We further analyzed 16 key genes using the online database Oncomine, which showed that 16 key genes were highly expressed in GC tissues (Fig. 5).

Discussion
Metastasis, drug resistance and recurrence are the main causes of the low survival rate of patients with GC. Self-renewal of GCSCs is considered to be the underlying cause of the metastasis, drug resistance and recurrence of GC [3][4]. After conventional surgical treatment or adjuvant chemotherapy, the number of GCSCs does not decrease but stemness is enriched, leading to metastasis, drug resistance and recurrence of GC [3]. Moreover, Bekaii-Saab et al. [5] indicated that identi cation and targeted therapy of cancer stem cells had important signi cance in the treatment of GC. Therefore, targeted therapy of GCSCs is crucial. This study used WGCNA based on the mRNAsi score to identify key genes related to GCSC characteristics. This study found that the mRNAsi of gastric tumor tissue was signi cantly higher than that of normal gastric tissue. In addition, we used the WGCNA package to construct a gene coexpression network. Finally, we obtained a total of 16 modules; the brown module was the most positively correlated with mRNAsi, and the turquoise module showed a high negative correlation with mRNAsi. Therefore, we selected the brown module as the module of most interest and identi ed 16 key genes that are associated with the characteristics of GCSCs: BUB1, BUB1B, NCAPH, KIF14, RACGAP1, RAD54L, TPX2, KIF15, KIF18B, CENPF, TTK, KIF4A, SGOL2, PLK4, XRCC2 and C1orf112. These genes were upregulated in GC and had a certain correlation. The most signi cantly enriched GO terms of these key genes were the spindle cellular component, the sister chromatid segregation biological process and the motor activity molecular function. The signi cantly enriched KEGG pathways were cell cycle and homologous recombination. Except for KIF14 and KIF18B, the expression of all key genes was related to the prognosis of patients with GC.
BUB1 encodes a protein that is essential for the function of the mitotic spindle check-point [14]. Grabsch et al. [15] showed that BUB1 was overexpressed in up to 80% of GCs and was associated with tumor cell proliferation. Meanwhile, related studies indicated that high expression of BUB1B was related to the invasion, lymph node metastasis, liver metastasis and recurrence of GC [16]. Kinesins are a family of molecular motors that play important roles in intracellular transport and cell division [17]. Previous studies demonstrated that KIF14 can enhance tumor adhesion, invasion and chemical resistance during tumor development [18]. Tong et al. [19] also revealed that KIF14 was overexpressed in GC and that this was related to tumor progression, invasion and metastasis. Spheroid colony formation is an effective model for the characterization of cancer stem cells [4]. Oue et al. [20] showed that KIF15 and KIF4A were upregulated in GC spheroid cells. A previous study also demonstrated that patients with high expression of KIF14 mRNA had a higher risk of malignancy, recurrence and metastasis than did those without high expression of KIF14 mRNA [21]. Overexpression of KIF18B increases the proliferation of hepatocellular carcinoma cells, and downregulation of KIF18B in vitro can inhibit the migration and invasion of cervical cancer cells [22]. KIF4A is considered to be an oncogene, indicating its involvement in malignancy, and its expression was indeed related to the occurrence, development and metastasis of GC [23]. Rac GTPase accelerating protein1 (RacGAP1), which belongs to the GTPase activation protein family, not only induces cytokinesis but also interferes with mitotic spindle assembly, and thus participating in the regulation of cell proliferation [24]. Studies have indicated that RacGAP1 is involved in cell transformation, movement, migration and metastasis [25] and is positively correlated with the proliferation marker Ki67 [26]. In addition, RacGAP1 is highly expressed in GC and signi cantly associated with tumor progression and poor prognosis [27]. CENPF is a member of the mitochondrial family and regulates the proliferation of various tumor cells [28]. Chen et al. [29] showed that CENPF was overexpressed in GC and promoted its proliferation and metastasis. TTK is also known as a mitotic kinase [30], which is highly expressed in many types of tumors and promotes the proliferation of cancer cells [31][32]. Park et al. [33] found that TTK plays an indispensable role in the development and maintenance of tumor stem cells. PLK4 is a major regulator of centromeric repeats, and its overexpression in somatic cells can disrupt spindle formation, while its depletion can inhibit cell proliferation [34]. Shinmura et al. [35] revealed that PLK4 is overexpressed in GC and plays an important role in cell proliferation, tumorigenesis, invasion and drug resistance. C1ORF112 is overexpressed in various tumors, including GC, and plays an important role in the growth of cancer cells, but it is still unclear whether C1ORF112 is particularly necessary for the proliferation of cancer cells [36]. Condensins are multiprotein complexes that play a central role in chromosome assembly and segregation during mitosis and meiosis [37]. Non-SMC condensin I complex subunit H (NCAPH) is one of the three non-SMC subunits of condensation I. Studies have indicated that non-SMC condensin I complex subunit G (NCAPG) is overexpressed in colon cancer and hepatocellular carcinoma and promotes the proliferation and migration of tumor cells [38].
In addition, survival analysis and Oncomine analysis revealed that the prognosis of patients with GC and the expression of three genes (RAD54L, TPX2 and XRCC2) were consistently related. XRCC2 plays a crucial role in DNA repair and chromosome arrangement, and its dysfunction may lead to tumor development and progression [39][40]. Wang et al. [41] found that XRCC2 is overexpressed in GC. RAD54L and XRCC2 are homologous recombination factors that are overexpressed in lung cancer and GC, and there is a signi cant correlation between the expression levels of these two proteins [42]. The inactivation of these factors leads to an obvious de ciency in homologous recombination and DNA repair in all eukaryotes [43]. In addition, related studies indicated that abnormal function of homologous recombination leads to genomic instability, resulting in the occurrence of cancer [44][45]. In this study, KEGG enrichment analysis indicated that RAD54L and XRCC2 were signi cantly enriched in homologous recombination. TPX2 is a microtubule-associated protein that activates the cell cycle kinase protein Aurora-A and then plays an important role in the formation of spindles in mitosis [46]. Liang et al. [47] showed that TPX2 was overexpressed in GC and was related to poor progression and prognosis of GC.
In conclusion, this study found that the 16 key genes identi ed play an important role in the maintenance of GCSCs. In addition, RAD54L, TPX2 and XRCC2 are most likely to be therapeutic targets for inhibiting the stemness characteristics of GC, providing a new idea for the treatment of GC. However, further research is needed to verify this hypothesis.

Availability of data and materials
The results shown here are in whole or part based upon data generated by the TCGA Research Network: http://cancergenome.nih.gov/.

Competing interests
The authors declare no competing nancial interests.
Funding    Survival curves of key genes using the online tool Kaplan-Meier Plotter. The red indicates high expression and black indicates low expression. A log-rank P < 0.05 was considered statistically signi cant.

Figure 5
Expression of key genes in gastric cancer based on the Oncomine online database. The key genes were highly expressed in gastric cancer tissues.