DOI: https://doi.org/10.21203/rs.3.rs-150872/v1
Background: Due to the lack of effective drugs, gastric cancer(GC) has a high mortality rate among other cancers, with a low 5-year survival rate and an inferior prognosis. Thus, screening of meaningful tumor biomarkers or therapeutic targets could play a vital role in the diagnosis, treatment, prognosis, and follow-up of GC.
Methods: Gene expression profiles and comprehensive clinical information of 407 patients with GC were downloaded from The Cancer Genome Atlas (TCGA) database. GC-related single-cell RNA sequencing data from the GSE118916 dataset was downloaded from the Gene Expression Omnibus (GEO) database. The differentially expressed genes (DEGs) were screened from transcriptomic data in GC and normal samples by R language. The DAVID database was also used to analyze the functions and pathways of DEGs. After combining differential genes with patient survival information, target genes were identified. The interaction of DEGs in the protein-protein interaction (PPI) network was also studied.
Results: Our study identified a total of 209 differential genes, which might be positively related to GC. Gene Ontology (GO) analysis indicated numerous enrichment of DEGs in the extracellular matrix organization, extracellular structure organization, and muscle contraction. The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis showed that the DEGs were mainly enriched in focal adhesion, protein digestion and absorption, AGE-RAGE signaling pathway in diabetic complications. Further analysis showed the higher expression of Carboxypeptidase vitellogenic-like gene (CPVL) was related to the better prognosis of GC patients in both TCGA and the GEO database. FAM3 metabolism regulating signaling molecule D (FAM3D) and oxidized low-density lipoprotein receptor 1 (OLR1) were significantly associated with GC patients’ prognosis only in the GEO database. Lastly, the PPI network shows the gene expression proteins that interact most closely with CPVL protein.
Conclusion: Our study revealed that CPVL gene could be a promising target for the diagnosis and treatment of GC, which has a great significance for the future research on GC. In addition, we were the first to find a close relationship between FAM3D and GC.
Gastric cancer (GC) is a major health burden worldwide and the third leading cause of cancer-related death worldwide. Around 989,600 new cases and 783,000 GC related deaths were reported worldwide in 20181-3. The average 5-year survival rate of 90% of GC patients with adenocarcinomas is less than 20%, and GC patients have a poor prognosis4. Primary prevention is still a challenge5 because GC is mostly asymptomatic before it enters the advanced stage. Early detection with effective screening methods is crucial to reduce the mortality rate of GC6.
Biomarkers based detection is a relatively non-invasive and convenient diagnostic method, which has been widely used in clinical practice. However, the sensitivity and specificity of existing GC biomarkers are insufficient7-9. Currently, effective targeted therapies for GC are still focusing on the anti-human epidermal growth factor receptor-2 (HER2)10, and anti-vascular endothelial growth factor (VEGF) pathways11-13, more deep research studies and development of new molecular targeting drugs still need to be implemented. The study of biomarkers has a great potential in oncology research and plays an essential role in diagnosis, treatment, and prognosis.
However, heterogeneity of tumor cells was rarely distinguished from traditional RNA-seq data based on GC tissue. Single-cell RNA sequencing (scRNA-seq) is a high-throughput sequencing of the transcriptome of a large number of single cells in a single sample, enabling researchers to screen out essential genes responsible for the intercellular differences in the mixed cell population 14-19.
In the current study, scRNA-seq data were analyzed to identify differentially expressed genes (DEGs) related to GC in both conventional transcriptome data and scRNA-seq data. We used The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO) to incorporate transcriptome data and scRNA-seq data into the analyses and selected common DEGs. Then the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) functions were enriched for DEGs. The correlation between differential genes and survival rate was analyzed based on the clinical data of the GC patients. Finally, protein-protein interaction (PPI) analysis was carried out for the prognostic genes. These analyses’ results may provide new targets for further detailed studies on GC.
2.1. Microarray data
Gene expression profiles and complete clinical information of 407 patients with GC were downloaded from TCGA database. From Gene Expression Omnibus (GEO), NCBI's public genome database, and high-throughput Gene Expression data, we downloaded the information of 15 GC and 15 normal samples from the GSE118916 dataset, and also downloaded the information of 524 GC and 401 normal cells from the scRNA-seq data of GC in the GSE112302 dataset.
2.2. Data pre-processing and DEGs screening
After pre-processing, such as data standardization, we used the R computing environment of the SVA software package for batch normalization20. Next, we performed the differential analysis(|Log2FC| >1, adjusted p-value < 0.05) by comparing tumor to normal samples in the R computing environment using limma package21. Then we did a Venn diagram to pick the genes that were different among GC and normal samples, and a total of 209 DEGs were selected.
2.3. Gene ontology and functional enrichment analysis
The Gene Ontology (GO) is composed of biological process (BP), cellular component (CC), and molecular function (MF). To find out the function of DEGs, the clusterProfile22 was used to conduct GO enrichment analysis23 and KEGG enrichment analysis24 of 209 DEGs in the R computing environment.
2.4. Survival analysis
In GC samples from the TCGA database, survival analysis in the R environment was performed using TCGA biolinks to determine whether these 209 genes were associated with prognosis. The Kaplan–Meier survival analysis was also used to assess the relationship between genes and GC patients’ prognosis in the GEO database. We used clinical information to map the survival curves of patients with high and low expression of differentially expressed genes (p <0.05) to screen out genes related to survival and prognosis time.
2.5. Analysis of protein interaction network
Search Tool25 was used to search for known protein interactions online. The confidence score >0.9. Then, we imported the protein interaction data into the Cytoscape software26.
3.1 Data download and difference analysis
The differentially expressed genes(DEGs) between tumor and normal samples were selected, and the volcanic map was drawn to show the DEGs. 209 DEGs were obtained from the TCGA database (|log2fold change | > 1, P < 0.05) (Fig. 1A). Then, 4258 DEGs were obtained from GSE118916 data set of GEO database (|log2fold change | > 0.5, P < 0.05) (Fig. 1B). 1500 DEGs were obtained from the scRNA-seq data from the GSE112302 data set, among which GIF, ATP4B, GKN1, REG3A, REG1B, PI3, PGA5, APOE, C1QB, TYROBP were the group of genes with the highest coefficient of variation between tumor and normal cells (Fig. 1C). Finally, 209 common DEGs in the three groups were selected (Fig. 1D).
3.2. Functional enrichment analysis
To further analyze the functions of the DEGs, functional enrichment analysis of GO and KEGG were used, and we found that in GO, as to biological process, the DEGs were significantly enriched in extracellular matrix organization, extracellular structure organization, muscle system process. About cellular component, DEGs significantly enriched in collagen−containing extracellular matrix, fibrillar collagen trimer, banded collagen fibril. For molecular function, DEGs significantly enriched in the extracellular matrix structural constituent, extracellular matrix structural constituent conferring tensile strength, platelet−derived growth factor binding (Fig. 2A&B). In KEGG, the DEGs were mainly enriched in focal adhesion, protein digestion and absorption, AGE-RAGE signaling pathway in diabetic complications (Fig. 2C&D).
3.3. Survival analysis
TCGA biolinks were used to draw the survival curve in the R environment, and we picked the six results with the lowest P value (Fig. 3A-F). It can be seen that the higher expression of CPVL was significantly correlated with the better prognosis of GC patients (P<0.05). To further evaluate the correlation between the six genes and GC patients’ prognosis, the Kaplan-Meier Plotter Database was used in the GEO Database. Then we found that CPVL, FAM3D, and OLR1 were associated with GC patients’ prognosis in the GSE15459 dataset (Fig.S1).
3.4. PPI network construction
Since CPVL was significantly correlated with GC patients’ prognosis in both TCGA and the GEO databases, we wanted to know the PPI network of CPVL. The DEGs are drawn on String for the PPI network, and the results are decorated in Cytoscape. Up-regulated and down-regulated genes were labeled in red and blue (Fig.4A), and genes related to CPVL were extracted (Fig.4B). It can be seen that CPVL interacts with proteins expressed by FGL2, PLBD1, APOE, CTSB, PGC, CES2, NEXN to a certain extent. These genes may play an important role in the pathogenesis of GC.
At present, the etiology and molecular mechanism of GC need to be further studied, and the effect of early screening and diagnosis also needs to be improved. In recent years, bioinformatical analysis has increasingly been used to find new therapeutic targets and diagnostic markers for various cancers, which has been proven to be a vital step for the early prevention and treatment of cancers27.
In this study, at first we downloaded the gene expression profiles and complete clinical information of 407 patients with GC from TCGA database. Then we downloaded 15 GC and 15 normal samples in the GSE118916 dataset, and 524 GC and 401 normal cells from scRNA-seq of GC in the GSE112302 Database. The limma package was used to analyze the differences between the tumor and normal samples, and 209 DEGs were obtained. The intercellular differences in GC cells cannot be detected by a large number of RNA sequences because conventional transcriptome detection is the average of all cells in the sample. We assumed that the mechanism and prognosis might be only related to certain cells in which certain expression of the genes are highly varies than the normal cells. Using the scRNA-seq data to test highly variable genes (HVG) allows researchers to screen out key genes that cause differences between cells in a mixed cell population17-19. Thus, the 209 DEGs could be the key genes closely related to the mechanism and prognosis of GC.
We conducted functional enrichment analysis on 209 DEGs in GO and KEGG. We found substantial enrichment of DEGs in the extracellular matrix organization, extracellular structure organization, and muscle contraction in GO. In KEGG, DEGs were mainly concentrated in focal adhesion, protein digestion and absorption, AGE-RAGE signaling pathway in diabetic complications. Changes in ECM ( extracellular matrix) affect cell-ECM adhesion and cell-cell adhesion, reduce the epithelial characteristics of gastric epithelial cancer cells, and promote the proliferation of cancer cells and reduce the effect of chemotherapy28,29. Increased ECM density in the tumor microenvironment leads to increased integrin clustering and enhanced FAK (focal adhesion kinase) phosphorylation, which promotes cell invasion and proliferation30. Studies have shown that the AGE-RAGE signaling pathway enhances metastasis of human GC31. By combining the screened DEGs with the clinical information of GC patients in both TCGA and the GEO databases, it was further found that the higher expression of CPVL was related to the better prognosis of GC patients. Previous studies have shown that CPVL can digest phagocytic particles in lysosomes and participate in inflammatory reactions32, and its expression also increased in GC33. However, only a few studies have reported its other roles in GC. Our study confirmed the role of CPVL gene in GC survival. These results suggest that CPVL gene may be a promising therapeutic target for GC, which provides significance for future research on GC. Then we analyzed the relationship between the other five genes and prognosis in the GEO database. We found that FAM3D, OLR1 were near related to GC patients’ prognosis in the GSE15459 dataset, while they did not correlate with prognosis in TCGA database. As the P value of FAM3D in TCGA was close to 0.05 and no studies have shown that FAM3D is associated with GC, the relationship between FAM3D and GC was worthy of further study. Studies have shown that OLR1 is positively correlated with lymph node metastasis of GC34.
According to the PPI network diagram, it can be seen that CPVL interacts with proteins expressed by FGL2, PLBD1, APOE, CTSB, PGC, CES2, NEXN. Some studies have shown that overexpression of FGL2 (fibrinogen-like protein 2) induces epithelial to mesenchymal transformation in colorectal cancer and promotes tumor progression35. One study showed that PLBD1 (Phospholipase B domain containing 1) was highly expressed in various pancreatic cancer cells and could be a promising biomarker for pancreatic cancer36. Katsuya Sakashita et al. found that the survival time of patients with high expression of ApoE (apolipoprotein E) was shorter than that of patients with low expression37. CTSB (Cathepsin B) is overexpressed in many tumors such as GC and colon cancer38. PGC ( peroxisome proliferator-activated receptor γ coactivator) degrades the extracellular matrix and promotes tumor invasion and metastasis39. In regard to CES2 (Carboxylesterpase-2), low expression of CES2 has been associated with a higher survival rate in colon cancer40. NEXN (Nexilin) deficiency increases the expression of adhesion molecules and inflammatory cytokines41. So far, there are very few other studies on these genes and identifications of CPVL gene as a target for GC prognosis and diagnosis. These associated genes are the novel findings of our study.
Our study elucidated the relationship between CPVL and GC. We found that the higher expression of CPVL gene was associated with a better prognosis of GC, suggesting that CPVL may be a new target for the treatment of GC. CPVL interacts with proteins expressed by FGL2, PLBD1, APOE, CTSB, PGC, CES2, NEXN to a certain extent. Furthermore, our study is the first to find that FAM3D is closely related to GC, which has great research potential.
Gastric cancer
The Cancer Genome Atlas
Gene Expression Omnibus
Differentially expressed genes
Protein-protein interaction
Gene Ontology
Kyoto Encyclopedia of Genes and Genomes
Carboxypeptidase vitellogenic-like gene
Oxidized low-density lipoprotein receptor 1
Human epidermal growth factor receptor-2
Vascular endothelial growth factor
Single-cell RNA sequencing
Biological process
Cellular component
Molecular function
Highly variable genes
Extracellular matrix
Focal adhesion kinase
Fibrinogen-like protein 2
Phospholipase B domain containing 1
Apolipoprotein E
Cathepsin B
Peroxisome proliferator-activated receptor γ coactivator
Carboxylesterpase-2
Nexilin
Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
Funding
The design of the study was supported by the funding of Zhengzhou Major Collaborative Innovation Project (No.18XTZX12003); the collection, analysis, and interpretation of data were supported by the funding of Key projects of discipline construction in Zhengzhou University (No.XKZDJC202001).
Author information
Affiliation
Henan Key Laboratory for Helicobacter pylori & Microbiota and GI cancer, Marshall Medical Research Center, The Fifth Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China
Shuang Gao, Fazhan Li, Pengyuan Zheng, and Yang Mi
Zhengzhou University School of Medical Sciences, Zhengzhou, Henan, China
Shuang Gao, Fazhan Li, Pengyuan Zheng, and Yang Mi
Department of Gastroenterology, The Fifth Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China
Shuang Gao, Fazhan Li, Pengyuan Zheng, and Yang Mi
Department of Gastrointestinal Surgery, The Fifth Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China.
Wanqing Wu and Yuming Fu
Department of Gastrointestinal Surgery, Henan Cancer Hospital, The Affiliated Cancer Hospital of Zhengzhou University, Zhengzhou, Henan, China
Minghai Zhao
Department of Pediatrics, t The Fifth Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China
Youcai Tang
Contributions
Y M, S G and FZ L: Conceptualization, Writing- Original Draft; S G:Completing the manuscript; FZ L: Analyzing the data; Y M, PY Z, MH Z, WQ W, YM F, and YC T: Revising the manuscript. All authors read and approved the final manuscript.
Corresponding author
Correspondence to Yang Mi.
Ethics declarations
Ethics approval and consent to participate
Not applicable
Consent for publication
Not applicable
Competing interests
The authors report no declarations of interest.
Acknowledgments
The authors gratefully acknowledge all the participants.