Bioinformatics Analysis Reveals Biomarkers With Prognostic Benefits in Diffuse Type Gastric Cancer

Abstract

Gastric cancer (GC) can be classified under several classification systems, such as the Borrmann classification, the Lauren classification, and the World Health Organization (WHO) classification (4)(5)(6).
In recent years, with advances in medicine and a deeper understanding of the disease, some scholars have tried to classify GC at the level of molecular and genetic features, as in The Cancer Genome Atlas (TCGA) classification (7) and the Asian Cancer Research Group (ACRG) classification (8). Since the Lauren classification was proposed in 1965, it has been widely recognized by clinicians and pathologists and remains in use today. The Lauren classification divides GC into intestinal, diffuse and mixed types based on tissue structure, biological behavior and epidemiological characteristics (5). Histologically, intestinal type GC cells are large, have clear boundaries, vary in morphology and are closely arranged, exhibiting tubular and glandular differentiation. In contrast, diffuse type GC cells are typically scattered and often appear as solitary cells or in small clusters because they lack adhesion; this is why gland formation is hard to observe in the tumor tissue and why diffuse type GC disseminates easily. The mixed type has all of the above characteristics. Epidemiologically, the intestinal type is the most common type, has the highest five-year survival rate, and is more common in men and the elderly, while the diffuse type is more likely to occur in women and younger patients and has a lower five-year survival rate (9)(10)(11). The mixed type has the highest degree of malignancy because of its changeable biological behavior.
In recent years, with the development of medicine and bioinformatics, high-throughput sequencing has become a common tool for medical research (12). Researchers can upload gene expression profile chip data to the Gene Expression Omnibus (GEO) database of NCBI. Reanalyzing and reintegrating those datasets can provide meaningful clues for new research. A series of GC microarray datasets have been developed in recent years (13)(14)(15), and a large number of meaningful differentially expressed genes (DEGs) have been found.
In this study, we downloaded the GSE62254 dataset from GEO and screened out DEGs using the "limma" and "survival" packages in R. Subsequently, we performed Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses of the DEGs and identified key biological features and signaling pathways. Moreover, we constructed a protein-protein interaction network of the diffuse type DEGs and screened out 3 hub genes with the Cytoscape tool. Finally, we used Kaplan-Meier analysis to evaluate the overall survival of patients with aberrant expression levels of the hub genes.

Differentially Expressed Gene Analysis
The RMA algorithm in the R environment (v3.6.1) (17) was used to normalize all raw data and transform them to expression values. Differentially expressed genes (DEGs) between diffuse and intestinal subtype samples were screened using the "limma" package in R (18). The cut-off criteria were P < 0.05 and |log2FC| > 0.585: DEGs with log2FC above 0.585 were identified as diffuse subtype-specific genes, whereas those with log2FC below -0.585 were identified as intestinal subtype-specific genes. To identify genes with prognostic value, the "survival" package in R was used to perform Cox regression analysis (19) and obtain the hazard ratios (HRs) and P values of all genes in GSE62254. Genes with P < 0.05 were identified as overall survival (OS) related genes. Genes shared between the subtype-specific genes and the OS related genes were then defined as diffuse/intestinal type DEGs, and Venny's online software (http://bioinfogp.cnb.csic.es/tools/venny/index.html) was used to draw Venn diagrams.
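
The screening logic above can be sketched in a few lines. This is an illustrative Python sketch, not the authors' R code (the study used the "limma" and "survival" packages); the function names and the toy gene table are hypothetical. It only shows how the stated cut-offs partition genes and how the overlap with the OS related genes is taken. Note that a log2FC of 0.585 corresponds to roughly a 1.5-fold change.

```python
# Illustrative sketch only: the study itself used limma and survival in R.
# Function names and the toy values below are hypothetical.

LOG2FC_CUTOFF = 0.585  # 2 ** 0.585 is approximately 1.5, i.e. a 1.5-fold change
P_CUTOFF = 0.05


def classify_gene(log2fc, p_value):
    """Label one gene with the cut-offs stated in the text."""
    if p_value >= P_CUTOFF:
        return "not significant"
    if log2fc > LOG2FC_CUTOFF:
        return "diffuse"
    if log2fc < -LOG2FC_CUTOFF:
        return "intestinal"
    return "not significant"


def subtype_degs(limma_table, os_related):
    """Intersect subtype-specific genes with OS related genes.

    limma_table: dict mapping gene -> (log2FC, P value)
    os_related:  set of genes with Cox regression P < 0.05
    """
    result = {"diffuse": set(), "intestinal": set()}
    for gene, (log2fc, p_value) in limma_table.items():
        label = classify_gene(log2fc, p_value)
        if label in result and gene in os_related:
            result[label].add(gene)
    return result


# Toy input with placeholder gene names (not data from GSE62254).
toy_limma = {
    "GENE_A": (1.20, 0.001),   # diffuse-specific and OS related
    "GENE_B": (-0.90, 0.010),  # intestinal-specific and OS related
    "GENE_C": (0.30, 0.001),   # fold change too small
    "GENE_D": (0.80, 0.200),   # P value too large
    "GENE_E": (0.70, 0.020),   # diffuse-specific but not OS related
}
toy_os = {"GENE_A", "GENE_B", "GENE_C"}
toy_result = subtype_degs(toy_limma, toy_os)
```

The set intersection in `subtype_degs` plays the role of the overlapping region of the Venn diagram.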

GO and KEGG enrichment analysis
The Database for Annotation, Visualization and Integrated Discovery (DAVID, http://david.ncifcrf.gov/), which provides a comprehensive set of functional annotation tools for investigators to understand the biological meaning behind large lists of genes, was used for GO functional annotation and KEGG pathway analysis of the diffuse/intestinal type DEGs. A P value < 0.05 was set as the cut-off criterion for significant terms.
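
To make the idea of an enrichment P value concrete, the following Python sketch computes a plain one-sided hypergeometric (Fisher-type) tail probability for one annotation term. It is an assumption-laden illustration: DAVID reports its own, more conservative variant of this test (the EASE score), so the numbers from this sketch would not match DAVID's output exactly. The function name and arguments are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_term, n_list, n_overlap):
    """One-sided hypergeometric P value for term over-representation.

    n_background: number of genes in the background set
    n_term:       background genes annotated with the term
    n_list:       genes in the submitted list (e.g. the DEG list)
    n_overlap:    submitted genes annotated with the term

    Returns P(X >= n_overlap) when n_list genes are drawn at random.
    This is the plain test, NOT DAVID's EASE score.
    """
    total = comb(n_background, n_list)
    upper = min(n_term, n_list)
    tail = sum(
        comb(n_term, k) * comb(n_background - n_term, n_list - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def is_significant(p_value, cutoff=0.05):
    """Apply the P < 0.05 cut-off stated in the text."""
    return p_value < cutoff
```

A term would be retained when `is_significant(hypergeom_enrichment_p(...))` is true for its counts.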

PPI network construction and screening of hub genes of diffuse type GC
The diffuse type DEGs were submitted to the Search Tool for the Retrieval of Interacting Genes (STRING) database to obtain protein-level interaction data, also known as protein-protein interaction (PPI) information; a minimum required interaction score > 0.700 (high confidence) was considered significant. The results were then imported into Cytoscape software to visualize the PPI network, and the CytoHubba plugin was used to select the top 15 genes by each of four different algorithms. Genes that overlapped among the four top-15 lists were defined as hub genes.
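
The hub gene selection step can be sketched as follows, assuming that "overlapping" means a gene appears in the top 15 of every one of the four rankings. This Python sketch does not reproduce CytoHubba's scoring algorithms; it takes precomputed score tables as input, and the function names and toy scores are hypothetical.

```python
def top_k(scores, k=15):
    """Return the k highest-scoring genes (ties broken by gene name)."""
    ranked = sorted(scores, key=lambda gene: (-scores[gene], gene))
    return set(ranked[:k])


def overlapping_hubs(score_tables, k=15):
    """Genes found in the top k of every score table.

    score_tables: list of dicts, one per ranking algorithm,
                  each mapping gene -> score.
    Assumption: a hub gene must be in ALL top-k lists.
    """
    tops = [top_k(scores, k) for scores in score_tables]
    return set.intersection(*tops) if tops else set()


# Toy example with placeholder genes and two hypothetical rankings, k = 2.
method_1 = {"GENE_A": 5.0, "GENE_B": 4.0, "GENE_C": 1.0}
method_2 = {"GENE_A": 3.0, "GENE_B": 1.0, "GENE_C": 2.0}
toy_hubs = overlapping_hubs([method_1, method_2], k=2)
```

With four score tables and `k=15`, the returned set corresponds to the overlap described above.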

Patients' information and tissue samples
A total of 40 GC patients who received a gastrectomy in The Third Affiliated Hospital of Anhui Medical University (Hefei, Anhui, China) between December 2016 and July 2018 were recruited in this study. None of them received radiotherapy or preoperative chemotherapy before surgery. All specimens were handled and made anonymous according to ethical and legal standards. Tissue samples were collected during surgery for GC and were confirmed by pathological examination. There were 20 cases each of diffuse type and intestinal type gastric cancer. All fresh tumor tissue specimens were collected from formalin-fixed paraffin-embedded tissues of surgical resection procedures.

Immunohistochemical analysis
Immunohistochemistry was performed to determine the expression of AGT, CXCL12 and ADRB2 in human diffuse type GC tissues and intestinal type GC tissues. Paraffin-embedded tissue sections were passed through dimethylbenzene and a graded ethanol series to deparaffinize and rehydrate them.

RNA extraction and quantitative real-time polymerase chain reaction (qRT-PCR)
qRT-PCR was used to verify and compare the expression levels of the three hub genes (AGT, CXCL12 and ADRB2) between diffuse type GC tissues and intestinal type GC tissues. Total RNA was extracted with TRIzol reagent (Invitrogen, USA) according to the manufacturer's instructions, and RNA purity was measured using a microplate reader (Infinite M1000 PRO, TECAN). A PrimeScript RT reagent kit (Takara, Japan) was used for the complementary DNA synthesis reactions. qRT-PCR was performed with SYBR® Premix ExTaq™ (Takara, Japan) in an ABI Prism 7500 Sequence Detection System (Applied Biosystems, Foster City, CA, USA). Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) was used as the reference gene.
Primers were as follows:

To improve the reliability of the results, we selected the GSE62254 and GSE15459 datasets separately as research targets, with a P value < 0.05 set as the cut-off criterion.

Results
The flowchart of the bioinformatics analysis is presented in Figure 1. The GSE62254 dataset included a total of 300 GC samples of different Lauren subtypes. A total of 265 samples with a definite Lauren subtype and definite survival data were selected, including 128 diffuse type GC samples and 137 intestinal type GC samples. Details of these samples are shown in the supplementary materials (Table S1). According to the screening criteria of |logFC| ≥ 0.585 and adjusted P value < 0.05, 584 differentially expressed genes (DEGs), including 458 upregulated genes in the diffuse type and 122 downregulated genes in the intestinal type, were screened out using the "limma" package in R; these DEGs are presented in a volcano plot (Figure 2A). To identify genes with prognostic value, the "survival" package in R was used to perform Cox regression analysis and obtain the HRs and P values of all genes in GSE62254. A total of 7389 genes with P < 0.05 were identified as OS related genes. The online tool Venny was used to construct the Venn diagram of the DEGs and OS related genes. A total of 293 diffuse type DEGs and 62 intestinal type DEGs were selected for further research (Figure 2B).
The diffuse type and intestinal type DEG lists were uploaded separately to the DAVID website for GO function and KEGG pathway analysis; results were considered significant if the P value was < 0.05. GO analysis showed that the diffuse type DEGs were mainly enriched in cell adhesion (ontology: BP), extracellular exosome (ontology: CC), and calcium ion binding (ontology: MF), while the intestinal type DEGs were mainly enriched in cell division (ontology: BP), nucleus (ontology: CC) and protein binding (ontology: MF). Details of the results are shown in Figure 3 and Figure 4. The top 15 results from the GO enrichment analysis of the subtype-specific DEGs are shown in Table 1.
The results of the KEGG pathway analysis are shown in Figure 5 and Table 2. The diffuse type DEGs were mainly enriched in the cGMP-PKG signaling pathway, while the intestinal type DEGs were mainly enriched in the cell cycle pathway.
To further explore and identify subtype-specific genes in diffuse type GC, the 293 diffuse type DEGs were uploaded to the STRING online database to construct a protein-protein interaction (PPI) network; 112 nodes and 182 interactions were involved in the PPI network (Figure 6). The results were downloaded and analyzed in Cytoscape software, and the top 15 hub genes were ranked by predicted score using each of the four different algorithms of the CytoHubba plugin. Overlapping genes were considered significant, and a total of 3 overlapping hub genes were selected for further analysis: AGT, CXCL12 and ADRB2 (Table 3).
Immunohistochemistry analysis showed that the distribution density of AGT, CXCL12 and ADRB2 is related to the Lauren type of GC. Compared with intestinal type GC tissues, diffuse type GC tissues showed stronger staining for AGT, CXCL12 and ADRB2 (Figure 7).
The qRT-PCR results were consistent with the immunohistochemistry results. The expression levels of AGT, CXCL12 and ADRB2 were significantly higher in the diffuse type GC tissues than in the intestinal type GC tissues (p < 0.01 for AGT, p < 0.001 for CXCL12, p < 0.01 for ADRB2) (Figure 8).
To evaluate the prognostic value of the 3 hub genes in diffuse type GC, we performed a Kaplan-Meier analysis of overall survival (OS) with Kaplan Meier Plotter (https://kmplot.com/analysis/). To improve the reliability of the results, we selected the GSE62254 and GSE15459 datasets separately as research targets. In GSE62254, high expression of AGT (logrank p=0.0048), CXCL12 (logrank p=0.0027) and ADRB2 (logrank p=0.014) indicated a poor prognosis for diffuse type GC patients. In GSE15459, high expression of AGT (logrank p=0.00056) and ADRB2 (logrank p=0.0012) gave similar results, indicating a poor prognosis for diffuse type GC patients, while the expression of CXCL12 (logrank p=0.093) was not correlated with prognosis (Figure 9).
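
For readers unfamiliar with the method, the Kaplan-Meier estimate behind such survival curves can be sketched in plain Python. This is a minimal illustration of the product-limit estimator, not the implementation used by Kaplan Meier Plotter; the function name and the toy follow-up data are hypothetical. In practice one curve is computed for the high-expression group and one for the low-expression group, and the two are compared with a log-rank test.

```python
def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) survival estimate.

    times:  follow-up time of each patient
    events: 1 if the patient died at that time, 0 if censored

    Returns a list of (time, survival probability) pairs,
    one for each distinct time at which a death occurred.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        n_at_risk -= removed
    return curve


# Toy follow-up data in months (placeholder values, not patient data).
toy_times = [5, 8, 8, 12, 15]
toy_events = [1, 1, 0, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
```

Censored patients lower the number at risk without lowering the survival estimate, which is why the curve only steps down at times where a death occurred.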

Discussion
GC is a highly heterogeneous disease. Since the Lauren classification was proposed in 1965, it has been widely recognized by clinicians and pathologists and remains in use today. For many years, the value of histopathologic classification in evaluating the prognosis of GC has been very limited, and the Lauren classification is considered the most valuable clinicopathological classification. There are significant differences between the Lauren subtypes (20)(21)(22), which suggests that some specific biomarkers might play an important role in the genesis and development of GC. Although there are many studies on the biological mechanisms of GC, few have focused on specific GC subtypes.
To explore the Lauren subtype-specific genes of GC, our study selected the GSE62254 dataset and screened out 266 samples of diffuse or intestinal GC with definite survival data. A total of 598 DEGs were screened out using R, including 293 diffuse type DEGs and 62 intestinal type DEGs. To explore in depth the biological pathways and functions in which these DEGs are involved, we performed GO and KEGG analyses. To find the key genes for diffuse type GC progression among the numerous DEGs, we identified the top 15 hub genes through the PPI network and Cytoscape using 4 different algorithms and took the overlapping genes, AGT, CXCL12 and ADRB2, as the research objects. To validate these results, we used Kaplan-Meier curves to analyze the association between the expression of the 3 hub genes and OS, and the results showed that all 3 hub genes were related to the OS of diffuse type GC in other datasets. Therefore, the results indicate that these 3 hub genes may be new diagnostic and prognostic biomarkers for diffuse type GC.
The AGT gene encodes pre-angiotensinogen, or angiotensinogen precursor protein, which is mainly expressed in the liver and is cleaved by the enzyme renin in response to lowered blood pressure. The resulting product, angiotensin I (Ang I), is then cleaved by angiotensin converting enzyme (ACE) to produce angiotensin II (Ang II). In other words, the product of AGT is a key component of the renin-angiotensin system (RAS). Previous studies showed that RAS is involved in arterial hypertension, kidney disease, and other cardiovascular conditions (23)(24)(25). As research has deepened, more and more clinical studies support the view that RAS signaling promotes cancer growth and dissemination (26). RAS components are expressed in many cell types of the tumor microenvironment and directly affect cell proliferation, invasion, migration, metastasis, apoptosis, angiogenesis, cancer-associated inflammation and immunomodulation (26,27). RAS can directly or indirectly promote tumor growth in many ways, for instance by regulating cancer-associated fibroblasts (CAFs) (28) and promoting VEGF-mediated angiogenesis (29,30) in solid tumors. Considering the characteristics of diffuse type GC together with our analysis results, we speculate that AGT might be a potential indicator for the diagnosis and prognosis of diffuse type GC.
The CXCL12 gene, also known as stromal cell-derived factor 1 (SDF1), encodes a stromal cell-derived alpha chemokine member of the intercrine (chemokine CXC) family. The encoded protein, chemokine CXCL12, binds mainly to CXC receptor 4 (CXCR4) (31)(32)(33) and plays an essential role in many diverse cellular functions. CXCR4 is widely expressed on hematopoietic cells, embryonic pluripotent stem cells and several types of tissue-committed stem cells (34), which have direct or indirect proangiogenic properties. Evidence shows that the CXCL12/CXCR4 axis is associated with tumor progression, angiogenesis, metastasis, and survival. CXCL12 overexpression enhances the proliferation and invasion of colon cancer cells through the MAPK/PI3K/AP-1 signaling pathway (35). As a hub gene, high CXCR4 expression could be a biomarker indicating poor prognosis for hepatocellular carcinoma patients (36). CXCL12/CXCR4 antagonists, such as plerixafor or BKT140, have already been produced and display encouraging anti-cancer activity (37). However, according to our results, the relationship between CXCL12 expression and prognosis in human diffuse type GC was quite different between GSE62254 and GSE15459. In our opinion, the main reason is that the situation may vary with characteristics such as race, gender and Helicobacter pylori infection. More comprehensive and precise studies are needed in the future to explain the differences between the two datasets.
The ADRB2 gene encodes the beta-2-adrenergic receptor, which belongs to the G protein-coupled receptor superfamily. The ADRB2 protein stimulates adenylate cyclase through trimeric Gs proteins, increasing cAMP and downstream L-type calcium channel interaction, and thereby mediates physiological responses such as bronchodilation and smooth muscle relaxation. In recent years, some studies have pointed out that ADRB2 also plays an important role in many cancers. ADRB2 signaling negatively regulated autophagy, leading to hypoxia-inducible factor-1α stabilization and inducing sorafenib resistance in hepatocellular carcinoma (HCC) (38). Moreover, ADRB2 expression was associated with HCC outcomes (39). In prostate cancer, intact sympathetic nerves were essential for tumor formation, and high ADRB2 expression can activate an angio-metabolic switch and affect the phenotype of prostate cells and thereby their ability to migrate and invade (40). In GC, stress hormone-induced activation of the ADRB2 signaling pathway under chronic stress plays a crucial role in progression and metastasis (41), and ADRB2 signaling can regulate GC progression (42). Our study found that ADRB2 is highly expressed in diffuse type GC and has diagnostic and prognostic value.

Conclusions
In summary, we screened out 293 diffuse type DEGs from the GSE62254 dataset, which may contain hub genes contributing to the pathogenesis of diffuse type GC. The GO and KEGG enrichment analyses revealed that the DEGs were mainly enriched in extracellular exosome and the cGMP-PKG signaling pathway. Survival analysis showed that high expression of three of the top 15 hub genes, AGT, CXCL12 and ADRB2, is associated with a reduced survival time in patients with diffuse type GC. Immunohistochemistry and qRT-PCR showed that the expression of these hub genes was high in diffuse type GC tissues. According to the literature, these genes have a specific association with tumor invasion, metastasis and angiogenesis. Although we could not perform experimental research to probe potential oncogenic mechanisms, taking previous reports into consideration, we cautiously hypothesize that AGT, CXCL12 and ADRB2 overexpression contributes to an unfavorable prognosis in diffuse type GC.

Verification results of quantitative real-time polymerase chain reaction (qRT-PCR) of AGT, CXCL12 and ADRB2. ** indicates a significant difference between diffuse type GC tissues and intestinal type GC tissues (p < 0.01). *** indicates an extremely significant difference between diffuse type GC tissues and intestinal type GC tissues (p < 0.001).

Figure 3 Immunohistochemical detection of AGT, CXCL12 and ADRB2 expression in GC tissues of different Lauren types. Original magnification, ×400.

Protein-protein interaction (PPI) network of diffuse type DEGs. Node color: a deeper blue represents a higher logFC of the gene. Node size: a larger node means more connected genes.