The Identification of Hub Genes and Pathways in Type 2 Diabetes Mellitus by Bioinformatics Analysis

Background: This study aimed to identify potential core genes and pathways involved in type 2 diabetes mellitus (T2DM) through comprehensive bioinformatics analysis. The study elucidates parts of the pathogenesis of T2DM and screens for potential therapeutic targets.
Methods: The original microarray dataset GSE25724 was downloaded from the Gene Expression Omnibus database. Data were processed with the limma package in R, and the differentially expressed genes (DEGs) were identified. Gene Ontology (GO) functional analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis were carried out to identify potential biological functions and pathways of the DEGs. STRING (Search Tool for the Retrieval of Interacting Genes) and Cytoscape were used to establish a protein-protein interaction (PPI) network for the DEGs. Hub genes were identified using the PPI network.
Results: GO analysis showed that the DEGs were mainly enriched in the regulation of hormone levels and insulin secretion and in unfolded protein-related terms. KEGG analysis showed that the DEGs were enriched in fatty acid metabolism, propionate metabolism, and the degradation of valine, leucine, and isoleucine. The hub genes identified in the PPI network were SCG5, SNAP25, SCP2, CPE, and proprotein convertase subtilisin/kexin type 1 (PCSK1).

Omnibus (GEO) have been applied to mining the pathogenesis of various diseases. The islet is a cardinal organ in type 2 diabetes, and islet dysfunction is the central cause of diabetes mellitus. In this study, we aimed to identify potential hub genes and pathways involved in T2DM through comprehensive bioinformatics analyses of the GSE25724 microarray profiles of pancreatic islet cells obtained from healthy controls and patients with T2DM. The original microarray data were downloaded from the Gene Expression Omnibus database, and R software packages and bioinformatics analyses were used to explore the molecular mechanisms underlying the pathogenesis of diabetes.

Database selection:
The microarray dataset GSE25724, based on the GPL96 platform ([HG-U133A] Affymetrix Human Genome U133A Array), was obtained from the GEO database (www.ncbi.nlm.nih.gov/geo/). The GSE25724 dataset was provided by Veronica Dominguez et al. Human islets were isolated from 7 nondiabetic and 6 T2DM organ donors by collagenase digestion followed by density gradient purification, and these samples were used for the microarray analysis.
Methodology

Screening of differentially expressed genes
The R language was used to analyze the raw microarray data, and the normalizeBetweenArrays function of the limma package was applied to normalize the expression intensities. T-tests were then performed with the limma package to identify DEGs. DEGs were selected using the thresholds p-value < 0.05 and |log2 fold change (FC)| > 2. The EnhancedVolcano and heatmap packages were used to visualize the DEGs.
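The DEG selection rule described above can be sketched in a few lines. This is a minimal illustration in Python, not the limma workflow used in the study: the `GeneResult` record, the `select_degs` helper, and the gene names and values are all hypothetical, and the per-gene p-values and log2 fold changes are assumed to have been computed already.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    """One row of a differential expression table (hypothetical layout)."""
    symbol: str
    log2_fc: float   # log2 fold change, T2DM versus control
    p_value: float


def select_degs(results, p_cutoff=0.05, lfc_cutoff=2.0):
    """Keep genes with p < p_cutoff and |log2FC| > lfc_cutoff, labeled by direction."""
    degs = []
    for r in results:
        if r.p_value < p_cutoff and abs(r.log2_fc) > lfc_cutoff:
            direction = "up" if r.log2_fc > 0 else "down"
            degs.append((r.symbol, direction))
    return degs


# Made-up values for illustration only; these are not results from GSE25724.
example = [
    GeneResult("GENE_A", 2.4, 0.001),   # passes both thresholds, up-regulated
    GeneResult("GENE_B", -3.1, 0.010),  # passes both thresholds, down-regulated
    GeneResult("GENE_C", -1.5, 0.001),  # fails the fold-change threshold
    GeneResult("GENE_D", 2.8, 0.200),   # fails the p-value threshold
]

print(select_degs(example))  # [('GENE_A', 'up'), ('GENE_B', 'down')]
```

Both conditions must hold at once, so a gene with a very small p-value but a modest fold change (GENE_C in the sketch) is not counted as a DEG.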

GO function and KEGG pathway analysis
GO functional analysis is a widely used bioinformatics approach for annotating genes and proteins. The GO resource integrates annotation data and provides tools for accessing all the data produced by the project. KEGG integrates currently known protein interaction network information. In this study, we applied the createKEGGdb, org.Hs.eg.db, and clusterProfiler packages in R to characterize the biological functions of the DEGs. GO functional analysis was applied to annotate the DEGs in terms of biological processes (BP), cellular components (CC), and molecular functions (MF), and KEGG analysis was applied to annotate the pathways of the DEGs. The p-value cutoff was 0.05. The ggplot2 package was used to visualize the results of the GO and KEGG pathway analyses.
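Over-representation analyses of this kind are commonly based on a hypergeometric test, which asks how likely it is to see at least the observed number of DEGs in a GO term or pathway by chance. The sketch below illustrates that calculation in Python under stated assumptions: the function name `enrichment_p_value` and all counts are hypothetical, and the code is not the clusterProfiler implementation and applies no multiple-testing correction.

```python
from math import comb


def enrichment_p_value(universe, annotated, selected, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    universe  -- number of background genes
    annotated -- background genes annotated to the term or pathway
    selected  -- number of DEGs tested
    overlap   -- DEGs that are annotated to the term or pathway
    """
    total = comb(universe, selected)
    upper = min(annotated, selected)
    tail = sum(
        comb(annotated, i) * comb(universe - annotated, selected - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 75 DEGs, a term with 100 annotated genes in a
# background of 20000 genes, and 5 DEGs falling in that term.
p = enrichment_p_value(universe=20000, annotated=100, selected=75, overlap=5)
print(p < 0.05)  # compare against the 0.05 cutoff used in the text
```

The choice of background gene set changes the result, so the universe size should match the genes that could actually have been detected on the array.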

Protein-protein interaction network analysis
Since proteins rarely perform biological functions independently, it is important to understand protein interactions. STRING (Search Tool for the Retrieval of Interacting Genes; http://string-db.org/) is an online resource for interactions between genes and proteins. Cytoscape is an open-source tool for network visualization of genes and proteins. The protein-protein interaction (PPI) network of the DEGs was constructed from the STRING database and visualized with Cytoscape. Cytoscape was run with default parameters, and the degree of connectivity of each node in the network was measured by connectivity analysis. DEGs with a degree of connectivity ≥3 were defined as highly connected and were used to screen for core genes. The top 5 hub genes were selected.
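The degree-based screening described above can be illustrated with a short Python sketch. It assumes the PPI network is available as a list of undirected edges, and it ranks candidates by degree with ties broken alphabetically, which is an assumption of this sketch rather than a stated detail of the Cytoscape analysis. The helper names and the node labels P1 to P6 are hypothetical.

```python
def node_degrees(edges):
    """Count distinct interaction partners per node in an undirected network."""
    neighbors = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return {node: len(partners) for node, partners in neighbors.items()}


def top_hubs(edges, min_degree=3, top_n=5):
    """Return up to top_n nodes with degree >= min_degree, highest degree first."""
    degrees = node_degrees(edges)
    candidates = [(n, d) for n, d in degrees.items() if d >= min_degree]
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return candidates[:top_n]


# Toy network with placeholder node names; not the STRING network of the DEGs.
edges = [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P1", "P5"),
    ("P2", "P3"), ("P2", "P4"), ("P3", "P4"), ("P5", "P6"),
]

print(top_hubs(edges))  # [('P1', 4), ('P2', 3), ('P3', 3), ('P4', 3)]
```

Storing partners in sets means that an interaction listed twice is counted once, so duplicated edges do not inflate a node's degree.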

Identification of DEGs between T2DM and normal islet tissues
The dataset was normalized with the normalizeBetweenArrays function of the limma package, and duplicated genes and probes lacking specific gene symbols were then removed. A total of 75 DEGs were obtained, of which 1 gene was up-regulated and 74 genes were down-regulated. The DEGs are presented in the volcano plot (Fig. 1A), and the top 30 DEGs are shown in the heatmap (Fig. 1B).

GO biological process analysis and KEGG pathway enrichment
GO analysis covers molecular function (MF), biological process (BP), and cellular component (CC). In our study, GO analysis was used to characterize the functions of the DEGs. A P-value < 0.05 was used as the threshold for GO functional enrichment of the up- and down-regulated genes. The results are presented in Fig. 2A and Table 1. At the BP level, the DEGs were mainly enriched in the regulation of hormone levels, hormone secretion, hormone transport, hormone metabolism, and regulation of insulin secretion. KEGG pathway enrichment analysis showed that the DEGs were significantly enriched in the fatty acid metabolism pathway, the propionate metabolism pathway, and the valine, leucine, and isoleucine degradation pathway. These results are presented in Fig. 2B and Table 2.

PPI network construction
We used the STRING database (https://string-db.org) and Cytoscape software to construct the PPI network of the DEGs, which is shown in Fig. 3. SCG5, SNAP25, SCP2, CPE, and PCSK1 were the key genes in the PPI network.

Discussion
In recent years, with the rapid development of modern biotechnologies such as biochips and high-throughput sequencing and the increasing maturity of bioinformatics analysis, data analysis and the mining of candidate genes have gradually come to play a leading role in the study of disease progression. Bioinformatics analysis can provide fresh ideas for studying the pathogenesis of diseases and for screening therapeutic targets. The incidence of type 2 diabetes is rapidly increasing; nevertheless, its exact pathogenesis remains unclear.
Type 2 diabetes is a metabolic disease involving multiple genes. Exploring dysfunction at the molecular level, in particular by targeting key abnormal genes in the islet cells of patients with type 2 diabetes, allows differentially expressed genes and their related biological functions and signaling pathways to be analyzed effectively. This is extremely important for elucidating the pathogenesis of type 2 diabetes. Dominguez et al. isolated human islets from the pancreases of 7 nondiabetic and 6 type 2 diabetic organ donors by collagenase digestion followed by density gradient purification. They performed microarray analysis to evaluate differences in the transcriptome of type 2 diabetic human islets compared with nondiabetic islet samples. The platform is GPL96 ([HG-U133A] Affymetrix Human Genome U133A Array), and the GEO accession number is GSE25724. In our study, we extracted the expression data from GSE25724. Using the limma package in R, we screened for differentially expressed genes that may be associated with the development of type 2 diabetes. To further investigate the interactions between the DEGs, GO functional and KEGG pathway enrichment analyses were performed, and the hub genes were identified by PPI network analysis. We obtained 75 DEGs, the vast majority of which were down-regulated; the only up-regulated gene was SRY (sex-determining region Y)-box 4 (SOX4). The top 30 genes with the greatest differences are shown in Fig. 1. The GO analysis indicated that, at the biological process (BP) level, the DEGs were primarily enriched in the regulation of hormone levels, hormone secretion, hormone transport, hormone metabolic processes, and regulation of insulin secretion. At the molecular function (MF) level, the DEGs were primarily enriched in unfolded protein response.
KEGG pathway enrichment analysis showed that the DEGs were markedly enriched in the fatty acid metabolism pathway, the propionate metabolism pathway, and the valine, leucine, and isoleucine degradation pathway.
Increased circulating lipid levels, together with metabolic alterations arising from dysfunction of the fatty acid metabolic pathway and of intracellular signaling, have been associated with insulin resistance in the muscle and liver of diabetic patients [2]. An imbalance in fatty acid metabolism can lead to impaired glucose-stimulated insulin secretion (GSIS) with concomitant oxidative and metabolic stress, endoplasmic reticulum stress, and numerous pro-apoptotic signals, all of which decrease β-cell survival [3]. Propionate inhibits hepatic gluconeogenesis via the G protein-coupled receptor 43/AMP-activated protein kinase (GPR43/AMPK) signaling pathway [4]. Plasma levels of the branched-chain amino acids (valine, leucine, and isoleucine) increase in conditions related to insulin resistance, such as obesity and diabetes [5]. Higher circulating levels of branched-chain amino acids are strongly linked to a higher risk of type 2 diabetes [6][7]. Experimental studies have found that impairment of the adaptive unfolded protein response in mouse β-cells reduces protein transport from the endoplasmic reticulum to the Golgi and further increases β-cell death [8]. Our findings were generally consistent with those reports.
A PPI network of the DEGs was constructed using the STRING database and Cytoscape software. SCG5, SNAP25, SCP2, CPE, and PCSK1 were the hub genes. SCG5 encodes a secretory chaperone that prevents the aggregation of other secreted proteins, including those associated with neurodegenerative and metabolic diseases. It has been studied mostly for its role in the transport and activation of prohormone convertase 2. SCG5 acts as a molecular chaperone for proprotein convertase subtilisin/kexin type 2 (PCSK2, also known as prohormone convertase 2, PC2), preventing its premature activation in the regulated secretory pathway. SCG5 binds to inactive PCSK2 in the endoplasmic reticulum and facilitates its transport to later compartments of the secretory pathway, where it is proteolytically matured and activated. The changes in the type 2 diabetic phenotype of GK rats may be caused by the accumulation of multiple genetic variants, including variants in the SCG5 gene, and the mutated genes may affect biological functions including adipocytokine signaling, glycerolipid metabolism, PPAR signaling, T cell receptor signaling, and insulin signaling pathways [9]. The present study found that the SCG5 gene was mainly enriched in unfolded protein binding (GO:0051082), which might be linked to hormone precursor processing and insulin secretion. SNAP25 is associated with proteins that participate in vesicle docking and membrane fusion. SNAP25 regulates plasma membrane recycling through its interaction with centromere protein F (CENPF). SNAP25 also modulates the gating characteristics of the delayed rectifier voltage-dependent potassium channel KCNB1 (potassium voltage-gated channel, Shab-related subfamily, member 1) in pancreatic beta cells.
Studies have evaluated the possible role of SNAP25 polymorphisms in T2DM, suggesting that the minor SNAP25 rs363050 (G) allele, which results in reduced SNAP25 expression, is associated with altered glycemic parameters in patients with T2DM, possibly because reduced functionality of the exocytotic machinery leads to suboptimal release of insulin [10]. Tao Liang [11] found that SNAP23 is the ubiquitous SNAP25 isoform that mediates secretion in non-neuronal cells, similar to SNAP25 in neurons. Pancreatic islet β cells contain an abundance of both SNAP25 and SNAP23. SNAP23 depletion allows SNAP25 to bind calcium channels more quickly and for longer at sites where granule fusion occurs, increasing exocytosis efficiency. In this study, we found that the SNAP25 gene was mainly enriched in hormone transport (GO:0009914), which might be linked to insulin exocytosis and secretion.
SCP2 is a non-specific lipid-transfer protein that mediates the transfer of all common phospholipids, cholesterol, and gangliosides between cell membranes. SCP2 may play a role in the regulation of steroidogenesis. SCP2 protein levels were found to be significantly decreased in severely hypercholesterolemic diabetic animals. This differential expression of the sterol carrier protein SCP2 may accompany diabetic dyslipidemia and should be considered a potential contributing mechanism through which cholesterol metabolism may be altered in diabetes [12]. Our study found that the SCP2 gene was mainly enriched in coenzyme binding (GO:0050662), which may be involved in disorders of lipid metabolism in diabetes.
CPE encodes a member of the M14 family of metallocarboxypeptidases. It is a sorting receptor that directs hormone precursors into the regulated secretory pathway. It also serves as a hormone precursor processing enzyme in neural/endocrine cells, removing dibasic residues from the C terminus of peptide hormone precursors after initial endoprotease cleavage. Carboxypeptidase E is a peptide processing enzyme involved in the cleavage of numerous peptide precursors, including neuropeptides and hormones associated with appetite control and glucose metabolism, such as proinsulin. Diseases associated with CPE include hyperinsulinemia and insulinoma. CPE is involved in the biosynthesis of various neuropeptides and peptide hormones in endocrine tissues and the nervous system. Loss of normal CPE leads to various disorders, including diabetes, hyperinsulinemia, low bone mineral density, and deficits in learning and memory [13]. Truncating mutations in the CPE gene have been shown to cause morbid obesity, intellectual disability, abnormal glucose homeostasis, and hypogonadotrophic hypogonadism, revealing the importance of CPE in the regulation of body weight, metabolism, and brain and reproductive function in humans [14]. GO annotations associated with this gene include cell adhesion molecule binding, carboxypeptidase activity, and peptide hormone processing (GO:0016486).
PCSK1, also known as neuroendocrine convertase 1, encodes a member of the subtilisin-like proprotein convertase family, which includes proteases that process protein and peptide precursors trafficking through regulated or constitutive branches of the secretory pathway. PCSK1 is involved in the processing of hormone and other protein precursors at sites comprising pairs of basic amino acid residues. Its substrates include proopiomelanocortin (POMC), renin, enkephalin, dynorphin, somatostatin, and insulin.
The common genetic variants rs6232 and rs6235 within PCSK1 have been found to determine glucose-stimulated proinsulin conversion, but not insulin secretion. In addition, rs6232 influences glucose homeostasis and insulin sensitivity independently of body mass index (BMI) and proinsulin concentrations [15]. Rona JS et al. [16] identified nine genetic variants associated with fasting proinsulin, including one in PCSK1, which is associated with glucose homeostasis and T2DM development in humans; their findings argue against a direct role of proinsulin in coronary artery disease pathogenesis. Mayumi Enya [17] found that a genetic variant of PCSK1 may influence glucose homeostasis by altering insulin resistance independently of BMI, incretin level, or proinsulin conversion, and may be associated with the occurrence of type 2 diabetes in Japanese individuals. In our study, we found that the PCSK1 gene was mainly enriched in the regulation of hormone levels (GO:0010817). We conclude that the gene is related to the regulation of hormone levels and glucose homeostasis in diabetes mellitus.

Conclusion
In summary, the present data provide a comprehensive bioinformatics analysis of DEGs that might be linked to the progression of T2DM. We identified 75 candidate DEGs and 5 hub genes (SCG5, SNAP25, SCP2, CPE, and PCSK1) based on the profile dataset and bioinformatics analyses. To a certain extent, these findings could improve our understanding of the etiology and underlying molecular events of T2DM and provide a research direction and theoretical basis for revealing its molecular mechanisms and therapeutic targets. However, supplementary experiments in vitro and in vivo are needed to validate the roles of these screened genes and pathways in the progression of type 2 diabetes.

Declarations
Ethics approval and consent to participate
This analysis was based on a previously published study, and no ethical approval or patient consent was required.

Consent for publication
Written informed consent for publication was obtained from all participants.

Availability of data and materials
The datasets used or analyzed during the present study are available from the corresponding author on reasonable request.