Bioinformatics analysis of RNA-seq data revealed critical genes in type 1 diabetes and molecular docking studies

Background: Type 1 diabetes (T1D) is a serious threat to childhood life and has fairly complicated pathogenesis. Profound attempts have been made to enlighten the pathogenesis, but the molecular mechanisms of T1D are still not well known. Methods: To identify the candidate genes in the progression of T1D, Expression proling by high throughput sequencing dataset GSE123658 was downloaded from Gene Expression Omnibus (GEO) database. The differentially expressed genes (DEGs) were identied, and gene ontology (GO) and pathway enrichment analyses were performed. The protein-protein interaction network (PPI), modules, target gene - miRNA regulatory network and target gene - TF regulatory network analysis were constructed and analyzed using HIPPIE, miRNet, NetworkAnalyst and Cytoscape. Finally, validation of hub genes was conducted by using ROC (Receiver operating characteristic) curve and RT-PCR analysis. Molecular docking studies were performed. Results: A total of 284 DEGs were identied, consisting of 142 up regulated genes and 142 down regulated genes. The gene ontology (GO) and pathways of the DEGs include cell-cell signaling, vesicle fusion, plasma membrane, signaling receptor activity, lipid binding, signaling by GPCR and innate immune system. Four hub genes were identied and biological process analysis revealed that these genes were mainly enriched in cell-cell signaling, cytokine signaling in immune system, signaling by GPCR and innate immune system. ROC curve and RT-PCR analysis showed that EGFR, GRIN2B, POLR2A and PRKACA might be involved in the advancement of T1D. Molecular docking studies showed high docking score. Conclusions: DEGs and hub genes identied in the present investigation help us understand the molecular mechanisms underlying the advancement of T1D, and provide candidate targets for diagnosis and treatment of T1D.


Introduction
Type 1 diabetes (T1D) (insulin-dependent) is a core challenge for endocrine research around the world [1].
Approximately 5% to 10% of the childhood population is affected with T1D worldwide [2]. T1D affects the eyes, kidneys, heart, peripheral and autonomic nervous systems [3]. Pancreatic cells, particularly β-cells, play a key role in the occurrence and progression of T1D [4]. Although disease etiology still remains unknown. Genetic and environmental factors were associated with pathogenesis of T1D [5][6]. Nisticò et al [7] reported that up regulation of molecular biomarker CTLA-4 was linked with progression of T1D.
Thus, identifying key molecular biomarkers is essential for early diagnosis, prevention, and treatment of T1D.
It worth a lot of money and time to identify disease related molecular biomarkers by experiment alone.
With the wide application of microarray platform, there were huge genomics data deposited in public databases [8]. The progression of computational tools gives us an alternative method to diagnose novel molecular biomarkers.
In this investigation, we employed the bioinformatics approach to discover the differentially expressed genes between T1D patients and healthy donors. Original microarray dataset GSE123658 was downloaded. 39 T1D patients' samples and 43 healthy donors' samples were analyzed in our investigation. Commonly altered DEGs were isolated from integrated data. Additionally, GO/ REACTOME pathway analysis, construction of protein-protein interaction network, modules, target gene -miRNA regulatory network and target gene -TF regulatory network analysis were performed to analyze these data. Four hub genes (EGFR, GRIN2B, POLR2A and PRKACA) were identi ed. ROC (Receiver operating characteristic) curve and RT-PCR analysis were used to verify clinically relevant hub genes. The aim of this investigation was to gain a better understanding of the underlying molecular mechanisms and to discover molecular biomarkers for T1D.

Data resources
Expression pro ling by high throughput sequencing dataset GSE123658 was downloaded from the GEO database (http://www.ncbi.nlm.nih.gov/geo/) [9]. The data was produced using a GPL18573 Illumina NextSeq 500 (Homo sapiens). The GSE123658 dataset contained data from 82 samples, including 39 T1D patients' samples and 43 healthy donors' samples.

Identi cation of DEGs
The identi cation of DEGs between 39 T1D patients' samples and 43 healthy donors' samples was performed using Limma package in R bioconductor. To correct the discovery of statistically important molecular biomarkers and limitations of false-positives, we using the adjusted P-value and Benjamini and Hochberg false discovery rate method [10]. A |log2FC| > 1 and P-value < 0.05 were used as considered statistically signi cant. The volcano plot was implemented using ggplot2 package, and the heat map was established using gplots package in R language.
Gene Ontology (GO) and pathway enrichment analyses of DEGs Gene Ontology (GO) (http://www.geneontology.org) analysis is a routine analysis for annotating genes and determining biological component, including biological process (BP), cellular component (CC) and molecular function (MF) [11]. REACTOME (https://reactome.org/) [12] pathway database is applied for classi cation by correlating gene sets into their respective pathways. The ToppGene (ToppFun) (https://toppgene.cchmc.org/enrichment.jsp) [13] is a gene functional classi cation tool that objective to provide a extensive set of functional annotation tools for authors to recognize the biological explanation behind large lists of genes. P<0.05 was nd statistically signi cant.

PPI network construction and module analysis
The online Human Integrated Protein-Protein Interaction rEference (HIPPIE) (http://cbdm.unimainz.de/hippie/) [14] online database was using to predicted the PPI network information. Analyzing the interactions and functions between DEGs may provide information about the mechanisms of generation and development of disease (PPI score > 0.4). Cytoscape 3.8.0 (http://www.cytoscape.org/) [15] is a bioinformatics platform for constructing and visualizing molecular interaction networks. The Network Analyzer Cytoscape plug-in was used to nd hub genes were screened using highest node degree [16], betweenness centrality [17], stress centrality [18] and closeness centrality [19] methods and hub genes were further analyzed for pathway and GO enrichment analysis. The plug-in PEWCC1 (http://apps.cytoscape.org/apps/PEWCC1) [20] of Cytoscape was applied to detect densely connected regions in PPI networks. The PPI networks were constructed using Cytoscape and the most key module in the PPI networks was preferred using PEWCC1. The criteria for selection were set as follows: Max depth = 100, degree cut-off = 2, Node score cut-off = 0.2, PEWCC1 scores >5, and K-score = 2.
Prediction of key miRNAs miRNet database (https://www.mirnet.ca/) [21] online database was used to predict miRNAs that targeted the DEGs associated with T1D. The DEGs were selected according to the screening criterion of P value <0.05. The results were exported into the Cytoscape software for analysis. The target genes -miRNA regulatory network was constructed through network topology prosperities. The node degree was determined using the Network analysis plugin, and miRNAs with a node degree >12.

Prediction of key TFs
NetworkAnalyst database (https://www.networkanalyst.ca/) [22] online database was used to predict miRNAs that targeted the DEGs associated with T1D. The DEGs were selected according to the screening criterion of P value <0.05. The results were exported into the Cytoscape software for analysis. The target genes -TF regulatory network was constructed through network topology prosperities. The node degree was determined using the Network analysis plugin, and TFs with a node degree >12.

Validation of hub genes
Receiver operating characteristic (ROC) curve analysis was implemented to calculate the sensitivity and speci city of the DEGs for T1D diagnosis using the R package by using the generalized linear model (GLM) in machine learning algorithms [23]. An area under the curve (AUC) value was determined and used to label the ROC effect. In RT-PCR analysis, total RNA was isolated from cultured cells of diabetics and normal using a TRI Reagent® (Sigma, USA). cDNA was synthesized using 2.0 μg of total RNA with the FastQuant RT kit (with gDNase; Tiangen Biotech Co., Ltd.). The relative mRNA expression was measured in the QuantStudio 7 Flex real-time PCR system (Thermo Fisher Scienti c, Waltham, MA, USA).
The relative expression levels were determined by the 2 -ΔΔCt method and normalized to internal control βactin [24]. All RT-PCR reactions were performed in triplicate. The primers used to explore mRNA expression of ten hub genes were shown in Table 1.

Molecular Docking studies
The sur ex-docking studies of the designed molecule, standard on overexpressed genes of PDB protein were performed using SYBYL-X 2.0 perpetual software. All the designed molecules and standard were outlined using ChemDraw Software, imported and saved in sdf. templet using openbabel free software. The protein structures of EGFR(epidermal growth factor receptor) and its co-crystallised protein of PDB code 2XYJ, 2XYX and GRIN2B its co-crystallisedprotein of PDB code and its co-crystallised protein of PDB code 5EWL, 6E7V wereextracted from Protein Data Bank [25][26]. Together with the TRIPOS force eld, GasteigerHuckel (GH) charges were added to all designed molecules and standard for the structure optimization process.Additionally, using MMFF94s and MMFF94 algorithm methods, energy minimization was done.The protein preparation was carried out after incorporation of protein.The cocrystallized ligand and all water molecules were removed from the crystal structure; more hydrogen's were added and the side chain was set. TRIPOS force eld was used for the minimization of structure. The compounds' interaction e ciency with the receptor was represented by the Sur ex-Dock score in kcal / mol units. The interaction between the protein and the ligand, the best pose was incorporated into the molecular area. The visualization of ligand interaction with receptor is done by using discovery studio visualizer.

Identi cation of DEGs
With a threshold of P-value <0.05 and absolute value of |log2FC| > 1, DEGs were identi ed from dataset. 284 DEGs were screened from the GSE123658 dataset, which consisted of 142 up regulated genes and 142 down regulated genes and only top ten up regulated genes and down regulated genes are listed in Table 2. Volcano plots were used to visualize differential expression of genes between the T1D group and healthy donors group (Fig.1). A heat map showed expression pro ling of DEGs in the analysis result ( Fig.2).

Gene Ontology (GO) and pathway enrichment analyses of DEGs
To identify the pathways which had the most signi cant involvement with the genes identi ed, up regulated and down regulated genes are listed in Table 3 and Table 4. DEGs were submitted into ToppGene for GO and REACTOME pathway enrichment analysis. GO enrichment analysis revealed that in BP terms, the up regulated genes were mainly enriched in cell-cell signaling and reproductive process, whereas down regulated genes were mainly enriched in vesicle fusion and biological adhesion. In CC terms, up regulated genes were mainly enriched in integral component of plasma membrane and supramolecular ber, whereas down regulated genes were mainly enriched in whole membrane and cell junction. In MF terms, up regulated genes were mainly enriched in signaling receptor activity and structural molecule activity, whereas down regulated genes were mainly enriched in lipid binding and GTPase binding. REACTOME pathway analysis demonstrated that up regulated genes were signi cantly enriched in signaling by GPCR and muscle contraction, whereas down regulated genes were mainly enriched in innate immune system and cytokine signaling in immune system

Prediction of key miRNAs
The regulatory relationships between the target genes and their miRNAs were established using Cytoscape, which showed that the single gene was regulated by multiple miRNAs is shown in Fig.4A. Subsequently, 106 miRNAs (ex, hsa-mir-4257) targeting GRIN2B, 83 miRNAs (ex, hsa-mir-4257) targeting EGFR, 83 miRNAs (ex, hsa-mir-564) targeting EGFR, 190 miRNAs (ex, hsa-mir-4300) targeting TLN1 and 139 miRNAs (ex, hsa-mir-5694) targeting IGF2R were veri ed in miRNet database are listed in Table 6. Integrating with the result of REACTOME pathway analysis, it was indicated that these key target genes -miRNA network was mainly involved in the signaling by GPCR and innate immune system.

Prediction of key TFs
The regulatory relationships between the target genes and their TFs were established using Cytoscape, which showed that the single gene was regulated by multiple TFs is shown in Fig.4B. Subsequently, 19 TFs (ex, FOXD1) targeting RGS4, 16 TFs (ex, GATA2) targeting EYA1, 15 TFs (ex, STAT3) targeting PRKACA and 14 TFs (ex, TFAP2A) targeting GAB2 were veri ed in NetworkAnalyst database are listed in Table 6. Integrating with the result of REACTOME pathway analysis, it was indicated that these key target genes -TF network was mainly involved in the signaling by GPCR and innate immune system.

Validation of hub genes
As these 4 genes are remarkably expressed in T1D, we executed a ROC curve analysis to calculate their sensitivity and speci city for the diagnosis of T1D. As shown in Fig. 5A, EGFR, GRIN2B, POLR2A and PRKACA achieved an AUC value of > 0.985, demonstrating that these genes have high sensitivity and speci city for T1D diagnosis. We further used RT-PCR to detect the mRNA expression of the hub gene. The 4 hub genes contain two up regulated genes (EGFR and GRIN2B) and two down regulated gene (POLR2A and PRKACA). The RT-PCR data showed that although the trend of expression patterns of these 4 hub genes were consistent, among these up regulated genes, only EGFR and GRIN2B were signi cantly up regulated in T1D. In addition, the expression of POLR2A and PRKACA were reduced in T1D (Fig. 5B).

Molecular Docking studies
The docking simulations are conducted in the present study is to identify the active site conformation and major interactions responsible for complex stability with the receptor binding sites. In type 1 diabetes mellitus over expression of genes are identi ed and their proteins of x-ray crystallographic structure are selected from PDB for docking studies. The drugs containing thiazolidindione ring pioglitazone are most commonly used either alone or in combination with other antidibetic drugs. The docking studies of pioglitazone (as standard) and designed molecules containing the heterocyclic ring of thiazolidinedione have been carried out using Sybyl X 2.1 drug design software. The docking studies were performed to know the biding interaction of pioglitazone standard and designed molecules on identi ed over expressed genes of protein. The X-RAY crystallographic structure of two proteins from each over expressed genes of EGFR (epidermal growth factor receptor) and its co-crystallised protein of PDB code 2XYJ, 2XYX and GRIN2B (glutamate ionotropic receptor NMDA type of subunit 2B) and its co-crystallised protein of PDB code 5EWL, 6E7V respectively were selected for the docking. The research of designed molecules had been carried out to perceive the potential molecule. A total of 54 molecules were designed and the binding score greater than six are said to be good, few molecules and pioglitazone obtained Cscore greater than 7 with one in each of EGFR and GRIN2B over expressed genes of PDB code 2XYJ and 6E7V respectively.The designed molecules The two over expressed gens such as EGFR (epidermal growth factor receptor) and GRIN2B (glutamate ionotropic receptor NMDA type of subunit 2B) and two proteins from each were selected for docking. The compounds obtained with a good binding score with one protein in each over expressed genes out of two are PIO, TSPZP24,TSPZP25, TBPZP5, TSPZP27, TSIO6, TBPZP46, TSPZP23, TBIO34, TSPZ15 and TSPZP24 with binding score of 8.396, 7.6243, 7.3803, 17.3313, 7.2212, 7.20417.2021, 7.1987, 7.0795 and 7.0591 with PDB code of 2XYJ and 6E7V respectively (Fig. 6). Compounds such as TSIO8, TBPZP54, TBPZP49, TBPZP51, TBIO36, TBPZ44, TSPZ18, TSPZ13,   TBIO35TSPZ17, TBPZP53, TBPZ45 TBPZP50, TBIO30, TSPZ16, TBIO36, TBPZ40, TBPZ39, TSPZP24  and TSPZP24 TBPZP48 and TSPZ15, TSIO6, TSPZP24, TSPZP26 and TBPZ43, TBPZ42, TSPZ15, TBIO34 with binding score of 6.9341, 6.9257, 6.8768, 6.8488, 6.7296, 6.6452, 6.5682, 6.5625, 6.542, 6.407, 6.4023, 6.3781, 6.3091, 6.254, 6.1603, 6.160, 6.0903, 6.0703, 6.0263 and 6.1225, 5.5282 and 6.4652, 6.3819, 6.3205, 6.208 and 6.5068, 6.3741, 6 Table 7. The binding score of the predicted molecules are compared with that of the standard pioglitazone used in type 1 diabetes mellitus, the standard obtained good binding score with all proteins.
The molecule TSPZP24 and TSPZP22 has highest binding score its interaction with protein 6E7V and XYJ, and hydrogen bonding and other bonding interactions with amino acids are depicted by 3D (Fig. 7) and 2D (Fig. 8).

Discussion
Pathophysiological complications of T1D remains undetermined, hyperglycemia develops to play a key role [27]. Although T1D is rare compared to type 2 diabetics, it may affect in any part of the body, such as the eyes, kidneys, heart, peripheral and autonomic nervous systems [28]. Considering the poor prognosis of T1D, understanding the speci c molecular biomarkers of the disease is important for initial diagnosis and therapy to increase survival rates. In this investigation, we examined the expression pro ling by high throughput sequencing dataset GSE123658 including T1D group and healthy donors group to identify the molecular mechanism of T1D and seek some molecular biomarkers. Bioinformatics analysis of these biological factors is applied to explore genes that are favorable to treatment.
In total, 142 up and 142 down regulated genes were identi ed. Polymorphic ARMS2 gene was reported to be associated with retinopathy and coronary artery disease in T1D [29]. Polymorphic AS3MT gene has an important role in the development of T1D [30]. CRHR2 is a protein coding gene which was rst reported aberrantly over expressed and plays important roles in hypertension [31], but this gene might be involved in development of T1D with hypertension. Garcia-Martínez et al [32] reported that decrease expression of RNF122 proteins correlated with attention de cit hyperactivity disorder, but this gene might be linked with progression of T1D with hyperactivity disorder. Previous investigation have reported that decrease expression autophagy-regulating TP53INP2 was associated in the development of T1D [33]. Polymorphic CRTC2 gene was found to be involved in the progression of type 2 diabetes [34], but this polymorphic gene might be responsible for advancement of T1D.
GO and REACTOME pathway enrichment analyses were performed to explore interactions among the up and down regulated genes. Mutation in MYH6 gene contributes to progression of congenital heart defects [35], but this gene might be liable for progression of T1D with congenital heart defects. AVP (arginine vasopressin) has previously reported and has been suggested to play an important role in the T1D [36]. Up regulation of GRIA4 is associated with abdominal aortic aneurysm [37], but this gene might be answerable for advancement of T1D with abdominal aortic aneurysm. Kochetova et al [38] showed the possible involvement of GRIN2B in the development of type 2 diabetes, but this gene might be linked with development of T1D. The DCC (DCC netrin 1 receptor) gene has been known to participate in diabetic nephropathy [39]. Ruiz de Azua et al [40] showed that RGS4 was up regulated in type 2 diabetes, but this gene might be key for T1D. Wang et al. [41] reported that polymorphic GJA1 gene was expressed in patients with atrial brillation, but this might be play important role in T1D with atrial brillation. The high mRNA expression levels of GREM1 [42] and EGFR (epidermal growth factor receptor) [43] were correlated with T1D. García et al [44] reported that TRHR (thyrotropin releasing hormone receptor) was essential for progression of hypertension, but this gene might be associated with development of T1D with hypertension. PTPRT (protein tyrosine phosphatase receptor type T) is a key regulator of obesity and insulin resistance [45], but this gene might be involved in progression of T1D. Recent studies have shown that polymorphic FCAMR (Fc fragment of IgA and IgM receptor) gene is associated with progression of atherosclerosis [46], but this gene might be crucial for progression of T1D with atherosclerosis. Altered expression of CCL19 was observed to be associated with the progression of diabetic nephropathy [47] and may be considered to be a novel biomarker for T1D. Down-regulation of DSP (desmoplakin) has been identi ed in cardiomyopathy [48], but this gene might be linked with progression of T1D with cardiomyopathy. ITIH4 was identi ed as a key gene associated with progression of coronary heart disease [49], but this gene may be responsible for advancement of T1D with coronary heart disease. Miyashita et al [50] reported that the low expression of the GAB2 gene is correlated with Alzheimer disease progression, but this gene might be crucial in advancement of T1D with Alzheimer disease. Notably, recent investigation have proposed that IGF2R is linked with the advancement and progression of T1D [51]. Investigation have shown that MYADM (myeloid associated differentiation marker) can be used as biomarkers for hypertension [52], but this gene might be liable for advancement of T1D with hypertension. TPCN2 [53] and APOL1 [54] genes are a key factors that initiates a type 2 diabetes, but these genes might be responsible for development of T1D. MEFV (MEFV innate immuity regulator, pyrin) down regulation can enhance ischemic heart disease [55], but this gene might be key for progression of T1D with ischemic heart disease. A previous investigation reported that the expression of DTX4 was associated with obesity [56], but this gene might be linked with progression of T1D with obesity. These GO term and REACTOME pathway analyses indicated the possible direction of experimental validation.
PPI network, modules, target gene -miRNA regulatory network and target gene -TF regulatory network analysis demonstrated that both up-and down regulated genes interacted directly or indirectly. The more edges associated with genes, miRNAs and TFs indicated the more potential selection for the targets.
In addition, we also performed validation of hub genes by ROC analysis and RT-PCR. Results showed that these hub genes differentiated T1D group from healthy donors group, and may be candidates for diagnostic biomarkers and new therapeutic target. Moreover, EGFR, GRIN2B, POLR2A and PRKACA are involved in progression of T1D.
In conclusion, the present investigation was designed to identify DEGs that may be involved in the progression of T1D. A total of 284 DEGs and 4 hub genes were identi ed and may be regarded as diagnostic biomarkers and new therapeutic target for T1D. However, further investigations are vital to enlighten the biological function of these genes in T1D.

Con ict of interest
The authors declare that they have no con ict of interest.

Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent
No informed consent because this study does not contain human or animals participants.

Consent for publication
Not applicable.

Competing interests
The authors declare that they have no competing interests.