Integrated bioinformatics analysis reveals novel key biomarkers and potential candidate small molecule drugs in diabetic nephropathy

The underlying molecular mechanisms of diabetic nephropathy (DN) have yet not been investigated clearly. In this investigation, we aimed to identify key genes involved in the pathogenesis and prognosis of DN. We selected expression proling by high throughput sequencing dataset GSE142025 from Gene Expression Omnibus (GEO) database. The differentially expressed genes (DEGs) between DN and normal control samples were analyzed with limma package. Gene ontology (GO) and REACTOME enrichment analysis were performed using ToppGene. Then we established the protein-protein interaction (PPI) network, miRNA-DEG regulatory network and TF-DEG regulatory network. The diagnostic values of hub genes were performed through receiver operating characteristic (ROC) curve analysis. Finally, the candidate small molecules as potential drugs to treat DM were predicted using molecular docking studies. Through expression proling by high throughput sequencing dataset, a total of 549 DEGs were detected including 275 up regulated and 274 down regulated genes. Biological process analysis of functional enrichment showed these DEGs were mainly enriched in cell activation, response to hormone, cell surface, integral component of plasma membrane, signaling receptor binding, lipid binding, immunoregulatory interactions between a lymphoid and a non-lymphoid cell and biological oxidations. DEGs with high degree of connectivity (MDFI, LCK, BTK, IRF4, PRKCB, EGR1, JUN, FOS, ALB and NR4A1) were selected as hub genes from protein-protein interaction (PPI) network, miRNA-DEG regulatory network and TF-DEG regulatory network. The ROC curve analysis conrmed that hub genes were high diagnostic values. Finally, the signicant small molecules were obtained based on molecular docking studies. Our results indicated that MDFI, LCK, BTK, IRF4, PRKCB, EGR1, JUN, FOS, ALB and NR4A1 could be the potential novel biomarkers for GC diagnosis prognosis and the promising therapeutic targets. The present study may be crucial to understanding the molecular mechanism of DN initiation and progression.


Introduction
Diabetic nephropathy (DN) is a common and devastating microvascular complication of the kidneys induced by diabetes mellitus [1]. The incidence of DN is reported to be 30% to 40% patients with diabetes [2] and is the main cause of end -stage renal disease throughout the world in both developed and developing countries [3]. Numerous risk factors may affect DN progression [4]; however, how these factors affect the development of DN requires further study and no effective method has been developed.
Despite important developments toward an understanding of the pathophysiology of DN, early diagnosis, therapeutic interference, and underlying molecular pathogenesis hover a require [5]. Therefore, enlighten the rare nature belonging to DN is predominant in expand therapies to improve patient outcome.
The exact mechanisms of DN are still unknown. A number of investigation have reported possible roles of some genes and pathways such as UCP1-3 [6] and JAK/STAT3 signaling pathway [7] in the development of DN. However, these reports only concentrated on any certain molecule, gene or pathway, ignoring that the development process involves aberrant expression of a variety of genes and pathways, among which some proteins might interact with other proteins and thus play a essential role in the DN [8].
Hub genes may act as prognostic or diagnostic biomarkers or treatment targets for DN [9]. Therefore, it is urgent to search new biomarkers for DN with a powerful genome-wide technology.
The high-throughput platforms for analysis of gene expression, such as high throughput RNA sequencing, are increasingly valued as promising tools in medical eld with great clinical applications: molecular diagnosis, prognosis prediction and new drug targets discovery [10]. The Gene Expression Omnibus (GEO) is a database and online resource for the gene expression of any species. In the current investigation, expression pro ling by high throughput sequencing dataset (GSE142025) was downloaded. In total, there are 28 DN samples and 8 normal control samples datasets available. A data processing standard was used to lter the DEGs on the limma package of R language, followed by Gene Ontology (GO) and pathway enrichment analyses using ToppGene software. The DEGs protein-protein interaction (PPI) network and modular analysis were integrated using InnateDB interactome software to identify hub genes in DN. The regulatory network of miRNA and TF was constructed and the target genes with high degree of connectivity were selected. Receiver operating characteristic (ROC) curve analysis was used to predicting power of the gene signature. Molecular docking experiment was implemented for selected hub genes. This study will enhance our understanding of the molecular mechanisms of DN.

Data source
The DN expression pro ling by high throughput sequencing dataset GSE142025 was downloaded from the NCBI GEO database (http://www.ncbi.nlm.nih.gov/geo) [11]. The dataset GSE142025 was based on the GPL20301 platform (Illumina HiSeq 4000 (Homo sapiens)), including 28 DN samples and 8 normal control samples.

Identi cation of DEGs
Identify a gene that are differentially expressed across experimental conditions, was used to identify DEGs in the GSE142025 dataset with the limma package of R language, which had been processed, normalized and transformed. An adjusted P-value was retrieved by implement the Benjamini-Hochberg false discovery rate (FDR) correction on the original P-value, and a fold change threshold was preferred based on our plan to target on statistically signi cant DEGs [12]. Only genes with a fold change > 1.35 for up regulated genes and fold change < -1.24 for down regulated genes, and adjusted P-value <0.05 were considered as statistically signi cant DEGs. A volcano plot and heat map of the identi ed DEGs was also constructed, using an R package.

Gene ontology and pathway enrichment analysis of DEGs
In the current investigation, the signi cant enrichment analysis of DEGs was assessed based on the Gene Ontology (GO) and REACTOME using the ToppGene (ToppFun) The module SYBYL-X 2.0 perpetual software were used for Sur ex-Docking of the designed molecules. The molecules were sketched by using ChemDraw Software and imported and saved in sdf. format using openbabel free software. The protein structures of CyclinB1 (CCNB1) its co-crystallisedprotein of PDB code 4Y72, 5H0V and Four and half LIM domains 2 (FHL2) its NMR structure of proteins 2D8Z and 2EHE was retrieved from Protein Data Bank [26][27]. Together with the TRIPOS force eld, GasteigerHuckel (GH) charges were added to all designed derivatives for the structure optimization process.In addition, energy minimization was carried outusing MMFF94s and MMFF94 algorithm process. Protein processing was carried out after the incorporation of protein.The co-crystallized ligand and all water molecules were removed from the crystal structure; more hydrogens were added and the side chain was set. TRIPOS force eld was used for the minimization of structure. The compounds' interaction e ciency with the receptor was represented by the Sur ex-Dock score in kcal / mol units. The interaction between the protein and the ligand, the best pose was incorporated into the molecular area. The visualisation of ligand interaction with receptor is done by using discovery studio visualizer.

Results
Identi cation of DEGs DN and normal control samples (28 and 9, respectively) were rst analyzed. Limma was used to analyze the series of each chip and to identify the DEGs. Following analysis of GSE142025 dataset, 549 DEGs (275 up regulated and 274 down regulated) genes were identi ed ( Fig. 1. and Table 1). The results of the cluster analysis of DEGs revealed signi cant differences between the DN and normal control samples (Fig. 2).

Gene ontology and pathway enrichment analysis of DEGs
The identi ed DEGs were uploaded to the online software ToppGene for GO and REACTOME pathway enrichment analyses and results are listed in Table 2 and Table 3. The results of the GO analysis revealed that up regulated genes were signi cantly enriched in BP, including cell activation and regulation of immune system process, whereas down regulated genes were signi cantly enriched in response to hormone and ion transport. In terms of CC, the up regulated genes were enriched in cell surface and intrinsic component of plasma membrane, whereas down regulated genes were enriched in integral component of plasma membrane and nuclear chromatin. In terms of MF, the up regulated genes were enriched in signaling receptor binding and identical protein binding, whereas down regulated genes were enriched in lipid binding and transporter activity. REACTOME pathway analysis revealed that the up regulated genes were highly associated with pathways including immunoregulatory interactions between a lymphoid and a non-lymphoid cell, and innate immune system, whereas down regulated genes were signi cantly enriched in biological oxidations and GPCR ligand binding.

Construction of protein-protein interaction (PPI) network
The DEG expression pro les in DN were constructed according to the information in the InnateDB interactome database. The PPI network of DEGs is consisted of 2718 nodes and 4477 edges (Fig. 3A).
There are 10 genes selected as hub genes, such as MDFI, LCK, BTK, IRF4, PRKCB, EGR1, JUN, FOS, ALB and NR4A1 are listed in Table 4. A two signi cant modules were obtained from PPI network of DEGs using PEWCC1, including 15 nodes and 36 edges (Fig. 3B) and 7 nodes and 12 edges (Fig. 3C). Gene ontology and pathway enrichment analysis revealed that genes in these modules were mainly involved in innate immune system, tmmunoregulatory interactions between a lymphoid and a non-lymphoid cell, cell activation, regulation of immune system process, cell surface, response to hormone, cytokine signaling in immune system, metabolism of proteins and nuclear chromatin.

Integrated regulatory network construction
The

Discussion
Based on the PPI network and module analysis, we obtained top hub genes in the whole network. Onions et al. [139] showed that ALB (albumin) was potential biomarkers of DN and disease progression. Novel biomarkers such as MDFI (MyoD family inhibitor), FOS (Fos proto-oncogene, AP-1 transcription factor subunit), SH2D1A, SLA2, TRAT1, CD3E, JUNB and FOSB might play an important role in the development of DN.
Based on the miRNA-DEG regulatory network and TF-DEG regulatory network, we obtained target in the whole network. Recent investigation reported that the dysregulated activity of MYBL2 was associated with myocardial infarction progression [140], but this gene might be linked with progression of DN. Many investigation have reported the hsa-mir-637 [141] and NR3C1 [142] were linked with progression of hypertension, but these genes might be liable for advancement of DN. Wang et al. [143] noted that hsamir-1261 was associated with development of DN. Li et al [144] reported that hsa-mir-4458 expression was an independent marker of prognosis in myocardial infarction, but this gene might be involved in DN. Keller et al [145], Fujimoto et al [146] and Xu et al [147] demonstrated that expression of NFATC2, PDX1 and CREB1were involved in type 2 diabetes, but these genes might be key for progression of DN. Du et al. [148], Zhang et al. [149] and Zhao et al. [150] found that YY1, TP53 and SRF (serum-response factor) played a key role in DN through the epithelial-mesenchymal transition. Novel targets such as IL2RB, hsamir-4492, hsa-mir-4319, hsa-mir-4300, hsa-mir-3943, hsa-mir-548e-3p, hsa-mir-6077, hsa-mir-5586-5p, RET (ret proto-oncogene), MAP1LC3C, PTPRO (protein tyrosine phosphatase receptor type O), NR2C2, MAX (myc-associated factor X) and ARID3A might be responsible for progression of DN.
In the present investigation, the DEGs of DN and normal control samples were analyzed to achieve a better understanding of DN. GO and pathway enrichment analyses of DEGs were applied, and the proteinprotein interaction (PPI) network, module and miRNA-DEG regulatory network and TF-DEG regulatory network of these DEGs were also constructed. ROC analysis and molecular docking experiments conducted. The aim of this investigation was to identify essential genes and pathways in DN using bioinformatics analysis, and then to explore the intrinsic mechanisms of DN and distinguish new potential diagnostic and therapeutic biomarkers of DN. We anticipated that these investigations will provide further insight of DN pathogenesis and advancement at the molecular level.

Declarations Informed consent
No informed consent because this study does not contain human or animals participants.
Availability of data and materials

Competing interests
The authors declare that they have no competing interests.