Identication of Hub Genes Associated With Clinical Characteristics and Potential Therapy of Diabetic Nephropathy by Bioinformatics Analysis.

Background: To provide molecular markers and potential targeted molecular therapy for diabetic nephropathy by screening hubgenes based on bioinformatic analysis. Results: We found 91 differentially expressed genes (DEGs) between diabetic nephropathy tissues and normal kidney tissues. Majority DEGs were signicantly enriched in the extracellular matrix structural constituent, collagen-containing extracellular matrix. KEGG pathway analysis showed that most of DEGs participated in PI3K-Akt signaling pathway, AGE-RAGE signaling pathway in diabetic complications. Five high relevant sub-networks and the top 16 genes according to 12 topological algorithms were screened out and also ve co-expressed gene modules were identied by WGCNA. Eventually, 5 hub genes were identied by taking the intersection which might be involved in the progression of DN. And 11 microRNAs were associated with related genes in WebGestalt. Conclusions: We identied ve hub genes, namely COL1A2, COL6A3, COL15A1, CLU and LUM, and their related microRNAs(especially miR29 and miR196), which might be used as diagnostic biomarkers and therapeutic targets for diabetic nephropathy.


Background
Diabetic nephropathy (DN) was one of the most common microvascular complications of diabetes mellitus and also the main cause of end-stage renal disease (ESRD) [1,2]. In terms of pathophysiology, diabetic nephropathy can be divided into glomerular lesion, tubules and interstitial atrophy and brosis. The clinical features of diabetic nephropathy were characterized by hyperglycemia, continuous decrease in GFR, accompanied by sustained increase in proteinuria and SCR [3]. Therefore, these four characteristics were usually used in clinical diagnosis, but there was no speci c molecular diagnosis of diabetic nephropathy. Meanwhile, the treatment of diabetic nephropathy could only provide symptomatic supportive treatment which was limited for preventing DN progression such as blood glucose control and blood pressure lowering, etc [4,5]. Thus, it was an urgent task to identify molecular biomarkers and explore underlying mechanisms involved in the process of DN, which could contribute to nd new diagnostic markers and therapeutic approaches.
Genome-wide transcriptome analysis using expression arrays could be used to identify biomarkers for disease progression and to gain insights into the disease pathogenesis and molecular therapy [6].With the wide application of genome transcriptome analysis, a large amount of data was stored in a public database. Reanalyzing these data could provide signi cant clues for other researches [6,7]. In recent years, numerous differentially expressed genes (DEGs) had been found by microarray data analysis on DN; a few such genes were promising biomarkers in terms of DN diagnosis and prognosis [8].Moreover, most previous studies focused on DEG screening which was limited to single microarray data, rather than the interactions among genes. Weighted gene co-expression network analysis (WGCNA) could be used to identify correlations among genes. Also, WGCNA could identify highly correlated gene modules to con rm hubgenes [9]. In this study, we integrated ve microarray datasets from the Gene Expression Omnibus database (GEO) database which compared DN with normal samples for analysis to reduce false positive rate. In addition, we also compared and intersected the DEGs with the hub genes screened by WGCNA, so that the common genes were more reliable as biomarkers or therapeutic targets. MicroRNA(miRNA) was a kind of highly conservative single small non-coding RNA that could regulate the expression level and function of their target genes in the process of protein synthesis [10,11].In recent years, studies had found that the miRNA could participate in the development regulation, cell growth, differentiation and apoptosis, and oncogenesis [12].They interacted with the complementary strand of mRNAs and lead to the degradation of the corresponding mRNAs; they also interfered with protein production by suppressing protein synthesis [13].Generally, miRNAs were involved in a wide range of diseases, including neurological disease, heart disease, and cancer and so on [14].Therefore, miRNAs could achieve a novel targeted molecular therapy by regulating the expression of targeted hub genes. Search tool for the retrieval of interacting genes (STRING 11.0; https://string-db.org) was applied to establish a PPI network of DEGs. Interaction with a combined score >0.7 was set as the cut-off point (Figure 2a). Cytoscape software was used to visualize module analysis for PPI network of gene signatures and hub genes by MCODE, After MCODE detection, 30 genes were found to participate in 5 clusters (Figure 2 b-f). 16 genes were identi ed as hub genes, VEGFA, COL1A2, ALB, FOS, EGR1, EGF, COL3A1, CLU, CTGF, SST, EGR2, LUM, FOSB, COL6A3, PROM1, COL15A1 through integration in 12 algorithms of Cytohubba (TABLE 1). Ultimately, these 16 genes were intersected with 30 genes in the modules of MCODE to obtain 14 key genes (Figure 3b The top ten of key miRNAs that targeted the 14 key genes, were predicted: the miR29 family (miR29a/b/c), miR196a/b, miR199A, miR377, miR380-5P, miR191, miR186, miR1, miR206, miR-383 and miR-134. A total of eight groups were identi ed as FDR < 0.05 (TABLE 2). In particular, miR29a/b/c and miR196a/b, which were related to COL1A2/COL6A3/COL15A1, also had a high enrichmentratio.

Association between interested genes and clinical features of DN.
To verify potential roles of hub genes in DN, correlation analysis and subgroup analysis between hub genes and clinical features were performed using Nephroseq v5 online tool. First, the results showed that In the available data, only COL6A3 and COL15A1 were con rmed to be negatively correlated with GFR, respectively.

Discussion
DN was the main cause of renal failure worldwide, it was quite clear that glomerulopathy and tubulopathy played a central role in the progression of DN. Recently, diabetic nephropathy had been pay more attention to acquire earlier diagnosis and additional targeted invention [16]. Increasing evidence indicated that autophagy [17], in ammation [18], oxidative stress, advanced glycation end products (AGEs) and apoptosis [19], mitochondrial dysfunction [20], and abnormal activation of the renin angiotensin system [21] were closely involved in diabetic nephropathy. Of course, the occurrence of diabetic nephropathy was in uenced by genetic factors [22]. In this study, we successfully screened 91 DEGs between DN and normal populations from the GEO database. Bioinformatics tools such as GO and KEGG contributed to untangle the interaction between DEGs. The results showed that DEGs, especially hub genes, were mainly involved in cell differentiation, basement membrane, extracellular matrix components, collagen bril organization, and other processes. KEGG enrichment analysis of DEGs showed that many of these genes were related to ECM-receptor interaction, Rap1 signaling pathway, AGE-RAGE signaling pathway in diabetic complications, protein digestion and absorption and PI3K-Akt signaling pathway. In terms of pathological changes, proliferation of the renal basement membrane, mesangial matrix hyperplasia and ber hyperplasia were showed with DN. Therefore, the increase of extracellular matrix (ECM) protein which included laminin, perlecan and collage [23] could be used as a marker of DN.
Meanwhile, abnormal deposition of large amounts of proteins in the basement membrane and extracellular matrix eventually lead to glomerulosclerosis. In addition, AGE/RAGE signaling elicited activation of multiple intracellular signal pathways (also contained several of the above pathways) promoted the expression of pro-in ammatory cytokines, thereby accelerating the progression of DN [24].
Fourteen differential expression genes, COL1A2, COL3A1, COL6A3, COL15A1, LUM, CLU, EGF, VEGFA, ALB, SST, FOS, FOSB, EGR1 and EGR2, were identi ed as key genes. Among these genes, the down-regulation of key genes such as Epidermal Growth Factor (EGF) was an active polypeptide used to promote the repair and regeneration of the damaged epidermis [25]. EGF mainly combined with EGFR to perform its functions, including activation of PI 3 K/Akt pathway and extracellular regulated protein kinases(ERK) pathway [26,27].Report showed that the reduced expression of EGF particularly in the proximal tubular cells was con rmed in DN [28].And blocking EGF and EGFR activity signi cantly reduced diabetes-related renal enlargement and glomerular hypertrophy [29].Vascular endothelial growth factor A (VEGFA), an important endogenous angiogenic factor, was synthesized and secreted by tubular cells. Previous studies identi ed that vascular endothelial growth factor (VEGF) A is thought to be associated with glomerular endothelial cell dysfunction and tubular cell damage in diabetic nephropathy. And the renal angiopenia, cortical atrophy and urinary protein were further aggravated with decreased expression of the VEGFA [30,31]. Albumin (ALB) encodes serum albumin, which regulates the plasma colloid osmotic pressure. Of course, albumin was a basic indicator of renal disease. Serum albumin also had a variety of physiological functions such as stabilizing colloid osmotic pressure, anti-in ammation and increasing blood volume. Somatostatin (SST) was a natural cyclic peptide inhibitor, and exogenous supplementation of SST attenuated the renal hypertrophy caused by diabetes and also decreased albuminuria [32]. The results of this study showed that the increase in SST expression led to a signi cant increase in GFR expression. Members of the Fos family, including c-Fos and FosB, dimerized with Jun family members to form the activator protein-1 (AP-1) transcription factor complex. In acute kidney injury, phosphorylation levels of c-FOS and FOSB in proximal renal tubules were increased, but C-FOS and FOSB protein were extremely unstable and rapidly degraded [33]. Meanwhile, c-FOS expression was increased in glomerular mesangial cells in high glucose environment [34]. However, the results showed that the decrease in FOS and FOSB expression lead to a signi cant decrease in GFR. Over-expression of miR29b could inhibit the activation of hepatic stellate cells by inhibiting the expression of C-FOS mRNA in liver brosis [35]. FOSB was an ap-1 transcription factor. In small intestinal neuroendocrine tumors, downregulated expression of miRNA1 upregulated the expression of miRNA1 targeted oncogene FOSB, leading to disease progression [36]. Early growth response proteins (EGRs), as a transcriptional regulatory family, were involved in the process of cell growth, differentiation, apoptosis, and even carcinogenesis [37,38]. EGR1 and EGR2 were members of the EGRs family genes [39]. EGR1 was proved as a transcriptional activator of oxidative stress and the levels of EGR1 mRNA and EGR1 protein in the kidneys of diabetic Kidney Disease mice were signi cantly increased [40]. Meanwhile, the thickness of glomerular basement membrane and renal brosis were alleviated after EGR1 knockdown in diabetic kidney disease mice [41]. EGR2 silencing reduced COL1A1 and COL3A1 mRNA levels, and diminished accumulation of myo broblasts and macrophages. Liver brosis was also alleviated because collagen accumulation was reduced by half [42]. The results showed that the decrease in EGR1 and EGR2 expression lead to a signi cant decrease in GFR and SCR.
Next, we will highlight these hub genes. Clusterin (CLU), an extracellular chaperone, was a ubiquitously expressed protein that could be identi ed in various body uids and tissues. Expression of CLU could lead to various processes including apoptosis, in ammation, lipid transportation, cell adhesion, morphologic transformation, membrane recycling, and suppression of complement system [43,44].
Increased expression of CLU was found in several renal disease models such as angiotensin-induced brosis and unilateral ureteral obstruction (UUO) [45]. Meanwhile, under-expression of CLU was shown to accelerate renal brosis induced by UUO, whereas its over-expression was shown to prevent UUO-induced renal brosis [46].However, the results of this study showed that the increase in CLU expression lead to a signi cant decrease in GFR and a signi cant increase in SCR in diabetic nephropathy patients. The enrichment analysis showed that CLU was mapped to complement and coagulation cascades. There was also considerable evidence that activated complement systems and coagulant events contributed to the progression of DN, such as complement C3, C1, and C7,but did not mention CLU [47][48][49]. Together with biglycan, bromodulin, and decorin, lumican (LUM) belonged to small leucine-rich extracellular matrix proteoglycans (SLRPs) [50]. Members of the proteoglycan family could form complexes with cytokines [51]. They were primarily considered to play a role as organizers of the extracellular matrix. Targeted disruption of the genes for decorin, bromodulin, and LUM all led to abnormal collagen bril morphology [52]. Their glomerular localization was con rmed, although they were abundant in the tubulointerstitium.Decorin and Biglycan are shown clustered in areas of tubulointerstitial brosis in some people with kidney disease, and decorin was considered to be the best predictor of progressive renal failure [53]. LUM was signi cantly upregulated in the tubulointerstitium and glomeruli at all stages of diabetic nephropathy [52]. In our work, the mRNA expression of LUM in DN patients was negatively correlated with GFR and proteinuria and positively correlated with SCR. Four key genes including collagen type III α1 chain (COL3A1), collagen type I α2 chain (COL1A2), collagen type VI α3 chain (COL6A3) and collagen type XV α1 chain (COL15A1) were up-regulated in diabetic nephropathy tissues. The mRNA expression of COL3A1 increased in the kidneys under hyperglycaemic/hyperinsulinaemic conditions [54].
Previous report showed Collagen constituted a microfbrillar network composed of several diferent types of collagen fbrils, including type I and type VI alpha 1-3 (encoded by the genes COL1 and COL6A1-3, respectively) [55].The level of collagen I (α2) was elevated in the renal cortex of diabetic mice. And the elevation of collagen I (α2) could contribute to tubulointerstitial brosis in diabetic nephropathy [56]. In our research, COL1A2 and COL3A1 were in targeted correlation with the miRNA196a/b. Little research had been reported on miR196, one study had shown that miR196 was associated with transcription factors and affects cancer development and progression [57]. Later, it might be possible to explore the effect of miR196 on diabetic nephropathy.The interaction among DN and hub genes COL6A3 and COL15A1 had not been reported yet. Collagen type VI α3chain (COL6A3) was a beaded lament-forming collagen involved in cell-binding [58]. In human adipose tissue, up-regulation of COL6A3 associated positively with obesity-related infammation, insulin resistance and metabolic dysregulation. COL6A3 might partly promote insulin resistance by restricting adipogenesis [59]. We had observed knockdown of COL6A3 suppressed macrophage chemoattractant protein MCP1, supporting pro-infammatory efects of COL6A3 [60]. Insulin resistance and in ammation were the risk factors for diabetic nephropathy. Moreover, the results of this study showed that the increase in COL6A3 expression lead to a decrease in GFR in diabetic nephropathy patients. Collagen type XVα1 chain (COL15A1) encoded cytoskeletal protein restin, which stabilized microvessels and cells as well as suppresses angiogenesis [61]. Abnormal angiogenesis was an evolving mechanism proposed to contribute to the early phase of DN [62]. Results indicated that the increase in COL15A1 expression lead to a signi cant decrease in GFR. And correlation analysis of key genes with microRNAs showed both COL6A3 and COL15A1 were in targeted correlation with the miRNA29a/b/c. The miR29 family consisted of miR29a, miR29b, miR29c, and the difference between these subtypes was based on their genomic locus [63]. Reported that the higher expression of miR29 family in T2DM patients and susceptible to T2DM [64]. MiR29 family played an imperative role in chronic conditions of kidney diseases. Various targets of miR29 family in signal transduction pathways, including Wnt/βand TGF-β1signaling pathways involved in brosis, were regulated by targeting pro brotic genes in DN [65]. Meanwhile, miR29 family was involved in maintenance of genes expression that encoded for ECM proteins such as collagen I, III and IV, not VI and XV [66]. And miR29 family also regulated in ammatory genes to slow the progression of DN. Finally, multiple studies had shown that miR29a/b could act as an anti-brosis factor, providing a treatment for glomerulosclerosis, while miR29c promoted in ammatory response by inhibiting the expression of TTP, leading to the further development of DN [65,67].
Of course, there were many shortcomings of this study. First, although 5 original microarray data sets of diabetic nephropathy were integrated and the total number of samples was su cient, diabetic nephropathy did not account for the majority of the samples. Second, the lack of detailed demographic information made it di cult to obtain a more convincing correlation between the selected genes and the severity of diabetic nephropathy using the same samples. Third, due to the lack of online data, the information of selected genes and related clinical indicators was insu cient, resulting in the unclear speci c role of some genes. Therefore, further clinical and basic studies are needed in the future to verify our results and clarify the biological role of these genes and related microRNA in diabetic nephropathy.

Conclusions
Through transcriptome data analysis, 91 DEGs, 14 key genes especially 5 hub genes and 11 micrornas(especially miR29 and miR196) were identi ed as biomarkers for the clinical diagnosis of DN and as potential targets for new therapies. However, the speci c molecular mechanism and biological functions of these genes still require further research.

Acquisition and processing of microarray data
Gene expression data related to diabetic nephropathy was downloaded from GEO online database (https://www.ncbi.nlm.nih.gov/geo/). By searching the following search terms: "diabetic nephropathy," "Homo sapiens," and "Expression pro ling by array," ve microarray data including GSE30122, GSE1009, GSE47185-GPL11670, GSE47185-GPL14663 and GSE104954-GPL22945 were screened out by searching the following search terms: "diabetic nephropathy," "Homo sapiens," and "Expression pro ling by array".
There were a total of 140 samples in these ve microarray data, including 61 diabetic nephropathy samples and 79 normal kidney samples, which from different batches renal tissue samples. The probe data in the matrix le was transformed into gene symbol data through affymetrix platform le, if a single gene corresponds to multiple probes, the average expression value is calculated. After the integration of ve matrix data through Perl, batch normalization was conducted through the SVA (surrogate variable analysis) package of R language (version 3.6.1).
5.2. Screening of differential genes and enrichment analysis of GO and KEGG.
DEGs were calculated and screened by Limma (Linear Models for Microarray Data) package of R language. Only when the FDR (False Discovery Rate) < 0.05 and |logFC|>1, can it be identi ed as the DEGs for subsequent analysis. Gene Ontology (GO), which consists of three independent ontology, namely biological processes (BP), molecular function (MF) and cellular component (CC), was also used for disease and functional analysis [6]. KEGG is an integrated database resource for the biological interpretation of genome sequences and other high-throughput data [15]. GO function and KEGG pathway enrichment analysis were performed by "ClusterPro ler" package of R. Cut-off values for GO and KEGG enrichment analysis: P = 0.05, q = 0.05.
To analyze the interactions between gene-encoded proteins, the DEGs were introduced into the searching tool of the interaction genes database (STRING11.0) to construct a protein-protein interaction (PPI) network, and then the results were imported into CytoHubba used to calculate the value of each DEG according to 12 topological algorithms in Cytoscape. Download the data and sorted it, selecting the top 16 hub nodes. The public hub nodes calculated by ≥ 5 topological algorithms were nally designated as key genes. Besides, sub-networks were screened and the node degrees were calculated using Molecular Complex Detection (MCODE) of Cytoscape software. The parameters were set as follows: MCODE score ≥ 3, degree cut-off = 2, node score cut-off = 0.2, max depth = 100 and k-score = 2, and then the high relevant modules were obtained. Meanwhile, all the samples of different batches were divided into DN tubular tissue and DN glomerular tissue, which were compared with healthy kidney tubular tissue and healthy kidney glomerular tissue respectively for analysis, and relevant key genes were obtained according to the above analysis method. The intersections between key genes associated with glomeruli and key genes associated with tubules were taken. 5.4. Screening of hub genes with weighted gene coexpression network analysis.
Weighted gene co-expression network analysis (WGCNA) was suitable for identifying gene modules with the certain phenotype [9]. The risk of DN was considered the result of multi-molecule interaction. WGCNA was used to identify the related gene modules (each module contained multiple molecules). A scale-free co-expression network was constructed for all genes in ve microarray data using the WGCNA package [9] in R version 4.0.3 (https://www. r-project.org/)and gene modules were identi ed, and the gene module with the highest correlation was found in the module-trait diagram. Based on gene signi cance and module membership, the "hclust" function in the WGCNA package was used to screen the hub genes of the targeted-module. Finally, DEGs, genes of highly relevant MCODE module, genes derived from 12 algorithms and genes of highest correlation modules in WGCNA were intersected to obtain the hub genes.

Correlation analysis of key genes and microRNAs.
WEB-based GEne SeT AnaLysis Toolkit (WebGestalt) was a functional enrichment analysis web tool, which supported multiple enrichment analysis algorithms, and covered a comprehensive database of functional annotations. In my research, WebGestalt was used to predict miRNAs that targeted the key genes associated with DN. The genes in the modules with high correlation analyzed by MCODE were imported into WebGestalt, the miRNAs targeted by WebGestalt were predicted. Differentially expressed microRNAs were selected according to the top ten P values of Signi cance Level.
5.6. Acquisition and processing of data related to gene and clinical characteristics.
Nephroseq was a free platform to the academic community for integrative data mining of genotype/phenotype data, which was supported by the University of Michigan O'Brien Renal Center. The primary goal of the Applied Systems Biology Core was to provide to the renal research community a platform for integrative data mining of comprehensive renal disease gene expression data sets. By inputting interested genes, Relevant data between relevant genes and GFR, SCR and proteinuria were obtained in Nephroseq (https://www.nephroseq.org), and then the data was imported into GraphPad Prism 6 to draw correlation analysis diagram.    Figure 6 Correlation between mRNA expression of up-regulated hub genes and SCR in DN patients, P value was a signi cant difference, r was the correlation coe cient.