Potential Biomarkers of Diabetic Kidney Disease Based on Weighted Gene Co-expression Network Analysis

Background: Diabetic kidney disease (DKD) is one of the serious complications of diabetes. However, there is no unied and clear understanding of the pathogenesis of DKD at present. The evidence suggests that genetic predispositions play a signicant role in an individual’s susceptibility to DKD. WGCNA analysis may provide a new way to search for genes closely related to DKD. Methods: In this study we analysed key genes and pathways involved in DKD from RNA-seq count results using 23 peripheral blood samples from patients with DKD. Weighted gene co-expression network analysis (WGCNA) was performed with the WGCNA package. Results: We found turquoise and purple module were characterized as having the highest correlation with large proteinuria stage in DKD. Furthermore, the key genes, including DCN, F2R, LTBP2, B2M, PSMB8, and NLRC5, were evaluated through a protein-protein interaction (PPI) network combined with a co-expression network, which were expected to be potential diagnostic and therapeutic targets of DKD. Conclusions: These signalling pathways and hub genes will provide effective targets and ideas for further study of the pathogenesis of DKD and are expected to have roles in the prevention of, early screening for, and the treatment of DKD. and fusion of podocytes 4 . The pathogenesis of DKD is complex and multifactorial. The evidence suggests that genetic predispositions play a signicant role in an individual’s susceptibility to hyperglycaemia-induced kidney disease. Studies by Khodaeian M showed that single nucleotide polymorphisms of a variety of genes are involved in the occurrence and development of DKD 5 . Rizvi S indicated that the risk of DKD was increased by several times in people who inherited high-risk alleles in the susceptibility sites of ACE, IL, TNF-α, COL4A1, GLUT and other genes . Many other genes have also been linked to the occurrence and development of DKD 7–8 . However, these candidate genes are still not sucient to explain the susceptibility to DKD.


Background
Diabetic kidney disease (DKD) is one of the most common serious complications of diabetes. The prevalence of DKD in diabetic patients is approximately 20-60% 1 . Those cases that do not receive a timely intervention in the early stage gradually develop into chronic renal insu ciency or even end-stage renal disease (ESRD). At present, DKD has become the leading cause of ESRD in developed countries. It seriously reduces the quality of life of patients and threatens their lives 2 .
DKD is the result of kidney damage caused by chronic exposure to a high blood glucose level. Its main clinical manifestations are a blood glucose metabolism disorder, haemodynamic changes, and albuminuria 3 . The main pathological features of DKD are glomerular hypertrophy, mesangial hyperplasia, thickening of the basement membrane, loss of podocytes and fusion of podocytes 4 . The pathogenesis of DKD is complex and multifactorial. The evidence suggests that genetic predispositions play a signi cant role in an individual's susceptibility to hyperglycaemia-induced kidney disease. Studies by Khodaeian M showed that single nucleotide polymorphisms of a variety of genes are involved in the occurrence and development of DKD 5 . Rizvi S indicated that the risk of DKD was increased by several times in people who inherited high-risk alleles in the susceptibility sites of ACE, IL, TNF-α, COL4A1, GLUT and other genes 6 . Many other genes have also been linked to the occurrence and development of DKD 7-8 . However, these candidate genes are still not su cient to explain the susceptibility to DKD.
In recent years, the rapid development of genomics, transcriptomics, high-throughput technology has provided new ideas and directions for gene analysis, which not only contributes to a better understanding of the pathogenesis of DKD but also contributes to nding potential biomarkers for DKD diagnosis and therapy. However, most of the current research focused on screening for differentially expressed genes(DEGs) and ignored any possible correlations among the genes. Genes do not lead an independent existence. To obtain a better understanding of the relationships among genes, more candidate genes related to DKD and gene regulatory network models should be identi ed.
Weighted gene co-expression network analysis (WGCNA) is one of the most representative methods of developing gene co-expression networks. It can simultaneously analyse multiple complex sample gene expression patterns. Compared with the analysis of differentially expressed genes, WGCNA does not need to carry out multiple hypothesis testing of the relationship between the phenotype and the gene, but instead it directly mines the genes with similar expression patterns and clusters them into gene modules. Genes in the same module may be tightly functionally related or members of the same signalling pathway. Then, a signi cant correlation analysis between the gene modules and clinical traits is carried out to determine the relationship between the modules of interest and the related traits.
Currently, WGCNA has been used in the genetic analysis of multiple species, including humans, mice and plants, and has obtained meaningful results [9][10][11][12] . However, most of the current research on WGCNA uses genome-wide expression dataset downloaded from the database, and few studies have been performed using this method to identify the signi cant modules and highly connected key genes in DKD. In this study, RNA sequencing was performed on the collected clinical samples, and the RNA-seq results of the samples were analysed by weighted gene co-expression network analysis, aiming to nd the founctional modules and key genes that were signi cantly correlated with DKD, with the ultimate goal of providing more ideas for the early prevention, diagnosis and treatment of DKD.

Study Objectives
This study recruited 23 patients with diabetic kidney disease admitted to the A liated Union Hospital of Tongji Medical College of Huazhong University of Science and Technology between Jan 2018 and Jan 2019.
The inclusion criteria were as follows: the diagnostic criteria and classi cation for DM, given by the World Health Organization (WHO) in 1999, was adopted to de ne a diabetic patient. DKD patients were de ned by the standard 24 h urine albumin excretion rate (UAER) and glomerular ltration rate (GFR). All patients were divided into Group A (normal proteinuria stage, UAER < 30 mg/24 h), Group B (microproteinuria stage, 30 mg/24 h ≤ UAER < 300 mg/24 h) and Group C (large proteinuria stage, UAER ≥ 300 mg/24 h) according to the UAER.
The exclusion criteria were as follows: we excluded patients with anaemia, neoplasms, severe cardiovascular, cerebrovascular and liver diseases, chronic glomerulonephritis, known kidney diseases other than DKD, infection, autoimmune diseases, and acute diabetic complications such as ketoacidosis. Moreover, patients with poorly controlled hypertension, fever, vigorous physical activity, urinary tract infections, pregnant women, and those on their menstrual period were excluded to avoid non-speci c albuminuria.
Written informed consent was obtained from all patients analysed in this study. The study protocol was reviewed and approved by the Medical Ethics Committee of Tongji Medical College of Huazhong University of Science and Technology (Ethics Approval Number: S1039).

Data Collection
Demographic and clinical measurements, including age, sex, height, body weight, body mass index (BMI), diabetic duration, and blood pressure, were collected via interview and con rmed by checking the patients' records. Medication history was carefully recorded. All blood samples were drawn from the patients after 12 h overnight fasting. Routine tests included fasting blood glucose(FBG), serum creatinine, serum uric acid(SUA), and lipid pro les [total cholesterol(TC), triglyceride(TG), high-density lipoprotein(HDL), low-density lipoprotein(LDL)] were conducted using a AU5800 automatic biochemical analyser. Haemoglobin A1c (HbA1c) was measured using the HLC-723G8 HbA1c analyser. To determine the level of 24 h UAER, we collected urine (24 h urine collection for two consecutive days), and the mean value was adopted. All specimens were tested in the Department of Clinical Laboratory at Huazhong University of Science and Technology Tongji Medical College A liated Hospital.

RNA Extraction and Analysis
We collect 3-5 ml of cubital venous blood using a DKDA/RNA Shield blood collection tube containing 6 ml of Shield protective solution (ZYMO RESEARCH). Then, the samples were transported to Beijing Huajie Gene Medical Technology Co. Ltd. for further analysis. Genomic RNA was extracted from blood leukocytes using the DKDA/RNA Blood tube kit. TruSeq RNA Library Prep (48 Samples) and TruSeq RNA CD Index Plate (96 Indexes, 96 Samples) were used to construct a transcriptomic library. RNA sequencing was performed using the Illumina Nextseq500 sequencing platform and the Illumina/MiniSeq Mid Output Kit (300 cycles).

Weighted Gene Correlation Network Analysis (WGCNA)
Co-expression networks were built according to the protocols of WGCNA in an R environment. Brie y, we used a measure of similarity between the gene expression pro les that was based on a matrix of pairwise Pearson's correlation coe cients. This similarity is a measure of the level of concordance between gene expression pro les across the samples. Then, the similarity matrix was transformed into an adjacency matrix using a power adjacency function, which encodes the connection strengths between pairs of nodes. The power β = 7 was chosen based on the scale-free topology criterion, resulting in a scale-free topology index (R2) of 0.95 for the cohort. To detect the modules, we used average linkage hierarchical clustering with a dissimilarity measure (the Topological Overlap Measure-TOM). This measure represents the overlap observed between shared neighbours, which is a robust measure of network interconnectedness. Modules were detected as branches of the dendrogram, which were cut using the Dynamic Tree-Cut algorithm. To represent and summarize each module, the module eigengene was calculated. This measure is de ned as the rst principal component of a given module. It can be considered to be representative of gene expression pro les and to capture the maximal amount of variation in the module. To quantify module trait associations, given that we had a summary pro le (eigengene) for each module, we correlated the eigengene with external traits and looked for the most signi cant associations. This calculation was referred to as the Module-Trait Relationships (MTRs). For intramodular analysis, we evaluate the Gene Signi cance (GS) and Module Membership (MM), the latter of which is also known as eigengene-based connectivity (kME). GS is the absolute value of the correlation between a speci c gene and a trait. The MM is the correlation between the module eigengene and the gene expression pro le. Using the GS and MM, we can identify genes that have a high signi cance for a clinical trait and important module membership.

Functional enrichment analysis of co-expression modules
Functional enrichment analysis was performed on the genes in the modules of interest. The corresponding genes' information was mapped to the DAVID (Database for Annotation, Visualization, and Integrated Discovery) dataset. GO (Gene Ontology) analysis results and KEGG (Kyoto Encyclopedia of Genes and Genomes) analysis results were extracted out. P value ≤ 0.05 after correction was used as the threshold.

Module visualization and hub genes
The modules of interest were visualized by Cytoscape software, and the maximum intramodular connectivity of a gene was informally referred to as an intramodular hub gene.

Construction of co-expression modules of DKD
The expression values of 5000 genes in 23 cases of DKD were used to construct the co-expression module with the WGCNA package tool. The ashClust tools package was used to perform the cluster analysis on these samples. One sample was not in the cluster and did not pass the cuts. One of the most critical parameters was the power value, which mainly affected the independence and the average connectivity degree of co-expression modules. First, the power value was screened out (Fig. 1). When the power value was equal to seven, the independence degree was up to 0.95, and the average connectivity degree was higher. Therefore, the power value was used to construct co-expression modules, and 12 distinct gene co-expression modules were identi ed in DKD. These co-expression modules were constructed and are shown in different colours (Fig. 2). These modules ranged from large to small by the number of genes they included. The number of genes in the 12 modules are shown in Table 1.
Interaction analysis of co-expression modules with common expression patterns that were associated with particular clinic traits was conducted based on the correlation between the module eigengene and the clinic traits, including gender,, height, weight, HbA1c, FBG, Group A, Group B, and Group C (Fig. 4). ME 3.3 Functional enrichment analysis of co-expression modules.
The "clusterPro ler" package, "ggplot2" package and "Hmisc" package in R language were used to annotate the GO function and analyse the KEGG pathways of the turquoise module genes (Fig. 6).
The GO analysis results for the turquoise module genes showed that most of these genes had MF such as extracellular matrix structural constituent, glycosaminoglycan binding, integrin binding, collagen binding, heparin binding, sulfur compound binding, Wnt − protein binding. These genes were mainly concentrated in proteinaceous extracellular matrix, extracellular matrix component, basement membrane, complex of collagen trimers, focal adhesion, as revealed by the GO cellular component analysis. GO biological process analysis showed these genes were mainly involved in extracellular matrix organization, extracellular structure organization, cell − substrate adhesion, collagen metabolic process, glycosaminoglycan catabolic process, mucopolysaccharide metabolic process. KEGG pathway analysis indicated that multiple pathways were involved. According to P value, the top ten were included focal adhesion, ECM-receptor interaction, protein digestion and absorption, AGE-RAGE signaling pathway in diabetic complications, arrhythmogenic right ventricular cardiomyopathy (ARVC), amoebiasis, hypertrophic cardiomyopathy (HCM), dilated cardiomyopathy (DCM), PI3K-Akt signaling pathway, Small cell lung cancer.
The GO analysis results for the purple module genes showed that these genes were mainly concentrated in MHC class I protein complex, phagocytic vesicle membrane, early endosome membrane, recycling endosome membrane, COPII-coated ER to Golgi transport vesicle, as revealed by the GO cellular component analysis. GO biological process analysis showed these genes were mainly involved in type I interferon signaling pathway, cellular response to type I interferon, response to type I interferon. KEGG pathway analysis indicated that multiple pathways were involved. According to P value ,the top ten were included antigen processing and presentation, viral myocarditis, allograft rejection, graft-versus-host disease, type I diabetes mellitus, autoimmune thyroid disease, cellular senescence, human cytomegalovirus infection, human papillomavirus infection, Epstein-Barr virus infection.

Module visualization and hub genes.
The turquoise and purple modules were constructed using Cytoscape software (shown in Fig. 7), and we calculated the intramodular connectivity. The hub genes in the two modules are in bold with yellow in

Discussion
There is an appreciable inter-individual variation in the rate of progression of kidney disease in diabetic patients. A simple biological model may not be su cient to observe both the clinical and theoretical modalities to understand the mechanisms of gene-environment interactions that result in diabetic kidney disease. In this study, WGCNA was used to identify the gene module of co-expressed genes in patients with diabetic kidney diseases and to explore the key genes in the regulatory gene network. The results of this could be used to identify the population at risk of developing kidney disease among diabetic patients in order to implement early therapeutic strategies to prevent untoward sequelae, which has important clinical implications.
In this study, we rst clustered the top 5000 genes of the RNA-seq count data of DKD samples based on gene expression patterns. Then, we divided these genes into 12 modules based on co-expression patterns. Finally, we combined these modules with eight clinical features, included sex, height, weight, fasting glucose, glycosylated haemoglobin, normoalbuminuria stage (Group_A), microalbuminuria stage (Group_B) and macroalbuminuria stage (Group_C).
We found that the turquoise and purple modules had signi cant correlations with the occurrence and development of the macroalbuminuria stage of DKD, with the turquoise module having a negative correlation and the purple module having a positive one. These ndings suggest that the genes whose expression levels change in these modules have a signi cant impact on the occurrence and development of DKD. Further analysis of the genes in these modules revealed that all of these modules had a high degree of internal connectivity. GO function enrichment analysis and KEGG pathway analysis were carried out on the genes in these modules. The results of KEGG showed that the genes in these modules participated in the development of diabetes complications, including the AGEs-RAGE signalling pathway, the PI3K-Akt signalling pathway, the ECM-receptor interaction pathway, etc. Finally, PPI analysis was carried out on these module genes, and the top three genes were determined by their degree algorithm using the CytoHubba plug-in in Cytoscape. The hub genes of the turquoise module were DCN, F2R and LTBP2; and the hub genes of the purple module were B2M, PSMB8, and NLRC5.
These ndings are consistent with another study. Our previous research showed that differentially downregulated genes were mainly distributed in the cytoplasmic membrane and extracellular matrix; had the functions of protein binding, integrin binding and other molecular functions; and participated in extracellular matrix tissues and other biological processes 13 . The analysis of the KEGG pathways suggested that the differentially downregulated genes were mainly involved in the ecm-receptor interaction signalling pathway, the PI3K-Akt signalling pathway, and the signalling pathways of dilated cardiomyopathy, hypertrophic cardiomyopathy and arrhythmia-induced right ventricular cardiomyopathy. As mentioned above, the change of gene expression in the turquoise module had a signi cant negative in uence on the occurrence and development of the large albuminuria stage of DKD, which is consistent with the results of downregulating differentially expressed genes in our previous study. It also con rmed that the gene modules divided by the WGCNA method have biological and clinical signi cance in the study of DKD.
For the AGEs-RAGE signalling pathway, the advanced glycosylation end products (AGEs) and the receptor of AGEs (RAGE) play important roles in the occurrence and development of DKD 14 . AGEs are irreversible products generated by a series of complex processes such as dehydration, oxidation and chemical reordering of amino, aldehyde or ketone groups of macromolecular substances. The rate of generation of AGEs is signi cantly accelerated in a state of hyperglycaemia. Studies have shown that inhibiting the glycation process can effectively delay the occurrence and development of DKD 15 .
RAGE, as an immunoglobulin superfamily receptor, is a receptor for AGEs that has been studied thoroughly. It is poorly expressed in healthy tissues but is signi cantly upregulated in tumours in response to in ammation and elevated blood sugar 16  Coagulation factor 2 receptor (F2R), also well-known as a protease-activated receptor 1 (PAR1), is the rst known thrombin receptor and plays a critical role in transmitting thrombin-mediated activation of intracellular signalling in many types of cells. Recent studies have suggested that F2R may be critically involved not only in mediating bacteria-induced detrimental coagulation but also in innate immune and in ammatory responses 24 . Other studies have suggested that F2R might be used as an adipogenic marker to provide a potential target for understanding metabolic syndromes such as obesity, type-2 diabetes, and atherosclerosis 25 .
Β2M mRNA expression in cells in the urinary sediment is higher in T1D patients with DKD, maybe re ecting a tubulointerstitial injury promoted by albumin. Researchers have suggested that this protein contributes to diabetic (and possibly, to non-diabetic) tubulopathy 26 . Another study showed that higher serum B2M was an independent risk factor for subclinical atherosclerosis and diabetic nephropathy in patients with T2D without renal impairment 27 .
Many studies have shown that proteasome subunit beta type 8 gene (PSMB8) participates in the PI3K-Akt signalling pathway, and is involved in various diseases such as glioma, LAML, DKD, etc. 28-29 .
NOD-like receptor family caspase recruitment domain family domain containing 5 (NLRC5) has important roles in in ammation and innate immunity. NLRC5 is highly expressed in the kidney in streptozotocininduced diabetic mice, db/db mice and patients with diabetes. NLRC5 promotes in ammation and brosis during DN progression, partly through its effects on the NF-κB and TGF-β/Smad pathways.
NLRC5 may be a promising therapeutic target for DN treatment 30 . There are not many studies on RAB42 and NTM in the pathogenesis of DKD.

Conclusion
In summary, the gene modules signi cantly correlated with macroalbuminuria of DKD were divided by WGCNA. The multiple signalling pathways involved in these modules were all involved in the development of DKD. Then, we selected the top ten hub genes, and these screened genes were involved in the process of macroalbuminuria and brosis of DKD. These signalling pathways and hub genes will provide effective targets and ideas for further study of the pathogenesis of DKD and are expected to have roles in the prevention of, early screening for, and the treatment of DKD.

Declarations
Ethics approval and consent to participate The study protocol was reviewed and approved by the Medical Ethics Committee of Tongji Medical College of Huazhong University of Science and Technology (Ethics Approval Number: S1039).

Consent for publication
Not applicable.

Availability of data and materials
The datasets supporting the conclusions of this article are included within the article and its Additional les.

Competing interests
The authors con rm that there are no con icts of interest.

Funding
There is no funding for this research.  Clustering dendrograms of genes, with dissimilarity based on topological overlap, together with assigned module colors. As a result, 12 co-expression modules were constructed and was shown in different color.
These modules were ranged from large to small by the number of genes they included. The number of genes in the 12 modules were listed in Table 1.