Predicting diagnostic gene biomarkers in patients with diabetic kidney disease based on weighted gene co-expression network analysis and machine-learning algorithms

DOI: https://doi.org/10.21203/rs.3.rs-1696152/v1

Abstract

Objective: The present study was designed to identify potential diagnostic markers for diabetic kidney disease (DKD).

Methods: One publicly available gene expression profile (GSE142153) of human DKD and control samples was downloaded from the GEO database. Differentially expressed genes (DEGs) were screened between 23 DKD and 10 control samples. Weighted gene co-expression network analysis (WGCNA) was used to identify modules related to DKD. The genes shared by the DEGs and the turquoise module were narrowed down using a LASSO regression model and support vector machine recursive feature elimination (SVM-RFE) to identify candidate biomarkers. The area under the receiver operating characteristic curve (AUC) was used to evaluate discriminatory ability.

Results: A total of 110 DEGs were obtained: 64 were significantly upregulated and 46 significantly downregulated. WGCNA showed that the turquoise module had the strongest correlation with DKD (R = -0.58, P = 4×10⁻⁴). Thirty-eight genes shared by the DEGs and the turquoise module were extracted. These overlapping genes were mainly enriched in the p53, HIF-1, JAK-STAT and FoxO signaling pathways. CXCL3 and LINC00282 were identified as diagnostic markers of DKD, with an AUC of 0.97 (95% CI 0.90–1.00) for CXCL3 and an AUC of 1.00 (95% CI 1.00–1.00) for LINC00282.

Conclusion: CXCL3 and LINC00282 were identified as diagnostic biomarkers of DKD and can provide new insights for future studies on the occurrence and the molecular mechanisms of DKD.

Introduction

Diabetes mellitus (DM) is a group of metabolic diseases characterized by hyperglycemia resulting from defects in insulin secretion, insulin action, or both. The prevalence of diabetes is increasing worldwide. The International Diabetes Federation estimated that 451 million people aged 18–99 years had diabetes worldwide in 2017, a figure projected to reach 693 million by 2045.[1] Diabetic kidney disease (DKD) is a major microvascular complication of diabetes that causes a progressive decline in renal function through five stages, eventually leading to kidney failure.[2] DKD is currently the leading cause of end-stage renal disease worldwide,[3] and among people with diabetes, the development of DKD is associated with a higher mortality risk.[4]

DKD is classically identified by the presence of proteinuria in people with diabetes. However, increasing evidence shows that a substantial number of patients with diabetes have a decreased glomerular filtration rate (GFR) without significant albuminuria, a phenotype known as non-albuminuric DKD.[5] Although albuminuria and GFR are well-established diagnostic markers of DKD, both are non-specific, as they are altered in most chronic glomerulopathies.[6–7] Moreover, a number of patients with DKD do not follow the classic pattern of the disease.[8–9] Given these limitations, novel diagnostic biomarkers for DKD are needed.

In recent years, microarray technology and integrated bioinformatics analysis have been used to identify novel genes related to DKD.[10–11] For example, FcER1 was found to be upregulated in patients with DKD.[12] Although bioinformatics has been used successfully to screen molecular markers,[13–14] most studies rely on traditional bioinformatics algorithms, which may introduce excessive data noise and reduce the reliability of the results. We therefore first divided genes into modules by clustering and then applied regression-based feature selection to the target module to analyze the association between genes and the disease phenotype. By combining this systems biology approach with machine-learning algorithms, we aimed to identify candidate diagnostic markers for DKD with greater accuracy.

Weighted gene co-expression network analysis (WGCNA) is a systems biology method that differs from traditional analyses, which identify differentially expressed genes and treat each gene separately. WGCNA instead groups genes into modules according to the similarity of their expression patterns.[15] Compared with classical differential expression analysis, WGCNA greatly reduces the multiple hypothesis testing burden, reduces the dimensionality of high-dimensional data, and can integrate multiple data types, for example by relating gene expression to clinical indicators.
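To make the module-building idea concrete, the following minimal pure-Python sketch computes an unsigned soft-thresholded adjacency, a_ij = |cor(x_i, x_j)|^β, and the topological overlap measure (TOM) described by Zhang and Horvath [15] for three toy genes. It is an illustration under stated assumptions, not the WGCNA package itself: the soft-thresholding power β = 6 and the expression values are hypothetical, since the study does not report its network parameters.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def soft_threshold_adjacency(expr, beta=6):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** beta.
    expr holds one list of expression values per gene; the diagonal is left at 0
    so that a gene is not counted as its own neighbour in the TOM step."""
    g = len(expr)
    adj = [[0.0] * g for _ in range(g)]
    for i in range(g):
        for j in range(g):
            if i != j:
                adj[i][j] = abs(pearson(expr[i], expr[j])) ** beta
    return adj

def topological_overlap(adj):
    """TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij),
    where l_ij = sum_u a_iu * a_uj and k_i = sum_u a_iu (zero diagonal)."""
    g = len(adj)
    k = [sum(row) for row in adj]          # connectivity of each gene
    tom = [[1.0] * g for _ in range(g)]    # TOM_ii = 1 by convention
    for i in range(g):
        for j in range(g):
            if i != j:
                l_ij = sum(adj[i][u] * adj[u][j] for u in range(g))
                tom[i][j] = (l_ij + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
    return tom

# Hypothetical toy data: three genes measured in six samples.
expr = [
    [1, 2, 3, 4, 5, 6],     # gene A
    [2, 4, 5, 8, 10, 12],   # gene B, closely tracks A
    [3, 1, 4, 1, 5, 9],     # gene C, only loosely related
]
tom = topological_overlap(soft_threshold_adjacency(expr, beta=6))
print(round(tom[0][1], 3), round(tom[0][2], 3))
```

Genes A and B, whose profiles move together, receive a much higher TOM than A and C. In the usual WGCNA workflow, genes are then clustered hierarchically on the dissimilarity 1 − TOM, the dendrogram is cut into modules, and each module is summarized by its eigengene (the first principal component of the module's expression).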

The least absolute shrinkage and selection operator (LASSO) is a penalized regression method that shrinks regression coefficients and sets some of them exactly to zero, thereby selecting the variables most strongly associated with the outcome. Unlike conventional logistic or Cox regression, LASSO therefore also reduces dimensionality. Applied after WGCNA, LASSO can improve the accuracy of screening genes related to the target trait. The support vector machine (SVM) is a general-purpose supervised learning method that performs well with small samples. SVM-RFE combines an SVM with recursive feature elimination (RFE); it is a backward selection algorithm that reduces the dimensionality of the feature space by repeatedly training the model and eliminating the least informative features.
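For reference, the two selection criteria can be written out in their standard forms. This assumes a binary outcome for LASSO (y_i = 1 for DKD, 0 for control) with glmnet's binomial parameterization and α = 1, and labels y_i ∈ {−1, +1} for the linear SVM. The penalty λ is a tuning parameter that the study does not report; cross-validation is a common way to choose it.

```latex
% LASSO-penalized logistic regression (glmnet, family = "binomial", alpha = 1)
(\hat{\beta}_0, \hat{\boldsymbol\beta}) = \arg\min_{\beta_0, \boldsymbol\beta}
  -\frac{1}{n}\sum_{i=1}^{n}\Big[ y_i(\beta_0 + \mathbf{x}_i^\top\boldsymbol\beta)
  - \log\big(1 + e^{\beta_0 + \mathbf{x}_i^\top\boldsymbol\beta}\big) \Big]
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert

% SVM-RFE ranking criterion (linear SVM, labels y_i \in \{-1, +1\})
\mathbf{w} = \sum_{i=1}^{n} \alpha_i y_i \mathbf{x}_i, \qquad
c_j = w_j^2, \qquad
\text{remove } \arg\min_j c_j \text{ and retrain}
```

Genes with non-zero coefficients at the chosen λ are the LASSO-selected features, whereas SVM-RFE produces a full ranking from which the best-performing subset is retained.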

To identify biomarkers for the diagnosis of DKD, we downloaded one DKD microarray dataset from the GEO database. DEG analysis and WGCNA were performed to compare DKD and control samples, and machine-learning algorithms were then used to filter and identify diagnostic biomarkers of DKD.

Materials And Methods

Microarray Data

The series matrix file of the GSE142153 dataset was obtained from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE142153. The dataset is based on the GPL6480 platform (Agilent-014850 Whole Human Genome Microarray 4x44K G4112F) and includes 23 DKD and 10 control samples collected from circulating endothelial cells. Probes were mapped to gene symbols using the platform probe annotation file.

Data Processing and DEG Screening

The limma package in R (http://www.bioconductor.org/) was used for background correction, normalization between arrays, and differential expression analysis between the 23 DKD and 10 control samples. Genes with a false discovery rate (FDR)-adjusted P < 0.05 and |log2 fold change (FC)| > 1 were considered DEGs.
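The DEG criterion can be illustrated with a small sketch. It assumes the adjusted P values come from the Benjamini–Hochberg procedure (the default adjustment in limma's topTable()) and that fold changes are on the log2 scale, as limma reports them; the gene names and statistics in the usage lines are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values (FDR), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])   # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P down so adjusted values stay monotone and <= 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_degs(genes, log2fc, pvalues, fdr_cutoff=0.05, lfc_cutoff=1.0):
    """Return (up, down) gene lists passing FDR < fdr_cutoff and |log2FC| > lfc_cutoff."""
    fdr = benjamini_hochberg(pvalues)
    up, down = [], []
    for gene, lfc, q in zip(genes, log2fc, fdr):
        if q < fdr_cutoff and abs(lfc) > lfc_cutoff:
            (up if lfc > 0 else down).append(gene)
    return up, down

# Hypothetical genes and statistics, for illustration only.
genes = ["G1", "G2", "G3", "G4"]
log2fc = [2.1, -1.5, 0.4, 3.0]
pvalues = [0.001, 0.004, 0.0001, 0.30]
up, down = select_degs(genes, log2fc, pvalues)
print(up, down)  # → ['G1'] ['G2']
```

G3 passes the FDR cutoff but not the fold-change cutoff, and G4 has a large fold change but is not significant, so both are excluded.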

Establishment of gene co-expression network

The gene co-expression network was constructed using the WGCNA package in R. Genes with similar expression patterns were assigned to the same module, and the modules related to DKD were then selected for further analysis.

Functional Enrichment Analysis

Overlapping genes between the DEGs and the DKD-related module were extracted. GO enrichment analysis of the overlapping genes was performed using the “clusterProfiler” package in R,[16] and KEGG pathway analysis was used to identify the most significantly enriched pathways among the overlapping genes. A gene set was regarded as significantly enriched if P < 0.05.
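Over-representation tests of this kind, such as clusterProfiler's enrichGO() and enrichKEGG(), are based on the hypergeometric distribution. The sketch below computes that one-sided P value with the Python standard library; the background size and term counts in the usage line are hypothetical, not values from this study.

```python
from math import comb

def hypergeometric_enrichment_p(N, K, n, k):
    """One-sided over-representation P value, P(X >= k), for X ~ Hypergeometric.
    N: background genes, K: background genes annotated to the term,
    n: genes in the input list, k: input genes annotated to the term."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# Hypothetical counts: a 38-gene list with 5 genes in a 200-gene term,
# against a background of 20,000 annotated genes.
print(hypergeometric_enrichment_p(N=20000, K=200, n=38, k=5))
```

clusterProfiler also reports multiple-testing-adjusted values alongside the raw P value.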

Candidate Diagnostic Biomarker Screening

Two machine-learning algorithms were used. LASSO regression was performed using the “glmnet” package in R to identify genes significantly associated with the discrimination of DKD from control samples. The SVM-RFE algorithm was used to select the optimal gene subset from the same cohort.[17] Genes selected by both algorithms were retained as candidate biomarkers.
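The SVM-RFE loop can be sketched as follows. To stay dependency-free, the linear SVM here is trained by full-batch subgradient descent on the regularized hinge loss rather than by the libsvm solver behind e1071's svm(); the learning rate, regularization strength, iteration count and toy data are illustrative assumptions. At each round the feature with the smallest squared weight is eliminated.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Linear SVM in the primal: minimise lam/2 * ||w||^2 + mean(max(0, 1 - y_i (w.x_i + b)))
    by full-batch subgradient descent. Labels must be -1 or +1."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]   # gradient of the L2 term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:              # hinge loss is active for this sample
                for j in range(p):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def svm_rfe(X, y, names, n_keep=1):
    """Recursive feature elimination: retrain, drop the feature with the smallest
    w_j ** 2, repeat until n_keep features remain. Returns (kept, eliminated)."""
    active = list(range(len(names)))
    eliminated = []
    while len(active) > n_keep:
        X_sub = [[row[j] for j in active] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        worst = min(range(len(active)), key=lambda k: w[k] ** 2)
        eliminated.append(names[active.pop(worst)])
    return [names[j] for j in active], eliminated

# Hypothetical toy data: one informative feature and two class-independent noise features.
X = [
    [2.0, 0.1, -0.1], [1.5, -0.1, 0.1], [1.8, 0.05, 0.0],     # DKD (+1)
    [-2.0, 0.1, -0.1], [-1.5, -0.1, 0.1], [-1.8, 0.05, 0.0],  # control (-1)
]
y = [1, 1, 1, -1, -1, -1]
kept, eliminated = svm_rfe(X, y, ["informative", "noise1", "noise2"], n_keep=1)
print(kept, eliminated)
```

In practice, the number of retained features is usually chosen by cross-validated accuracy rather than fixed in advance.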

Diagnostic Value of Feature Biomarkers in DKD

To test the predictive value of the identified biomarkers, ROC curves were generated from the expression data of the 23 DKD and 10 control samples. The area under the ROC curve (AUC) was used to assess diagnostic effectiveness in discriminating DKD from control samples.
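The AUC has a convenient interpretation: it is the probability that a randomly chosen DKD sample scores higher than a randomly chosen control (ties counted as one half), i.e. a scaled Mann–Whitney U statistic. The sketch below computes it together with a DeLong-type 95% confidence interval, which is, to our understanding, the default method of pROC's ci.auc() in R; the study does not state which ROC package or CI method it used, so this is an assumption. It also shows how a degenerate interval such as 1–1 arises: when every case outranks every control, the DeLong variance estimate is zero.

```python
import math

def _psi(case, control):
    """Pairwise kernel: 1 if the case scores higher, 0.5 for a tie, else 0."""
    if case > control:
        return 1.0
    if case == control:
        return 0.5
    return 0.0

def auc_with_delong_ci(cases, controls, z=1.959963984540054):
    """Return (AUC, lower, upper); z is the 97.5% normal quantile for a 95% CI.
    Uses DeLong's structural components; the interval is clipped to [0, 1]."""
    m, n = len(cases), len(controls)
    v10 = [sum(_psi(x, y) for y in controls) / n for x in cases]   # one per case
    v01 = [sum(_psi(x, y) for x in cases) / m for y in controls]   # one per control
    auc = sum(v10) / m          # equals the Mann-Whitney U statistic / (m * n)
    s10 = sum((v - auc) ** 2 for v in v10) / (m - 1)
    s01 = sum((v - auc) ** 2 for v in v01) / (n - 1)
    se = math.sqrt(s10 / m + s01 / n)
    return auc, max(0.0, auc - z * se), min(1.0, auc + z * se)

# Hypothetical marker values: every case outranks every control.
print(auc_with_delong_ci([0.9, 0.8, 0.7], [0.3, 0.2, 0.1]))  # → (1.0, 1.0, 1.0)
```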

Statistical Analysis

Data were analyzed with R (version 4.1.3) for Windows. LASSO regression was performed with the “glmnet” package and the SVM algorithm with the “e1071” package. ROC curve analysis was used to determine the diagnostic efficacy of the candidate biomarkers. A two-sided P < 0.05 was considered statistically significant.

Results

Identification of DEGs in DKD

Data from a total of 23 DKD and 10 control samples in GSE142153 were analyzed. After batch effects were removed, DEGs were identified using the limma package. A total of 110 DEGs were obtained: 64 were significantly upregulated and 46 significantly downregulated (Figure 1).

Gene co-expression network analysis

After genes with low expression or little variation in expression were filtered out, a gene co-expression network was constructed using the WGCNA package, yielding a total of 15 modules. The correlation between each module eigengene and DKD was calculated to identify the relevant module. The turquoise module had the strongest correlation with DKD (R = -0.58, P = 4×10⁻⁴) (Figure 2).
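The reported P value can be reproduced from the correlation itself. To our understanding, WGCNA's corPvalueStudent() converts a module-trait correlation r into a two-sided Student t P value with n − 2 degrees of freedom, using t = r√(n − 2)/√(1 − r²). The pure-Python sketch below implements that formula, with a standard continued-fraction evaluation of the regularized incomplete beta function standing in for R's pt(). With r = -0.58 and n = 33 samples it gives a P value of roughly 4×10⁻⁴, in line with the value reported above.

```python
import math

def _betacf(a, b, x, max_iter=300, eps=1e-15):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def _reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def cor_pvalue_student(r, n):
    """Two-sided P value of a correlation r over n samples via Student's t with n - 2 df."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1.0 - r * r)
    # P(|T| >= |t|) for T ~ t(df) equals I_{df / (df + t^2)}(df / 2, 1 / 2).
    return _reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))

# Values reported for the turquoise module: R = -0.58 with 23 + 10 = 33 samples.
print(cor_pvalue_student(-0.58, 33))
```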

GO analysis and KEGG pathway enrichment analysis

Thirty-eight overlapping genes of the DEGs and the turquoise module were extracted (CXCL3, DEFA3, MSH2, LINC00282, SLC4A10, OSR2, PFKFB3, TNFAIP8L2, VSIG1, HAL, FAM46C, LINC01410, NRCAM, CD200R1, IL10, PAQR8, MAFB, SIRT4, GFOD1, ZNF781, NUDCD1, LRRC32, ZNF404, SGK223, GCSAM, ATF3, LOC257396, THBS1, ATP6V0E2-AS1, CDKN1A, HAB1, MAFF, IGIP, HAS1, FOXL2, LRRIQ3, IGKV1-5, and HCAR3). GO analysis was conducted to investigate the functions of these 38 genes and yielded 421 biological process (BP) terms, 5 cellular component (CC) terms and 29 molecular function (MF) terms. The top five terms in each category with P < 0.05 were visualized. Enriched BP terms included negative regulation of immune system process, response to steroid hormone and leukocyte migration; enriched CC terms mainly included external side of plasma membrane, basolateral plasma membrane and mismatch repair complex; and enriched MF terms mainly included DNA-binding transcription activator activity, RNA polymerase II-specific, and DNA-binding transcription activator activity (Fig. 3A). Overall, the 38 overlapping genes were mainly associated with immune regulation and the plasma membrane. The KEGG results showed that the enriched pathways mainly included the p53, HIF-1, JAK-STAT and FoxO signaling pathways (Fig. 3B).

Identification and Validation of Diagnostic Feature Biomarkers

Two different algorithms were used to screen potential biomarkers. Narrowing down the 38 overlapping genes of the DEGs and the turquoise module with the LASSO regression algorithm identified four candidate diagnostic genes for DKD (Fig. 4A), and the SVM-RFE algorithm selected a subset of two features among the 38 genes (Fig. 4B). The two features shared by both algorithms, CXCL3 and LINC00282, were ultimately selected and used to build a diagnostic model with logistic regression in the same cohort.
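The combined model can be sketched as a two-predictor logistic regression. This pure-Python version fits the coefficients by gradient descent with a small ridge (L2) penalty; the penalty, learning rate and toy data are assumptions, not the authors' settings, which the study does not report beyond naming logistic regression. A penalty is a sensible safeguard in small cohorts: when a marker separates the two groups perfectly, as the AUC of 1 for LINC00282 suggests, the unpenalized maximum-likelihood coefficients do not exist and grow without bound.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, l2=0.01, lr=0.5, epochs=2000):
    """Fit P(y = 1 | x) = sigmoid(b0 + w.x) by gradient descent on the mean log-loss
    plus (l2 / 2) * ||w||^2; the intercept b0 is not penalized. y must be 0 or 1."""
    n, p = len(X), len(X[0])
    w, b0 = [0.0] * p, 0.0
    for _ in range(epochs):
        gw = [l2 * wj for wj in w]
        gb = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b0 + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(p):
                gw[j] += err * xi[j] / n
            gb += err / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b0 -= lr * gb
    return b0, w

def predict_proba(b0, w, X):
    """Predicted probability of DKD for each sample (the combined diagnostic score)."""
    return [sigmoid(b0 + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]

# Hypothetical standardized expression of two marker genes (columns) per sample.
X = [[1.2, 0.8], [0.9, 1.1], [1.5, 0.4],        # DKD (y = 1)
     [-1.0, -0.7], [-0.8, -1.2], [-1.3, -0.5]]  # control (y = 0)
y = [1, 1, 1, 0, 0, 0]
b0, w = fit_logistic(X, y)
probs = predict_proba(b0, w, X)
print([round(p, 3) for p in probs])
```

The fitted probabilities serve as the combined diagnostic score that can be passed to ROC analysis.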

Diagnostic Effectiveness of Feature Biomarkers in DKD

Both biomarkers showed favorable ability to discriminate DKD from control samples, with an AUC of 0.97 (95% CI 0.90–1.00) for CXCL3 and an AUC of 1.00 (95% CI 1.00–1.00) for LINC00282, indicating high diagnostic ability in this cohort (Fig. 5).

Discussion

DKD is one of the most common diabetic complications and the leading cause of chronic kidney disease and end-stage renal disease worldwide. Because effective early diagnosis is lacking, patients with DKD often miss the opportunity to benefit from treatment, resulting in poor outcomes.[18] At present, the urinary albumin-to-creatinine ratio and eGFR are well-established diagnostic markers of DKD.[19] However, diagnosing DKD remains challenging because both the albumin-to-creatinine ratio and eGFR loss are non-specific markers of DKD, and a number of patients with DKD do not follow the classic pattern of the disease.[8–9] Researchers are therefore increasingly searching for novel diagnostic biomarkers of DKD.

Recently, mRNAs and microRNAs have emerged as promising biomarkers in DKD.[13–14, 20–21] For example, let-7b-5p and miR-21-5p could serve as biomarkers of the risk of end-stage kidney disease (ESKD) in type 1 diabetes (T1DM), with elevated levels of both microRNAs being independent risk factors for ESKD. In contrast, let-7c-5p and miR-29a-3p were independently associated with a more than 50% reduction in the risk of rapid progression to ESKD in T1DM.[20] Another study of patients with T1DM without albuminuria found that 18 microRNAs were associated with the development of albuminuria, nine of which were used to define a signature for microalbuminuria.[21] However, most of this research relied on traditional bioinformatics algorithms, which may introduce excessive data noise and reduce the reliability of the results. To improve the accuracy of molecular screening, we combined a systems biology approach with machine-learning algorithms to investigate candidate diagnostic markers for DKD.

To our knowledge, this is the first retrospective study to identify diagnostic biomarkers in patients with DKD using a GEO dataset combined with WGCNA and machine-learning algorithms. We integrated and analyzed one cohort from the GEO database. A total of 110 DEGs were identified, including 64 upregulated and 46 downregulated genes. Gene co-expression network analysis showed that the turquoise module had the strongest correlation with DKD, and 38 genes overlapped between the DEGs and the turquoise module. Enrichment analyses indicated that these overlapping genes were mainly associated with immune regulation and the plasma membrane. These findings are broadly consistent with previous evidence that an inflammatory response involving leukocytes participates in the pathogenesis of DKD.[22]

The KEGG results showed that the enriched pathways were mainly the p53, HIF-1, JAK-STAT and FoxO signaling pathways. Ma et al. found a positive correlation between the p53 signaling pathway and renal fibrosis in patients with diabetes.[23] They also showed that the p53/microRNA-214/ULK1 axis participates in the development of DKD by impairing renal tubular autophagy. Guo et al. found that the SIRT1/P53/NRF2 pathway modulates the pathogenesis of DKD. SRT2104, a novel, first-in-class, highly selective small-molecule activator of SIRT1, enhanced renal SIRT1 expression and activity, deacetylated P53 and activated NRF2 antioxidant signaling, providing marked protection against DM-induced renal oxidative stress, inflammation, fibrosis, glomerular remodeling and albuminuria in diabetic mouse models.[24–25]

Serum HIF-1α may be involved in DKD through inflammation, angiogenesis and endothelial injury,[26] but the underlying signaling pathway is unknown. Previous studies have shown that in mesangial cells, elevated glucose levels induce HIF activity through a hypoxia-independent mechanism. Elevated HIF activity in glomerular cells promotes glomerulosclerosis and albuminuria, and inhibition of HIF protects glomerular integrity. In contrast, tubular HIF activity is suppressed, and HIF activation protects mitochondrial function and prevents the development of diabetes-induced tissue hypoxia, tubulointerstitial fibrosis and proteinuria.[27] Further research is therefore needed to clarify the role of HIF signaling in DKD.

The JAK-STAT pathway transmits signals from extracellular ligands, including many cytokines, chemokines, growth factors and hormones, directly to the nucleus to induce a variety of cellular responses.[28] Gene and protein expression studies of kidney biopsies from people with early- and late-stage DKD have shown increased activation and expression of the JAK-STAT signaling pathway across the spectrum of DKD.[29] Inhibitors of the JAK-STAT pathway are promising therapeutic options to improve renal outcomes in patients with DKD, but appropriate clinical trials are necessary.[30]

Based on the two machine-learning algorithms, two diagnostic markers were identified. C-X-C motif chemokine ligand 3 (CXCL3) is a member of the CXC chemokine subfamily produced by inflammatory cells. It mainly recruits and activates cells expressing CXC chemokine receptors (CXCR) 1 and 2, and participates in the regulation of cell migration, invasion and angiogenesis.[31] Current research on CXCL3 focuses mainly on tumor immunity.[32] Blocking CXCL3 signaling can inhibit pathophysiological processes such as cell migration, invasion, angiogenesis, tumorigenesis and fibrosis, so CXCL3 may become a potential target for the prevention and treatment of a variety of diseases. Further research is needed to understand the role of CXCL3 in DKD.

LINC00282, also known as transmembrane protein 272 (TMEM272), is predicted to be an integral component of the membrane (https://www.ncbi.nlm.nih.gov/gene/283521). The role of LINC00282 in DKD is currently unclear, and it may become an entry point for future research on DKD.

The limitations of this study should be acknowledged. First, the study was retrospective, so important clinical information was not available. Second, the number of cases in GSE142153 was relatively small. Third, the biomarker profiles were derived from blood cells in a single public dataset, and their reproducibility should be further validated. Prospective studies with larger sample sizes are needed to validate our conclusions.

Conclusion

In summary, CXCL3 and LINC00282 were identified as diagnostic biomarkers of DKD and can provide new insights for future studies on the occurrence and the molecular mechanisms of DKD.

Declarations

DATA AVAILABILITY

The dataset analyzed during the current study is publicly available in the GEO database (GSE142153). All data generated or analysed during this study are included in the supplementary files.

AUTHOR CONTRIBUTIONS

Conceptualization and design: Yanan Wang and Qian Gao.

Data curation: Yanan Wang.

Writing: Qian Gao.

Revision: Wenfang XU and Huawei Jin.

All authors read and approved the final manuscript.

ACKNOWLEDGMENTS

The authors acknowledge the Gene Expression Omnibus (GEO) database for making the DKD data publicly available.

References

  1. Cho NH, Shaw JE, Karuranga S, et al. IDF Diabetes Atlas: Global estimates of diabetes prevalence for 2017 and projections for 2045. Diabetes research and clinical practice, 2018, 138:271-281.
  2. Navaneethan SD, Zoungas S, Caramori ML, et al. Diabetes Management in Chronic Kidney Disease: Synopsis of the 2020 KDIGO Clinical Practice Guideline. Annals of internal medicine, 2021, 174(3):385-394.
  3. Collins AJ, Foley RN, Chavers B, et al. United States Renal Data System 2011 Annual Data Report: Atlas of chronic kidney disease & end-stage renal disease in the United States. American journal of kidney diseases : the official journal of the National Kidney Foundation, 2012, 59(1 Suppl 1):A7, e1-420.
  4. Tonelli M, Muntner P, Lloyd A, et al. Risk of coronary events in people with chronic kidney disease compared with those with diabetes: a population-level cohort study. Lancet (London, England), 2012, 380(9844):807-814.
  5. Laranjinha I, Matias P, Mateus S, et al. Diabetic kidney disease: Is there a non-albuminuric phenotype in type 2 diabetic patients? Nefrologia : publicacion oficial de la Sociedad Espanola Nefrologia, 2016, 36(5):503-509.
  6. Camargo EG, Soares AA, Detanico AB, et al. The Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation is less accurate in patients with Type 2 diabetes when compared with healthy individuals. Diabetic medicine : a journal of the British Diabetic Association, 2011, 28(1):90-95.
  7. KDOQI Clinical Practice Guidelines and Clinical Practice Recommendations for Diabetes and Chronic Kidney Disease. American journal of kidney diseases : the official journal of the National Kidney Foundation, 2007, 49(2 Suppl 2):S12-154.
  8. Chen Y, Lee K, Ni Z, et al. Diabetic Kidney Disease: Challenges, Advances, and Opportunities. Kidney diseases (Basel, Switzerland), 2020, 6(4):215-225.
  9. Kramer HJ, Nguyen QD, Curhan G, et al. Renal insufficiency in the absence of albuminuria and retinopathy among adults with type 2 diabetes mellitus. Jama, 2003, 289(24):3273-3277.
  10. Regmi A, Liu G, Zhong X, et al. Evaluation of Serum microRNAs in Patients with Diabetic Kidney Disease: A Nested Case-Controlled Study and Bioinformatics Analysis. Medical science monitor : international medical journal of experimental and clinical research, 2019, 25:1699-1708.
  11. Assmann TS, Recamonde-Mendoza M, Costa AR, et al. Circulating miRNAs in diabetic kidney disease: case-control study and in silico analyses. Acta diabetologica, 2019, 56(1):55-65.
  12. Sur S, Nguyen M, Boada P, et al. FcER1: A Novel Molecule Implicated in the Progression of Human Diabetic Kidney Disease. Frontiers in immunology, 2021, 12:769972.
  13. Barutta F, Bruno G, Matullo G, et al. MicroRNA-126 and micro-/macrovascular complications of type 1 diabetes in the EURODIAB Prospective Complications Study. Acta diabetologica, 2017, 54(2):133-139.
  14. Eissa S, Matboli M, Aboushahba R, et al. Urinary exosomal microRNA panel unravels novel biomarkers for diagnosis of type 2 diabetic kidney disease. Journal of diabetes and its complications, 2016, 30(8):1585-1592.
  15. Zhang B, Horvath S. A general framework for weighted gene co-expression network analysis. Statistical applications in genetics and molecular biology, 2005, 4:Article17.
  16. Yu G, Wang LG, Han Y, et al. clusterProfiler: an R package for comparing biological themes among gene clusters. Omics : a journal of integrative biology, 2012, 16(5):284-287.
  17. Zhang HH, Ahn J, Lin X, et al. Gene selection using support vector machines with non-convex penalty. Bioinformatics (Oxford, England), 2006, 22(1):88-95.
  18. Lin CH, Chang YC, Chuang LM. Early detection of diabetic kidney disease: Present limitations and future perspectives. World journal of diabetes, 2016, 7(14):290-301.
  19. Stevens PE, Levin A. Evaluation and management of chronic kidney disease: synopsis of the kidney disease: improving global outcomes 2012 clinical practice guideline. Annals of internal medicine, 2013, 158(11):825-830.
  20. KP, et al. Circulating TGF-β1-Regulated miRNAs and the Risk of Rapid Progression to ESRD in Type 1 Diabetes. Diabetes, 2015, 64(9):3285-3293.
  21. Argyropoulos C, Wang K, Bernardo J, et al. Urinary MicroRNA Profiling Predicts the Development of Microalbuminuria in Patients with Type 1 Diabetes. Journal of clinical medicine, 2015, 4(7):1498-1517.
  22. Zheng Z, Zheng F. Immune cells and inflammation in diabetic nephropathy. Journal of diabetes research, 2016, 2016:1841690.
  23. Ma Z, Li L, Livingston MJ, et al. p53/microRNA-214/ULK1 axis impairs renal tubular autophagy in diabetic kidney disease. The Journal of clinical investigation, 2020, 130(9):5011-5026.
  24. Guo W, Tian D, Jia Y, et al. MDM2 controls NRF2 antioxidant activity in prevention of diabetic kidney disease. Biochimica et biophysica acta. Molecular cell research, 2018, 1865(8):1034-104.
  25. Ma F, Wu J, Jiang Z, et al. P53/NRF2 mediates SIRT1's protective effect on diabetic nephropathy. Biochimica et biophysica acta. Molecular cell research, 2019, 1866(8):1272-1281.
  26. Shao Y, Lv C, Yuan Q, et al. Levels of Serum 25(OH)VD3, HIF-1α, VEGF, vWf, and IGF-1 and Their Correlation in Type 2 Diabetes Patients with Different Urine Albumin Creatinine Ratio. Journal of diabetes research, 2016, 2016:1925424.
  27. Persson P, Palm F. Hypoxia-inducible factor activation in diabetic kidney disease. Current opinion in nephrology and hypertension, 2017, 26(5):345-350.
  28. O'Shea JJ, Plenge R. JAK and STAT signaling molecules in immunoregulation and immune-mediated disease. Immunity, 2012, 36(4):542-550.
  29. Choudhury GG, Ghosh-Choudhury N, Abboud HE. Association and direct activation of signal transducer and activator of transcription 1alpha by platelet-derived growth factor receptor. The Journal of clinical investigation, 1998, 101(12):2751-2760.
  30. Tuttle KR, Brosius FC, 3rd, Adler SG, et al. JAK1/JAK2 inhibition by baricitinib in diabetic kidney disease: results from a Phase 2 randomized controlled clinical trial. Nephrology, dialysis, transplantation : official publication of the European Dialysis and Transplant Association - European Renal Association, 2018, 33(11):1950-1959.
  31. Russo RC, Garcia CC, Teixeira MM, et al. The CXCL8/IL-8 chemokine family and its receptors in inflammatory diseases. Expert review of clinical immunology, 2014, 10(5):593-619.
  32. Reyes N, Figueroa S, Tiwari R, et al. CXCL3 Signaling in the Tumor Microenvironment. Advances in experimental medicine and biology, 2021, 1302:15-24.