Recognition of DEGs in CKD and the enrichment of these genes
The DEGs of GSE104954 were analyzed after preprocessing and removing batch effects. A total of 294 DEGs containing 180 upregulated genes and 114 down regulated genes were identified (Figure 1A, B) (Table. S1). The DEGs are shown in the volcano map (Figure 1A), and are also visualized on a heatmap (Figure 1B). To further determine the biological functions of the DEGs, DAVID was used to explore the DEGs’ biological significance including GO terms and KEGG pathways. Statistically significant enriched GO terms of DEGs (FDR < 0.05) were identified (Figure 1C and Table S2). The results showed that the DEGs were mainly enriched in immune response, type I interferon signaling pathway, and innate immune response (Figure 1C). Meanwhile in the KEGG analysis, the DEGs were mainly enriched in the Staphylococcus aureus infection, Rheumatoid arthritis, and Type I diabetes mellitus. Identification of significant terms and pathways, where the altered genes in tubulointerstitial section of CKD patients were enriched, may further contribute to our understanding of the role of DEGs in tubulointerstitial injury of CKD patients.
Integration of protein–protein interaction (PPI) network and clusters analysis
To explore the expressive relationships among DEGs, we inputed the DEGs to STRING. Then, PPI networks were visualized using the cytoscape software. As a result, a PPI network with 230 nodes and 1135 edges was constructed (Figure S1). Then, plug-in MCODE was used to screen the clusters inside the PPI network. 8 clusters were finally identified in the PPI network. The top 4 clusters are shown in Figure 1E-H, while the functional annotation of the genes involved in each module were analyzed, respectively. Enrichment analysis showed that the genes in module 1-4 were mainly associated with type I interferon signaling pathway, fibrinolysis, and extracellular matrix organization, specifically inflammatory response and immune response in cluster rank 4 (Table 1). Previous studies have reported these GO terms to be related to kidney or tubular epithelial cell damage [19, 20]. Therefore, genes included in the four clusters were more suspectable to participate in the triggering of injury mechanism of renal tubule-interstitium compared to other DEGs. Also, construction of DEGs regulatory networks further promotes the understanding of the molecular mechanisms underlying CKD progression.
Construction of weighted gene correlation network analysis (WGCNA)
Weighted gene correlation network analysis (WGCNA) has been widely applied in cancer-related studies [21, 22]. In the present study, we performed WGCNA to explore the possible mechanisms of tubulointerstitium damage using genetic expression profile data of GSE47185, which included a total of 107 tubulointerstitium samples from kidney biopsy of various CKD patients. RMA method and Human Entrez Gene custom CDF annotation version 19 were used for the normalization. Each of the hybridization was separately normalized and the batch-corrected data were processed using Combat. 17 samples were removed because the lack of patients’ clinical characteristics and the remaining samples were kept for further analysis. Consequently, 90 tubulointerstitial samples from CKD patients containing 12031 genes were used for the construction of WGCNA. Figure 2A shows the clusters of the 90 tubulointerstitial samples and clinical characteristics of the patients. No abnormal sample was found. The correlation network was constructed with WGCNA package in the R software. β = 6 was set in order to achieve a scale-free topology (R2 > 0.9) (Figure 2B). Then the pairwise correlation was converted into an adjacency matrix of connection strengths using soft-thresholding approach (connection strength = |correlation|β)[23] . A dissimilarity matrix based on topological overlap measure (TOM) was used to identify gene modules through a dynamic tree-cutting algorithm[22]. All modules were assigned to a different color (Figure 2C). Finally, these DGEs were categorized into 15 modules with sizes ranging from 36 to 3711 genes, and each module was assigned to a different color: turquoise, blue, brown, yellow, green, red, black, pink, magenta, purple, green-yellow, tan, salmon, cyan and midnight-blue, representing 3711, 2269, 1746, 1133, 214, 209, 129, 117, 107, 105, 86, 80, 75, 53 and 36 genes, respectively. Meanwhile, there were 1961 genes independent of any of 15 modules, which were classified as a module and assigned to grey color (Table 2, Figure 2C).
Weighted gene correlation network analysis (WGCNA) identifies critical modules correlating with estimated glomerular filtration rate (eGFR)
Using the WGCNA analysis, we identified 15 co-expressed modules in this study (Figure 2C). There were four very large modules (turquoise, blue, brown and yellow) indicating that the expression levels of a large number of genes were strongly correlated. The module eigengene (ME), which was calculated by the first principal component, was often used to represent each module. To explore the relationship between each module, the clustering analysis for ME was applied. We examined the association between ME and eGFR or CKD stage. Both indexes were crucial clinical measurements for CKD patients. The results showed that CKD stage cluster together with brown, and the distance between eGFR and blue module were very close (Figure 2D). Next, we examined the association between each of the modules and demographic and clinical parameter traits, including age, sex, eGFR, serum creatinine and disease status (Figure 2F). Interestingly, the turquoise module with the most genes exhibited a weak correlation with eGFR (r = 0.220, p = 0.038). The blue, brown and yellow modules were identified as strongly associated with eGFR (|r|>0.5, P<0.001) (Table 2). These results indicate that genes included in these three modules are more likely to participate in the progression of CKD and accelerate the reduction of renal function compared to other modules.
To further explore the biological functions and pathway of the genes in specific modules, enrichment analyses of GO and KEGG were performed. Whole enriched GO BP terms and KEGG pathway terms (FDR < 0.05) involved in each module were shown, respectively (Supplementary table S3 and supplementary table S4). Since DAVID does not allow lists of > 3000 symbols to be uploaded in order to maintain system stability, the top 3000 genes in turquoise with the highest intramodule connectivity were used for GO analysis. Top 8 GO BP terms and KEGG pathways for blue, brown and yellow module were shown, respectively (Figure 3B-G). The blue module, which contained 2269 genes where 55 of them were DEGs, exhibited significantly positive correlation with the eGFR. Bio-functions of this module were primarily involved in the oxidation-reduction process, fatty acid beta-oxidation, metabolic process and fatty acid beta-oxidation using acyl-CoA dehydrogenase, and so on (Figure 3B, Table S3). Pathway analysis exhibited a similar result according to which the genes in the module were mainly enriched in metabolic pathways, biosynthesis of antibiotics and carbon metabolism, and so forth. (Figure 3D, Table S4). The other two important modules, brown and yellow, had significantly negative correlation with eGFR. Genes in these two modules were mainly enriched in biological processes like cell division, mitotic nuclear division, cell proliferation, response to lipopolysaccharide, chemokine-mediated signaling pathway, adaptive immune response (Figure 3C, E, F). Interestingly, the KEGG pathways of yellow module also exhibited significant relevance to specific signaling pathways relevant to inflammation such as Chemokine signaling pathway or T cell receptor signaling pathway (Figure 3G, Table S4). Numerous previous studies have identified these GO terms or pathways as related to renal tubular mesenchymal damage.
Hub-based analysis
Since blue, brown and yellow modules exhibit strong association with eGFR, the gene significance (GS) and module membership (MM) were conducted. The GS and MM showed a very significant correlation (Figure 4A-C), indicating that genes involved in the blue, brown and yellow module are more likely to be highly correlated with eGFR. Highly connected hub nodes are central to the network’s architecture[22] and some studies suggested that genes with more centralized position in the network are more likely to be key drivers to proper cellular function than peripheral genes[24]. In this study, the top 5% nodes with the highest intramodular connectivity (Kwithin) were defined as hub genes of each module [23]. Complete list of hub genes for each module is shown in supplementary table S5. Weighted co-expression networks of genes in blue, brown and yellow module with a weight (w) > a threshold of 0.2, 0.15, 0.2 are shown in Figure 4E-G, respectively (Figure 4E-G). It can be seen that the relative expression of most hub genes (Figure 4E-F, diamond nodes) has consistent trend and can exhibit high connectivity with neighboring genes whose functions are consistent with the analysis results of GO and KEGG. Taking the blue module, which shows functional enrichment in metabolic process and fatty acid metabolism as an example some of the hub genes have also been reported to participate in similar process. Some of these are SLC13 Family (SLC13A3)[25] or MSRA, where the latter one has a protective role in the progression of UUO-induced kidney fibrosis via suppression of fibrotic responses caused by oxidative stress[26]. Meanwhile, though some famous genes were not defined as hub genes in this study, they have evident high connectivity with other hub-genes, such as FABP1 [27], also known as L-FABP. Numerous studies have demonstrated that FABP1 is a promising biomarker for several kidney diseases, that can also attenuate renal injury [28, 29].
In order to further identify which genes may be potentially important for the development of CKD and to refine the hub genes in blue, brown and yellow module, the genes in these modules were mapped to STRING with a combined score > 0.4. After visualization of PPI networks via cytoscape software, top 10% genes with the highest degree of each module PPI network or DEGs PPI network were examined and intersected with hub genes for each module respectively (Figure 4D, Figure S2-4). As a result, four genes appeared as follows: PLG (blue module) and ITGB2, CTSS, CCL5 (yellow module). These four genes were also defined in the core clusters (Figure 1F and Figure 1H). Considering the core position in all networks and statistically significant expression in the tubulointerstitial of CKD patients, it is reasonable to assume that these four genes could be candidate biomarkers in the diagnosis of CKD and could serve as gene targets in the treatment of renal tubular mesenchymal damage.
The relationship between expression of candidate genes and clinical parameters of CKD patients
To further validate the relationship between CKD and these four candidate genes in patients, clinical data of CKD patients from Nephroseq (www.nephroseq.org, Apr 2018, University of Michigan, Ann Arbor, MI) were obtained for further analysis. The tubulointerstitial expression levels of these four genes were significantly correlated with eGFR of CKD patients (Figure 5). Generally, the expression levels of tubulointerstitial or renal PLG had significant positive correlation with the eGFR of patients receiving kidney transplants (r = 0.691,p = 9.7e-10, Figure 5A), patients with nephritic syndrome (NS) ( r = 0.559,p = 2.96e-5, Figure 5C), IgAN (r = 0.798,p = 2.99e-6, Figure 5D) or with other CKD (r = 0.593,p = 5.09e-19, Figure 5B). The expression levels of tubulointerstitial or renal ITGB2, CTSS or CCL5 had significant negative correlation with the eGFR of patients receiving kidney transplants, IgAN, Diabetic Nephropathy (DN) or other CKD affirmed in a series of different datasets (Figure 5E-P). Expression trend of these candidate genes was in accordance with each ME involved in different modules. Besides, the glomerulus expression levels of PLG were down-regulated in patients with IgAN, vasculitis, FSGS, LN, HT, DN (Figure 6A), while ITGB2, CTSS, CCL5 were up-regulated in these diseases (Figure 6B-D). This expression trend was in line with that of tubulointerstitial section. Moreover, the expression level of tubulointerstitial PLG had significant negative correlation with the urinary protein content of renal transplant patients (r = -0.926, p = 0.008, Figure 6E), while the tubulointerstitial ITGB2 and CTSS or blood ITGB2 expression were positively related to the urinary protein content of IgAN or DN patients. The expression level of PLG was also significantly down-regulated in patients with kidney transplant and renal dysfunction compared with patients with no rejection (P =2.60E-02, FC = -1.692, Table S6). The integrated significant clinical parameters for PLG, ITGB2, CTSS, CCL5 are listed in Supplementary Table S6-9 respectively, with p < 0.05 and |FC| ≥ 1.5 (or r ≥ 0.5) set as cut-off criteria. All the reported results suggest that these four candidate genes do not merely serve as biomarkers for tubulointerstitial lesions, but also participate in the pathogenesis of kidney damage and thus are associated with progression of CKD.