1. Single-cell data processing and cell annotation
Apply the filter conditions of nFeature_RNA > 200 & nFeature_RNA < 5000 & percent.mt < 10 to obtain 29462 cells. Here, nFeature_RNA represents the number of genes measured per cell, nCount is the sum of the expression of all genes measured for each cell, and percent.mt is the proportion of mitochondrial genes (Fig. 1A). The PCA plot after the Harmony debatching effect demonstrated a good correction effect (Fig. 1B). Umap dimensionality reduction clustering yielded a total of 24 subpopulations, which were cell-annotated by article gene markers published in the original data (Fig. 1C). This diabetic nephropathy single-cell data comprises 13 cells, including proximal tubule (PT), parietal epithelial cells (PEC), thick ascending limb (TAL), distal tubule (DCT), connecting tubule (CNT), collecting duct (PC), Type A Intercalated cells (ICA), Type B intercalated cells (ICB), Podocyte cells (PODO (MES)), fibroblasts (FIB), endothelial cells (ENDO), and leukocytes (LEUK). Different types of cells correspond to different colors in the umap picture (Fig. 1D). Refer to Fig. 1E for a picture of a single-cell Umap of the kidney in the diabetic nephropathy group and the blank control group. Figure 2A shows the proportion of 13 kidney cells in diabetic nephropathy and the blank control group, where TAL is significantly increased in diabetic nephropathy. The histogram in Fig. 2B shows that PT cells accounted for the largest proportion of 13 cells in the blank control group, followed by TAL cells. The histogram in Fig. 2C shows the number of 13 cells in the diabetic nephropathy group, of which PT cells account for the largest proportion, followed by TAL cells. Based on the three pictures in Fig. 2, it is evident that diabetic nephropathy has significantly more TAL cells than the blank control group.
2. Single-cell co-expression network analysis of TAL in diabetic nephropathy
Visualize the parameter sweep results using PlotSoftPowers and select the most suitable softpower as 5 (Fig. 3A). The co-expression network of TAL cells in diabetic nephropathy is visualized in Fig. 3B, which comprises a total of 9 modules. Based on the eigengene-based connectivity (KME) score for each module, highly connected genes within each module are derived (Fig. 3C). Figure 3D shows the correlation between the genes, where the genes in the turquoise module are strongly positively correlated with the genes in the brown and yellow modules, the genes in the brown and pink modules are strongly positively correlated, and the genes in the Yellow and blue modules are strongly negatively correlated. Dot plots in Fig. 3E and F suggest that genes in turquoise, brown, and pink are most associated with subclass 1.2.12. Please refer to Supplementary Material 1 for genes from hdWGCNA turquoise, brown, and pink.
3.Diabetic nephropathy TAL time-bound sequence analysis and cell communication
We analyzed the development trajectory of TAL cells in diabetic nephropathy using a pseudo-time series analysis. The trajectory starts from node 2 and is divided into two branches, developing backward from cluster 1 and cluster 2, respectively, and gradually developing into subgroups 3, 4, and 5 after subgroup 6 (Fig. 4A and B). Figure 4C shows the heat maps of representative genes in different cluster plots, where genes in clusters 1, 2, and 7 are early cell expression genes, and genes in clusters 3 and 4 belong to genes expressed by late cells. Early expressed genes include LINC01320, TENT2, SOX6, VWA8, UBE2E3, FBXL4, PPARGC1A, HIBADH, NEDD4L, OSBPL3, KLF12, TRPS1, DENND1B, CADPS2, ARID1B, MITF, NCOA2, EXOC4, MEF2F, VAV3, GRAMD2B, ARHGAP24, SGIP1, PPM1L, ESRRG, IGF1R, TMEM161B-AS1, PANTR1, and NTRK2. The potential TAL biomarkers we screened for, ESRRG and IGF1R, are expressed at an early stage. The genes expressed in late stages are TTC14, PAX2, LAMB1, TNS3, SELENBP1, DENND2A, GOLM1, KIFC3, CCDC7, PRODH, PRMD16, MCF2L, EPN1, EHBP1, GRB14, HPN, and C20Orl194. We also investigated the cell communication networks between TAL cells and other cell types in the kidney. Our results suggest that TAL cells in diabetic nephropathy send BMP signals to many cells, which are mediated and regulated by PEC and PODO cells, ultimately leading to altered BMP pathway in PC cells and contributing to the development of the disease (Fig. 4D, E).
4. Immune infiltration analysis of TAL cell-related genes in diabetic nephropathy
The TAL cells of diabetic nephropathy were subjected to non-negative matrix factorization (NMF), wherein the point preceding the point with the fastest decline was selected, resulting in the acquisition of three distinct cell types (Fig. 5A). The heat map in Fig. 5B illustrates the variations in NMF of the three TAL cell types in diabetic nephropathy. The heat map in Fig. 5C reveals that the first NMF type is predominantly associated with immunity, particularly Neutrophils, NK cells, Cytotoxic lymphocytes, and Myeloid dendritic cells. Figure 5D box plot indicates that there are differences in immune infiltration among the three NMF types, with the first type of cells being most closely linked to immune infiltration. When comparing the three NMF cell types, T cells, Cytotoxic lymphocytes, NK cells, Myeloid dendritic cells, and Fibroblasts exhibited statistical differences, with the first group of NMF cells exhibiting high expression levels in these five immune cell types (Fig. 5E). Furthermore, the first subset of TALNMF cells in diabetic nephropathy exhibited high expression levels of three genes, namely COBL, PPARGC1A, and THSD7A (Fig. 5F).
5. Machine learning screens for TAL biological markers of diabetic nephropathy
The transcriptome datasets GSE30529 and GSE104954 were merged, and batch effects were eliminated to obtain an expression matrix (Supplementary Material 2). GSE30529 served as the training set, comprising a total of 22 samples, including 10 diabetic nephropathy groups and 12 blank control groups. On the other hand, GSE104954 was utilized as the verification set, consisting of 25 samples, including 7 diabetic nephropathy groups and 18 blank control groups. Single-gene logistic regression was performed on the training set, followed by lasso regression, which yielded 7 genes (Supplementary picture 2A, B). These seven genes, namely ESRRG, IGF1R, PTGER3, LAMB1, CYFIP2, EPN1, and MAST4, were primarily expressed in the NMF type II cell population of TAL cells in diabetic nephropathy (Fig. 5F). Further verification was conducted using mlr3 machine learning, with the training set adopting fold = 5, repeats = 10, and 50 internal cross-validations, while the external training set was utilized for verification. The AUC and ROC graphs demonstrated that the SVM machine learning algorithm and lda exhibited high sensitivity and specificity. In this study, SVM was selected for external dataset testing (Supplementary picture 2C, D). The area under the AUC curve tested on the external dataset was 0.905 (Fig. 7E). The correlation between potential biomarkers of TAL cells in diabetic nephropathy and immune cells and immune infiltration revealed that the IGF1R gene was most closely associated with NK cell infiltration (Supplementary picture 2F). Figures 6A-G showed the expression of seven screened biomarkers, which were statistically different in the blank control group and the diabetic nephropathy group, and the expression was reduced in the diabetic nephropathy group.
6. Convolutional neural network deep learning validation of biomarkers immune infiltration
GSE30529 is the training set, and GSE104954 is the verification set. The process of deep learning training gradually decreases and the accuracy gradually increases (Supplementary picture 3A). After deep learning on the training set GSE30529, when the threshold is 0.487, the sensitivity and specificity are both 1 (Supplementary picture 3B). After deep learning on the validation set GSE104954, when the threshold is 0.300, the sensitivity is 0.833 and the specificity is 1 (Supplementary picture 3C).
7. Molecular docking
Molecular docking of 7 genes with Empagliflozin, the results are as follows. Empagliflozin binds to the active cavity of CYFIP, establishes hydrophobic interactions with amino acids near the active site, and forms Pi-Pi stacking with TYR700 near the active cavity (Fig. 7A,B, Supplementary picture 4A). The hydrophobic amino acids present in the active cavity include MET37, PRO32, TYR700, LEU642, LEU636. These interactions stabilized the binding of Empagliflozin to CYFIP with a score of -9.273. A higher score indicates that Empagliflozin is stably combined with CYFIP. Empagliflozin binds to the active cavity of ESRRG, establishes hydrophobic interactions with amino acids near the active site, forms Pi-Pi stacking with PHE435 near the active cavity, and forms H bond with GLU275 near the active cavity (Fig. 7C,D, Supplementary picture 4B). The hydrophobic amino acids present in the active cavity include TRP305, MET306, LEU309, ILE310, VAL313, LEU276, ALA272, LEU271, CYS269, LEU268, LEU265. These interactions stabilized the binding of Empagliflozin to ESRRG with a score of -8.253. A higher score indicates that Empagliflozin is stably combined with ESRRG. Empagliflozin binds to the active cavity of PTGER3 and establishes hydrophobic interactions with amino acids near the active site (Fig. 7E,F, Supplementary picture 4C). The hydrophobic amino acids present in the active cavity include LEU329, VAL332, ALA335, ILE340, VAL110, LEU59, MET58, PRO55. These interactions stabilized the binding of Empagliflozin to PTGER3 with a score of -8.253. A higher score indicates that Empagliflozin binds stably to PTGER3. Empagliflozin binds to the active cavity of IGF1, establishes hydrophobic interactions with amino acids near the active site, and forms H bonds with ASP150 and ASN1137 near the active cavity (Fig. 7G,H, Supplementary picture 4D). The hydrophobic amino acids present in the active cavity include ALA1028, LEU1078, MET1079, MET1139, LEU1002. These interactions stabilized the binding of Empagliflozin to IGF1 with a score of -7.073. A higher score indicates that Empagliflozin binds stably to IGF1.
Empagliflozin binds to the active cavity of MAST4, establishes hydrophobic interactions with amino acids near the active site, and forms H bonds with GLU73, ASN114, ASN156, ASP169, GLN37 and ARG69 near the active cavity (Fig. 8A,B, Supplementary picture 4E). The hydrophobic amino acids present in the active cavity include LEU57, ALA53, ALA113, MET111, MET108, ILE32, VAL40. These interactions stabilized the binding of Empagliflozin to MAST4 with a score of -7.577. A higher score indicates that Empagliflozin is stably bound to MAST4. Empagliflozin binds to the active cavity of EPN1, forms Pi-cation with ARG7 and ARG25 near the active cavity, and forms H bond with ARG8 near the active cavity (Fig. 8C,D, Supplementary picture 4F). These interactions stabilized the stable binding of Empagliflozin to EPN1, with a score of -4.642. Empagliflozin binds to the active cavity of LAMB1, forms Pi-cation with ARG152 near the active cavity, and forms H bond with ASP155, SER154 and ASN226 near the active cavity (Fig. 8E,F, Supplementary picture 4G). The hydrophobic amino acids present in the active cavity include ALA159, TRP160, VAL162, TYR163. These interactions stabilized the binding of Empagliflozin to LAMB with a score of -4.571.