Complement C3 as therapeutic target in diabetic nephropathy by bioinformatics analysis

Background: Various public platforms have contained extensive data for deeper bioinformatics analysis. The pathogenesis of diabetic nephropathy is always a hot topic and the underlying molecular events are not completely clear. Methods: Differential expression analysis and weighted correlation network analysis (WGCNA) were applied to screen meaningful gene in GSE30529. The combined result was used for gene ontology (GO) annotation and KEGG pathway enrichment analysis. STRING was used for protein-protein interaction (PPI) network construction and Cytoscape software was used for hub gene identification. Gene methylation profile GSE121820 and miRNA profile GSE51674 were also acquired to perform multi-omics analysis. Clinical features were obtained form Nephroseq and potential drugs were identified by CMap. Results: 345 genes were obtained form GSE30529 after differential expression analysis and WGCNA. GO analysis mainly included neutrophil activation, regulation of immune effector process, positive regulation of cytokine production and neutrophil mediated immunity. KEGG pathway analysis mostly included phagosome, complement and coagulation cascades, cell adhesion molecules and AGE-RAGE signaling pathway in diabetic complications. 16 down regulated miRNAs were obtained from GSE57674 to construct miRNA-mRNA network. Top 20 genes were discerned from PPI network and there were DNA methylation differences in 15 genes among them. Correlation analysis showed SYK, CXCL1, LYN, VWF, ANXA1, C3, HLA-E, RHOA, SERPING1, EGF and KNG1 may involved in DN. 10 small molecule compounds have been identified as potential therapeutic drugs. Conclusion: We screened 11 target genes and suggested C3 may serve as a therapeutic target in diabetic nephropathy. Student t compare two P-Value than considered statistically significant. results not


Background
It is estimated a total of 451 million people sufferd from diabetes by 2017 and the number is speculateed to 693 million by 2045 (1). As one of the most serious microvascular complication, diabetic nephropathy (DN) has been a major cause of end-stage renal disease (ESRD) in many countries. The congregation of advanced glycation end-products, oxidative stress and activation of 3 proteinkinase C are the major contributing causes to the pathogenesis of DN. The new viewpoint holds that tubular injury plays an important and even initial role (2). Current treatment strategies for DN are aimed at controlling blood glucose and blood pressure levels and inhibition of RAS system to reduce albuminuria and delay the progression of DN (3). However, the effect is not entirely satisfactory for high incidence of DN-related ESRD. Therefore, there is a critical need to identify new therapeutic targets and improve clinical management.
High-throughput sequencing technology offers a effective method to study disease-related genes and provides promising medication goals in many fields, especially tumors (4). So far, several studies screened gene or miRNA involved in DN (5)(6)(7)(8)(9). Integrating these data could overcome the heterogeneity of researches and provide more accurate information. This study identified target genes whicn may improve the understanding of molecualr mechanisms of DN and provide a resource to build new hypotheses for further follow-up studies. We suggested complement system may serve as a therapeutic target in DN.

Data download
Expression profile GSE30529 (5) and miRNA profile GSE51674 (9) were downloaded by GEOquery package (10) in R software version 3.6.2. DNA methylation profile GSE121820_T2DN-CTL (unpublished) were downloaded from GEO database (https://www.ncbi.nlm.nih.gov/geo/). GSE30529 conducted by GPL571 covers 10 DN tubule samples and 12 control samples. GSE51674 conducted by GPL10656 includes 6 DN tissue samples and 4 control samples. GSE121820 conducted by GPL5082 contains 10 Type 2 DN blood samples and 10 control samples. The probe ID was converted into HUGO Gene Nomenclature Committee symbol (gene symbol).

Data processing
Differential expression analysis is the most commonly used method for data analysis. After checking the homogeneity of sample data in GSE30529, differentially expressed genes (DEGs) between DN samples and control samples were filtered with the criteria of |log2 fold change (FC)| greater than 1 and adjusted P-Value less than 0.05 by limma package (11). Similarly, differentially expressed miRNA (DERs) in GSE51674 were screened with |log2 FC| greater than 3 and adjusted P-Value less than 0.01.

4
Weighted correlation network analysis (WGCNA) allows biologically meaningful module information mining based on pairwise correlations between genes in high-throughput data using WGCNA package (12). WGCNA workflow consists of gene co-expression network construction, module identification, module relationship analysis and key drives gene recognition. The gene co-expression network was constructed with filtering principle that soft threshold is to make the network more consistent with a scale free topology. The module was identificated with criterion of module size 30 to 10000, merge cut height equal to 0.25 and verbose equal to 3. Highly related genes were obtained with threshold greater than 0.1 in Topological overlap matrix (TOM).

Enrichment analysis and hub gene screening
Gene ontology (GO) annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed with clusterProfiler package (17). The STRING database (18) (version 11.0, http://string-db.org/) is able to search interactions between candidate proteins based on laboratory data, other databases, text mining and predictive bioinformatics data. The proteinprotein interaction (PPI) network was constituted by STRING database. Cytoscpe software was used to visualization and network analysis. In addition, CytoHubba, built-in tool in Cytoscape, allows 12 methods to explore important nodes in biological networks, such as Degree method (Deg), Maximum Neighborhood Component (MNC), Density of Maximum Neighborhood Component (DMNC), Maximal Clique Centrality (MCC), Closeness, EcCentricity, Radiality, BottleNeck, Stress, Betweenness, Edge Percolated Component (EPC) and ClusteringCofficient (19).

Clinical data analysis and drugs analysis
Nephroseq v5 analysis engine (http://v5.nephroseq.org) gives access to gene expression signatures and clinical features. The Pearson correlation analysis was performed between genes and GFR, proteinuria, and SCR level (5,20). Unpaired Student t test was used to compare two groups. P-Value less than 0.05 was considered statistically significant. Insignificant results are not displayed.
Potential drugs were predicted with Connectivity Map (21), a online database that related disease, 5 genes, and drugs based on similar or opposite gene expression signatures.

Differential expression analysis of GSE30529
In order to determine whether data samples have batch effects or sample heterogeneity in GSE30529, gene expression levels of each sample were displayed in boxplot. And the results showed that they located approximately at the same level (Fig. 1A). In order to obtain the reliability of sample grouping, principal component analysis was performed in DN group and control group. The two main components respectively contributed 25.7% and 25.4% (Fig. 1B). Next, differential expression analysis was performed to acquire DEGs which may involved in DN with the criteria of |log2 FC| greater than 1 and adjusted P-Value less than 0.05 by limma package. GO annotation and KEGG pathway analysis of combined result DEG list and highly related gene list were combined to obtain more accurate targets (Fig. 3A). We obtained 345 genes uesd for GO annotation and KEGG pathway analysis to get bioinformatics annotation. Each gene is assigned at least three GO terms and each GO term is assigned at least five genes in chord chart. Top 12 GO terms were displayed and mainly included neutrophil activation, regulation of immune effector process, positive regulation of cytokine production and neutrophil mediated immunity (Fig. 3B). KEGG pathway analysis mostly included phagosome, complement and coagulation cascades, cell adhesion molecules (CAMs), ECM-receptor interaction and focal adhesion

PPI network and hub gene identification
To construct PPI network, combined gene list was export to STRING database with highest interacion confidence score 0.9. Cytoscape software was used to present this network containg 190 nodes and ANXA1, P2RY14 and FCER1G were identified in heatmap (Fig. 6B). C3 is particularly noteworthy.

Gene methylation differences
It is generally accepted that genetic and environmental factors are involved in the development of DN. DNA methylation could be the bridge (22). To learn whether the target gene has methylation differences, GSE121820 profiles were downloaded as validation cohort. As a result, 15 hub genes, ANXA1, C3, CXCL1, CXCL8, FCER1G, FN1, HLA-E, ITGAV, ITGB2, KNG1, LYN, P2RY13, P2RY14, RHOA and VWF indeed existed DNA methylation difference between DN group and control group with Pvalue less than 0.05 (Fig. 6C).

Clinical data validation and drug prediction
To explore potential role of hub genes in DN, Pearson correlation analysis was performed between hub genes and clincal data. Gene expression of SYK, CXCL1, LYN, VWF, ANXA1, C3, HLA-E, RHOA and SERPING1 negatively related with GFR, suggesting pathogenic role of up regulated genes in DN (Fig. 8). Conversely, gene expression of EGF and KNG1 positively related with GFR, suggesting protective role of down-regulated genes in DN (Fig. 7A-8F). SYK, CXCL1, LYN, VWF, ANXA1, C3, HLA-E, RHOA, SERPING1, EGF and KNG1 were defined as target genes.
Given that the effective of existing treatment strategies is not entirely satisfactory, 23 up regulated DEGs (logFC greater than 2.5) and 13 down regulated DEGs (logFC less than 1.5) were used to search potential drugs in Connectivity Map. 10 small molecule compounds were identified as potential therapeutic drugs (Table 1). Table 1 Changes caused by small molecular compounds in cell lines. As one of microvascular complication of diabetes, DN is the main cause of ESRD. Existing treatments are not sufficient to control disease progression. New treatment strategies were needed. Highthroughput omics data have been widely used to study mechanisms of disease and predict possible therapeutic targets. Current research aboout DN published on the public platform are mostly generated from a single-cohort study. Therefore, an integrated analysis of the data is needed. We reported to mediate high glucose induced TGF-β1 increasement and IL-1β secretion (23,24). In two animal experiments, C-X-C motif chemokine ligand 1 (CXCL1) stimulated in DN pathogenic environment may served as proinflmmatory mediator (25,26). In addition, VWF was reported to involve in intrarenal thrombosis leading to deterioration of renal function (27). Purvis et al. observed higher circulating plasma levels of ANXA1 in T1D and T2D patients, whereas exogenous supplement of ANXA1 improves insulin resistance and keeps off the progression of subsequent microvascular complications in mice (28,29). Previous studies have demonstrated that statins prevent from DN by reducing Ras homolog family member A (RhoA) protein activation (30)(31)(32)(33). Another study reported that activation of RhoA/ROCK may regulate NF-κB signaling pathway (34). In addition, sinomenine, kaempferol, catalpol and rutin have been shown a protective effects through RhoA/ROCK signaling pathway (35)(36)(37)(38). EGF was considered as urine biomakers in two researches (39,40). Recently, a newest report about cytosine methylation differences in kidney tubule samples supported this viewpoint (41). Besides, one large-scale linkage study revealed polymorphisms in kininogen 1 (KNG1) associated with DN in European populations (42).
Results is consistent with knowledge that complement system participates in DN. The development of diabetes intimately linked to low-grade inflammation (43). High levels of inflammatory markers such as c-reactive protein and adiponectin proved this viewpoint (44,45). Inflammation might promote the occurrence and development of diabetic complications such as DN. But it is still poorly understood about underlying mechanisms of initiation of low-grade inflammation. More and more research evidence proved innate immune system are closely involved in diabetes (46). Simultaneously, roles for pattern recognition receptors (PRRs) have been discussed in related with DN (47, 48).
Complement system is not only involved in innate immune defence but also considered as an important proinflammaton factor. Several studies pointed out that the complement system is involved in the pathogenesis of DN and might be a therapeutic target (49)(50)(51)