Identification of Candidate Diagnostic Biomarkers Between Minimal Change Disease and Focal Segmental Glomerulosclerosis by Bioinformatics Analysis

Background: Minimal change disease (MCD) and focal segmental glomerulosclerosis (FSGS) are common causes of nephrotic syndrome that share similar clinical and histologic manifestations and are hard to differentiate. This study aimed to identify novel biomarkers to distinguish FSGS from MCD through bioinformatics analysis and to elucidate the possible molecular mechanisms. Methods: Based on the microarray datasets GSE104948 and GSE108113 downloaded from the Gene Expression Omnibus database, the differentially expressed genes (DEGs) for FSGS vs healthy control and for MCD vs healthy control were identified and further characterized by Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. Hub genes were identified using protein-protein interaction (PPI) networks. Results: A total of 358 and 368 DEGs were identified in FSGS and MCD, respectively, compared with healthy controls; among them, there were 156 overlapping DEGs. GO analysis showed that the DEGs in both diseases were enriched in mRNA splicing, RNA polymerase II transcription, mRNA export, insulin stimulus, integrin-mediated signaling pathway, viral process and phagocytosis. Module analysis showed that genes in the most significant module of the PPI network were mainly associated with the spliceosome in both FSGS and MCD. Analysis of the top 10 hub genes showed that most hub genes were the same in the two diseases. Among these genes, CD2 cytoplasmic tail binding protein 2 (CD2BP2), U6 snRNA-associated Sm-like protein (LSM8) and small nuclear ribonucleoprotein polypeptides B (SNRPB) were differentially expressed only in FSGS, and splicing factor 3A, subunit 3 (SF3A3) was differentially expressed only in MCD; these genes may be useful for the differential diagnosis of the two diseases in the future. Conclusions: We identified key genes and main pathways associated with FSGS and MCD. Our results provide a set of potential genes for the differential diagnosis of these two diseases.


Background
Minimal change disease (MCD) and focal segmental glomerulosclerosis (FSGS) are common causes of nephrotic syndrome in adults and children. Both are characterized by proteinuria, hypoalbuminemia, hyperlipidemia and edema, and both are defined by lesions of the podocyte [1][2]. Maas et al proposed that idiopathic FSGS should be considered an advanced stage of MCD, whereas most authors still describe MCD and idiopathic FSGS as separate entities [3]. FSGS and MCD have many clinical and histologic similarities at presentation, making separation into these two categories difficult. Compared with MCD, however, FSGS is associated with a higher likelihood of steroid resistance and progression to renal failure, which makes it all the more important to identify a diagnostic marker that differentiates the two diseases.
Pathological examination is the gold standard for the diagnosis of FSGS and MCD, but histological diagnosis has limitations. It does not reflect the underlying molecular mechanisms, and because of the focal nature of FSGS the lesion is hard to identify if no affected glomeruli are sampled in the biopsy, so these patients may be misdiagnosed as having MCD.
Previously, a number of biomarkers have been used to separate FSGS from MCD. Much research has focused on circulating permeability factors as biomarkers, such as soluble urokinase-type plasminogen activator receptor (suPAR) and Angptl4 [4][5]. Other research groups have proposed urinary biomarkers to differentiate between these glomerular diseases, such as CD80 and transforming growth factor β (TGF-β) [6][7]. Gene expression profiles have also been used to support the diagnosis of disease. Hodgin et al [8] reported that genes participating in cell motility, migration, differentiation and morphogenesis were up-regulated in FSGS patients, while podocyte-specific genes were significantly down-regulated in the FSGS group compared with the normal and MCD groups. Bennett et al [9] reported that genes implicated in kidney fibrosis, the TGF-β signaling pathway, and transcription factors that drive chondrogenesis and fibrosis were up-regulated in FSGS patients. Sox9, osteopontin and TGF-β signaling component genes, including thrombospondin-2, represent excellent candidate targets for future FSGS therapeutic strategies.
Schwab [10] studied the transcriptomic characteristics of childhood-onset FSGS, which might differ from those of adult FSGS patients.
However, owing to the lack of large-scale studies and the limitations of animal models, the crucial genes involved in the diagnosis and differentiation of FSGS and MCD have remained elusive. Bioinformatics has developed rapidly and is now widely used in various fields to mine potential information and reveal the underlying mechanisms of various diseases.
Recently, bioinformatics analysis has also gradually provided insight into the molecular mechanisms of kidney diseases such as membranous nephropathy, lupus, IgA nephropathy and diabetic nephropathy [11][12][13][14]. So far, however, only a few bioinformatics analyses have been performed on FSGS and MCD. Tong [15] analyzed the differentially expressed genes between FSGS and MCD in adults, but the study was conducted at a single center and collected kidney biopsies from only 6 FSGS patients and 5 MCD patients; the comparatively small number of patients makes the results less reliable. The critical genes and the interactions between these two diseases have not yet been fully investigated.
In the present study, to gain deeper insight into the potential correlations between FSGS and MCD, the mRNA expression profiles of three types of human renal biopsy samples (from patients with FSGS, patients with MCD and healthy controls) were analyzed to obtain a set of differentially expressed genes (DEGs). The DEGs were further analyzed to determine the potential cellular and biological processes involved in these diseases and, finally, to identify a set of candidate biomarkers to separate FSGS from MCD.

Microarray data
The microarray data were downloaded from the GEO database (http://www.ncbi.nlm.nih.gov/geo) using FSGS and MCD as the search terms. GSE104948 is based on the Affymetrix Human GeneChip 3000 7G platform and includes 53 samples (18 FSGS, 14 MCD and 21 healthy controls). GSE108113 is based on the same platform and includes 52 samples (30 FSGS, 16 MCD and 6 healthy controls).

Microarray Datasets Preprocessing and Differentially Expressed Gene (DEG) Identification
The DEGs were identified from the series matrix file using the Limma package in R software (version 3.5.0). An adjusted P value < 0.05 was defined as the threshold for DEG screening. The DEGs that overlapped between the two datasets were identified and then used for further functional enrichment analysis. The overlapping DEGs were subjected to bidirectional hierarchical clustering analysis using the heatmap package in R to recognize and visualize the differences in DEGs between FSGS and MCD vs healthy control.
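The screening step can be illustrated with a minimal Python sketch. It assumes that the adjusted P value is a Benjamini-Hochberg value (the default adjustment reported by Limma, although the text does not name the method); the gene names and raw P values are hypothetical placeholders, and the function names `benjamini_hochberg` and `select_degs` are introduced here only for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(gene_p_values, threshold=0.05):
    """Return the genes whose adjusted p-value is below the threshold."""
    genes = list(gene_p_values)
    adjusted = benjamini_hochberg([gene_p_values[g] for g in genes])
    return {g for g, q in zip(genes, adjusted) if q < threshold}


# Hypothetical raw p-values for the same genes in two datasets.
dataset_a = {"GENE1": 0.001, "GENE2": 0.01, "GENE3": 0.04, "GENE4": 0.5}
dataset_b = {"GENE1": 0.002, "GENE2": 0.2, "GENE3": 0.003, "GENE4": 0.9}

degs_a = select_degs(dataset_a)
degs_b = select_degs(dataset_b)
overlap = degs_a & degs_b  # DEGs shared by both datasets
print(sorted(degs_a), sorted(degs_b), sorted(overlap))
```

In this toy input GENE3 has a raw P value below 0.05 in the first dataset but is not selected there, because its adjusted value exceeds the threshold; this is the effect of the multiple-testing correction.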
The Search Tool for the Retrieval of Interacting Genes (STRING) (http://string-db.org/) [18] was used to construct the PPI networks of the DEGs. An interaction score > 0.7 was set as the cut-off point. Cytoscape software [19] was applied to visualize the PPI network and analyze the interactive relationships. The plugin cytoHubba [20] was used to explore hub genes and subnetworks by a topological analysis strategy. The top 10 nodes ranked by the maximal clique centrality (MCC) algorithm were taken as the hub genes of the network. The plugin Molecular Complex Detection (MCODE) was used to identify key clusters.
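The hub-gene ranking can be sketched in Python as follows. This is an illustration, not the cytoHubba implementation: it assumes the commonly stated definition of MCC, in which the score of a node is the sum of (|C| - 1)! over all maximal cliques C that contain it, and it enumerates cliques with a basic Bron-Kerbosch recursion that is only practical for small networks. The node names and scores are hypothetical.

```python
from math import factorial


def build_graph(scored_edges, cutoff=0.7):
    """Keep edges whose interaction score exceeds the cutoff."""
    graph = {}
    for a, b, score in scored_edges:
        if score > cutoff and a != b:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def maximal_cliques(graph):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            if current:
                cliques.append(current)
            return
        for node in list(candidates):
            expand(current | {node},
                   candidates & graph[node],
                   excluded & graph[node])
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(graph), set())
    return cliques


def mcc_scores(graph):
    """MCC(v): sum of (|C| - 1)! over maximal cliques C containing v."""
    scores = {node: 0 for node in graph}
    for clique in maximal_cliques(graph):
        weight = factorial(len(clique) - 1)
        for node in clique:
            scores[node] += weight
    return scores


def top_hubs(graph, n=10):
    """Return the n highest-scoring nodes, ties broken by name."""
    scores = mcc_scores(graph)
    return sorted(scores, key=lambda node: (-scores[node], node))[:n]


# Hypothetical scored interactions; D-E falls below the 0.7 cutoff.
edges = [("A", "B", 0.9), ("A", "C", 0.95), ("B", "C", 0.8),
         ("C", "D", 0.75), ("D", "E", 0.4)]
graph = build_graph(edges)
print(mcc_scores(graph), top_hubs(graph, n=2))
```

When none of a node's neighbors are connected to each other, every maximal clique containing it is a single edge, so its MCC score reduces to its degree.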

Identification of DEGs in FSGS and MCD
We included 105 glomerular samples, comprising 48 FSGS, 30 MCD and 27 normal samples, from the GSE104948 and GSE108113 datasets. Based on the cut-off criterion of adj P-value ≤ 0.05, a total of 358 and 368 glomerular genes were significantly differentially expressed in FSGS and MCD, respectively, compared with healthy controls. Of note, 156 of the DEGs identified in FSGS were also differentially expressed in MCD, leaving 202 and 212 specific DEGs for FSGS and MCD, respectively. The complete lists of shared DEGs and remaining DEGs are presented in Fig. 1A. In FSGS, 223 DEGs were upregulated and 135 were downregulated; in MCD, 193 DEGs were upregulated and 175 were downregulated. The results of the expression level analysis are presented as volcano plots in Fig. 1B and C. As indicated in the clustering heat map (Fig. 1B, C), these DEGs could clearly distinguish FSGS and MCD from healthy controls.

GO and pathway enrichment analysis of DEGs
To systematically characterize the DEGs in FSGS and MCD and explore their biological functions, functional annotation and pathway analyses, including GO and KEGG analyses, were performed using DAVID. GO analysis showed that most of the biological process (BP) terms of the DEGs in the two diseases were enriched in both, namely mRNA splicing, RNA polymerase II transcription, mRNA export, insulin stimulus, integrin-mediated signaling pathway, viral process and phagocytosis, while some BP terms involving immune response, autophosphorylation and actin cytoskeleton organization were specifically enriched in FSGS. DNA methylation and apoptotic process were enriched only in MCD (Fig. 2). Cellular component (CC) terms such as focal adhesion, catalytic step 2 spliceosome, nucleus and spliceosomal complex were enriched in both diseases. The CC terms extrinsic component of cytoplasmic side of plasma membrane, AIM2 inflammasome complex and endoplasmic reticulum membrane were enriched only in FSGS, whereas extracellular exosome, dendritic spine, axonal growth cone and cytoplasmic vesicle were enriched only in MCD. As for molecular function (MF) terms, some terms related to RNA, nucleotide and protein binding were enriched in both diseases, while tyrosine kinase, cysteine-type endopeptidase and aminoacyl-tRNA ligase activity were enriched only in FSGS. Some MF terms associated with specific chromatin, protein and domain binding, protein kinase and transcription coactivator activity were specific to MCD (Fig. 2). KEGG pathway analysis indicated that the DEGs in both FSGS and MCD were enriched in Spliceosome, Vibrio cholerae infection and FoxO signaling pathway (Fig. 3C). In addition, certain KEGG pathways, including Platelet activation, Osteoclast differentiation, Fc gamma R-mediated phagocytosis, Renin secretion, Fc epsilon RI signaling pathway, Insulin resistance and Pathogenic Escherichia coli infection, were involved in FSGS (Fig. 3A). Some pathways, such as Apoptosis, Estrogen signaling pathway, Insulin signaling pathway and Sphingolipid signaling pathway, were involved exclusively in MCD (Fig. 3B).
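The idea behind this kind of term enrichment can be sketched with a plain hypergeometric over-representation test in Python. This is an illustration only: DAVID reports a modified Fisher exact statistic, which is not reproduced here, and apart from the 156 overlapping DEGs every number in the example (background size, annotated genes, hits) is a hypothetical placeholder.

```python
from math import comb


def enrichment_p_value(population, annotated, list_size, hits):
    """Upper-tail hypergeometric probability P(X >= hits).

    population: number of background genes
    annotated:  background genes annotated to the term
    list_size:  number of genes in the DEG list
    hits:       DEGs annotated to the term
    """
    max_hits = min(annotated, list_size)
    total = comb(population, list_size)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, list_size - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


# Hypothetical term: 130 of 20000 background genes carry the annotation,
# and 12 of the 156 overlapping DEGs fall in it.
p = enrichment_p_value(population=20000, annotated=130, list_size=156, hits=12)
print(f"enrichment p-value: {p:.3g}")
```

A small P value indicates that the term contains more DEGs than expected by chance; in practice the P values of all tested terms would again need multiple-testing correction.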

PPI Network Construction and Module Analysis
The PPI network of the DEGs was constructed and the most significant module was obtained using the MCODE plugin of Cytoscape (Fig. 4). GO analysis showed that the BP terms in the top modules of the FSGS, MCD and overlapping DEGs were mainly enriched in mRNA splicing, mRNA export and termination (Fig. 5). KEGG pathway analysis showed that all three modules were enriched only in the Spliceosome pathway, which suggests that MCD and FSGS share the same pathogenic pathways. This may also explain why they have similar clinical manifestations and pathologic changes.

Identification and analysis of hub genes
We exported the STRING data to Cytoscape to construct and visualize the PPI network, and used cytoHubba with the MCC method to evaluate the significance of the genes in the network. Seven hub genes were shared by FSGS and MCD: serine and arginine rich splicing factor 5 (SRSF5), heterogeneous nuclear ribonucleoprotein A2/B1 (HNRNPA2B1), crooked neck pre-mRNA splicing factor 1 (CRNKL1), DEAD-box helicase 5 (DDX5), cleavage stimulation factor subunit 2 tau variant (CSTF2T), splicing factor 3b subunit 1 (SF3B1) and serine and arginine rich splicing factor 7 (SRSF7). Splicing factor 3A, subunit 3 (SF3A3), DExD-box helicase 39A (DDX39A) and RNA binding motif protein 25 (RBM25) were hub genes only in MCD, whereas CD2BP2, LSM8 and SNRPB were hub genes only in FSGS. Further analysis showed that, among these hub genes, CD2BP2, LSM8 and SNRPB were differentially expressed only in FSGS and SF3A3 was differentially expressed only in MCD, so these genes may be useful for the differential diagnosis of the two diseases in the future (Fig. 6).
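The selection logic described above amounts to two set operations, sketched here in Python. The hub gene lists are taken from the text. The disease-specific DEG sets are stand-ins that contain only the hub genes the text reports as differentially expressed in one disease, not the full lists of 202 and 212 genes, and the helper name `candidate_markers` is introduced for illustration.

```python
shared_hubs = {"SRSF5", "HNRNPA2B1", "CRNKL1", "DDX5",
               "CSTF2T", "SF3B1", "SRSF7"}
fsgs_hubs = shared_hubs | {"CD2BP2", "LSM8", "SNRPB"}
mcd_hubs = shared_hubs | {"SF3A3", "DDX39A", "RBM25"}

# Stand-ins for the disease-specific DEG lists (hub genes only).
fsgs_specific_degs = {"CD2BP2", "LSM8", "SNRPB"}
mcd_specific_degs = {"SF3A3"}


def candidate_markers(hubs, other_hubs, specific_degs):
    """Hub genes unique to one disease that are also DEGs only in it."""
    return (hubs - other_hubs) & specific_degs


fsgs_candidates = candidate_markers(fsgs_hubs, mcd_hubs, fsgs_specific_degs)
mcd_candidates = candidate_markers(mcd_hubs, fsgs_hubs, mcd_specific_degs)

print("shared hubs:", sorted(fsgs_hubs & mcd_hubs))
print("FSGS candidates:", sorted(fsgs_candidates))
print("MCD candidates:", sorted(mcd_candidates))
```

DDX39A and RBM25 drop out in the second step: they are hub genes only in MCD, but the text does not report them as differentially expressed in MCD alone.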

Discussion
Primary FSGS and MCD have many clinical and histologic similarities at presentation, making separation into these two categories difficult. The common target of injury in both is the podocyte. The distinction between the two disorders is important given the marked difference in response to steroid treatment and in long-term outcome. Bioinformatics analysis has recently come to play an important role in disease studies, facilitating the understanding of pathogenesis by integrating genome-level data with systematic bioinformatics methods. Consequently, our results may help clinicians to confirm the diagnosis, provide a new method for differential diagnosis and thereby avoid unnecessary or inadequate treatments.
In the present study, 358 DEGs and 368 DEGs were identified in FSGS and MCD, respectively, compared with healthy controls. The FSGS and MCD groups had 156 shared DEGs, indicating that the two disorders have an important and overlapping genetic component.
GO analysis indicated that MCD and FSGS shared most of the BP terms, such as mRNA splicing, RNA polymerase II transcription, mRNA export, insulin stimulus, integrin-mediated signaling pathway, viral process and phagocytosis, which is consistent with previous studies. The integrin-mediated signaling pathway is among the best-known pathways associated with kidney disease. Madhusudhan [21] reported that integrin is an essential coreceptor for activated protein C (aPC) that is required for nephroprotective aPC-protease-activated receptor (PAR) signaling in diabetic nephropathy. In FSGS, Wei [22] indicated that uPAR binds to and activates αvβ3 integrins on podocytes, a process that leads to activation of the small GTPase Rac-1, which in turn drives podocyte foot process motility and foot process effacement. Kriz [23] found that activation of αvβ3 integrin is a mechanism for the structural and functional changes seen in podocytes under pathological conditions, changes that alter the capability of podocytes to adapt to physiological events such as changing filtration pressure and shear forces. Phagocytosis also plays an important role in kidney disease. Studies of renal cells in culture, human kidney tissues and experimental animal models indicate that autophagy regulates many critical aspects of normal and disease conditions in the kidney, such as diabetic nephropathy and other glomerular diseases, tubular injuries, kidney development and aging, cancer, and genetic diseases associated with the kidney [24].
Insulin signaling has been widely reported in diabetic nephropathy (DN). It controls glucose uptake and podocyte insulin sensitivity and is also involved in insulin-dependent cytoskeleton reorganization in podocytes, mediating glomerular albumin permeability and thereby influencing podocyte viability [25].
By identifying the hub genes of the two diseases, we found that most hub genes were shared between them, which indicates that the two diseases have a similar pathogenesis. Notably, 4 genes were exclusive to one disease: CD2BP2, LSM8 and SNRPB were differentially expressed only in FSGS, and SF3A3 was differentially expressed only in MCD.
CD2BP2, originally identified as a binding partner of the adhesion molecule CD2, is a pre-spliceosomal assembly factor that uses its glycine-tyrosine-phenylalanine (GYF) domain to co-localize with spliceosomal proteins. Its function in vertebrates has remained largely unknown. Gesa found that CD2BP2 is critical for embryogenesis and podocyte function: CD2BP2-depleted podocytes display foot process effacement, which causes proteinuria and ultimately lethal kidney failure in mice. These findings define CD2BP2 as a non-redundant splicing factor essential for embryonic development and podocyte integrity [26][27].
SNRPB is a core component of the spliceosome and plays a major role in regulating alternative splicing of pre-mRNA; it has been reported to be associated with various kinds of cancer. In 2019, Liu [28] reported that SNRPB can facilitate non-small cell lung cancer (NSCLC) tumorigenesis via regulation of RAB26 expression and proposed that the SNRPB/RAB26 pathway may offer a therapeutic vulnerability in NSCLC.
Bruna [29] investigated the function of SNRPB in splicing and gene expression. By knocking down SNRPB in a GBM cell line and then performing RNA sequencing, they found that SNRPB was involved in RNA processing, DNA repair and chromatin remodeling. In addition, genes and pathways already associated with gliomagenesis, as well as a set of general cancer genes, showed splicing and expression alterations.
The protein encoded by LSm8 has a closed barrel shape made up of five anti-parallel beta strands and an alpha helix. This protein partners with six paralogs to form a heteroheptameric ring, which transiently binds U6 small nuclear RNAs and is involved in the general maturation of RNA in the nucleus. LSm8 also plays a role in pre-mRNA splicing as a component of the U4/U6-U5 tri-snRNP complex that is involved in spliceosome assembly, and as a component of the precatalytic spliceosome (spliceosome B complex).
Splicing factor 3A, subunit 3 (SF3A3) was originally identified from purified spliceosomes and is known to be a critical component of the SF3A RNA splicing complex. SF3A3 appears to localize in nuclear speckles and binds SF3A1 through the zinc fingers in its N-terminal region. In 2017, Zou [30] reported that SF3A3 might be the regulatory unit of the RNA spliceosome. The tumor suppressor gene cellular stress response 1 (CSR1) inactivates SF3A3 and thereby down-regulates the expression of epidermal growth factor receptor and platelet-derived growth factor receptor, and thus plays an important role in regulating cell death.
In our study, CD2BP2, LSM8, SNRPB and SF3A3 were reported for the first time in FSGS and MCD. Each of these four genes was differentially expressed in only one of the two diseases, which may be useful for future differential diagnosis.

Conclusions
Through bioinformatics analysis, we identified hub genes involved in the pathological changes of FSGS and MCD, and confirmed four genes (CD2BP2, LSM8, SNRPB and SF3A3) that were exclusively involved in FSGS or MCD and have potential use in the differential diagnosis of MCD and FSGS in the future.