Study on the Differentially Expressed Genes and Signaling Pathways in Systemic Lupus Erythematosus Using Integrated Bioinformatics Method

Background: Systemic lupus erythematosus (SLE) is a chronic immune connective tissue disease, which is common in women of childbearing age and easy to cause multiple organ inammatory injury. The occurrence of prostate cancer is the result of multiple factors and genes, but we have little understanding of the mechanism involved. In this study, we deeply explored and analyzed the existing gene data in GEO database in order to nd the key genes and new therapeutic targets of SLE. Results: The expression prole dataset of GDS4185, GDS4888, GDS4889 and GDS4890 containing 99 specimens, 42 cases of SLE patients and 57 cases of normal volunteers, were downloaded from the Gene Expression Omnibus (GEO) website. The differentially expressed genes (DEGs) in different tissues was analyzed by statistical hypothesis T test. The gene ontology (GO) enrichment analysis was carried out by the DAVID online tool. KEGG pathway annotation of DEGs was carried out by the KOBAS online computing database. The protein–protein interaction (PPI) networks of the DEGs were built from the STRING website and Cytoscape software. A total of 839 DEGs were calculated from the four GEO datasets. The GO and KEGG analysis indicated that the functions of DEGs mostly participated in the Osteoclast differentiation, HTLV-I infection, Measles, FoxO signaling pathway, Herpes simplex infection, Primary immunodeciency, Jak-STAT signaling pathway. The following 14 closely related genes, HERC5, TP53, CDC20, GNB2, GNB4, PPP2R1A, GNAI2, PMCH, SOCS3, HERC6, STAT1, SOCS1, ISG15, IFIT3, were key nodes from the PPI network. These genes may have synergistic or indirect interactions with each other in the process of biological metabolism inducing the pathogenesis of SLE. Conclusion: Mining geo database has great scientic research value. In the future, scientic research must fully excavate a variety of database analysis methods. In this study, the screened candidate genes provide effective theoretical basis for the diagnosis, treatment, expected evaluation and related laboratory research of SLE, which are


Background
Systemic lupus erythematosus (SLE) is a spontaneous autoimmune disease characterized by buccal butter y rash, arthralgia, myalgia, and neuralgia, which can affect all organs and systems of the body [1] .
Its basic pathological changes are vasculitis involving multiple organs and systems [2] . Serum autoantibodies and multiple systemic involvement are two main features of SLE [3] . This disease is predominant in women of childbearing age, mostly in the 15-64 age group, with an average incidence ratio of 1:9 for men and women [4] . Data show that the prevalence of SLE varies from (20-240)/100,000 to] (1-10)/100,000 per year [5] . The pathogenesis of SLE is very complex. It is generally believed that the interaction of multiple genes and environmental factors leads to the continuous activation and apoptosis of T and B cells, and ultimately leads to excessive production of autoantibodies to damage their own tissues and organs [6] . To date the early diagnosis of SLE mainly relies on the clinical manifestations of patients and the detection of serum autoantibody markers, but the diagnosis of speci city and accuracy of this disease is not high. The treatment of SLE includes non-steroidal anti-in ammatory drugs, glucocorticoids, disease modifying anti-rheumatic drugs and novel biological targeting drugs [7] . Because of the deepening of basic research, the renewal of examination methods and the formation of international consensus, the level of diagnosis and treatment of SLE has been signi cantly improved, and the prognosis has also been signi cantly improved. However, in recent years, with the gradual increase of SLE combined with cardiovascular events, the mortality rate is gradually on the rise. Therefore, it is signi cant and urgent to nd the potential pathogenic mechanism of SLE, to explore more available early diagnostic methods, more dependable pathogenic genes for monitoring relapse and disease control, and to investigate a more useful method to improve SLE symptoms. As an effective and up-to-date technology to obtain genetic information, gene expression datasets have been in common usage to integration gene expression pro les information and to explore molecular mechanism in a great deal of diseases. These microarrays offer a new approach for exploring pathogenic factors. It provides broad expects for drug-based target prediction and wide application of molecular therapy gene expression chips. There are a lot of studies in which pro les are uploaded on public database platform, also integration of these pro les may further explore pathogenesis of this disease.
At present, many literatures and papers have been carried out on SLE gene expression arrays [8][9][10] . These literatures and papers have explored a great deal of DEGs that may participate in the occurrence and development of SLE. But the results of identifying the prominently expressed mRNAs were not consistent.
Because of the incidence of organization or sample in different experiment, distinct genetic testing datasets, various data treatment means and due to samples collection from dissimilar backgrounds, they are re-expressed in diverse studies. Therefore, there are still certain de ciencies in single cohort researches. These research data should be re-integrated. The consolidation and merging of microarray data from multiple gene expression pro les can puzzle the di culties out and nd important and signi cant molecular markers. Robust rank aggregation (RRA) method is specially designed for comparing multiple ranked gene lists [11] . RRA uses a probability model to aggregate, which has a strong tone and helps to calculate the saliency probability of all elements in ultimate position [12] . The RRA method suppresses the effect of outliers on the results, leaving only the genes associated with statistics [13] . The P-value can delegate the hierarchical position and importance of a gene. RRA method is an effective comprehensive analysis method for identifying genes in statistical sense. In addition, this is useful when different technology platforms acquire different types of genes and do not have a complete mRNAs ranking.
In this research, we retrieved four microarray expression datasets, GDS4185 [14] , GDS4888 [15] , GDS4889 [16] and GDS4890 [17] , from the NCBI-Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/). The CEL le of dataset is processed with R, and the expression value of log2 of single probe of each chip is generated with default parameters. Then, the P value of each probe is obtained by two sample equal variance T test and two sample heteroscedasticity T test. In this study, P = 0.05 was de ned as the differential expression threshold of gene or probe. David (the Database for Annotation Visualization and Integrated Discovery) is a biological information database, which integrates biological data and analysis tools. It provides systematic and comprehensive biological Page 4/22 function annotation information for large-scale gene or protein list, and helps users extract biological information from it. The enrichment analysis of gene ontology (GO) and KOBAS-Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways of DEGs was carried out in David database. String database is a database to search for known protein-protein interactions and predict protein-protein interactions. We used STRING database to construct protein-protein interaction network (PPI), which is helpful to obtain the correlation of DEGs and discover the core regulatory genes of SLE. In summary, the gene fragments related to SLE occurrence and development were screened by SLE geographic database and analyzed comprehensively. The molecular mechanism and considerable signaling systems of these gene fragments were analyzed, and the interaction network of coding proteins was made. Our study produced results of causative factors for early prediction and judgment of SLE and provides effective drug targets for the remedy of SLE.

Methods
DNA microarray technology is a new technique for analyzing genomic pro les of gene expression.
Various DNA microarrays and DNA chip equipments and systems have been exploited and commercialized. DNA microarray analysis consists of oligonucleotide chip, DNA chip and genome chip, which can be organized into two patterns: one is to repair object DNA on the carrier for analysis of a great deal of object DNA; the other is to repair many different probes on the carrier material for analysis of extensive different target DNA. The various probe sequences of the identical objective DNA were analyzed by R.
The gene expression pro les of GDS4185, GDS4888, GDS4889 and GDS4890 were screened using the keyword "lupus erythematosus" searched in the geographic dataset database (https://www.ncbi.nlm.nih.gov/geo/). GDS4185 is a GPL96 platform, [HG-U133A] Affymetrix Human Genome U133A Array, including 39 samples of normal human peripheral blood subsets and 28 SLE patients peripheral blood subsets. The platform of GDS4888 is GPL570, [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array, which contains 8 samples of normal human peripheral blood subsets and 6 SLE patients peripheral blood subsets. The platform of GDS4889 is GPL570, [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array, which consists of 4 samples of normal human peripheral blood subsets and 4 SLE patients peripheral blood subsets. The platform of GDS4890 is GPL570, [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array, which contains 6 samples of normal human peripheral blood subsets and 4 SLE patients peripheral blood subsets. The platform and series matrix les are saved as TXT les. Software packages are used to process downloaded les and convert and reject substandard data. The data are classi ed, corrected and log 2 converted.

Screening for DEGs
The downloaded DEGs platform and matrix le series are converted by using R language software and annotation package. The ID corresponding to probe name is transformed to the international canonical name of gene (gene symbol) and stored in TXT le. The DEGs analysis was carried out using limma software package in Bioconductor Package (available online: http://www.bioconductor.org/). The correlative manipulating instruction codes were entered into R, and the DEGs in SLE and normal specimens of the four gene expression datasets were calculated by the limma software package. Specimens with an adjusted P-value of <0.05 and log fold change (FC)>2 were treated as DEGs. The TXT outcomes were saved for later calculation.

Integration of microarray data
The list of DEGs from the four microarray datasets calculated by limma packet analysis was stored as a TXT le. The RRA software package was downloaded, and R was applied to run the command code. A portion of genes that were up-or down regulated in the four chips were saved for later research. The RRA approach is openly workable in the Comprehensive R Network (http://cran.r-project.org/).

GO and KEGG pathway enrichment analyses of DEGs
The DAVID database (https://david.ncifcrf.gov/) is a genetic database, which provides bioinformatic data and analysis tools, supplies systematic and complex bioinformatics annotations for extensive gene or protein datasets, and identi es the most signi cant bioinformatics annotations by statistical methods. The gene ontology (GO) enrichment analysis was carried out by the DAVID online tool. KEGG pathway annotation of DEGs was carried out by the KOBAS online computing database (http://kobas.cbi.pku.edu.cn/). In this research, we obtained the DEGs that were prominently up-and down-regulated as de nitive factors from merged gene expression SLE data, and a P-value of <0.05 was treated different dramatically.

PPI network integration
The STRING database (http://string-db.org/) is an online database for searching for protein-protein interactions (PPIs), which helps to mine the core regulatory genes. It includes not only the direct physical interaction between proteins, but also the indirect functional correlation between proteins. Each node in the network graph represents a protein, and the lines between nodes represent the interaction between two proteins. All DEGs were input into STRING database to obtain PPIs, then imported into Cytoscape software to weighted, integrated, and calculated to obtain reliable value and build into a speci c view.
The corresponding proteins in the central node are probably key proteins or core genes to be screened with momentous pathogenic effect.

Microarray data information and identi cation of DEGs in SLE
The SLE expression microarray datasets GDS4185, GDS4188, GDS4889 and GDS4890 were standardized, and the results are exhibited in Figure 1. When the GDS4185 dataset was analyzed by the limma package (corrected P-value <0.05, logFC >2), 23 DEGs were received. Among them, 22 upregulated genes and 1 downregulated gene were identi ed. 333 DEGs were analyzed from the GDS4188 dataset, containing 221 upregulated genes and 122 downregulated genes. 579 DEGs were analyzed from the GDS4889 dataset, containing 357 upregulated genes and 222 downregulated genes. Additionally, 325 DEGs were analyzed from the GDS4890 dataset, containing 189 upregulated genes and 136 downregulated genes. The DEGs from the two specimen datasets contained in every microarray are exhibited in Figure 2. The cluster heatmaps of the top 20 DEGs are exhibited in Figure 3.
DEGS in SLE was identi ed by comprehensive bioinformatics. The microarray data of four kinds of SLE gene expression were analyzed by limma software package. The data were classi ed according to logarithmic variation, and then by RRA analysis was performed (corrected P value <0.05). The RRA method assumes that each gene in each experiment is randomly arranged. If a gene ranks higher in all experiments, the smaller its p value, the greater the possibility of differential gene expression. By rank analysis, we identi ed 839 DEGs, of which 289 were up-regulated and 550 were down-regulated. Using Rheatmap software, the Heatmap of Top 20 up-and down-regulated genes was drawn, as exhibited in  Table 1. The BPs of DEGs mainly enriched in response to virus, type I interferon signaling pathway and cellular protein metabolic process.
The main CCs concluded perinuclear region of cytoplasm, focal adhesion and cell-cell adherens junction. The MFs concluded protein binding, double-stranded RNA binding and actin lament binding.

KEGG pathway analysis of DEGs
Using the KOBAS online analysis database (http://kobas.cbi.pku.edu.cn/) to analyze the DEGs identi ed from SLE-integrated gene microarrays, the most obviously enriched pathways of the DEGs were submitted to KEGG database. The outcomes are exhibited in Table 2. The signaling pathways of DEGs were mainly enriched in the Osteoclast differentiation, HTLV-I infection, Measles, FoxO signaling pathway, Herpes simplex infection, Primary immunode ciency and Jak-STAT signaling pathway in SLE. The differentially expressed genes and their attribute columns were introduced into Cytoscape to analysis the topological properties of each node in an interactive network.
Pathway enrichment analysis pointed out that the genes mostly participated in the Osteoclast differentiation, HTLV-I infection, Measles, FoxO signaling pathway, Herpes simplex infection, Primary immunode ciency, Jak-STAT signaling pathway.
As E3 ubiquitin ligase, HERCS has classical ubiquitin ligase activity [18] . In addition, HERCS can bind ubiquitin-like protein molecule ISG15 to speci c protein substrates [19] . Like the process of ubiquitination, ISG15 covalently binds to the target protein and modi es the target protein, which is called ISG localization [20] . Several studies have found that HERCS also play an important role in antiviral responses [21][22] . In terms of expression regulation, Perng et al [23] found that HERCS gene expression is regulated by a variety of stimulation factors, especially in ammatory factors (such as LPS, TNF-α and IL 1β).
Human TP53 gene is located at 17p13, about 20 KB in length, containing 11 exons, and transcript 2.8KB mRNA, encoding protein p53 [24] . As a transcription factor, p53 induces the expression of downstream genes such as p21, BAX and GADD4 and plays an important role in cell apoptosis, cycle change, DNA repair, cell metabolism and autophagy [25] . Related studies have found that p53 gene polymorphism is associated with systemic lupus erythematosus susceptibility in Asian populations [26][27] . It has also been found that p53 induces the production of related proteins by binding to speci c DNA sequences, thereby inhibiting the expression of DNA methyltransferase gene, thereby affecting DNA methylation level [28] . In addition, Shepherd et al [29] found that the expression of TP53 gene in active SLE patients was higher than that in normal people, and linear regression analysis showed that the expression of TP53 gene was positively correlated with SLE-DAI. These results suggest that the abnormal expression of TP53 gene in peripheral blood T lymphocytes is related to the pathogenesis of SLE.
Cell division cycle molecule 20 (CDC20) is an important regulatory molecule at cell cycle checkpoint, which plays an important role in the regulation of cell mitotic anaphase by directly binding and activating the mitotic anaphase promoting complex [30] . APC/CDC20 complex has the activity of E3 ubiquitin ligase, and through the recognition of the structure domain of substrate degradation box, it performs the protein degradation mediated by ubiquitin cage haberasome [31] . Its abnormal expression can cause normal cells to produce non-integer ploidy by destroying cell mitosis, and then promote malignant transformation of cells [32] . TP53 negatively regulates the expression of CDC20, which is supported by evidence that the overexpression of TP53 inhibits the expression of CDC20, while the absence of TP53 leads to the expression of CDC20 [33] .
GNB2 is a house-keeping gene that codes for RACK1 protein in normal cells [34] . RACK1 protein as an important protein skeleton, intracellular protein joint, anchored proteins, involved in compound generated signaling molecule in cells, affect the active molecule subcellular localization, communication and the connection between the different signaling pathways, so as to exert its for signal transduction, gene expression, cell proliferation, differentiation and so on various aspects of adjustment, so the researchers widely believe that GNB2 participated in many important physiological processes and organisms as an important role in it [35][36] . When cells are stimulated, genes can respond quickly, encode synthetic proteins, realize transcriptional regulation and post-transcriptional level regulation of genes, and further trigger a series of delayed responses [37] . Nielsen et al [38] knocked out the RACK1 gene in mice and found that the immune response to LPS or Poly I: C stimulation in mice was enhanced, which showed increased mortality. It has been preliminarily proved that RACK1 can inhibit overstrung innate immune response in mice [39] . In addition to RACK1, CD8T cells and CD4T cells, Bolger et al [40] also showed a certain degree of reduction in mouse T cells. Li et al [41] have found that RACK1 is associated with endocrine disruptors, suggesting that GNB2 is a bridge between the endocrine and immune systems.
GNB4 gene is used to make G protein 4 subunit (Gβ4), which is one of the core members of G protein signal transduction system [42] . Gβ4 related signaling systems play an important role in the functioning of peripheral nerves [43] . GNB4 gene mutation can cause Gβ4-related G protein signaling pathway to be blocked, resulting in muscle atrophy, weakness and decreased tendon response [44] . Lohmann et al [45] found that GNB4 gene mutation was associated with infectious motor neuropathy.
PP2A is a heterotrimeric holoenzyme complex and an important positive regulator of cell growth and proliferation [46] . As the structural subunit of protein phosphatase PP2A, PPP2R1A protein plays a role of scaffold and linkage in the whole enzyme catalysis process of PP2A and regulates the dephosphorylation process of PP2A substrate protein serine/threonine residues [47] . Some studies have found that the transcription and expression of PPP2R1A can affect the activity of the NF-κB pathway, thereby regulating environmental factors and mediating cell function [48] .
G protein α inhibitor 2(GNAI2) is an inhibitory G protein, which can inhibit adenylate cyclase and reduce the content of the second messenger (cAMP) in cells, thus resulting in the corresponding biological reaction [49] . Supper et al [50] found that GNAI2 is involved in the regulation of intercellular connections and can regulate the functions related to glutamate signaling pathway and the functions related to mTOR signaling pathway.
Pro-melanin-concentrating hormone (PMCH) is a key regulator in the body's energy homeostasis system, which is related to the occurrence of obesity [51] . Ferreira et al [52] found that PMCH is related to body weight and energy consumption and plays a key role in maintaining the overall homeostasis of hypothalamus and neuroendocrine responses.
SOCS3 is one of the most active members of the SOCS family and the most important inhibitor of the JAK kinase and STAT signaling pathway [53] . It negatively regulates the expression of the JAK/STAT signaling pathway and its downstream target genes to prevent malignant cell transformation and apoptosis inhibition. Because of SOCS3 inhibit LPS, and interferons, IL-2, IL-12, IL-6 immune signal transduction related molecules, such as the body's natural immune response and the adaptive immune response has an important in uence, closely associated with in ammatory disease development [54] .
These studies suggest that the molecule SOCS3 may be a new target for the development of drugs for the treatment of SLE.
HERC6 is one of the E3 ligases of ISG15 [55] . The expression of ISG15 is closely related to interferon, suggesting that it plays an important role in immunity. Recombinant ISG15 can stimulate the proliferation of natural killer cells (NK) and enhance the cytotoxic effect of NK cells. In this process, CD3 + T cells are necessary for ISG15-induced proliferation and cytotoxic effect of NK cells [56] .
Signal transduction and transcription activators (STAT) are a family of transcription factors involved in cell signal transduction. STAT1, an important member of the STATs protein family, plays an important regulatory role in cell growth, differentiation, proliferation and apoptosis mainly through Janus associated kinase-STAT signal pathway [57] . Weng et al [58] used anti-stat1 antibody and immunohistochemical techniques in renal biopsy of 15 patients with diffuse proliferative lupus nephritis and found that the expression of STAT1 in renal tissue was high, and the prognosis of the disease was poor with high expression of STAT1. Kohanbash et al [59] found that the renal tissues of MRL/ LPR mice with lupus nephritis were highly expressed with STAT1 and activated phosphorylated STAT1 could be detected in glomerular cells, while the negative regulators of JAK/STAT SOCS1 and SOCS3 were also signi cantly increased. STAT1 phosphorylation was induced by adding IFN-α or IFN-γ to cultured MRL/ LPR mouse mesangial cells. Chen et al [60] found in the rat model of lupus that calcium-dependent kinase can inhibit STAT1-mediated IFN-signal transduction. Therefore, some scholars speculate that reducing the expression of STAT1 may reduce the effect of IFN and become a potential therapy for SLE.
As one of the important negative feedback factor types of the JAK/STAT pathway, SOCS1 and the other two negatively regulated protein families are protein tyrosine phosphate and STAT activation inhibitor proteins, which play a crucial role in the development of IL-10 mediators in SLE patients [61] . SOCS1 can induce the degradation of NF-κB and inhibit the activity of NF-κB. Meanwhile, SOCS1 is involved in innate immunity and plays an important role in the pathogenesis of in ammatory diseases. SOCS1 also plays a crucial role in systemic autoimmune suppression [62] .
ISG15 can under the stimulus of virus or interferon, strongly induced expression [63] . Similar to ubiquitin, ISG15 can to covalent modi cation of proteins, but not mediated protein degradation. ISG15 is mainly involved in the body's innate immune function in the process of regulation and interferon [64] . Bianco et al [65] found that the expression level of ISG15 mRNA in PBMCs of SLE patients was signi cantly positively correlated with SLE disease activity score index score and anti-double-stranded DNA antibody, and signi cantly negatively correlated with complement C3 and C4. The expression level of ISG15 mRNA in patients with active SLE is signi cantly higher than that in normal people and in disease remission, suggesting that the expression of ISG15 may be involved in the pathogenesis of SLE and signi cantly correlated with the activity of the disease [66] .
IFIT3 is a member of the related protein family encoded by interferon inducible gene and plays an important role in inhibiting virus and cell proliferation [67] . Wang et al [68] found that IFIT3 mRNA in SLE group was signi cantly higher than that in non-SLE control group and normal control group, and IFIT3 mRNA in SLE active group was signi cantly higher than that in SLE inactive group. There was a very signi cant positive correlation between IFIT3 mRNA and SLEDAI score in SLE group. It suggests that the increase of the quantitative expression level of IFIT3 mRNA is of certain signi cance for the diagnosis of SLE patients, as well as the judgment of the disease activity and severity of SLE patients.

Conclusions
In this study, we deeply explored and analyzed the existing gene data of GEO database to nd the key genes and new therapeutic targets of SLE. According to the standard process, the chip data in GEO database were collected, processed and integrated. The chip data were normalized and corrected. The results were analyzed by R language program.
A total of 839 differentially expressed genes were detected, including 289 up-regulated genes and 550 down-regulated genes. The range of candidate genes was further narrowed by heat map and volcano map analysis, and the differentially expressed genes, pathways and their encoded proteins were analyzed by go enrichment, KEGG pathway analysis and string online tools. The results showed that the differentially expressed proteins and their encoded genes were mainly concentrated in 14 genes: HERC5, TP53, CDC20, GNB2, GNB4, PPP2R1A, GNAI2, PMCH, SOCS3, HERC6, STAT1, SOCS1, ISG15, IFIT3.
Although our study provides some differentially expressed genes, the occurrence and development of SLE is a complex and multifactorial common outcome. The genes involved in the study may only be a part of them, or even a part of the secondary changes after the occurrence of SLE. It is necessary to continue to carry out in vitro and in vivo experimental veri cation in order to provide effective theoretical basis for the diagnosis, treatment, expected evaluation and related laboratory research of SLE. As a chip screening with large sample size, low cost and simple analysis, GEO database analysis can provide reference and basis for more people to carry out large-scale genomics research and follow-up research based on genomics through various statistical analysis methods.

Availability of Data and Materials
All data generated or analysed during this study are included in this published article.
Ethics approval and consent to participate Not applicable.

Consent for Publication
Not applicable.

Competing interests
The authors declare that they have no competing interests.