Identication of Potential Biomarkers in PBMC of Systemic Lupus Erythematosus: Results from Bioinformatic Analysis

The discovery of biomarkers has become an attractive eld in studying autoimmune diseases. For example, in the study of systemic lupus erythematosus (SLE), various biomarkers such as genes and miRNAs have been identied for the diagnosis of SLE and its organ involvement. Results The expression data of gene microarray GSE50772 was downloaded from the GEO, and 257 differentially expressed genes (DEGs) were obtained by using limma plug-in for R software. The tissue-specic gene expression analyses were performed in BioGPS database. Then, a protein-protein interaction (PPI) network was constructed with STRING and visualized in Cytoscape. Whereafter, top twenty hub genes derived from the PPI network, could basically differentiate the SLE samples from the non-SLE samples, were ascertained through CytoHubba. What is noticeable is that the ve novel hub genes ( ORM1, SLPI, OLFM4, TCN1 and CRISP3) and a related miRNA (hsa-let-7e-5p) may be considered as candidate biomarkers of SLE. and they will valuable

able to be a valuable tool for quality control [31]. We used Affy package and affyPLM package of R software (R Foundation for Statistical Computing, Vienna, Austria) to estimate the quality of GSE50772 dataset, and the RNA degradation plot was to display the results of the analysis. RMA and KNN methods were used to preprocess the data of each sample in the dataset. Differential expression analysis R software was used to normalize and process the original expression matrix, and DEGs were screened via the limma package. The P-values were calculated by adopting the T-test methods, and the adjusted P-values were computed by applying the Benjamini and Hochberg's method. The DEGs were screened out by the following selection criteria: 1) | log2 (fold-change) | >1, and 2) the adjusted P-values < 0.05. The heatmap and volcano map for the DEGs were created by SangerBox software (http://sangerbox.com/).

Tissue-speci c gene expression analysis
We analyzed the tissue speci c expression of the DEGs by the online resource BioGPS (http://biogps.org). If two following criteria were satisi ed, transcripts mapped to the single tissue would be identi ed as highly tissue speci c: 1) The tissue-speci c expression of the transcripts was 10 times higher than its median level, 2) The second highest expression level was lower than 1/3 of the highest expression level [32].

Functional enrichment analysis of DEGs
We used Database for annotation, visualization and integrated discovery (DAVID) v6.8 (https://david.ncifcrf.gov/tools.jsp) to conduct the functional enrichment analyses of DEGs, including Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. The GO, which are used to predict protein functions, includes cell composition (CC), molecular function (MF) and biological process (BP) [33]. The KEGG pathway analysis, which is used to allot a series of DEGs on speci c pathways, constructs the molecular reaction, interaction and relationship [34]. Thus, we conducted pathway analysis to identify which key pathways might be associated with DEGs. In addition, P-values < 0.05 and enriched gene count > 5 were chosen as the criteria for signi cance.
Protein-protein interaction (PPI) network analysis and hub genes identi cation.
The DEGs were uploaded to STRING(https://string-db.org/) to produce the PPI network diagram. These protein-protein interactions involve both functional and physical connections, with data derived primarily from high-throughput experiments, computational predictions, co-expression networks and automated text mining. In addition, the PPI network constructed from STRING analysis was imported into Cytoscape v.3.8.0 software, thereby to make a visual design of PPI network. And CytoHubba was used to process the network data to identify the top 20 hub genes. Subsequently, immune-related hub genes were identi ed by intersection between top 20 hub genes and immune genes, which were downloaded from the Immport immune database (https://www.immport.org/). The GO analyses for these hub genes were performed on Metascape (https://metascape.org/).

Prediction of pivotal miRNAs and identi cation of hub genes
Based on the results of functional enrichment analysis of DEGs, genes enriched in top 10 statistically signi cant biological processes were selected and then performed with miRWalk 2.0 software (http://mirwalk.umm.uni-heidelberg.de/), then their targeted miRNAs were further predicted [35]. Parallelly, in order to verify the accuracy of the results, we used miRWalk, miRDB and TargetScan for the intersection. Therefore, miRNAs targeting at more than two genes were screened out [36]. Furthermore, the GeneCards database (https://www.genecards.org/) was used to identify hub genes which were found to serve as candidate biomarkers in our study.

Study resign
The work ow chart of our study design is shown in Fig. 1. Our original goal is to identify more useful biomarkers involved in SLE pathogenesis. Above all, RNA quality analysis was performed on the GSE50772 dataset, which was downloaded from the GEO public database. Extracted from the limma package of R software, gene expression datas of SLE patients and normal controls were used to lter DEGs. Then, the selected DEGs were analyzed for tissue-speci c gene expression and the functional enrichment. The PPI network of DEGs was constructed followed, and through which, the top 20 hub genes were identi ed. In addition, miRNAs prediction and gene-miRNA interaction network analysis were performed on the genes which involved in the top 10 biological processes with statistical signi cance. Finally, the GeneCards database was used to verify SLE -related hub genes.

Assessment of the RNA quality of samples
To assess the quality of the available data of GSE50772 in SLE, We applied Affy package and affyPLM package of R software to draw the RNA degradation plot for all sample arrays. In Fig. 2, the analysis result demonstrates that 81 nearly parallel curves representing 81 samples have appropriate slopes, which doesn't display the bias of probe-positional intensity associated with RNA integrity, indicating that the RNA quality of samples is relatively ideal.
Differentially expressed genes |Log 2 FC| greater than 1 and adjusted P-values less than 0.05 were considered as criteria to screen the DEGs out. A total of 257 DEGs are obtained, among which 227 are down-regulated and 30 up-regulated. As shown in Table 1, the RPS4Y1, EIF1AY and KDM5D are the most up-regulated. Similarly, the three most down-regulated genes are CXCL8, ANXA3 and IFI27, whose log FC values are − 3.675, -3.186 and − 3.173 respectively (Additional le 1). The volcano plot and heatmap of the DEGs are seen in Fig. 3.

Tissue-speci c expression of genes
We used the BioGPS database, an online tool, to identify 57 genes expressed in speci c tissues or organ systems. As shown in Table 2, the system with the most highly tissue-speci c expression is the hematologic/immune system (59.6%, 34/57), followed by the digestive system (15.8%, 9/57). The respiratory and skin/skeletal muscle systems have similar levels of enrichment (about 8.8%, 5/57), while the urinary and reproductive systems have the lowest enrichment levels (about 3.5%, 2/57). To investigate the biological function of the DEGs, we performed functional enrichment analyses of 257 DEGs, GO annotation and KEGG pathway analyses were conducted by using the DAVID online tool. The top 10 GO items, including BPs, CCs and MFs, and the top 10 KEGG pathways, are listed in Fig. 4A-4D.
Moreover, the top 10 biological processes of DEGs and their related details are shown in Table 3, among which biological pathways with P-value < 0.05 are statistically signi cant. The results suggest that the biological pathways with DEGs signi cantly enriching are immune system-related pathways. Furthermore, the extracellular exosome, cytosol and extracellular space, and integral component of plasma membrane account for the majority of CC. And the most abundant MFs are protein binding. KEGG pathway analysis reveals that the DEGs mainly enrich in Cytokine-cytokine receptor interaction, TNF signaling pathway and Chemokine signaling pathway (Fig. 4D).

PPI network construction, hub genes selection and analysis
The 257 DEGs were inputted to the STRING tool for further analysis, and a PPI network with 224 nodes and 1485 edges were visualized with Cytoscape. The interaction score of PPI network is greater than 0.4. The nodes correspond to genes, and the edges represent the links between genes. Green nodes represent down-regulated genes, red nodes represent up-regulated genes. The local clustering coe cient is 0.532 and PPI enrichment P-value is less than 1.0e-16. Then, the data le was processed with Cytoscape ( Fig. 5A). CytoHubba was used to process the network data, and then to identify hub genes, the top 20 hub genes (CXCL1, CAMP, HP, PTX3, ARG1, ELANE, LCN2, RETN, MMP8, SLPI, PGLYRP1, LTF, OLFM4, ORM1, TCN1, LRG1, CRISP3, CHI3L1, MMP9 and DEFA4 ) were identi ed ( Fig. 5B). The color of a node in the network re ects the rank of hub genes. Clustering shows that the hub genes could basically differentiate the SLE samples from the non-SLE samples. Most hub genes are highly expressed in SLE samples, while relatively low in non-SLE samples (Fig. 5C). In addition, functional enrichment analysis indicates that these hub genes mainly enrich in immune system process as shown in Fig. 5D. In order to further explore and con rm the nature of hub genes, as shown in the supplementary materials for this study, 2482 immune genes downloaded from the Immport immune database were used to intersect with the top 20 hub genes. As expected, we found that 13 hub genes, including CXCL1, CAMP, PTX3, ARG1, ELANE, LCN2, RETN, SLPI, PGLYRP1, LTF, ORM1, MMP9 and DEFA4, were immune-related genes (Additional le 2).
Further miRNA mining and identi cation of key genes Among the top 10 biological processes, eighty-six genes, associated with statistically signi cant biological processes, were selected, and the gene-miRNA analysis was conducted with miRWalk 2.0 software. The intersection of miRNA results predicted by miRWalk, TargetScan and miRDB databases was considered as the result. The selection condition was set as P-value < 0.05, the target gene binding region was 3′UTR. Therefore, hsa-let-7e-5p with high number of gene cross-links (≥ 2) is identi ed, it targets at OLR1 and IRS2 as shown in Fig. 6. The score of it is 1, which means that it has high reliability.
In addition, using the GeneCards database, SLE -related genes were manually identi ed (Additional le 3), and three of our novel hub genes (ORM1, SLPI and TCN1) were veri ed to be potentially involved in the pathogenesis of SLE (Fig. 7). Discussion SLE is one of the most common systemic autoimmune diseases that seriously endangers human health. There are many factors causing the pathogenesis of SLE, among which the abnormal expression of important genes may contribute to lupus pathogenesis by participating in critical pathways, including immune complex processing, type I interferon producing, toll-like receptor signaling, and so on [37]. However, the accurate mechanism of SLE caused these microenvironmental factors has not been completely elucidated, so more attention should be paid to the detection and evaluation of the expression level of lupus -related genes in lupus researches. In present study, we are committed to discover possible SLE -causing molecules, and 257 DEGs (30 up-regulated genes and 227 down-regulated genes) are screened out from GSE50772 expression dataset. They certainly has laid a foundation for our subsequent analyses, and that may be able to illuminate the initiation and progression of SLE.
Considering that multiple organs or systems involvement caused by autoantibodies is the feature of SLE, we performed tissue -speci c expression analysis on DEGs. As revealed by the result, the most highly enriched system is the hematologic/immune system, which is in line with the pathogenesis and clinical manifestations of SLE, and this seems to explain the underlying molecular mechanisms of a self-aimed immune response in SLE patients. Besides, skin/skeletal muscle system, respiratory system, digestive system, urinary and reproductive systems are also enriched by DEGs. Some studies have also suggested that ANXA3 [38,39], TCN1 [40], C1QC [41], AREG [42] and COL17A1 [43] are associated with respiratory injury. ALOX15B [44] may be related to kidney diseases as reported in the literature. ADM [45] and S100P [46] possibly play important roles in lesions of the reproductive system. And changes in expression of OLFM4 [47], CYP4F3 [48], HP [49], ORM1 [50], ANG [51], LRG1 [52], C15orf48 [53], KRT23 [54] and ARG1 [55] are involved with digestive system diseases. Even some of them have been thought to be classic markers of a particular tissue injury. While whether those tissue -speci c DEGs abovementioned are essential for the development of complications of SLE remains inconclusive, and we postulate that the abnormal expression of those genes probably can indicate organ involvement in SLE patients. However, in our results, there are other relatively common organs and tissues that are not signi cantly enriched by DEGs, such as the central nervous system, cardiovascular and circulatory systems. Limited gene expression microarray data with insu cient samples may be a by-no-means negligible cause.
In order to understand disease machinery more deeply and to visualize the overview of the functional connections between all DEGs, we constructed PPI networks, the vital tools for analysis by identifying subnetworks or modules that display speci c topology and/or functional characteristics [56]. Afterwards, on the basis of DEGs' PPI networks, the top 20 hub genes (CXCL1, CAMP, HP, PTX3, ARG1, ELANE, LCN2, RETN, MMP8, SLPI, PGLYRP1, LTF, OLFM4, ORM1,   TCN1, LRG1, CRISP3, CHI3L1, MMP9 and DEFA4) were selected. According to cluster analysis results on them, it is obvious that most hub genes are upexpressed in SLE patients, while relatively low-expressed in normal subjects, which highlights the importance and representativeness of these hub genes in SLE disease. And further functional enrichment analysis on them manifests that immune system processes are dominant. Furthermore, 13 hub genes veri ed by Immport database are thought to be immune-related genes as expected, namely CXCL1, CAMP, PTX3, ARG1, ELANE, LCN2, RETN, SLPI, PGLYRP1, LTF, ORM1, MMP9 and DEFA4. The result exactly supports the idea that these 20 hub genes probably play essential roles in immune-related pathways which can trigger autoimmune dysfunction in patients, and resulting in the pathogenesis and development of SLE.
In prior studies on 20 hub genes mentioned above, the aberrant expression levels of 15 genes have been investigated that they may have various and crucial in uences for different processes of SLE development. Given the roles of ve novel genes in SLE as shown in Table 4, several novel genes, including ORM1, SLPI, OLFM4, TCN1 and CRISP3 may also have diagnostic value in the condition. ORM1 (orosomucoid 1) is an acute phase plasma protein known to activate NFκB, p38 and JNK pathways in macrophages, and it has been reported in rheumatoid arthritis (RA) [57], sarcoidosis and other immune diseases [58]. In experimental autoimmune encephalomyelitis, SLPI (secretory leukocyte peptidase inhibitor) exerted potent pro-in ammatory actions by regulating T cell activity, a process that might bene t the patient [59]. OLFM4(olfactomedin 4) could mediate the autoimmune in ammatory responses of generalized pustular psoriasis, a severe in ammatory skin disease [60]. Low expression of TCNI (transcobalamin I) involved in innate immunity might be partly responsible for the pathogenesis of IgG4-related disease, due to impairments in the innate immune system [61]. Since CRISP3 (cysteine-rich secretory protein 3) was detected to be signi cantly elevated in RA, the researchers hypothesized that it was implicated in the development of RA [62]. In addition, using the GeneCards database, three novel hub genes (ORM1, SLPI, and TCN1) were con rmed to be potentially involved in the pathogenesis of SLE. Almost all of these genes, either high or low expression, are associated in the development of immune diseases. Consequently, chances are that the ve hub genes play pivotal roles in the molecular mechanism of SLE pathogenesis, and we reasonably confer that these novel hub genes may be used as biomarkers to help improve the diagnostic rate of SLE and to provide valuable information for the evaluation of organ or system involvement.It is well known that miRNAs can interfere with the transcription and regulate gene expression [63]. Altered miRNA expression has been regarded as another important factor to the pathogenesis of immune-related diseases, such as SLE.  [70,71].
Targeted by MiR-203a, IRS2 regulates the proliferation and apoptosis of pancreatic β cell [72], which implies the expression of IRS2 is closely related to type 1 diabetes mellitus (T1DM), an autoimmune disease. These results rend us to speculate that has-let-7e-5p may be a potential molecule to induce and deteriorate the SLE even though there have been rare relevant studies are published on this subject. Thus, we propose that hsa-let-7e-5p probably acts as another novel latent biomarker of SLE, and we hope it could provide new insights into molecular mechanism underlying the development and progression SLE.
Additionally, since the GSE50772 is a public dataset, patient consent or ethics committee approval is not required, but the information on individuals' age, gender and health status, as well as medication use, is absent, which appears to be an underlying limitation.

Conclusions
In conclusion, some DEGs speci cally expressed in a tissue or system might be a signal to estimate organ involvement in SLE, and novel candidate biomarkers including hub genes ( ORM1, SLPI, OLFM4, TCN1 and CRISP3) and has-let-7e-5p were identi ed to assist in diagnosing SLE through comprehensive bioinformatic analyses. Our point will provide new and meaningful reference for later SLE studies. Since the current nding is limited by the lack of experimental validation in vivo and in vitro, it is necessary to conduct multiple in-depth studies to detect and verify those potential biomarkers.
Declarations Figure 1 Flow chart of data preparation, processing, analysis, and validation. The gene expression pro les of GSE50772 were downloaded from the GEO database.
Tissue-speci c expression of genes, the functional enrichment and PPI networks were used to investigate potential biomarkers associated with the pathogenesis and clinical manifestation of SLE. In addition, key miRNA and hub genes were were further identi ed, and some SLE-related hub genes were validated based on data from the GeneCards database.   Interaction network between genes involved in top 10 biological processes and its targeted miRNAs. Genes are coloured in blue, and node size is adjusted according to number of targeted miRNAs; miRNAs are coloured in red; miRNAs targeting more than two genes simultaneously are coloured in green.

Figure 7
Venn diagram of key genes between ve novel hub genes in our study and SLE-related genes in GeneCards. ORM1,SPLI and TCN1 are identi ed.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download. Supplementarymaterial.docx