Signalling networks constructed by integrative analysis of multi-omics in peripheral blood mononuclear cells in systemic lupus erythematous patients

Wencong Song Jinan University Fengping Zheng Shenzhen People's Hospital, Jinan University Shaoying Huang Jinan University Wanxia Cai Jinan University Institute for Economic and Social Research Haiyan Yu Jinan University Xiaoping Hong Jinan University Jingquan He Jinan University Weier Dai University of Texas at Austin Lianghong Yin Jinan University Donge Tang Jinan University Dongzhou Liu Jinan University Yong Dai (  daiyong22@aliyun.com ) Jinan University https://orcid.org/0000-0002-6840-9158


Introduction
SLE is a worldwide disease with approximate 70/100,000 prevalence in China [1,2]. There is no speci c or e cient diagnosis and treatment for SLE, thus it needs more studies on the etiology of SLE disease. SLE is the autoimmune disease with a large number of autoantibodies production, which results in multiple tissues and organs' damage. Early diagnosis and treatment improve patient prognosis [3]. Up till now, the diagnosis and treatment of SLE disease are largely relied on the autoantibodies laboratory tests and the immune-suppressive drug respectively. However, the existing diagnosis and treatment methods for SLE disease contain some drawbacks, for example, many autoantibodies are non-speci c, which lead to activation of polyclonal B cell [4], while immune-suppressive drugs such as steroids might increase the susceptibility to dangerous infections. To improve the diagnosis and treatment of SLE disease, more studies are needed for the SLE molecular pathogenesis mechanism investigation. Notably, exploration of SLE disease relative signaling pathways will be helpful for understanding disease's molecular mechanism, and investigating diagnostic biomarkers and the drug targets.
Omics technologies have developed our understanding on the molecular mechanism of SLE and provided an amount of information for improving the treatment and diagnosis of SLE. Nonetheless, due to the complex pathogenesis of SLE, studies by one or two omics methods are limited. Therefore, the exact molecular mechanism of SLE has not been well elucidated so far. The integrative analysis of data generated by multi-omics in recent years has provided new ways for biomedical research. The joint analysis study combines data of messenger RNA (mRNA) and long non-coding RNAs (lncRNA) in SLE [19]. Cell proliferation, mRNA processing and translation-related pathways were identi ed in induced pluripotent stem cells from SLE patients' urine by integrative analysis of mRNA, microRNA and protein [5]. The vital role of Coagulation and complement pathway in the pathogenesis of SLE were studied by integrated analysis of Transcriptomics, Proteomics and Metabolomics [20]. Nevertheless, to explore the mechanism of SLE disease, the number of studies on integrative analysis of multi-omics are far more enough.
In our study, peripheral blood mononuclear cells (PBMCs) were collected from both SLE patients and healthy subjects. With the help of integrated analysis method, unique methylation pro les of simultaneous genes and distinctive expression pro les of Long non-coding RNAs (lncRNA), microRNAs (miRNAs), circular RNAs (circRNAs), messenger RNAs (mRNAs) and proteins in SLE patients were identi ed. At the same time, the systematic signaling regulation networks of SLE were constructed in here.

Methods And Materials
Blood samples collection

PBMCs preparation
PBMCs from each donor were obtained by using the Ficoll density gradient separation method (Sigma, USA). Then PBMCs was stored at -80 °C for further treatment.
Whole-genome bisul te sequencing (WGBS)-DNA methylation Genomic DNA library construction and sequencing Total DNA was extracted by QIAamp Fast DNA Tissue Kit (Qiagen, Dusseldorf, Germany) and measured by spectrophotometer. The fragmented DNA samples by using sonication were subjected to bisul te conversion. The Accel-NGS Methyl-Seq DNA Library Kit (Swift, MI, USA) was utilized for attaching adapters to single-stranded DNA fragments. Then, the pair-end 2×150bp sequencing was conducted by an Illumina Hiseq 4000 platform.

Bioinformatics analysis for WGBS data in PBMCs
Sequence quality veri cation was performed by the FastQC online analysis tool (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). WALT was used for mapping quali ed reads to the reference genome [21], and samtools was used for further deduplication of reads [22]. The DNA methylation level was determined by using perl scripts and MethPipe [23]. The R package-MethylKit was utilized for calculation of differentially methylated regions (DMRs) [24].

Whole transcriptomics
Small-RNA library construction and sequencing Total RNA was extracted from PBMCs by using TRIzol reagent (Invitrogen, CA, USA). TruSeq Small RNA Sample Prep Kits (Illumina, San Diego, USA) was used for small RNA sequencing library construction, and then the library was sequenced by an Illumina HiSeq 2500 with 50 base single-end reads.

Differential expression analysis for RNAs
The raw datasets were processed by the program of and Cutadapt and ACGT101-miR (LC Sciences, Houston, TX, US) [25]. Quali ed reads were mapped to the human reference genome (GRCh38) by Bowtie 2[26] and TopHat2 [27]. Differential expression levels of mRNAs, miRNAs, circRNAs and lncRNAs were calculated [28] and Fisher's exact test according to |log2 (fold change) |> 1 and p-value < 0.05.

LncRNAs identi cation
First of all, the known mRNAs and transcripts shorter than 200 bp were castoff. Then CPC [30], CNCI [31] and Pfam [32] were used for prediction of transcripts.

Localization of lncRNAs on genome
With the help of program Circos [33], localization and abundance of lncRNAs in the genome could be present in the way of diagrams. The lncRNAs in diagram were subdivided into six categories (I, J, O, U, X, K) according to their class code generated by StringTie: (I) A transfrag falling entirely within a reference intron (Intronic); (J) Potentially novel isoform or fragment at least one splice junction is shared with a reference transcript; (O) Generic exonic overlap with a reference transcript; (U) Unknown, intergenic transcript (intergenic); (X) Exonic overlap with reference on the opposite strand (antisense); (K) The known, lncRNAs .

Prediction of miRNA targets
TargetScan [34] and Miranda [35] were tools for prediction of miRNA targets.

Bioinformatics analysis for circRNAs
Firstly, Cutadapt [25] was used for un-quality reads ltering, then sequence quality was veri ed by online analysis tool (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Bowtie2[26] and Tophat2 [27] were used for humans genome mapping. Unmapped reads were mapped by using tophatfusion [28]. CIRCExplorer [26] was used to assemble the mapped reads to circRNAs; Then, back splicing reads were identi ed in unmapped reads. The expression level of circRNAs from different samples or groups was calculated. By the R package edgeR [27], the comparisons with p-value < 0.05 were selected.
Bioinformatics analysis of miRNAs RNA families (rRNA, tRNA, snRNA, snoRNA) and repeats were applied to lter non-miRNA sequences in the ACGT101-miR program (LC Sciences, Houston, Texas, USA). Subsequently, the remaining sequences were mapped to human precursors in miRBase 22.0 by BLAST to identify both known miRNAs and novel 3p-and 5p-derived miRNAs.

The Prediction of Target Genes of miRNAs
Two computational target prediction algorithms (TargetScan 50 and Miranda 3.3a) were used for the prediction of genes targeted by these most abundant miRNAs.

Proteomics-iTRAQ
Samples were eluted in lysis buffer and further processed by the procedures in the previous report [20] and nally get the supernatant aliquot. Protein was digested with Trypsin Gold (Promega). Then peptides were dried and reconstituted in 0.5 M TEAB and processed according to the manufacture's introduction. The peptides were subjected to nanoelectrospray ionization. Intact peptides were detected in the orbitrap. The details of the protocol were referred from a previous study [20].

Bioinformatics analysis for proteomics
Msquant search engine (version 2.3.02) and related database were utilized for protein identi cation[36].
Each con dent protein contains at least one unique peptide. The median peptide ratio in Mascot was used for normalization of the quantitative protein ratios.

Functional enrichment analysis
The databases of Gene Oncology (GO) (http://www.geneontology.org) and Kyoto Encyclopedia of Genes and Genomes (KEGG) (http://www. genome.jp/kegg) were utilized for predicting the underlying biological functions and pathways of the candidate target genes. DNA methylation, ceRNAs (lncRNAs, circRNAs), miRNAs, mRNAs and proteins related genes were put into the database for annotation, visualization and integrated discovery (DAVID) for GO and KEGG pathway analysis.

Statistical analysis
Before further analysing of data, all variables were go through normality and equal tests. Unpaired Student's t test was used for analysis the difference of normal distributed data while nonparametric tests were used for non-normal distributed data. Data were analysed by GraphPad Prism (Version 6, CA) and GraphPad Prism uses the abbreviation mean±SEM. P <0.05 was considered as statistically different.
Signaling networks construction TargetScan (v. 5.0) [37], miRanda (v. 3.3a) [38] were used for predicting potential regulation networks of miRNA-mRNA and lncRNA-miRNA-mRNA. The regulatory networks of methylation-mRNA, methylation-miRNA-mRNA and protein-mRNA were unveiled in PBMCs-SLE. The regulation networks in proteinsproteins was predicted by the STRING database (http://string-db.org/) and KEGG database. The overlapping predictions between two datasets were considered as effective target pairs, and all signaling networks were constructed by Cytoscape (v. 3.5.0) [39].
qRT-PCR veri ed the identi ed genes in the PBMCs from SLE patients and healthy people The technical veri cation of identi ed RNA sequencing data in the PBMCs of SL_B and H_B was under the help of the Quantitative real-time polymerase chain reaction (qRT-PCR). Total RNA was extracted from the PBMCs described above and subjected to rst-strand cDNA synthesis using the TRUEscript 1st Stand cDNA SYNTHESIS Kit (Aidlab, Beijing, China). The speci c primers used in the qRT-PCR are listed in supplement material (Table S10), and qRT-PCR was performed using an SYBR PrimeScript miRNA RT-PCR Kit (TianGen Biotech, Beijing, China). GAPDH and U6 were used as internal controls. Three independent biological replicates for each gene. The comparative 2-ΔΔCt analyzed method was used in this study.

Results
Twenty both sex and age match SLE patients (SL_B) and healthy subjects (H_B) with no autoimmune disease were recruited. The SLE diagnostic criteria in this study was based on ACR SLE classi cation criteria and the characteristics of the enrolled people are present in supplement material (Table S11).

Summarization of KEGG enrichment and GO analysis for mono-omics
We summarized the data of KEGG enrichment and GO analysis from single omics which including Genomics (DNA methylation), Transcriptomics (lncRNAs, circRNAs, miRNAs and mRNAs), Proteomics (proteins) ( Fig. 1-3). We use the interactive Venn diagram (Fig. 1E) to display the overlap of KEGG pathways and GO terms in single omics, and the detail of the KEGG pathways and GO terms in monoomics were showed in the table (Table 1).
Different single omics shared some common regulatory pathways which showed in the Venn diagram (Fig. 1E) and table 1, and these common pathways including Apoptotic process, Cell adhesion et al.  (Fig. 4B). KEGG showed that the differential expression of Methylation modi ed genes were involved in the pathways of JAK-STAT et al. (Fig. 4A).
miRNA-mRNA GO analysis was conducted for the functional analysis of miRNA targeted genes. GO terms were enriched by miRNA targeted genes (the differential mRNAs expression of which were put in the sorted criterial: |log2(fold change) |>=1 and P-value < 0.05). BP GO terms contain apoptotic process et al. CC GO terms include mitochondrion et al. MF GO terms contain ATP banding et al. (Fig. 4B). KEGG enrichment showed that the differential expression of miRNA targeted genes were involved in signaling pathways including Apoptosis et al (Fig. 4A).
Methylation-miRNA-mRNA GO analysis was conducted for the functional annotation of methylated genes targeted by miRNA. GO terms of biological processes (BP), molecular functions (MF) and cellular components (CC) were selected, which were enriched by the miRNA targeted methylated genes (Sorted criteria: |log2(fold change) |>=1 and P-value < 0.05). In the enriched GO terms, BP contains positive regulation of GTPase activity, MF contains ATP banding et al. (Fig. 4B).
KEGG showed that the differential expressed genes involving in the integrative data of methylation-miRNA-mRNA was involved in pathways of Cellular Senescence, Antigen Processing and Presentation and TNF signaling et al. (Fig. 4A).

mRNA-Protein
GO terms were enriched for differential expressed proteins in the integrative data of protein-mRNA (Sorted criteria was the expression levels of proteins in the joint analysis data ful l the following requirements: (fold change>=1.2 or fold change<=0.83) and P-value < 0.05). In our study, GO terms including 25 BP, 10 MF and 15 CC. In the GO analysis, BP GO terms contain Apoptotic process, Cell proliferation et al (Fig.  4B).
KEGG enrichment analysis showed that the differential expressed genes in the integrative analysis of protein-mRNA were involved in signaling pathways including Systemic Lupus Erythematosus, Complement and Coagulation Cascades et al (Fig. 4A).
We also summarized the KEGG and GO data from gure 1, and calculate the overlap of KEGG pathways or GO terms in the joint analysis of multi-omics, and found that different multi-omics shared some common KEGG pathways or GO terms ( Table 2). Those common KEGG pathways and GO terms include apoptosis, oxidation, in ammation process et al.

Further analysis of the common KEGG pathways and GO terms in both single omics and multi-omics
Basing on the summarization of mono-omics and multi-omics described above, there were some common KEGG pathways and GO terms identi ed in our study (Table 1 and Table 2). Besides, part of these common pathways and GO terms in our study which containing signi cant differential expressed molecules (mRNA or protein) were selected and displayed ( Fig. 5A-C). Data also unveiled many critical molecules that acting as bridges connecting different biological processes and signaling pathways, such as hsa-miR543_R+1 connects the Th1 cell differentiation pathway with Th17 cell differentiation pathway (Fig. 5B), mPI3K3 connects the cellular senescence signaling and TNF signaling (Fig. 5B).
The network was made up of 6 genes (mRNA), 3 pathways, 9 miRNAs and 62 lncRNAs. Yellow circles represent target pathways, red triangles symbolize genes and green arrow stand for miRNAs, blue circles indicate lncRNAs. The grey lines stand for connection between different pathways, genes, miRNAs and lncRNAs.
Technical validation of RNA sequencing data The technical validation of sequencing data was conducted by RT-PCR. There was a panel of 12 genes and 3 ceRNAs were selected from the top differential expressed ones. Our technical validation data (Fig.  6A) showed that some genes including IFNα, ENST00000506362, ENST00000274764 and ENST00000578377 were up-regulated while others were down-regulated in the SLE patients, such as TYK2(NM_003331.4), ENST00000493202, ENST0000052956 ENST00000263413 and ENST00000262741. The selected genes mentioned above were consistent to our sequencing data (Table  S1, S4-8).
Three of our selected panel genes including ENST00000494726, STAT1 (NM_139266.2) and ENST00000375769 contain similar expression trends in both RT-PCR and sequencing data, but they were not signi cant in RT-PCR results. Besides, the non-coding RNAs were also validated by RT-PCR, and raw data was showed in Table S9.

Discussion
In this study, hundreds of SLE unique expressed molecules and several SLE disease-related signaling pathways were identi ed. These pathways including T-cell receptor (42), Oxidation-reduction process, regulation of Apoptotic process, Complement and Coagulation Cascades, JAK-STAT etcetera (Table 1-2).
It was found the Complement and Coagulation Cascades played important role in the development of SLE disease [20]. we found that mRNA expression of some Complement and Coagulation pathwayrelated genes including C6, C1QC were signi cantly elevated in our study, which was consistent to the study mentioned above by Liang, et al. JAK-STAT might have an essential role in pathogenesis of SLE disease by regulating IFN regulatory factor-related genes [40], and the role of JAK-STAT cascades in SLE was also veri ed in our previous study on 6-methyladenine (6mA). In our current study, we found JAK-STAT signaling related genes such as PRLR, SOCS6, GHR contain unique pro le in both methylation and mRNA expression in SLE patients (Table S1). Our proteomic data also demonstrated that several core genes in JAK-STAT signaling, including STAT1 and STAT3 were elevated substantially. Has-miR-1249-5p was found to regulate the JAK-STAT signaling, and miRNAs (miR-2110, miR-7110) regulate mRNAs including PIK3R3 and PDGFB (Table S3). miRNAs and mRNAs, such as miR-432-5p and STAT1were reported by previous studies that contain unique expression in SLE disease [41,42] (Table S2). The Oxidation process plays important role in the SLE development [16], which was also supported by our data. Similarly, regulation networks like Apoptotic process, Autoimmune, Innate immune process, Cell Proliferation process and JAK-STAT signaling were discussed in SLE studies previously [10,17,40,43,44], and the function of all these signaling networks mentioned above were also veri ed by our study (Fig. 1 -5). The data of lncRNAs, miRNAs and proteins were combined and constructed a systematic regulation network of SLE disease in our study (Table S2). The SLE disease allied signaling network unveiled by our study provides a systematic insight for understanding the potential molecular mechanism underlies the SLE disease.
This is the rst study in SLE that combining 3 omics methods and 6 molecules at the same time.
Speci cally, the omics technologies and the molecules used in our study included genomics (DNA), whole transcriptomics (ceRNAs (lncRNAs, circRNAs), miRNAs, mRNAs) and proteomics (proteins). By utilizing the joint analysis method of multi-omics, bunch of potential biomarkers and regulation signaling networks were identi ed. This study helps us further understand the SLE disease and expands our view on systematic signaling regulation networks for the molecular mechanism of SLE.

The limitation of single omics
Recently, searching for new biomarkers has been considered as emerging event on SLE research [45].
Single omics studies focusing on a single molecule are limited as they neither fully reveal the complex pathogenesis nor display the complex interaction of molecules at different omics' levels in SLE disease.

The barriers of integrative analysis of multi-omics
The integrative analysis of multi-omics in recent years has provided new method for biomedical research. The complex immune responses in SLE disease are usually result from different bio-molecules interaction and cross-talk of different signaling networks. Therefore, integrating omics data of different omics platforms becomes an effective tool for SLE disease study. Nevertheless, the integrative analysis of omics is not awless and it has some limitations too.

Experimental and Technical Variation
Different generations of platforms (for example, LC-MS/MS) were used for detecting proteomic in various ranges of chemicals identity and abundance. Such technical heterogeneity results in challenging in the joint analysis of omics studies. Technical variation between our study and the previous reports may contribute to the data variation. In the future, ways for tackling those problems may include using standard control samples and appropriate statistical models (eg, mixed-effect model) to adjust for batch effect and harmonize these measurement variations.

Analytical challenges
We got some valuable data from the joint analysis of the multi-omics, for example, some SLE relevant biological markers and regulated signalling pathways were dug out in our study. The various types of omics data we got here were fundamentally different: some data are discrete and static, while others are opposite. The differential omics data sets contribute to the challenge of integrated analysis of multiomics. Firstly, the statistical framework for the joint analysis of multiple disparate data sets should be robust and reproducible. Secondly, combination of multi-dimensional methods, such as Bayesian models [46], neural networks [47] is needed for simultaneously analyzation of several data sets.

Accuracy and validation
High con dence is critical for the clinical application, thus independent technologies including established clinical tests, such as enzymatic and single-assay tests are needed for our data validation.
Besides, the amount of blood samples from both SLE patients and health controls are not enough, and more samples are needed for the veri cation of the markers and pathways unveiled by our study.

Interpretation
Different kind of omics' technologies and the joint analysis of omics produce large amount of data.
However, we face di culty when we try to do the date interpretation especially for some rare and novel molecular events, cause those events are often beyond the ones that can be functionally validated.
Therefore, more studies will be needed in the future for unveiling the function of unknowing molecules.

Conclusion
By integrative analysis of multi-omics, we unveiled some common regulation signaling pathways in SLE disease. Signaling pathways identi ed in this study included Systemic Lupus Erythematosus, Complement and Coagulation Cascades, TNF signaling, JAK-STAT, Oxidation (GTPase activity, mitochondrion), Apoptosis et al. (Fig. 6B). Besides, some critical biological molecules were also unveiled in our study, such as hsa-miR543_R+, mPI3K3, PRF1, mTTLL7, RASGRP4 and so forth (Fig. 6A) Availability of data and supporting materials section The raw data for small RNA sequencing and ribosomal RNA-depleted sequencing in our study are available from [Gene Expression Omnibus: GSE146410] but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available.
Other data including proteomics generated or analysed during this study are included in this submitted article [and its supplementary information les].
Authors' contributions     Technical validation of sequencing data and summarization of molecular regulation networks. (A) Twelve genes and 3 non-coding RNAs were analyzed by qRT-PCR, and their relative expression levels were normalized to housekeeping genes such as GAPDH, U6. The data are represented in the way of means ± SEM. *p < 0.05 and **p < 0 01 mean signi cant difference between SLE and healthy group, while ns means no signi cance with p >= 0.05. (B) The whole picture of molecular regulation networks in SLE. We summarize a putative molecular regulation network of SLE by utilising integrated analysis of multi-omics.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.