The Long Noncoding RNA Differential Expression in Peripheral Blood Leukocyte from Schizophrenia Patients by RNA Sequencing

Background: The diagnosis of schizophrenia (SCZ) depends on the evaluation of clinical symptoms, and there is no objective biomarker. Surveys have found that long non-coding RNA (lncRNA) may be affected in the pathogenesis of SCZ. There are also different genes in the expression of peripheral blood (PBL) in SCZ patients. Methods: We proled transcriptome analysis of PBL in 50 patients with schizophrenia and 50 controls without psychiatric diagnoses, reconstructed PBL transcriptome information using RNA-seq, predicted lncRNA-mRNA interaction via “RNAplex”, a hierarchical classication-Spielman correlation coecient approach was used to analyze the correlation between lncRNA and protein-coding gene expression among samples, and used systematic bioinformatics methods (Go/Pathway) to perform lncRNA functional annotation and qPCR experimental verication. Predicting functional sites for sequences using the database PROSITE, NCBI, UCSC, JASPAR. Results: We screened 94 lncRNA and 1179 mRNA differential expressions in PBL, of which 46 new lncRNAs were identied for the rst time. Enrichment into lncRNA involves biological processes and signaling pathways related to the neutrophil activation involved in immune response. According to Spearman correlation coecient analysis, 81 lncRNA and 410 mRNA have expression correlation (p<0.01 and |r| ≥ 0.4), QPCR in independent samples veried that the core node of the lncRNA-mRNA co-expression network IL1RAP-TCONS_00138311 variable shear is indeed highly expressed in SCZ patients, 2^ - △△ Ct is 0.56, the area under the ROC curve is 0.924. The top four ranked transcription factors were predicted to be HSF1, HSF2, HSF4, and FOXA1. Conclusions: Combined with sequence function analysis, it showed that the transcription factors FOXA1, HSF1, HSF2, HSF4, etc. may mediate the activation of IL1B-induced NF-kβ pathway and other inammatory pathways through the regulation of IL1RAP alternative splicing transcripts TCONS_00138311, thereby participating in the pathogenesis of SCZ. and test group (T) by the 2^- △△ Ct value. The lncRNA expression rates between the CK and T groups were compared by the independent samples t-test. ROC curve statistical test for whether lncRNA can be used as a possible diagnostic biomarker for SCZ. SPSS 22.0 software was used to analyze and process the data, signicant differences are indicated by p<0.05.

HSF1, HSF2, HSF4, etc. may mediate the activation of IL1B-induced NF-kβ pathway and other in ammatory pathways through the regulation of IL1RAP alternative splicing transcripts TCONS_00138311, thereby participating in the pathogenesis of SCZ. We propose that the frequency of Differential lncRNA in peripheral blood could be used as novel biomarker for distinguishing SCZ from health. Background Schizophrenia (SCZ) is a serious mental disorder of uncertain etiology with a high incidence of morbidity, recurrence, and disability, with a lifetime prevalence of approximately 1% in the adult population [1], and a high prevalence of residual symptoms, with approximately 10-15% of patients requiring lifelong care [2]. SCZ is still diagnosed empirically by clinicians and there are no objective laboratory indicators.
Elucidating the etiology and pathological mechanism of SCZ, looking for clinical diagnostic biomarkers and targeted drug therapy has valuable application and signi cance.
Advanced in Genome-Wide Association Studies (GWAS) indicate that the occurrence of SCZ is affected by a combination of genetic and environmental factors [3]. Recent years' studies have shown that long non-coding RNA (lncRNA) may be involved in the pathogenesis of SCZ [4]. LncRNA conceived to play important roles in many life activities such as dose compensation effect, epigenetic regulation, cell cycle regulation and cell differentiation regulation, participate in many important regulatory processes such as transcription activation and transcription interference [5]. Pro-in ammatory cytokines such as IL-6, TNF-α, and IFN-γ have been reported in the literature to be signi cantly elevated in SCZ, whereas lncRNAs such as TMEVPG1 enhance the expression of IFN-γ and NRON regulates the transcription factor NFAT, thereby regulates IL-6 and IFN-γ expression; conversely, these in ammatory cytokines can stimulate and alter lncRNA expression. LncRNA is mostly located in the nucleus and is highly expressed in the mammalian brain and genome [6], Because brain tissue and PBL gene expression have a common regulatory pathway [7], multiple cytokines secreted in brain tissue are also present in peripheral blood leukocytes, therefore, many lncRNAs that are highly expressed in brain tissue are also clearly expressed in PBL, detection of lncRNA expression levels has clear feasibility and reliability.

Research object
Blood samples used for sequencing were collected from the psychiatric inpatient or outpatient department of the Sixth A liated Hospital of Kunming Medical University, the Second People's Hospital of Honghe Prefecture and the Second People's Hospital of Yuxi City, test groups conform to the diagnosis of schizophrenia in the fourth edition of the American Diagnostic and Statistical Manual of Mental Disorders (DSM-IV), furthermore, in line with the initial onset or did not take antipsychotic drugs 5 weeks before enrollment, Han people aged 15 to 60 years old, control group was recruited at the same time.
Extracting total RNA by TRLZOL Reagent (Ambion ® ), useing VAHTS Total RNA-seq (H/M/R) Library Prep Kit for Illumina ® (NR603) kit for total RNA puri cation and library construction, the constructed sequencing library was used by Illumina HiSeq TM 4000(China, Illumina) to RNA sequencing (RNA-Seq).
Illumina HiSeq TM4000 is currently the only expression pro le chip that can achieve 30 times repeat of the probe, and the chip result and qPCR correlation coe cient R2=0.97 [8]. RNA-Seq compared with the other sequencing techniques, has a broader dynamic range, provides a better estimate of relative expression levels of any genomic region with higher technical reproducibility, and facilitates alternative splicing detection [9]. Along with these advantages, RNA-seq has been used to reconstruct the entire organism transcriptome [10]. Conducting novel lncRNA predictions for the selected new transcripts, predicting the coding capacity of new transcripts using CPC, CNCI and SwissProt, and the intersection of these transcripts without coding potential and protein annotation information was taken as a reliable prediction.
Gene Co-Expression Network Construction Differential analysis of lncRNA expression between groups using edgeR [11], selecting differential transcripts using FDR and log2FC, the lter conditions are FDR<0.05 and |log2FC|>1. On ccount of lncRNA expression is low compared to mRNA levels, a hierarchical classi cation-Spielman correlation coe cient approach was used to analyze the correlation between lncRNA and protein-coding gene expression among samples. Using cytoscape software to build co-expression network.

LncRNA action modes
The action modes of lncRNAs involved in regulation is divided into 3 categories: antisense, cis and trans [12], we respectively predict target genes by the modes of action in which each lncRNA is likely to be involved in the following manner. Antisense action target gene prediction using the RNAplex to nd short interactions between two long-stranded RNAs, to nd the complementary binding between antisense lncRNAs and mRNAs cis action target gene prediction annotating lncRNAs within 10kb upstream or downstream of a gene, to analysis of their potential to intersect with regions where cis-acting elements are located trans action target gene prediction using Pearson's correlation coe cient method to analyze the correlation of lncRNA and protein-coding gene expression between samples, protein-coding genes with absolute correlation values greater than 0.999 were taken.

Quantitative Polymerase Chain Reaction (qPCR)Validate of LncRNA
In this study, the selected core lncRNAs were subjected to qPCR for veri cation, the ltering conditions are FDR<0.05, |log2FC|>1, FPKM>0.5 and GO/ Pathway enrichment analysis of functional lncRNAs. β-Actin was used as an internal reference gene, to compare controls group (CK) and test group (T) by the 2^-△△Ct value. The lncRNA expression rates between the CK and T groups were compared by the independent samples t-test. ROC curve statistical test for whether lncRNA can be used as a possible diagnostic biomarker for SCZ. SPSS 22.0 software was used to analyze and process the data, signi cant differences are indicated by p<0.05.

Functional Analysis of qPCR-validated LncRNA Sequences
Acquisition of QPCR-validated transcript sequences, predicting functional sites for sequences using the PROSITE database [16], If there is no transcription, translation function or is not itself a transcription factor, and then use NCBI blast, predicting whether lncRNAs are transcripts of target genes and their homologues [17], conduct a conservative analysis via UCSC database [18], nding transcription factors that bind to lncRNAs. Finally, the JASPAR database was used to predict transcription factors and binding sites [19], selecting transcription factors with the same direction of transcription.

Clinical Characteristics of Sequencing Samples
Statistical differences were analyzed between the sex and age of the 50 CK and 50 T groups collected.
Analysis using the independent samples t-test yields: the mean age (mean ± standard deviation) of the T and CK groups was 35.56 ± 10.04 and 32.52 ± 8.04, respectively, and no signi cant difference (p>0.05), there was no signi cant difference in gender composition (p>0.05).

RNA-seq Quality Assessment
Total RNA of the sequencing samples was analyzed for quality by an Agilent 2100 Bioanalyzer. The OD260/OD280 values are all between 1.8-2.0 (Fig. 1a), the two-sample independent repeated correlation is shown in Fig (Fig. 1b), suggesting that the quality of the total RNA isolated and puri ed in this experiment meets the requirements of sequencing. The euclidean distance algorithm is used to calculate the expression distance of each sample gene, and the Pearson algorithm is used to calculate the distance between the samples, then established cluster map according to the distance (Fig. 1c). 5106 new lncRNA transcripts obtained by CPC, CNCI and SwissProt prediction (Fig. 1d). Principal component analysis of sequencing samples showed that PC1 principal component was 79.4%, PC2 was 9.3%, and PC1+PC2 could distinguish 88.7% of the overall variance (Fig. 1e).
According to Spearman correlation coe cient analysis, 81 lncRNA and 410 mRNA have expression correlation (p<0.01 and |r|≥0.4), 1152 mRNAs showed positive correlation with lncRNA, 300 mRNAs showed negative correlation with lncRNA (Fig. 3), the obtained data used cytoscape software to construct the lncRNA-mRNA co-expression network, and independent individuals were excluded. Exhibiting some core lncRNAs in the co-expression network, such as PDCD1, MEG3, LINC00599, PSMA3-AS1, CLSTN2-AS1, AC008892.1, XLOC078139, etc, targeting multiple mRNAs the core node IL1RAP was also found to be linked to several genes, suggesting that it could be a candidate gene for late functional validation.

LncRNA Action Modes
Target gene prediction of the mode of action in which lncRNAs might be involved, yielding one lncRNA with an antisense interaction with mRNA, 22 lncRNAs present cis action, the trans-acting lncRNA is the most, up to 94 (Fig. 4). Among them, XLOC_079942, XLOC_081142, LINC00958, LINC00599, LINC01052, CLSTN2-AS1, AL034397.3, LINC01287 are more trans-associated with mRNA, as the core node, ILIRAP is associated with multiple lncRNA. The results show that lncRNAs mainly affect mRNA expression in a coexpression manner and thus participate in gene expression regulation and transcription.

Results of GO/Pathway Enrichment Analysis
GO/Pathway enrichment analysis of 410 target mRNAs, ltered at P<0.05. Enriched to 271 Biological Process (BP) terms, among them, the rst four items that are signi cantly related to neutrophil activation, neutrophil degranulation, neutrophil activation involved in immune response, neutrophil mediated immunity, and leukocyte migration are all related to immune response (Fig. 5a,b); 23 Cellular Component (CC) terms, including speci c granule lumen, secret granule lumen, cytoplasmic vesicle lumen, etc (Fig.   5c,d); 16 Molecular Function (MF) terms, including protein binding transcription factor binding transcription regulatory region DNA binding etc. (Fig. 5e,f) The rst four of these BPs share the genes CAPN1, PLAU, OLR1, CTSD, ABCA13, LCN2, MS4A3, and TCN1 are all involved in the pathogenesis of neuropsychiatric disorders.
The KEGG enrichment results also prove that the main enrichment pathways for targeted mRNAs are concentrated in the immune response, such as C-type lectin receptor signaling pathway, NOD-like receptor signaling pathway, TNF signaling pathway, NF-akppa β signaling pathway and so on (Fig. 6a-d).

QPCR Validation Results
Mean Ct values of β-Actin were detected in peripheral blood samples from CK and T groups using β-Actin as an experimental internal reference, at 15.44-15.24, the CV% was 3.38-3.05. We performed qPCR experiments on the lncRNAs selected by the prenatal bioinformatics analysis to meet the conditions of FDR<0.05 and |log2FC|>1, and the differential expression of FPKM>0.5 between groups. The experimental results were as follows: IL1RAP-TCONS_00138311 was down-regulated in the T group relative to the CK group, which was 0.56 times (P<0.05) (Fig. 7a, Table 1); CCR3-TCONS_00134168 was up-regulated in the T group relative to the CK group, and was 4.68 times that in the CK group (P<0.05) (Fig. 7b, Table 1); Thus the sequencing results are believed to be accurate and credible, and the sequencing results can be further analyzed and functionally veri ed.
Taking IL1RAP-TCONS_00138311 and CCR3-TCONS_00134168 as test variables, and CK group and T group group as state variables, the ROC curve was constructed. The results showed that the expression of IL1RAP-TCONS_00138311 and CCR3-TCONS_00134168 in peripheral blood leukocytes of SCZ patients had certain effects on the predicted value (P<0.05). Using IL1RAP-TCONS_00138311 as a diagnostic indicator, AUC=0.924, 95% of the con dence interval is 0.873-0.974 and the Jorden index reaches its maximum value 0.74 when FPKM = 15.48 in PBL, at which point the sensitivity was 80.0% and the speci city was 94.0% (Fig. 7c, Table 1). Principal component analysis using IL1RAP-TCONS_00138311 sequencing expression value shows that PCA1 plus PCA2 components can explain 24.3% of the overall variance (Fig. 7d).

IL1RAP-TCONS_00138311 Sequence Functional Analysis
The selected transcripts were predicted by CPC, CNCI, and SwissProt databases, and none had coding potential or protein annotation information. The acquires IL1RAP-TCONS_00138311 transcript sequence uses the NCBI blast function, enter Query Length 1311, bits to 1023 length, and matches to Sequence ID: NG_029105.2 (43715-44737, Score=1890, Expect=0.0, Identities=100%, Gaps= 0%), identi ed as a new IL1RAP transcript. Enter the upstream 2000bp and downstream 100bp sequences as the gene promoter region, and conduct a conservative analysis through the UCSC database to nd transcription factors that can bind to lncRNA (Fig. 8a). Finally, using the JASPAR database, the score is set to 500, relative pro le score threshold set 90%, select the transcription factor in the same transcription direction as IL1RAP-TCONS_00138311, the top four ranked transcription factors were predicted to be HSF1, HSF2, HSF4, and FOXA1, and the binding sites are as follows (Fig. 8b).

Page 8/13
The results show that lncRNAs mainly affect mRNA expression in a co-expression manner and thus participate in gene expression regulation and transcription. IL1RAP as a core node was also found to be linked to several genes. The variable clipping TCONS_00138311 of IL1RAP was Differential expression veri cated by qPCR experimental, and a higher ROC area 0.924 can speci cally distinguish between schizophrenic patients and healthy people.
GO/Pathway analysis enriched to 271 Biological Process (BP) terms, among them, the rst four of these BPs share the genes CAPN1, PLAU, OLR1, CTSD, ABCA13, LCN2, MS4A3, and TCN1 are all involved in the pathogenesis of neuropsychiatric disorders. OLR1, CTSD, and MS4A3 are involved in the occurrence and development of neurodegenerative diseases. Interaction between OLR1 and LDL component L5 inhibits NGF-induced activation of the PI3k/Akt cascade and suppresses cell viability and neuronal differentiation induced by nerve growth factor (NGF) [20]. CTSD is expressed at high levels in almost all tissues of the brain and is essential for maintaining a steady state of lysosome-dependent proteins in the brain [21], and defective or reduced activity of CTSD expression can lead to pathological protein aggregation in Parkinson disease (PD), and Alzheimer disease (AD) [22]. The MS4A3 gene encodes a member of the transmembrane 4A gene family that exhibits a unique expression pattern in hematopoietic cells and nonlymphoid tissues, which may play a role in signal transduction and may function as a subunit associated with receptor complexes, and polymeric SNP association analysis of the gene showed a correlation between the gene and AD genes and cognitive decline [23]. CAPN1, PLAU, and TCN1 are involved in neuroprotection. CAPN1 calpain is a calcium-dependent protease, which is related to synaptic plasticity and neuroprotection in the mammalian brain [24] and regulates Alzheimer's disease-related protein signaling pathways [25]. PLAU is a urokinase-type plasminogen activator ligand that promotes neuroprotection with the receptor PLAUR or its plasma membrane-binding partner platelet-derived growth factor receptor beta (PDGFRβ) [26]. The TCN1 gene encodes a member of the vitamin b12-binding protein family, which is a major component of secondary granules in neutrophils, and vitamin B12 de ciency disrupts neurodevelopment during pregnancy and throughout childhood [27], adequate vitamin B12 is also necessary for adult neurocognitive function [28], elevated TCN1 expression is signi cantly associated with poorer memory [29]. ABCA13 and LCN2 were signi cantly associated with schizophrenia. The ABCA13 gene encodes a transmembrane transport protein, a C-terminal motif involved in lipid transport in lipid metabolism [30], and dysregulation of lipid metabolism is strongly associated with psychiatric disorders, with ABCA13 increasing susceptibility to schizophrenia, bipolar disorder, and major depression [31]. LCN2 plays an important role in mediating in ammatory activity by binding to chemotactic peptides, leukotrienes and platelet-activating factors [32], signi cantly increased in schizophrenia [33]. Others, such as TNIP1 and IL1B regulate NF-kβ activation and play a role in autoimmunity and tissue homeostasis; MGST1, CAMP, TNFRSF14, and FOS are involved in cell chemotaxis, immune mediator induction, and in ammatory response regulation.
The KEGG enrichment results also prove that the main enrichment pathways for targeted mRNAs are concentrated in the immune response, such as C-type lectin receptor signaling pathway, NOD-like receptor signaling pathway, TNF signaling pathway, NF-akppa β signaling pathway and so on (Fig. 6a-d). The nuclear factor-kβ (NF-kβ) complex provides direct transcriptional control of interferon-induced transmembrane protein (IFITM) and cytokines [34,35,36] that are overexpressed in schizophrenia role [37]. In the prefrontal cortex (PFC) of schizophrenic subjects, the mRNA levels of two NF-kβ family members NF-kβ1 and NF-kβ2 increased, while the NF-kβ site-binding protein Schnurri, which inhibits the function of NF-kβ-2 mRNA levels are low [38]. These ndings suggest that higher NF-kβ transcriptional activity may play an important role in elevating IFITM and cytokine mRNA levels in schizophrenic PFC.

Conclusions
In this RNA-seq, we used more samples for whole transcriptome sequencing and screened the differentially expressed lncRNAs, cicrRrna, microRNAs and smallRNAs in the peripheral blood of SCZ patients. Among them, the variable clipping TCONS_00138311 of IL1RAP caught our attention. IL1RAP is an essential protein for the IL1 receptor [39], regulation of IL1B-induced activation of the NF-kβ pathway and other in ammatory pathways, its close relationship with SCZ. IL1B in ammatory pathway plays an important role in the development of SCZ [40], there are currently four main types of IL1RAP transcripts and protein isomers, among them mIL1RAP and sIL1RAP are the main types, they are different in molecular structure, tissue distribution and biological function [41,42], mIL1RAP has intact extracellular, transmembrane and intracellular segments, causing IL1B downstream signaling and activating transcription and translation of IL1B effector genes [43] while sIL1RAP only the extracellular segment can bind to IL1B, as the absence of an intracellular segment does not activate the IL1B pathway. Hence, we intend to investigate the regulation of IL1RAP variable splicing by the transcription factors FOXA1, HSF1, HSF2, HSF4, etc., in order to clarify whether IL1RAP aberrant variable splicing is involved in the pathogenesis of SCZ, or whether IL1RAP-TCONS_00138311 new-transcripts induce the activation of NFkβ pathway and other in ammatory pathways through IL1B, and thus participate in the development of SCZ (Fig. 8c).