The expression discrepancy and characteristics of long non-coding RNAs in peripheral blood leukocytes from amyotrophic lateral sclerosis patients

Amyotrophic lateral sclerosis (ALS) is a progressive neurodegenerative disease that affects upper and lower motor neurons. Less than 10% of ALS patients have familial ALS, and more than 90% have sporadic ALS (SALS). According to the genomic information in existing databases, up to 98% of the human genome consists of non-coding sequences, and nearly 40% of long non-coding RNAs (lncRNAs) are specifically expressed in the brain. We hypothesize that differences in lncRNA expression play a key role in neurodegenerative diseases. Using a microarray, we screened 30 lncRNAs with altered expression in peripheral blood leukocytes of SALS patients and validated 13 of them in leukocytes of SALS patients, Parkinson’s disease (PD) patients, and healthy controls (HC). For functional prediction, we performed a bioinformatic functional enrichment analysis of co-expressed mRNAs, transcription factors, and lncRNAs. We found that lnc-DYRK2-7:1, lnc-ABCA12-3:1, and lnc-POTEM-4:7 showed decreased expression in SALS patients, whereas in PD patients they showed increased expression or no change. In addition, expression of lnc-CNTN4-2:1 and lnc-NR3C2-8:1 was decreased in both SALS and PD patients. When patients were grouped by sex, XIST was reduced in male patients with SALS and PD; in female patients, it was unchanged in SALS but elevated in PD. We also performed GO term enrichment and KEGG pathway analysis for the lncRNAs that were differentially expressed in the microarray. A significant proportion of the differentially expressed lncRNAs were associated with various signaling pathways and transcription factors, which is consistent with other clinical findings.


Introduction
Amyotrophic lateral sclerosis (ALS) is a progressive neurodegenerative disease affecting both upper and lower motor neurons. ALS results in muscle weakness, cognitive or behavioral change, and eventual death from respiratory insufficiency within 3-5 years after diagnosis. Less than 10% of ALS cases are hereditary, mainly with dominant inheritance, and are defined as familial ALS. The remaining cases, more than 90%, are sporadic ALS (SALS) [1,2]. Recent studies have revealed that abnormalities in RNA processing and metabolism play an important role in the pathogenesis of ALS [3,4]. Molecular genetic studies have identified several ALS causative genes, such as the TARDBP gene (encoding the TDP-43 protein) [5], the FUS/TLS gene (encoding the FUS protein) [6], the ANG gene (encoding the Angiogenin protein) [7], and the C9ORF72 gene (encoding the C9ORF72 protein) [8], whose encoded DNA/RNA-binding proteins are involved in RNA processing and metabolic processes.
Based on genome information from the ENCODE project, up to 98% of the human genome is composed of non-coding sequences [9]. Non-coding RNA (ncRNA) is RNA that lacks an open reading frame and therefore cannot code for proteins. With the advance of biotechnology in recent years, ncRNAs have been found to be associated with biochemical activities, including various forms of cellular regulation, and even with disease progression [10][11][12]. ncRNAs are usually classified by their length or function. ncRNAs longer than 200 nucleotides are regarded as long non-coding RNAs (lncRNAs), while the others include microRNAs (miRNAs), small interfering RNAs (siRNAs), small nucleolar RNAs (snoRNAs), small nuclear RNAs (snRNAs), and PIWI-interacting RNAs (piRNAs) [13][14][15]. The discovery of lncRNAs has provided new insights into the regulation of disease-associated genes. LncRNAs are thought to interact with RNA, DNA, or proteins to promote or repress the expression of protein-coding genes and their function [16].
It is noteworthy that nearly 40% of lncRNAs are specifically expressed in the central nervous system (CNS) [17]. This heavy distribution of lncRNAs in the CNS is presumably due to the complexity of CNS function, which requires a greater number of regulatory RNAs to maintain normal brain development and function. Studies have shown that lncRNAs are closely associated with neuronal differentiation, synaptogenesis, and other functional maintenance processes [18,19]. Given the functional impact of lncRNAs on a variety of biological molecules, it is reasonable to consider their role in several intractable diseases. Indeed, dysregulation of lncRNA expression can play key roles in neurodegenerative diseases. The dysregulation of several lncRNAs in neurodegenerative diseases such as Alzheimer's or Parkinson's disease, as well as their associated biochemical functions and protein or mRNA regulatory functions, has been reported [20][21][22]. Some of these lncRNAs are thought to be potential candidate biomarkers for predicting neurodegenerative diseases [23]. It is therefore not surprising that lncRNAs, which account for a large share of RNAs, have also been reported in ALS in recent years. In addition, because pathological mutations are associated with excess or aggregation of certain proteins, the relationship between ncRNAs and different events of protein quiescence and disease pathogenesis has been investigated in diseases such as PD and ALS. Some lncRNAs are differentially expressed in more than one neurodegenerative disease, but the associated disease mechanisms are not the same. For example, the expression of nuclear paraspeckle assembly transcript 1 (NEAT1) was reported to be significantly upregulated in peripheral blood cells of PD patients, and NEAT1_2 (the long isoform of NEAT1) was upregulated in spinal anterior horn motor neurons in the early stages of ALS pathogenesis [24,25].
NEAT1 can increase the stability of PINK1, thus promoting cellular quality control mechanisms, whereas its long isoform NEAT1_2 is involved in paraspeckle formation in spinal motor neurons of ALS patients [26].
In clinical practice, ALS typically takes longer to diagnose than other neurodegenerative diseases, and the condition of patients often deteriorates during this time [27]. In recent years, some biomarkers involving oxidative stress and inflammation have been associated with ALS through testing of blood, plasma, or serum, but early diagnosis using biomarkers is still difficult [28]. Studies on ALS have found that abnormal expression of lncRNAs is involved in the development of ALS, and functional findings also suggest that lncRNAs have an important role in the disease. In a recent study, RNA sequencing of ALS patients carrying FUS, SOD1, and TARDBP mutations identified 293 lncRNAs and 87 mRNAs that were differentially expressed between the ALS group and the control group [29]. However, the involvement of lncRNAs in the pathogenesis of ALS was not further elucidated. There are also in-depth studies of the physiological mechanisms of individual lncRNAs, such as NEAT1_2 mentioned above. Nevertheless, the exact mechanisms of the reported lncRNAs are not yet well explained.
It has been shown that Chinese patients with ALS have an earlier age of onset than the Caucasian population [30], and their genetic characteristics differ markedly from those of Caucasians. For example, C9ORF72 mutation is the most common cause of ALS in the Caucasian population [31], whereas it is found in less than 1% of the Chinese SALS population [32]. Consequently, it is very important to use the resources of Chinese ALS patients to study the pathogenesis of ALS in this population, to carry out targeted drug development, and to identify meaningful diagnostic biomarkers. To identify potential biomarkers of sporadic ALS and provide a preliminary investigation of the role of lncRNAs in the ALS disease process, we screened lncRNAs from leukocytes of SALS patients and validated selected lncRNAs. In this paper, we present a transcriptome profile of lncRNAs in peripheral leukocytes of SALS patients, PD patients, and healthy controls. We also carried out a bioinformatic analysis of these "biomarker" lncRNAs in SALS.
There were no significant differences in the distribution of sex among SALS patients, PD patients, and healthy controls. Around 30% of SALS patients had bulbar onset, and the remaining 70% had spinal onset. The study protocol for obtaining PBMCs from patients and controls was approved by the West China Hospital of Sichuan University (Chengdu, China). All subjects signed an informed consent form before enrollment.

Microarray
All microarray experiments and bioinformatic support were performed by Shanghai OE Biotech CO., LTD (Shanghai, China).

Leukocytes collection and RNA extraction
Approximately 5 ml of whole blood was collected from each participant by venipuncture into EDTA-containing tubes in the morning, before the start of clinical treatment. After centrifugation at 2000 rpm for 10 min, the plasma was removed. Then, 1 ml of PBS was added to the remaining erythrocyte-white blood cell mixture. The mixture was carefully transferred to a 15-ml centrifuge tube containing 3 ml of lymphocyte isolate (tbdscience Product no. LTS1077, Tianjin, China) and centrifuged at 2000 rpm for another 15 min. Finally, the intermediate leukocyte layer was aspirated, washed twice with PBS, and stored in TRIzol reagent at −80 °C. Total RNA was isolated from PBMCs following the TRIzol RNA isolation protocol. Leukocytes were homogenized in 1 ml of TRIzol reagent, and the final total RNA concentration of each sample was measured using a NanoDrop One (Thermo Scientific).

Validation analysis by reverse transcription-quantitative real-time PCR
Reverse transcription was performed with the PrimeScript™ RT reagent kit with gDNA Eraser (#RR047A, Takara). qPCR was performed for the selected lncRNAs using PowerUp™ SYBR™ Green Master Mix (REF A25742, Applied Biosystems by Thermo Fisher Scientific) with 2 μl of cDNA. The primers used in this study were synthesized by TSINGKE (Biological Technology), and their sequences are listed in Supplemental Table 2. PCR reactions were run on a QuantStudio 3 (Applied Biosystems by Thermo Fisher Scientific) in a 96-well optical plate following the manufacturer's instructions. Each sample was analyzed in triplicate. Melting curve analysis was performed at the end of the PCR cycles to confirm specific generation of the expected PCR product. The expression of the lncRNAs was normalized to GAPDH, and the difference was calculated by the 2^(−ΔΔCt) method [33]. The mean ΔCt of the control group was used as the calibrator (ΔΔCt = 0, 2^(−ΔΔCt) = 1).
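The 2^(−ΔΔCt) calculation described above can be illustrated with a minimal sketch. The function names and all Ct values below are hypothetical and chosen only for illustration; they are not data from this study. The sketch assumes each Ct value is already the mean of the technical triplicate.

```python
from statistics import mean


def delta_ct(ct_target, ct_reference):
    """Return dCt = Ct(target lncRNA) - Ct(reference gene, here GAPDH)."""
    return ct_target - ct_reference


def relative_expression(sample_dcts, control_dcts):
    """Return 2^(-ddCt) per sample, using the mean control dCt as calibrator."""
    calibrator = mean(control_dcts)
    return [2 ** -(d - calibrator) for d in sample_dcts]


# Hypothetical (Ct lncRNA, Ct GAPDH) pairs, one pair per subject.
control_dcts = [delta_ct(t, r) for t, r in [(28.0, 18.0), (29.0, 19.0), (27.0, 17.0)]]
patient_dcts = [delta_ct(t, r) for t, r in [(29.0, 18.0), (30.0, 18.0)]]

# A value below 1 means lower expression than the control calibrator.
fold_changes = relative_expression(patient_dcts, control_dcts)
print(fold_changes)
```

With this calibrator, a subject whose ΔCt equals the control mean has a relative expression of exactly 1, which matches the convention stated above.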

Statistical analysis
We used the interquartile range (IQR) to find outliers: the difference between the 25th percentile (quartile 1) and the 75th percentile (quartile 3) was used to identify extreme values in the tails of the distribution. Differences between patient and control groups were evaluated with the Mann-Whitney U test, calculated with an online statistical calculator (https://www.socscistatistics.com/).
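As a rough illustration of these two steps, the sketch below flags outliers with IQR fences and then compares two groups with a Mann-Whitney U test. Several details are assumptions rather than the authors' settings: the fence multiplier of 1.5 (Tukey's convention) is not stated in the text, the quartile definition is the Python default, and the P-value uses a plain normal approximation without tie or continuity correction, so it may differ slightly from the online calculator. The function names are hypothetical.

```python
import math
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]; k = 1.5 is an assumption."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test with a normal approximation.

    No tie or continuity correction is applied (a simplification).
    """
    n1, n2 = len(x), len(y)
    # Count pairs where x wins; ties count as half a win.
    u1 = sum((a > b) + 0.5 * (a == b) for a in x for b in y)
    u = min(u1, n1 * n2 - u1)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, p


# Hypothetical relative expression values, not data from the study.
print(iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 100]))
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
```

For small groups, an exact test is generally preferable to the normal approximation used here.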

Prediction of lncRNA function
For the functional prediction of the lncRNAs, we adopted the method described in reference [34]: we first identified the co-expressed mRNAs for each differentially expressed lncRNA and then performed a functional enrichment analysis on this set of co-expressed mRNAs. The enriched functional terms were used as the predicted functional terms of the given lncRNA. Co-expressed mRNAs were identified by calculating the Pearson correlation with each lncRNA and keeping those with a correlation P-value < 0.05. We then used the hypergeometric cumulative distribution function to calculate the enrichment of each functional term in the annotations of the co-expressed mRNAs. The false discovery rate was calculated following reference [35].
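The enrichment step can be sketched as follows, assuming the set of co-expressed mRNAs has already been determined. The hypergeometric upper tail gives the probability of seeing at least the observed number of annotated genes by chance. The false discovery rate adjustment shown here is the Benjamini-Hochberg procedure, which is an assumption: the text only cites reference [35] for this step. All counts and function names are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k): n genes drawn from N, of which K carry the functional term."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return BH-adjusted P-values in the original order (assumed FDR method)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical example: 10 annotated genes in total, 3 carry the term,
# 3 are co-expressed with the lncRNA, and all 3 carry the term.
p_term = hypergeom_sf(3, 10, 3, 3)
print(p_term)
print(benjamini_hochberg([0.01, 0.04, 0.03]))
```

In a real analysis, N would be the number of annotated genes on the array and one P-value would be computed per functional term before the adjustment.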
To identify transcription factors (TFs) or chromatin regulatory factors that may act together with lncRNAs, we calculated the intersection between the set of coding genes co-expressed with each lncRNA and the set of target genes of each transcription factor/chromatin regulatory complex. The enrichment of this intersection was then assessed with the hypergeometric distribution to obtain the transcription factors significantly associated with each lncRNA. In this way, multiple lncRNA-TF relationship pairs can be obtained for each lncRNA. Cytoscape (http://www.cytoscape.org/) was used for network import and visualization.
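A minimal sketch of how lncRNA-TF pairs could be derived from such set intersections is given below. The gene, lncRNA, and TF names, the universe size, and the 0.05 cutoff are all hypothetical placeholders, and the function names are illustrative only; the authors' actual pipeline is not described in more detail than above.

```python
from math import comb


def enrichment_p(overlap, universe, targets, coexpressed):
    """Hypergeometric P(X >= overlap) for the shared genes of two sets."""
    upper = min(targets, coexpressed)
    hits = sum(
        comb(targets, i) * comb(universe - targets, coexpressed - i)
        for i in range(overlap, upper + 1)
    )
    return hits / comb(universe, coexpressed)


def lncrna_tf_pairs(coexpressed_by_lncrna, targets_by_tf, universe_size, alpha=0.05):
    """Return (lncRNA, TF, shared genes, P) tuples with P < alpha, sorted by P."""
    pairs = []
    for lnc, genes in coexpressed_by_lncrna.items():
        for tf, targets in targets_by_tf.items():
            shared = genes & targets
            p = enrichment_p(len(shared), universe_size, len(targets), len(genes))
            if shared and p < alpha:
                pairs.append((lnc, tf, sorted(shared), p))
    return sorted(pairs, key=lambda row: row[3])


# Hypothetical inputs: a universe of 100 coding genes.
coexpressed = {"lnc-A": {"g1", "g2", "g3", "g4", "g5"}}
tf_targets = {
    "TF1": {"g1", "g2", "g3", "g4", "g50"},
    "TF2": {"g60", "g61", "g62", "g63", "g64"},
}
pairs = lncrna_tf_pairs(coexpressed, tf_targets, universe_size=100)
print(pairs)
```

Each returned tuple corresponds to one edge of an lncRNA-TF network, and the shared genes supply the third node type for a ternary lncRNA-gene-TF network of the kind that can be imported into Cytoscape.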

Microarray identification of dysregulated LncRNAs between ALS patients and healthy controls
A microarray containing 92,727 probes for human coding and non-coding RNAs was used to screen for dysregulated RNAs in samples from 5 SALS patients and 5 healthy controls. The 30 lncRNAs that differed most from healthy controls were selected; their expression differences are presented as a heatmap generated with Heatmapper (Fig. 1A), and their fold changes are shown in Fig. 1B [36]. We compared the microarray results against several lncRNA databases, including LNCipedia, NONCODE, and AnnoLnc [37][38][39], and named the transcripts with the corresponding database IDs. As shown in the heatmap, 16 lncRNAs were downregulated and the remaining 14 were upregulated in peripheral leukocytes of SALS patients compared with controls.

Verification and analysis of lncRNAs regulation by quantitative real-time PCR
To confirm the microarray results, we performed reverse transcription-quantitative real-time PCR (RT-qPCR) for selected lncRNAs. We excluded lncRNAs whose sequences overlap extensively with mRNAs; the remaining lncRNAs, whose sequences do not overlap or only partially overlap with mRNAs, were validated by RT-qPCR. cDNA was synthesized from peripheral blood leukocytes of 52 SALS patients, 40 PD patients, and 38 healthy controls. A total of 13 lncRNAs were tested in this study (Fig. 2 and Supplemental Fig. 1).
Among these lncRNAs, 6 transcripts were differentially expressed in SALS cases compared with healthy controls. All of them were downregulated in SALS cases: lnc-CNTN4-2:1, lnc-NR3C2-8:1, lnc-ABCA12-3:1, lnc-DYRK2-7:1, lnc-POTEM-4:7, and XIST (in male cases). Because XIST is located on the X chromosome, changes in XIST expression were compared by sex [40]. The results for lnc-DYRK2-7:1, lnc-ABCA12-3:1, lnc-POTEM-4:7, and XIST in SALS are opposite to those of the microarray, whereas the results for the remaining lncRNAs agree with the microarray. Two of the SALS clinical samples used in the microarray, SALS2 and SALS4, were also used for qPCR. In sample SALS2, lnc-DYRK2-7:1, lnc-ABCA12-3:1, and lnc-POTEM-4:7 were all upregulated in qPCR, which is consistent with the microarray, whereas XIST was downregulated. In sample SALS4, the qPCR results for these lncRNAs were exactly opposite to those of SALS2 and thus inconsistent with the microarray results. We speculate that some qPCR results are inconsistent with the microarray because individual patient differences affect the differential expression of lncRNAs in peripheral blood leukocytes.
To determine whether the 6 lncRNAs identified by qPCR are biomarkers specific to SALS or are shared with other neurodegenerative diseases, we also analyzed peripheral blood leukocyte samples from 40 patients with Parkinson's disease as a neurodegenerative disease control. In these PD samples, some of the above lncRNAs showed the same trend of differential expression. Lnc-CNTN4-2:1, lnc-NR3C2-8:1, and XIST (male cases) were downregulated in PD. Lnc-CNTN4-2:1 and XIST (male cases) were reduced more significantly in PD than in ALS, whereas lnc-NR3C2-8:1 was reduced in PD but not more so than in ALS. A trend of elevated expression of lnc-DYRK2-7:1 and lnc-ABCA12-3:1 was observed only in PD, which is the opposite of SALS. Lnc-POTEM-4:7, whose expression was reduced in SALS, was not differentially expressed in PD. In female cases, XIST was upregulated in PD but not in SALS.

Pathway analysis of lncRNAs
GO (Gene Ontology) term enrichment and KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway analyses were performed separately for the lncRNAs that were differentially expressed between SALS patients and healthy controls in the microarray; the top 10 enriched terms are shown in Fig. 3. The enriched GO cellular component terms include the messenger ribonucleoprotein complex and the nucleus (Fig. 3A). For molecular function, the most enriched GO terms targeted by the differentially expressed lncRNAs related to the activity of various transcription factors (Fig. 3B). For biological process, the most enriched GO terms included cellular ketone body metabolic process and mitotic chromosome condensation (Fig. 3C). The KEGG pathways enriched for the differentially expressed lncRNAs included synthesis and degradation of ketone bodies, the insulin signaling pathway, and the TNF signaling pathway, in addition to various other protein functional expression pathways (Fig. 3D).
We calculated the intersection of the set of coding genes co-expressed with lncRNAs and the set of target genes of transcription factors, and assessed the enrichment of this intersection using the hypergeometric distribution. This identified transcription factors that are significantly associated with lncRNAs and may therefore play a regulatory role together with them. The analysis yields multiple lncRNA, coding gene, and transcription factor relationships for each lncRNA. For the lncRNAs that were differentially expressed in the microarray, the transcription factors with the highest frequencies were selected to draw network diagrams with the associated target genes (Fig. 4). In the ternary network diagram, lnc-ABCA12-3 and lnc-CNTN4-2, which were differentially expressed in the peripheral blood leukocytes of SALS patients, were directly related to several target genes and indirectly related to two transcription factors, ZNF263 and SIX5 (Fig. 4A). The predicted results also include coding genes that may be associated with lncRNAs and are of high interest in disease progression (e.g., SNCA, encoding alpha-synuclein; FBXW11, encoding a member of the F-box family). In the binary network diagram, several lncRNAs were associated with other transcription factors (Fig. 4B). These included lnc-NR3C2-8 and lnc-DYRK2-7, which were differentially expressed in peripheral blood leukocytes of SALS patients and were associated with the transcription factors ZBTB7A and BRCA1, respectively.

Discussion
Previous studies have found that the delay between the onset of ALS symptoms and diagnosis ranged up to 15.6 months [27], and that patients with sporadic forms of the disease took relatively longer to diagnose than other patients [41]. Delays in diagnosis can affect the outcome as well as the survival of ALS patients; therefore, the discovery of an accurate and sensitive biomarker is critical to current ALS research. Because blood sampling is easy to handle and allows low-cost multiplex assays, we focused primarily on biomarkers in the blood. In addition, because nearly 40% of lncRNAs are expressed in the human brain, lncRNAs are promising potential biomarkers for neurodegenerative diseases.
LncRNAs have been shown in recent studies of various disease pathways to be relevant to many biomolecular functions [12,16]. In this study, we profiled lncRNAs in SALS patients and healthy controls by microarray to expand our knowledge of molecular alterations in the transcriptome and obtain new data on their dysregulation. Among the top 30 dysregulated lncRNAs, we validated 6 as differentially expressed in peripheral leukocytes of SALS patients compared with healthy controls. These 6 lncRNA transcripts were clearly downregulated in SALS cases (lnc-CNTN4-2:1, lnc-NR3C2-8:1, lnc-ABCA12-3:1, lnc-DYRK2-7:1, lnc-POTEM-4:7, and XIST in male cases). Three lncRNAs (lnc-CNTN4-2:1, lnc-NR3C2-8:1, and XIST in male cases) showed the same direction of regulation in SALS and PD, although, according to the P-values, not to the same extent. Therefore, we believe that lnc-ABCA12-3:1, lnc-DYRK2-7:1, and lnc-POTEM-4:7 can be considered SALS-specific peripheral blood biomarkers, while the other lncRNAs may serve as peripheral blood biomarkers for neurodegenerative diseases more generally. As noted in the Introduction, even when the same lncRNAs are differentially expressed in different diseases, the associated pathophysiological mechanisms differ. According to several recent studies, lncRNAs mainly alter the formation of protein complexes, affect neurotoxicity, and reduce hexanucleotide repeats, among other effects [42,43]. In addition, lncRNAs are involved in various biological functions in PD, such as autophagy, apoptosis, oxidative stress, neuroinflammation, and protein ubiquitination [44].
Therefore, we believe that following up on the disease-related mechanisms of the lncRNAs that show the same or different expression trends in ALS and PD in the present findings will help us understand the shared and distinct roles of lncRNAs in the development of different neurodegenerative diseases. Other studies have also shown that peripheral blood screening for ALS-associated biomarkers provides a good foundation for ALS-associated RNA function studies [29]. Unfortunately, for most of the remaining lncRNAs, whose sequences overlap highly or completely with mRNA sequences, we currently lack a suitable method to validate their expression in clinical samples.
Substantial evidence indicates that RNA, as a key regulator associated with ALS-related proteins, plays an important role in ALS-relevant RNA metabolism as well as in the transcription of other proteins [3,4,45,46]. Among these disease-related RNAs, two main families can be distinguished: coding RNAs and non-coding RNAs. Both are associated with RNA metabolism and can produce cellular defects that may cause ALS. Alterations of numerous lncRNAs have been described in different types of motor neuron diseases; however, very few of them have been studied in depth for functions related to disease mechanisms [47]. The formation of paraspeckles, a distinct cellular feature of ALS, has been linked to the lncRNA NEAT1_2, which has been reported to be associated with ALS [25,48]. This lncRNA has been shown to bind directly to TDP-43 and FUS, proteins that are enriched in paraspeckles [25]. This is one of the most in-depth studies of lncRNA mechanisms in ALS; our data provide further evidence and ideas for studies on lncRNAs as biomarkers in ALS and on the related mechanisms.
In recent years, researchers' attention has been drawn to the interactions of lncRNAs with miRNAs or mRNAs and to the networks through which they are associated with various diseases [49][50][51]. Through bioinformatic analysis, we found that a significant proportion of the lncRNAs differentially expressed in the microarray and peripheral blood leukocyte screens were associated with various signaling pathways and transcription factors. In the GO cellular component analysis, we obtained results similar to those in the literature: the differentially expressed lncRNAs were most associated with the messenger ribonucleoprotein complex, which is closely related to mRNA. In the GO molecular function and biological process analyses, the terms most associated with differentially expressed lncRNAs, 3-oxoacid CoA-transferase and cellular ketone body metabolic process, are both strongly associated with fatty acid metabolism, and their deficiency has been implicated in the development of multiple diseases [52][53][54][55]. Some findings also suggest that unsaturated fatty acid metabolism is significantly dysregulated in the brains of patients with neurodegenerative diseases [56,57]. Furthermore, among the predicted lncRNA-associated coding genes and transcription factors, SNCA (encoding alpha-synuclein), FBXW11 (encoding an F-box family member), and BRCA1 (DNA repair-associated) may be associated with neurodegenerative diseases [58][59][60]. Therefore, we believe that the results of this bioinformatic analysis are consistent with other clinical findings and can inform future studies of the mechanisms of ALS-related lncRNAs and of possible treatments.
LncRNAs can regulate their own molecular functions as well as those of other molecules in the cytoplasm or nucleus through a variety of mechanisms [16]. Thus, for differentially expressed lncRNAs, it is also necessary to perform intracellular localization experiments in multiple cell lines. As with mRNAs, the intracellular localization of lncRNAs is closely related to their association with other molecules [61]. XIST, the X-inactive-specific transcript, is a lncRNA that has been studied intensively in recent years. XIST has been confirmed to localize to the nucleus [62], and its nuclear functions and mechanisms have also been confirmed: association with transcriptional silencing and induction of chromatin formation and deacetylation [63]. In female mammals, inactivation of one of the two X chromosomes compensates for differences in the dosage of X-linked genes between the sexes; this process is known as X chromosome inactivation. XIST dysregulation in atypical B cells of patients with diseases that show a clear female preference, such as autoimmune diseases or ssRNA viral infections, drives a different immune response in males than in females [64]. However, in this study, XIST was downregulated in the peripheral blood leukocytes of male SALS and PD patients; thus, more studies are needed on the XIST-associated mechanisms and functions in neurodegenerative diseases in each sex [65].
The other lncRNAs reported in this study are novel candidate biomarkers whose ALS-related mechanisms and functions still need to be predicted and validated based on their respective intracellular localization. In most studies, lncRNAs were examined for their association with mRNAs or miRNAs after their expression was confirmed to change abnormally in disease. Numerous results illustrate that lncRNA-mRNA-miRNA pathway studies can indeed provide hypotheses and a basis for understanding the functional mechanisms of lncRNAs [49,50][66][67][68]. However, only a few ceRNA interactions have been studied in neurodegenerative diseases. We remain interested in lncRNA-miRNA interactions in neurodegenerative diseases and will continue to investigate this mechanism in subsequent studies. In addition, our bioinformatic data indicate possible relationships between the dysregulated lncRNAs and their target genes or related transcription factors.

Conclusion
In summary, our study elucidates the expression discrepancy and intracellular characteristics of lncRNAs in peripheral blood leukocytes from SALS patients, which may provide new SALS biomarkers for clinical diagnosis. The bioinformatic analysis of lncRNAs with expression discrepancy also provides information for subsequent functional studies on these lncRNAs.