Differential expression of circRNA in amyotrophic lateral sclerosis and validation of identified circRNA biomarker

Amyotrophic lateral sclerosis (ALS) is a rapidly progressive adult-onset neurodegenerative disease that is often diagnosed with a delay due to initial non-specific symptoms. Therefore, reliable and easy-to-obtain biomarkers are urgently needed for earlier and more accurate diagnosis. Circular RNAs (circRNAs) have already been proposed as potential biomarkers for several neurodegenerative diseases. In this study, we further investigated the usefulness of circRNAs as potential biomarkers for ALS. We first performed a microarray analysis of circRNAs in peripheral blood mononuclear cells of a subset of ALS patients and controls. Among the circRNAs differentially expressed in the microarray analysis, we selected only those whose host gene shows the highest level of conservation and genetic constraint. This selection was based on the hypothesis that genes under selective pressure and genetic constraint could have a major role in determining a trait or disease. We then performed a linear regression between ALS cases and controls using each circRNA as a predictor variable. With an FDR threshold of 0.1, only six circRNAs passed the filtering, and only one of them remained statistically significant after Bonferroni correction: hsa_circ_0060762 and its host gene CSE1L. Finally, we observed a significant difference in expression levels between larger sets of patients and healthy controls for both hsa_circ_0060762 and CSE1L. In addition, receiver operating characteristic curve analysis showed diagnostic potential for CSE1L and hsa_circ_0060762. Hsa_circ_0060762 thus represents a novel potential peripheral blood circRNA biomarker for ALS.


Introduction
Amyotrophic lateral sclerosis (ALS) is a rapidly progressive adult-onset neurodegenerative disorder in which both upper and lower motor neurons are affected [1]. The majority of patients die within 3 years of the first symptoms [2]. Current diagnosis is based on clinical examination (El-Escorial criteria) [3] and neurophysiological examination (Awaji criteria) [4], and an ALS diagnostic index was also described recently [5]. Despite all this, establishing the correct diagnosis can still take one year or more [6]. Therefore, reliable and easy-to-obtain biomarkers are urgently needed for earlier and more accurate diagnosis. Our research group previously described the potential use of circular RNA (circRNA) expression levels in ALS patients as biomarkers [7]. CircRNAs represent a class of non-coding RNAs that are formed during precursor mRNA processing via back-splicing events [8]. They have several biologically diverse functions, acting as miRNA sponges, protein-coding RNAs, and transcriptional regulators [9]. They have already been associated with numerous diseases of the nervous system, such as Parkinson's disease [10,11], Alzheimer's disease [12,13], glioblastoma [14,15], multiple sclerosis [16], and epilepsy [17].
A previous work [18] demonstrated the importance of highlighting genes and critical genomic regions subject to strong purifying selection, showing that such regions are enriched for disease-causing variants and thus should be prioritized in a genetic study that aims to find the causal actors for a disease.
Gene prioritization approaches will help identify the parts of the human genome that are more likely to harbor mutations that influence the risk of disease [19].
Here, we wanted to use an approach based on metrics of constraint for the selection of potential circRNA biomarkers. We focused on circRNA host genes that are under selective pressure and have strong genetic constraints, as these genes could have a predominant role in determining the disease.
Selection of circRNAs for the analysis was based on p-value, fold change, and function of the host gene. A statistically significant difference (after Bonferroni correction) in expression between patients and controls was observed for hsa_circRNA_060762 and its host gene CSE1L. This approach showed great potential for identifying blood-based biomarkers for ALS.

RNA extraction
Peripheral blood mononuclear cells (PBMCs) were isolated from fresh blood. Ficoll density centrifugation (GE Healthcare, Sweden) was used to collect the cells, which were afterwards stored at −80 °C in Qiazol reagent (Qiagen, Germany). Total RNA was extracted from stored cells using the miRNeasy Mini Kit (Qiagen, Germany) according to the manufacturer's instructions. The concentration and purity of total RNA were measured with a NanoDrop ND-1000 (ThermoFisher, USA).

Microarray analysis
Microarray analysis of circRNA expression was performed on a subset of 20 samples: 12 patients (6 females, 6 males) and 8 age- and sex-matched controls. Samples were prepared and processed as previously described [7].

qPCR
Total RNA was reverse transcribed to cDNA using SuperScript VILO Master Mix (ThermoFisher, USA). Expression levels of circRNAs were measured by real-time quantitative PCR (qPCR) using Sybr Select Master Mix (ThermoFisher, USA) on the Rotor Gene Q 5plex HRM platform (Qiagen, Germany) in duplicate for each sample. The primers used are listed in Table 2. Primers were synthesized by IDT (USA) or Qiagen (Germany) (QuantiTect primers). RPS17 and RPL13A were used as reference genes, and the data were analyzed using the comparative cycle threshold method (2^−ΔΔCt).
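As an illustration of the comparative cycle threshold calculation, the following minimal Python sketch computes a 2^−ΔΔCt value. It assumes that the Ct values of the two reference genes are averaged arithmetically and that a control sample serves as the calibrator; neither detail is stated in the text. The function names and all Ct values are hypothetical.

```python
from statistics import mean


def delta_ct(target_ct, reference_cts):
    """Return dCt: target Ct minus the mean Ct of the reference genes.

    Averaging the reference genes (e.g. RPS17 and RPL13A) is an assumption.
    """
    return target_ct - mean(reference_cts)


def relative_expression(sample_dct, calibrator_dct):
    """Return 2^-ddCt, the expression of a sample relative to a calibrator."""
    ddct = sample_dct - calibrator_dct
    return 2.0 ** (-ddct)


# Hypothetical Ct values, for illustration only (not data from the study).
patient_dct = delta_ct(27.0, [18.0, 19.0])  # 27.0 - 18.5 = 8.5
control_dct = delta_ct(26.0, [18.0, 19.0])  # 26.0 - 18.5 = 7.5
fold = relative_expression(patient_dct, control_dct)
print(fold)  # ddCt = 1.0, so the relative expression is 0.5
```

A value below 1 indicates lower expression in the sample than in the calibrator; here a ΔΔCt of one cycle corresponds to half the expression.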

Statistical analysis
Among the circRNAs differentially expressed in the microarray analysis, we collected only those whose host gene shows evidence of genetic constraint. This selection was based on the hypothesis that genes under selective pressure and genetic constraint could have a major role in determining a trait or disease. In particular, we collected the loss-of-function intolerance score (pLI) from gnomAD [20], and the DSC and SSC scores for Europeans [21]. We then created a "conserved" set of genes that meet the following criteria: pLI = 1 (highest probability of loss-of-function intolerance), DSC score < -2, and SSC score < -2 [21]. These stringent criteria aim to select the genes with the highest level of genetic constraint and evidence of ongoing purifying selection. Linear regression analyses were performed using the expression level as the predictor and the status (case or control) as the response. For regression analyses, we used only the set of genes labelled as the "conserved set". False discovery rate and Bonferroni correction on regression p-values were calculated using R [22].
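The "conserved set" criteria can be expressed as a simple filter. The sketch below is illustrative only: the `GeneScores` record, the gene symbols, and the score values are hypothetical, and how pLI values were rounded before being compared with 1 is not stated in the text, so the comparison `pli >= 1.0` is an assumption.

```python
from dataclasses import dataclass


@dataclass
class GeneScores:
    symbol: str
    pli: float  # loss-of-function intolerance score (gnomAD)
    dsc: float  # DSC score for Europeans
    ssc: float  # SSC score for Europeans


def is_conserved(gene):
    """Apply the stated criteria: pLI = 1, DSC < -2 and SSC < -2."""
    # pLI cannot exceed 1, so ">= 1.0" selects exactly pLI = 1 (assumed rounding).
    return gene.pli >= 1.0 and gene.dsc < -2 and gene.ssc < -2


# Hypothetical genes and scores, for illustration only.
genes = [
    GeneScores("GENE_A", 1.0, -2.6, -3.1),  # meets all three criteria
    GeneScores("GENE_B", 1.0, -1.4, -2.8),  # fails the DSC criterion
    GeneScores("GENE_C", 0.42, -2.9, -2.5),  # fails the pLI criterion
]
conserved = [g.symbol for g in genes if is_conserved(g)]
print(conserved)  # ['GENE_A']
```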
All experimental data were analyzed using SPSS software 24.0 (SPSS, USA). Differences in expression levels between patients and healthy controls were assessed using the Mann-Whitney U test. Spearman's rank correlation was used to determine the correlations between circRNA/gene expression levels and clinical data. The diagnostic potential of the circRNA and its host gene was assessed with ROC curve analysis. A p-value < 0.05 and an AUC > 0.5 were considered statistically significant.
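The Mann-Whitney U statistic and the area under the ROC curve are closely related: dividing U by the number of patient-control pairs gives the AUC. The following Python sketch illustrates this relationship on hypothetical expression values. It is not the SPSS procedure used in the study and it does not compute a p-value.

```python
def mann_whitney_u(x, y):
    """Return U for x: the number of pairs (x_i, y_j) with x_i > y_j.

    Tied pairs contribute 0.5 each.
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def auc_from_u(u, n_x, n_y):
    """Return the AUC: the fraction of pairs in which the x value is higher."""
    return u / (n_x * n_y)


# Hypothetical relative expression values, for illustration only.
controls = [1.2, 0.9, 1.1, 1.0]
patients = [0.4, 0.6, 1.05, 0.5]

u = mann_whitney_u(controls, patients)
auc = auc_from_u(u, len(controls), len(patients))
print(u, auc)  # 14.0 0.875
```

Because expression is lower in patients in this example, U is counted for controls exceeding patients, so an AUC above 0.5 corresponds to downregulation in patients.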

Results
genes. Then we performed the regression analyses using the circRNAs mapped to these genes (a total of 95 from an initial set of 10161). After Bonferroni correction, the only circRNA that was significantly associated (p-value ≤ 0.05/95) with ALS status was hsa_circRNA_060762, which is encoded in the CSE1L gene (see Fig. 1). Afterwards, we also estimated the false discovery rate (FDR) for each circRNA analyzed among the "conserved set" of genes, using, in this case, a less stringent cut-off of FDR 0.1. We found that circRNAs from the following genes are associated with ALS status: UPF2, XPO1, KPNB1, and MED13, in addition to CSE1L (Table 3). Although they are not statistically significant after Bonferroni correction, we can consider these genes as possible candidates for future analyses of gene-gene interaction.
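The two multiple-testing corrections can be sketched in Python as follows. The Bonferroni threshold for 95 tests is 0.05/95, approximately 5.3 × 10^-4. For the FDR, the sketch assumes the Benjamini-Hochberg procedure; the text states only that R was used and does not name the method. The p-values are hypothetical.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted p-values: p * m, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed FDR method)."""
    m = len(pvals)
    # Visit p-values from largest to smallest, keeping a running minimum so
    # that the adjusted values stay monotone in the original p-values.
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Per-test Bonferroni threshold for the 95 circRNAs in the conserved set.
threshold = 0.05 / 95

# Hypothetical p-values, for illustration only.
pvals = [0.01, 0.04, 0.03, 0.005]
bonf = bonferroni(pvals)          # approximately [0.04, 0.16, 0.12, 0.02]
fdr = benjamini_hochberg(pvals)   # approximately [0.02, 0.04, 0.04, 0.02]
print(threshold, bonf, fdr)
```

Comparing each raw p-value with 0.05/95 is equivalent to comparing its Bonferroni-adjusted value with 0.05.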

Expression of circRNA and host gene
The only circRNA that remained significant after Bonferroni correction was hsa_circRNA_060762. We determined the expression levels of hsa_circ_0060762 and its host gene CSE1L (Fig. 2). Both the circRNA and its host gene were significantly downregulated in sixty ALS patients compared to healthy controls: hsa_circ_0060762 by 2.5-fold and CSE1L by 1.8-fold.

Associations between clinical variables and circRNA expression
Using the Spearman rank correlation test, we observed no statistically significant association between circRNA expression and clinical parameters (Table 4). There is a slight positive correlation between the expression of the circRNA and its host gene.

Diagnostic potential
We performed receiver operating characteristic (ROC) curve analysis to evaluate the diagnostic potential of hsa_circ_0060762 and CSE1L. The curves for the circRNA and its host gene are similar, resulting in an area under the curve (AUC) of approximately 0.75, together with 82.5% sensitivity and 62.5% specificity at the optimal cut-off point (Fig. 3).
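Sensitivity and specificity at a given cut-off, and the choice of an optimal cut-off, can be illustrated with a short Python sketch. The text does not state how the optimal cut-off point was chosen. The sketch assumes Youden's index (sensitivity + specificity − 1), a common choice, and classifies a sample as ALS when its expression is at or below the cut-off, since expression was lower in patients. All values are hypothetical.

```python
def sens_spec(patients, controls, cutoff):
    """Return (sensitivity, specificity) when 'value <= cutoff' predicts ALS."""
    true_pos = sum(v <= cutoff for v in patients)
    true_neg = sum(v > cutoff for v in controls)
    return true_pos / len(patients), true_neg / len(controls)


def youden_cutoff(patients, controls):
    """Return the observed value that maximizes Youden's J (assumed criterion)."""
    candidates = sorted(set(patients) | set(controls))
    return max(candidates, key=lambda c: sum(sens_spec(patients, controls, c)) - 1)


# Hypothetical relative expression values, for illustration only.
patients = [0.4, 0.6, 1.05, 0.5]
controls = [1.2, 0.9, 1.1, 1.0]

cutoff = youden_cutoff(patients, controls)
sensitivity, specificity = sens_spec(patients, controls, cutoff)
print(cutoff, sensitivity, specificity)  # 0.6 0.75 1.0
```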

Discussion
ALS is a rapidly progressing neurodegenerative disease that is often diagnosed with a delay due to initial non-specific symptoms. Several types of biomarkers have already been proposed (miRNAs, mRNAs, proteins, various metabolites) in cerebrospinal fluid, leukocytes, serum, and plasma [23]. None of them is routinely used in diagnostics at the moment, although some show great potential for further validation.
Here, we further investigated the usefulness of circRNAs as potential biomarkers for ALS. Our framework was based on a gene prioritization approach that selects the circRNAs whose host genes show the highest level of conservation and genetic constraint, because, according to our hypothesis, variation in these conserved genes should have a more substantial effect on the disease than variation in non-conserved genes. Then we performed a linear regression between ALS cases and controls using each circRNA as a predictor variable. With an FDR threshold of 0.1, only six circRNAs passed the filtering, and only one of them remained statistically significant after Bonferroni correction: hsa_circ_0060762 and its host gene CSE1L. Finally, we observed a significant difference in expression levels between patients and healthy controls for both hsa_circ_0060762 and CSE1L. Hsa_circ_0060762 is encoded in the CSE1L gene, which is found in the "conserved set" of genes with the highest level of genetic constraint. CSE1L in humans encodes an export factor for importin-α. In the spinal cord of a mouse model of ALS, an altered localization of two proteins of the nucleocytoplasmic transport system, importin-α and importin-β, was detected using immunohistochemistry [27]. An abnormal transporter protein distribution was also detected in the spinal cords of patients with sporadic and familial forms of ALS [28]. Furthermore, reduced levels of CSE1L were reported in the brains of patients with frontotemporal lobar degeneration (FTLD) [29], a disease that shares many clinical, pathological, and genetic characteristics with ALS, including nuclear trafficking impairment [30,31].
In this study, we report for the first time the reduced expression of hsa_circ_0060762 and its host gene CSE1L in peripheral blood mononuclear cells of patients with ALS. In addition, receiver operating characteristic curve analysis showed some diagnostic potential for CSE1L and hsa_circ_0060762. Hsa_circ_0060762 thus represents a novel potential circRNA biomarker for ALS, together with three other circRNAs that we have previously reported in connection with ALS [7]. Nevertheless, all described potential circRNA biomarkers for ALS need validation and comparison with other neurodegenerative diseases before any conclusion is made on their usefulness as ALS biomarkers. Some limitations of this study, such as the sample size and the reduced power to detect circRNAs with smaller effects, also have to be considered.
In conclusion, we showed that circRNAs have the potential to be effective blood-based circulating biomarkers for ALS. However, an extensive validation based on diverse sets of healthy and diseased cases, preferably with a larger number of samples in each group, has to be performed before we can classify them as useful biomarkers for ALS.

Declarations
Funding
This work was supported by the Slovenian Research Agency (ARRS) under a PhD thesis grant for young researcher Ana Dolinar and under research core funding Nos. P3-0054 and P3-0338.

Conflict of interests
The authors declare no conflict of interest.
Availability of data and materials