Identication of Novel Gene Mutations in Degenerative Lumbar Spinal Stenosis Using Whole Exome Sequencing in Chinese Population

Degenerative lumbar spinal stenosis (DLSS) is a common lumbar disease requiring surgery. Previous studies indicate that genetic mutation may play a part in DLSS. However, there is scarcely specic study regarding the genetic study for DLSS. Whole exome sequencing (WES) is an effective strategy to search for disease-causing genes. To identify probable susceptible genes, we performed a WES in a cohort of 50 patients and 25 controls with DLSS. By bioinformatics analyses, we successfully identied several novel variants including HLA-DRB1, PARK2, ACTR8, AOAH, BCORL1, MKRN2, NRG4, NUP205, etc. To the best of our knowledge, this is the rst genetic study of susceptibility genes with DLSS by WES in Chinese patients. In conclusion, our ndings revealed that deleterious mutations in these genes may contribute to the development of DLSS.


Introduction
Lumbar spinal stenosis (LSS) is de ned as any type of narrowing of the lumbar spinal canal, the lateral recesses, or the intervertebral foramina. It may be congenital and acquired or degenerative (1,2). Congenital stenosis is rare and typically caused by achondroplasia and hypochondroplasia. Degenerative lumbar spinal stenosis (DLSS) is usually attributable to degenerative changes in the intervertebral discs, facet joints, and ligamentum avum, spondylolisthesis. DLSS typically affects subjects over 50 years of age, as one of the most common spinal conditions worldwide that often requires surgery. Neurogenic claudication is the classic clinical manifestation associated with lumbar spinal stenosis causing diminished function and impaired quality of life (3,4).
To date, the etiology and pathomechanism of DLSS remain unclear, and previous studies indicate that genetic factors may play a part in DLSS. A previous Finnish study identi ed a splice site variant in COL9A2 was associated with DLSS (5). Seung-Jae Hyun reported a COL9A2 haplotype (HAP2) was signi cantly associated with DLSS in the Korean population, whereas another haplotype (HAP4) may play a protective role against DLSS development (6). However, there is inadequate study regarding the susceptibility gene mutation for DLSS. It is urgent to identify pathogenic genes to provide novel approaches for future investigation and intervention of DLSS.
Whole-exome sequencing (WES) represents a cost-effective, reproducible, and robust strategy for the sensitive and speci c identi cation of variants causing protein-coding changes in individual human genomes. Previous genetic studies were limited to implementing one or several candidate genes approach rather than the more established measure of WES study, whereby hundreds of thousands of single nucleotide polymorphisms (SNPs) can be screened (7). For the population with pathogenic gene mutations, preventive measures such as lifestyle modi cation, and better monitoring for stenosis development may be applied. In this study, we analyzed data of WES on fty DLSS patients and twentyve healthy controls to attempt the identi cation of pathogenic genes. We used DNA sequencing to validate the variants disclosed by WES. Single nucleotide polymorphisms were analyzed by bioinformatics analysis. Our study reported several pathogenic variants in genes associated with DLSS in the Chinese population.

Study population
A total of 50 unrelated Chinese patients with symptoms consistent with DLSS and 25 unrelated healthy controls were consecutively recruited between the year 2016 January and 2017 December from the Orthopaedic Department in China-Japan Friendship Hospital. This study was conducted with the approval of the ethics committee of the China-Japan Friendship Hospital Institutional Review Board.
Written informed consent was obtained from all patients whose specimens and clinical information were used for this study. All patients were examined both clinically and radiologically to con rm the degenerative nature of the stenosis. All DLSS patients met the all following three criteria: (1) diagnosis of DLSS by magnetic resonance imaging (MRI); (2) conservative treatment and monitoring for more than 3 months by spinal surgeons; and (3) a history of typical LSS symptoms, consisting of self-reported intermittent neurogenic claudication, stenotic symptoms extending to the lower extremities upon extension of the lumbar spine, or numbness or weakness of the lower extremities. Patients with neurological disorders, spinal fracture, spondylolisthesis, spinal tumor, presentation with trauma, and diagnosis with an infectious disease were excluded. 25 biologically unrelated healthy individuals, with a similar ethnic background, were enrolled as the control group.
Whole exome sequencing Genome DNA was extracted from whole blood with TIANamp Blood DNA kit (TIANGEN BIOTECH Beijing, China). Exome capture was performed using Agilent SureSelect Human All Exon V6 kit (Agilent Technologies, Santa Clara, CA) as per the manufacturer's instructions. A whole exome sequencing was performed on an Illumina HiSeq Xten platform. Sequencing-derived raw image les were processed by Illumina base calling Software with default parameters, and sequence data for each individual were generated as paired-end reads, de ned as 'raw data'. The experiment was performed according to manufacturer's protocol. One sample was loaded per ow cell lane to obtain a minimum of 10× read depth across ~96% of the target regions.

Exome data analysis
Bioinformatics analysis started with the raw sequencing data from the Illumina pipeline. Raw data were processed and ltered and low-quality reads were discarded. Low-quality sequencing reads were ltered by the following criteria:1) removal of reads containing the abnormal sequencing adapter, 2) removal of reads with a low-quality base ratio (base quality less than or equal to 5 that was more than 50%; 3) removal of reads with an unknown base ('N' base) ratio that was more than 10%. Variants located in exon regions or alternative splicing regions were retained. Variant frequency was compared with the 1000G SNP database (http://www.1000genomes.org/). Exome Aggregation Consortium (ExAC) (http://exac.broadinstitute.org;) Analysis for potential deleterious mutations was performed using various algorithms, such as PolyPhen2 (http://genetics.bwh.harvard.edu/pph2/) (8), SIFT (http://sif.jcvi.org/) (9), Mutation Taster (http://www.mutationtaster.org/) (10) as well as CADD (Combined Annotation Dependent Depletion). We used the identi cation of rare combined with at least two of the four algorithms above to prioritize potential pathogenic variants. R software (http://www.bioconductor.org/packages/release/bioc/html/maftools.html) was used to perform the clustering analysis to determine statistical signi cance in genotype and allele frequencies between patients and controls (FDR, false discovery rate < 0.05). Phenolyzer (Phenotype Based Gene Analyzer, http://phenolyzer.wglab.org/) was used to discover genes based on user-speci c disease/phenotype terms.
Read Mapping to reference sequences All single nucleotide variants (SNVs) of 75 samples were obtained by comparing the valid sequencing data with the human reference genome (UCSC Genome Browser hg19) using Burrows-Wheeler Aligner (BWA) software to gain the primary mapping results. And then, the aligned data were sorted by SAMtools to select the best mapping positions, and the duplicated reads were marked by Picard (http://sourceforge.net/ projects/picard/) so that they could be used in the next analysis.
In addition, the functional annotation was conducted to nd the genetic variation associated with DLSS. First, all variants were annotated using the ANNOVAR software tool. Then, the normal population variant databases, including 1000g2015aug_all, 1000g2015aug_Chinese esp6500siv2_all NovoDb_WES_SNP and ExAC database, were performed to exclude the common variations occurring with no more than 1% minor allele frequency (MAF).

Candidate susceptible variants and gene selection
The SIFT, PolyPhen-2 and MutationTaster were performed to predict whether the substitution of amino acid affected the function of the protein. The mutation was selected as candidate susceptible variant if one of the three software showed it was pathological.

Gene function enrichment analysis
Toppgene suite was used for gene list enrichment analysis and candidate gene prioritization based on functional annotations and protein interactions network (https://toppgene.cchmc.org/). A web-based portal named Metascape (http://metascape.org/) was used to do Gene Ontology (GO) enrichment analysis. GSEA software (version 3.0) with "c2.cp.kegg.v6.1symbols.gmt" gene sets database was used to do Gene Set Enrichment Analysis (GSEA) (http://www.broadinstitute.org/gsea/index.jsp), to identify signi cantly enriched or depleted genes that show statistically signi cant, concordant differences between two given clusters.

Statistical analysis
A Fisher's exact test was used for evaluating the association between rare variants and disease phenotype (case/control data) with Benjaminiand-Hochberg method, which is a type of false discovery rate (FDR) test and is used in multiple hypothesis testing to correct for multiple comparisons.

Patient cohort
Twenty-one male and 29 female patients were included in this study, with an average age of 52. Genes related to LDSS by clustering analysis by R software. We ltered allele mutations in controls to retain only mutations in patients and got multiply SNPs on thirty-four genes. (FDR, false discovery rate < 0.05.) ( Figure 1 and Table 1) To decipher the speci ed characteristics between patients and control groups, we analyzed the differences by Gene Ontology (GO) and pathway analysis. (Figure 2 and Supplementary Table 1) Fifty cases and 2245 control cases (Novegene database) were used to do the sites-based association.
( Figure 3) Forty-three genes were ltered up to P-value ≤ 0.001, FDR_BH_Allele ≤ 0.01. After toppgene test, the top two genes were HLA-DRB1 and PRKN. The top four SNP related to lumbar disease were shown in Table 2. We perform pathway and Phenolyzer to discover genes based on lumbar disease. Genes related to LDSS (Phenolyzer score ≥ 0.01) were showed in Supplementary Figure 5.
After ltering the data by 1000G_EAS ≤ 0.005 and damaging variants predicted by at least two of four algorithms (i.e. SIFT, Polyphen2, Mutation Assessor, Mutation Taster, and CADD), there are 322 SNP on 60 candidate genes which P-value ≤ 0.001. We perform pathway and Phenolyzer to discover genes based on lumbar disease. As 'lumbar degeneration' is not in the disease item, we input lumbar disc degeneration as the disease item in Phenolyzer. Genes related to LDSS (Phenolyzer score ≥ 0.01) were showed in Table 3 and Supplementary Figure 6.
As there are genetic relations among bijingfang, bijinfen, and bijinhua, not all the patients are sporadic cases. We tried to nd SNPs that not only in the three cases but also in the other 47 cases as more as possible. Six genes were identi ed to have relations with LDSS with high possibility. (Table 4 and Table 5)

Discussion
To date, several lines of evidence have shown that genetic factors may play a part in DLSS. We performed a WES study design on 50 patients and 25 controls to detect genes contributing to DLSS. In our study, we discover several novel candidate genes that have not been previously identi ed in relation to DLSS.
Prior studies have identi ed several candidate genes in association with lumbar disc degeneration. Jason et al. carried out a genome-wide association study, and multiple SNPs identi ed suggested a multifactorial origin of DLSS (11). HLA-DRB1 genotyping was reported to increase the risk of developing pain after surgery or lumbar disc herniation (12). A variant in the PARK2 gene associated with Lumbar disc degeneration by in uencing methylation of the PARK2 (13). In this study, HLA-DRB1 and PARK2 are identi ed as susceptibility genes associated with a predisposition to DLSS. Lumbar disc degeneration causes intervertebral collapse, which may accelerate the development of DLSS. Obviously, there could be some overlapped mechanism between the two degenerative processes. To some extent, our results are in accordance with the previous studies. The speci c mechanism of the two genes contributing to DLSS needs further investigation.
In this study, ACTR8, AOAH, BCORL1, MKRN2, NRG4, NUP205 were novel genes reported in correlation with DLSS. ACTR8 is located on chromosome 3p21.1 and has 14 exons. Mutation in the ACTR8 was associated with lineage-speci c expression in primates (14). AOAH is on chromosome 7p14.2, encoding both the light and heavy subunits of acyloxyacyl hydrolaseregion. AOAH polymorphisms have been reported to associate with asthma, chronic rhinosinusitis, and bronchial hyperreactivity (15,16). Located on chromosome Xq26.1, BCORL1 encoding a transcriptional corepressor that is found tethered to promoter regions by DNA-binding proteins. Pathogenic variants in BCORL1 underlie a newly identi ed Xlinked epigenetic syndrome (17). MKRN2, on chromosome 3p25.2, encodes a probable E3 ubiquitin ligase containing several zinc nger domains. MKRN2 is involved in the regulation of in ammatory response, non-small-cell lung cancer (18,19). NRG4 is located on chromosome 15q24.2. NRG4 is a member of the epidermal growth factor (EGF) family of extracellular ligands, highly expressed in adipose tissues, enriched in brown fat, and markedly increased during brown adipocyte differentiation (20). NUP205 on chromosome ch7q33 encoding a nucleoporin, which is a subunit of the nuclear pore complex that functions in the active transport of proteins, RNAs, and ribonucleoprotein particles between the nucleus and cytoplasm. Mutations in NUP205 are associated with steroid-resistant nephrotic syndrome (21).
Several other candidate genes associated with DLSS were identi ed including GPRIN2, MYOT, PDE4DIP, etc. Some pathways were enriched through analysis of the differentially expressed genes between the patient and control groups. These genes had never been reported to be associated with DLSS, We recognize several limitations of this study. First, the number (50) is still inadequate for genetic study.
Concerning the inadequate number, we enroll three family cases in this study. Numerous candidate genes identi ed in this study warrant further investigation. Functional study should be conducted to determine the mechanism and pathways in which the genes involved affect the development of the disease. As a next step, we will continue increasing the sample size to reveal more susceptible genes.
In conclusion, in this study, we identifed mutations with DLSS in Chinese patients for the rst time using WES. From our analysis, we found two potential pathogenic loci for DLSS on HLA-DRB1 and PARK2, which have previously been reported to be associated with lumbar disc degeneration, and SNPS on several novel genes including ACTR8, AOAH, BCORL1, MKRN2, NRG4, NUP205, etc, which had not previously been reported to be DLSS. Further replication and functional studies are required to con rm these positive ndings, helping to detect early and to provide novel therapeutic approaches of DLSS.

Declarations
Acknowledgments Not applicable

Con ict of interest
The authors declare that they have no con ict of interest.

Ethical approval and consent details
The clinical study was approved by the ethics committee of China-Japan Friendship Hospital and was conducted in accordance with the provisions of the Declaration of Helsinki.

Data availability statement
The data used to support the ndings of this study are available from the corresponding author upon request.

Not applicable
Authors' contribution statement Xin Jiang and Dong Chen Conceived, coordinated, designed, drafted and edited the manuscript.