Regulatory network underlying alterations of gene expression in early death of Huntington's disease

The aim of this study is to explore the genomic mechanism of early death in Huntington’s disease (HD). We identied 10160 and 1511 differentially expressed genes (DEGs) from comparisons of HD versus control and early versus late death, respectively. On the basis of 922 overlapped DEGs among them, six functional modules were established by weight gene correlation network analysis. The turquoise module most strongly related to overall survival of HD was signicantly enriched in GABAergic synapse, retrograde endocannabinoid signaling and neuroactive ligand-receptor interaction. Low expression of ve DEGs (CA10, WSB2, ACTR3B, PCDH19 and GABRB3) with highest degree of connectivity and non-zero regression coecients were identied as hub genes, based on which the LASSO model exactly predicted the occurrence of early death in HD. Furthermore, the network analysis revealed indirect interaction of CA10 with the central hub gene GABRB3, which was involved in the pathway of GABAergic synapse. Compared with high expression of hub genes, their low expression appeared shorter overall survival of HD patients according to Kaplan-Meier survival analysis. Consequently, low expression of CA10, WSB2, ACTR3B, PCDH19 and GABRB3 were possibly associated with early death of HD. The likelihood of low expression of GABRB3 regulated by CA10 in GABAergic synapse contributed to the pathogenesis of early death in HD.


Background
Huntington's disease (HD) is an autosomal dominant, progressive neurodegenerative disorder characterized by neuropsychiatric disturbances, cognitive impairment and motor dysfunction [1]. This clinical entity is caused by CAG trinucleotide repeat expansion in huntingtin gene, resulting in neuronal dysfunction and necrosis by a mutant huntingtin protein [2]. The onset of HD is generally occurring in mid-adult, with an average longevity of proximate 60 years old [3]. However, it is not only a condition of adult-onset disease, but also be symptomatic in juvenile -there are cases of early-death or late-onset accompanied by atypical clinical symptoms, such as Westphal variant with severe developmental retardation in children, as well as very mild symptoms in elderly patients [4,5]. A fundamental but unresolved issue emerges regarding what exactly determines the progression and mortality in patients with HD, which likely shed light on the anticipation of this incurable disease to be not untreatable.
Despite decades of research into the mutant huntingtin protein since the discovery, yet its function remains elusive and, most patients ultimately suffer from intolerable disabilities and early death [6,7]. A growing emphasis on prominent group of genes encoding huntingtin-interacting proteins, provides valuable clues to illustrate huntingtin functions and the pathophysiology of HD [8]. In case of Slc12A5, which is highly enriched and down-regulated in huntingtin proteome [9,10], disrupting cytoskeletal interactions and promoting developmental apoptosis of cortical projection neurons [11]. Moreover, targeting Gpr52 reduces the levels of mutant huntingtin and rescues HD-related phenotypes by the cyclic adenosine monophosphate-dependent signaling pathway in cellular and mouse models [12]. Notably, these studies only highlighted the involvement of genes in the onset of HD, rather than intensive investigation of the genetic effect on prognosis or mortality. As the inherited certainty was taken into account, we sought to identify causative genes associated with diversities of overall survival in HD to immediately target the early death mechanism of the primary disorder.

Data resources
The gene expression pro le of GSE33000 dataset were collected in the Gene Expression Omnibus database (GEO, https://www.ncbi.nlm.nih.gov/geo/). [13]. This dataset was generated using the Rosetta/Merck Human 44k 1.1 microarray from postmortem prefrontal cortex samples between 157 HD patients and 157 non-demented controls with matched genotype and clinical data. The data of gene expression pro le were normalized by application of the normalizeBetweenArrays function in the limma package of R software version 3.6.2 [14].
Gene set enrichment analysis (GSEA) The biological process (BP) Gene Ontology terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways that were potentially related to HD and early death in GSE33000 dataset were ltrated by GSEA analysis [15,16]. The statistic of default weight was used to conduct the permutation for 1000 times and normalized P < 0.05 was considered as signi cant enrichment. Thereafter, visualization of enrichment data in GSEA analysis were accomplished by adapting packages of ClusterPro er, ggplot2, enrichplot and GSEABase in R.

Identi cation of differentially expressed genes (DEGs)
The median time of overall survival was identi ed as the cut-off point in order to dichotomize subjects into early death and late death groups. The differential expression analysis was carried out using the limma package of R software, with P < 0.05 adjusted by the false discovery rate considering statistically signi cant [14,17]. The lmFit and eBayes functions of limma package were utilized to obtain the DEGs for HD versus control group and early death versus late death cohort, respectively.
Weight gene correlation network analysis (WGCNA) WGCNA was conducted based on the intersection genome of DEGs between HD/control group and early/late death cohort. The outlier samples were excluded to guarantee the dependability of network after plotting the clustering dendrogram by hclust function. During the module construction, the value of soft thresholding power was ltered using the pickSoftThreshold function, and the appropriate power value of 9 was determined by sft$powerEstimate function. The co-expression network was performed by the WGCNA package, with each module designating a unique color label and the minimum size setting to 30 [18]. Thereafter, topological overlap matrix was calculated to ascertain modules by hierarchically clustering eigengenes [19], and the network heatmap of the interaction association between modules was visualized using TOMplot package. Each module in the heatmap was represented by a different color for the horizontal and vertical axis. The yellow brightness in the unit of intermediate connection indicated the connectivity degree of different modules. Additionally, functional enrichment analysis of each module was conducted by utilizing the clusterPro ler package.

Identi cation of hub genes and construction of protein-protein interaction (PPI) network
In WGCNA, hub genes referring to the highest degree of connectivity in a module were used to determine the biological signi cance of the module. The absolute value of Pearson's correlation was adapted to analyze the relationship between intramodular connectivity and clinical trait, which were measured by module membership (MM) and gene signi cance (GS), respectively [20]. An eigengene that met the criteria of Pearson's correlation (MM ≥ 0.9 and GS ≥ 0.5) was de ned as a candidate hub gene. The pairwise scatterplot matrix of the relationship between candidate hub genes was quantitatively analyzed with gpairs package. Based on the STRING database (Search Tool for the Retrieval of Interacting Genes, https://www.string-db.org/) [21], DEGs linked to the interaction of hub genes with early death and HD were extracted to construct the PPI network of the clinically signi cant module. The cytoscape software was used to visualize the global regulatory network, intersection pathways of DEGs and GABAergic synapse pathway [22].

Analysis of receiver operating characteristic (ROC) curve and Kaplan-Meier survival curve
Least absolute shrinkage and selection operator (LASSO) method was applied to eliminate factors with relatively insu cient contribution and select the most useful predictors as the hub genes for early death of HD [23]. The glmnet package was used to construct the LASSO model according to the expression pro le of candidate hub genes. The regression coe cient of LASSO analysis was calculated to establish the model index of each sample, with the gene expression value weighted by the following formula: index = ExpGene1*Coef1 + ExpGene2*Coef2 + ExpGene3*Coef3 + … (Coef: gene regression coe cient; Exp: gene expression value). The ROC curve was plotted using the pROC package and the diagnostic performance of LASSO model in predicting early death of HD was estimated by the area under the curve (AUC). Generally, an AUC value of 100% indicated a complete prediction, while 50% represented a random selection. In addition, analysis of time-dependent Kaplan-Meier survival curve was conducted by the survival package [24]. All P values were bilateral with those less than 0.05 considered statistically signi cant.

Identi cation of differentially expressed genes
Based on the clinical data of all subjects, the mean overall survival of 157 HD patients (55.85±14.85) was signi cantly shorter than that of 157 control samples (63.52±9.91; P = 1.144e-007) ( Figure 1A). After eliminating the unavailable genes, a total of 19414 genes were included in this study. Among them, 10160 DEGs with 4885 down-regulated and 5275 up-regulated were identi ed between HD and controls ( Figure 1B); meanwhile, 1511 DEGs with 829 down-regulated and 682 up-regulated were established in early death compared with late death cohort ( Figure 1C). Finally, we extracted the intersection genome of 922 DEGs that were commonly down-regulated or up-regulated in both HD/control and early/late death group. The plot of heatmap summarized the top 25 down-regulated and up-regulated DEGs of intersection genome that were related to HD and early death ( Figure 1D).

Module associated with overall survival of HD
Six modules with unique colors were generated by WGCNA in the trait of overall survival in HD ( Figure  2A). As shown in Figure 2B The analysis of functional enrichment revealed that turquoise module DEGs ( Figure 3A) was involved in BP of membrane potential regulation, nervous system process and chemical synaptic transmission; as well as KEGG pathways ( Figure 3B) enriched by GABAergic synapse, retrograde endocannabinoid signaling, morphine addiction and neuroactive ligand-receptor interaction. According to MM > 0.9 and GS > 0.5, seven DEGs were identi ed as candidate hub genes (CA10, WSB2, NETO2, LPPR4, ACTR3B, PCDH19 and GABRB3), all of which were positively interacted with each other ( Figure 3C). The LASSO model was constructed based on the expression pro le of candidate hub genes ( Figure 3D). Two (NETO2 and LPPR4) of them with zero regression coe cients were eliminated, whereas the remaining ve DEGs (CA10, WSB2, ACTR3B, PCDH19 and GABRB3) were identi ed as hub genes (lambda.min = 0.01264). The index formula of hub gene-based model was generated as follows: index = CA10*(-0.9456) + WSB2*0.2320 + ACTR3B*1.1667 + PCDH19*0.3986 + GABRB3*1.4177.

PPI network and GSEA
The DEGs of turquoise module were extracted to construct the PPI network of global regulation based on STRING database ( Figure 4A). Furthermore, we found that GABRB3 was jointly involved in the enrichment of KEGG pathways including GABAergic synapse, neuroactive ligand-receptor interaction, morphine addiction and retrograde endocannabinoid signaling ( Figure 4B). As shown in gure 4C, CA10 regulated the interactions of GABRB3 with GABRA1, GNB5 and KCNJ6 in the pathway of GABAergic synapse. Gene set enrichment analysis revealed the BP of HD ( Figure 5A) and early death ( Figure 5B), as well as the KEGG pathways of HD ( Figure 5C) and early death ( Figure 5D).

Discussion
During the current study, GSEA was performed to explore the expression pro le of 19414 genes belonging to the BP of synapse assembly, synaptic vesicle cycle and neurotransmitter transport; as well as the KEGG pathways of phagosome, neuroactive ligand-receptor interaction and retrograde endocannabinoid signaling. It is worth noting that these genes in the processes and pathways were enriched in HD and early death individuals. Subsequently, a comprehensive characterization of regulatory network for the intersection genome of DEGs between HD and early death was integrated; hereby, the excavation of key target genes and interactive pathways from functional modules with respect to the trait of overall survival might contribute to elucidate the genome-scale pathogenesis of early death in HD patients.
Generally consistent with the results of GSEA, functional enrichment analysis exhibited DEGs of turquoise module participating in BP related to membrane potential regulation, nervous system process and chemical synaptic transmission; as well as KEGG pathways regarding GABAergic synapse, neuroactive ligand-receptor interaction and retrograde endocannabinoid signaling. Convincing previous evidence demonstrated that a series of huntingtin-interacting proteins associated with the de cits of GABAergic synapse yielded the onset of HD [25]. Additionally, endocannabinoids modulated the synaptic transmission through the pathway of retrograde endocannabinoid signaling, resulting in cognitive impairment and movement dysfunction partially similar to the symptoms of HD [26]. Based on MM > 0.9 and GS > 0.5 in WGCNA, seven DEGs including CA10, WSB2, NETO2, LPPR4, ACTR3B, PCDH19 and GABRB3 were identi ed as potential pathogenic factors for early death and HD. Among them, PCDH19 has been found to coordinate neurogenesis, neuronal migration and synaptic connections in mammalian cortex [27,28]. Low expression of PCDH19 promoted over-differentiation of progenitor cells, thus to premature neuronal apoptosis and structural defects of synapse; contrastively, high expression of PCDH19 improved abnormal neurogenesis, ne-scale spatial organization of excitatory neurons and synaptic junction in the cortex [29]. ACTR3B, also known as actin related protein 3B, had implications in the nucleation and assembly of branched actin networks [30]. It bound to DNA double-strand fractures and, in conjunction with actin, facilitating clustering and homologous directed repair at the breakage [31]. There was evidence that the disturbance in ACTR3B expression was account for abnormal neuronal morphology and synaptic growth, as well as alterations in synapse activity [32][33][34], which was likely related to the pathogenesis of early death in HD.
Glaringly, in the global network analysis, GABRB3 was identi ed as a central hub gene at a site with genome-wide signi cant linkage to overall survival of HD. An observation from high-throughput ow cytometry assay revealed that GABRB3 was a critical gene for early development of central nervous system, and the functional impairment of GABRB3 was prone to neurodevelopment-related disorders [35]. Speci cally, it was increasingly expressed in the embryonic brain, with its encoded β3 subunit involved in the biological processes of stem cell differentiation, assembly and tra cking of GABA receptor [36,37]. The inhibitory synaptic transmission mediated by GABA receptor were susceptible to the accumulation of mutant huntingtin, leading to early degeneration of GABAergic neurons in the striatum of HD patients [38]. Indeed, many of the manifestations that were emerged early in HD, such as hippocampal-dependent learning and memory impairment, were considerably attributed to the weakened inhibitory GABAergic transmission veri ed in a HD mouse model [25]. This happened to coincide with the results of our network and GABAergic synapse pathway that low expression of the central hub gene GABRB3, interacting with other DEGs (e.g., GABRA1, GNB5 and KCNJ6), were included in the intersection genome between HD and early death, suggesting that GABRB3-mediated GABAergic synapse was probably a crucial pathway for early death of HD. With the exception of GABAergic synapse, the module-pathway network showed that GABRB3 also participated in neuroactive ligand-receptor interaction, morphine addiction and retrograde endocannabinoid signaling. The potential roles of these pathways in the overall survival of HD was not fully understood, and therefore deserved further exploration with an expectation of offering a GABRB3-related therapeutic target for this incurable disease.
Less is known about the properties of carbonic anhydrase CA10, rather than a non-catalytic activity of extracellular protein that has been described for two decades in relation to carbonic anhydrase [39].
Nevertheless, recent study con rmed that CA10, an evolutionarily conserved ligand of neurexins in the surface transpose, served as a functional adaptor to enhance the expression of neurexin proteins, alter post-translational modi cations and improve their surface levels [40]. It mediated the interaction between neurexins and certain postsynaptic molecules, thereby facilitating the formation of novel trans-synaptic complexes [40]. Evidence in zebra sh using morpholino-mediated knockdown of CA10 showed developmental retardation, aberrant behavior and early death under unknown mechanisms of CA10 de ciency [41]. Herein, the network analysis exhibited multiple indirect interactions of CA10 with the center hub gene GABRB3, suggesting that GABRB3-mediated GABAergic transmission was possibly modulated by CA10 through its adaptor function. In episodic ataxia type-1families, carbonic anhydrase inhibitors decreasing the excitability of GABAergic interneurons by intracellular alkalinization were used to alleviate the clinical symptoms of cerebellar ataxia [42]. Correspondingly, this would provide us with a feasible implication to the intervention of early death of HD by modulating CA10 to maintain postsynaptic potentials of GABAergic inhibition.
Visualization of the global regulatory network showed the down-regulation of candidate hub genes that were strongly related to the clinical trait of overall survival in HD patients. Further matrix of pairwise scatterplot con rmed a positively mutual correlation between them, implying that the expression of a gene subsided with the attenuation of the other. This might be illustrated by the involvement of candidate hub genes underlying the pathogenesis of early death in HD, particularly CA10, which indirectly interacted with GABRB3 in the pathway of GABAergic synaptic transmission [29,34,38,41]. Five genes (CA10, WSB2, ACTR3B, PCDH19 and GABRB3) with non-zero regression coe cients were selected to conduct the ROC analysis, of which the AUC values were all greater than 70%, supporting that they were hub genes and could be served as potential predictors for early death in HD worthy of clinical application [43]. In addition, the veri cation of the Kaplan-Meier survival curve demonstrated that, compared with the high expression of CA10, WSB2, ACTR3B, PCDH19 and GABRB3, their low expression was associated with shorter overall survival of HD patients. It might be an intriguing hypothesis that the improvement of low expression of the hub genes was prospectively used for targeted treatment or prevention to HD patients with a high risk of early death. Since our prediction was based on an analysis of post-mortem samples, subsequent in vivo model experiments are required to validate the underlying mechanisms of the hub genes involved in the early death of HD.

Conclusions
Low expressions of CA10, WSB2, ACTR3B, PCDH19 and GABRB3 have important implications in the pathogenesis of early death in HD. GABAergic synapse is potentially a crucial pathway by which CA10 interacts intimately with GABRB3 to yield early death of HD. The results emerging from our study stand poised to extend a fundamental guidance, in theory, for prognosis prediction and a promising intervention into the target of GABRB3 modulated by CA10 for early death of HD.