Identification of DEGs
The workflow if this study was shown in Fig. 1. Then, two GEO datasets, GEO833 and GEO26276, in our test set were selected to identify the DEGs. GEO833 included 4 nonneurological control samples and 5 sALS samples, while GEO26276 included 3 nonneurological control samples and 3 sALS samples. A heatmap and volcano plot analysis of two datasets were used to visualize the DEGs (Fig. 2A and B). In the GSE833 datasetset, 274 DEGs were identified in the sALS group compared with the nonneurological control samples, including 117 upregulated genes and 157 downregulated genes. In the GSE26276 datasetset, 203 DEGs were identified in the sALS group compared with the nonneurological control samples, including 142 upregulated genes and 61 downregulated genes (Fig. 2C). The Venn plot showed that 8 DEGs were found in both datasets, and we merged the DEGs in the two datasets into a new expression matrix for further analysis.
Analysis of DEG expression in tissues/organs
According to the localization in the gene expression analysis, 46 tissue/organ-specific expressed genes were identified by BioGPS (Table 2). The results showed that most of these DEGs were specifically expressed in the nervous system (24/46, 52.17%). The second tissue/organ-specific expressed system was hematologic/immune cells, including 9 DEGs (9/46, 19.57%), followed by the digestive system (4/46, 9.70%), circulatory system (2/46, 4.35%), endocrine system (2/46, 4.35%), genital system (2/46, 4.35%), respiratory system (1/46, 2.17%), placental system (1/46, 2.17%) and other systems (1/46, 2.17%).
Table 2
Tissues/organ-specific DEGs expression DEGs: differentially expressed genes
System/Organ
|
Genes
|
Counts
|
Haematologic/Immune cells
|
UBE3A, SELL, PTGDR, PRKCD, ARHGAP25, FZR1, DARS1, LARP4B, NUCB2
|
9
|
Nervous
|
APP, PSEN1, SLCO1A2, LRRK2, AKT1, CHRNA4, GDF10, ALK, FHL3, ABCE1, PLP1, LMO3, TF, PRG4, CHI3L1, PDE6H, HTR2A, VAMP1, RXRG, PDE6A, EDNRB, HIP1, OPCML, PUM2
|
24
|
Digestive
|
FABP6, MEP1A, RARRES2, C4BPA
|
4
|
Respiratory
|
F3
|
1
|
Circulatory
|
PGAM2, TNNC
|
2
|
Placenta
|
LGMN
|
1
|
Endocrine
|
CDH1, SERPINA3
|
2
|
Genital
|
SYCP2, SPAG11A
|
2
|
Others
|
GNAT1
|
1
|
DEGs: differentially expressed genes
DEG enrichment analysis
Functional and pathway enrichment analyses of the DEGs were performed by GSEA software, the R software org.Hs.eg.db package and the Online database KOBAS 3.0. First, we performed a GSEA by uploading the expression profiles of GSE833 and GSE26276, and we used the c2: GO gene set to investigate the GO enrichment of gene expression at the overall level. The screening criteria for the significantly enriched gene set were a Q value < 0.05 and an FDR < 0.25. We found that the most enriched gene sets in GSE833 and GSE26276 were related to axon_guidance, neurotrophin_signaling_pathway, neuroactive_ligand_receptor_interaction, adherens_junction, huntingtons_disease, amyotrophic_lateral_sclerosis_als and parkinsons_disease (Fig. 3A-D).
Second, we performed GO and KEGG pathway analyses of the DEGs using the R software org.Hs.eg.db package and KOBAS 3.0, respectively. The results of the GO enrichment analysis of the DEGs showed that processes of nervous system activity, such as chemical synaptic transmission, neural crest cell development and neurogenesis, were significantly related to sALS, and other biological processes, such as response to endogenous stimulus, anterograde transsynaptic signaling and excitatory postsynaptic potential, were also involved. The top 10 biological processes were selected based on a Q value < 0.05 and an FDR < 0.25 and are shown in Fig. 3E. The KEGG pathway analysis revealed that the DEGs were mainly enriched in neuroactive ligand receptor interaction. In addition, the PI3K-Akt signaling pathway, cellular senescence, the apelin signaling pathway, fluid shear stress and atherosclerosis were enriched in the sALS samples (Fig. 3F).
PPI network analysis and hub gene identification
The interaction network of proteins coded by the DEGs between the nonneurological controls and sALS patients comprising 161 nodes and 617 edges was evaluated by STRING and visualized by Cytoscape (Fig. 4A). We further used the MCODE plugin to identify the gene cluster modules according to the filter criteria, and the results showed that four modules were identified (Fig. 4B-E). Cluster 1 had 14 nodes and 60 edges with a score of 4.615. Cluster 2 had the second highest cluster score (score: 3.571, 15 nodes and 50 edges), followed by Cluster 3 (score: 3.5, 9 nodes and 28 edges) and Cluster 4 (score: 3.5, 5 nodes and 14 edges). To identify the hub genes in the interaction network, the CytoHubba plugin was used, and the results showed that 14 hub genes were identified by five algorithms of cytoHubba, including Clustering Coefficient, Degree, MNC, MCC and DMNC (Table 3). These DEGs are the core genes in the PPI network, implying that they play an important role in the pathogenesis of sALS. Since the GO, KEGG and GSEA enrichment analyses revealed an important function of the DEGs in the biological process of nervous system activity, we further intersected 14 hub genes and 24 nervous system-specific expressed genes and identified five nervous system-specific expressed hub genes, including APP, AKT1, LRRK2, PSEN1, and SLCO1A2 (Table 3, in bold).
Table 3
Hub genes determined using cytoscape plug-in cytoHubba by five algorithms.
Gene
|
Description
|
Log2FC
|
adjust P-value
|
Regulation
|
AKT1
|
serine/threonine kinase 1
|
4.183
|
0.007
|
Up
|
PSEN1
|
presenilin 1
|
5.0827
|
0.030
|
Up
|
APP
|
amyloid beta precursor protein
|
5.3519
|
0.021
|
Up
|
LRRK2
|
leucine rich repeat kinase 2
|
2.8715
|
0.029
|
Up
|
SLCO1A2
|
solute carrier organic anion transporter family member 1A2
|
3.9143
|
0.030
|
Up
|
MYOG
|
myogenin
|
2.2627
|
0.041
|
Up
|
MDM2
|
MDM2 proto-oncogene
|
0.4255
|
0.018
|
Up
|
LIG4
|
DNA ligase 4
|
2.7041
|
0.048
|
Up
|
XRCC4
|
X-ray repair cross complementing 4
|
2.8248
|
0.034
|
Up
|
CCNB1
|
cyclin B1
|
3.1106
|
0.031
|
Up
|
ATF3
|
activating transcription factor 3
|
2.7574
|
0.018
|
Up
|
JAG1
|
jagged canonical Notch ligand 1
|
2.2920
|
0.028
|
Up
|
RAD9A
|
RAD9 checkpoint clamp component A
|
-4.7490
|
0.019
|
Down
|
AHR
|
aryl hydrocarbon receptor
|
-1.6973
|
0.035
|
Down
|
FC: Fold change. |
Construction of mRNA–miRNA coexpression networks
MiRNAs have been reported to regulate gene levels by binding the 5’ or 3’ UTR of the target mRNA and play an important role in neurological disorder development. Therefore, we predicted the target miRNAs of 5 nervous system-specific expressed hub genes using five online miRNA databases and identified 87 target miRNAs and 94 mRNA–miRNA pairs. Finally, we constructed a coexpression network of mRNAs and miRNAs by Cytoscape, which comprised 92 nodes and 94 edges (Fig. 5).
Verification of five nervous system-specific expressed hub genes in three GEO datasets
Three GEO datasets, namely, GSE4595, GSE26927 and GSE39644, including 29 nonneurological control samples and 31 sALS patient samples, were used to verify the expression levels of 5 nervous system-specific expressed hub genes. The R software ggplot2 package was used to generate a split violin plot, and the differences were analyzed by Student’s t-test. As expected, we found that the mRNA expression levels of the 5 nervous system-specific expressed hub genes in the sALS group were significantly higher than those in the nonneurological control groups (Fig. 6A-C, p < 0.01), implying that the 5 hub genes play an important role in sALS development. In addition, we performed GO, KEGG and GSEA enrichment analyses based on five nervous system-specific expressed hub genes. As expected, the enriched gene sets of the DEGs were related to the neurotrophin signaling pathway, glial cell activation, neuron death, axon guidance, regulation of the actin cytoskeleton, etc., which are correlated with sALS progression (Supplementary Fig. 1A-G).
ROC curve and prognosis prediction ability of 5 nervous system-specific expressed hub genes in sALS samples
Since sALS patients are difficult to diagnose in the early stage because of clinical heterogeneity, it is essential to find diagnostic biomarkers to distinguish individuals with sALS from nonneurological people. We used ROC curves to evaluate the diagnostic ability of the 5 hub genes in predicting sALS. Specifically, the expression profiles of 5 hub genes in the GSE112681 dataset of nonneurological control samples and sALS samples were analyzed by the R software pROC package. The ROC curves of these hub genes were drawn, and the area under the curve (AUC) was calculated. The results showed that all hub genes have strong diagnostic value in sALS samples. Our results showed that these 5 hub genes could be biomarkers for sALS diagnosis. LRRK2 had the highest diagnostic ability (AUC: 0.927), while the AUC of the other genes was 0.832 for AKT1, 0.811 for APP, 0.832 for PSEN1, and 0.810 for SLCO1A2 (Fig. 7A-E). Additionally, we evaluated the ability of these genes to predict the prognosis of sALS patients by a survival analysis. A survival curve of the sALS patients based on the expression of the 5 hub genes was drawn by the Kaplan–Meier method. The results showed that sALS patients with high expression levels of AKT1, APP, LRRK2, PSEN1, or SLCO1A2 had a poorer prognosis than those with a low expression of these hub genes (Fig. 7F-J). Our results imply that AKT1, APP, LRRK2, PSEN1 and SLCO1A2 may be biomarkers for the diagnosis and prognosis prediction of sALS patients based on our present samples.
Construction of mRNA–miRNA-ncRNA coexpression networks
MiRNAs can cause posttranscriptional gene silencing by binding mRNAs, while other ncRNAs, such as lncRNAs and circRNAs, can regulate gene expression by competitively binding miRNAs, which is called the ceRNA mechanism. CeRNAs can disable microRNAs by binding microRNA response elements (MRS), which reveal mRNA–miRNA-ncRNA interaction coexpression networks. The existence of microRNA regulatory pathways is of great biological significance. We predicted the circRNAs and lncRNAs that interacted with selected miRNAs by the online database StarBase 3.0. The screening criteria were as follows: (1) mammalian; (2) human h19 genome; (3) and strict stringency (≥ 5) of CLIP-Data with degradome data. NcRNAs present in most selected miRNA predictions were chosen as the predicted circRNAs and lncRNAs. In the predicted results of the StarBase database, a transcript has multiple circRNA shearing sites; thus, the circRNA with the largest number of samples and the highest score in the circBase database was selected as the target circRNA. Finally, we obtained 7 lncRNAs and 5 circRNAs from APP-targeted miRNAs, 1 lncRNA and 16 circRNAs from LRRK2-targeted miRNAs, and 1 lncRNA and 14 circRNAs from PSEN1-targeted miRNAs. Three ceRNA networks were constructed based on the prediction results and visualized by Cytoscape (Fig. 8A-C). Subsequently, we conducted a literature search and selected seven miRNAs and three ncRNAs that were reported to affect neurodegeneration disorder development for further study. We propose that NEAT1-miR-373-3p/miR-302c-3p/miR-372-3p-APP, circ_0000002-miR-302d-3p/miR-373-3p-APP and XIST-miR-9-5p/miR-30e-5p/miR-671-5p might be potential ceRNA regulatory pathways that regulate sALS progression (Fig. 8D-F).
Identification of the genetic association between 3 SNPs and sALS
To date, several studies have confirmed that AD (Alzheimer’s disease), PD (Parkinson’s disease), FTD (frontotemporal dementia), PSP (progressive supranuclear palsy) and ALS have similar genetic bases. SNPs of APP (rs463946, rs466433, and rs364048) have been found to be closely related to the incidence of AD in the Chinese Han population. Therefore, we hypothesized that these three SNPs of APP are involved in sALS development, and the locations of the three SNPs of APP are shown in Fig. 9A-C. All 3 SNPs were intronic polymorphisms, and the minor alleles of rs463946 (OR = 3.000, 95% CI = 1.164–7.732, p = 0.014), rs466433 (OR = 3.000, 95% CI = 1.164–7.732, p = 0.014), and rs364048 (OR = 3.000, 95% CI = 1.164–7.732, p = 0.014) significantly increased the risk of sALS development, suggesting that these three polymorphisms might represent genetic susceptibility factors. Subjects harboring the minor G allele (GG + CG) of rs463946 (p = 0.026), minor G allele (GG + AG) of rs466433 (p = 0.026) and minor C allele (CC + CT) of rs364048 (p = 0.026) showed an increased risk of sALS development compared with those with the other genotypes (Table 4).
Table 4
Three SNPs shown the significant at P < 0.05 in the study
SNP
|
Group
|
Genotypes
|
Genotypes P-value
|
Allele
|
Allele
P-value
|
OR(95%)
|
|
|
GG
|
CG
|
CC
|
|
G
|
C
|
|
|
rs463946
|
Cases(n = 30)
|
2
|
11
|
17
|
0.026
|
15(25.0%)
|
45(75.0%)
|
0.014
|
G:3.000(1.164–7.732)
|
|
Controls(n = 30)
|
1
|
3
|
26
|
|
5(8.3%)
|
55(91.7%)
|
|
C:0.818(0.694–0.965)
|
|
|
GG
|
AG
|
AA
|
|
G
|
A
|
|
|
rs466433
|
Cases(n = 30)
|
2
|
11
|
17
|
0.026
|
15(25.0%)
|
45(75.0%)
|
0.014
|
G:3.000(1.164–7.732)
|
|
Controls(n = 30)
|
1
|
3
|
26
|
|
5(8.3%)
|
55(91.7%)
|
|
A:0.818(0.694–0.965)
|
|
|
GG
|
CC
|
CT
|
|
C
|
T
|
|
|
rs364048
|
Cases(n = 30)
|
2
|
11
|
17
|
0.026
|
15(25.0%)
|
45(75.0%)
|
0.014
|
C:3.000(1.164–7.732)
|
|
Controls(n = 30)
|
1
|
3
|
26
|
|
5(8.3%)
|
55(91.7%)
|
|
T:0.818(0.694–0.965)
|
SNP: Single nucleotide polymorphism; OR: Odds ratio; |