We performed a systematic differential expression analysis at single-cell resolution between an iPSC line carrying the PD-associated ILE368ASN mutation in the PINK1 gene and an age- and sex-matched control cell line (control 1-2 in ref 28) during their differentiation into mDA neurons (Fig. 1). After preprocessing and quality filtering, we used 4495 cells and 18,097 genes in our downstream analysis (Methods). Expression of key genes was also confirmed by qPCR and corresponding marker staining. For data integration, we performed a network analysis to identify the underlying key mechanisms of PD progression.
Single-cell RNAseq analysis reveals gene expression panel for direct classification of iPSC-status
The standard accepted procedure for determining iPSC status involves staining for iPSC markers and the use of gene expression platforms, such as Scorecard. However, we show that a standard panel of genes readily detectable by single cell analysis can be used to confirm iPSC status directly in the cells used for the single cell experiment, rather than by staining or expression analysis of an independent sample, which in some cases may not reflect the iPSC status of the experimental sample.
Fibroblasts were isolated from a 64 year-old male with PD symptom onset at 33 years of age who was homozygous for the ILE368ASN (P.I368N/P.I368N) mutation in the PINK1 gene (Coriell Institute, Cat. No. ND40066). The fibroblasts were confirmed to have a normal karyotype (Supplementary Fig. 1) and an episomal reprogramming method (Epi5™ Episomal iPSC Reprogramming Kit, Invitrogen USA, cat. # A15960) was used to avoid unwanted genetic modifications of the target cell. The karyotype of iPSCs was confirmed (Supplementary Fig. 2) and their iPSC status was ascertained by standard methods, including staining for the POU5F1 marker (also known as Oct4) and by using the TRA-1-80 antibody (Fig. 2a), as well as by a TaqMan iPSC Scorecard Assay, which also confirmed their trilineage potential (Fig. 2b).
Subsequent single-cell RNAseq characterization confirmed that these iPSCs expressed genes routinely used to confirm iPSC status, including SOX2, MYC, POU5F1, and NANOG (Fig. 3a, b). However, iPSCs also expressed several additional genes associated with stemness (Fig. 3a), namely TDGF-1, the expression of which was shown by Hough at al. 29 to be present only in stem cells with the highest expression of stemness markers. Additional genes expressed by the iPSCs were L1TD1, USP44, POLR3G, and TERF1 (essential for the maintenance of pluripotency in human stem cells 30–33), as well as IFITM1, DPPA4, and PRDX1 (associated with stemness 34–36). Based on these observations, the following genes are readily detectable by single-cell RNAseq and this panel should provide a reliable indication of stemness in single cell experiments: SOX2, MYC (cMyc), POU5F1 (Oct4), NANOG, LIN28A, TDGF-1, L1TD1, USP44, POLR3G, and TERF1.
In vitro differentiation of iPSC-derived mDAs recapitulates the in vivo process
To confirm that our differentiation protocol (Supplementary Table 1) is recapitulating the in vivo mDA differentiation path, the expression of key genes (OTX2, EN1, LMX1B, LMX1A, and FOXA2) known to drive mDA differentiation in vivo 19,23,37 was confirmed using our single-cell RNAseq data or by qPCR (Fig.3b, c, Table 1). The absence of the early, non-mDA neuron marker PAX6 21,22 was further validated by staining (Supplementary Fig. 3 and Supplementary Table 2).
In vivo, the development of mDA phenotype depends on the early high expression of Sonic Hedgehog (SHH), followed by the induction of Wnt signaling and the expression mDA-specific downstream pathways 18,19,23. Consistent with these in vivo differentiation steps, among the highest-expressed genes on day 6 (D6) of the differentiation protocol were PTCH1, a receptor for SHH, and FZD7, a receptor for Wnt proteins (Fig. 3b). Following the differentiation process from iPSCs to D21 (Fig. 4a), we could see the onset of expression of the mature mDA markers TH and KCNH6 (also known as GIRK2) in the D21 cluster (Fig. 4b). By day 21, many factors that are specific to the mDA differentiation path, such as TCF12, ALCAM, PITX2, ASCL1, and DDC 20,38–41, were among the most highly expressed genes (Fig. 4c). The onset of TH expression indicated that the cells achieved the state of early postmitotic mDA neurons18 (Fig. 4b), which was confirmed by staining (Fig. 4d) and qPCR (Fig. 4e, Supplementary Table 3). In addition, markers specific to mature mDA neurons including TH, DDC, ALDH1A1, and KCNJ6 (also known as GIRK2) also started to become expressed (Fig. 4 b-e, Table 1).
Overall, these observations confirm that our in vitro differentiation protocol does indeed recapitulate the in vivo differentiation of mDA neurons and produces mDA neurons (Pax6-, Aldh1a1+, Pitx3+, GIRK2+, Nurr1+, Lmx1a+), rather than other types of DA neurons (Pax6+, AldhA1A3+) (Supplementary Table 2).
The PINK1 ILE368ASN mutation is associated with persistently dysregulated expression of nearly 300 loci
To analyze mutation-induced changes in gene expression, we performed single-cell RNA expression analysis, at four important timepoints during the mDA neuron differentiation process, using DropSeq 27 (Fig. 1). After preprocessing and quality-filtering (Methods and Supplementary Fig. 4), a total of 4495 cells (2518 control and 1977 PINK1 cells) and 18,097 genes were included in our analysis). Control and PINK1 cells co-clustered together based on their differentiation stage, from iPSCs, to day 6 (D6), D15 and D21 (Fig.4a), indicating that RNA expression was specific to differentiation stages, rather than to a cell line. The PINK1 cells at D10 showed low viability, hence D10 timepoint was not included in the pairwise analysis (Fig. 5).
The analysis of pairwise differential expression at each time point with adjusted p-values (padj) <0.01 fold changes (FC) > 0.1 resulted in 14 genes that were upregulated and 13 genes that were significantly downregulated in the PINK1 cell line compared to control (Table 2). Because iPSCs are very different from differentiating neuronal precursors, we next tested whether inclusion of iPSCs had disproportionately affected the results through the exclusion of neuron-specific genes. Repeating the analysis using D6, D15 and D21 identified 28 genes that were upregulated and 27 genes that were downregulated at all four timepoints, including all genes previously identified (Table 2, Fig. 5, Group A). As expected, excluding iPSCs resulted in the identification of a broader range of genes because genes that are differentially expressed only in the neuronal lineage were previously excluded due to the absence of their expression in iPSCs and the requirement that DEGs have to be dysregulated at all timepoints. However, both sets are equally valuable, as genes dysregulated even in iPSCs are more likely to be genes that participate in systemic PD pathology, regardless of cell type, and may be relevant to a broader spectrum of PD pathology than the death of mDA neurons. Interestingly, most of these genes are already linked to PD, other PD mutations, or neurodegeneration (Table 2).
For an alternative definition of differentially expressed genes (DEGs), we used the maximum adjusted p-value in a pairwise combinations as adjusted p-value, and the average fold change that occurred in the pairwise comparison as fold change threshold. With this approach we retained only genes dysregulated in the same direction at all timepoints. This analysis led to 151 DEGs, including the previously identified genes of Group A, of which 65 were upregulated and 86 downregulated compared with controls (padj < 0.01 and FC > 0.1) (Group B, Supplementary Table 4). Taking the mean of FC of the different time points enhanced the identification of DEGs because it reduced the effect of the variability between pairs due to their different differentiation states. Repeating the same analysis for timepoints iPSCs, D6, D15 and D21, but taking into account only the absolute degree of change in iPSCs, yields 172 genes (Group C, Supplementary Table 5). Repeating the analysis using only timepoints D6, D15 and D21 identified a total of 285 DEGs (Group D) (Supplementary Table 6 and Supplementary Fig. 5). Together, when all analyses were pooled, we obtained 291 DEGs (6 genes in Group C depended on the inclusion of iPSCs and did not appear in Group D, see Supplementary Table 6).
Data integration reveals a common PD network.
To integrate the expression analysis and identify underlying disease mechanisms, we generated a network of interactions between the DEGs via Gephi 42, using protein-protein interaction information obtained from the STRING and GeneMANIA databases 43,44. The network we obtained includes 246 of the 291 DEGs, since pseudogenes and non-coding RNAs could not be integrated into a protein-protein intearction network (Supplementary Table 7), and 2122 interactions (Fig. 6, Supplementary Fig. 5c). The curated network only considers DEGs and any genes automatically added by the databases were excluded to ensure a reliable core network based solely on data. Based on known protein-protein interactions, the DEGs integrate into a close-knit core network in which several DEGs form central nodes (Fig. 6a, Supplementary Fig. 5). To evaluate the significance of the DEG-based PPI (protein-protein interaction) network produced by STIRNGdb (v10), we compared the DEG-based network with corresponding random networks generated from sets of 292 randomly chosen genes excluding DEGs. Based on 50 random networks, we show that the DEG-based network includes significantly more protein-coding genes and interactions than by chance (Fig. 6b) and that the network structure in term of degree distribution is significantly distinct as evaluated by Wilcoxon test (p = 2.22e-16) and indicates the mechanistic character of the network (Fig. 6c).
The network of genes dysregulated by the presence of the PINK1 mutation includes genes related to other PD-associated pathways, which is intriguing, since it was generally assumed that each PD-associated mutation leads to PD pathology via an independent, characteristic path. For example, two DEGs, GOPC and GPC3 45,46, interact with the PD-associated gene DJ-1 (PARK7) 2,47. The DEG network also includes genes of the LRRK2 (PARK8) network 2,47, namely ENAH, HSPA8, MYL6, MALAT1, and SNHG5 (Table 2). SNHG5 and MALAT1 interact with LRRK2 via miR-205-5p 44,45. DLK1 and MALAT1 mediate a-synuclein accumulation 48,49. In fact, the DLK1-NURR1 interaction involved in this process may be mDA neuron-specific 50, highlightihe necessity to use mDA neurons for the study of PD-related pathways. Additionally, MALAT1 was shown to increase a-synuclein protein expression 51. In short, this suggests that interactions leading to PD pathology are more complex than one mutation leading via one path to PD, as generally thought, but it also indicates that there are likely many druggable targets that may be useful in treating PD, and that these may be universally effective for PD caused by several different mutations, and perhaps even for idiopathic PD. For example, terazosin, which is already in clinical use, was found to be associated with slower disease progression, likely by enhancing the activity of phosphoglycerate kinase 1 (PGK1) 52, one of the top DEGs identified in our study.
For the evaluation of the relative importance of each node within the network, we applied betweenness centrality 42 (Supplementary Fig. 5a-c). The major nodes of these networks are formed by genes that were already shown to play an important role in PD pathology (Table 3). Next, we built a correlation network (p-value < 0.05, r > 0.1) of the 246 DEGs based on the normalized counts (Fig. 6e). By extracting the common interactions of these two networks, we obtained a network with 297 interactions (Fig. 6d, e and Supplementary Table 7), which highlights protein-protein connections that correlate with differential expression of the genes. This analysis further supports the role of the connections between these genes in mediating the resulting differential expression in the presence of the PINK1 mutation. STRING was subsequently used to highlight functional pathways represented within the DEG network (Suppplementary Fig. 6 and Supplementary Table 8). Several pathways known to play a role in PD pathology are significantly represented within the network, notably ubiquitination12,53, mitochondrial pathways6,54, cellular response to stress55, lysosomal proteins56, protein metabolism (localization, modification, transport, folding and stability), RNA processing 57, aromatic compound metabolism 58–61, vesicle mediated transport and exocytosis 62, and cellular catabolic processes55,56. Interestingly, the DEGs include genes of the KEGG-PD43 pathway and the CHCHD2 gene, which was recently identified as a PD-associated gene and named PARK22 47,63,64.
To investigate further how the identified network relates to other known PD mechanisms, PD-associated genes, also known as PARK genes (Supplementary Table 9, Fig. 7), were added to the DEG network. Next, PARK-PARK interactions were removed and only PARK-DEG interactions were retained to test how PARK genes integrate into the network. All 19 protein-coding PARK genes 2,47 interact directly with at least one, but usually several DEGs (Supplementary Fig. 7). The degree of interaction of PARK genes with the DEGs of the network is illustrated by coloring (in pink) DEGs that directly interact with a PARK gene. The darker the color, the greater the number of PARK genes the DEG interacts with. The central nodes of the network generally interact with several PARK genes, suggesting that they play a central role in linking the PARK genes to the network, but also that PARK genes may mediate PD pathology through a few central pathways of this network, and that the effects of different PARK genes converge on the same set of pathways.
Further analysis revealed that a large number of the DEGs interact with genes associated with mitochondria or ubiquitination. For this analysis, we used BioGRID 44,65 to identify interactions with mitochondrial or ubiquitination proteins for the top 172 DEGs of groups A-C (Fig. 6f). These interactions were used to create a network illustrating that many of the DEGs in our study directly interact with genes involved in mitochondrial function and in ubiquitination. Thereby, only direct DEG – mitochondrial gene or DEG – ubiquitination gene interactions were included and PARK genes were added for reference (Supplementary Fig. 8).
Based on manual literature search, we determined that at least 68% of the DEGs (174 of 255 genes, not including pseudogenes and RNA genes) are already directly associated with PD, either experimentally, or as significant links in GWAS-PD, or by PD expression studies (Fig.7, Supplementary Table 10). This is particularly true for the major nodes of the network (Supplementary Fig. 9).