The minimal common region of 18q loss spans 14 genes expressed in undifferentiated hESCs and includes SALL3
We initially identified 18q losses in the hESC lines VUB04 and VUB26, in the form of a derivative chromosome 18. VUB04 presented a deletion at 18q21.2qter and a duplication at 5q14.2qter, and VUB26 showed the minimal 18q loss region (18q23qter) and a duplication at 7q33qter (Fig 1A. Sup. Fig. 1 and Sup. Table 1)48. These two hESC lines were not used in the present work because they further genetically drifted and acquired gain of 1q and 20q11.21, but their analysis helped to narrow down the common 18q deletion region. In this study, we used two other hESC lines bearing derivative chromosomes 18 involving a loss of 18q losses (hESCdel18q), VUB14del18q and VUB13del18q, as well as three chromosomally balanced lines (hESCWT), VUB14WT, and VUB04WT and VUB03WT, which served as controls for VUB13del18q since VUB13WT was lost (details on the karyotypes of the lines and their characterization are shown in Sup. Fig. 1 and Sup. Table 1). We used shallow genome sequencing to confirm the karyotypes before starting the experiments and all lines were routinely inspected with qPCR assays targeting recurrent chromosomal abnormalities (1q, 12p, 20q11.21 and 17q) to confirm their genomic stability for the duration of the different experiments.
The 18q losses exhibited a common loss region from bp 75,773,285 to 80,373,285, spanning 37 loci. Bulk RNA sequencing of undifferentiated hESCs indicated that 14 genes within this region are expressed in undifferentiated hESCs, with counts per million greater than one in at least two samples (Fig. 1A). Of these coding genes, ADNP2, SALL3 and TXNL4A had the highest expression and showed decreased transcript levels in mutant cells. ADNP2 and TXNL4A have no known function in hPSC. ADNP2 is predicted to be a transcription factor, and its silencing increases oxidative stress-mediated cell death49. TXNL4A is a component of the U5 small ribonucleoprotein particle, which is involved in pre-mRNA splicing and is associated with Burn-McKeown syndrome50. SALL3 was more promising as candidate driver gene as it has previously been reported to regulate the differentiation propensity of hiPSC lines51. Kuroda et al. showed that hiPSC lines expressing high levels of SALL3 differentiated preferentially into ectoderm, while hiPSC lines expressing lower levels of SALL3 tended to differentiate into mesoderm and endoderm51. It has also been shown that SALL3 interacts with the Mediator complex in neural stem cells52 and is related to the development of the nervous system53. Considering these previous findings, we hypothesized that decreased expression of SALL3 as a result of a loss of one copy of the gene could alter the differentiation capacity of hESCsdel18q.
hESCs with 18q loss show impaired neuroectoderm differentiation
As a first step, we investigated the effect of 18q deletion on hESC ectoderm lineage commitment. hESCWT and hESCdel18q were subjected to neuroectoderm differentiation for 8 days using LDN193189 (LDN), SB431542 (SB) and retinoic acid (RA)54 (Fig. 1B). We measured the mRNA levels of different neuroectoderm markers to evaluate neuroectoderm differentiation efficiency and the expression of the undifferentiated state markers NANOG and POUF51 (Fig. 2A and Sup Fig. 2A). VUB13del18q and VUB14del18q had significantly lower mRNA levels of PAX6, NES and SOX1, which were decreased by 3-fold, 15-fold, and 25-fold, respectively, compared to the levels in hESCWT (p≤0.0001 unpaired t-test), indicating a decreased neuroectodermal differentiation efficiency in hESCdel18q. POU5F1 and NANOG mRNA expression levels were almost undetectable for all differentiated cells (Sup Fig. 2A). We also evaluated the differentiation of hESCWT and hESCdel18q cells by immunostaining (Fig. 2B-C). We observed a lower percentage of PAX6 positive cells in differentiated hESCdel18q than in differentiated hESCWT cells (45% vs 70%, p = 0.0114, unpaired t-test, Fig. 2C), which was consistent with the decrease in the levels of PAX6 mRNA. Taken together, these results show that hESCsdel18q differentiation into neuroectoderm is impaired, and rather than to remain undifferentiated state, they miss-specify.
hESCdel18q readily differentiates into mesendoderm derivatives but shows abnormal cardiomyocyte progenitor differentiation
We further investigated the impact of 18q deletion on the mesendoderm differentiation capacity of hESCs by differentiating hESCWT and hESCdel18q into, in this case cardiac progenitors and hepatoblasts. Thus, we induced the differentiation of hESCWT and hESCdel18q into mesoderm using a 5-day cardiac progenitor induction protocol as described previously55 (Fig. 1B). We evaluated the mRNA levels of the cardiac progenitor markers GATA4, ISL1, NKX2-5 and PDGFRA (Fig. 3A). hESCdel18q showed 2-fold lower levels of GATA4 mRNA (p = 0.0012 unpaired t- test) (Fig. 3A). hESCdel18q lines expressed ISL1 at 15-fold higher levels, on average (p=0.0016, unpaired t-test), than hESCWT lines. We found no significant difference in the expression levels of NKX2-5 or PDGFRA between the control and mutant cardiac progenitor groups (p=0.11 and p=0.42, unpaired t-test, Fig. 3A). We also evaluated the proportion of differentiated cardiac progenitor and undifferentiated cells by immunostaining for GATA4 and POU5F1. The percentage of GATA4-positive cells was not statistically significantly different in hESCdel18q than in hESCWT (p=0.2942, unpaired t-test, Fig. 3B-C), mainly due to the low differentiation efficiency of VUB03WT, while the cells were all POU5F1 negative. Taken together, and bearing in mind the temporal expression of these markers during cardiac differentiation56,57, the results suggest that hESCdel18q may experience differentiation delays or arrest, reaching an ISL1high GATA4low stage at day 5 but remaining less mature than their hESCWT counterparts.
Next, we differentiated hESCWT and hESCdel18q into hepatoblasts by applying the modified differentiation protocol for 8 days as described previously58 (Fig. 1B). We measured the expression levels of the hepatoblast markers HNF4A, AFP, ALB and FOXA2 (Fig. 4A). We found no significant difference in the mRNA expression levels of HNF4A, ALB, and FOXA2 between hESCWT and hESCdel18q (p>0.05, unpaired t-test), while AFP had a lower expression in hESCdel18q (Fig. 4A, p=0.0095, unpaired t-test). We further evaluated hepatoblast differentiation by immunostaining for HNF4A and found that the percentages of HNF4A-positive cells in hESCdel18q cells were similar to those in the wild type (Fig. 4B-C, p=0.2514, unpaired t-test).
For both mesodermal and endodermal lineage commitment, all hESCWT and hESCdel18q lines displayed low mRNA and protein levels of undifferentiated state markers (Sup Fig. 2B-C), indicating a loss of pluripotency for all cell lines during differentiation. Overall, our results indicate that hESCWT and hESCdel18q differentiate into definitive mesoderm and endoderm and that there may be a delay or impairment in the progression of hESCdel18q toward the cardiac progenitor stage. The differences in gene expression observed during hepatoblast differentiation are likely due to between-line variation in differentiation propensity rather than to the 18q deletion itself.
Downregulation of SALL3 impairs neuroectoderm differentiation but does not affect differentiation into mesoderm and endoderm
We next examined SALL3 mRNA expression levels in undifferentiated cells and found that SALL3 expression was significantly lower in hESCdel18q than in hESCWT (Sup Fig. 3A), supporting the notion that SALL3 could be a key gene in the altered differentiation capacity of hESCsdel18q. We first generated SALL3 knockdown (KD) lines from three hESCWT lines (VUB02WT, VUB03WT and VUB04WT) by transducing a lentiviral vector containing shRNA targeting the SALL3 transcript (hESCWT_SALL3KD) or a nontargeting shRNA as a control (hESCWT-NT). We confirmed the knockdown efficiency of the generated hESCWTSALL3KD lines by measuring the SALL3 mRNA expression levels and found that they were reduced by 30%, 40% and 20% in VUB02WT_SALL3KD, VUB03WT_SALL3KD and VUB04WT_SALL3KD respectively, compared to the controls (Sup Fig. 3B). Next, we generated hESCdel18q with stable overexpression of SALL3 (hESCdel18q_SALL3OE) by transducing VUB13del18q and VUB14del18q with the SALL3 lentiviral vector, and we verified the overexpression by measuring SALL3 mRNA levels, which were significantly increased in VUB13del18q_SALL3OE (3-fold) and VUB14del18q_SALL3OE (5-fold) compared to controls (Sup Fig. 3C).
To investigate the role of SALL3 in regulating hESC differentiation propensity, we first induced neuroectodermal differentiation in hESCWT_SALL3KD, hESCdel18q_SALL3OE, and the corresponding control cells. All three hESCWT_SALL3KD lines had lower levels of all NE markers than nontarget controls (Fig. 5A and Sup Fig. 4A-B). PAX6 protein levels were also reduced in hESCWT_SALL3KD (Fig. 5B-C), with only 20% of the cells expressing PAX6, compared to 80% of PAX6 positive cells in the control group (Fig. 5B-C). Our results show that SALL3 suppression in hESCWT recapitulates the impaired NE differentiation seen in hESCdel18q lines. In contrast, hESCdel18q_SALL3OE cells efficiently differentiated into neuroectoderm cells, accompanied by a significant increase in the mRNA levels of PAX6, SOX1, and NES (Fig. 5A and Sup Fig. 4A-B); moreover, hESCdel18qSALL3OE cultures included more PAX6 positive cells than hESCdel18q cultures (60% vs 20%, respectively) (Fig. 5B-C). These results indicate that exongenous SALL3 expression can rescue the impairment of ectoderm differentiation caused by 18q loss.
Next, we induced the differentiation of the different hESC lines into cardiac progenitors. The three hESCWT_SALL3KD lines differentiated inconsistently toward mesoderm fates. VUB04WT_SALL3KD showed lower levels of GATA4 (Fig. 6A, p=0.0003, unpaired t-test), whereas compared to the controls, VUB03WT_SALL3KD and VUB02WT_SALL3KD showed no differences in GATA4 expression (Fig. 6A). Additionally, the percentage of GATA4+ cells detected by immunostaining in hESCWT_SALL3KD followed the same pattern, consistent with the mRNA levels of each line (Fig. 6B-C). The hESCdel18q_SALL3OE cells exhibited variable marker profiles, with increases in GATA4 expression at both the mRNA (Fig. 6A, p=0.004, unpaired t-test) and protein levels for VUB13del18q_SALL3OE but no difference in GATA4 mRNA levels (Fig. 6A, p=0.3622, unpaired t-test) and a slight decrease in GATA4 protein levels in VUB14del18q_SALL3OE (Fig. 6B-C). Similarly, the mRNA expression levels of other markers, NKX2-5, ISL1, and PDGFRA, showed no consistent trend in the hESCWT_SALL3KD groups and exhibited no consistent differences in hESCsdel18q and hESCdel18qSALL3OE (Sup Fig. 5A-C).
When hESCs were differentiated into hepatoblast, SALL3 downregulation in hESCWT resulted in increased HNF4A, ALB, and FOXA2 mRNA expression (Fig. 7A and Sup Fig. 4B-C). The percentage of HNF4A positive cells was higher in VUB03 WTSALL3KD cells but not in VUB04WT_SALL3KD and VUB02WT_SALL3KD cells compared to controls (Fig. 7B-C). The mRNA expression of another marker, AFP, also did not show consistent changes (Sup Fig. 4A). Upon overexpression of SALL3, the differentiation profiles into hepatoblast did not show the expected mirroring effect. The mRNA expression of all the hepatoblast markers HNF4A, ALB, AFP and FOXA2 increased in VUB14del18q_SALL3OE cells, consistent with the changes observed in the HNF4A protein level, but this same effect was not observed in VUB13del18q_SALL3OE cells (Fig. 7 and Sup Fig. 4).
Overall, these results suggest that the effect of SALL3 on mesoderm and endoderm differentiation may be line-specific and is not the reason for the delayed progression seen during cardiac differentiation in hESCs18q.
Downregulation of SALL3 and loss of 18q result in the deregulation of genes in pathways associated with pluripotency and differentiation
To gain deeper insight into the effect of SALL3 downregulation on the global transcriptomic profile of cells with 18q loss, we carried out bulk RNA sequencing of hESCWT-NT (N=6), hESCWT_SALL3KD (N=6), hESCWT (N=6), VUB13del18q (N=4) and VUB13del18q_SALL3OE (N=5) cells. To estimate the similarity among the samples, we generated a distance clustering heatmap with global row scaling (Fig. 8A) and performed principal component analysis (Fig. 8B). The heatmap shows that while VUB13del18q and hESCWT_SALL3KD cluster together, they cluster apart from the WT cell lines, as well as from VUB13del18q_SALL3OE. The samples in this last group clustered more closely to the WT cell lines than to their unmodified VUB13del18q (Fig. 8A). This pattern was also reflected in the first dimension of the PCA (Fig. 8B), where hESCWT, hESCWT-NT and VUB13del18q_SALL3OE clustered more closely with each other than with VUB13del18q and hESCWT_SALL3KD (Sup Fig. 6A). These results suggest that the downregulation of SALL3 in WT cells is sufficient to alter the transcriptome such that its profile is closer to that of an hESC line with a loss of 18q, and SALL3 overexpression in hESCdel18q can restore the transcriptome to a near-WT state. Taken together, these findings support the our hypothesis that the differences between hESCWT and hESCdel18q are mostly driven by the downregulation of SALL3 due to the loss of one copy of this gene. For this reason, we pooled the hESCsWT-NT with the untreated hESCsWT for further analysis.
Next, we carried out differential gene expression analysis to gain further insight into which genes and pathways form the basis of the differences between hESCWT and hESCdel18q and which of these are driven by SALL3. Figure 8B-D shows volcano plots of the differentially expressed genes in the different groups. We considered this differential expression to be significant at a |log2fold-change| > 1.0 and false discovery rate (FDR)< 0.05. VUB13del18q shows 891 and 448 genes that are significantly upregulated and downregulated in hESCdel18q, respectively, compared to the WT cells (Fig 8C). Downregulation of SALL3 in WT cells led to the differential expression of 745 genes, of which 399 were upregulated and 346 were downregulated (Fig 8D). Overexpression of SALL3 in VUB13del18q resulted in the upregulation of 121 genes and the downregulation of 605 genes (Fig 8E).
To elucidate which of the transcriptional differences between hESCWT and hESCdel18q are mediated by SALL3, we investigated the overlaps in up- and downregulated genes across conditions. First, we found that 231 and 93 genes were commonly up- and downregulated, respectively, between VUB13del18q and hESCWT_SALL3KD, representing 26% of the differentially expressed genes in hESCs with an 18q deletion (Fig 8F and G). Next, we compared these subsets of genes to the genes with altered gene expression in hESCdel18q upon overexpression of SALL3. We compared the genes that were upregulated by SALL3 knockdown and by loss of 18q to those downregulated by overexpression of SALL3 in hESC18q, and vice versa (Fig 8F and G). Out of the 231 upregulated genes shared by the VUB13del18q and hESCWT_SALL3KD groups, 173 genes showed increased expression in VUB13del18q_SALL3OE (Fig 8F). Additionally, 18 of the 93 downregulated genes shared by VUB13del18q and hESCWT_SALL3KD displayed increased expression in VUB13del18qSALL3OE (Fig 8G). This further refined the gene set to a core of 191 genes that are the most strongly regulated by SALL3 in hESCs, both by the loss of a copy of the gene itself and by the modulation of its expression (Supplementary Table 2). The other differences in gene expression are likely associated with the loss of other genes in 18q or with the overexpression of genes in the duplicated regions of chromosomes 5 and 7, which are part of the derivative chromosome 18 in hESCdel18q.
Finally, to identify potential molecular targets and elucidate the underlying functional mechanisms that contribute to the observed impairment of differentiation capacity in hESCdel18q, we analyzed the differential gene expression of the three groups using Gene Set Enrichment Analysis (GSEA) and the MSigDB database. Specifically, we focused on the Kyoto Encyclopedia of Genes and Genomes (KEGG) and WikiPathways databases within the C2 library and the pathways of the H library. We filtered the significant pathways based on a normalized enrichment score |NES| > 1, a p-value < 0.05, and the proportion of leading-edge genes accounting for over 30% of the entire gene set involved in the pathway.
Figures 8H and I show Venn diagrams of the overlap between significantly enriched pathways in the C2 and H libraries, respectively, for each of the three groups. The full list can be found in the Supplementary Table 3. In total, we found 13 pathways that overlapped among the three groups, all 13 of which were positively enriched in both VUB13del18q and hESCWT_SALL3KD and negatively enriched in VUB13del18q_SALL3OE (Fig 8J shows the cell-type relevant pathways; the whole set can be found in the Supplementary Table 4). Because of the critical role that these pathways play in pluripotency maintenance and differentiation, we analyzed the expression of pluripotency-associated genes, and we found that NANOG, POU5F1, LIN28, SOX2, PODXL, SUSD2, MYC, FOXD3 and DPPA3 are overexpressed in VUB13del18q, with all but DPPA3 and POU5F1 also being overexpressed upon SALL3KD and downregulated by transgenic SALL3 overexpression in VUB13del18q (Sup Fig. 7A).