Supplementary Data 1-9
Supplementary Data 1. Differentially expressed genes in CLDN6Low EpiSCs vs CLDN6High EpiSCs. The two tabs list genes upregulated in CLDN6Low EpiSCs and genes upregulated in CLDN6High EpiSCs, respectively (RNAseq, DESeq2, P ≤ 0.05, fold-change ≥ 1.5).
Supplementary Data 2. Top enriched genes in each of the 5 clusters identified by scRNAseq (MARS-seq) of in vitro-differentiated APSD progenitors. DE, definitive endoderm; Anterior PS, anterior primitive streak cells; NMPs, neuromesodermal progenitors. p_val, P-value not adjusted for multiple testing; avg_logFC, average log2 fold change; pct.1, percentage of cells in the cluster in which the gene is detected; pct.2, average percentage of cells in the other clusters in which the gene is detected; p_val_adj, adjusted P-value, based on Bonferroni correction using all genes in the dataset; gene, Ensembl gene ID.
Supplementary Data 3. Top enriched genes in each of the 17 clusters identified upon integration of scRNAseq data from unsorted WT EpiSCs differentiated in vitro towards APSD progenitors and from E6.5, E7.0, and E7.75 Eomes-GFP embryos, all sequenced by MARS-seq. EpiSCs, epiblast stem cells; DE, definitive endoderm; 6hAPS, anterior primitive streak progenitor cells after 6 hours of in vitro differentiation; Anterior PS, anterior primitive streak cells; NMPs, neuromesodermal progenitors. EpiSC clusters identified by scRNAseq were annotated based on enrichment of epithelial-mesenchymal genes that were also observed by RNAseq in sorted CLDN6Low and CLDN6High EpiSCs. The remaining EpiSC population, which was not part of the sorting study, was annotated as CLDN6Mid EpiSCs. p_val, P-value not adjusted for multiple testing; avg_logFC, average log2 fold change; pct.1, percentage of cells in the cluster in which the gene is detected; pct.2, average percentage of cells in the other clusters in which the gene is detected; p_val_adj, adjusted P-value, based on Bonferroni correction using all genes in the dataset; gene, Ensembl gene ID.
Supplementary Data 4. List of 110 differentially accessible peaks (DAPs) identified by ATACseq of CLDN6Low vs CLDN6High EpiSCs (false discovery rate (FDR) < 0.05). For each peak, the chromosomal location, annotation, distance to the transcription start site (TSS), and nearest gene promoter are indicated. The nearest gene was identified as described in Materials and Methods.
Supplementary Data 5. List of differentially methylated regions associated with genes (DMGs) in CLDN6High vs CLDN6Low EpiSCs and APS. Each tab shows the DMGs between two cell populations, defined either by CLDN6 level or by cell state.
Supplementary Data 6. Differentially expressed genes in WT vs Pbx1-KO EpiSCs. The two tabs list genes upregulated in Pbx1-KO EpiSCs and genes upregulated in WT EpiSCs, respectively (RNAseq, DESeq2, P ≤ 0.05, fold-change ≥ 1.5).
Supplementary Data 7. List of 576 differentially accessible peaks (DAPs) identified by ATACseq of WT vs Pbx1-KO EpiSCs (false discovery rate (FDR) < 0.05). For each peak, the chromosomal location, annotation, distance to the transcription start site (TSS), and nearest gene promoter are indicated. The nearest gene was identified as described in Materials and Methods.
Supplementary Data 8. List of differentially methylated regions associated with genes (DMGs) in WT vs Pbx1-KO EpiSCs, identified by whole-genome bisulfite sequencing (WGBS).
Supplementary Data 9. Protein-protein interaction analyses using the STRING database. STRING output for a list of known ERK signalling components, ETS and JUN/FOS proteins, and de novo DNA methylation and demethylation enzymes, showing known and predicted interactions, Gene Ontology functional enrichment, and MCL clusters obtained with the default inflation parameter.