Human pluripotent stem cell (hPSC) culture
H1 hESCs (Thomson et al., 1998), H9 hESCs (Thomson et al., 1998), NANOS3-mCherry WIS1 hESCs (Irie et al., 2015), SOX17-GFP H9 hESCs (Wang et al., 2011), NANOG-2A-YFP H9 hESCs (Martin et al., 2020), BJC1 hiPSCs (Durruthy-Durruthy et al., 2014), BJC3 hiPSCs (Durruthy-Durruthy et al., 2014), and BIRc3 hiPSCs were routinely propagated feeder-free in mTeSR1 medium + 1% penicillin/streptomycin (StemCell Technologies) on cell culture plastics coated with Matrigel (Corning). Undifferentiated hPSCs were maintained at high quality with particular care to avoid any spontaneous differentiation, which would confound downstream differentiation.
In the NANOS3-mCherry hESC line, a 2A-mCherry fluorescent reporter was inserted immediately downstream of the NANOS3 gene without disrupting its coding sequence (Irie et al., 2015). In the SOX17-GFP hESC line, a GFP fluorescent reporter was inserted immediately after the SOX17 start codon, thus functionally invalidating one SOX17 allele (Wang et al., 2011). In the NANOG-2A-YFP hESC line, Cas9 RNP/AAV6-based genome editing (Martin et al., 2019) was used to insert a 2A-iCaspase9-2A-YFP fluorescent reporter immediately downstream of the NANOG gene without disrupting the NANOG coding sequence (Martin et al., 2020).
hPSC differentiation into PGCs
Undifferentiated hPSCs were maintained in mTeSR1 + 1% penicillin/streptomycin and enzymatically passaged (Accutase, 1:8-1:12 split) for differentiation. After overnight recovery in mTeSR1 + Thiazovivin (ROCK inhibitor, 5 μM), the following morning, hPSCs were briefly washed (DMEM/F12) and differentiated into posterior epiblast for 12 hours (100 ng/mL Activin + 3 μM CHIR99021 + 10 μM Y-27632) in aRB27 basal media, which comprised Advanced RPMI 1640 medium supplemented with 1% B27 supplement, 0.1 mM non-essential amino acids (NEAA), 100 U/ml Penicillin + 0.1 mg/ml Streptomycin, and 2 mM L-Glutamine (Kobayashi et al., 2017). Subsequently, cells were washed once more and treated with 40 ng/mL BMP4, 1 μM XAV939, and 10 μM Y-27632 for 24 hours, then 100 ng/mL SCF, 50 ng/mL EGF, 1 μM XAV939, and 10 μM Y-27632 for an additional 24 hrs, and finally 40 ng/mL BMP4, 100 ng/mL SCF, 50 ng/mL EGF, 1 μM XAV939, and 10 μM Y-27632 for 24 hours (all in aRB27 basal media). For comparison, published PGC differentiation protocols (Sasaki et al., 2015; Kobayashi et al., 2017) were performed as described previously.
The following activators and inhibitors of signaling pathways were used for differentiation:
Item name
|
Company
|
Catalog no.
|
Activin A
|
Peprotech
|
120-14E
|
BMP4
|
R&D Systems
|
314-BP-050
|
CHIR99021
|
Tocris
|
4423
|
EGF
|
Peprotech
|
AF-100-15
|
LIF
|
Peprotech
|
300-05
|
SCF
|
Peprotech
|
300-07
|
TC-S 7001
|
Tocris
|
4961
|
Thiazovivin
|
Tocris
|
3845
|
XAV939
|
Tocris
|
3748
|
Immunostaining
Cells were grown on Matrigel-coated glass coverslips (Fisher). Cells were washed with PBS and then fixed with 4% PFA (paraformaldehyde) for 15 min. Coverslips were then washed with PBS, permeabilized with 0.2% Triton X-100 and blocked with PBS-BT (3% BSA + 0.1% Triton X-100 + 0.02% sodium azide in PBS) for at least 30 min. Coverslips were incubated with primary antibodies diluted in PBS-BT overnight, and then washed with PBS-BT, subsequently incubated with secondary antibodies diluted in PBS-BT for 45 min, and then washed again. Finally, samples were mounted in ProLong Diamond anti-fade mountant with DAPI (Thermo Fisher Scientific). Images were acquired on a Leica DM4000 B (Leica Microsystems, Inc., IL, USA) equipped with a QImaging Retiga-2000R (Teledyne Photometrics, AZ, USA) digital camera using a 40X objective, and processed using FIJI (v.1.52p).
The following antibodies were used for immunostaining:
Antibody
|
Company
|
Cat. No.
|
Dilution
|
Anti-AP2g
|
Santa Cruz Biotech
|
Sc-12762
|
1:50
|
Anti-Blimp1
|
eBioscience
|
14-5963-82
|
1:50
|
Anti-Hand1
|
R&D Systems
|
AF3168
|
1:100
|
Anti-Nanog
|
Abcam
|
ab109250
|
1:250
|
Anti-Oct4
|
BD
|
611203
|
1:100
|
Anti-Sox17
|
R&D Systems
|
AF1924
|
1:500
|
High-throughput cell-surface marker screening
hESCs or differentiated PGCs were dissociated (using Accutase) and plated into individual wells of four 96-well plates, each well containing a distinct antibody against a human cell surface antigen, altogether totaling 371 unique cell-surface markers across multiple 96-well plates (LEGENDScreen PE-Conjugated Human Antibody Plates; Biolegend, 700007). For each LEGENDScreen experiment, approximately 10-70 million cells of each lineage were used. High-throughput cell-surface marker staining was largely done as per the manufacturer’s recommendations, and cells were stained with a viability dye (DAPI, 1.1 µM; Biolegend) prior to analysis on a CytoFLEX Flow Cytometer (Stanford Stem Cell Institute FACS Core). Stained cells were not fixed prior to FACS analysis. Sometimes, after lysophilized antibodies were reconstituted in LEGENDScreen plates they were aliquoted into a separate plate to generate replicates of antibody arrays. Undifferentiated H9 hESCs and SOX17-GFP H9 hESCs were used for LEGENDScreen analyses.
Flow cytometry and fluorescence-activated cell sorting (FACS) for cell-surface marker expression
hPSCs or their differentiated derivatives were dissociated using TrypLE Express (Gibco), were washed off the plate with FACS buffer (PBS + 0.1% BSA fraction V [Gibco] + 1 mM EDTA [Gibco] + 1% penicillin/streptomycin [Gibco]) and were pelleted by centrifugation (5 mins, 4 °C). Subsequently, cell pellets were directly resuspended in FACS buffer containing pre-diluted primary antibodies (listed below), thoroughly triturated to ensure a single cell suspension, and primary antibody staining was conducted for 30 mins on ice. Afterwards, cells were washed with an excess of FACS buffer and pelleted again, and this was conducted one more time. Finally, washed cell pellets were resuspended in FACS buffer containing 1.1 µM DAPI (Biolegend), and were strained through a 35 µm filter. Flow cytometry and sorting was conducted on a BD FACSAria II (Stanford Stem Cell Institute FACS Core).
The following antibodies were used for flow cytometry:
Antibody
|
Company
|
Catalog Number
|
Dilution
|
CXCR4 APC
|
BD Biosciences
|
560936
|
1:5
|
GARP FITC
|
eBioscience
|
11-9882-41
|
1:20
|
PDGFRα PE
|
BD Biosciences
|
556002
|
1:20
|
RNA extraction and reverse transcription
In general, RNA was extracted from undifferentiated or differentiated hPSC populations plated in 12-well format by lysing them with 350 µL RLT Plus Buffer per well. RNA was extracted with the RNeasy Plus Mini Kit (Qiagen) as per the manufacturer’s instructions. Generally 50-200 ng of total RNA was reverse-transcribed with the High Capacity cDNA Reverse Transcription Kit (Applied Biosystems) to generate cDNA libraries for qPCR.
Quantitative PCR (qPCR)
Total cDNA was diluted 1:10-1:30 in H2O and qPCR was performed with the SensiFAST SYBR Hi-ROX Kit (Bioline) with 10 µL qPCR reactions per well in a 384-well plate: each individual reaction contained 5 µL 2x SensiFAST SYBR qPCR Master Mix + 4.2 µL cDNA (totaling ~120 ng of cDNA) + 0.8 µL of 10 µM primer stock (5 µM forward + 5 µM reverse primers). In general, gene-specific primer pairs for qPCR were tested for (1) specificity of amplicon amplification (only one peak on a dissociation curve) and (2) linearity of amplicon amplification (linear detection of gene expression in cDNA samples serially diluted seven times over two orders of magnitude, with 90-110% efficiency of amplification deemed acceptably linear). After qPCR plates were prepared by arraying sample-specific cDNAs and gene-specific primers (listed below), they were sealed and briefly centrifuged (5 mins). 384-well qPCR plates and their adhesive sealing sheets were obtained from Thermo (AB1384 and AB0558, respectively). qPCR plates were run on a 7900HT Fast Real-Time PCR System (Applied Biosystems) with the following cycling parameters: initial dissociation (95 °C, 2 mins) followed by 40 cycles of amplification and SYBR signal detection (95 °C dissociation, 5 seconds; 60 °C annealing, 10 seconds; followed by 72 °C extension, 30 seconds), with a final series of steps to generate a dissociation curve at the end of each qPCR run. During qPCR data analysis, the fluorescence threshold to determine Ct values was set at the linear phase of amplification.
The following primers were used for qPCR analysis:
Gene Name
|
Forward
|
Reverse
|
BRACHYURY
|
TGCTTCCCTGAGACCCAGTT
|
GATCACTTCTTTCCTTTGCATCAAG
|
CDX2
|
GGGCTCTCTGAGAGGCAGGT
|
CCTTTGCTCTGCGGTTCTG
|
KIT
|
GGATTCCCAGAGCCCACAA
|
ACATCCACTGGCAGTACAGAA
|
FOXA2
|
GGGAGCGGTGAAGATGGA
|
TCATGTTGCTCACGGAGGAGTA
|
FZD8
|
ATCGGCTACAACTACACCTACA
|
GTACATGCTGCACAGGAAGAA
|
GATA4
|
TCCCTCTTCCCTCCTCAAAT
|
TCAGCGTGTAAAGGCATCTG
|
HHEX
|
CACCCGACGCCCTTTTACAT
|
GAAGGCTGGATGGATCGGC
|
MIXL1
|
GGTACCCCGACATCCACTTG
|
TAATCTCCGGCCTAGCCAAA
|
NANOG
|
AGAACTCTCCAACATCCTGAACCTC
|
CTGAGGCCTTCTGCGTCACA
|
NANOS3
|
ACTTACTGGCCAGGGCTACAC
|
ACTTCCCGGCACCTCTGAA
|
OCT4 (POU5F1)
|
AGTGAGAGGCAACCTGGAGA
|
ACACTCGGACCACATCCTTC
|
PAX5
|
AAACCAAAGGTCGCCACAC
|
GTTGATGGAACTGACGCTAGG
|
PRDM1
|
TCTCCAATCTGAAGGTCCACCTG
|
GATTGCTGGTGCTGCTAAATCTCTT
|
SHISA2
|
TTCCTTTACTGAAGGGAGACGAAGG
|
CCATCCAAAGGAATCGTGCCATAAA
|
SOX17
|
CGCACGGAATTTGAACAGTA
|
GGATCAGGGACCTGTCACAC
|
SOX2
|
TGGACAGTTACGCGCACAT
|
CGAGTAGGACATGCTGTAGGT
|
TFAP2C
|
ATTAAGAGGATGCTGGGCTCTG
|
CACTGTACTGCACACTCACCTT
|
TFCP2L1
|
AGCTCAAAGTTGTCCTACTGCC
|
TTCTAACCCAAGCACAGATCCC
|
VASA (DDX4)
|
GTGCCCTATGTGCCGTTAC
|
GGCTGACGTTGGACTGAGG
|
Single cell RNA-sequencing
We performed single-cell RNA-sequencing of undifferentiated H9 hPSCs as well as differentiated D0.5 posterior epiblast, D3.5 PGCLCs (bulk population), D3.5 FACS-sorted CXCR4+ PDGFRA- GARP- PGCLCs, and D2 definitive endoderm populations. Cells of various stages were dissociated and washed twice in wash buffer (0.04% Bovine Serum Albumin in Ca2+/Mg2+-free PBS) and counted on the Countess II automated cell counter (Thermo Fisher). For each cell population, 10,400 cells were loaded per lane on the 10x Genomics Chromium platform, with the goal of capturing 6,000 cells. Cells were then processed for cDNA synthesis and library preparation using 10X Genomics Chromium Version 2 chemistry (catalog number 120234) as per the manufacturer’s protocol. cDNA libraries were checked for quality on the Agilent 4200 Tape Station platform and their concentration was quantified by KAPA qPCR. Libraries were sequenced using a HiSeq 4000 (Illumina) to a depth of, at a minimum, 70,000 reads per cell.
Single cell RNA-seq computational analysis of hPSC-derived cell-types
Illumina base call files were converted to FASTQ files using the Cell Ranger v2.0 program. FASTQ files were then aligned to the hg19 human reference genome using Cell Ranger. The Seurat R package (v2.3.1) (Butler et al., 2018) was used for subsequent analyses. Cells from all the various timepoints were first combined into a single Seurat object. For quality control, we first filtered out low-quality cells that expressed fewer than 2,500 genes; we also excluded cells that expressed more than 7,500 genes (which would imply doublets) or that expressed more than 0.15% mitochondrial genes (indicative of dead cells in this dataset). Counts were normalized and scaled by a factor of 10,000. To adjust for cell cycle effects, S phase and G2M genes were regressed out before Principal Component Analysis (PCA) was performed using the highly-variable genes.
For further analyses, 1,000 cells were randomly sampled from each of the 5 data sets (D0, D0.5, D3.5 sorted, D3.5 unsorted and D2 definitive endoderm) and then combined into a new file for further analysis. The top six principal components were used for clustering using the Shared Nearest Neighbor (SNN) algorithm, which was implemented via the FindCluster function in Seurat (Butler et al., 2018). Clusters were visualized in t-SNE dimensional reduction plots with 3-dimensional embedding. Differentially expressed genes between clusters were identified using the Wilcoxon rank sum test, which was performed via the Seurat package (Butler et al., 2018). For all other independent library analyses, the following numbers of principal components were used: Day 3.5 sorted library (15 principal components); Day 3.5 unsorted library (10 principal components); Day 0 vs. Day 0.5 analysis (10 principal components).
Starting from genes that were differentially expressed between each cell-type, we specifically discovered transcription factors (TFs (Chawla et al., 2013)), cell-surface proteins (Bausch-Fluck et al., 2015), and signaling ligands and receptors (Graeber and Eisenberg, 2001) whose expression was enriched in each cell population, using published and curated lists for each set of genes.
All single-cell RNA-seq plots were generated using Seurat (Butler et al., 2018) and ggplot2 (https://ggplot2.tidyverse.org/) R packages.
Finally, we computationally inferred a trajectory for progression from D0 hPSCs to D0.5 posterior epiblast to the D3.5 PGCLC-containing population (Fig. 4f). For this analysis, we used the whole, unsorted D3.5 population, which contains both PGCLCs and non-PGCLCs (i.e., mesoderm-like cells) in order to capture the divergence between these two mutually-exclusive lineages. For trajectory inference, we only used genes expressed in at least 10 cells. The top 1000 differentially expressed genes between these 3 cell populations (D0, D0.5 and D3.5) were used for pseudotemporal ordering and dimensional reduction was done with DDRTree using the Monocle version 2 package (Qiu et al., 2017).
Single cell RNA-seq computational analysis of human fetal germ cells
Single-cell RNA-seq data of 2629 human fetal gonadal cells in vivo—including both fetal germ cells (FGCs) and gonadal somatic cells—were previously published (Li et al., 2017) and were downloaded from Gene Expression Omnibus (GSE86146). Using the Seurat v3 platform (Stuart et al., 2019), we performed quality control pre-processing on this in vivo dataset to 1) filter out genes that were expressed in fewer than 10 cells, 2) exclude low-quality cells that expressed fewer than 2000 genes, and 3) exclude low-quality cells with mitochondrial gene content greater than 5%; after these quality control steps, we obtained 2321 high-quality cells that we used for the following analysis.
Since our goal was to compare in vivo FGCs against in vitro hPSC-derived PGCLCs (and we used the female H9 hPSC line for our single-cell RNA-seq studies), we then selected female FGCs from the in vivo dataset (Li et al., 2017). A total of 1180 female in vivo cells were available and used; 732 were FGCs with ages spanning from 4 weeks to 24 weeks of fetal life, while the remaining 448 were gonadal somatic cells. The study that originally reported this in vivo FGC dataset classified these FGCs into four transcriptional subclusters (FGC1, FGC2, FGC3 and FGC4) (Li et al., 2017), and in our study, we used the same subclusters for transcriptional comparisons against hPSC-derived PGCLCs.
To compare FGCs against in vitro hPSC-derived PGCLCs, we applied the same quality control pre-processing on the FACS-sorted CXCR4+ PDGFRA- GARP- PGCLC single-cell RNA-seq dataset (with 5447 high-quality cells obtained from the original dataset of 5467 cells). We randomly selected 300 of these hPSC-derived FACS-sorted PGCLCs for downstream analysis, and showed that such random downsampling of the PGCLC population did not substantially affect data quality (Fig. S6a). We then integrated the in vivo FGC and in vitro hPSC-derived PGCLC single-cell RNA-seq datasets using SCTransform (Hafemeister and Satija, 2019).
We obtained the list of differentially expressed genes between the FGC1, FGC2, FGC3 and FGC4 populations in vivo from the original publication (Li et al., 2017), with the exception that we applied a slightly more stringent statistical threshold to identify differentially expressed genes (power > 0.5). For these genes that were differentially expressed between FGC1, FGC2, FGC3 and FGC4, we also quantified the expression levels of such genes in the hPSC-derived PGCLC population. This combined expression table of in vivo vs. in vitro cells is provided in Table S1, and expression values are presented as transcripts per million (TPM) and loge transformation was applied. To determine the similarity in gene expression profiles between hPSC-derived PGCLCs in vitro and FGC1, FGC2, FGC3 and FGC4 in vivo, we performed hierarchical clustering using the Average Linkage method and using a Euclidean distance metric, which showed that FGC1 and hPSC-derived PGCLCs clustered together. Pairwise Pearson’s correlation coefficient was calculated for hPSC-derived PGCLCs in vitro and FGC1, FGC2, FGC3 and FGC4 in vivo using the GGally (1.5.0) R package.