Life histories of myeloproliferative neoplasms inferred from phylogenies

Mutations in cancer-associated genes drive tumour outgrowth, but our knowledge of the timing of driver mutations and subsequent clonal dynamics is limited1–3. Here, using whole-genome sequencing of 1,013 clonal haematopoietic colonies from 12 patients with myeloproliferative neoplasms, we identified 580,133 somatic mutations to reconstruct haematopoietic phylogenies and determine clonal histories. Driver mutations were estimated to occur early in life, including the in utero period. JAK2V617F was estimated to have been acquired by 33 weeks of gestation to 10.8 years of age in 5 patients in whom JAK2V617F was the first event. DNMT3A mutations were acquired by 8 weeks of gestation to 7.6 years of age in 4 patients, and a PPM1D mutation was acquired by 5.8 years of age. Additional genomic events occurred before or following JAK2V617F acquisition and as independent clonal expansions. Sequential driver mutation acquisition was separated by decades across life, often outcompeting ancestral clones. The mean latency between JAK2V617F acquisition and diagnosis was 30 years (range 11–54 years). Estimated historical rates of clonal expansion varied substantially (3% to 190% per year), increased with additional driver mutations, and predicted latency to diagnosis. Our study suggests that early driver mutation acquisition and life-long growth and evolution underlie adult myeloproliferative neoplasms, raising opportunities for earlier intervention and a new model for cancer development. Whole-genome sequencing of 1,013 clonal haematopoietic colonies from myeloproliferative neoplasms of 12 individuals reveals haematopoietic phylogenies and indicates that driver mutations are acquired sequentially, starting early in life.

Mutations in cancer-associated genes drive tumour outgrowth, but our knowledge of the timing of driver mutations and subsequent clonal dynamics is limited 1-3 . Here, using whole-genome sequencing of 1,013 clonal haematopoietic colonies from 12 patients with myeloproliferative neoplasms, we identified 580,133 somatic mutations to reconstruct haematopoietic phylogenies and determine clonal histories. Driver mutations were estimated to occur early in life, including the in utero period. JAK2 V617F was estimated to have been acquired by 33 weeks of gestation to 10.8 years of age in 5 patients in whom JAK2 V617F was the first event. DNMT3A mutations were acquired by 8 weeks of gestation to 7.6 years of age in 4 patients, and a PPM1D mutation was acquired by 5.8 years of age. Additional genomic events occurred before or following JAK2 V617F acquisition and as independent clonal expansions. Sequential driver mutation acquisition was separated by decades across life, often outcompeting ancestral clones. The mean latency between JAK2 V617F acquisition and diagnosis was 30 years (range 11-54 years). Estimated historical rates of clonal expansion varied substantially (3% to 190% per year), increased with additional driver mutations, and predicted latency to diagnosis. Our study suggests that early driver mutation acquisition and life-long growth and evolution underlie adult myeloproliferative neoplasms, raising opportunities for earlier intervention and a new model for cancer development.
Human cancers harbour hundreds to many thousands of somatically acquired DNA mutations, a minority of which drive tumour initiation and progression 1 . These driver mutations occur in recurrently mutated cancer genes and stimulate the cell acquiring it to expand into a clone. With a large enough clonal expansion, typically abetted by further driver mutations, a cancer emerges. Little is known about the ages at which driver mutations occur, the timelines of clonal expansion over an individual's lifetime, or how these relate to clinical presentation with cancer. Some mutational processes accrue at a steady rate across life, representing a 'molecular clock' 4,5 . Knowing this tissue-specific rate of mutation accumulation has enabled broad estimates for the timing of mutations for some cancers 2,3 .
In patients with blood cancers, the observation of normal blood counts months to years before diagnosis suggested that tumour development occurs quickly, and therefore driver mutations must occur late in life. Estimates from cancer incidences in people who survived the Hiroshima or Nagasaki atomic bombings who developed chronic myeloid leukaemia have suggested a mean latency time of only eight years between BCR-ABL1 induction and clinical presentation 6 . However, the presence of driver mutations in normal tissues 7-12 -including blood from healthy individuals with clonal haematopoiesis [13][14][15][16][17] , some of whom subsequently develop malignancies-supports a longer multi-hit trajectory of cancer. Understanding the absolute timelines of cancer evolution is critical for efforts aimed at early detection and intervention, especially if a given cancer takes decades to emerge. Myeloproliferative neoplasms (MPN) are blood cancers driven by somatic driver mutations in haematopoietic stem cells (HSCs) that result in increased mature myeloid cell production 18 , and provide an opportunity to capture early tumourigenesis and disease evolution that are otherwise inaccessible in other malignancies. Most of these patients harbour JAK2 V617F -this can either be the sole driver mutation or occur in combination with additional mutations-whereas others lack known causative driver mutations with uncertainty over the underlying clonal origin 19 . Here we undertake whole-genome sequencing (WGS) of individual single-cell derived haematopoietic colonies and targeted resequencing of longitudinal blood samples in patients with MPN to infer timing of driver mutations and tumour evolutionary dynamics in vivo.
using them to reconstruct lineage relationships among both malignant and normal blood cells in patients with MPN 20 . As somatic mutation burden does not differ significantly between HSCs and myeloid progenitors [20][21][22][23] , we undertook WGS of in vitro expanded single-cell-derived haematopoietic colonies as faithful surrogates for the genomes of their parental HSCs. We 'recaptured' somatic mutations in bulk peripheral blood cells using targeted sequencing to longitudinally track clones (Fig. 1a). We sequenced 1,088 colonies to approximately 16.7× mean depth across 17 time points from 12 patients aged 20-81 years with different subtypes of MPN (Fig. 1b, Extended Data Fig. 1a). The majority of colonies were clonal (Extended Data Fig. 1b). Following filtering for low sequencing coverage and cross-colony contamination, 1,013 colonies were used in subsequent analyses (Supplementary Note 3). We identified 560,978 single nucleotide variants (SNV) and 19,155 small insertions and deletions (Supplementary Note 1). Ten out of 12 patients harboured mutated JAK2, and additional driver mutations were commonly observed in DNMT3A (n = 12), PPM1D (n = 6) and TET2 (n = 5) (Fig. 1c).
We reconstructed phylogenetic trees using the presence or absence of SNV across colonies for individual patients (Fig. 2-4, Supplementary Note 2). A branch antecedent to more than one colony represents mutations shared by downstream colonies, and end branches comprise mutations in single colonies. A branch split in the tree ('coalescence') reflects an ancestral HSC symmetrically dividing into two daughter HSCs, the descendants of which have produced sampled blood cells. Each individual has a unique branching shape and phylogenetic tree structure, but many common themes emerge.
In patients with multiple driver mutations, additional driver mutations occurred both before and following JAK2 V617F as well as in independent HSCs, as previously reported [24][25][26] . All patients had mixtures of colonies with and without known driver mutations, suggesting that driver-mutation-bearing HSCs co-exist alongside normal blood production in patients with MPN. The colonies without driver mutations shared few mutations with one another-these were evident as long, isolated branches-demonstrating that residual non-malignant blood production remains highly polyclonal, as in healthy individuals 20 . Colonies with driver mutations typically shared tens to hundreds of mutations, including the driver mutation-this is evident as a 'clade', namely a set of lineages descending from a shared ancestral branch and confirms their single cell origin. Immediately beneath the shared branch containing the driver, we observe many short branches-this represents a 'clonal burst' in which the original mutated HSC expands to a population of cells. Coalescences are more frequent at the top of the tree and at the start of a clonal burst-their disappearance further down reflects the expanded size of the HSC population relative to the lineages sampled (Supplementary Note 4).
We frequently observed similar genetic changes occurring in different HSCs in the same patient-that is, 'parallel evolution'. Within individual patients, we observed (1) multiple acquisitions of chromosome (chr) 9p loss-of-heterozygosity (Figs. 2, 3, Extended Data Fig. 2), each with unique breakpoints as reported previously 27 ; (2) independent acquisitions of chr1q + , chr9q − (PD5179) and JAK2 V617F (PD4781), each affecting different parental chromosomes (Extended Data Fig. 2); and (3) multiple different mutations affecting the same oncogene (Figs. 3, 4). This suggests that patient-specific factors shape selection on driver mutations and evolutionary trajectories in MPNs. In two patients lacking causative driver mutations in JAK2, CALR or MPL, we observed dominant clonal expansions driven by PPM1D and TET2, genes that are typically mutated in clonal haematopoiesis (Fig. 4a, b). PPM1D mutations have been reported in JAK2 V617F -mutated MPN 19 and following chemotherapy 28 , but there was no known chemotherapy exposure prior to MPN therapy in either patient. We also observed clonal expansions without driver mutations in elderly patients (PD6646; Fig. 3), in keeping with our recent observations of haematopoietic oligoclonality in elderly individuals 23 .

Mutation burden and telomere length
Point mutations accumulated linearly with age, with some heterogeneity both within and across patients (Fig. 4c). Wild-type cells were estimated to accrue 17.0 substitutions per year (95% confidence interval 14.8-19.2, mixed effects modelling; Supplementary Note 5), consistent with other studies [20][21][22] , suggesting that mutation accumulation in wild-type blood cells in patients with MPN is similar to that in healthy individuals.
For some patients, mutation burden in mutant clades appeared to be increased compared with other colonies (Fig. 3). Using a Bayesian approach to model rates of mutation acquisition on a per patient and per clade basis, the mutation burden in mutant colonies appeared modestly higher than wild-type counterparts for several patients, but differences were less significant for individual patients when using non-parametric statistical methods (Extended Data Fig. 3a, b, Supplementary Note 5). However, at a cohort level, there was a significant increase in C>T transitions at CpG dinucleotides in JAK2-mutated clades compared with wild-type colonies (Extended Data Fig. 4a, b, Supplementary Note 6) raising the possibility that modest differences in mutation burden reflected differences in cell division history between mutant and wild-type cells 5 . In keeping with this, telomere lengths were significantly shorter in JAK2-mutated clades (P = 0.002; Fig. 4d), a relationship that was retained after correcting for their greater shared ancestry (−829 bp, confidence interval 662-968 bp, P < 0.001, Extended Data Fig. 4c). Telomere lengths followed the phylogenetic tree, that is, more closely related colonies had more similar telomere lengths (Extended Data Fig. 4d). The degree of telomere loss correlated with the size of JAK2-mutant clades and the inferred number of additional HSC cell divisions for clonal expansion, with telomeres estimated to shorten by approximately 57 bp per division (Extended Data Fig. 4e, Supplementary Note 7), similar to a previous report 29 , although it is possible that telomere shortening also occurs during JAK2-mutant progenitor cell expansion 30  Article lengths, much more than mutation burden, reflect cell division differences between wild-type and mutant clades in patients with MPN, and are in keeping with the notion that the number of mutations that accumulate during haematopoietic cell division is very low, with mutation burden more determined by time 20,22,31,32 .  (Fig. 3). These earliest branches represent the first few dozen mutations of life, and correspond to the in utero period, given that blood cells consistently harbour around 50-55 somatic mutations at birth 21,23 . Secondly, whereas some patients have not acquired additional driver mutations in expanded clades (for example, PD7271, PD5163 and PD6634), the majority of patients accrue further driver mutations, each emerging several hundred mutations after the previous event (Fig. 3), suggesting that clonal evolution and sequential driver mutation acquisition is separated across decades over a lifetime. We scaled the phylogenetic trees to chronological time using patient-specific and clade-specific mutation rates, and took into account the accelerated rate of mutation acquisition reported in early life due to more rapid growth 21,22,31 . This provided an age estimate for the start and end of each branch containing a driver mutation together with confidence intervals, and thus a time interval during which the genetic event plausibly occurred, with the age at the end of the branch providing an upper bound on acquisition timing (Extended Data Fig. 5a, b). Of note, no direct measurements for driver mutations were made experimentally at the inferred times of acquisition.

Timing of driver mutation acquisition
Time-based phylogenetic trees delineated qualitative observations. In PD5182 and PD7271, JAK2 V617F acquisition was estimated to have occurred by 32.8 weeks post conception (pc) (95% confidence interval for upper age estimate 11 weeks pc-1.1 year) and 1.4 years (0.6-2.3 years) of age, respectively. Three patients acquired mutated-JAK2 during childhood by 8.5 years (7.2-10.0 years, PD5163), 9.3 years (7.9-10.9 years, PD9478) and 10.8 years (9.2-12.5 years, PD5117) of age (Extended Data Fig. 5c). In these 5 patients in whom mutated JAK2 was the first driver event within the MPN clone, the mean latency to diagnosis was 34 years (range 19-54 years). JAK2 V617F occurred as the second driver event within a mutated-DNMT3A clade in two patients, with diagnosis occurring 11 years (PD6646) and 25 years (PD6629) after JAK2 V617F acquisition. In the remaining three patients with mutated JAK2 (PD4781, PD5847 and PD5179), we were unable to precisely time JAK2 V617F owing to the presence of additional driver events (for example, 9pUPD) on the same branch. However, timing estimates of 9pUPD acquisition often fell to decades before disease presentation, implying that JAK2 V617F acquisition occurred even earlier than this. Mutations in DNMT3A, the gene most commonly  Article detected later in life in clonal haematopoiesis [13][14][15] , were also estimated to have occurred by the in utero period or childhood in several patients. DNMT3A ess.splice in PD5182 localized to the 20th or 21st point mutation of life in that lineage, corresponding to acquisition between 12 and 14 weeks pc. In PD5847, DNMT3A Y660F occurred by the 22nd mutation in the lineage, corresponding to acquisition between 2 and 9 weeks pc. The canonical mutation DNMT3A R882H was estimated to have occurred by 6.5 years (5.2-8.1 years) of age in PD6629, by 7.6 years (6.3-9.2 years) of age for DNMT3A T275fs*41 in PD5163, and mutated PPM1D was acquired by 5.8 years (4.6-7.2 years) of age in PD6634 (Extended Data Fig. 5c).
Branches harbouring more than one driver mutation were often observed, particularly in older patients (PD4781 and PD5147) and for JAK2 V617F /chr9 aberrations (PD5847 and PD5179). Such branches could reflect the out-competition of the ancestral mutant clade(s) harbouring the initial driver mutation(s) by subsequent driver events, or that the ancestral clade was not sampled. We assessed the allele fractions in bulk blood for somatic mutations present on such branches for the presence of more than one clone but could not find evidence that ancestral clades were present at a substantial fraction in blood, suggesting extinction of earlier mutant clones.

Clonal expansion dynamics in patients
The pattern of branching in mutant clades reflect their historical rates of clonal expansion. A clonal burst comprising long emergent branches downstream of a branch harbouring a driver mutation implies slow expansion. Examples include DNMT3A-and PPM1D-mutated clades in PD6629 and PD6634, respectively (Figs. 3, 4), and the JAK2 V617F clade in PD5117, a patient who presented with asymptomatic MPN more than 50 years after JAK2 V617F acquisition. By contrast, a clonal burst comprising multiple short emergent branches suggests more rapid historical expansion-this is seen for PD7271, a young patient with MPN, in whom branching downstream of JAK2 V617F occurs over approximately 200 mutations, and is especially evident for clones harbouring multiple driver mutations (for example, the JAK2/9pUPD/TET2-mutated clade in PD5847) (Figs. 3, 4). Large and rapid clonal expansions are often observed to outcompete ancestral mutant clones (for example, multiply mutated clades in PD4781 and PD5147). Such inter-clonal dynamics are highlighted by phylogenetic trees comprising multiple colony time points. For example, in PD5182, colonies were grown at age 32 (diagnosis), 46 and 53 years. By diagnosis (Fig. 3, PD5182 red dots), there is evidence of a clonal expansion harbouring both JAK2 V617F and 9pUPD that has outcompeted the ancestral heterozygous JAK2 V617F clone acquired very early in life. By the next time point, the patient was receiving interferon therapy, and a smaller fraction of mutant colonies are captured, which now harbour additional Chr1q+ (blue dots). By age 53, at which time the patient is refractory to interferon, the mutant clonal fraction is larger, and almost entirely comprised of the clonally evolved JAK2 V617F homozygous/1q+ clade (green dots), demonstrating how sequential driver mutation acquisition decades apart can lead to successive clonal sweeps during MPN disease course. Similarly, in PD6646, sampling a few years later captures a larger fraction of the DNMT3A/JAK2-mutated clade, raising the possibility that it may be in the process of outgrowing other clades in this individual.

Modelling the fitness of mutant clones
We simulated a forward time continuous birth-death process, in which the introduction of a driver mutation increases the symmetrical HSC division rate by a selection coefficient (s) with corresponding growth rate per year (S) (Methods) and used approximate Bayesian computation (ABC) to estimate S for clades with five or more downstream lineages (Extended Data Fig. 6a). Our approach provided orthogonal estimates for absolute driver mutation timing, confirming estimates from branch length intervals (Extended Data Fig. 6b). Estimates of S correlated well with an alternative method that we developed ('Phylofit' versus ABC correlation coefficient r = 0.96; Extended Data Fig. 6c). Our modelling provided an average historical growth rate for clades and showed that single-mutant JAK2 V617F clades expanded at different rates in different patients (Fig. 5a)-73% yr −1 (confidence interval 43-108% yr −1 ) in PD7271, the youngest patient with MPN in our cohort, and 18% yr −1 in PD5117, who was diagnosed 54 years after JAK2 V617F acquisition. Average  historical rates of clonal expansion following the acquisition of JAK2 V617F fitted well with the latency between mutation acquisition and disease diagnosis (Fig. 5b). In some patients, successive increases in expansion rates occurred with sequential driver mutation acquisition (PD6629 DNMT3A R882H , 26% yr −1 (confidence interval 19-36% yr −1 ); DNMT3A R882H / JAK2 V617F 41% yr −1 (confidence interval 28-60% yr −1 ); DNMT3A R882H / JAK2 V617F /TET2 Q744fs*10 90% yr −1 (confidence interval 42-176% yr −1 )). The most rapidly growing clades in the cohort all harboured multiple driver mutations, with growth rates up to 189% yr −1 (confidence interval 130-303% yr −1 , PD5847), translating to the clone doubling in size every 8 months. Lowest selection estimates were observed for two in utero-acquired DNMT3A mutated clones (PD5847 and PD5182, 3-5% yr −1 ). Given that one would predict a higher probability of stochastic extinction of HSCs harbouring driver mutations with low selection coefficients (Supplementary Note 8), it is plausible that such clones associated with weak fitness advantages survived to expand only by hitch-hiking on the rapid population growth occurring in utero. Indeed, with such low growth rates, clones would take several decades to even reach low detectable clonal fractions in blood. Somatic mutations from the phylogenetic trees were deep sequenced using targeted recapture in bulk mature blood cells from the same patients. We observed that the clonal fractions of clades from phylogenetic trees were broadly concordant with bulk population fractions and inferred clonal trajectories from ABC (Extended Data Fig. 7a). We did not find evidence of lineages missing from the trees, suggesting that any in vitro culture selection bias was not significant (Extended Data Fig. 7b). Our model of HSC population dynamics closely recapitulated the pattern of branching in individual clades, however, there are some assumptions. First, the modelling assumes that the HSC population size remains constant over life and following MPN development. Secondly, we assume that selection is constant during the period of time captured by each clonal burst. We searched for clades in which there was a rapid clonal burst early in the tree (as suggested by short emergent branches) but yet only a small final clonal fraction, suggesting that S may have been higher initially and decreased later in life. We found three such clades (Extended Data Fig. 7c), one associated with treatment with interferon (PD5163 JAK2 V617F clade), and two DNMT3A-mutated in utero clades, suggesting more rapid growth earlier in life. Given the multiple selective pressures, including treatment, bone marrow environment and inter-clonal competition, our estimates of fitness should be viewed as average historical expansion rates of clones under a simple population model and more complex dynamics may exist in vivo.

Discussion
MPN are a common chronic blood cancer prevalent in up to 1 in every 1,000-2,000 individuals, with an increasing incidence with age 33,34 . Our results suggest that MPN driver mutations, such as JAK2 V617F , as well as DNMT3A and PPM1D mutations commonly associated with clonal haematopoiesis [13][14][15] , often occur during childhood, including in utero. Driver mutations in all single-driver mediated clonal expansions were robustly estimated to have occurred in the first half of life in the patients, making this probabilistically likely to be a general feature of MPN. Our data are consistent with JAK2 V617F detection and mutation timing estimation before MPN diagnosis 35,36 and in cord blood 37 , and with the detection of low-burden clonal haematopoiesis in younger adults 16,38 . Inferred rates of clonal expansion following JAK2 V617F were variable across patients, raising the possibility that additional factors influence the consequences of JAK2 V617F , such as cytokine homeostasis, bone marrow microenvironment, inflammation or germline influences 39-43 , which may dynamically contribute over the lifetime of an individual. Factors influential during embryogenesis may also shape the future trajectories of nascent clones. Overall, growth rate estimates for JAK2 V617F in patients with MPN were greater than that reported for JAK2 V617F -mediated clonal haematopoiesis-which may account for the relative lack of clinical manifestation in the latter group. Growth estimates for DNMT3A-mutated clones were consistent with that reported in clonal haematopoiesis 44 .
Clinically, MPN diagnosis is defined phenotypically by blood count parameters and bone marrow histomorphology 45 . Our results indicate that current clinical diagnosis is made at a late time point in the expansion and evolution of phenotype-driving clones. This latency potentially encompasses a disease phase during which no phenotype has occurred, as low to moderately fit clones with driver mutations take decades to reach detectable levels that may be required for clinical manifestations. However, earlier detection may also be opportune to prevent life threatening thromboses that often trigger current diagnosis, and to better manage individuals in the general population harbouring low levels of JAK2 V617F who have increased risk of thrombosis 46 . The cornerstone of MPN management is aimed at reducing the risk of vascular events; such treatments are mostly safe and well-tolerated and could be offered to individuals with high-risk molecular profiles if clinical trials supported their use. Given the lifelong trajectories to MPN, our data also provide a strong rationale for the evaluation of measures targeting the mutant clone 47,48 in order to curb clonal expansion and evolution. This model for blood cancer development may also be relevant to other cancers and organs, given the abundance of mutations under selection in histologically normal tissues 7,[9][10][11][13][14][15][16] .

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-021-04312-6.   Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Methods
No statistical methods were used to predetermine sample size. The experiments were not randomized and the investigators were not blinded to allocation during experiments and outcome assessment.

Patients and samples
Peripheral blood and bone marrow samples were obtained from patients with myeloproliferative neoplasms attending Cambridge Universities NHS Trust following written informed consent and ethics committee approval. Patients were selected from a cohort that had undergone previous whole exome sequencing of their blood 49 . Apart from ensuring a wide representation of ages, patients were chosen at random and captured a broad range of MPN. Recapture samples obtained in 10 of 12 patients were predominantly peripheral blood derived granulocytes. Constitutional samples were obtained from either buccal epithelium or T cells.

Somatic mutation identification and filtering
Single nucleotide variants (SNV) were identified using CaVEMan 51 for each colony by comparison to an unmatched normal colorectal crypt sample (PD26636b) that had previously undergone whole genome sequencing. CaVEMan was run with the 'normal contamination of tumour' parameter set to zero, and the tumour/normal copy numbers set to 5/2. In addition to standard filters 3 , reads supporting an SNV had to have a median BWA-MEM alignment Score ≥ 140 and less than half of the reads clipped. Filtering designed for quality control following processing through the low-input sequencing pipeline, as recently described 10,50 , were also applied. The use of the unmatched normal meant that this process called both somatic and germline SNVs. The removal of germline SNVs and artefacts of sequencing required further filtering, and we used pooled information across per-patient colonies and read counts from a matched germline WGS sample, either buccal or T cell DNA, to ensure that genuine somatic variants that may have been present in the germline sample, either as embryonic variants or due to tumour-in-normal contamination were also identified. (Supplementary Note 1). Short Insertions and deletions were called using cgpPindel 52 with the standard cgpPindel VCF filters applied, except the F018 Pindel filter was disabled as it excludes loci of depth <10. Copy-number aberrations (CNA) were identified using ASCAT 53 with comparison to a matched normal sample. Given the clonal nature of samples identification of sub-clonal copy number changes was not required. The union of colony SNVs and insertion-deletions (indels) was then taken and reads counted across all samples belonging to the patient (colonies, recapture samples, buccal and T cells) using VAFCorrect. VAFCorrect avoids double counting of overlapping reads and minimises reference bias in the variant allele fractions (VAFs) of indels by performing a pileup mapping to both the reference genome and a locally altered reference genome that incorporates the alternative allele in the reference sequence.

Creating a genotype matrix
The genotype at each locus within each sample was either 1 (present), 0 (absent) or NA (unknown). We inferred the genotype in a depth sensitive manner. We assumed the observed mutant read count for a colony at a given site was MTR ~ Binomial(n = Depth, P = VAF), if the site was mutant, and MTR ~ Binomial(n =depth, P = 0.01), if the site was wild type. The genotype was set to the most likely of the two possible states provided one of the states was at least 20 times more likely than the other. Otherwise the genotype is set to missing (NA). The VAF was usually 0.5 for autosomal sites, but for chromosomes X, Y and CNA sites, it was conservatively set to 1/ploidy. For loss-of-heterozygosity (LOH) sites, the genotype was overridden and set to missing if it was originally 0.

Phylogenetic tree topology
We constructed phylogenetic tree topologies using maximum parsimony with MPBoot 54 . The inputs for MPBoot were the binary genotype matrices with missing values per patient. This method also enables the rapid generation of bootstrap trees that correspond to the maximum parsimony trees that would be obtained by resampling the per site genotypes with replacement. Only SNVs were used to infer the topology, but both SNVs and indels were subsequently assigned to the branches of phylogenetic trees. Variants with zero genotype in genomic regions identified as carrying a chromosomal LOH or deletion were set to missing for MPBoot.

Assignment of mutations to branches and branch length estimation
We developed an expectation maximisation method (R package treemut) to soft assign mutations to trees and to estimate branch length (https://github.com/NickWilliamsSanger/treemut). The starting point for the method is Bayes rule: P(mutation has genotype) ∝ P(read count | genotype). P(genotype). Each variant has a corresponding true genotype vector where the elements are 1 for colonies that descend from a particular branch and 0 for colonies that do not. The prior probability of a mutation having a genotype, P(genotype), is therefore proportional to branch length. The read counts are modelled using a binomial distribution with per sample specific error rates and an assumed VAF = 1/ploidy for variant sites and VAF = 0 for wild type sites. The above process is iterated where each iteration updates the branch lengths (that is, P(genotype)). Having estimated the maximum likelihood probability that each mutation belongs to each branch we then hard assigned mutations. Simulations indicated that this approach did not exhibit obvious biases in branch length estimation vs true branch lengths and that using the edge length implied by the hard assignment had a minimal effect on the deviation from the true edge length (Supplementary Note 2, Supplementary Fig. 2). The sensitivity for detecting somatic variants would be expected to increase both with depth of sequencing as well as with the degree of clonality of a colony. Given that heterogeneous sensitivity of somatic variant identification across the colonies would inadvertently affect branch lengths, particularly private branches, all branch lengths were adjusted for the sensitivity of identified SNVs. We used an approach where fully clonal SNV detection sensitivity was directly estimated from the per colony sensitivity for identification of germline SNVs, in conjunction with a correction for the clonality of the colonies (Supplementary Note 2, Supplementary Figs. 3, 4). In addition to SNVs and indels, the patient colonies exhibited a variety of LOH and CNA events. The events were curated as being present or absent in each of the colonies giving an event genotype vector similar to that obtained for SNVs and indels. Once the tree topology was inferred using the SNV genotypes, the branches that exactly matched the event genotype were identified and the event assigned to the corresponding branch. In the case where the event did not map to the tree topology then the event was further reviewed, and disentangled into repeated acquisitions of the same events and validated by distinctness of CNA breakpoints and haplotypes (Extended Data Fig. 2).

Phylogenetic tree quality assessment
Colonies were initially removed if their CaVEMan somatic mutation detection sensitivity was below 60%. In addition, colonies were tested for cross-contamination with other colonies from the same patient following phylogenetic tree construction. If a colony was a mixture of more than one colony then how this will manifest depended on whether the mixed colonies were from the same mutant clade or not. If the mixed colonies belonged to the same clade with a long shared branch, then the overall sample VAF would appear quite clonal, however the private branches would exhibit a lower than expected VAF.
To check that a colony was sufficiently clonal we required that the mean VAF of variants from a colony that mapped to ancestral branches was not significantly less than 0.35 (Bonferroni-adjusted one-sided binomial test). Additionally, we expected the VAF to be zero in non-ancestral branches and so excluded colonies that had significantly more than 5% VAF on non-ancestral branches (Bonferroni-adjusted one-sided binomial test). The thresholds in the above test reflected the requirement that total contamination be less than 10%. For highly shared early branches, some true somatic variants may inadvertently be filtered out due to their presence as tumour-in-normal contamination in the matched germline DNA sample. We screened all phylogenetic trees to rescue any such variants ( Supplementary Fig. 5). Detailed quality assessment of phylogenetic trees and colony filtering is provided in Supplementary Note 3 and a summary of the proportion of colonies that failed quality thresholds is shown in Supplementary Fig. 6 and Supplementary Table 1.

Mutation rate estimation and timing of branches harbouring driver mutations on phylogenetic trees
We created time-based ('ultrametric') trees, wherein the y-axis of the trees is converted from mutations to time. We jointly fitted wild type rates, mutant rates and absolute time branch lengths using a Bayesian per patient tree-based model under the assumption that the observed branch lengths are Poisson distributed with mean = duration × sensitivity × mutation rate. We also evaluated modelling observed branch lengths using a negative binomial distribution and results were similar. We consider a rooted tree where each edge i consists of an observed mutation count m i and a true duration t i . We refer to a given edge and its child node interchangeably by the same label. D i ( ) is the set of terminal nodes (tips) that descend from node i and A i ( ) is its corresponding set of ancestral nodes excluding the root. We assume that each tip of the tree k has a known corresponding time T k (for example, the post conception age in years of the patient at sampling of the cell) and we therefore have the following constraint: and T t > >0 k i . We incorporate this constraint by performing the optimisation over the interior branches of the tree with reparameterised branch durations x i transformed to be in the range x 0 > >1 i , where x i can be thought of as stick breaking fractions. If j is an edge whose parent node is the root then: t x T k D j = min( : ∈ ( )) j j k . For other interior edges, i, we have The duration of the terminal edges is fixed by the duration of the ancestral interior edges and the overall duration constraint: We assume that there are p − 1 change points in the tree corresponding to the acquisition of driver mutations. This results in p mutation rates λ j applying throughout the tree where we allow at most one change point per branch and the initial ancestral (or wild-type) rate is λ 0 and additional rate change points occur a fraction α j along branch j and descendent branches have the rate λ j unless there are additional change points in descendant branches. The effective rate on branches with a change point going from λ l to λ j is just the weighted average α λ α λ + (1 − ) j l j j where we use a uniform unit interval prior for the α values.
We assume the underlying mutation process follows a Poisson distribution with the above piecewise constant driver specific mutation rates, the number of observed mutations accrued on branch i in time t i measured in years: where we have chosen the parameter c = 100. This reflects only modest uncertainty in our estimates in sensitivity and also allows the model to mitigate larger than expected variability in the branch lengths. In addition, λ λ λ where λ is the naive estimation of a single rate λ as the per patient median of the ratio of the root to tip mutation count and the tip sampling age, and finally we use the weakly informative prior for the stick breaking fractions: where the p i is an initial approximation of the duration of the branch length expressed as a fraction of the sampling time: Copy number regions were masked for the above analysis and the final rates rescaled by the reciprocal of assayed proportion of the autosomal genome (as determined by the background local mutation rate model -see below). The above model will overestimate the duration of early branches because it does not take into account that mutation burden during development is higher 31 . Therefore, we used information from two studies 21,31 to add a time-dependent mutation excess rate into the model and fitted a logistic function to the mutation burden at 3 different ages given across the two studies: This initially maintains an excess instantaneous mutation rate of approximately 150 yr −1 before rapidly dropping to zero between 2 and 4 months and integrating to an excess of 33.5 mutations by 6 months -when added to the steady state rate of 18 mutations per year this implies a mutation burden at birth of around 50 mutations. The above models were coded in R and Rstan and inferred using the Rstan implementation of Stan's No-U-Turn sampler variant of Hamiltonian Monte Carlo method 55 . Models were fitted across four chains each with 20,000 iterations including 10,000 burn-in iterations. The code is available as an R package 'Rtreefit' at https://github.com/ NickWilliamsSanger/rtreefit. Mutation clades for which mutation rates were fitted are shown in Extended Data Fig. 5. The trees are guaranteed to have a root to tip distance that matches the sampling age of the colony.

Estimating the timing of chromosomal aberration events along a branch
We estimated the timing of LOH and CNA events along branches by first estimating the number of expected detectable mutations, L, in LOH/CNA regions for the duration of the branches. We estimated a local relative somatic mutation rate for mutations detectable by CaVE-Man in autosomal regions. This background local mutation rate was measured by counting distinct mutations across a panel of samples consisting of those colonies in the 12 patients that do not exhibit copy number aberrations (457,884 mutations across 734 colonies). The genome was divided into 100 kb bins and the number of passed somatic mutations was counted across all samples in the panel, to give a count c i for bin b i . The probability that a given mutation occurred in bin i was estimated by p b (mut ∈ )= For a branch of duration t and with global mutation rate λ then E L λtp C ( ) = (mut ∈ ) and where it should be noted that errors in t and λ were not included in the variance estimation. To time copy number neutral LOH events, all somatic mutations that occur prior to the LOH event, occurring a fraction x along the branch, will be homozygous with detection sensitivity s HOM and those after will be heterozygous with detection sensitivity s HET . We modelled the mutations as arriving at a constant rate along the branch and fit the following model for: with priors ∼ x Uniform(0,1) and ∼ L N E L L ( ( ), Var( )) and where s = 0.5 HOM (assuming perfect detection of homozygous mutant variants) and s HET is estimated from germline SNPs as previously discussed.
To time chromosomal duplications, somatic mutations that occur prior to the CNA event, occurring a fraction x along the branch, have an equal chance of exhibiting VAF = 1/3 or VAF = 2/3, whereas those occurring after the event will always have VAF = 1/3 and are expected to occur at a 50% greater rate.
∼ where the priors are as in the LOH model above. The detection sensitivities s 1/3 and s 2/3 are similar to s het because of the additional sequencing depth afforded by the duplication. Unlike the LOH case, the value of x is relatively unaffected by L because of the similarity of s 1/3 and s 2/3 .

Estimation of the rate of mutation in wild-type and mutant colonies
We used two orthogonal methods to estimate the rate of mutation acquisition in wild-type cells over age. Mixed effects modelling was used to calculate the relationship between age and mutation acquisition.
To account for inter-patient rate heterogeneity as well as intra-patient variance we fitted a linear mixed model for the wild type adjusted mutation burden, with the patient as a random effect using the nlme package in R.
For each patient we included the time point with the most wild-type colonies and we excluded time points with <3 wild-type colonies. The model was specified as: lme(nsnv_count~age_at_sample_pcy, random =−1 + age_at_sample_pcy|patient (Supplementary Note 5). The second method utilised the time-based 'ultrametric' trees created for individual patients (Extended Data Fig. 5). The resulting posterior mean and standard error of each patient's wild type rate were combined using a random effects meta-analysis. The model incorporated an excess mutation rate in early life that added an average of 33.5 mutations during the first 6 months post conception, this corresponds to fixing the intercept in a linear model to a mean of 33.5 mutations at conception, or around 50 at birth. The mean branch timings were directly sampled from the MCMC posterior distribution and by construction the resulting trees are guaranteed to have a root to tip distance that matches the sampling age of the colony. The resulting posterior mean and standard error of each patient's wild type rate were combined in a random effects meta-analysis using the R package metafor. The cohort wide wild type rate from this approach is depicted on Extended Data Fig. 3a. The within patient difference in burden between wild type clades and mutant clades were also tested using a non-parametric method limma's rankSumTestWithCorrelation where the non-independence of the samples in the mutant clade is corrected using a single estimate of the average correlation between samples. The pairwise correlation, c i j , , of colonies i and j in the mutant clade were estimated as where b i is the total adjusted burden of colony i and b i j , is the total adjusted length of branches shared by i and j . The underlying assumption behind this correlation estimation is that the expected variance in burden is directly proportional to the burden. Note that rankSumTest-WithCorrelation requires one of the comparison sets to be independently sampled. We therefore only carried out comparisons for cases where the mean correlation between wild type colonies is less than 0.01.

HSC simulator Rsimpop
A birth death process was used to model the evolution of a population of cells, tracking each cell as it symmetrically divides. Each cell had a rate of symmetric division and a rate of symmetric differentiation (or death). Asymmetric divisions did not affect the HSC genealogy and were therefore not explicitly included in the model. The wild type rate of symmetric division was, measured in divisions per day. We modelled selective advantage s as an increased rate of symmetric division α α s = (1 + ). mut We assumed during the growth phase that the cells population grows unrestrained by death. Once the specified equilibrium population size, N, was reached then the death rate β, which is the same for every cell, matched the average division rate: For some constant t m .
Following the acquisition of a driver mutation the mutant cell population grows stochastically until the population is sufficiently large, when the growth becomes essentially deterministic following a logistic growth function where in the early stages the exponential growth process exhibits an annual rate of growth S, given by: The above model is implemented using the Gillespie algorithm where the waiting time until the next event is exponentially distributed with a rate given by the total division rate + total death rate, this event is then: division with probability = total division rate/(total division rate + total death rate). If the event is division then the choice of which cell is given by a probability proportional to the cell's division rate whereas if the event is death then all cells are equally likely to be chosen.
Implementation was in C++ with an R based wrapper as an R package rsimpop (see https://github.com/NickWilliamsSanger/rsimpop). The simulator maintains a genealogy of the extant cells, together with a record of the number of symmetric divisions on each branch, the absolute timing of any acquired drivers and the absolute timings of branch start and end. Assessment of the expected behaviour of rsimpop is detailed in Supplementary Note 8. The package also provides mechanisms for sub-setting simulated genealogies whilst preserving the above per branch information.

Approximate Bayesian computation
ABC is a flexible inference procedure that enables the inference of parameters governing a well-defined process that is simple to simulate but for which it might be difficult to calculate the likelihood. This approach has been previously used to infer the HSC population and cell division rate parameters under the assumption of a neutrally evolving population with a stable population size 20 . The general approach was to recapitulate the experiment in-silico, to reproduce the timing and shape of clonal expansions in our phylogenetic trees, using our simulator to generate sampled trees. If the mutation rate, lambda, is the mean root to tip distance divided by the age of sampling, and the mutation counts at the start and end of the branch carrying the driver in the experimental tree are denoted M start and M , end respectively, then the procedure was as follows: • Fix symmetric division rate at 1 division per year. • Simulate the tree with initial division rate of 0.1 per day until population has grown to the equilibrium population size. • Simulate neutral evolution until time T driver .
• Save the state of the simulation (*) • Introduce the driver with the specified selection coefficient.
• If the driver lineage dies out before the sampling age is reached then return to the saved state (*) A tree with the observed number of mutant samples is subsampled from the population of extant cells.
For the case where M is small (<50) we need to account for the discrete character of M and also allow for the drivers being acquired during the growth phase of the population, as might occur early in life. We wished to find the probability distribution for the age at which a given observed mutation count M occurred. We used Baye's rule and assumed a uniform prior for λt between 0 and λT , and that the mutations are Poisson distributed with rate λt. This gives the following cumulative distribution function for λt: This distribution is referred to as m λT PoisInv( , ) and used for sampling λt in 'mutation time' space. To account for the elevated mutation rate that occurs during embryogenesis we used the same model as described above (section 'Estimation of the rate of mutation in wild-type and mutant colonies'). This implies a time varying rate with the following average accumulation of mutations between time t 0 and t 1 : For each simulation the following summary statistics were calculated (i) Total deviation in the simulation mutant clade's number of lineages through time (LTT) with respect to the patient clade of interest, and (ii) deviation of the simulation-based population clonal fraction with respect to the patient clade's aberrant cell fraction (ACF). In all cases but two, ACF was calculated as the proportion of sampled mutant clades. For PD5163_JAK2 and PD5182_JAK2 clades, the aberrant cell fraction at diagnoses were used prior to the commencement of interferon-alpha treatment which led to clone size reduction. The total distance score is then calculated as the sum of the above two scores where each score is expressed as a rank. For each clade approximately 1 million simulations were run (Extended Data Fig. 6b). In each case, the posterior distribution was approximated by the top 0.01% simulations.

Calculation of aberrant cell fraction in recapture samples
Recapture samples were used to validate inferences derived from phylogenetic trees. In recapture samples, a per branch ACF was calculated as twice the aggregate mutant read fraction where only autosomal variants that map to the branch and are outside of the copy number aberrant regions are included. For each patient, the trajectory of the ACF was retrieved from the top 0.01% of simulations. The simulator periodically takes snapshots at approximately daily intervals and measures ACF as the current fraction of cells that carry the driver mutation. Subsequently, for each simulation these daily snapshots are binned into a common sequence of 10-day periods over each of which the ACF is averaged. We present the 2.5%, 50% and 97.5% quantiles for each binned period calculated across the simulations.

Stochastic Extinction
The probability of extinction in a homogenous birth death process is the ratio of death rate to birth rate 56 α α S + log(1 + ) which in our case is .
s 1 1 + In the ABC simulations we record the number of attempts required to introduce the drivers. We then verified that the simulator behaved as expected by gathering all simulations across the analysed mutant clades and then restricted to the 13,048,861 simulations where the driver was introduced after development (>1 year post conception). The simulations were binned into selection coefficient bins of width = 0.05 and log 10 (N) bins of width = 0.1. The extinction probability was estimated in each bin using the maximum likelihood estima- where q is the number of simulations in the bin and each simulation, i, fixes after a i attempts.

Mutational signatures
De novo signature extraction was carried out using SigProfiler. The SNVs that were mapped to the 12 patient trees were divided into per patient driver/wild-type clade-based groupings, where the mutations mapped to the clades were treated as an individual sample for the purposes of mutational signature extraction as detailed in Supplementary Note 6.

Telomere analysis
The mean telomere lengths of the colonies were estimated using Telomerecat with batch correction using cohort wide information to correct the error in F2a counts as detailed. Given the slight discrepancies in length found on multiple readings, each telbam was analysed 10 times and the average length was used. To compare differences in telomere lengths, the degree of shared clonal origin of the colonies was taken into account. To assess for telomere attrition in relation to mutant JAK2, we fit a phylogeny aware mixed model for the mean telomere length with a patient specific intercept using the MCMCglmm library in R (iterations = 1,100,000, burnin = 100,000, thinning interval = 1,000). Further information is provided in Supplementary Note 7.

Testing genes under selection
We searched for genes that exhibit a deviation of ratio of non-synonymous to synonymous variants as evidence of non-neutrality. The SNVs and indels were grouped by individual branches across the cohort and the function dndscv from R package dndscv (version dnd-scv_0.0.1.0) was executed using the default options. The analysis highlighted that only JAK2, DNMT3A, PPM1D and TET2 were significantly under selection.

Phylofit
As an alternative to ABC method we have developed an efficient MCMC approach that models selection/growth by directly fitting the three parameter deterministic phase population trajectory using the joint probability density of coalescence times given the population size trajectory. The model is essentially a parametric adaptation of the phy- , and the selection coefficient, s αŝ= , we arrive at the following log-likelihood: Where Phylo is the distribution defined by the coalescence based log-likelihood equation above.
Additionally, assuming unbiased sampling, we can optionally incorporate the number of sampled mutant-type colonies n mut out of n tot total colonies as an additional layer in the model The input data for this approach is an ultrametric tree. For our samples we obtain the ultrametric tree using the Bayesian tree model presented in Section 8. Credibility intervals for this method do not consider the uncertainty in the branch lengths of the input ultrametric tree, but the model allows us to straightforwardly incorporate what is likely to be the principal source of additional uncertainty, the timing of the first split. We achieve this by optionally adding a normally distributed common offset to the coalescence timings t i .

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this paper. *P < 0.05, **P < 0.01 (** also significant after multiple hypothesis testing; Bonferonni adjusted, two-sided test). Significantly different mutation rates between clades are highlighted only for those significant by both Poisson and Negative Binomial modelling of mutation rates ( Methods). Average mutation burdens are shown to the right for the different timepoints of sampling. b. Non-parametric comparison of mutation burdens in wildtype versus mutant colonies using limma's rankSumTestWithCorrelation. This accounts for the non-independence of data in mutant colonies but does not account for the timing of driver mutation acquisition. *indicates significance at P < 0.05 following Bonferonni multiple hypothesis correction. Time-based phylogenetic trees. Different coloured branches identify separate clades alongside light blue wild type colonies. The vertical axis represents age post conception with treatment received alongside. Driver mutations are depicted in the middle of the branches but may have occurred at any point between the start and end of the branches. Given the uncertainties in the exact ages at the starts and ends of the branches due to modelling branch lengths from mutation count data ( Methods), the credibility intervals for the ends of the branches harbouring driver mutations are shown as black lines and also in b-c. b. Each horizontal grey box represents an individual patient from birth until the last colony sampling timepoint. The time before birth is represented on an expanded scale and is shaded pink. Within each grey box is shown the range of ages during which driver mutation and copy number aberrations are estimated to have occurred. The start and ends of each coloured box represent the median lower and upper bounds of time estimates corresponding to the start and end of the shared branches harbouring driver mutations. Thus, the upper bounds (right edge of the coloured boxes) represent the latest time by which mutation acquisition is estimated to have occurred from phylogenetic analysis. Black lines show the 95% credibility intervals for the start and end of the branches carrying the drivers. Mutation timings are inferred from a model where mutation accumulation within branches follows a Poisson distribution but were not substantially different when using a Negative Binomial model. Diamonds show age at diagnosis. c. Raw data from a-b is shown with 95% CI intervals around the estimated ages of the starts and ends of branches harbouring driver mutations for different patients, together with adjusted SNV counts for branches.  Marginal distributions are also shown. The prior distribution for driver timing is clade dependent and is largely determined by the mutation count at the start and end of the associated branch. Both clonal fractions and lineages-through-time were used as summary statistics in the approximate Bayesian computation for estimates of selection. Main plots show driver mutations acquired after birth, and driver mutations pre-birth are shown within the black box, taking into account driver mutation acquisition during a time when the background stem cell population size is modelled to be growing. b. Data from a. in tabular format. Here, selection coefficients have been converted to clonal expansion (median growth % per year, Selection). The ABC approach gives alternative estimates for ages of driver mutation acquisition as shown. N depicts the number of simulations per clade. Clones with sufficient immediate descendants (>5 coalescences) were included for estimates of selection. c. Comparison of estimates of selection of mutant clades (each labelled by patient ID and driver mutation) from ABC versus Phylofit. The grey lines show 95% credibility intervals for estimates from each approach. Correlation coefficient r = 0.96. Note, that the PD5182 and PD5847 in-utero DNMT3A expansions from panel a. are not shown because, only the ABC approach, and not Phylofit, allowed for modelling selection against a growing background population.