A multi-stage model of cell proliferation and death: tracking cell divisions with Erlang distributions

Lymphocyte populations, stimulated in vitro or in vivo, grow as cells divide. Stochastic models are appropriate because some cells undergo multiple rounds of division, some die, and others of the same type in the same conditions do not divide at all. If individual cells behave independently, each can be imagined as sampling from a probability density of times to division. The most convenient choice of density in mathematical and computational work, the exponential density, overestimates the probability of short division times. We consider a multi-stage model that produces an Erlang distribution of times to division, and an exponential distribution of times to die. The resulting dynamics of competing fates is a type of cyton model. Using Approximate Bayesian Computation, we compare our model to published cell counts, obtained after CFSE-labelled OT-I and F5 T cells were transferred to lymphopenic mice. The death rate is assumed to scale linearly with the generation (number of divisions) and the number of stages of undivided cells (generation $0$) is allowed to differ from that of cells that have divided at least once (generation greater than zero). Multiple stages are preferred in posterior distributions, and the mean time to first division is longer than the mean time to subsequent divisions.


Introduction
Cells of the immune system patrol our bodies for months or years [1,2]. During an adaptive immune response, a subset of specific cells, initially a small fraction of the total population, expands as cells undergo multiple rounds of division over a few days [3]. Although most of these cells die as the infection is overcome, lasting immunity is ensured by the transformation, or "differentiation" of individual cells to a memory phenotype. The most convenient mathematical and computational models of the dynamics of cell populations, which consider heterogeneity at the single-cell level, are Markov models. In these models, the variables describe the numbers of cells of each type as a function of time, and cellular events such as division, death or differentiation are defined by their associated rates; each event corresponds to a possible fate of an individual cell and cells are independent of each other. In this formulation, interevent times are exponentially-distributed random variables, with probability density maximised at time 0. Rapid expansion of cohorts of lymphocytes is recreated in laboratories, either by stimulation in vitro or by transferring cells to lymphopenic mice. However over the timescales of such experiments, hours or days, it is not appropriate to treat cell division as an instantaneous event. Rather, cells are "cycling" through gap, synthesis and mitosis phases (G 1 /G 2 , S and M), and daughter cells cannot approach introduced in Ref. [39] by considering cell death in Section 2. Analytical solutions are derived under the assumption of identical birth and death rates across stages, and the limiting behaviour of the cell population is studied as t → +∞. In Section 3 cell generations are introduced in the model and an analytical study of the system is carried out. In Section 4 we calibrate the multi-stage model with CFSE data from Ref. [19], and compare its performance with a simple exponential model of cell division. A final discussion is provided in Section 5.

A multi-stage representation of cell division and death
We present a multi-stage model of the time between cell divisions. The stages defined here are arbitrary, and not directly related to the biological phases of the cellular cycle. The cell cycle is divided into N different stages, and the cell is required to sequentially visit each compartment (or stage) in order to divide. At each stage, each cell may either proceed to the next one or die. Let 1/λ (j) , j = 1, . . . , N , be the mean time needed to progress from stage j to the subsequent one j + 1. The time to go from stage j to the next one, j + 1, follows an exponential distribution with rate λ (j) . We will refer to these rates as birth rates from now on. On the other hand, at each stage j the cell may die with death rate µ. Figure 1 shows the dynamics of the cells for the multiple stages. We note that this multi-stage representation is equivalent to considering two independent clocks for cell division and death, which compete to decide the cellular fate. The time-to-death clock follows an exponential distribution with rate µ, while the division time follows a continuous phase-type distribution with parameters τ and T [18], as defined in Figure 1. A particular choice of phase-type distribution is the Erlang(λ, N ), which is a concatenation of N identically distributed exponential steps, where all birth rates are equal: λ (j) = λ, j = 1, . . . , N . We analyse this particular scenario in Section 2.1. Let S j (t) be the random variable which counts the number of cells in compartment j at time t, j = 1, . . . , N . Let us denote by M j (t) = E[S j (t)], the expected value of S j (t). Using the moment generating function, or considering the events that can happen in a short time interval ∆t as ∆t → 0 + , the time evolution of the mean number of cells in each stage can be represented by the following set of differential equations (2.1) We alert the reader that our analysis in the following sections is an extension of the results in Ref. [39], and follows similar arguments.

Identical birth rates across stages
We now focus on the simpler case where identical birth rates are assumed for each cell across different stages; that is, λ (j) = λ, j = 1, . . . , N , so that the phase-type distribution for the time to division in Figure 1 is an Erlang(λ, N ). This implies the mean time to division is given by N λ . We note here that when N = 1, our model is nothing but a Markovian linear birth and death process with parameters λ and µ, where a cell's time to division and death are independent exponential distributions that compete to decide cellular fate. Figure 1: Multi-stage model of the cell cycle. The cell cycle is divided into N different stages. A cell has to visit all the compartments (or stages) in order to divide. At each stage j, j = 1, . . . , N , the cell may proceed to the next stage, with birth rate λ (j) , or die, with death rate µ.
Under the assumption of identical birth rates across stages, Equations (2.1) become We adapt here the arguments from Ref. [39] to analytically solve Equations (2.2). In particular, we rewrite these in terms of the variables m j (t) = e (λ+µ)t M j (t), j = 1, . . . , N . This leads to From Equations (2.3), an N th order homogeneous differential equation for m N (t) follows together with a set of ODEs that relate m j (t) to the derivatives of m N (t) with respect to time Equation (2.4) can be solved following the same arguments presented in Ref. [39]. This leads to for any j = 1, . . . , N and t ≥ 0, where z = e 2πi N is the first N th root of unity and c k (for k = 0, . . . , N − 1) are constants which depend on the initial conditions. In particular, for C 0 cells in the first stage and zero cells for any other stage at time t = 0, one gets Consequently, the analytical solutions of the system (2.2) for these initial conditions are written in terms of the original variables as Therefore, the expected total number of cells in the population at time t, M (t), is computed as (2.7)

Limiting behaviour
The analytical solutions above enable the study of the limiting behaviour of the population as t → +∞, from the dual perspective of the single compartment and the population as a whole. We first consider the coefficient of t in the exponent inside the summation in Equation (2.6), to verify whether there exists a dominant term. When k = 0, the exponent is given by When k > 0, we notice that Since the cosine function is always less or equal to 1, the right hand side of (2.8) is dominated by 2 1/N − 1 λ − µ for all k = 1, . . . , N − 1. This means that the leading term in the summation of Equation (2.6) is the one corresponding to k = 0. To conclude our analysis, we can distinguish the following three cases: 1. µ = (2 1/N − 1)λ. The exponent of the term corresponding to k = 0 is zero. For k > 0, the exponents become 2 1/N λ(z k − 1), which are negative for all k = 1, . . . , N − 1. Therefore, cells at stage j and the total population have the following limiting behaviour 2. µ > (2 1/N − 1)λ. The exponent of the term corresponding to k = 0 is negative. Since it is the dominant term, the exponent is also negative for k = 1, . . . , N − 1, and therefore the cell population will become extinct as time evolves, so that lim t→+∞ M j (t) = 0 for all j = 1, . . . , N . Figure 2 (centre) shows an example of extinction when N = 5, λ = 0.5 t −1 , µ = 0.1 t −1 and the initial number of cells is C 0 = 10 2 , where t is the unit of time.
3. µ < (2 1/N − 1)λ. Since the dominant term corresponds to k = 0, (2.9) From Equation (2.9), one can compute, for instance, the limiting behaviour of the ratio between M 1 (t) and M N (t) as which is confirmed by the plot in Figure 2 (right). To study what happens to the total population size, we compute M (t) as t → +∞. We derive its time evolution from Equation (2.2) (2.11) From our previous arguments, the leading term in the summation of Equation (2.7) is the one corresponding to k = 0. Therefore, for large values of t we have (2.12) We can notice that the expected total number of cells in the multi-stage representation is always lower than the corresponding one in the single-stage model, as it can be easily checked by considering N = 1 in Equation (2.12).

Mean fraction of cells at each stage
In this section we show how, analogously to the process in Yates et al. [39] where there is no cell death, cells are not distributed proportionally to the residence time in a given stage. We define the mean fraction of cells at each stage, P j (t), as the ratio between the mean number of cells in the compartment and the expected total number of cells in the population, i.e., , j = 1, . . . , N. (2.13) From Equations (2.2) and (2.11), it follows that which has the following steady state solution (2. 15) One observes that P * j < P * j−1 , j = 1, . . . , N − 1, which means (on average) the number of cells decreases stage by stage, independently of the initial distribution of cells. In fact, one can solve Equations (2.15) to determine the analytical expression of the steady state fraction, P * j . We have 16) which does not depend on λ or µ.

A multi-stage model of cell proliferation and death tracking generations
The mathematical model defined in Section 2 is generalised here with the inclusion of cell generations. In particular, this model allows one to track the number of divisions a given cell has undergone over time. This will enable one to link cellular dynamics to CFSE data. CFSE is an intracellular dye that dilutes two-fold when a cell divides. At the beginning of the experiment cells are labelled with the dye. Then, harvesting the cells and measuring CFSE intensity by flow cytometry at particular time instants generates cellular profiles, and by quantifying the fluorescent intensity of any given cell, one can ascertain the generation that this cell belongs to, i.e., the number of divisions that this cell has undergone. CFSE data typically display a number of intensity peaks, which reflect the number of divisions that cells of that peak have undergone. The maximum number of peaks is usually 9 or 10 due to the fact that after 10 divisions, the intensity of the dye is 2 10 fold lower than that of the initial one, and comparable to the auto-florescence of cells [10]. In our extended model, each cell cycle identifies a generation and a cell belongs to generation g if it has undergone exactly g divisions. Thus, cells at the beginning of the experiment, when the dye is given, would comprise generation 0 only. Following the arguments of Section 2, and for a given g, the cell cycle is split in N g different stages, where we assume this number might depend on the generation g considered. A cell in generation g has to sequentially visit all N g compartments to divide. On the other hand, cells might also die at each stage of the cycle. As depicted in Figure 3, if a cell belongs to generation g and lies in compartment j, j = 1, . . . , N g − 1, it may proceed to the Figure 3: Multi-stage model with cell generations. Each cell in generation 0 has to visit all the N 0 compartments in order to divide. When cells arrive at the last stage of generation 0, N 0 , they may divide with rate λ 0 or die with death rate µ 0 . If a cell divides, the daughter cells join the first compartment (or stage) of the next generation, and the process continues.
following stage, with birth rate λ g , or die with death rate µ g , and these rates can depend on the generation. When a cell reaches the last stage, N g , of generation g and divides, the two daughters will join the first compartment of generation g + 1. In summary, given a cell in generation g, its time to division follows an Erlang distribution with parameters (λ g , N g ), whereas its time to death follows an exponential distribution with rate µ g . These distributions correspond to two independent competing clocks to control cellular fate, similarly to those considered in Figure 1. We aim to compute the mean number of cells in each generation, which will be used to calibrate the model using CFSE data, as shown in Section 4. We define the random variables S g j (t), g ≥ 0, j = 1, . . . , N g , which count the number of cells in stage j, generation g, at time t ≥ 0. Letting M g j (t) be the expected value of S g j (t), the set of M g j (t) obey the following differential equations In what follows, and keeping in mind our interest in modelling CSFE data, we assume there exists a maximum generation G that can be measured by the dye. Thus, one might be interested in following cells from generations g = 0, . . . , G. For these generations, Equations (3.1) can be solved by making use of the matrix exponential. To this end, let M (t) be the column vector of the mean number of cells in each stage and generation as time evolves, i.e., which has length G g=0 N g , and where column sub-vectors M g (t) contain the mean number of cells across stages in generations g = 0, . . . , G. Let us also define the coefficient matrix A is then a real square matrix of dimension G g=0 N g , and 0 a×b represents a null matrix with dimension a × b. Given the vector of the initial conditions n 0 , which has length G g=0 N g , the system (3.1) can be rewritten as the following Cauchy problem The solution of the system is given by represents the matrix exponential . For efficient ways of computing this matrix, we refer the reader to Refs. [14,15,27,28]. Finally, we note that since CSFE data describe the number of cells in each generation, one can then compute the mean number of cells in each generation over time as From (3.1) and (3.2), one can easily compute the dynamics of M g (t) as follows (3. 3) The solutions of the system (3.1) can be written in a closed analytical form under more restrictive assumptions. An example is given by the hypothesis of identical number of stages and rates, λ and µ, across generations, as illustrated in detail in Section 3.1. Another simplified scenario is the one where the number of stages is equal to 1 for all the generations, i.e., N g = 1 for all g ≥ 0, which leads to the following equations for M g (t) Here we are considering that at time t = 0, there are C 0 cells in generation 0, so that n 0 = (C 0 , 0, . . . , 0). In this particular case, which has been previously considered in Refs. [5,23,24,31], the inter-event times of cell death and division are modelled as exponential random variables, rather than Erlang distributions.

Identical birth and death rates and number of stages across generations
In this section we analyse the special case when the number of stages N g and the birth and death rates, λ g and µ g , respectively, do not depend on the generation g. In this case, all the cycles are comprised of the same number of stages N , and the common birth and death rates are denoted by λ and µ, respectively. Under these assumptions, it is possible to obtain an analytical expression for the mean number of cells in each generation. In particular, Equation (3.1) becomes These equations can be rewritten in terms of the new variables m g To determine the solutions of Equation (3.6), we focus here on the case M 0 1 (0) = m 0 1 (0) = C 0 and all the other compartments are empty at time t = 0. It is clear that m 0 1 (t) = C 0 for t ≥ 0, and by solving Equation (3.6) recursively one gets This expression allows one then to determine the mean number of cells across different stages in generation 1, From Equation (3.7), it follows recursively that the mean number of cells in each compartment j of generation g is given by Going back to the original variables, M g j (t), the solutions of (3.5) have the following analytical expression As expected, we note that since cells in each generation and compartment either proceed to the next stage within their generation, divide (proceeding to the next generation), or die.
Once the mean number of cells in each compartment for a given generation is at hand, the expected number of cells in each generation can be determined according to Equation (3.2). In particular, we can write This equation is consistent with the results of the exponential model [23], which corresponds to the particular case N = 1. On the other hand, if one is interested in the mean number of cells in each compartment, M j (t) for j = 1, . . . , N , regardless of the generation that they belong to, this can be computed as for j = 1, . . . , N and t ≥ 0. In practice, one could truncate the series above to get an approximation of the mean number of cells in each stage. However, we note that one can use instead the solution provided by Equation (2.6), since the dynamics of this model is equivalent to that described in Section 2, when the parameters N , λ and µ are generation-independent. It can be numerically checked, that this indeed provides equivalent results. In fact, when N = 1 or N = 2, it is straightforward to analytically prove the equivalence. In the former case, it is enough to recall the power series of the exponential function. In the latter case, we derive from (2.6) where we used the fact that z = e πi = −1. On the other hand, from (3.10) we obtain 2 , which shows that the two models lead to the same result for the expected number of cells in each stage.

Comparison with the cyton model
The cyton model is a stochastic model proposed to describe the population dynamics of B and T lymphocytes [17]. Division and death times are regulated by two independent clocks, and the competition between these mechanisms determines the fate of the cell (see Figure 1). When a cell divides, these clocks, which depend on the number of divisions the cell has undergone, are reset for each daughter cells. However, when analysing an in vitro experiment with this type of cells, there is evidence that not all cells either divide or die. For instance, a portion of them may not respond to the stimulation [30], or may respond without division [6]. This is the reason why a progressor fraction is defined in the cyton model. This progressor fraction represents for a given generation, the fraction of cells that are capable of undergoing further division. Each cellular fate mechanism is described in terms of a probability density function, and the parameters that define these probabilities are the free parameters in the model. Right skewed distributions, such as lognormal or gamma, are usually adopted to characterise the two independent clocks that regulate cell division and death. In summary, the cyton model is based on the following assumptions: • death and division are stochastic processes, characterised by a probability density function for the time to divide or die, respectively, • these processes are independent, and compete to determine the fate of the cell, • the clocks responsible for these processes are reset when a cell divides, • only a fraction of the cells in each generation are capable to undergo further divisions, and • the machineries that regulate cellular fate depend on the cell's generation.
In order to translate these assumptions into mathematical terms, let γ g be the progressor fraction characterising cells having undergone g divisions, and let φ g (·) and ψ g (·) represent the probability density functions for the time to division and death, respectively, for cells in generation g. As described in Ref. [17], the number of cells dividing for the first time, or dying, per unit time at time t ≥ 0 can be calculated, respectively, as: where C 0 is the initial number of cells in the population. Consequently, the time evolution of the expected number of cells in generation 0, M 0 (t), obeys the differential equation On the other hand, the number of cells in generation g dividing, or dying, per unit time at time t can be computed, respectively, as Hence, the dynamics of the average number of cells in each generation as time evolves, M g (t), is governed by the differential equations d M g (t) dt = 2n div g−1 (t) − n div g (t) − n die g (t), g ≥ 1.
(3. 16) In the next sections we show how the cyton model is equivalent to our model for particular choices of the probability density functions of the division and death clocks, φ g (·) and ψ g (·), and the progressor faction γ g .

Exponential time to division and death
We now assume that the number of stages in all the generations is equal to one, i.e., N g = 1 for all g ≥ 0. This means that cells in generation g divide after an exponentially distributed time with rate λ g , and die with rate µ g . Therefore, Equations (3.3) become (3.17) It is clear that, in this case, our model is equivalent to the cyton model with exponential times for division and death, and progressor fraction γ g = 1, g ≥ 0. One can show this analytically by proving that n div g (t) = λ g M g (t) and n die g (t) = µ g M g (t). This can be shown by induction on g. In the cyton model, the assumption of exponential time to division and death implies that φ g (t) = λ g e −λgt and ψ g (t) = µ g e −µgt , g ≥ 0. Therefore, according to Equations (3.11) and (3.12), the number of cells dividing for the first time or dying to exit generation 0 per unit time at time t is From (3.4) we know that in our model M 0 (t) = C 0 e −(λ0+µ0)t . Therefore, n div 0 (t) = λ 0 M 0 (t) and n die 0 (t) = µ 0 M 0 (t), which proves the case g = 0. We assume the identities n div g (t) = λ g M g (t) and n die g (t) = µ g M g (t) hold for generation g and we prove them for generation g + 1. From (3.4) and (3.14), we can write For the number of cells in generation g + 1 dying, Equation (3.15), together with Equation (3.4) lead to which concludes the proof. Making use of the identities n div g (t) = λ g M g (t) and n die g (t) = µ g M g (t) in (3.13) and (3.16), one obtains that M g (t) and M g (t) obey the same differential equations for all g ≥ 0. Thus, the two models are equivalent.

Erlang time to division and exponential time to death
We now consider the more interesting scenario where the number of stages in each generation is greater than one, so that one gets a multi-stage representation for the proliferation process of each cell. We focus here on the scenario described in Section 3.1, where identical number of stages N and birth and death rates, λ and µ, respectively, are considered across generations. Similarly to the previous case, we prove that n div g (t) = λM g N (t) and n die g (t) = µM g (t) by induction on g. Since a cell's time to division is Erlang distributed and a cell's time to death is exponentially distributed, ψ g (t) = µe −µt for all g ≥ 0 and where the progressor fraction is again set to 1 for each generation. Note that in this case the parameters in φ g (·) and ψ g (·) are independent of the generation g, since the number of stages and the birth and death rates are identical across generations. From (3.11) and (3.12), the number of cells dividing for the first time or dying to exit generation 0 per unit time at time t is The dynamics of the expected number of cells in generation 0 is given by (3.13), as in the previous case. From (3.8) and (3.9), we know that in our model, we have Therefore, n div 0 (t) = λM 0 N (t) and n die 0 (t) = µM 0 (t), which concludes the case g = 0. Replacing these identities in (3.13) leads to which is the differential equation derived in (3.3) for M 0 (t) in our model. Now, we suppose that the identities n div g (t) = λM g N (t) and n die g (t) = µM g (t) hold for generation g and we prove them for generation g + 1. From (3.14) and the induction hypothesis, where we used Equation (3.8) for the last identity.
If we now look at the number of cells in generation g + 1 dying per unit of time, Equation (3.15), together with the induction hypothesis, leads to where the last identity was obtained making use of Equation (3.9). Hence, Equation (3.16) becomes which is identical to (3.3) for M g (t), g ≥ 1. This concludes the proof of the equivalence between the cyton model and our model with generations when a cell's time to divide is Erlang distributed with parameters λ and N , and a cell's time to death is assumed to be an exponential with rate µ. In this way we have shown that the analysis presented in this section for the multi-stage model with Erlang division time and exponential death time leads to exact closed solutions for the cyton model with the previous choice of clocks.

Model calibration
In this section we illustrate how the multi-stage model tracking cell generations can be calibrated making use of CFSE data. To this end we perform Approximate Bayesian Computation based on Sequential Monte Carlo (ABC-SMC) methods [35]. Parameter inference is performed with the multi-stage model with cell generations described in Section 3, and its exponential (or single-stage) version, which results from setting the number of stages equal to 1 for all generations, i.e., N g = 1, g ≥ 0.
The data sets we make use of are taken from an experimental study of lymphopenia-induced proliferation [19]. This response has been observed to vary between different T cell clonotypes (i.e., the set of T cells with the same T cell receptor (TCR) expressed on their surface). Hogan et al. transferred CFSE-labelled OT-I or F5 T cells intravenously to lymphopenic mice. A certain number of days (3, 4, 5, 6, 7, 10, 12 and 18 days) after the transfer, spleens and lymph nodes were recovered from the mice and analysed by flow cytometry to quantify the expression levels of CD8, CD5, CD44, and CFSE dilution [19]. For each time point the number of mice analysed was between 3 and 7. We note that two independent transfer experiments, carried out under identical conditions, were performed: one for OT-I cells and a second one for F5 (see Figure 4). Figure 4 clearly shows that OT-I T cells proliferate faster than F5, so that by day 7 there are OT-I cells in generation 10, whereas for F5 cells the maximum generation at day 7 is 6. This greater proliferation of OT-I cells eventually leads, after one week, to competition for resources (e.g., IL-7 cytokine) and the OT-I population approaching a carrying capacity [19]. Since our model does not account for competition, it can only appropriately describe the dynamics of OT-I cells during the first week of the experiment. Thus, for OT-I cells we will only make use of the data set until day 7. Yet for the F5 population we will use the entire data set. In Ref. [19] this competition was explained with the assumption of a density-dependent birth rate, λ(P ), as follows whereλ represents the rate of growth in the condition of unlimited resources, δ quantifies the size of reduction caused by the expansion of competing cells, and P is the size of the population [19]. Figure 5 shows the density-dependent birth rate, λ(P ), as a function of the population size P . It suggests that the competition for resources is more significant in the case of OT-I T cells. In the experiments the number of OT-I cells after one week (about 5 × 10 5 ) is larger than the population of F5 T cells at day 18 (about 4 × 10 5 ). Therefore, the population of F5 T lymphocytes never reaches the carrying capacity and the role of competition for resources can be neglected for this clonotype. Before performing the Bayesian inference, we make some assumptions in our model. Several studies have shown that the first division in this type of experiments usually requires a longer time than subsequent 2 divisions need, since cells may take some time to become activated before starting to divide [17,21,25]. Thus, we assume here that all generations but 0 are comprised of the same number of stages N , whereas generation 0 is characterised by N 0 stages. Similarly, cells in generation 0 proceed to divide with birth rate λ 0 , whilst all the other generations have a birth rate λ. On the other hand, we propose per cell death rates over generations to be linearly dependent on the number of cell divisions that the cell has undergone [10,26], as follows µ g = α · g, g ≥ 0,  Table 1].
where α is a parameter to estimate. These linear death rates encode the fact that cells are more likely to die when they have already undergone several divisions [10,26]. We also merge, for the dataset in Figure 4, cells within the highest generations into a single group 5+, which combines all the cells that divided five or more times. This is to reduce errors in the quantification of labelled cells with low CFSE fluorescence, as is the case for five or more divisions, as described in Refs. [4,10]. Finally, the initial number of cells, C 0 , is considered a parameter to estimate in the model, since the actual number of transferred cells which made it to the lymph nodes or spleen is not measured. We estimate model parameters with the ABC-SMC algorithm [35]. Thus, the posterior distribution of the parameters is obtained by T sequential applications of the ABC algorithm, where the posterior obtained in each iteration is used as prior for the next iteration. This algorithm requires the definition of prior distributions for the first iteration, a distance function, a tolerance threshold for each iteration, and a perturbation kernel [35]. We assume all parameters are initially distributed according to a uniform prior distribution, as described in Table 1. When a prior distribution spans several orders of magnitude, the uniform distribution is taken over the exponent to efficiently explore the parameter space. Given the data point x g D (t) which denotes the experimentally observed number of cells in generation g at time t, for g ∈ {0, 1, 2, 3, 4, 5+}, and the corresponding model prediction, x g M (t) = M g (t) for a particular choice of parameters θ = (C 0 , N 0 , N, λ 0 , λ, α), the distance function is defined as where T is the set of time points and depends on the clonotype of interest, σ g D (t) represents the standard deviation of the experimental data point at time t and generation g, and G is the merged generation class G = 5+. In practice, we define the first tolerance threshold ε 1 in the ABC-SMC algorithm as the median value of the distances obtained from 10 4 preliminary realisations, with the parameters sampled from the prior distributions in Table 1. The subsequent tolerance thresholds, ε j , j = 2, . . . , T can be then defined as the median of the distance values obtained from the previous iterations of the algorithm. Finally, we use a uniform perturbation kernel to perturb the parameters during the different iterations [35], and implement the algorithm for T = 16 in the case of the multi-stage model and T = 7 for the single-stage one.

Model parameters Description
Prior distribution C 0 Initial number of cells C 0 = 10 x , x ∼ U (4, 6) N 0 , N Number of stages U discrete (1, 50) λ 0 , λ Birth rate λ 0 = 10 y , λ = 10 z , y, z ∼ U (−3, 1) α Death rate slope α = 10 w , w ∼ U (−5, −1) To obtain these predictions, we run the model with the parameters being sampled from the estimated posterior distributions and compute the median of all the simulations, which corresponds to the solid blue (multi-stage model) and green (exponential model) lines in Figure 6. The bands surrounding the median predictions represent the 95% confidence intervals. The data points are plotted together with the standard deviation from the multiple experimental replicates. As shown in Figure 6, the calibrated multi-stage model successfully captures the dynamics of the proliferating T lymphocyte populations (OT-I and F5), whereas the single-stage model significantly underestimates the expected number of cells beyond generation 1, particularly in the case of OT-I T cells.
Overall, the multi-stage model is able to explain the data from the OT-I transfer experiment better, since this data set is less noisy than that of F5 T cells. The marginal posterior distributions for each parameter are shown in green and blue in Figures 7  and 8, for the multi-stage and exponential models, and the (uniform) prior distributions are plotted in red. Summary statistics for these posterior distributions are reported in Tables 2 and 3. Cell death is governed by the parameter α, and is estimated to be low for both models and clonotypes, suggesting that cell death does not have a significant impact on the dynamics during lymphopenia, which is in fact dominated by cell division. This result is in agreement with Hogan et al. [19], where the death rate is assumed to be zero. The initial number of cells can be estimated with relative success, and does not seem to depend heavily on the model considered. On the other hand, cell division is governed by parameters (N 0 , λ 0 , N, λ), with N 0 = N = 1 in the exponential model. We note that in both models, N 0 /λ 0 and N/λ represent the mean time until the first and subsequent divisions, respectively. Although all division-related parameters can be estimated from the data, for both models and clonotypes, a correlation between the division rate and the number of stages is seen in the scatter plots of Figure 9. Instead of plotting the marginal posterior distributions for these parameters, one can consider the posterior distribution for the mean times N 0 /λ 0 and N/λ (see Figure 9). The fact that N = 1 is never chosen as an accepted parameter value in the posterior distribution for the multi-stage model and the OT-I clonotype already suggests that a multi-stage representation of cell division is preferred for this clonotype. On the other hand for the F5 clonotype the marginal distribution for N shows a non-zero frequency for the value 1, but larger values of N are also represented in its posterior distribution. The mean time to both first and subsequent divisions, N 0 /λ 0 and N/λ, are significantly longer for the F5 clonotype than the OT-I clonotype. In fact, our results estimate that F5 T cells divide slowly compared to OT-I cells, requiring on average 192 hours to carry out a first division (59 hours taken by OT-I T cells), as depicted in Figure 9 for the multi-stage model. The time to subsequent divisions is represented by the blue histograms. Interestingly, our estimation of the mean time to first division of OT-I cells, on average 59 hours, is close to the value obtained in Ref. [19] (52 hours when considering the best fit parameter estimates). In the case of F5 cells, we predict an average of 192 hours to undergo their first division, whereas Hogan et al. obtained a value of 137 hours. We note that the value 137 hours is within the range covered by our predicted posterior distribution. Together, our multi-stage model results indicate that OT-I T lymphocytes require on average 59 hours for their first division, and a bit less time, 46 hours, for sub subsequent divisions (see upper left plot of Figure 9). Our results allow us to conclude that a multi-stage model, with a constant division rate after the first division event, is a suitable description of the dynamics of recovery from lymphopenia [11]. On the other hand, the multi-stage model estimates that F5 cells take on average slightly less than 200 hours to divide, both for the first or subsequent division rounds, as shown in the lower left plot of Figure 9. This difference might be related to the different response of OT-I and F5 T cells to lymphopenia [19]. Finally, it is evident that for both clonotypes the exponential model (see Figure 9) found a shorter time to first division than to subsequent ones, contradicting 1.98 · 10 −2 1.08 · 10 −1 4.64 · 10 −2 4.56 · 10 −2 1.45 · 10 −2 λ 2.80 · 10 −2 8.08 · 10 −1 1.48 · 10 −1 1.20 · 10 −1 1.01 · 10 −1 α 1.00 · 10 −5 5.97 · 10 −3 5.06 · 10 −4 1.76 · 10 −4 7.47 · 10 −4   2.68 · 10 −3 7.20 · 10 −2 1.70 · 10 −2 1.47 · 10 −2 1.07 · 10 −2 λ 2.06 · 10 −3 5.88 · 10 −1 2.20 · 10 −2 9.54 · 10 −3 3.90 · 10 −2 α 1.00 · 10 −5 6.21 · 10 −3 1.35 · 10 −3 8.19 · 10 −4 1.40 · 10 −3  previous findings [17,21,25]. This seems directly related to the fact that, overall, the exponential model is not able to capture the observed cell dynamics for neither of the clonotypes, as can be observed in Figure 6.

Conclusion
The multi-stage model implemented here takes cell death into account while retaining practical advantages: we are able to find closed expressions for the mean number of cells in each generation, and generate numerical realisations using the Gillespie algorithm [12,13]. A longer mean time to first division, N 0 /λ 0 , than mean time to subsequent divisions, N/λ, is a natural part of the framework without the need to introduce extra parameters. On the other hand, our calculations rely on the assumption that cells are independent of each other. Further work is needed to model the late stages of lymphopenia-induced proliferation, or possible cell fate correlations within family trees.