Mixture survival models methodology: an application to cancer immunotherapy assessment in clinical trials

Progress in immunotherapy revolutionized the treatment landscape for advanced lung cancer, raising survival expectations beyond those that were historically anticipated with this disease. In the present study, we describe the methods for the adjustment of mixture parametric models of two populations for survival analysis in the presence of long survivors. A methodology is proposed in several five steps: first, it is proposed to use the multimodality test to decide the number of subpopulations to be considered in the model, second to adjust simple parametric survival models and mixture distribution models, to estimate the parameters and to select the best model fitted the data, finally, to test the hypotheses to compare the effectiveness of immunotherapies in the context of randomized clinical trials. The methodology is illustrated with data from a clinical trial that evaluates the effectiveness of the therapeutic vaccine CIMAvaxEGF vs the best supportive care for the treatment of advanced lung cancer. The mixture survival model allows estimating the presence of a subpopulation of long survivors that is 44% for vaccinated patients. The differences between the treated and control group were significant in both subpopulations (population of short-term survival: p = 0.001, the population of long-term survival: p = 0.0002). For cancer therapies, where a proportion of patients achieves long-term control of the disease, the heterogeneity of the population must be taken into account. Mixture parametric models may be more suitable to detect the effectiveness of immunotherapies compared to standard models.


Introduction
Advances in immunotherapy have revolutionized the treatment landscape for advanced lung cancer, raising survival expectations beyond those historically anticipated with this disease [1,2]. More important than the improvements in the average survival time is the presence of a long and stable plateau with a heavy censoring at the tail of the curve [3]. This means that a proportion of patients are still alive even after long follow-up, it suggests the disease is controlled in a subgroup of long-term survivors. This new phenomenon is not currently captured by the most commonly used statistical procedures for the survival analysis, which generally assume that all patients are equally susceptible to the event of interest [3]. As an alternative, mixture parametric survival models have been proposed to take into account the heterogeneous structure in the data analysis [4,5].
The finite mixture models have stayed widely used in many disciplines. For example, Nemec & Nemec [6] apply the mixture models of astronomy distributions to study the number of distinct stellar populations in the Milky Way; Cameron and Heckman [7] use these types of models to assess the effect of family history on educational performance; Kollu [8] refers to several applications in engineering, Neale [9] in genetics and Land [10] in the social sciences. These models can be easily applied to the data set in which two or more subpopulations are mixed. Because of its flexibility, these models have been received intense attention in recent years, both from a practical and theoretical point of view. A complete description of the theory of mixture distributions and their applications can be found in McLachlan and Peel [11] or in Titterington et al. [12]. Finite mixture models have considered different distributions according to the data and problems that are modeled. For example, West [13] and Richardson and Green [14] studied the fit and applications of the mixture of normal distributions, McLachlan and Peel [11] the mixture of t-students, Wiper and Rugginieri Gamma distributions [15], and Tsionas Weibull [16]. However, in the analysis of data from clinical trials, specifically those that evaluate the effect on survival is rarely used [17,18].
The present study proposes a methodology for the evaluation of the effect of therapies in the presence of long-term survivors in the context of clinical trials.
We have structured the methodology as follows: in session 2.1 we describe the multimodality test, a necessary condition for the application of the methods proposed below; Session 2 .2 describes the parametric models of simple survival and mixture distributions; in session 2 .3 the methods for estimating parameters of the proposed models are explained; session 2.4 explains how to make the selection of the best model and finally in session 2.5 the hypothesis tests for comparing the effectiveness of therapies in the context of randomized clinical trials are detailed . Additionally, the proposed methodology is illustrated for the analysis of data from a clinical trial that evaluates the effectiveness of the therapeutic vaccine CIMAvaxEGF for the treatment of advanced lung cancer.

Multimodality test
Before adjusting a mixture parametric model, we must prove the existence of multimodality in the data, which justifies the application of the model. In this work, we assume the approach proposed by Silverman [19] and adapted by Hall and York [20] for the case in which we want to test whether the true distribution of a variable in a population is unimodal or bi-modal. Formally, given a sample of a random variable with density function f (in our case the survival time), denoting by j the number of modes in f, the hypothesis to test will be considered: H0: j ≤ k, against H1: j > k, Where k ∈ Z + is the number of modes to contrast.
The test proposed by Silverman is based on the concept of a critical width. The critical width will be the smallest width h for which it is verified that the estimate of the density function has at most k modes. From this critical width, denoted by hk, Silverman proposes to use this parameter as a contrast statistic. Therefore the null hypothesis will be rejected when hk is very "large", since this would mean that it is necessary to overstate the core estimate for get a kmodal structure.
The critical width, hk, would be the last value of h before one of the modes "separates" giving rise to an estimate with (k + 1) modes.
The R silvermantest package (available at http://cran.r-project.org/) has implemented this method, which facilitates its implementation in practice.

Simple model
Survival analysis is a branch of statistics for analyzing the expected duration of time until one or more events happen. For the subjects in which the event could not be observed at the end of the follow-up period, the final state of the patient is called "censored", since the actual duration of survival time cannot be known [21].
We define the random variable T as the time that elapses since the patient is diagnosed with the disease until he dies. The survival function is defined as S (T) = P (T> t) where 0 <t <1, which can be obtained as: Where f (u) is the density function. Simple models assume a unimodal distribution of survival. The survival function, in this case, can be estimated from the data assuming a parametric model. The most used are Exponential, Weibull, Loglogistic and Lognormal (Table 1). Exponential, Weibull, Log-logistic and Lognormal.

Model
Density function

Mixture parametric survival model
A model with a mixture of distributions assumes the existence of two or more populations within the sample, for each of which, the random variable T follows a different distribution. This means that there is heterogeneity in patient survival and that individuals in each subpopulation have different risks of dying. In this study, we will work with two-component mixture models [18].
The density function for this model is given by: f(x; ψ) = π1 f1(x; θ1) + π2f2(x; θ2) Where f1 and f2 are the density functions of two of the distributions given in table 1. The parameter π (population weight parameter) is such that 0 <π ≤ 1, π1 can be interpreted as the proportion of one component or population in the model, and π2 = 1 -π1 the proportion of the other. The vector ψ includes the vectors π; θ1 and θ2.

Methods for parameter estimations
To estimate the parameters of the distribution of a simple model, the maximum likelihood method implemented by the Optimum function of R was used. In the mixture models, the parameters were estimated by this same method, using the EM (Expectation-Maximization) algorithm, which was implemented in R and it will be described later.

Likelihood function
Let x1, x2, ..., xn be a countable set of values for the discrete random variable X and Pψ (x) = Pψ (X = x). For the correct values of the parameter vector ψ the function P allows us to find the probability that X takes each of the values x1, x2,…. Let's look at this same function from another point of view: let's say we know that the random variable X takes the given values and follows a distribution Pψ (x) for an unknown value of ψ, so let Φ be the space of the possible values of ψ, we can interpret Pψ (x) as a function of ψ given x. Seen in this way, the function Pψ (x) is known as the likelihood function and is denoted by Lx (ψ).
In general, when you have a continuous random variable, for uncensored data, the likelihood function is given by Lx (ψ) = ∏ ( ; ) =1 , it is assumed that the experiments in which the values of X were found were independent and f (xi; ψ) (the density function corresponding to the distribution with which the variable is assumed) replaces Pψ (x). Thus, if we considered the mixture distribution model defined in the previous section, we obtain: A problem arises when considering censored data since the value of the variable for all the elements of the sample is not known exactly. However, taking into account the type of censorship existing in our data, this can be solved using the survival function instead of the density function (Since the information we have is that the individual survived at least a number of months, we take the probability of surviving y or more months instead of surviving exactly y months). Thus, the likelihood function is as follows: Where n1 and n2 are the number of not censored and censored individuals respectively.
Given a space of possible values θ for the parameter ψ, the maximum likelihood method consists of finding the maximum likelihood estimate ψ̂ = arg maxψ∈Φ Lx (ψ). Taking into account the monotony of the logarithm function, it is sometimes convenient to use Lx (ψ) = ln Lx (ψ) (log-likelihood function) instead of the maximum likelihood function, to calculate the maximum likelihood estimate.

EM algorithm
We now have to answer the question: How to maximize the log-likelihood function? For this, there are several methods among which are the Moments Method, the EM Method, and the Fisher Information Method. More information on these can be found in [22]. In this study, we will use the EM algorithm, which is one of the most used in the literature.
To make the notation less cumbersome while describing the algorithm, it is assumed that the data is not censored, otherwise they would have, x1, x2,. . . , xn1 uncensored data and y1, y2,. . . , and n2 censored data, and the density or survival function would be taken as appropriate, the rest being analogous.
Let x1, x2,. . . , xn the values that the variable takes in a sample of individuals.
The information can be considered incomplete since it is unknown to which of the two existing populations, in which the random variable distributes differently, within the sample each individual belongs. To complete the information, the indicator variable zi, i = 1, 2, . . , n is taken, where zi = 1 if x1 is given by f1 and zi = 0 otherwise. The system (zi, xi) will contain the complete information of the data. Initially, the zi values can be found as a random sample of Bernoulli distribution (α), where α is the weight parameter.
The EM algorithm is divided into two stages, stage E and stage M. Each of these is described below in the k + 1-th iteration. In stage E, the value of zi is estimated from the conditional expectation E (zi | xi), which, using the Bayes total probability formula, can be found as: Where θ1 (k) and θ2 (k) are the parametric vectors of the distributions of each component, estimated in the k-th iteration.
In step M, the parameter values are maximized, completing the information with the ẑi found above.
For a single component, knowing the individuals belonging to that component, the problem of finding the values of the parameters that maximize the likelihood function is no more difficult than estimating the parameters of a simple model.
This was done, using the Optimum function of R.
The iterations of stages E and M of the algorithm are repeated until | lx (ψ (k + 1)) -lx (ψ (k)) | is less than a small value, specified previously. The EM algorithm satisfies that lx (ψ (k + 1)) ≥ lx (ψ (k)), this property is the fundamental reason for its convergence [21].

Method for selecting the best model
If the actual distribution (G) of the variable with density function g is known, to measure how our model f (x | θˆ) approximates, the Kullback-Leibler Information (K-L) is used. K-L is given by: Where the best model will be the one that differs to a lesser extent from the real one and, therefore, the value of the K-L Information is lower. Moreover, in most cases, as in ours, the actual distribution of the variable is unknown. However, note that the expression EG (X) [ln g (X)] is a common constant for all models, so that the best model will be that, such that the value of EG (X) ln f (x, θˆ) be higher. This value can be approximated by the log-likelihood function plus a bias b(θˆ). Depending on how this approach is taken, several Information Criteria are defined for the selection of the best model. This paper uses the Akaike Information Criterion (AIC), defined as −2ln (Lx (θˆ)) + 2k.

Hypothesis testing to evaluate the effect of the treatment
The Log-rank test is the most used method to compare the survival of groups. It or the median survival of some subpopulation, or simultaneously both effects [23].
To evaluate the effect of immunotherapy we allowed the model parameters to depend on the treatment. Table 2 summarizes the assumptions of the model and hypothesis tests considered on the parameters to assess the effect of IT.  of stable disease or an objective response at least 4 weeks before randomization.
A total of 405 patients were randomized (2: 1) to vaccinated or control groups.
The vaccination group (n = 270) received the CIMAvaxEGF vaccine plus the best supportive care while the control group (n = 135) only received the best supportive therapy. Both groups were balanced according to the prognostic factors of the disease. More details can be found in [24]. The main evaluation criterion of the trial was overall survival (OS).

Existence of bimodality
The p-values obtained for the different k considered in the Silverman test are shown in Fig 1. It can be seen that the unimodality hypothesis is rejected for both treaties (p = 0.01) and controls (p <0.001). The first value of the test that is not significant is k = 2 in both cases, which indicates that survival in the two groups has a bimodal distribution.

Models Fitting and best model selection
For the illustration of the proposed methodology, we have considered a Weibull distribution or a mixture of Weibull distributions. Table 3     It has been shown that therapies for certain types of cancer induce a subset of long-term survivors, such as melanoma [25], breast cancer [26] and multiple myeloma [27]. Specifically for advanced lung cancer, evidence has been found of patients at different risk of death in both population studies [29][30] and in clinical trials [31][32][33]. The identification of this finding before modern therapies suggests that some patients experience prolonged survival regardless of treatment [29]. In this location, the fraction of long survival has been estimated in about half of the participants in the clinical trial, while in population studies it is estimated at around 10% [30]. This difference is attributed to the restrictive selection criteria in clinical trials, which means that there is an overrepresentation of patients more likely to be long-term survivors.

Application to the evaluation of the effect of therapies
It should be noted that in many of the studies that refer to the presence of long Both proposals were recently amended to incorporate bonuses and adjustments that capture the tail of the survival curve [34,35]. However, a study that critically analyzes the application of these proposals in 107 clinical trials concludes that neither of the two proposals was consistent as a measure of the absolute survival benefit [36].
The models of the mixture of distributions presented here are an alternative to consider and can be a useful tool for reinterpreting data from clinical trials already carried out based on taking into account heterogeneity. In our study, we do a retrospective re-analysis of the data from a clinical trial, and certainly, the percentage of long survivors may be overestimated. However, due to the randomization process, this must have occurred to the same extent in the treated and control groups. More important than the estimation itself of the fraction of long survivors, is the proposal to apply the mixture distribution model to test hypotheses about the effect of immunotherapies in clinical trials. This is in our opinion the greatest novelty of this work.
The methodology proposed here has the advantage that heterogeneity assumes without requiring groups to be defined beforehand. This is illustrated for the case in which there are two populations, but can easily be extended to the case where there are multiple subpopulations. It is important to keep in mind that the estimation of the mixture fraction can be very sensitive to the parametric distribution chosen to work [28]. For this reason, the selection of the parametric distribution to model the observed data should be done carefully. McCullagh and Barry [37] proposed a model selection process algorithm and recommended adjusting different distributions to the data and selecting the best distribution using one of the available information criteria.
Once the existence of heterogeneity in the main variable that is evaluated in retrospective exploratory studies has been detected, confirmation of this finding The implementation in the exposed case study confirmed that vaccination with