## Overview of approach

In this study, we used the Vanderbilt-NC State (V/NCS) discrete-event microsimulation model developed by researchers at Vanderbilt and North Carolina State University [20].

The V/NCS model mimics the natural history of colorectal neoplasia for each hypothetic entity in the cohort. In addition, the model can be used to evaluate colorectal neoplasia screening strategies. One can specify the input cohort by sex, birth year, and family history of each simulated entity. Then for each entity, discrete events are simulated along the adenoma-carcinoma sequence. In the V/NCS model, events trigger changes at discrete times to the states of each adenoma along the progression pathways and the hypothetic person as a whole. These events lead to the collection of statistics and the creation of new events. Events that are relevant to our work include *new person creation*, *natural death*, *cancer death*, *non-advanced adenoma incidence*, *advanced adenoma incidence*, *cancer incidence from an adenoma*. Other events in the model include *regional cancer*, *distance cancer*, *cancer symptomatic*, *terminal cancer*, *colonoscopy*, *recover from cancer*, *terminal cancer charge*, and *age-based utility*. For more information, we refer to Roberts et al. (2007) [20] .

The purpose of this study is to estimate a set of model parameters that specify four quantities of cohort heterogeneity, which are used to govern the progression of each adenoma created and consequently the natural history of colorectal neoplasia. These quantities cannot be directly observable in an observational study and sex-specific estimates of these parameters are not available to simulation models currently available in the literature. Further, it is somewhat unethical to follow the continued progression of an adenoma once it is observed in clinical practice. Instead, polypectomy is recommended, and the natural progression is halted. Therefore, we resorted to calibrating the V/NCS model against prevalence data from a published study of Brenner et al. (2013)[19]. While the V/NCS model offers sufficient fidelity to the adenoma-carcinoma sequence and lends flexibility on candidate parameter design selection with a user-friendly interface, we faced a severe computational burden. It took 40–50 minutes to do one simulation run with a cohort size of 10,000 on a regular personal computer. In response, we developed an efficient two-phase calibration procedure.

## Disease Progression Modeling In The V/ncs Model

CRC begins as a non-visible, benign adenoma. Once such an adenoma appears, it transitions to the next stage, depending on the pathway to cancer it follows. The V/NCS model includes three types of progression (i.e., pathways to cancer): non-progressive (i.e. #1 in Fig. 1), slowly progressive (#2), and immediately progressive (#3). The first type is non-progressive. An adenoma of this type has no chance ever to become cancerous, but can grow as a benign adenoma (i.e., advanced adenoma) to match the data on the portion of adenomas that can be detected. The second type is progressive. A non-advanced adenoma of this type can either become an advanced adenoma defined by its histology or become cancerous directly, with the former being more common. The transition of this type is then modeled as a competing process between the above two possibilities. The third type of progression is immediately progressive, which implies that an adenoma with this type immediately progresses to becoming cancerous upon its initiation. Regardless of the second or third type, as long as an adenoma becomes cancerous, it follows the usual CRC pathway (i.e., from pre-clinical phase to clinical phase and through cancer stages).

In the V/NCS model, there are several key assumptions on adenoma incidence and progression. First, to each individual, the incidence and progression of each adenoma are independent of other adenomas. Second, both adenoma incidence rate and progression time are influenced by an individualized risk (Liebsch 2003) [26]. Obviously, family history (i.e., presence of CRC in first-degree relatives) is an important characteristic affecting the risk. Sex and race, to a lesser degree, also affect the risk. Specific risks to individuals may further include factors like fibrous diet, lack of exercise, familiar adenomatous polyposis, and heredity nonpolyposis CRC, but the exact effect of these factors is not known and therefore not included in the model. The risk is modeled with a JohnsonSB distribution for individuals with or without family histories. In both cases, the JohnsonSB distribution has a minimum of 0.0 and a maximum of 1.0, which implies the absolute individual risk on a scale between 0.0 and 1.0. Further, both distributions are highly positively skewed with a mode close to zero, and a mean of 0.11 for individuals with family history and a mean of 0.056 for individuals without any family history. These additional specifications match with the published research findings through a meta-analysis (Johns and Houlston 2001) [27].

In addition to the risk adjustment, the baseline adenoma incidence follows a non-homogenous Poisson process with the incidence rate modeled by an age-dependent piecewise linear function. The baseline time taken to progress to an advanced adenoma either from a non-progressive non-advanced adenoma (i.e., NP_NON) or from a slowly progressive non-advanced adenoma (i.e., P_NON) follows a JohnsonSB distribution with a minimum of 0.0 and a maximum of 60.0. The baseline time for advanced adenoma becoming cancerous also follows a JohnsonSB distribution with a minimum of 0.0 and a maximum of 60.0. Given that an adenoma does not occur likely before the age of 40, the maximum of 60.0 implies the longest allowable time for making a transition is 60 years and thus gives sufficient time-lapse during an individual’s lifetime. For the above three quantities, the actual transition time is further adjusted by the personal risk.

We list several additional assumptions on adenoma progression and cancer incidence in the V/NCS model that are only indirectly relevant to our calibration work as follows. First, the distribution of adenoma to progression type is dependent upon age. As the body repair mechanism deteriorates, the ability of the body to deal with abnormal cells begins to decline. Thus, it is assumed that the percentage of progressive adenomas increases as a person’s age increases. Second, for immediately progressive adenomas, they become cancerous as soon as they are initiated. Since these adenomas progress immediately, it is further assumed that they then only take no more than 10 years until cancer becomes symptomatic. The actual duration follows a JohnsonSB distribution as well. Third, the time to cancer incidence follows a JohnsonSB distribution with a mean of 20 years and a mode of 22 years.

V/NCS model assumptions on cancer staging and mortality include (1) regional and distance metastasis rates are independent processes; (2) pre-clinical cancer stage progression and symptom development are lesion-specific and independent of personal risk; (3) times to clinical cancer at the regional and distance stages follow Johnson SB distributions; and (4) rate of progression of death, and potential survival (from CRC) is determined by cancer stage at the time of symptoms.

## V/ncs Model Adaptation For The Calibration

With the V/NCS model platform, one can input a realistic population that matches the U.S. census or a hypothetic population of any arbitrary size, and enter a specific simulation start year. The simulation can then trace colorectal neoplasia development of the cohort to a pre-determined end year. One beneficial feature of the V/NCS model platform is that it generates a trace statement that summarizes a sequence of time-stamped events capturing disease development. To process the trace statement, we developed a procedure to extract all the events that take place in the input population (see Fig. 2). For each individual, we created a state transition chart to record the time at which his/her events occur. By following each individual through the simulation duration, one can characterize the health state (i.e., NOV, ADV, and CRC) of each individual at any specific point. The state NOV includes individuals who have had at least one P_NON, NP_NON, or at least one adenoma that immediately progresses to cancer, but no advanced adenoma. The state ADV includes those individuals who have had at least one advanced adenoma but none has become cancerous. The state CRC includes those individuals who have had cancerous adenomas or have developed CRC. For each of the five age groups (54–59, 60–64, 65–69, 70–74 and 75–79), we then counted its population at the end of the simulation horizon (i.e., at a particular year) and calculated the portion of the corresponding population subgroup in each of the three states as the three corresponding prevalence values.

To adapt the V/NCS model for our calibration study, we made further specifications on the model calibration. First, we set an all-white mixed-family-history cohort in the V/NCS model. Second, we set the base incidence of non-visible adenomas to the same as in the original version of the V/NCS model. Note that there is no evidence showing significant differences in the division of having family history and not in an all-white cohort in the US and the Germany cohort studied by Brenner et al. (2014) [28], which is believed to be predominantly white. Third, we assumed that the time for some non-advanced progressive adenoma (i.e., P_NON) directly becoming cancerous does not change from the original version of V/NCS to our work. As it is quite rare for an adenoma to follow this pathway, we did not expect this fixation affects the sex-based comparative results significantly.

Consequently, our calibration efforts were focused on quantifying the following random variates: (1) individual risk; (2) if an adenoma created is progressive, its baseline dwell duration in the non-advanced adenoma state before transitioning to the advanced precancerous polyps state (termed transition time of P_NON to ADV); (3) if the adenoma created is non-progressive, its baseline dwell duration in the non-advanced adenoma state before transition to the advanced adenoma state (termed transition time of NP_NON to ADV); and (4) if the adenoma is in the advanced adenoma state, its baseline dwell duration before transitioning to the cancerous adenoma state (termed transition time of ADV to CRC). Further, as stated earlier, each of the above four random variables is modeled with a JohnsonSB distribution. A JohnsonSB distribution is a four-parameter distribution family where the shape and location of the distribution are governed by two model parameters *δ* and *γ*. The other two parameters, *minimum* and *maximum*, specify the scale. For these four random variates, we assumed the minimum and maximum to be unchanged with the Germany cohort. Hence, for either sex, we had eight calibration variables, i.e., four *δ*’s and *γ*’s, to estimate based on three system responses, namely sex-specific age-group-aggregate NON, ADV, and CRC percentages.

## A Two-phase Calibration Procedure

We took a two-phase approach for the model calibration. In phase I (a preliminary phase), we performed a series of low-dimensional searches on subsets of model parameters (i.e., a pair of *δ* and *γ*) in an ad-hoc manner. We conducted these searches progressively against varied calibration targets that are aggregate over age groups. The purpose was to identify a set of promising values on each model parameter, which would serve as the multiple starting points (parameter designs) in phase II. The sequence of these searches and design inclusion criteria were developed from discussions with a domain expert. In phase II, we viewed the model calibration task as a nonlinear optimization problem and performed the Nelder-Mead algorithm (simplex search algorithm), one of the best-known algorithms for multidimensional unconstrained optimization without derivatives. Given the high-dimensionality of the “black-box” optimization problem, we explored two variants of the search procedure, namely one-shot globally over the entire model parameter space and sequentially based on interconnections in subsets of model parameters. Design of the loss functions was consulted with the domain expert as well. As for our calibration targets, we used age-specific prevalence data from Brenner et al. [18]. More specifically, we used the prevalence of both men and women in five age groups (54–59, 60–64, 65–69, 70–74 and 75–79).

## Phase I (Preliminary Phase) -- Identify promising initial search points for Phase II.

In our study, it is computational challenging to quantify the joint influence of the eight input model parameters (calibration variables) on the fifteen age-group-specific system responses. We thus performed low-dimensional searches on the four pairs of model parameters over respective pre-specified search ranges. Note that the original version of the V/NCS model was previously calibrated against the SEER (Surveillance, Epidemiology and End Results Program) data for U.S. populations. We consulted the domain expert to finalize each search range, which is centered at the previously calibrated value. Through our preliminary simulation analysis, we observed that in each pair of *δ* and *γ*, the prevalence values are a lot more sensitive to changes in *δ* than in *γ*. Thus, we set a larger range for each *δ* than the paired *γ*. We divided the search subspace of (*δ*0, *γ*0) with a five-by-five grid and divided each of *δ*1, *γ*1, *δ*2, *γ*2, *δ*3, *γ*3 with ten even intervals.

Basically, we followed the adenoma-carcinoma sequence to calibrate the model parameters progressively, i.e., first adenoma progression propensity, then transition from NON to ADV, and finally transition from ADV to CRC. In the first step, we performed a grid search on (*δ*0, *γ*0) and fixed the other parameter values as it is first to mimic the adenoma progression risk distribution of the simulated cohort. Our calibration targets are all three prevalence values over age groups. At the end of this step, we identified promising (*δ*0, *γ*0) values such that all three predicted prevalence values are reasonably close to the observations (less than 15% relative error). Next, we fixed (*δ*0, *γ*0) values to the identified ones and performed orthogonal sampling in the subspace formed by (*δ*1, *γ*1). The use of a sampling-based search as opposed to a grid search is because multiple promising (*δ*0, *γ*0) values were identified and thus using all of them for ensuing search would be computationally expensive. Our calibration targets are aggregate prevalence values of NON and ADV over age groups. We then followed the same idea to search in the subspace formed by (*δ*2, *γ*2) and again used the same calibration targets. We perturbed (*δ*1, *γ*1) first because there were many more transitions from P_NON to ADV than from NP_NON to ADV. At the end of this step, we identified promising (*δ*1, *γ*1) and (*δ*2, *γ*2) designs such that both predicted prevalence values (i.e., at states NON and ADV) were further closer to the observations (less than 10% relative error). Finally, we fixed (*δ*0, *γ*0), (*δ*1, *γ*1), (*δ*2, *γ*2) to be the identified values and performed orthogonal sampling in the subspace formed by (*δ*3, *γ*3). Our calibration targets are aggregate prevalence values of NON, ADV, and CRC over age groups. At the end of this step, we identified promising (*δ*3, *γ*3) designs such that all three predicted prevalence values fall in a close range of the target values (less than 10% relative error on NON, less than 10% relative error on ADV, and less than 5% relative error on CRC).

For any identified parameter values, we used a built-in interactive visual tool to graph the corresponding Johnson SB distributions (mainly their shapes and locations) and checked them with our domain expert. We then discarded some of the parameter designs according to the domain expert’s suggestions.

## Phase 2. Local-Search based Nonlinear Optimization

In this phase, we employed the Nelder-Mead algorithm for gradient-free nonlinear optimization to further improve the model fitting. We set the parameter designs identified on Phase 1 as the starting points for the Nelder-Mead algorithm. We used the weighted sum squared of the relative errors on the three aggregate prevalence values as a similarity measure and the objective function of the unconstrainted nonlinear optimization problem (i.e., loss function of the calibration variables). Through consulting with our domain expert, we assigned a larger weight to the set of CRC similarities than to the set of ADV similarities and the set of NON similarities.

Considering the resultant optimization problem is computationally expensive due to the fact that it takes a long time to evaluate just one parameter design, we designed two search paths that differ by the search space chosen along the solution process. In the base case algorithm, we considered an 8-dimensional search space at any moment of the solution (i.e., all eight model parameters are possible to be varied). We termed this strategy the “full-space local search strategy.” As an alternative option, we considered four 2-dimensional subspaces progressively (i.e., (*δ*0, *γ*0) first, then (*δ*1, *γ*1), then (*δ*2, *γ*2), then (*δ*3, *γ*3)). We termed this strategy the “sequential local search strategy.” When we performed Nelder-Mead in one subspace, other were fixed at the initial values. The above order used in this alternative algorithm followed the same idea as in phase I. That is, the system responses are more sensitive to (*δ*0, *γ*0), than (*δ*1, *γ*1), than (*δ*2, *γ*2), and finally (*δ*3, *γ*3).