Toward Precision Medicine Using a “Digital Twin” Approach: Modeling the Onset of Disease-Specific Brain Atrophy in Individuals with Multiple Sclerosis

Digital Twin (DT) is a novel concept that may bring a paradigm shift for precision medicine. In this study we demonstrate a DT application for estimating the age of onset of disease-specific brain atrophy in individuals with multiple sclerosis (MS) using brain MRI. We first augmented longitudinal data from a well-fitted spline model derived from a large cross-sectional normal aging data. Then we compared different mixed spline models through both simulated and real-life data and identified the mixed spline model with the best fit. Using the appropriate covariate structure selected from 52 different candidate structures, we augmented the thalamic atrophy trajectory over the lifespan for each individual MS patient and a corresponding hypothetical twin with normal aging. Theoretically, the age at which the brain atrophy trajectory of an MS patient deviates from the trajectory of their hypothetical healthy twin can be considered as the onset of progressive brain tissue loss. With a 10-fold cross validation procedure through 1000 bootstrapping samples, we found the onset age of progressive brain tissue loss was, on average, 5–6 years prior to clinical symptom onset. Our novel approach also discovered two clear patterns of patient clusters: earlier onset vs. simultaneous onset of brain atrophy.

fundamental aspect of MS, occurring about 3x faster in MS patients than in healthy controls and likely representing the net accumulation of tissue damage due to the disease. As a major relay nucleus, the thalamus is particularly susceptible to neurodegeneration and has been shown to be one of the earliest regions impacted by atrophy in MS 27,28 . Interestingly, MS plaques and thalamic atrophy can be observed on MRI several years before the onset of rst clinical symptoms 27,29 , suggesting that the biological onset of the disease may precede the clinical onset by several years. As such, the 'true' biological onset of MS remains unknown. This represents a major barrier to understanding the earliest events in the MS pathophysiology and even the natural history of MS, which is typically based on clinical disease duration. Using brain atrophy (and more speci cally, thalamic atrophy) is an appealing application of the HDT concept in MS. Brain atrophy occurs as part of normal aging, which has been studied extensively in healthy individuals 30 .
In the absence of disease, brain volume trajectories are relatively predictable; in fact, recent work has presented normative brain growth charts across the human lifespan 31 . In principle, healthy brain trajectories can be leveraged to create HDTs for patients with neurologic and psychiatric diseases. In the case of MS, one could estimate when an individual MS patient's thalamic atrophy trajectory deviates from that of a healthy individual. The onset of progressive brain tissue loss should be closer to the true biological onset of the disease, and may further our fundamental understanding of MS.
Using normalized thalamic volumes from brain MRI images, our main objective was to develop statistical learning models to estimate when the thalamic atrophy trajectory of an MS patient deviated from their expected thalamic atrophy trajectory based on their corresponding HDT. The age when MS atrophy trajectory departed from normal aging was de ned as the onset of progressive brain tissue loss. We hypothesized that the age of progressive brain tissue loss would be statistically earlier than the age of clinical onset determined by clinicians.

Challenges
The rst challenge is to identify a large longitudinal normal aging dataset with subjects imaged across several decades using brain MRI scans to create the HDT as a normal aging reference for a given MS patient. However, such a longitudinal dataset rarely exists. Most of the datasets collected to study normal aging are crosssectional. As such, it is di cult to gather real-life datasets to generate reliable trajectories over the entire lifespan.
Even if we had repeated scans for both MS and normal aging, the best statistical method to t an accurate trajectory curve has not yet been identi ed. Moreover, it has been shown that the aging brain trajectory is not linear [31][32][33][34][35][36][37][38] . The conventional statistical approach for longitudinal data is a mixed model using year(s) at study entry as the time unit to t a linear or quadratic trend. However, linear or quadratic approaches may not be the most effective method for representing the complexity of aging data 32,39 , as they can result in biased estimates and low power in statistical tests 39 . On the other hand, as a nonparametric method, a spline model is recommended for its exibility and robustness to accurately model the age trajectories of neuroimaging markers 38 . There remain several unanswered questions, including whether it is possible to augment longitudinal data from cross-sectional data, whether lifespan data can be augmented from only a few longitudinal data points, how to choose an appropriate spline setting from many mixed spline candidates, and how to select covariates having two-way or three-way interactions with the spline slope.

Overall Study Design and Concept
Most longitudinal MRI datasets only cover a few years of an individual's lifespan. For such a short period, when using years of follow-up as the time variable, a linear trend may be the best t to the data, even though true brain atrophy over lifespan is non-linear. However, when using the actual age in years as the time variable, the model will look very different. For an entire sample, age has a wide coverage for the lifespan, but for each individual, age only covers a small fragment of the lifespan. This data structure can be conceptualized as a " sh bone" (Fig. 1), where the constructed spline curve can be considered the "back bone" and the straight lines (representing observed longitudinal data) can be considered "rib bones". By using cross-sectional data or the intercept from a longitudinal model with age as the time variable, we should be able to construct the "back bone" of the spline. Adding the "rib bones" from large number of individuals in different age categories can enhance the shape of the spline.
Generally speaking, to obtain the lifespan trajectory, we should observe the "rib bones" in our typical longitudinal datasets and then attempt to construct the "back bone." In this work, we attempt the converse; using the "back bone", we attempt to grow the "rib bones". In other words, our approach is to rst t an accurate spline model from cross-sectional data to model the non-linear trajectory across the lifespan. We then use the age slope to augment the longitudinal data for each normal aging subject, given that a linear model can su ce to model brain atrophy over a short (5-year) period of time.
The study design includes the following steps: 1) identify a well-tted spline model using cross-sectional data from normal aging populations; 2) augment longitudinal data from this well-tted spline model using a linear slope at a given age point; 3) compare 12 different mixed spline models through simulated data and identify the mixed spline model that ts the " sh bone" data structure; 4) combine augmented longitudinal normal aging data with longitudinal MS data to t mixed spline model and compare across 12 mixed spline models; 5) use a manual forward then backward model building strategy to select the covariates from 52 covariate structures; 6) identify the individual age of onset of progressive brain tissue loss with associated 95% con dence interval using a 10-fold cross validation procedure through 1000 bootstrapping samples.

Study Sample
Our dataset was assembled from the following three sources (Table 1): 1) The Human Connectome Project (HCP: http://www.humanconnectome.org), 2) Alzheimer's Disease Neuroimaging Initiative (ADNI: http://www.adni-info.org) and 3) a single-center, prospective case-control cohort MS study conducted from January 2005 through December 2010 28 . Normal aging samples were from HCP, ADNI and 89 healthy control cases from the single-center study. Age at scan date and sex were extracted from each of the data sources.

Longitudinal Normal Aging Data Augmentation from Large Cross-Sectional Data
Multivariate Adaptive Regression Splines (MARS) was used to t a cross-sectional spline so that we could augment the longitudinal data. MARS was chosen because of its robustness to outliers and its ability to autosearch non-linear associations with high dimensional interactions 40 . For this demonstration study, only age at scan, intracranial volume (ICV), and sex were used, with three-way interactions among them, as predictors of thalamic volume (percent of total brain volume). ICV and sex were treated as constant for each individual subject when augmenting the longitudinal data. Longitudinal thalamic volumes were augmented at ± 2 years from age at scan. We reserved 433 repeated measurements from 229 individuals as independent testing. ICC two-way mixed with absolute agreement and repeated measure correlation were used to assess the agreement/correlation between MARS model-augmented longitudinal data vs. observed testing longitudinal data. SAS9.4 ADAPTIVEREG was used to t the MARS model.

Mixed Spline Model of Thalamic Atrophy Trajectory
After data augmentation, we tted the mixed spline model. Let n be the number of subjects. For the i th participant, denote t i as the age, denote as the thalamus volume at the j th measurement for subject i, and denote X ij as other predictors such as sex. To model the age effects accurately and e ciently, we use a semiparametric model of the form given below: where is the unspeci ed aging trajectory for subject i at the j th time evaluated at age t, and β are the regression coe cients of the other predictors at the j th time. is the random effect of each subject. The measurement errors ϵ ij are assumed to follow a normal distribution N(0,R), where R is the covariance matrix. This semiparametric regression model is a parsimonious way to both capture the potential nonlinear age trajectory and investigate the effects of other predictors. The simplest special case of this model is the linear mixed model where = . Regression splines are a broader class of models and could be tted under this framework, which can be based on truncated power function (TPF) basis, B-spline basis or natural spline basis. These models vary by the choice of the spline basis and tuning parameters (the number of knots and the knot positions) that have an impact on the estimated shape of a spline function. Parameter-function estimation contains two major steps: (i) approximation using basis functions (e.g., TPF, B-Spline) which allows to t lower-order polynomials within very small interval partitions (based on knots) and (ii) smoothing the approximation via penalty (e.g., random SPLINE coe cients, TOEPLIZ G-side matrix, RSMOOTH G-side matrix).
The smoothing could be done via generalized cross-validation (GCV) 41 or mixed effects approaches 42,43 , which are known to facilitate the choice of the knot positions in spline modelling 44 . They also allow a penalty to be applied directly to the model coe cients (P-spline penalty penalizes the squared differences between adjacent model coe cients, which in turn penalizes wiggles).
We then compared penalized splines (P-spline) with B-spline basis and truncated power function (TPF) basis with different random effect structures such as P-SPLINE and RSMOOTH (radial smoothing). For the P-spline, the unspeci ed function is approximated with a cubic B-Spline or TPF basis. Following Ruppert, Wand and Carroll (2003) 45 , the cubic spline can be represented as: Estimation of parameters is made by minimizing the penalized log-likelihood function using proc GLIMMIX in SAS 9.4 with smoothing implemented using P-SPLINE smoothing (Random x/type = pspline) or radial smoothing (Random x/type = rsmooth). This mixed model formulation of spline smoothing has the advantage that the smoothing parameter is selected automatically 45 and is shown to be more robust with misspeci cation of error dependence structure, compared to GCV-based approach 46 .
For the 12 spline structures described above, model comparison was made using four criteria: i) Akaike information criterion (AIC) and Bayesian information criterion (BIC) criterion, with lower values indicating better t; ii) repeated measure correlation coe cient 47,48 and intraclass correlation two-way mixed for longitudinal data between model-predicted vs. observed data from reserved 10% testing dataset; iii) visual inspection of the expected shape for projected lifespan spline (the normal curve must inherit the shape of the spline based on cross-sectional normal aging, and the MS curve must followed the shape of observed spaghetti plots as in

Simulation Study Design
The purpose of the simulation study is to compare spline models to choose the most appropriate spline model for the " sh bone" data structure. The simulated data mimic the sh bone data structure by combining 10 sets of data from 10 different age blocks (k = 1 to 10) with age range from 30 to 80 by 5-year intervals (e.g., [30][31][32][33][34][35][36][37][38][39][40][41][42][43][44][45][46][47][48][49]. Each simulated data set was based on the covariance parameters estimated from a linear mixed model (with random intercept and slope). Block-speci c weights (W k , V k ) were added to the xed effects of intercept and slope respectively for block k. W k and V k were altered to mimic a spline shape ("back bone" as shown in Fig. 1). The nal mixed effects model was as follows: , The training sample was a combination of 10 datasets with 50 MS subjects each (age span 30 to 80 years). Each MS subject had 5 longitudinal MRI data points within each block, simulated using the linear mixed model above. Therefore, we simulated 500 subjects total in the training data. W k and V k started with small values in younger age, e.g. 1% decrease from the previous age block, but larger in middle age, e.g. 5% decrease, then became smaller again in older age, e.g. 1% decrease. The testing data followed the same simulation procedure, but we used the same subject ID across the 10 blocks; thus, the testing data contained 50 MS subjects, and each subject had 50 simulated age points. Because our ultimate goal is to predict the thalamic volume at an age that is younger than the observed age, the testing data included 4 more younger age points: 26, 27, 28, and 29, in addition to the 50 age points.
We considered twelve different models with three G-side covariance types (TOEPLIZ, P-SPLINE and radial smoothing) and four basis functions (Cubic-B-Spline, Cubic -TPF, Natural-TPF, Natural-B-Spline). To estimate the prediction accuracy from the spline model, we made comparisons using AIC/BIC with 500 iterations. The testing data were scored through each of the 12 spline models. We then obtained the estimated thalamic volume with associated 95% con dence interval at each age point. We took the average of the 500 replicates for the model-predicted thalamic volume and associated 95% CIs, then graphed the spline plots to visually inspect the overlap between true spline curve and the model-estimated spline curves, as well as the width of 95% con dence band.

Real-Life Data Application
We applied 12 different scenarios of spline models (listed in Table 2) to a real-life dataset with 520 MS subjects and 2053 normal aging subjects. For the normal subjects, we used augmented longitudinal data with 5 followup years (actual scan year at the middle). Among the 520 MS subjects, we randomly selected and reserved 52 MS subjects as the independent testing data. We repeated this iteration 10 times with 10 mutually exclusive independent testing data. For the rst iteration, we selected the optimal spline structure and covariates using multiple interaction terms together; c) three-way interactions for each covariate one by one with MS status and age spline except DMT 0 and age of clinical onset; d) select any two covariates with the three-way interactions; e) select any three covariates with the three-way interactions; f) select any four covariates with the three-way interactions; g) backwards selections with the terms showing model improvement from a-f. The same criteria de ned in 3.3 were used for model selection. Once the nal model was selected, we used the same model structure in the other 9 training datasets to obtain estimated coe cients, then applied them to independent testing data at each fold of iteration.
For each independent testing MS patient, we constructed the hypothetical individualized normal aging trajectory curve (Health Digital Twin) and MS lifespan trajectory curve using the patient-speci c covariates, which included sex, Thalamus 0 , ICV 0 , age of clinical onset (set as 0 for normal control), DMT 0 (set as 0 for normal control) and sequential age points from 15 to 75 with an interval of 1. The age at which the MS trajectory curve began to depart from the HDT trajectory curve (or when both curves crossed in young age) was de ned as the age of onset of progressive brain tissue loss. A bootstrapping procedure with 1000 iterations was used to determine the 95% con dence interval of this brain atrophy-de ned age of onset. The bootstrapping procedure was conducted at the patient level, i.e., once a patient ID had been selected, all longitudinal scans associated with this patient were selected as a completed block. We repeated this procedure 10 times with 10 mutually exclusive independent validation data (10-fold cross validation procedure with 10% testing data each fold). Thus, each patient had an age of onset of progressive brain tissue loss (PBTL) with 95% CI estimated from their exclusive training dataset. In the end, we identi ed two groups of MS patients: earlier onset (i.e., the upper 95% CI limit of PBTL onset is younger than clinical onset age; in other words, the age of onset of PBTL was statistically signi cantly earlier than the age of clinical onset); and simultaneous onset (clinical onset did not differ statistically from the age onset of PBTL). We examined the different patterns of the onset age gap (age of clinical onset minus the age of onset of PBTL) between the earlier onset and simultaneous onset groups used Bland-Altman plots. Figure 2A shows the smooth data cloud across age for healthy controls. The spline constructed by MARS represents the trend in the data. After data augmentation, independent validation was conducted by comparing the MARS model-predicted longitudinal data vs. the 433 observed longitudinal data. The predicted data had an agreement vs. the observed values with ICC 0.62 95% CI (0.56, 0.68), using two-way mixed with absolute agreement. The predicted value explains 45% of total variance in the observed value (r = 0.67 and R = 0.45) based on repeated measure correlation (Fig. 2B). Figure 3 shows the spline shapes for both simulated training and testing data from one of the 500 iterations, with red line as the normal aging trajectory and blue line as MS trajectory. The curve from the training data shows the expected value of simulated ' sh-bone' structure which contains 10 datasets with 50 MS subjects each (age span 30 to 80 years). The curve from the testing data shows the expected value of simulated 'continuous spline' data structure which contains 50 MS subjects with 50 simulated age points (referred as continuous age points across lifespan) plus 4 additional earlier age points (age 26-29). Table 2 shows the mean and standard deviation of AIC and BIC comparing mixed-spline models and their corresponding G-side covariance from 500 iterations based on each of the 12 spline modeling scenarios. In general, unrestricted B-Spline had the smallest AIC or BIC showing the best tting index, followed by unrestricted TPF. The restricted basis functions, both natural-B-Spline and natural-TPF, performed poorly. For Gside matrix, the TOEPLIZ had the best performance, followed by radial smoothing. P-SPLINE had the worst performance. In addition to model tting, we also assessed the prediction accuracy of each of the 12 spline models based on the prediction accuracy of the testing data. We visually inspected the overlap of the observed spline and the model-estimated spline curves, as well as the width of 95% con dence band. The visual inspection matched the AIC/BIC nding, with the best performance from B-Spline with a TOEPLIZ covariance model. The visual illustration of selected spline curves from TOEPLIZ is presented in Fig. 4. Figure 4 shows the patterns of the smoothed spline from simulated value (ground truth) vs. the predicted value constructed using the mean of predicted values with 95% con dence band over 500 iterations from the testing data. In Fig. 4A, the predicted spline from TOEPLIZ with Cubic-B-Spline overlapped well with the ground truth spline, and the predicted 95% con dence band is narrow for the younger age. When using TPF as basis function (4B), the 95% con dence band became very wide at early ages. The restricted splines (4C&4D) tted a line as straight as a linear line, which is largely deviated from the simulated spline. Because our overall objective is to model the disease onset, we are most interested in modeling accuracy around the younger ages.

Results of Real-Life Data Analysis
Supplemental Figs. 1-7 illustrate the model tting indices of 12 spline structures and 52 covariate structures.
As described in 2.3, the modeling tting criteria included i) AIC; ii) repeated measurement correlation from 10% independent testing data between observed and predicted longitudinal values; iii) shape of trajectory curves; and iv) predicted 95% con dence band for trajectory curves. The real-life data application results concurred with the simulation study, with the best tting model being the B-Spline with TOEPLIZ. The covariate structure included: age-spline, MS-status, ICV 0 , sex, sex*MS-status, Thalamus 0 , sex*age-spline, age at study entry, age of clinical onset, and DMT 0 . The nal model reached a repeated measure correlation coe cient of 0.88 based on 10-fold cross validation. Trajectory curves and scatter plots (Fig. 5, Supplemental Fig. 8) demonstrate that the spline curve ran through the observed data points in most of the cases (illustrated as black diamonds in Fig. 5).
The AIC and BIC from the nal model was also the smallest among all model structures.
Using this nal model, we were able to augment the lifespan thalamic atrophy trajectory curve for both an individual MS patient and the corresponding normal aging as a health digital twin. Figure 5 illustrates two example cases. In each, the age of onset from progressive brain tissue loss (green dot) was younger than the age of clinical onset (red dot). The 95% con dence interval was derived from a bootstrapping procedure with 1000 iterations. Figure 5A shows that the upper limit of the 95% CI of the age of onset from progressive tissue loss was younger than the age of clinical onset (earlier onset). In contrast, Fig. 5B shows that the 95% CI for the age of onset from progressive brain tissue loss overlaps with the age of clinical onset (simultaneous onset). Figure 6 shows the age of onset of progressive brain tissue loss based on 1000 bootstapping samples of 520 MS patients (Fig. 6A). The x-axis was centered by the age of clinical onset; therefore, the tick marks in each horizontal line, including the mean and 95% con dence interval, are presented as the gap between the two ages of onset (age of clinical onset minus age of onset of progressive brain tissue loss). If the upper limit of 95% CI is left of the center line, it suggests earlier onset. Otherwise, if the 95% CI includes the center line (0 gap between age of clinical onset and age of onset of progressive brain tissue loss), it suggests simultaneous onset. Overall, the age of onset of progressive brain tissue loss was younger than the age of clinical onset, with a mean difference of 5.1 ± 3.8 years and a median difference of 6 years (IQR 3.1-8.1).
Using our de nitions, 55.4% of patients could be classi ed as earlier onset, while 44.6% of patient could be classi ed as simultaneous onset. Wilcoxon rank sum tests showed earlier onset patients had statistically signi cantly older onset age compared to simultaneous onset, for both age of clinical (Median 36, IQR 33-41vs. Median 22, IQR 26-30, p < 0.01) and progressive brain tissue loss onset (Median 29, IQR 26-32 vs. Median 24, IQR 23-26, p < 0.01). Bland-Altman plots (Fig. 6B & C) showed age of onset of progressive brain tissue loss is much younger compared to age of clinical onset for patients in earlier onset group. Such difference is more scattered in the simultaneous onset group around the 0-reference line.

Discussion
Health digital twin is a novel and promising concept to further advance precision medicine. It includes many major components such as personal devices, AI algorithms, right data to train the AI, and Internet of Things (IoT) for rapid data synchronization while providing real time decision-making 22 . The AI component can be considered the heart of the health digital twin approach. In aging-related elds such as neurodegenerative diseases, such AI algorithms must be developed from the scope of lifespan.
In this study, we apply the health digital twin conceptual framework to build an AI algorithm in addressing a fundamental clinical problem in multiple sclerosis, which is to identify the disease-related onset of brain atrophy. By estimating the deviation of the thalamic atrophy trajectory curve of an individual MS patient from the corresponding hypothetical health digital twin with normal aging, our major nding is that progressive brain tissue loss precedes clinical disease onset in MS by a mean of 5.1 ± 3.8 years and a median of 6 years (IQR 3.1-8.1). Although the onset of progressive brain tissue loss measured by MRI is not synonymous with the true biological disease onset, our results suggest a major improvement in estimating MS disease duration compared to the standard practice of de ning the disease onset as the time of rst clinical symptom. This may have signi cant implications for MS clinicians, researchers, and patients, and could lead to a fundamental shift in our disease understanding and, one day, determining its cause.
Our novel approach in developing this AI algorithm towards health digital twin has several innovations. The rst innovation is the development of a novel statistical application of mixed splines to overcome the challenge of lacking lifespan longitudinal data. Longitudinal studies usually have limited sample sizes and short follow-up periods. Generally, high-quality longitudinal MRI datasets, such as those that use the same pulse sequences and scanner, only contain 3-5 years of follow-up. To overcome this challenge, we describe the concept of the " sh bone" data structure. By structuring the data this way, we can have two bene ts: 1) augment longitudinal data from large cross-sectional data; 2) augment individual lifespan trajectory based on small fragments of follow-up periods over a widespread age range.
To demonstrate these bene ts, we successfully tted a spline model from a large cross-sectional dataset across a wide age range that re ects the non-linear trajectory of brain atrophy across the lifespan, and then used this to augment the longitudinal data with a good repeated measures correlation with the observed data (r = 0.67, p < 0.01). Moreover, rather than using follow-up time since study entry per the conventional approach, we used age at follow-up as the time variable for longitudinal data. This approach has been used in a recent study to predict mild cognitive impairment in Alzheimer's disease based on the brain atrophy trajectory pattern by different periods 49 . However, the authors only explored a quadratic term as the non-linear effect. We have advanced the modeling strategy to mixed spline model.
When tting the longitudinal spline (mixed spline) model with an uneven time scale and short follow-up period, it was not trivial to determine the best t spline structure from 12 different candidate spline structures. In our study, we used both simulation and real-life data to reach the conclusion that the best tting model is B-Spline with TOEPLIZ as G-side matrix. This may be due to the simplicity of the mixed spline structure when applied to this special data structure. Determining the appropriate covariates was also not trivial, as those that interact with the spline slope will alter the slope, while non-interaction terms will affect the elevation of the spline and parallel distance between the MS curve and the normal aging trajectory curve. Interactions can be two-way, 3way, or higher dimensional interactions. Given this, the covariate selection cannot follow the conventional forward, backward, or LASSO approach 50 . The interaction term must be intact with the marginal effect as a bundle. When adding higher dimensional interactions, each piece of lower-level interaction terms must be intact, too. Therefore, we used a manual forward then backward covariate selection strategy.
Given these complexities, determining the nal model with the best tting spline structure and covariates is challenging. The conventional approach is to use AIC and BIC; more stringent criteria require cross-validation.
We initially used AIC/BIC plus a 10-fold cross validation using the repeated measure correlation between modelpredicted and observed values. However, we observed that this approach can be misleading. When constructing the 12×52 lifespan trajectory curves (Supplemental Figs. 1-7) for a given individual, we observed scenarios where tting indices were strong (small AIC/BIC, large r), but the trajectory curve was wild. For example, Supplemental Fig. 5A row E_01, column 3, a model with 3 three-way interaction terms, is likely over t, and the spline shape did not inherit the shape we observed in large cross-sectional data (Fig. 3A). This phenomenon could be due to model tracing for few observed data points in the middle of the lifespan while sacri cing the t of both far right (older) or far left (younger) ends. Since both AIC/BIC and repeated measure correlation were driven by the observed data, these tting indices misled the lifespan trajectory. We added two additional criteria for model selection: (i) visual inspection of the shape of projected lifespan spline (normal aging curve must inherit the shape of the spline from large population based cross-sectional normal aging data and individual MS curve must follow the shape from group estimates and spaghetti plots); (ii) both MS and health digital twin trajectory curves must have narrow predicted bands along the age span. We were able to identify the optimal model based on this approach.
Our normal aging data were combined from four different studies. As such, one potential issue could be data heterogeneity due to slight differences in MRI protocols and scanner settings. A common practice in neuroimaging is to use a statistical model such as neuro-ComBat to harmonize the data before conducting further analysis 51 . Since each individual dataset of normal aging represents a subset of the age category, age must be added to neuro-ComBat as a covariate. Supplemental Fig. 9 shows the distribution of both the original (A) and ComBat harmonized (B) percent thalamic volumes along with age. The data distribution did not change from the original to ComBat harmonized thalamic volumes. In fact, the original data had a very smooth cloud across age. The robustness of our normalized thalamus volumes from different scanners and settings may be due to the use of relative values (thalamus as a percentage of ICV) instead of using absolute thalamic volumes. In a phantom study, we found that using relative values was robust to different scanner settings and protocols 52 . Forcing age as a covariate in neuro-ComBat removes the spline effect, which contradicts the nonlinear brain trajectory reported from other major studies 31 . Therefore, we used and retained the original percent thalamic volumes throughout the study.
Herein, we describe a novel statistical modeling strategy to overcome limitations in real-world longitudinal neuroimaging datasets and provide an application of the Health Digital Twin framework to address a fundamental clinical conundrum in MS. We found that the MS thalamic atrophy trajectory deviated from the corresponding hypothetical normal aging trajectory curve prior to clinical symptom onset in an average of 5-6 years. While there is no ground truth to validate our ndings, this is consistent with clinical observations that white matter lesions and thalamic atrophy are already present prior to rst clinical symptoms 27,29 and is consistent with the observation across many neurologic and psychiatric diseases that the biological disease onset often starts before rst clinical symptoms 53   Distribution of normalized thalamic volume (%Thalamus) in healthy controls across age from cross-sectional data (A). Correlation between augmented and observed longitudinal data (B). Illustration of simulated spline vs. predicted spline from simulation study Figure 5 Page 20/21 Trajectory curves for earlier onset (A) vs. simultaneous onset (B) Figure 6 Distribution of age of clinical onset and age of onset of progressive brain tissue loss (PBTL) between earlier and simultaneous onset patient groups.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.