We selected the data set published by Rabin et al.26 for the following reasons: (1) One strain of laboratory animals (male Sprague-Dawley rats) was exposed to a broad range of space-relevant radiations (H, C, O, Si, Ti, and Fe ions), and cognitive performance was assessed using the same behavioral endpoint – novel object recognition. This consistency should minimize the variability due to differences between animal strains/species and test types, potentially allowing the effects of radiation to be evaluated more clearly. (2) The LET range was broad, from 0.22 to 181 keV/µm, which covers most of the spectrum seen in space. (3) The dose range was also wide, including very low doses (0.001 to 0.05 Gy) as well as high doses (1-2 Gy). (4) The effects of time after exposure were assessed by performing the novel object recognition test at two time points after irradiation (ranging from 1 to 17 months since exposure). Dr. Rabin kindly provided the full data for each rat and study condition; we combined and processed these data for analysis (923 samples, Supplementary Data File online).
The outcome (dependent) variable in the analyzed data set was the fraction of time that a rat spent exploring the novel object26. Because these data are by definition fractions between 0 and 1, we log-transformed them to bring the distribution closer to normal and performed subsequent dose response modeling on the transformed scale, conceptually analogous to modeling of cell survival dose responses. In this way, we generated the outcome variable “Response” as follows, where Fnov is the fraction of time spent exploring the novel object reported in Rabin et al.26:

Response = −ln[Fnov]
This log-transformation changed the data scale from a fraction between 0 and 1 to a continuous number ≥0, which is more amenable to dose response analysis using models such as linear or linear quadratic.
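As a minimal sketch, the transformation can be expressed in a few lines of code. The authors performed the analysis in R; the Python snippet below is purely illustrative and assumes the transform takes the negative-log form Response = −ln(Fnov), consistent with a fraction in (0, 1] mapping to a value ≥ 0:

```python
import math

def response(f_nov):
    """Map the fraction of time on the novel object (0 < f_nov <= 1)
    to a non-negative 'Response' value via a negative log transform
    (assumed form: Response = -ln(f_nov))."""
    if not 0.0 < f_nov <= 1.0:
        raise ValueError("f_nov must lie in (0, 1]")
    return -math.log(f_nov)
```

Under this transform, smaller fractions of time on the novel object (worse recognition memory) correspond to larger Response values, so radiation-induced deficits appear as increases above baseline.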
Radiation response modeling
We considered three variables – dose, LET, and time since exposure (irradiation) – as the most reasonable potential predictors of the response in this data set. To assess their relative strengths, we calculated Spearman’s correlation coefficients of each of these variables with the response variable (Response) using R 4.0.2 software. The correlation coefficient for time with the response was close to zero (0.00102) and not statistically significant, suggesting that time was the least important variable to consider. When time was added as a predictor in the models described below, it did not reach statistical significance, so we did not include it in further analyses. To investigate the issue of radiation quality dependence, LET was binned into four categories (identified by index i): low (labeled L, 0.22 keV/µm), medium (labeled M, 13-16 keV/µm), high (labeled H, 41-50 keV/µm) and very high (labeled VH, 106-181 keV/µm). Radiation dose (in Gy) was treated as a continuous variable.
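For illustration, Spearman's rank correlation and the LET binning can be sketched in pure Python. The cut points between categories below are assumptions chosen to fall in the gaps between the listed LET values; the actual analysis used R:

```python
def _ranks(values):
    """Average ranks (ties share the mean of their rank positions)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

def let_category(let):
    """Bin LET (keV/um) into the four categories from the text.
    Thresholds are illustrative choices within the gaps between groups."""
    if let <= 1:
        return "L"    # ~0.22 keV/um
    if let <= 16:
        return "M"    # 13-16 keV/um
    if let <= 50:
        return "H"    # 41-50 keV/um
    return "VH"       # 106-181 keV/um
```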
We modeled the radiation response using several model structures that differ in their assumptions about the dose dependences (e.g. linear or quadratic) and LET dependences of the TE and/or NTE components. Exploratory calculations showed poor support for complex models that included separate TE terms (with linear and/or quadratic dependences on dose) along with separate NTE coefficients for each LET category. In the fits of such models, many parameters – particularly the TE terms – had very large uncertainties. For example, the only TE term in these highly parametrized models that achieved statistical significance (p-value = 0.0375) was the one representing a quadratic dependence on dose for the highest (VH) LET category. To reduce parameter uncertainties and clarify differences in performance between different sets of model structure assumptions, we generated 18 simpler, less heavily parametrized model variants (listed in Table 1) for more detailed evaluation. In our notation, D is radiation dose (in Gy), B is the baseline response parameter in unirradiated rats, and kTE and kNTE are parameters that represent TE and NTE, respectively. The TE or NTE parameters were allowed to differ by LET category (Table 1). In some simplified models (labeled S1 to S4), TE parameters were allowed to be adjustable only for specified LET categories (e.g. i = LM, H, VH indicates that TE parameters were the same for L and M LET categories, but different for H and VH categories) and/or set to zero for certain LET categories (e.g. i = VH indicates that TE parameters were non-zero only for the VH LET category).
The structure of the NTE terms in the tested models (Table 1) is based on our previous publications 8,9. We assume that stress response signals from irradiated cells propagate to other cells and cause them to enter into a stressed “activated” state, which can be persistent. This NTE process is assumed to be binary (“on/off”), so that the probability of the effect (but not its magnitude) increases with dose. In the NTE state, cells can experience oxidative stress, elevated rate of DNA damage, along with other modifications of functioning. The total radiation effect is assumed to be the sum of TE and NTE components.
Based on these assumptions, the commonly observed tendency of NTE dose responses to have a steep initial “rise” at low doses, followed by saturation towards a “plateau” at higher doses, was modeled by the following mathematical expression, where D is radiation dose, kNTEr is the “rise” parameter and kNTE is the “plateau” parameter: kNTE×(1 – exp[-kNTEr×D]). Preliminary fitting attempts showed that kNTEr attained a very high value with a very large uncertainty. Consequently, we fixed kNTEr at 10³ Gy⁻¹ instead of allowing it to be freely adjustable. In essence, using this large constant allows the response to rapidly increase from the background value and saturate at low doses, but retains the model’s properties as a smooth function instead of a biologically implausible step function.
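The total response in these model variants (baseline plus TE plus NTE terms, Table 1) can be sketched as follows. The parameter values in the example are hypothetical placeholders, and the per-LET-category parameters are represented as dictionary lookups:

```python
import math

def model_response(D, let_cat, B, k_te, k_nte, k_nter=1000.0):
    """Sketch of one model variant:
    Response = B + kTE_i*D + kNTE_i*(1 - exp(-kNTEr*D)),
    with kNTEr fixed at 10^3 Gy^-1 as in the text.
    k_te / k_nte map LET category -> parameter; a missing key means the
    corresponding term is fixed at zero for that category (as in variants
    S1-S4). All numeric values here are illustrative, not fitted."""
    te = k_te.get(let_cat, 0.0) * D
    nte = k_nte.get(let_cat, 0.0) * (1.0 - math.exp(-k_nter * D))
    return B + te + nte
```

At D = 0 the model returns the baseline B, and for doses above a few thousandths of a Gy the NTE term is already close to its plateau kNTE_i, reproducing the steep rise and saturation described above.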
Each formalism described in Table 1 was fitted to the data by a robust nonlinear regression algorithm (the nlrob function in R). The robust fitting approach was chosen instead of ordinary least squares (OLS) to reduce the influence of outlier data points in this data set, where the data “scatter” is considerable (OLS regression on these data violated the assumption of normally-distributed residuals, assessed by Shapiro-Wilk normality test).
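Robust M-estimation of this kind is typically computed by iteratively reweighted least squares. The toy linear fit below illustrates the idea in pure Python using Huber weights and a MAD-based scale estimate (standard choices, but this is only a sketch, not the nlrob implementation or the authors' code):

```python
def robust_line_fit(x, y, k=1.345, iters=50):
    """Toy IRLS with Huber weights for y ~ a + b*x: large residuals are
    downweighted, so outliers pull the fit far less than in OLS."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        mad = sorted(abs(r) for r in resid)[len(resid) // 2]
        s = mad / 0.6745 if mad > 0 else 1.0  # robust scale estimate
        w = [1.0 if abs(r / s) <= k else k / abs(r / s) for r in resid]
        # closed-form weighted least squares for a straight line
        sw = sum(w)
        sx = sum(wi * xi for wi, xi in zip(w, x))
        sy = sum(wi * yi for wi, yi in zip(w, y))
        sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
        b = (sw * sxy - sx * sy) / (sw * sxx - sx * sx)
        a = (sy - b * sx) / sw
    return a, b
```

On clean data the fit matches OLS; with a gross outlier, the robust slope stays close to the trend of the remaining points while the OLS slope is pulled strongly toward the outlier.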
Assessments of model performance
The performances of different models were compared by information theoretic analysis using the Akaike information criterion with sample size correction (AICc) 27,28. This approach is useful because it takes into account not only the closeness of model fit to the data (the maximized likelihood value), but also model complexity (the number of adjustable parameters) and the sample size. More complex models are “penalized” more strongly, compared with simpler models, and the penalty increases at small sample sizes. ∆AICc, defined as the given model’s AICc score minus the minimum AICc score across all tested models, indicates relative information theoretic support for a given model. This relative support is quantified by the expression exp[-∆AICc/2]. Therefore, the best-supported model has ∆AICc=0, and ∆AICc>6 suggests poor support (i.e. >20-fold lower, relative to the best model). In addition, for each model we calculated the coefficient of determination (R2), root mean squared error (RMSE), and mean absolute error (MAE).
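The AICc bookkeeping is straightforward to compute; a minimal sketch (using the standard small-sample correction term 2k(k+1)/(n−k−1)):

```python
import math

def aicc(log_lik, k, n):
    """AICc = -2 ln L + 2k + 2k(k+1)/(n-k-1), where k is the number of
    adjustable parameters and n is the sample size."""
    return -2.0 * log_lik + 2.0 * k + 2.0 * k * (k + 1) / (n - k - 1)

def delta_aicc(scores):
    """Each model's AICc minus the minimum AICc across all models."""
    best = min(scores)
    return [s - best for s in scores]

def relative_support(delta):
    """Support relative to the best model: exp(-dAICc/2)."""
    return math.exp(-delta / 2.0)
```

Note that relative_support(6) is exp(−3) ≈ 0.0498, i.e. slightly more than 20-fold below the best model, which is the basis for the ∆AICc > 6 cutoff used here.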
To test the stability of the behavior of the best-supported model variant (the one with ∆AICc=0) under random perturbations of the data, we performed 300 random 50:50 splits of the data set into training and testing parts. In each split, the model was fitted to the training data, and the predictions from this fit were evaluated on the testing data. R2, RMSE, MAE, and model parameter values were also calculated and compared on the training and testing data.
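The split-and-score loop can be sketched as follows (illustrative pure Python; the analysis itself was done in R):

```python
import random

def half_split(n, rng):
    """Random 50:50 split of sample indices into training/testing parts."""
    idx = list(range(n))
    rng.shuffle(idx)
    return idx[: n // 2], idx[n // 2:]

def fit_metrics(y_true, y_pred):
    """R^2, RMSE and MAE of predictions against observations."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    rmse = (ss_res / n) ** 0.5
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return 1.0 - ss_res / ss_tot, rmse, mae
```

Repeating this over many random seeds (300 splits in the study) gives distributions of the performance metrics and parameter estimates, from which stability can be judged.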
For the best-supported model variant (the one with ∆AICc=0), we performed quantile regression (using the quantreg R package) to model the median (50th percentile), as well as the 25th and 75th percentiles of the radiation response. This approach provides additional information about model uncertainties and data spread without some of the stringent assumptions of OLS regression and is therefore quite useful for this data set.
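Quantile regression minimizes the "pinball" (check) loss rather than squared error. As a toy illustration of the principle (not the quantreg implementation), the constant that minimizes this loss over a candidate grid is the corresponding sample quantile:

```python
def pinball_loss(y, pred, q):
    """Check (pinball) loss for quantile level q: residuals above the
    prediction are weighted by q, residuals below it by (1 - q)."""
    return sum(q * (yi - p) if yi >= p else (q - 1.0) * (yi - p)
               for yi, p in zip(y, pred))

def best_constant(y, q, grid):
    """The constant minimizing the pinball loss over a grid approximates
    the q-th sample quantile of y."""
    return min(grid, key=lambda c: pinball_loss(y, [c] * len(y), q))
```

Fitting the model parameters under this loss at q = 0.25, 0.5 and 0.75 yields the percentile curves described above, without assuming normally distributed residuals.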
Mixed effects modeling
Since the novel object recognition test was repeated twice for most rats, at different time points, correlations of the responses shown by the same rat at different times could potentially be important, but were not accounted for by the robust and quantile regressions. In other words, although time since irradiation apparently did not play an important role in response magnitudes, a given rat could consistently exhibit higher than average (or lower than average) responses whenever it was tested, causing the data points corresponding to this rat not to be statistically independent. To address this issue, we implemented nonlinear mixed effects modeling (nlme R package) using the best-supported model variant. This approach is commonly used in situations such as the one here, where repeated measurements are made on the same individuals. In addition to the fixed effects, random effects by rat were included for the model parameters.
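The within-rat dependence motivating the random effects can be illustrated with a small simulation (all numbers hypothetical): a rat-specific random intercept makes the two measurements from the same rat correlated even when time itself has no effect.

```python
import random

def simulate_repeated(n_rats, sd_rat, sd_noise, rng):
    """Toy random-intercept model: each rat gets a persistent offset u,
    and its two measurements are u plus independent noise."""
    pairs = []
    for _ in range(n_rats):
        u = rng.gauss(0.0, sd_rat)  # rat-specific random effect
        pairs.append((rng.gauss(u, sd_noise), rng.gauss(u, sd_noise)))
    return pairs

def pearson(pairs):
    """Correlation between first and second measurements across rats."""
    x = [a for a, _ in pairs]
    y = [b for _, b in pairs]
    n = len(pairs)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in pairs)
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den
```

With a between-rat standard deviation of 1.0 and measurement noise of 0.5, the expected within-rat correlation is 1/(1 + 0.25) = 0.8; ignoring it would violate the independence assumption of ordinary regression.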
To reduce potential outlier data point effects on the mixed effects model fit, we first analyzed the data set by the OutlierDetection R package and removed 40 outliers (883 samples were retained). The data set version without these outliers (provided in Supplementary Data File online) was used for the mixed effects modeling, whereas the full data set was used for all other analyses. The Fligner-Killeen test for homogeneity of response variances was significant for radiation dose (D) and LET (p-values 0.00017 and 0.016, respectively), but not for time (p-value 0.080). Consequently, in the mixed effects model the variance function (weights in nlme) was allowed to vary by D and LET. The following analyses of model residuals were used to diagnose potential violations of the main assumptions: plotting the residuals, regressing residuals against D and time, boxplots by LET category, the Shapiro-Wilk normality test, quantile-quantile plots, skewness and kurtosis calculations, and plots of the autocorrelation function.
Machine learning analysis
To assess whether or not the NTE functional form (NTEf = 1 − exp[−10³×D]) is important for describing the data outside of the parametric regression framework described above, we performed a machine learning analysis based on random forests (RF). RF is a powerful and frequently utilized machine learning method, which captures complex dependences in the data by generating ensembles of decision trees 29. The techniques of bootstrap aggregation or “bagging” (randomly selecting samples from training data with replacement) and “feature randomness” (selecting a subset of predictors randomly for each tree) are used in RF to improve performance. For regression problems such as the one considered here, predictions from all trees are averaged.
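The three RF ingredients just mentioned can be sketched in a few lines (illustrative only; the analysis used the ranger implementation in R):

```python
import random

def bootstrap_sample(n, rng):
    """Bagging: draw n sample indices with replacement for one tree."""
    return [rng.randrange(n) for _ in range(n)]

def feature_subset(features, mtry, rng):
    """Feature randomness: pick mtry candidate predictors for one split."""
    return rng.sample(features, mtry)

def rf_regression_predict(tree_predictions):
    """For regression, the forest prediction is the average over trees."""
    return sum(tree_predictions) / len(tree_predictions)
```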
For the machine learning analysis, we considered radiation dose (D), energy (in MeV/n), time since exposure, LET (in keV/µm), and NTEf as predictors for the response. The data set was split randomly into training and testing halves. The Boruta feature selection algorithm 30 (Boruta package in R) was implemented on the training data to generate a ranking of importance scores for the predictor variables. Boruta iteratively compares the RF-based importance score of each predictor with the importance score of its randomly shuffled “shadow”. The Boruta analysis was repeated 100 times with different initial random number seeds, and the predictor variables were ranked by the median value of median importance scores across all repeats. This procedure was intended to identify the most important predictors and/or the least important ones, which could potentially be discarded from further analysis.
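The core of the Boruta idea is the construction of "shadow" features, which can be sketched as follows (a simplified illustration of the shadow step only, not the Boruta package's full iterative procedure):

```python
import random

def add_shadows(columns, rng):
    """Boruta-style shadows: each predictor column gets a permuted copy,
    which destroys its relationship to the response while preserving its
    marginal distribution. A real predictor is confirmed as important only
    if its RF importance score reliably exceeds the best shadow's score."""
    shadows = {}
    for name, col in columns.items():
        sh = col[:]  # copy so the original column is untouched
        rng.shuffle(sh)
        shadows["shadow_" + name] = sh
    return shadows
```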
Using the retained important predictors, we implemented RF (2000 trees), optimizing its parameters (number of variables to possibly split at in each node mtry, splitting rule splitrule, and minimal node size min.node.size) by RMSE using repeated cross-validation (3-fold, 30 repeats) on the randomly-selected training half of the data. Performance (R2, RMSE and MAE) was measured on the testing half of the data. This approach, designed to minimize the probability of “overfitting”, was implemented by the caret and ranger R packages. Robustness of RF predictions and performance metrics to random data fluctuations was assessed by applying the algorithm (with previously optimized parameters) to 300 random training/testing splits of the original data set.
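The repeated k-fold assignment underlying this cross-validation scheme can be sketched as follows (illustrative only; the study used caret's trainControl machinery):

```python
import random

def kfold_indices(n, k, rng):
    """Split n sample indices into k disjoint folds for cross-validation;
    each fold serves once as the validation set."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def repeated_cv_folds(n, k, repeats, seed=0):
    """Repeated k-fold CV: an independent fold assignment per repeat
    (3-fold with 30 repeats in the study)."""
    rng = random.Random(seed)
    return [kfold_indices(n, k, rng) for _ in range(repeats)]
```

Each candidate parameter combination (mtry, splitrule, min.node.size) is scored by its average validation RMSE across all folds and repeats, and the best combination is refitted on the full training half.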