Comparison of statistical methods for estimating continuous paediatric reference intervals: a simulation study

Background: Reference intervals (RIs), which are used as an assessment tool in laboratory medicine, change with age for most biomarkers in children. Addressing this, RIs that vary continuously with age have been developed using a range of curve-fitting approaches. The choice of statistical method may be important as different methods may produce substantially different RIs. Hence, we developed a simulation study to investigate the performance of statistical methods for estimating continuous paediatric RIs. Methods : We compared four methods for estimating age-varying RIs. These were Cole’s LMS, the Generalised Additive Model for Location Scale and Shape (GAMLSS), Royston’s method based on fractional polynomials and exponential transformation, and a new method applying quantile regression using power variables in age selected by fractional polynomial regression for the mean. Data were generated using hypothetical true curves based on five biomarkers with varying complexity of association with age, i.e. linear or nonlinear, constant or nonconstant variation across age, and for four sample sizes (100, 200, 400 and 1000). Root mean square error (RMSE) was used as the primary performance measure for comparison. Results: Regression-based parametric methods performed better in most scenarios. Royston’s and the new method performed consistently well in all scenarios for sample sizes of at least 400, while the new method had the smallest average RMSE in scenarios with nonconstant variation across age. Conclusions: We recommend methods based on flexible parametric models for estimating continuous paediatric RIs, irrespective of the complexity of the association between biomarkers and age, for at least 400 samples.


Background
In laboratory medicine, a reference interval (RI) is an important clinical assessment tool, used for identifying patients who may need further clinical investigation. Statistically, a RI is defined by a range between the α/2 th and (100 -α/2) th percentiles where 0 < α < 100, within which (100 -α)% of the population values should fall. The definition may be straightforward, but estimating RIs is a complex process which requires appropriately selected samples from a well-defined reference population, collection and testing of blood samples and lastly, appropriate statistical methods for estimation [1]. Although all the steps are important, the issues associated with the selection of appropriate statistical methods are often ignored, and these are complex for biomarkers that change with age especially in children [2].
Traditionally, for biomarkers that vary with age, paediatric RIs have been estimated separately for different subgroups of age. However, this approach is considered misleading as children whose ages are close to the age cut-off values may be misdiagnosed [3,4].
There are several statistical methods available for estimating continuous RIs. Cole's lambdamu-sigma (LMS) method for fitting smooth centile curves is the most commonly used approach, based on the power transformation of Box and Cox, assuming the age-specific distribution is normal or may be transformed to normality [9]. Royston's parametric method, which has also been applied in several studies, is based on fitting a fractional polynomial model in age for each parameter of a normal, exponential normal (EN) or modulusexponential normal (MEN) density, with estimation by maximum likelihood. These parameters are then used to transform the data towards normality and centiles are calculated [10]. Quantile regression has also been used in estimating continuous RIs, predicting a conditional quantile of the dependent variable without assuming any parametric conditional distribution [11]. Finally, in recent years, the Generalised Additive Model for Location Shape and Scale (GAMLSS) method has been used increasingly. This method is a modification of the LMS method and can be applied for any distribution of the values and continuous age [12]. A few modifications of the Royston method have also been used in the estimation of continuous RIs [7,13,14].
The choice of statistical method is important as there may be differences in RIs due to the underlying assumptions. Studies investigating the performance of the different statistical methods in estimating continuous RIs using real datasets are unable to compare the various estimated RIs against the "true" RIs [11,15]. Li et al compared four methods which require age partitioning, but they did not consider regression-based parametric methods such as Royston's method [16]. On the other hand, only one simulation study compared the performance between two methods involving a nonparametric method and Royston's method [13]. There is also no recommendation for minimum sample sizes required for different statistical methods except for Royston's and Griffith et al's method [13,17].
In summary, there are several sophisticated curve-fitting approaches available for estimating continuous RIs. However, there is no comparison of the statistical methods across different scenarios to provide a clear understanding of the way in which sampling variation may affect the capacity of each method to reproduce the true underlying pattern of variation with age.
Guidelines for minimum samples sizes required are also needed. Hence, we developed a simulation study to investigate the performance of four statistical methods used in estimating continuous RIs for a range of (true) relationships with age and varying sample sizes. In addition, we provided an illustration of differences between the RIs obtained by the four methods for two biomarkers using data from the HAPPI Kids [18] study, for two different sample sizes.

Methods
We performed a simulation experiment comparing four statistical methods for estimating continuous paediatric RIs. We examined how these methods performed under five scenarios chosen to provide examples of biomarkers that displayed increasing complexity in their age dependence e.g. potassium, creatinine, total protein, alkaline phosphatase (ALP) and phosphate (Figure 1) and four sample sizes (n = 100, 200, 400, and 1000) for each scenario.
The details of the simulation design are provided below, following reporting guidelines proposed by Morris et al [19].

Data generation
For each replication (n=1000), we first generated two covariates i.e. age and sex. The age variable was generated between 1 month and 18 years based on a uniform distribution, while half of the samples were assigned as male and the other half as female. The reference values of the biomarkers were generated using the models presented in Table 1 based on the relationships with age that were observed in the HAPPI Kids study: see Figure 1 [7].

Target of analysis
The parameters of interest were the 2.5 th and 97.5 th percentiles at each integer age from 1 to 18 years and by sex. The true 2.5 th and 97.5 th percentiles denoted by ( ) (where α =2.5 or 97.5) at each age j for sex k were calculated using − 1.96 * and + 1.96 * respectively, where is the mean and is the standard deviation (SD), using the parametric data-generating equations in Table 1.

Statistical methods for estimating continuous RIs
The statistical methods to be compared were selected based on a systematic review of

Royston's method [10]
This method is described by Royston and Wright [10]. Initially, the best fitting model for the mean and SD values are identified from 44 different models fitted using the fractional powers of age from the set (-2, -1, -0.5, 0, 0, 1, 2, 3), following the model selection procedure EN and MEN is selected by comparing deviance between these models.
As an extension of the Royston approach, in order to evaluate the influence of sex on the RIs in our context, the difference in mean values by sex and the interaction between age and sex are evaluated for the selected model. If the regression coefficient for sex reaches conventional statistical significance (i.e. p < 0.05), sex is included in the final model, otherwise it is excluded. If the difference in deviance between the model with and without an age-by-sex interaction is significant, the data are analysed separately by sex.

Hoq et al's method [7]
We have used this method to estimate continuous paediatric RIs for 30 biomarkers [7].
Initially, the best fitting fractional polynomial model for the mean values of the biomarker is identified following the model selection procedure described as part of Royston's method [10]. Subsequently, the influence of sex on the RIs is evaluated applying the procedure described above. However, instead of analysing the data separately by sex, the interaction between sex and the fractional polynomial representation of age is included in the final model when the age-by-sex interaction is significant. Finally, quantile regression is used to estimate RIs representing the quantiles of the biomarker, constrained to follow the same power variables of age as identified (with sex interaction where indicated).

Cole's lambda, mu and sigma (LMS) method [9]
For In this simulation study, we have used generalised additive models (GAM) to obtain predicted values of the parameters of the Box-Cox transformation due to their flexibility for modelling complex nonlinear relations, similarly to the approach proposed by Cole and Green [9,24]. The smoothness of the curve depends on the number of equivalent degrees of freedom (EDF) selected for each of the three parameters. Hence, to decide on the optimal smoothing parameter, we apply a statistical approach known as cross-validation to minimise error in predicting the new data for the selected EDF. As part of the cross-validation, we omit data for each year of life once and estimate the average error in predicting the mean measurement of the omitted group. This process is repeated for EDF from 2 to 6, and the value producing minimum error in predicting the mean measurement is considered for predicting the three parameters for each year of life. The RIs were estimated separately for each sex.

Generalised additive models for location scale and shape (GAMLSS) [12]
The GAMLSS is a modification of Cole

Performance measures
The performance of each method was evaluated based on the average root mean squared error (RMSE) over age and sex i.e.
where the true parameter is denoted ( ) , its estimate ̂( ) , is the integer age from 1 to 18, k = 0 or 1 stands for male or female respectively, α indexes the 2.5 th and 97.5 th percentiles and E(̂( ) −

Results from the simulation study
The average RMSE across age and sex for the four methods in the five scenarios, each with four sample sizes, is presented in Table 2. The average estimated and true RIs for five scenarios are presented in the Supplementary Figure 1-5

Results from the HAPPI Kids Study
Finally, we provide an illustration of differences between the RIs obtained by the various methods for two biomarkers, creatinine and ALP in data from the HAPPI Kids [18] study, for sample sizes 350 and 630 (Figures 4 and 5). The differences in the continuous RIs for creatinine across the four methods were minimal. For ALP, the upper limits of the continuous RIs varied a little by method while the differences were negligible for the lower limits. For both creatinine and ALP, the difference in RIs applying regression-based parametric methods were minimal between sample sizes 350 and 630.

Discussion
We have used a simulation study to compare the performance of four statistical methods for estimating continuous paediatric RIs in five scenarios for four different samples sizes.
Regression-based parametric methods (Hoq et al [7] and Royston [10] The differences seen in the estimated RIs for biomarkers from the HAPPI Kids study using the four methods were consistent with the findings of our simulation study. The small differences in the estimated RIs from sample size 350 to 630 using Hoq et al's and Royston's method may not be clinically significant [2,30]. As already discussed, the flexibility of the GAMLSS methods suggested dynamic variations during pubertal ages for ALP. However, for large sample sizes the differences between methods may be minimal.
The regression-based parametric methods have some advantages over semi-parametric methods, such as expressing continuous RIs as an explicit function of age and sex, and estimation of SE of the estimated RIs [7,10,14]. These functions could easily be incorporated in the modern laboratory information system, a challenge highlighted by several studies in implementing continuous RIs [5,7,31,33]. Estimation of a standard error enables assessment of the precision of the RI limits [7,10,14]. However, the CIs for the continuous RI limits were too narrow for large samples, resulting in poor coverage [10]. Unlike the parametric methods, Cole's LMS and GAMLSS methods do not provide estimates of variability of estimation [12,21]. For these methods, a standard error could be estimated by bootstrapping of subsamples [8,15,30].
In this study, we have only modelled age and sex-specific variation, where sex is a binary variable. Further research is needed to understand the performance of different methods based on the relationship of the biomarker with age and a second continuous variable such as height [16].
For the simulations, the data were generated using parametric functions that may have favoured the parametric methods. An alternative approach for data generation could be to take multiple sub-samples from a real dataset [16]. However, defining the true values, which is essential in assessing the performance, would have been difficult unless we had a very large sample.

Conclusions
In summary, we compared four statistical methods for estimating continuous paediatric RIs.
Our findings indicate that the regression-based parametric methods described by Hoq et al and Royston performed consistently well in most scenarios and for sample sizes 400 and over. The findings of the simulation study should provide evidence for future guidelines on the statistical method for estimating continuous RIs.

List of Abbreviations
Alkaline phosphatase (ALP), confidence interval (CI), equivalent degrees of freedom (EDF), exponential normal (EN), generalised additive model for location shape and scale

Authors Contributions
MH and JC designed the study with input from PM and SD. MH then designed the simulation study, analysed and interpreted all the data and drafted the manuscript. PM provided critical review and input to the manuscript. SD and JC provided advice on the design of the study, analysis and interpretation of the data, and contributed in writing the manuscript. All authors read and approved the final manuscript.