Selection Stability in High Dimensional Statistical Modelling: Defining a Threshold for Robust Model Inference

doi:10.21203/rs.3.rs-738092/v1


Research Article


https://doi.org/10.21203/rs.3.rs-738092/v1

This work is licensed under a CC BY 4.0 License

Version 1


Epidemiological research commonly involves identification of causal factors from within high dimensional (wide) data, where predictor variables outnumber observations. In this situation, however, conventional stepwise selection procedures perform poorly. Selection stability is one method to aid robust variable selection, by refitting a model to repeated resamples of the data and calculating the proportion of times each covariate is selected. A key problem when applying selection stability is to determine a threshold of stability above which a covariate is deemed ‘important’.
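
The stability measure described above can be written compactly. The notation here is an assumption introduced for illustration and does not appear in the abstract: B is the number of resamples, and \hat{S}^{(b)} is the set of covariates selected when the model is refitted to resample b.

```latex
% Selection stability of covariate j: the proportion of the B resamples
% in which covariate j is selected by the refitted model.
\hat{\pi}_j \;=\; \frac{1}{B} \sum_{b=1}^{B} \mathbf{1}\!\left\{\, j \in \hat{S}^{(b)} \right\},
\qquad 0 \le \hat{\pi}_j \le 1 .
```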

In this research we describe and illustrate a two-step process to implement a stability threshold for covariate selection. First, the distribution of covariate stability was established, as a cumulative distribution function, from a permuted model in which the outcome was randomly reordered to sever its relationship with the predictors. Second, covariate stability was estimated using the true outcome, and covariates with a stability above a threshold defined from the permuted model were selected in a final model. The proposed method performed well across 22 varied, simulated datasets with known outcomes; selection error rates were consistently lower than those from conventional implementation of equivalent models. This method of covariate selection appears to offer substantial advantages over current methods for accurately identifying the correct covariates from within a large, complex parameter space.
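
The two-step process can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the selector (`select_top_k`, which keeps the covariates most correlated with the outcome) is a hypothetical stand-in for whichever model is being refitted, and the bootstrap resampling scheme, the numbers of resamples and permutations, and the 0.99 quantile used to read a threshold off the permuted stability distribution are all illustrative choices that the abstract does not specify.

```python
import numpy as np


def select_top_k(X, y, k):
    """Hypothetical stand-in selector (an assumption, not the paper's model):
    return indices of the k covariates with the largest |correlation| with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Guard against constant columns: their correlation is treated as 0.
    corr = np.abs(Xc.T @ yc) / np.where(denom > 0, denom, np.inf)
    return np.argsort(corr)[-k:]


def stability(X, y, k, n_resamples, rng):
    """Proportion of bootstrap resamples in which each covariate is selected."""
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_resamples):
        idx = rng.integers(0, n, size=n)  # bootstrap resample of the rows
        counts[select_top_k(X[idx], y[idx], k)] += 1
    return counts / n_resamples


def permutation_threshold(X, y, k, n_resamples, n_permutations, quantile, rng):
    """Step 1: pool stabilities from models fitted to a permuted outcome and
    take a quantile of that empirical distribution as the threshold."""
    null_stabilities = []
    for _ in range(n_permutations):
        y_perm = rng.permutation(y)  # severs the outcome-predictor relationship
        null_stabilities.append(stability(X, y_perm, k, n_resamples, rng))
    return np.quantile(np.concatenate(null_stabilities), quantile)


# Simulated wide data (more covariates than observations); only the first
# three covariates truly affect the outcome. All settings are illustrative.
rng = np.random.default_rng(0)
n, p = 100, 200
X = rng.normal(size=(n, p))
y = 3 * X[:, 0] - 3 * X[:, 1] + 3 * X[:, 2] + rng.normal(size=n)

threshold = permutation_threshold(
    X, y, k=10, n_resamples=50, n_permutations=20, quantile=0.99, rng=rng
)

# Step 2: stability under the true outcome, then apply the threshold.
stab = stability(X, y, k=10, n_resamples=100, rng=rng)
selected = np.flatnonzero(stab > threshold)
print("threshold:", threshold)
print("selected covariates:", selected)
```

Taking a quantile of the pooled permuted stabilities amounts to inverting their empirical cumulative distribution function; a stricter or looser quantile trades false selections against missed covariates.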

Epidemiology

Covariate selection

selection stability

stability threshold

high dimensional data

statistical triangulation.

No competing interests reported.

Supplementarymaterials.docx
