Purpose
A data mining approach was applied to establish a multilevel hierarchy explaining physical activity (PA) behavior and to systematically identify the correlates of PA behavior.
Methods
The 46-year follow-up data from the population-based Northern Finland Birth Cohort 1966 were used to build a hierarchy for predicting PA behavior with the Chi-square Automatic Interaction Detection (CHAID) decision tree technique. Subjects were classified as physically active or physically inactive based on activity profiles derived from objectively measured PA. The input variables covered a wide range of potentially modifiable factors, including self-reported, clinical, and environmental measures. We then analyzed the association of the factors emerging from the model with three PA metrics: sedentary time (SED), light PA (LPA), and moderate-to-vigorous PA (MVPA), each in minutes per day.
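To illustrate the core idea behind a CHAID split, the following is a minimal sketch, not the study's implementation. It assumes binary predictors only (so the chi-square test has one degree of freedom and no category merging or Bonferroni adjustment is needed), and the variable names `high_sitting` and `owns_pet`, the toy counts, and the threshold `alpha=0.05` are hypothetical. At each node, the predictor whose association with the active/inactive outcome has the smallest significant p-value is chosen for the split.

```python
import math
from collections import Counter


def chi_square_2x2(xs, ys):
    """Pearson chi-square statistic and p-value for two binary variables."""
    n = len(xs)
    observed = Counter(zip(xs, ys))
    x_totals = Counter(xs)
    y_totals = Counter(ys)
    stat = 0.0
    for x in x_totals:
        for y in y_totals:
            expected = x_totals[x] * y_totals[y] / n
            stat += (observed[(x, y)] - expected) ** 2 / expected
    # Upper tail of the chi-square distribution; valid for 1 degree of freedom only.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def best_split(rows, outcome, predictors, alpha=0.05):
    """Return (name, statistic, p_value) of the most significant predictor, or None."""
    ys = [row[outcome] for row in rows]
    best = None
    for name in predictors:
        stat, p_value = chi_square_2x2([row[name] for row in rows], ys)
        if p_value < alpha and (best is None or p_value < best[2]):
            best = (name, stat, p_value)
    return best


# Hypothetical toy data: (high_sitting, active, number of subjects).
rows = []
for sitting, active, count in [(1, 0, 15), (1, 1, 5), (0, 0, 5), (0, 1, 15)]:
    for i in range(count):
        rows.append({"high_sitting": sitting, "owns_pet": i % 2, "active": active})

best = best_split(rows, "active", ["high_sitting", "owns_pet"])
print(best)
```

A full CHAID procedure would additionally merge predictor categories that do not differ significantly, apply a Bonferroni-adjusted p-value, and recurse on each resulting subgroup until no significant split remains.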
Results
The model was fitted using 168 factors as input variables to classify the PA behavior of 2,701 physically active and 1,881 physically inactive subjects. The decision tree selected 36 factors from different domains, which formed 54 subgroups of subjects. Factors emerging from the model were associated with the PA metrics, including body fat percentage (SED: B=26.5, LPA: B=-16.1, and MVPA: B=-11.7), normalized heart rate recovery 60 seconds after exercise (SED: B=-16.1, LPA: B=9.9, and MVPA: B=9.6), average weekday total sitting time (SED: B=34.1, LPA: B=-25.3, and MVPA: B=-5.8), and extravagance score (SED: B=6.3 and LPA: B=-3.7).
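As a reading aid for the sample sizes reported above, the short sketch below works out the total number of classified subjects, the share of the larger class, and the mean subgroup size. These derived figures are simple arithmetic on the reported counts and are not results stated by the authors; the variable names are illustrative.

```python
# Counts taken from the Results text.
n_active = 2701
n_inactive = 1881
n_subgroups = 54

n_total = n_active + n_inactive
# Share of the larger class: the accuracy a classifier would reach by
# always predicting that class.
majority_share = max(n_active, n_inactive) / n_total
# Average number of subjects per terminal subgroup of the tree.
mean_subgroup_size = n_total / n_subgroups

print(n_total, round(majority_share, 3), round(mean_subgroup_size, 1))
```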
Conclusions
Using data mining, a data-driven model was established from empirical data that can potentially be used to identify subgroups for multilevel intervention allocation. The extensive set of factors identified can serve as a basis for further hypothesis testing in PA correlates research.

Figure 1

Figure 2
Posted 17 Feb, 2020