Purpose: We applied a data mining approach to establish a multilevel hierarchy predicting physical activity (PA) behavior and to methodologically identify the correlates of PA behavior.
Methods: Cross-sectional data from the population-based Northern Finland Birth Cohort 1966 study, collected in the most recent follow-up at age 46, were used to create a hierarchy for predicting PA behavior with the chi-square automatic interaction detection (CHAID) decision tree technique. PA behavior was defined as active or inactive depending on participants’ activity profiles, which had previously been created through a multidimensional (clustering) approach applied to continuous accelerometer-measured activity intensities over one week. The input variables (predictors) used for decision tree fitting consisted of individual, demographic, psychological, behavioral, environmental, and physical factors. Using generalized linear mixed models, we also analyzed how the factors emerging from the model were associated with three PA metrics, namely daily time (minutes per day) in sedentary behavior (SED), light PA (LPA), and moderate-to-vigorous PA (MVPA), to verify the relative importance of the methodologically identified factors.
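To illustrate the core idea behind a CHAID split, the following minimal sketch ranks categorical predictors by their Pearson chi-square statistic against a binary active/inactive outcome and picks the strongest one. It is not the study's implementation: the toy data, the predictor names (`high_sitting_time`, `urban_residence`), and the helper functions are hypothetical. It also omits the category merging and Bonferroni-adjusted p-values that a full CHAID procedure uses. Ranking by the raw statistic matches ranking by p-value here only because both toy predictors have the same degrees of freedom.

```python
from collections import Counter


def chi_square_statistic(predictor, outcome):
    """Pearson chi-square statistic for the contingency table of two categorical lists."""
    n = len(predictor)
    row_totals = Counter(predictor)
    col_totals = Counter(outcome)
    observed = Counter(zip(predictor, outcome))
    stat = 0.0
    for row, row_n in row_totals.items():
        for col, col_n in col_totals.items():
            expected = row_n * col_n / n
            stat += (observed[(row, col)] - expected) ** 2 / expected
    return stat


def best_split(predictors, outcome):
    """Return the predictor with the largest chi-square statistic, plus all scores."""
    scores = {
        name: chi_square_statistic(values, outcome)
        for name, values in predictors.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy data (not from the cohort): eight participants.
outcome = ["active", "active", "active", "inactive",
           "inactive", "inactive", "active", "inactive"]
predictors = {
    "high_sitting_time": ["no", "no", "no", "yes", "yes", "yes", "no", "yes"],
    "urban_residence": ["yes", "no", "yes", "no", "yes", "no", "no", "yes"],
}

best_name, scores = best_split(predictors, outcome)
print(best_name, scores)
```

In a full tree, the same selection step would be repeated within each resulting subgroup until no predictor passes the significance threshold, which is how a multilevel hierarchy of subgroups arises.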
Results: Of the 4,582 participants with valid accelerometer data at the latest follow-up, 2,701 had active and 1,881 had inactive profiles. A total of 168 factors were used as input variables to classify these two PA behaviors. Of these 168 factors, the decision tree selected 36 factors from different domains, from which 54 subgroups of participants were formed. The factors emerging from the model explained minutes per day in SED, LPA, and/or MVPA, including body fat percentage (SED: B=26.5, LPA: B=-16.1, and MVPA: B=-11.7), normalized heart rate recovery 60 seconds after exercise (SED: B=-16.1, LPA: B=9.9, and MVPA: B=9.6), average weekday total sitting time (SED: B=34.1, LPA: B=-25.3, and MVPA: B=-5.8), and extravagance score (SED: B=6.3 and LPA: B=-3.7).
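As a quick arithmetic check on the reported sample, the short sketch below derives the class shares from the counts given above. The "majority-class baseline" is our own illustrative reference point, not a quantity reported in the abstract: it is the accuracy a trivial classifier would reach by always predicting the larger group.

```python
# Counts taken from the Results paragraph.
n_active = 2701
n_inactive = 1881

n_total = n_active + n_inactive
share_active = n_active / n_total
share_inactive = n_inactive / n_total

# Illustrative reference point (an assumption, not reported in the abstract):
# accuracy of always predicting the larger class.
majority_baseline_accuracy = max(share_active, share_inactive)

print(f"total={n_total}, active={share_active:.1%}, inactive={share_inactive:.1%}")
print(f"majority-class baseline accuracy={majority_baseline_accuracy:.1%}")
```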
Conclusions: Using data mining, we established a data-driven model composed of 36 different factors of relative importance from empirical data. This model may be used to identify subgroups for multilevel intervention allocation and design. Additionally, this study methodologically discovered an extensive set of factors that can be a basis for additional hypothesis testing in PA correlates research.
Figure 1
Figure 2
Posted 15 May, 2020