Figure 2.1 describes the architecture of the proposed model. The daily life activities and physical measures of an individual are collected from the user and fed into a pre-processing phase, which reduces the number of features and performs the required data pre-processing operations.
4.1 Pre-processing:
The daily life activities that are mainly considered are screen time, sleep time, physical activity, number of cigarettes smoked, and units of alcohol consumed. The physical measures that are mainly considered are age, gender, height, weight, and calorie intake. Thus, ten features are collected from an individual. In the pre-processing step, the number of features is reduced by removing the activities and measures that do not directly affect health status. This is achieved by using the Harris-Benedict Equation (Harris JA & Benedict FG, 1918).
The Harris-Benedict Equation (Harris JA & Benedict FG, 1918) is a method used to estimate an individual's basal metabolic rate (BMR). It states:
For men:
BMR = (10 × Weight in kg) + (6.25 × Height in cm) - (5 × Age in years) + 5
For women:
BMR = (10 × Weight in kg) + (6.25 × Height in cm) - (5 × Age in years) - 161
As per the Harris-Benedict Equation (Harris JA & Benedict FG, 1918), the calories to be consumed depend on the BMR value and on a multiplier determined by the level of physical activity:
Calories to be consumed = BMR × Physical Activity
Calorie Difference = (Calories Consumed) - (Calories to be consumed)
In the proposed method, the number of features is thus reduced to seven: age, gender, sleep time, screen time, number of cigarettes smoked, units of alcohol consumed, and calorie difference.
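The pre-processing arithmetic can be sketched in a few lines of Python. This is a minimal illustration that uses the coefficients exactly as stated in this section; the function names `bmr` and `calorie_difference` and the string encoding of gender are assumptions made for the example, not part of the proposed system.

```python
def bmr(weight_kg, height_cm, age_years, gender):
    """Estimate BMR with the coefficients given in this section."""
    base = 10 * weight_kg + 6.25 * height_cm - 5 * age_years
    if gender == "male":
        return base + 5
    if gender == "female":
        return base - 161
    # The "male"/"female" string encoding is an assumption of this sketch.
    raise ValueError(f"unsupported gender: {gender!r}")


def calorie_difference(calories_consumed, bmr_value, activity_factor):
    """Calories consumed minus calories to be consumed (BMR x activity factor)."""
    return calories_consumed - bmr_value * activity_factor
```

A positive result means the individual consumed more calories than the estimate recommends, and a negative result means fewer.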
4.2 Phase-I:
Phase-I of the model processes the data received from both the data sources and the user. In this phase, a decision tree classifier is used to estimate the health parameters of the user. Initially, the model is trained with the dataset received from the data sources. The classifier takes the activities of an individual as input and produces the status of the health parameters for one day. However, an individual's health status cannot be judged accurately from a single day's output. Thus, the output of Phase-I is collected over a week and fed to Phase-II.
4.2 Phase-II:
Phase-II of the model processes the data received from the data sources and the output of Phase-I. In this phase, a decision tree classifier is again used to estimate the health parameters of the user. Initially, the model is trained with the dataset received from the data sources. The classifier takes the daily status of the health parameters over a week (i.e., the output of Phase-I) as input and estimates the health status of the individual for that week. From this estimate, Phase-II produces the predictions for each health parameter and generates the alerts and suggestions that are to be sent to the individual.
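The data flow between the two phases can be sketched as follows. This is only an illustration of how seven daily outputs of Phase-I are collected and passed to Phase-II. The helper `run_week` and the two stand-in functions `toy_phase1` and `toy_phase2` are hypothetical: they are simple threshold and majority rules, not the trained decision tree classifiers of the proposed model.

```python
from collections import Counter


def run_week(daily_records, phase1, phase2):
    """Apply phase1 to each day, then phase2 to the week of daily statuses."""
    if len(daily_records) != 7:
        raise ValueError("expected exactly seven daily records")
    daily_status = [phase1(record) for record in daily_records]
    return phase2(daily_status)


def toy_phase1(record):
    """Stand-in for the Phase-I classifier: 0 = normal, 1 = less sleep.

    The 7-hour threshold is illustrative only.
    """
    return 0 if record["sleep_time"] >= 7 else 1


def toy_phase2(week_status):
    """Stand-in for the Phase-II classifier: alert if most days are not normal."""
    most_common, _ = Counter(week_status).most_common(1)[0]
    return "alert" if most_common != 0 else "ok"
```

In a real implementation, `phase1` and `phase2` would wrap the prediction calls of the two trained classifiers; keeping them as plain callables makes the weekly aggregation step independent of the library used for training.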
4.3 Dataset Generation
The following subsections describe the rules collection and the dataset generation. The generated dataset is used for training the model proposed in the previous section.
4.3.1 Rules Collection
To prepare the datasets, a proper set of rules is required that describes how the daily life activities of an individual affect his or her health status. The rules are collected from different trusted sources (Hirshkowitz M, et al., 2015; Bjartveit K & Tverdal A, 2005). Based on the activities and measures of an individual, these rules give the overall health status of that individual. For example, the recommended sleep time for a person aged 6 to 13 years is 9 to 11 hours. If the sleep time is between 7 and 8 hours, it is a little less than normal. If the sleep time is between 11 and 12 hours, it is a little more than normal. If the sleep time is more than 12 hours or less than 7 hours, then it affects health.
4.3.2 Feature Selection
The features are selected from the collected rules, since each rule depends on certain activities and measures of an individual. For example, the alcohol consumption rules for females are different from those for males. Similarly, the calorie value recommended for a person weighing 100 kg is different from that for a person weighing 50 kg (Bjartveit K & Tverdal A, 2005). In these examples, gender and weight are the features that are selected. In a similar fashion, all the features (age, gender, height, weight, calorie intake, units smoked, units drunk, physical activity, screen time, and sleep time) were collected.
4.3.3 Feature Reduction
Although the features were collected, some of them might not affect the health status of a person directly. Thus, the collected features need to be transformed into the actual features that affect the health status. Here, the Harris-Benedict equation is used to reduce the features. The Harris-Benedict equation (Harris JA & Benedict FG, 1918) is a method used to estimate an individual's basal metabolic rate (BMR). It states that the calories to be consumed depend on the BMR value and physical activity.
For example, if the physical activity is sedentary or a little active, then the calories to be consumed are 1.2 × BMR. If the physical activity is lightly active, then the calories to be consumed are 1.375 × BMR. If the physical activity is moderate, then the calories to be consumed are 1.55 × BMR. If the physical activity is intense exercise, then the calories to be consumed are 1.725 × BMR. If the physical activity is extra hard exercise, then the calories to be consumed are 1.9 × BMR.
Calorie Difference = (Calories Consumed) - (Calories to be consumed) (1.1)
Thus, the total number of inputs is reduced to seven. They are Age, Gender, Number of units smoked, Units of Alcohol Consumed, Screen Time, Sleep Time, Calorie Difference.
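The reduction from ten collected features to seven inputs can be sketched as a single transformation. The multipliers below are the ones listed in this subsection, and the BMR coefficients are those stated in the pre-processing section. The dictionary keys, the activity level names, and the function name `reduce_features` are hypothetical and chosen only for this illustration.

```python
# Activity multipliers as listed in the text; the key names are assumptions.
ACTIVITY_FACTOR = {
    "sedentary": 1.2,
    "lightly_active": 1.375,
    "moderate": 1.55,
    "intense": 1.725,
    "extra_hard": 1.9,
}

# Gender-specific constant of the BMR formula used in this document.
BMR_OFFSET = {"male": 5, "female": -161}


def reduce_features(raw):
    """Map the ten collected features to the seven model inputs."""
    bmr = (10 * raw["weight_kg"] + 6.25 * raw["height_cm"]
           - 5 * raw["age"] + BMR_OFFSET[raw["gender"]])
    calories_needed = bmr * ACTIVITY_FACTOR[raw["physical_activity"]]
    # Height, weight, physical activity, and calorie intake are folded
    # into the single derived feature "calorie_difference".
    return {
        "age": raw["age"],
        "gender": raw["gender"],
        "units_smoked": raw["units_smoked"],
        "units_alcohol": raw["units_alcohol"],
        "screen_time": raw["screen_time"],
        "sleep_time": raw["sleep_time"],
        "calorie_difference": raw["calorie_intake"] - calories_needed,
    }
```

An unknown gender or activity level raises a `KeyError` in this sketch, which makes unexpected input visible instead of silently producing a wrong calorie difference.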
4.3.4 Dataset Generation from Rules
Based on the rules discussed in section 2.4.3.1, all the required features are extracted. The features include daily life activities and physical measures of an individual. The number of extracted features is then reduced using the Harris-Benedict equation, as discussed above (Harris JA & Benedict FG, 1918).
There are two phases in the proposed system. Thus, Phase-I needs one dataset and Phase-II needs a different dataset with class labels. An example dataset is described in Table 2.1.
Table 2.1: Sample Dataset for Phase-I
| Feature | Class | Condition | Class label | Description |
|---|---|---|---|---|
| Sleep | 0 | Age less than 2: sleep value between 11 and 14; age 3-5: between 10 and 13; age 6-13: between 9 and 11; age 14-17: between 8 and 10; age 18-25: between 7 and 9; age 26-64: between 7 and 9; age greater than 65: between 7 and 8 | normal | Gives the optimal sleep value for different age groups |
| Sleep | 1 | Age less than 2: sleep value between 9 and 10; age 3-5: between 8 and 9; age 6-13: between 7 and 8; age 14-17: between 7 and 8; age 18-25: between 6 and 7; age 26-64: between 6 and 7; age greater than 65: between 5 and 6 | less sleep | The sleep value is less than the optimal value for the age group |
| Sleep | 2 | Age less than 2: sleep value between 15 and 16; age 3-5: between 13 and 14; age 6-13: between 11 and 12; age 14-17: between 10 and 11; age 18-25: between 9 and 10; age 26-64: between 9 and 10; age greater than 65: between 8 and 9 | more sleep | The sleep value is more than the optimal value for the age group |
| Smoke | 0 | Number of cigarettes smoked is 0 | good smoking status | |
| Smoke | 1 | Number of cigarettes smoked is between 1 and 4 | smoking status is reasonable | |
| Smoke | 2 | Number of cigarettes smoked is between 5 and 15 | bad smoking status | |
| Smoke | 3 | Number of cigarettes smoked is more than 15 | dangerous smoking status | |
| Drink | 0 | Number of units consumed is 0 | drinking status is good | |
| Drink | 1 | Gender is male and the number of units consumed is less than 2; gender is female and the number of units consumed is less than 1 | drinking status is reasonable | |
| Drink | 2 | Gender is male and the number of units consumed is between 3 and 4; gender is female and the number of units consumed is between 2 and 3 | drinking status is bad | |
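The sleep and smoking rules of Table 2.1 can be turned into labelling functions for dataset generation, as in the sketch below. Several choices here are assumptions rather than statements from the table: "age less than 2" is encoded as ages 0 to 2, "age greater than 65" as 65 and above, range endpoints are treated as inclusive, and where two ranges share an endpoint the "normal" class is checked first. Values that fall in no listed range return `None`. The names `SLEEP_RULES`, `sleep_class`, and `smoke_class` are hypothetical.

```python
# (min_age, max_age, normal, less, more); sleep ranges in hours from Table 2.1.
# The age bounds 0-2 and 65+ are assumptions of this sketch.
SLEEP_RULES = [
    (0, 2, (11, 14), (9, 10), (15, 16)),
    (3, 5, (10, 13), (8, 9), (13, 14)),
    (6, 13, (9, 11), (7, 8), (11, 12)),
    (14, 17, (8, 10), (7, 8), (10, 11)),
    (18, 25, (7, 9), (6, 7), (9, 10)),
    (26, 64, (7, 9), (6, 7), (9, 10)),
    (65, float("inf"), (7, 8), (5, 6), (8, 9)),
]


def sleep_class(age, hours):
    """Return 0 (normal), 1 (less sleep), 2 (more sleep), or None if no rule matches."""
    for min_age, max_age, normal, less, more in SLEEP_RULES:
        if min_age <= age <= max_age:
            # Shared endpoints are resolved by checking "normal" first (assumption).
            for label, (low, high) in ((0, normal), (1, less), (2, more)):
                if low <= hours <= high:
                    return label
            return None
    return None


def smoke_class(units_smoked):
    """Return the smoking class 0-3 for a daily count of units smoked."""
    if units_smoked < 0:
        raise ValueError("units smoked cannot be negative")
    if units_smoked == 0:
        return 0
    if units_smoked <= 4:
        return 1
    if units_smoked <= 15:
        return 2
    return 3
```

Returning `None` for uncovered values keeps the gaps in the rule set explicit, so that generated samples without a defined label can be dropped or reviewed instead of being assigned a guessed class.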
4.3.5 Example
Let the individual's activities and measures for a day be:
Input = (Age=21) ∩ (Gender=Male) ∩ (No. of cigarettes smoked=0) ∩ (Units of Alcohol Consumed=2) ∩ (Screen Time=6) ∩ (Sleep Time=8) ∩ (Height=176) ∩ (Weight=63) ∩ (Calorie Intake=1800) ∩ (Physical Activity=Lightly Active)
5.1 Pre-processing:
BMR = (10 × Weight in kg) + (6.25 × Height in cm) - (5 × Age in years) + 5 (from Harris JA & Benedict FG, 1918)
BMR = (10 × 63) + (6.25 × 176) - (5 × 21) + 5 = 1630
Calories to be consumed = BMR × Physical Activity = 1630 × 1.375 = 2241.25
Calorie Difference = Calories consumed - Calories to be consumed = 1800 - 2241.25 = -441.25
Thus, inputs after pre-processing are:
Input1 = (Age=21) ∩ (Gender=Male) ∩ (No. of cigarettes smoked=0) ∩ (Units of Alcohol Consumed=2) ∩ (Screen Time=6) ∩ (Sleep Time=8) ∩ (Calorie Difference=-441.25)