Prediction Modeling of Mental Well-Being Using Health Behavior Data of College Students

Background: Since the onset of the COVID-19 pandemic in early 2020, the importance of timely and effective assessment of mental well-being has increased dramatically. Due to heightened risks for developing mental illness, this trend is likely to continue during the post-pandemic period. Machine learning (ML) algorithms and artificial intelligence (AI) techniques can be harnessed for early detection, prognostication and prediction of negative psychological well-being states.
Objective: Studies using machine learning classification of mental well-being are scarce in Asian populations. This investigation aims to develop reliable machine learning classifiers based on health behavior indicators applicable to university students in South-East Asia.
Methods: Using data from a large, multi-site cross-sectional survey, this work models mental well-being and reports on the performance of various machine learning algorithms, including generalized linear models, k-nearest neighbor, naïve-Bayes, neural networks, random forest, recursive partitioning, bagging, and boosting. Prediction models were evaluated using accuracy, error rate, kappa, sensitivity, specificity, Area Under the receiver operating characteristic Curve (AUC), and Gini index.
Results: Random forest and adaptive boosting algorithms achieved the highest accuracy in identifying negative mental well-being traits. The five most salient features for predicting poor mental well-being were body mass index, number of sports activities per week, grade point average (GPA), sedentary hours, and age.
Conclusions: Based on the reported results, several specific recommendations and directions for future work are discussed. These findings may help provide cost-effective support and modernize mental well-being assessment and monitoring at the individual and university level.


Introduction
Over four decades of research has linked positive mental well-being to improvements in health, development, and longevity (Agteren et al. 2021). Mental well-being can be seen as a separate, independent state from mental illness. A 10-year longitudinal study showed that improving mental well-being reduced the risk of developing mental illness by up to 8.2 times in people without mental health disorders (Keyes, Dhingra, and Simoes 2010). Long-term poor psychological well-being is an important indicator of developing mental illness, viz., depression, anxiety disorders, eating disorders, and addictive behaviors (Srividya, Mohanavalli, and Bhalaji 2018). Thus, elevating mental well-being has become an essential therapeutic route to disease prevention.
Negative mental well-being typically manifests in young adults. However, some evidence shows that help-seeking behaviors start late, when symptoms of mental illness have already appeared (Kessler and Bromet 2013). Help-seeking is further delayed by societal stigma, particularly in Asian cultures, which often results in under-reporting of cases related to mental illness (Tan et al. 2020). Therefore, it is important to have a predictive mechanism to identify young people with negative mental well-being early to minimize the risk of developing mental health disorders (Srividya, Mohanavalli, and Bhalaji 2018; Tan et al. 2020).
The literature provides promising evidence that good physical health is a crucial factor influencing mental well-being (Kanekar and Sharma 2020; Nagy-Pénzes, Vincze, and Bíró 2020; Milne-Ives et al. 2020; Linden and Stuart 2020). Meta-analyses aggregating the results from numerous studies have revealed important links between mental disorders and physical inactivity, as well as associations with non-communicable diseases (NCDs) such as diabetes, heart disease, and multi-morbidity disorders (Stein et al. 2019). Although NCDs are usually asymptomatic in young adults, it is an added benefit to promote healthy and active behaviors early to prevent or delay the development of NCDs, as these health habits track into mid- and older age [REF]. The health behaviors of young adults, particularly university students, provide important insight into NCD levels in the future (Secretariat 2017).
As data science techniques are no longer restricted to their predecessor fields of applied mathematics, statistics, and computer science, it is timely and practical for the social and health sciences to utilize machine learning algorithms to address issues that have profound effects on human lives (Metcalf and Crawford 2016). In this regard, machine learning classifiers can be used to close this gap and provide more effective early detection and assessment of mental well-being in health prevention programs. Furthermore, mental well-being is directly related to the social and cultural aspects of the population in different regions (Srividya, Mohanavalli, and Bhalaji 2018). Hence, it is essential to use regional data for a prediction system that is customized for the target region. Such a system is particularly important for the Association of Southeast Asian Nations (ASEAN) University Network-Health Promotion Network (AUN-HPN), as mental illness prevention is one of the key and immediate priorities of the network. In addition, studies on mental well-being have become a priority in higher education institutions during the COVID-19 pandemic and will continue to be important in the post-pandemic era due to the heightened risk of developing serious mental health issues (Agteren et al. 2021; Liu et al. 2021). Moreover, the Southeast and East Asia region, which includes ASEAN, is the fastest growing digital market in the world, with values exceeding US$100 billion in 2019, and is expected to grow by four times that of the regional gross domestic product by 2023 (Chen and Ruddy 2020). To incentivize the growing digital economy, higher education institutions should prioritize and ensure data infrastructure readiness and connectivity in the region to ease research and development in the era of digital revolution.
Therefore, this study aims to classify negative mental well-being based on indicators of healthy behaviors among university students in ASEAN using machine learning prediction models.

The data used in this study was extracted from an online cross-sectional survey of 15,366 university students from the ASEAN countries. The target universities consisted of 17 ASEAN University Network (AUN) member universities across seven ASEAN countries, namely, Brunei Darussalam, Indonesia, Malaysia, Philippines, Singapore, Thailand, and Vietnam.
The questionnaire was developed in several rounds of consultation meetings with experts from the AUN Health Promotion Network committee and member universities. The measurement tools selected were widely used and validated in multiple countries (Appendix A). The features were extracted based on the focus of this study. Mental well-being was measured using the shortened Warwick-Edinburgh Mental Well-being Scale (WEMWBS), a reliable and valid tool for university students. The WEMWBS score was dichotomized into "poor well-being" (7.0-17.99) and "good well-being" (≥18.00).
Physical activity (PA) was measured using the Global Physical Activity Questionnaire (GPAQ) version 2.0. Low PA was classified as less than 600 metabolic equivalent (MET)-minutes/week, which fails to meet the minimum energy expenditure condition for physical activity. The number of sports activities was also collected and categorized into none, one to three, four to six, and more than six activities per week.
Health-risk behaviors, including smoking and consumption of alcohol, fruits and vegetables, salt, and sugar-sweetened beverages, were measured using items from existing instruments. For tobacco consumption, students were dichotomized into "Yes" (current smokers, i.e., those who smoked daily) and "No" (not current smokers). For alcohol consumption, students were asked whether or not they drink alcohol. For fruit/vegetable consumption, students were asked how many servings of fruits/vegetables they usually eat each day, and consumption of ≥5 servings/day was considered healthy. Consumption of snacks/fast food was assessed by asking how many days per week students eat fast food. Students who consumed fast food every day were categorized into "Yes" and the remaining responses were collapsed into "No." Salt intake was assessed by asking how much salt students added to their food before eating (<1 teaspoon to ≥3 teaspoons). Adding ≥1 teaspoon, or 6 g per day, was considered excessive sodium intake. Students were also asked how many days they drank sugar-sweetened beverages. Responses were handled similarly to those for fast food consumption. Participants provided demographic information including age, gender, GPA (grading system for students' academic performance), and Body Mass Index (BMI). An open-ended question regarding opinions on physical activity was asked to obtain textual data.
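The dichotomization rules described above can be illustrated with a minimal Python sketch. The cut-offs are taken from the text; the function name, field names, and the reading of "every day" as 7 days per week are assumptions made here for illustration and do not come from the study's own code.

```python
from dataclasses import dataclass


@dataclass
class RecodedResponse:
    poor_wellbeing: bool      # WEMWBS score below 18.00
    low_pa: bool              # fewer than 600 MET-minutes/week
    healthy_fruit_veg: bool   # at least 5 servings/day
    daily_fast_food: bool     # fast food every day ("Yes")


def recode(wemwbs_score, met_minutes_per_week,
           fruit_veg_servings_per_day, fast_food_days_per_week):
    """Apply the cut-offs stated in the text to one survey response.

    Assumption: 'every day' is interpreted as 7 days per week.
    """
    return RecodedResponse(
        poor_wellbeing=wemwbs_score < 18.0,
        low_pa=met_minutes_per_week < 600,
        healthy_fruit_veg=fruit_veg_servings_per_day >= 5,
        daily_fast_food=fast_food_days_per_week == 7,
    )


if __name__ == "__main__":
    # Hypothetical respondent, not taken from the survey data.
    print(recode(17.5, 450, 3, 7))
```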
Ethical approval was obtained from the institutional review board of each university prior to conducting the study (See Declarations).

Data preprocessing
Data cleaning procedures included removal of ineligible cases, duplicate responses, responses with more than 50% missing values (listwise deletion), and invalid questionnaire responses. A total of 15,366 remaining cases were used in the subsequent analysis. Missing data in these valid cases were handled by multiple imputation using MICE (Multivariate Imputation via Chained Equations), with 10 imputations replacing missing values with predicted values, implemented in the R package mice (Zhang 2016). The dataset was unbalanced with respect to the binary outcome of negative or poor mental well-being. To avoid potential bias in the AI/ML modeling, the dataset was re-balanced using the Synthetic Minority Oversampling TEchnique (SMOTE) (Chawla et al. 2002).
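To make the re-balancing step concrete, the following is a minimal Python sketch of the core SMOTE idea: each synthetic case is placed at a random position on the line segment between a minority-class case and one of its k nearest minority-class neighbours. It is an illustration under simplifying assumptions (numeric features only, Euclidean distance, hypothetical function and parameter names) and is not the implementation used in the study.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic points by interpolating between minority-class points.

    minority: list of equal-length numeric tuples (at least two points).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point, excluding the point itself
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # relative position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


if __name__ == "__main__":
    # Hypothetical minority-class cases with two numeric features.
    cases = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    for point in smote(cases, n_synthetic=3, k=2):
        print(point)
```

Because new cases are interpolated rather than duplicated, the minority class grows without exact copies; categorical survey items would need a variant of the technique, which this sketch does not cover.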

Feature selection
According to the principle of parsimony, a simple a priori model often provides the best explanation of a problem relative to more complex models, because inclusion of unnecessary features creates intrinsic and extrinsic noise (Naser 2021). Accounting only for key data elements avoids model overfitting, provides better predictive accuracy and generalization, and facilitates practical application (Guan and Loew 2020). Because each type of feature selection method has limitations, three strategies were used to validate the selection of salient variables, or features, used in the training models in this study. The first strategy was based on the Benjamini-Hochberg False Discovery Rate (FDR) method, which controls the expected proportion of false rejections of features in multiple significance testing (Benjamini and Hochberg 1995) and could be expressed as follows:
$$\mathrm{FDR} = E\left[\frac{V}{R}\right],$$
where $V$ is the number of false rejections and $R$ is the total number of rejections (with the ratio taken as 0 when $R = 0$).
Second, a deterministic wrapper method based on stepwise selection was computed; this is an iterative process of adding important features to a null set of features and removing the worst-performing features from the complete list of features (Naser 2021). The final strategy employed a randomized wrapper method, Boruta, which iteratively removes features that are statistically less significant than random probes (Kursa, Rudnicki, and others 2010). Our aggregate feature-selection technique used the intersection of these three variable elimination strategies and generated a smaller collection of variables for the subsequent AI modeling.
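As an illustration of the first strategy and of the final intersection step, the sketch below implements the standard Benjamini-Hochberg step-up procedure in Python and intersects its result with two other feature sets. The p-values, feature names, and the stepwise and Boruta outputs are hypothetical placeholders; the study itself does not report these intermediate values.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the features retained by the Benjamini-Hochberg step-up procedure.

    p_values: dict mapping feature name -> p-value.
    A feature is retained if its rank is at or below the largest rank r
    with p_(r) <= alpha * r / m, where m is the number of tests.
    """
    ranked = sorted(p_values.items(), key=lambda item: item[1])
    m = len(ranked)
    cutoff_rank = 0
    for rank, (_, p) in enumerate(ranked, start=1):
        if p <= alpha * rank / m:
            cutoff_rank = rank
    return {name for name, _ in ranked[:cutoff_rank]}


def aggregate_selection(*feature_sets):
    """Keep only the features selected by every strategy (set intersection)."""
    return set.intersection(*feature_sets)


if __name__ == "__main__":
    # Hypothetical p-values and selections, for illustration only.
    p_vals = {"bmi": 0.001, "age": 0.035, "sleep": 0.036, "gpa": 0.039, "salt": 0.9}
    fdr_set = benjamini_hochberg(p_vals)
    stepwise_set = {"bmi", "gpa", "age"}
    boruta_set = {"bmi", "gpa", "sleep"}
    print(sorted(aggregate_selection(fdr_set, stepwise_set, boruta_set)))
```

Taking the intersection is deliberately conservative: a feature survives only if all three strategies agree, which is in keeping with the parsimony argument above.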

Training Machine Learning Classifiers
Classification is a supervised machine learning technique that groups records into sets of homologous observations associated with particular classes. Different classifiers, or classification algorithms, are available. In this study, six different classifiers were trained: generalized linear model (glm), k-nearest neighbor (knn), naïve-Bayes (nb), neural network (nnet), random forest (rf), and recursive partitioning (RPART).
The generalized linear model, specifically logistic regression, is a linear probabilistic classifier. It models a binary outcome, in this case positive (1) and negative (0) mental well-being, and estimates class probabilities directly using the logit transform function (Myers and Montgomery 1997).
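For reference, the logit transform mentioned above is conventionally written as follows. The symbols $p$ (probability of the class coded 1), $x_i$ (features), and $\beta_i$ (coefficients) are introduced here for illustration and are not notation from the original text.

```latex
\operatorname{logit}(p) = \ln\frac{p}{1-p} = \beta_0 + \sum_{i=1}^{n} \beta_i x_i
\quad\Longleftrightarrow\quad
p = \frac{1}{1 + e^{-\left(\beta_0 + \sum_{i=1}^{n} \beta_i x_i\right)}}
```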
Naïve-Bayes predicts class membership probabilities based on Bayes' theorem and the naive assumption that all features are equally important and independent (Dinov 2018). Bayes' conditional probability could be expressed as:
$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}.$$
Essentially, we seek the probability of class level $L$ given an observation, represented as a set of independent features $F_1, F_2, \ldots, F_n$. Then the posterior probability that the observation is in class $L$ is equal to:
$$P(C_L \mid F_1, \ldots, F_n) = \frac{P(C_L) \prod_{i=1}^{n} P(F_i \mid C_L)}{\prod_{i=1}^{n} P(F_i)},$$
where the denominator, $\prod_{i=1}^{n} P(F_i)$, is a scaling factor that represents the marginal probability of observing all features jointly.
For a given case $X = (F_1, F_2, \ldots, F_n)$, i.e., a given vector of features, the naive Bayes classifier assigns the most likely class $\hat{C}$ by calculating $P(C_L \mid F_1, \ldots, F_n)$ for all class labels $L$, and then assigning the class $\hat{C}$ corresponding to the maximum posterior probability. Analytically, $\hat{C}$ is defined by:
$$\hat{C} = \arg\max_{L} \frac{P(C_L) \prod_{i=1}^{n} P(F_i \mid C_L)}{\prod_{i=1}^{n} P(F_i)}.$$
As the denominator is static for $L$, the posterior probability above is maximized when the numerator is maximized, i.e., $\hat{C} = \arg\max_{L} P(C_L) \prod_{i=1}^{n} P(F_i \mid C_L)$.
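The decision rule above, choosing the class that maximizes the prior times the product of per-feature conditional probabilities, can be sketched in a few lines of Python. This is a minimal illustration for categorical features using raw relative frequencies (no smoothing); the function names and the toy data are assumptions made here, not part of the study.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # feature_counts[label][i][value] = number of rows of that class
    # whose i-th feature equals value
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[label][i][value] += 1
    return class_counts, feature_counts


def predict(model, row):
    """Return (most likely class, unnormalized posterior score per class)."""
    class_counts, feature_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(C_L)
        for i, value in enumerate(row):
            score *= feature_counts[label][i][value] / count  # P(F_i | C_L)
        scores[label] = score
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy data: (physical activity level, daily fast food) -> well-being class.
    rows = [("low", "yes"), ("low", "no"), ("high", "no"), ("high", "no")]
    labels = ["poor", "poor", "good", "good"]
    model = train_naive_bayes(rows, labels)
    print(predict(model, ("low", "no")))
```

The scores are left unnormalized because, as noted above, the denominator is the same for every class and does not affect the argmax.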
Artificial neural networks, or simply neural nets, simulate the underlying intelligence of the human brain by using a synthetic network of interconnected neurons (nodes) to train the model. The features are weighted by importance, and the weighted sum is passed through an activation function to generate an output (y) at the end of the process (Dinov 2018). A typical output could be expressed as:
$$y(x) = f\left(\sum_{i=1}^{n} w_i x_i\right),$$
where $x_i$ are the input features, $w_i$ are their weights, and $f$ is the activation function.
The random forest classifier is a randomized ensemble of decision trees that recursively partition the dataset into homogeneous or nearly homogeneous terminal nodes. It may contain hundreds to thousands of trees that are grown by bootstrapping samples of the original data. The final decision is obtained when the tree branching process terminates and provides the expected forecasting results given the series of events in the tree (Dinov 2018; Nguyen, Wang, and Nguyen 2013).
Recursive partitioning (RPART) is another decision tree classification technique that works well with variables with definite ordering and unequal distances. The tree is built similarly to a random forest tree, resulting in a complex model. However, the RPART procedure also trims back the full tree into nested terminals based on cross-validation. The final sub-tree model provides the decision with the 'best', or lowest, estimated cross-validation error (Therneau, Atkinson, and others 1997).
The caret package was used for automated parameter tuning with repeatedcv method set at 15-fold cross-validation re-sampling that was repeated with 10 iterations (Kuhn 2009).
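The resampling scheme named above, repeated k-fold cross-validation, can be illustrated with a short Python sketch that generates the train/test index splits. This is a generic, unstratified illustration with hypothetical function and parameter names; it does not reproduce the internals of the caret package.

```python
import random


def repeated_kfold(n_samples, n_folds, n_repeats, seed=0):
    """Yield (repeat, fold, train_indices, test_indices) for repeated k-fold CV.

    Each repeat reshuffles the cases and splits them into n_folds disjoint
    folds; every fold serves once as the held-out set within its repeat.
    """
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        folds = [indices[f::n_folds] for f in range(n_folds)]
        for f, test in enumerate(folds):
            train = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield repeat, f, train, test


if __name__ == "__main__":
    # 15 folds repeated 10 times, on a small hypothetical sample of 30 cases.
    splits = list(repeated_kfold(n_samples=30, n_folds=15, n_repeats=10))
    print(len(splits), "train/test splits")
```

Averaging a performance metric over all splits gives a more stable estimate for comparing tuning parameter values than a single k-fold pass.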
In this study, random forest outperformed the other machine learners. However, general decision trees might overfit the model to noise in the training dataset. To overcome this, we implemented bootstrap aggregation (bagging) and boosting to reduce variance and bias, respectively.
Bagging decreases the variance in the prediction model by essentially generating additional training data from the original dataset using bootstrapping methods. Boosting reduces bias in parameter estimation by sub-setting the original data to produce a series of models and boosting their performance (in this case, measured by accuracy) by combining them together (Dinov 2018).
Accuracy is the proportion of correctly classified observations:
$$\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$
where True Positive (TP) is the number of observations correctly classified as "yes" or "success," True Negative (TN) is the number of observations correctly classified as "no" or "failure," False Positive (FP) is the number of observations incorrectly classified as "yes" or "success," and False Negative (FN) is the number of observations incorrectly classified as "no" or "failure" (Dinov 2018).
The error rate, in contrast, is the proportion of misclassified observations, calculated using:
$$\text{error rate} = \frac{FP + FN}{TP + TN + FP + FN} = 1 - \text{accuracy}.$$
The accuracy and error rate add up to 1. Therefore, a 95% accuracy means a 5% error rate (Dinov 2018).
The kappa statistic accounts for the possibility of a correct prediction by chance alone and evaluates the agreement between the expected truth and the machine learning prediction. When kappa = 1, there is perfect agreement between a computed prediction and an expected prediction (typically a random, by-chance prediction). The kappa statistic can be expressed as (Dinov 2018):
$$\kappa = \frac{P(a) - P(e)}{1 - P(e)},$$
where $P(a)$ and $P(e)$ simply denote the probability of actual and expected agreement between the classifier and the true values.
Sensitivity is a statistic that indicates the true positive rate, i.e., the proportion of "success" observations that are correctly classified (Dinov 2018). This can be expressed as:
$$\text{sensitivity} = \frac{TP}{TP + FN}.$$
On the other hand, specificity is a statistic that indicates the true negative rate, i.e., the proportion of "failure" observations that are correctly classified (Dinov 2018). This can be expressed as:
$$\text{specificity} = \frac{TN}{TN + FP}.$$
The Gini index is based on a variable importance measure and evaluates information gain using the estimated class probabilities (Dinov 2018). This can be expressed as:
$$GI = \sum_{i=1}^{k} p_i (1 - p_i) = 1 - \sum_{i=1}^{k} p_i^2,$$
where $k$ is the number of classes and $p_i$ is the estimated probability of class $i$.
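The evaluation metrics described in this section can be computed directly from the four confusion-matrix counts. The Python sketch below is illustrative: the function names and the example counts are hypothetical, and the chance-agreement term P(e) is computed from the row and column marginals as in the standard Cohen's kappa, which is assumed here to match the formulation the authors relied on.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, error rate, kappa, sensitivity, and specificity."""
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n  # observed agreement P(a)
    # Expected agreement P(e) by chance, from the marginal totals
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return {
        "accuracy": accuracy,
        "error_rate": (fp + fn) / n,
        "kappa": (accuracy - p_e) / (1 - p_e),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def gini_index(class_probabilities):
    """Gini index 1 - sum(p_i^2) for a list of estimated class probabilities."""
    return 1 - sum(p * p for p in class_probabilities)


if __name__ == "__main__":
    # Hypothetical confusion-matrix counts, not results from the study.
    for name, value in classification_metrics(tp=40, tn=45, fp=5, fn=10).items():
        print(f"{name}: {value:.3f}")
    print("gini:", gini_index([0.5, 0.5]))
```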

Results
The cleaned and preprocessed dataset comprises n=15,366 cases with k=20 features. The largest group of respondents was from Vietnam (33.3%), followed by Indonesia (28.8%) and Thailand (

Feature importance
Figure 1 illustrates ten features that are salient to the prediction model of mental well-being. This is corroborated by the error plot, variable importance plot (accuracy), and Gini index (Figure 2). The ten salient indicators of mental well-being based on health behaviors, ranked in order of importance, were body mass index, number of sports activities per week, grade point average (GPA), sedentary hours, age, gender, salt intake, fruit and vegetable consumption, hours of sleep, and achievement of recommended physical activity levels.

Model evaluation
The dataset was randomly partitioned into a training set (80%) and a testing set (20%). The training dataset was used to build the classifier models using different classification algorithms, including generalized linear model, k-nearest neighbors, naïve-Bayes, neural net, random forest, and recursive partitioning. The performance of the trained classifiers was then evaluated using accuracy and kappa statistics. Figure 3 illustrates the results of the trained model performance. Overall performance, as indicated by accuracy and kappa statistics, showed that random forest (accuracy = 0.921, kappa = 0.788) was the best classifier, followed by k-nearest neighbor (accuracy = 0.775, kappa = 0.554) and naïve-Bayes (accuracy = 0.723, kappa = 0.433).
The trained classifiers were then applied to the testing dataset to evaluate how well they predict poor mental well-being. Table 1 shows the model evaluation on the testing data. With parameters tuned using the repeatedcv method (10-fold cross-validation resampling repeated for 5 iterations), random forest clearly outperformed the other classifiers (AUC=0.966). Model optimization using bagging (AUC=0.677) did not improve on the selected random forest classifier. However, boosting (AUC=0.959) performed similarly to random forest. Adding complementary unstructured-text information to the structured data elements did not significantly improve the performance of the random forest classifier (AUC=0.951). Such data augmentation adds more than 20 text-derived structured data elements to the standard survey features, which runs counter to the principle of parsimony. As the top performing classifier (random forest) represents an implicit (black box) model, Figure 4 shows an example of a single decision tree from the aggregate forest model that illustrates one explicit classification strategy for predicting poor mental well-being.
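Since the comparison above rests on AUC, a minimal Python sketch of how this metric can be computed may be helpful. It uses the rank-based interpretation of AUC: the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The function name and the example scores are hypothetical; labels are assumed to be coded 1 (positive) and 0 (negative).

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    scores: predicted scores or probabilities for the positive class.
    labels: true classes, coded 1 (positive) and 0 (negative).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical predicted probabilities for four test cases.
    print(auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two classes. The pairwise loop is quadratic in the number of cases, so a sort-based computation would be preferable for a dataset of the size used here.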

Main findings
Mental well-being is an important indicator of mental health, and this study developed prediction models using machine learning classifiers to predict mental well-being status among university students in ASEAN. This is particularly important because, with no end of the pandemic in sight due to the emergence of different COVID-19 variants, social and physical distancing and isolation are likely to remain a way of life in the new normal, which will indubitably increase the risk of developing serious mental illness in the future.
In the present study, the highest accuracy was achieved by random forest, random forest with text predictors, and adaptive boosting. Models using the additional text-derived features did not improve the model performance. This could be explained by the non-specific nature of the open-ended survey question regarding physical health. Future studies could examine this aspect closely using more sophisticated methods of natural language processing (NLP), deep learning, and language syntax techniques to transform the unstructured text into quantitative data elements (Dinov 2018). Such advanced machine learning strategies could enhance the contribution of the textual content in the forecasting of mental well-being. Nevertheless, studies on mental well-being using various classification techniques among samples in the Asian population are scarce. A few studies also reported that random forest or decision tree-based algorithms were some of the best techniques for forecasting mental health (

Limitations
The key strength of the present study is the use of a large dataset from multiple sites in South-East Asia to objectively rank-order significant variables for predicting mental well-being. The use of more than one feature selection method, machine learning classifier, and model evaluation metric reduces the errors and biases of the results. Despite these strengths, several limitations of the current study should be noted. The data were collected using a cross-sectional survey design, so causal inferences cannot be drawn. Even though the survey consisted of items from widely used, validated questionnaires, self-reporting bias and the likelihood of under-reporting are still present. However, this survey took place during the COVID-19 pandemic and would continue to complement and benefit future post-pandemic studies.

Recommendations and future work
The ASEAN University Network (AUN) has a central role in coordinating the effective use of available digital infrastructure and research activities, which are currently focused on particular individual universities. Here we offer three recommendations, from general to specific, that provide feasible and practical solutions, particularly for its health promotion network. Firstly, the continuity of data collection to train the machine learning algorithms is vital. An appropriate central data collection, processing, and analytical centre using existing infrastructure, particularly from resource-rich AUN member universities, needs to be identified and established. Agreed-upon data collection, storage, and sharing policies and regulations would encourage active participation and contribution of university students' health data, even from resource-deprived institutions. The systematic and long-term collection of health data is crucial for answering critical research inquiries and for encouraging innovative interventions and experimentation with data created, used, and shared among higher education institutions.
Secondly, once the selected data centre has been set up, it could be populated with data from affordable technological survey tools, which provide a practical, cost-effective, and long-term avenue for monitoring the overall health of university students, including mental well-being (Woodward et al. 2020). Traditional assessment tools take a considerable amount of time to complete because they collect too many features, and they require long collection periods before providing useful insights (Areán, Ly, and Andersson 2016; Woodward et al. 2020). Technological surveys, such as mHealth mobile apps or tools integrated into student evaluation systems, that collect only a few salient features over a short period of time could have potential for more effective assessment and monitoring. Working in conjunction with the diagnostic assessment by university counselors, psychiatrists, and care professionals, the collected data could support precise early detection and early intervention (Lederman et al. 2014; Woodward et al. 2020). However, ethical collection and storage of student information will necessarily require additional steps to ensure privacy and security, as well as alignment with codes of professional practice such as maintaining patient-physician confidentiality (Aguilera 2015).
Finally, an agreed-upon schema is needed for the data that will be collected and displayed as active indicators for monitoring and assessing the current state of university students' health and well-being on a digital dashboard that is accessible to students and stakeholders alike. The dashboard will provide a sense of digital connectedness among students in the ASEAN university network, the fundamental goal of ASEAN. The data schema could be updated from time to time. At the moment, this study has provided a rank-ordered list of important features for predicting mental well-being, which is essential, particularly in resource-deprived institutions, for allocating resources appropriately and as necessary. Among the significant features, the top five variables were body mass index, number of sports activities per week, grade point average (GPA), sedentary hours, and age. They appeared to have large effects on the accuracy of the classifiers and therefore should be prioritized when developing and implementing psychological well-being monitoring and promotion interventions. Future research is needed to continue improving the precision of the prediction models. Given that the level of mental well-being varies widely by social and cultural context, country- or culture-specific models should be developed. In addition, resource-rich institutions could consider reducing the subjectivity of the data by incorporating salient objective measures such as real-time biofeedback information from physiological sensing technology that collects electroencephalogram (EEG) activity, electrocardiogram (ECG) fluctuations, heart rate, breathing rate, temperature, speech

Participants in this study gave their informed online consent by clicking "I agree to participate" before completing the survey.

Consent for publication
Not applicable.

Availability of data and materials
The datasets generated and/or analysed during the current study are not publicly available due to restrictions on intellectual property regulations of the funding organization.

Competing interest
The authors have no conflicts of interest to declare. The sponsors were not involved in, and had no role in, the conduct of this study or its publication.

Figure 2
Variable importance plots of random forest classification of salient features (top) of mental well-being using accuracy (left) and Gini index (right) as evaluation metrics.

Figure 3
Evaluation of performance of trained machine learning classifiers.

Figure 4