A multi-level scenario-based predictive analytics framework to model community mental health—built environment nexus

Understanding the complex community mental health—built environment nexus remains vital to building healthy and sustainable cities. The recent commission report on mental health and sustainable development goals elevates concerns about understanding these complex relationships for informed-policy actions. However, there are limited analytical and methodological frameworks to unpack these relationships. Here, we develop a multi-level scenario-based predictive analytics framework (MSPAF) to address this limitation. We employ rigorously validated interpretable machine learning algorithms and scenario-based sensitivity analyses to explore the relationship between community mental health, and socio-economic/physical aspects of built environment across the US metropolitan areas. Our results suggest that declining socio-economic conditions (e.g., poverty, low income, unemployment) are significantly associated with increased reported mental health disorders. The results also contribute to the insurance-mental health debate by showing decreased access to public health insurance is associated with increase in reported mental disorders. Finally, adults report increased mental health disorders as travel costs or housing vacancies increase, but this does not hold across all the metropolitan areas, illustrating a mixed effect of built environment’s physical aspects on mental health. We conclude by highlighting future opportunities of incorporating other micro-/macro-level data into the MSPAF framework to examine the mental health—built environment nexus further.


Introduction
The socio-economic and physical aspects of a built environment significantly influence the community health and well-being of people. Physical aspects of a built environment include human-created infrastructure systems, such as transportation and housing infrastructure, and how they interact with the community to impact people and places [1]. Socio-economic aspects of the built environment, on the other hand, refer to economic, racial and ethnic, and relational conditions that may influence a person's ability to cope with stress. Studies have examined how such physical and socio-economic aspects of the built environment impact a community's overall health and well-being in terms of crime rates [2], educational performance, property values [3,4], and physical health outcomes such as obesity, heart disease, cancer, stroke, respiratory disease, and diabetes [5,6]. Understanding and predicting health outcomes as a function of physical and socio-economic aspects of the built environment is a significant focus among urban planning, public health, and allied professionals. The SARS-CoV-2 pandemic has further heightened the urgency to understand how these built environment factors impact specific health outcomes, especially the spread of such outcomes among vulnerable and often minority populations.
Mental health is one of the specific health outcomes impacted by the built environment's socio-economic and physical aspects. Mental illness or disorder contributes significantly to the global burden of disease, accounting for 32.4% of years lived with disability (YLDs) and 13.0% of disability-adjusted life-years (DALYs) globally [7]. As of 2016, global estimates revealed that mental (e.g., chronic depression, anxiety) and substance use disorders were among the largest contributors to disability in young adults; depressive and anxiety disorders were high among females, and substance use and autism spectrum disorders were high among males [8]. In the United States, suicide ideation in such models are simple and easier to interpret, they often fail to approximate the true function since real relationships are often not linear. On the other hand, non-parametric data-driven models do not make any unrealistic assumptions about the functional form, thereby better approximating the true functional form. However, flexibility comes at the cost of interpretability [1]. In this study, we assessed the predictive performance of eight interpretable machine learning models ranging from parametric to non-parametric: generalized linear model (GLM) [22], ridge regression (RR) [23], lasso regression (LR) [24], generalized additive model (GAM) [25], multivariate adaptive regression splines (MARS) [26], gradient boosting method [27], random forest (RF) [28], and Bayesian additive regression tree (BART) [29]. Leveraging an 80-20 randomized percentage holdout cross-validation technique, we estimated the generalization performance of the models and selected the model that outperformed all the other models in terms of both in-sample goodness-of-fit and out-of-sample predictive accuracy (see Section 4.1).
Our results indicate that BART outperformed all the other models, and thus the subsequent statistical inference is conducted using the BART model (see Section 4.1 and Supplementary Information for BART algorithm details and model comparisons).

Key factors attributing to socio-economic and physical aspects of built environment
We leverage the variable importance plot (VIP) (see Supplementary Information) and partial dependence plots (PDPs) to identify the key built environment predictors of community mental health and to evaluate their associated relationships (see Section 4.1 for mathematical details of the VIP and PDP). In our analysis, we also controlled for behavioral and underlying health conditions (e.g., smoking habit, principal components of underlying physical health conditions) that significantly influence mental health outcomes. Since this study focuses on built environment factors, the subsequent discussion concentrates on the built environment's physical and socio-economic aspects, which remain under-explored and are central to this paper. Partial dependence plots of the socio-economic aspects of the built environment considered in this study are depicted in Fig 1. The partial dependence plot of poverty, shown in Fig 1a, indicates a strong positive association with poor community mental health: as the percentage of families below the poverty level increases from 10% to 80%, the percentage of adults (> 18 years) reporting poor mental health (mental health not good for ≥ 14 days) increases from 12.8% to 13.8% in the community on average. The narrow confidence interval (shaded grey area) indicates that the estimates are associated with less uncertainty. Other significant factors in this category include economic variables such as median family income and change in the unemployment rate (2005-2014). The partial dependence plot of median family income (Fig 1b) shows a negative association. More specifically, as the median family income decreases from around $130K to $20K, the percentage of adults reporting mental health issues increases from 13.0% to 13.3% on average in a community.
However, the wider confidence interval around the larger income values indicates that the estimated mental health outcomes for adults in the higher income range vary considerably. On the other hand, the relationship between unemployment changes and the percentage of adults reporting poor mental health is relatively uncertain (Fig 1c). Besides the economic status of a community, access to medical insurance plays a major role in predicting community mental health outcomes. The partial dependence plot of the percentage of families with no health insurance (Fig 1d) shows a strong positive association with poor mental health: as the percentage of families with no health insurance in a community increases from 5% to 35%, the percentage of adults reporting poor mental health increases from 12% to 14.5%. The narrow confidence interval indicates lower uncertainty and variation in the estimated relationship across US metropolitan areas. Our results also suggest that insurance type plays a major role in influencing community mental health. The partial dependence plot of insurance type, representing the ratio of the percentage of families with public health insurance to the percentage with private health insurance, is shown in Fig 1e. As the proportion of families having public health insurance relative to those having private health insurance approximately doubles, the percentage of adults reporting poor mental health declines from 13.4% to 13.0% on average. The decreasing trend indicates that increased access to public health insurance is associated with fewer mental health disorders reported by adults in a community on average. Transportation cost (percentage of household income spent on transportation) and the average number of vacant properties, which constitute the built environment's physical aspect, were found to be the key predictors of community mental health.
The partial dependence plot of transportation cost shows a positive association with poor mental health (Fig 2a). More specifically, as household transportation expenditures increase from 20% to 100% of household income on average, the percentage of adults reporting poor mental health increases from 13.0% to 13.1% on average. Although this increment seems small, these numbers reflect only the national average across large metropolitan communities, with some US states experiencing much larger negative impacts than others. Our scenario-based sensitivity analysis (refer to "Data and Methods" section) highlights such variations across metropolitan areas in different US states. However, the relationship between average vacancy and community mental health is quite uncertain (Fig 2b). We observe a slightly increasing trend in the percentage of adults reporting poor mental health as the average number of vacant properties increases; however, once it reaches a threshold of around 900 vacant properties in a community on average, the association flattens, i.e., community mental health becomes insensitive to further changes in vacant properties.
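To make the reading of the partial dependence plots concrete, the following is a minimal sketch of how a partial dependence curve is commonly computed: one predictor is forced to each value on a grid for every observation, and the model predictions are averaged. The function `partial_dependence`, the stand-in model `toy_predict`, and the toy data are hypothetical illustrations; they are not the BART model or the data used in this study.

```python
from statistics import mean
from typing import Callable, List, Sequence

Row = List[float]


def partial_dependence(
    predict: Callable[[Row], float],
    rows: Sequence[Row],
    feature_index: int,
    grid: Sequence[float],
) -> List[float]:
    """Average prediction over the data with one feature forced to each grid value."""
    curve = []
    for value in grid:
        preds = []
        for row in rows:
            modified = list(row)              # copy so the input data stay untouched
            modified[feature_index] = value   # force the feature of interest
            preds.append(predict(modified))
        curve.append(mean(preds))
    return curve


# Hypothetical stand-in for a fitted model (an assumption, not the paper's BART fit).
def toy_predict(row: Row) -> float:
    poverty, uninsured = row
    return 12.0 + 0.02 * poverty + 0.05 * uninsured


# Toy observations: [% families below poverty level, % families uninsured].
toy_rows = [[10.0, 5.0], [30.0, 15.0], [50.0, 25.0]]

# Partial dependence of the prediction on poverty at 10% and 80%.
pdp_poverty = partial_dependence(toy_predict, toy_rows, 0, [10.0, 80.0])
```

Because the other predictors keep their observed values, the curve shows the average effect of the chosen predictor after marginalizing over the rest of the data.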

Projected community mental health burden under plausible perturbations of built environment scenarios into the future
Having identified the key built environment factors associated with mental health outcomes, we employ scenario-based sensitivity analysis to understand how the mental health burden might change in the future. Plausible future scenarios are captured through perturbations of the socio-economic and physical aspects of the built environment. Traditionally, in modern epidemiological studies, the sensitivity and uncertainty analyses for any disease burden and risk factor estimates are conducted using different weighting mechanisms and discount-rate techniques [30]. However, due to the large degree of uncertainty associated with value judgments and built environment conditions, the choice of discount rates is challenging and often cannot capture the wide range of future uncertainties [31]. To overcome these challenges, we limited our analysis to statistical perturbation. The statistical perturbation consists of three steps: 1) we statistically perturbed the socio-economic and physical aspects of the built environment, which may lead to an increase (e.g., economic growth) or decrease (e.g., economic recession) in the independent variable under consideration; 2) following general intuition, we hypothesized whether the increase (decrease) in the independent variable leads to better (worse) mental health outcomes and vice versa; and 3) we verified whether each hypothesis holds nationally or only for certain US states, leveraging our predictive model (see "Data and Methods" section for details on creating scenarios and the list of hypotheses summarized in Table 1).
For illustration purposes, let K represent community mental health (the response variable in our analysis), measured in terms of "% adults aged > 18 suffering from poor mental health for > 14 days". Hence, an improvement in community mental health is depicted by a decrease in K, and a deterioration of community mental health is observed when K increases. The predictor or independent variables under consideration for scenario-based sensitivity analysis are grouped into five categories, viz., (i) economic factors consisting of median household income, % of population below the poverty level, and unemployment rate, (ii) percentage of families with no health insurance, (iii) proportion of families having public insurance compared to private insurance, (iv) transportation cost as a percentage of household income, and (v) average number of vacant properties (as of 2014). The first three categories of independent variables capture the socio-economic characteristics of the built environment, and the last two categories represent its physical aspects. We consider two hypothetical scenarios at a given future time period: 1) the means of the distributions of the socio-economic parameters (i.e., economic conditions, access to health insurance, and type of health insurance) and physical aspects (i.e., travel cost and housing vacancy) of the built environment of a community shift by +1 standard deviation from their historical means, where the historical means represent the base-case (as-is) scenario; and 2) the means of those distributions shift by −1 standard deviation from their historical means. Note that these statistical perturbations provide important insights regarding the trends of community mental health under plausible scenarios.
However, our framework is general enough that it can be used to predict how community mental health may change in the future, provided that forecasted data on the socio-economic and physical aspects of the built environment are available. In our study, we present the framework illustrating how future community mental health might be affected under various future scenarios. Furthermore, to understand whether such shifts result in a favorable outcome (in terms of improvement in community mental health) or not, we compared the projections with the base-case scenario of community mental health outcomes by constructing ten hypotheses (see Table 1 in "Data and Methods" section). Finally, we validated our hypotheses based on our model results and outcomes.
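The ±1 standard deviation perturbation described above can be sketched as follows. This is an illustrative simplification under stated assumptions: every observation of the selected predictors is shifted by one sample standard deviation (which shifts the mean by the same amount), and `toy_predict` is a hypothetical stand-in for the fitted model rather than the model used in this study.

```python
from statistics import mean, stdev
from typing import Callable, List, Sequence

Row = List[float]


def scenario_delta_k(
    predict: Callable[[Row], float],
    rows: Sequence[Row],
    feature_indices: Sequence[int],
    direction: int,
) -> float:
    """Return mean K under the perturbed scenario minus mean K under the base case.

    direction = +1 shifts the chosen predictors up by one standard deviation,
    direction = -1 shifts them down by one standard deviation.
    """
    sds = {j: stdev([row[j] for row in rows]) for j in feature_indices}
    shifted = []
    for row in rows:
        new_row = list(row)
        for j in feature_indices:
            new_row[j] = row[j] + direction * sds[j]
        shifted.append(new_row)
    k_base = mean([predict(r) for r in rows])
    k_scenario = mean([predict(r) for r in shifted])
    return k_scenario - k_base


# Hypothetical stand-in for a fitted model (an assumption, not the paper's BART fit).
def toy_predict(row: Row) -> float:
    poverty, uninsured = row
    return 12.0 + 0.02 * poverty + 0.05 * uninsured


# Toy observations: [% families below poverty level, % families uninsured].
toy_rows = [[10.0, 5.0], [30.0, 15.0], [50.0, 25.0]]

delta_up = scenario_delta_k(toy_predict, toy_rows, [1], +1)    # more families uninsured
delta_down = scenario_delta_k(toy_predict, toy_rows, [1], -1)  # fewer families uninsured
```

A positive result corresponds to a deterioration in community mental health (K increases relative to the base case), and a negative result to an improvement, which is the sign convention used when checking each hypothesis.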

The socio-economic aspects of built environment
In general, metropolitan areas in the eastern part of the US are observed to suffer most severely from poor mental health. The variations in community mental health outcomes across the 50 US states under three different scenarios, a) worst- and best-case scenarios of economic condition, b) lack of health insurance, and c) access to public health insurance, are discussed in this section.
Economic condition
The economic conditions capture the interplay of poverty, median household income, and unemployment rate experienced within a metropolitan area. Since economic condition comprises three variables, for simplicity the hypothetical scenario is constructed by perturbing all three variables at the same time. It was hypothesized that during declining economic conditions, the expected percentage of people reporting poor mental health (K) would increase (hypothesis: H1), and the opposite effect would be observed during economic growth or boom (hypothesis: H2). The scenario-based analysis conducted herein supports these two hypotheses throughout all the states in the US. As depicted in Figure 3, when economic depression sets in (blue bars), all the states observe a deterioration in community mental health depicted by ∆K > 0. On the other hand, when the community experiences an economic boom (yellow bars), improvement in community mental health is observed (∆K < 0). The scenario analysis, depicted in Figure 3, shows that the percentage change (increase or decrease) in reported mental disorders among adults is more pronounced in metropolitan areas within states such as Alabama, Georgia, Indiana, Massachusetts, Kentucky, Michigan, Mississippi, Montana, Ohio, South Carolina, Tennessee, Utah, and Wisconsin. A recent systematic review identifies economic conditions as one of the social determinants of mental health [32]. These conditions are linked to poverty [32,12], income [33], and unemployment [34]. The scenario-based analysis confirms some of these earlier studies, but it also goes a step further to provide a metropolitan-level analysis of how improving or declining economic conditions affect the mental health of adults in specific metropolitan areas in the US.
Moreover, we also observe that community mental health is more sensitive to economic depression (longer blue bars for economic degradation) than to economic boom (shorter yellow bars representing economic growth).
Unavailability of health insurance
In this case, the variable under consideration is the percentage of families with no health insurance, i.e., the unavailability of health insurance. It was hypothesized that an overall improvement in community mental health would be observed when the unavailability of health insurance decreases, i.e., when more families have some health insurance (hypothesis: H3). The opposite effect is expected with increased unavailability of health insurance, i.e., more families being deprived of health insurance, which may lead to worsening mental health problems (hypothesis: H4). The scenario-based analysis in Figure 4 suggests that these two hypotheses generally hold for all the metropolitan areas across the US considered in this study. In Figure 4, it is observed that when the unavailability of health insurance increases (yellow bars), the percentage of people reporting poor mental health in the community (K) increases compared to the baseline scenario. The opposite effect, i.e., a decrease in the percentage of people reporting poor mental health, is observed for the scenario depicting lower unavailability of health insurance (blue bars). However, a change (increase or decrease) in access to health insurance results in only a minimal shift in the percentage of adults reporting mental disorders in the metropolitan areas of states such as Montana, North Dakota, South Dakota, and Vermont. Studies show that states providing access to mental health insurance minimize suicide rates [35], but another study found that Australia's mental health insurance under its "Better Access scheme" has had no significant effect on the mental health of Australians [36].
The underlying logic is that increasing access to health insurance, and to mental health insurance specifically, will likely increase the likelihood of people accessing mental healthcare, which will ultimately improve mental health outcomes. The scenario-based analysis results contribute to this debate by explicitly looking at how the lack of access to health insurance in general, not only mental health insurance, may contribute to adults' stress and mental health outcomes. In other words, not having health insurance is a form of socio-economic deprivation, similar to experiencing low income and poverty, and is a pressure point in increasing the percentage of adults reporting mental disorders within metropolitan areas.
Access to public health insurance
Building on the health insurance scenario analysis, it was hypothesized that access to different types of health insurance, i.e., public vs. private health insurance, impacts mental health outcomes differently. Specifically, it was hypothesized that with decreased access to public health insurance (i.e., a smaller proportion of people with access to public health insurance), the overall mental health of the community would worsen, leading to an increase in K (hypothesis: H5). The opposite effect of improving mental health would be observed with increased access to public health insurance or decreased access to private health insurance within metropolitan areas in the states (hypothesis: H6). However, these hypotheses were minimally supported in the scenario results across the states, as depicted in Figure 5. Although the direction of change in mental health outcomes is consistent across all the states, with ∆K < 0 for all the states when access to public health insurance increases (yellow bars) and ∆K > 0 for all the states when access to public health insurance decreases (blue bars), the magnitude of such deviations varies significantly, ranging from −0.5% to +1.0%. This indicates that the overall mental health outcomes across the US's metropolitan areas are not very sensitive to the type of health insurance that their communities have access to. However, the hypothesis of decreasing K with increasing access to public health insurance (hypothesis: H6) was overwhelmingly supported for the metropolitan areas in Vermont.
For context, Vermont was the first state in the US to adopt legislation for universal health care for its residents in 2011, making health insurance and healthcare publicly available to many residents, including free preventative services such as mental health and substance-based disorder services [37].
Figure 5. Access to public health insurance scenarios for % of adults aged > 18 years reporting poor mental health for > 14 days (K): (a) ∆K is plotted as the bars and K for the baseline scenario is plotted as gray-scale intensity on the US map; (b) ∆K is plotted.

Travel cost
The scenario-based sensitivity analysis for travel cost, measured as transportation cost as a percentage of household income, illustrates the extent to which commuting costs within sprawling metropolitan areas can impact community mental health outcomes. The hypotheses explored here were that the percentage of adults reporting mental disorders (K) would decrease with decreasing travel cost (hypothesis: H7) and that K would increase with increasing travel cost (hypothesis: H8). These two hypotheses do not hold for some metropolitan areas in some states. For instance, the expected increase in reported mental health issues (K) as travel cost increases (hypothesis: H8) does not hold in metropolitan areas within states such as Colorado, Delaware, Maine, Minnesota, Nebraska, North Dakota, Utah, Vermont, and Washington. In these metropolitan areas, there is a decrease in mental disorders reported by adults as travel costs increase. For metropolitan areas in Washington DC, Maryland, and New Hampshire, an increase or decrease in travel costs has the same effect, i.e., an increase in the percentage of adults reporting mental disorders. The mixed results of how travel costs impact mental health outcomes signal the need for an in-depth and granular inquiry into how the built environment's physical aspect impacts mental health outcomes in cities.
Housing vacancy
In the housing vacancy scenario, the hypotheses explored in this study looked at the extent to which neighborhood decline impacts mental health outcomes in the metropolitan areas across the 50 states in the US. Specifically, it was hypothesized that a decrease in housing vacancy would lead to a decline in the percentage of adults reporting poor mental health in metropolitan areas, K (hypothesis: H9). On the other hand, an increase in vacant properties, or a decline in neighborhood size, is expected to increase the percentage of adults reporting mental disorders (hypothesis: H10).
Overall, the intuition-based hypotheses did not hold in our study. As depicted in Figure 7, when vacancy is increasing (yellow bars), most states experience an improvement, rather than a deterioration, in community mental health (a decrease in K). On the other hand, when a community is expanding, characterized by decreased vacancy (blue bars), most of the states see an increase in K. This result may be an outcome of the "Behavioral Sink" phenomenon [38,39]. However, for some states the reverse has been observed. When vacancy is decreasing, metropolitan areas of some states (Alabama, Florida, Montana, New Mexico, North Carolina, Ohio, Oregon, Washington, and Wyoming) see an improvement in mental health depicted by ∆K < 0. On the other hand, when vacancy is increasing, some states' metropolitan areas (Arizona, Colorado, Nevada, and New Jersey) see a deterioration of mental health with ∆K > 0. The mixed results from this scenario support the earlier observation that there is more to the story when parsing the impacts of the built environment on mental health outcomes. More granular analysis, complemented by macro-level analysis, might better help unpack how the built environment's physical conditions at the household, neighborhood, city, and county levels may impact an individual's mental health.
As discussed, the sensitivity of community mental health (K) to housing vacancy varies widely across the US states. Hence, it is difficult to classify whether a particular housing vacancy perturbation leads to a best-case scenario, in which community-level mental health improves unanimously across the US, or to a worst-case scenario, in which community mental health deteriorates across the nation. To address this, we aggregate the individual state-wide results into the mean value of the response variable K (for detailed results, see Supplementary Information). If the mean value of $\Delta K = K_{\text{scenario}} - K_{\text{base}}$, where $K_{\text{scenario}}$ is K under the scenario under consideration and $K_{\text{base}}$ is K under the base-case scenario, is positive, then the perturbation scenario under consideration is labeled the worst-case scenario. Similarly, if the mean ∆K is negative, then there is a decline in the percentage of the population reporting mental health issues, so the scenario is termed a best-case scenario.
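The aggregation rule described above can be illustrated with a short sketch. The function name `classify_scenario` and the state-level values are hypothetical and serve only to illustrate the rule; the "neutral" label for a mean of exactly zero is an assumption, since that case is not discussed in the text.

```python
from statistics import mean
from typing import Dict


def classify_scenario(delta_k_by_state: Dict[str, float]) -> str:
    """Label a perturbation scenario from the mean of state-level changes in K."""
    mean_delta = mean(list(delta_k_by_state.values()))
    if mean_delta > 0:
        return "worst-case"   # on average, more adults report poor mental health
    if mean_delta < 0:
        return "best-case"    # on average, fewer adults report poor mental health
    return "neutral"          # assumption: the zero case is not covered in the text


# Hypothetical state-level changes in K (percentage points), for illustration only.
vacancy_up = {"State A": 0.4, "State B": -0.1}
vacancy_down = {"State A": -0.3, "State B": 0.1}

label_up = classify_scenario(vacancy_up)
label_down = classify_scenario(vacancy_down)
```

Averaging in this way yields a single national label even when individual states move in opposite directions, which is why the state-level results are still needed for a complete picture.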

Discussion
This paper employs a library of supervised interpretable machine learning models and scenario-based sensitivity analysis to explore the relationship between adults' mental health and the socio-economic and physical aspects of the built environment in the US's largest metropolitan areas. The interpretable machine learning models and scenario-based analysis raise three essential issues for discussion and serve as crucial conversation points for policy discourse and future research. First, the built environment's socio-economic aspects are vital to understanding the social determinants of adults' mental health in metropolitan communities across the US. The interpretable machine learning models suggest that increasing poverty and unemployment levels are associated with a significant increase in adults reporting mental health disorders. The scenario-based analysis supports this finding by showing that declining economic conditions within metropolitan areas are expected to increase the number of adults reporting mental disorders, and this is pronounced in metropolitan areas within states such as Georgia, Massachusetts, Kentucky, Michigan, Ohio, and Wisconsin. A number of studies have long observed the impact of poor economic conditions, manifesting in issues such as poverty, low income, and unemployment, on mental health [40,12,34,33]. This paper provides evidence to support existing findings across multiple metropolitan areas, and it allows for both within-state and across-state comparisons in policy conversations around how to center discussions on community mental health within economic policies at local, state, and national levels.
Second, the results from both the interpretable machine learning models and scenario-based analysis provide an opening to conversations around health insurance and mental health. The debate in the literature focuses on whether or not access to mental health insurance schemes improves the likelihood of a person accessing mental health services, which leads to improved mental health outcomes. While the evidence seems inconclusive based on contradictory studies across countries [35,36,41], the partial dependence plots of the health insurance variables show a strong positive association between lack of health insurance and adults reporting mental health disorders in metropolitan areas across states in the US. This analysis goes a step further to show that decreased access to public health insurance is linked to an increase in reported mental disorders. The scenario-based analysis showed Vermont, the first state to adopt universal healthcare, as an outlier case. Increased access to public health insurance was linked to a significant decrease in mental health disorders reported within Vermont's largest metropolitan area. This finding does not necessarily suggest the need for universal healthcare. At the very least, it calls for in-depth research inquiry and policy discourse around how the lack of health insurance, a critical socio-economic need, can impact a person's mental health.
Finally, the physical aspects of the built environment are found to have mixed impacts on community mental health. Adults report increased mental health disorders as travel costs increase in some metropolitan areas, but this does not hold across all the metropolitan areas in our data sample. Similarly, reported mental disorders increase as housing vacancy increases in some metropolitan areas, but this also does not hold in all metropolitan areas. The commission report on the sustainable development goals (SDGs) and mental health rightly observes the need to understand "how neighborhood domain" impacts community mental health. Specifically, it indicates that, besides biological markers, the decline in neighborhood conditions should also be considered as one of the important social determinants of mental health [12]. This paper adds to some of the existing studies [42,43], supporting concerns raised in this commission report. More importantly, it also adds to the literature on how urbanization (e.g., increasing sprawl and associated commuting costs) impacts mood disorders [44]. The mixed results call for caution when discussing how the built environment's physical aspects impact community mental health. In the future, our proposed multi-level scenario-based predictive analytics framework (MSPAF), leveraging data-driven interpretable machine learning algorithms, can incorporate other micro- and macro-level physical features of the built environment to examine how these different physical aspects impact mental health outcomes in metropolitan areas. The national scale of this paper's analysis and limitations in obtaining data for all metropolitan areas at this scale made it impossible to include other variables. This paper provides a robust data-driven methodological framework and evidentiary basis to examine the community mental health-built environment nexus.
Future research can build on this framework and evidence by incorporating other datasets on the socio-economic and physical aspects of the built environment and their impacts on mental health, not only in metropolitan areas, but also in rural areas.

Data and Methods
Data collection and pre-processing
In this study, we conducted a national-level analysis of all the metropolitan regions in the 50 states of the US. We obtained and aggregated data on public health characteristics, built environment features, and socio-economic conditions from multiple sources. From the US Centers for Disease Control and Prevention (CDC), information about health-related variables, such as mental health conditions, pre-clinical conditions, and behavioral factors for adults aged 18 or above, was collected at the census tract level for the year 2014. The housing vacancy data for the year 2014 were obtained from the US Department of Housing and Urban Development (HUD) at the census tract level. The socio-economic characteristics, such as race, income, unemployment rate, marital status, education level, and access to health insurance, were obtained at the census tract and metropolitan levels from the American Community Survey (ACS) for the years 2011 to 2015. Finally, the travel cost data were obtained from the HUD Location Affordability Index (LAI), which uses data on housing costs from the ACS and estimates transportation costs based on land use mix, commute patterns, and socio-economic information. The data from the multiple sources were matched and aggregated to create the final data set. In our analysis, the percentage of participants who were adults aged 18 years or more and reported that they were suffering from mental health issues for more than 14 days in the last month is considered the response variable. The other variables on health characteristics, built environment features, and socio-economic characteristics are considered the predictors or independent variables. Among all the categories of predictor variables, the variables related to pre-clinical health conditions were found to be highly correlated.
To account for the effect of all the pre-clinical variables while bounding the number of dimensions, we performed principal component analysis (PCA) (see Supplementary Information). PCA is an unsupervised learning method that uses an orthogonal transformation to convert a multidimensional data set of possibly correlated variables into a new data set of linearly uncorrelated variables [45]. PCA is useful for dimension reduction because a small number of orthogonal components of the transformed data can capture most of the variance of the original data. In this research, we retained three principal components, which together explain 92% of the variability in the original 12 pre-clinical health-related variables.
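The dimension-reduction step can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the function name `pca_fit_transform`, the synthetic 12-variable matrix, and the choice to standardize the variables before the decomposition are all assumptions made for the example.

```python
import numpy as np

def pca_fit_transform(X, n_components):
    """Project X onto its first n_components principal components.

    Assumption: variables are centered and scaled to unit variance first;
    the paper does not state whether the inputs were standardized.
    """
    Xc = X - X.mean(axis=0)
    Xs = Xc / Xc.std(axis=0, ddof=1)
    # Rows of Vt are the orthogonal principal directions.
    U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    explained_ratio = S**2 / np.sum(S**2)
    scores = Xs @ Vt[:n_components].T
    return scores, explained_ratio[:n_components]

# Synthetic stand-in for 12 correlated pre-clinical variables
# driven by 3 latent factors (hypothetical data).
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
loadings = rng.normal(size=(3, 12))
X = latent @ loadings + 0.3 * rng.normal(size=(500, 12))

scores, ratio = pca_fit_transform(X, n_components=3)
print("share of variance explained by 3 components:", ratio.sum())
```

The resulting component scores are mutually uncorrelated, which is what allows them to replace the highly correlated original variables as predictors.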

Overview of Statistical Learning
Given a dataset with a response variable $Y$ and a set of $p$ predictor variables $X = (X_1, X_2, \ldots, X_p)$, interpretable machine learning algorithms try to identify the function $f$ that relates the predictors to the response variable as $Y = f(X) + \varepsilon$ [46]. Here, $\varepsilon$ is the irreducible error term that arises from unobserved heterogeneity in the data and is normally distributed, $\varepsilon \sim N(\mu, \sigma^2)$, where $\mu$ is the mean and $\sigma^2$ is the variance [47]. A model is trained to estimate $f$ using the training data, a known set of data points, and its performance is then evaluated on the test data, a set of data points not used during training. In this study, we implemented a suite of interpretable machine learning models, which can be broadly classified into three categories: (i) parametric models, (ii) semi-parametric models, and (iii) non-parametric models. In parametric models, the problem of estimating the unknown function $f$ reduces to estimating the set of parameters through which the model is represented. Non-parametric models, on the other hand, make no assumption about the form of the unknown function. A semi-parametric model is a hybrid of the two. More specifically, we implemented the following algorithms:
1. Parametric Models: Generalized Linear Model [22], Ridge Regression [23], and Lasso Regression [24]
2. Semi-parametric Models: Generalized Additive Model [25] and Multivariate Adaptive Regression Splines [26]
3. Non-parametric Models: Random Forest [28], Gradient Boosting Method [27], and Bayesian Additive Regression Trees [29]
To achieve optimal generalization performance, the complexity of an interpretable machine learning model should be controlled by balancing the bias-variance trade-off. Cross-validation is the most widely used technique for balancing a model's bias and variance.
In this study, the best model was selected using an 80-20 randomized holdout cross-validation technique: the models were trained on a randomly selected 80% of the data set, and the remaining 20% was used as a holdout set to assess the out-of-sample predictive performance of the models. This procedure was repeated 30 times so that each data point of the original data set is used at least once for training the models. The metrics used to compare model performance are $R^2$, RMSE (Root Mean Squared Error), and MAE (Mean Absolute Error). This method of model selection is well established and has been used in various previous studies [14,48,49,50,51,52,53,54]. In the following section, we describe Bayesian Additive Regression Trees, the best model found in our analysis, and leave the discussion of the other methods to the Supplementary Information.
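The repeated holdout procedure described above can be sketched as follows. This is an illustrative outline under stated assumptions rather than the authors' code: the helper names (`repeated_holdout`, `holdout_metrics`, `fit_ols`, `predict_ols`) are hypothetical, the data are synthetic, and an ordinary least-squares model stands in for the candidate models.

```python
import numpy as np

def holdout_metrics(y_true, y_pred):
    """Return (R^2, RMSE, MAE) for one holdout set."""
    resid = y_true - y_pred
    rmse = float(np.sqrt(np.mean(resid**2)))
    mae = float(np.mean(np.abs(resid)))
    r2 = float(1.0 - np.sum(resid**2) / np.sum((y_true - y_true.mean())**2))
    return r2, rmse, mae

def repeated_holdout(X, y, fit, predict, n_repeats=30, train_frac=0.8, seed=0):
    """Average out-of-sample metrics over repeated random 80-20 splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(round(train_frac * n))
    results = []
    for _ in range(n_repeats):
        idx = rng.permutation(n)            # fresh random split each repeat
        train, test = idx[:n_train], idx[n_train:]
        model = fit(X[train], y[train])
        results.append(holdout_metrics(y[test], predict(model, X[test])))
    return np.mean(results, axis=0)

# Stand-in model (assumption): ordinary least squares with an intercept.
def fit_ols(X, y):
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 4))
y = 2.0 + X @ np.array([1.0, -0.5, 0.0, 0.3]) + 0.2 * rng.normal(size=400)

r2, rmse, mae = repeated_holdout(X, y, fit_ols, predict_ols)
print(f"R2={r2:.3f}  RMSE={rmse:.3f}  MAE={mae:.3f}")
```

Running the same function once per candidate model, with the same seed, would compare all models on identical splits, which is one reasonable way to make the averaged metrics comparable.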

Bayesian Additive Regression Trees
Bayesian Additive Regression Trees (BART) is a sum-of-trees model in which the outputs of $m$ 'small' decision trees are aggregated under an underlying Bayesian probability model to generate the response function [29,55]. Mathematically, BART can be expressed as

$$Y = \sum_{j=1}^{m} g(X; T_j, M_j) + \varepsilon$$

where there are $m$ distinct regression trees $T_j$ with terminal node parameters $M_j$, and $\varepsilon$ is the irreducible error term. The function $g(X; T_j, M_j)$ assigns the leaf node parameters $M_j$ of tree $T_j$ to the independent variables $X$, for each of the $m$ trees. The main difference between BART and other tree ensemble methods is that BART is built on an underlying Bayesian probability model consisting of a prior, a likelihood, and a posterior probability space. The prior terms govern the tree structure, model complexity, and regularization, and allow expert knowledge to be incorporated in the model. Generally, the Metropolis-Hastings algorithm is used to generate draws from the posterior probability space.
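To make the sum-of-trees form concrete, the sketch below evaluates the sum of $g(X; T_j, M_j)$ over a handful of fixed, hand-specified trees. It illustrates only how the tree outputs are aggregated; it does not fit a BART model or sample from its posterior. The depth-one tree encoding `(feature, threshold, mu_left, mu_right)` and all numeric values are assumptions made for the example.

```python
import numpy as np

def g(X, tree):
    """Leaf value assigned to each row of X by one depth-one tree.

    tree = (feature index, split threshold, left leaf value, right leaf value);
    this encoding is hypothetical and chosen for brevity.
    """
    feature, threshold, mu_left, mu_right = tree
    return np.where(X[:, feature] <= threshold, mu_left, mu_right)

def sum_of_trees(X, trees):
    """Aggregate the outputs of all trees by summation."""
    return sum(g(X, tree) for tree in trees)

# Three small trees with made-up splits and leaf parameters.
trees = [
    (0, 0.5, -1.0, 1.0),
    (1, 2.0, 0.2, -0.2),
    (0, 1.5, 0.0, 0.5),
]
X = np.array([[0.0, 1.0],
              [1.0, 3.0],
              [2.0, 1.0]])

pred = sum_of_trees(X, trees)
print(pred)
```

In an actual BART fit, each posterior draw supplies a different set of trees and leaf parameters, and predictions are summarized over those draws.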

Model inference
Although non-parametric models outperform parametric models in terms of predictive performance, the improved predictability comes at the cost of reduced interpretability. However, statistical inference can still be conducted for non-parametric models using variable importance rankings and partial dependence plots (PDPs) [46,29,55]. The importance of each variable is given by its inclusion proportion, which reflects how often that variable was selected when building the model. PDPs are used to understand how a particular predictor variable affects the response variable. The PDP is estimated as follows:

$$\hat{f}_j(x_j) = \frac{1}{n} \sum_{i=1}^{n} p(x_j, x_{-j,i})$$

Here, $p$ is the statistical response surface, $n$ denotes the number of observations, $x_{-j}$ represents all the independent variables except $x_j$, and $x_{-j,i}$ is the value of $x_{-j}$ for the $i$-th observation.
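The partial dependence estimate amounts to fixing the predictor of interest at a grid value for every observation, predicting, and averaging. A minimal sketch follows, assuming a generic `predict` function in place of the fitted response surface; the function names, the toy response surface, and the data are hypothetical.

```python
import numpy as np

def partial_dependence(predict, X, j, grid):
    """Average prediction when column j is fixed at each grid value."""
    pd_values = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, j] = value          # fix x_j, keep x_{-j} as observed
        pd_values.append(predict(X_mod).mean())
    return np.array(pd_values)

# Stand-in response surface (assumption): linear in x_0, quadratic in x_1.
def predict(X):
    return 3.0 * X[:, 0] + X[:, 1] ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
grid = np.linspace(-1.0, 1.0, 5)

pdp = partial_dependence(predict, X, j=0, grid=grid)
print(pdp)
```

For this toy surface the curve is a straight line in the first predictor, shifted by the average contribution of the second, which matches what the averaging formula implies.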

Scenario-based sensitivity analysis
The scenario-based sensitivity analysis implemented in this study follows a systematic statistical simulation approach. First, the independent variable, or set of independent variables, for which the scenario is to be created is selected. For each state, the parametric distribution that best fits the sample data of the selected independent variables (predictors) is identified using the Chi-squared goodness-of-fit test, with parameters estimated by the method of moments [56]. Once the best distribution of each predictor is identified, random sampling is used for each state to obtain the base case values (BV). Then, according to the hypothesized scenario, the mean of the historical parametric distribution of the variable of interest is perturbed. New values are then drawn by random sampling from the distribution with the shifted mean, which corresponds to the hypothesized scenario. The original values of the variable are substituted by these new values while all the other variables are kept the same as in the original data. Following this, the selected statistical learning model is used to predict the percentage of the population reporting poor mental health for the new data set. Finally, we identify whether any significant nation-level and/or state-level increase or decrease in the response (compared to the original response variable) is observed.
As described before, in this paper we considered five categories of variables representing socio-economic and physical aspects of a built environment: (i) the economic status of a community, characterized by incidence of poverty, unemployment rate, and household income, (ii) % of families in a community with no health insurance, (iii) access to public health insurance, (iv) transport cost, expressed as a % of income spent on transportation, and (v) housing vacancy. The mean of each variable's historical distribution is perturbed by 1σ (one standard deviation) of the variable.
Corresponding to these sets of variables, ten hypotheses are created (see Table 1).
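The simulation steps above can be sketched for a single predictor as follows. This is a simplified illustration, not the authors' procedure: it assumes a normal distribution fitted by the method of moments (the study selects the best-fitting distribution with a Chi-squared test), it uses one pooled sample rather than state-by-state fits, and the function `simulate_scenario`, the toy `predict` model, and all data are hypothetical.

```python
import numpy as np

def simulate_scenario(X, j, predict, shift_in_sd=1.0, seed=0):
    """Compare mean predictions under base-case and mean-shifted samples of column j."""
    rng = np.random.default_rng(seed)
    # Method-of-moments fit of a normal distribution (assumed family).
    mu, sigma = X[:, j].mean(), X[:, j].std(ddof=1)
    n = X.shape[0]

    X_base, X_scen = X.copy(), X.copy()
    # Base case: resample from the fitted historical distribution.
    X_base[:, j] = rng.normal(mu, sigma, size=n)
    # Scenario: resample after shifting the mean by shift_in_sd standard deviations.
    X_scen[:, j] = rng.normal(mu + shift_in_sd * sigma, sigma, size=n)

    # All other predictors stay at their original values.
    return predict(X_base).mean(), predict(X_scen).mean()

# Stand-in predictive model (assumption): response rises with column 0.
def predict(X):
    return 10.0 + 0.5 * X[:, 0] + 0.1 * X[:, 1]

rng = np.random.default_rng(42)
X = np.column_stack([
    rng.normal(15.0, 4.0, size=5000),   # e.g. a poverty-rate-like predictor
    rng.normal(8.0, 2.0, size=5000),    # e.g. an unemployment-like predictor
])

k_base, k_scen = simulate_scenario(X, j=0, predict=predict)
print(k_base, k_scen)
```

A negative `shift_in_sd` would represent the opposite scenario, for example improving rather than declining economic conditions.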
For each category of independent variables, we test our hypotheses by predicting $K_{\text{scenario of hypothesis}}$, the "% adults aged > 18 suffering from poor mental health for > 14 days" under the specific scenario of independent variable perturbation (e.g., economic depression) considered for a particular hypothesis (e.g., H1). The change in the response corresponding to this perturbed condition is captured by

$$\Delta K = K_{\text{scenario of hypothesis}} - K_{\text{base case scenario}}$$

To normalize for the baseline response value, we consider $\Delta\kappa$, which captures the projected change in the % of adults aged > 18 years reporting poor mental health for > 14 days, expressed as a percentage of the baseline estimate:

$$\Delta\kappa = \frac{K_{\text{scenario of hypothesis}} - K_{\text{base case scenario}}}{K_{\text{base case scenario}}} \times 100\%$$

Figures 3, 4, 5, 6 and 7 depict the output of the sensitivity analysis. In part (a) of each figure, $\Delta\kappa$ is plotted as bars representing the projected change expressed as a percentage of the baseline estimate, with the underlying $K_{\text{base case scenario}}$ shown on the map as gray-scale intensities. In part (b) of each figure, $\Delta K$ is plotted, representing the absolute projected change in $K$. In the subsequent sections, we discuss the sensitivity of $K$ to the different categories of independent variables.
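As a small worked example of the two change measures, the sketch below computes the absolute and relative change for one pair of predicted values. The function name `projected_change` and the input values (a baseline of 12.0% and a scenario value of 13.2%) are hypothetical and chosen only to illustrate the arithmetic.

```python
def projected_change(k_scenario, k_base):
    """Return (delta K, delta kappa) for one scenario versus its base case."""
    delta_k = k_scenario - k_base              # absolute change, percentage points
    delta_kappa = delta_k / k_base * 100.0     # change as a % of the baseline
    return delta_k, delta_kappa

# Hypothetical values: baseline 12.0%, scenario 13.2%.
delta_k, delta_kappa = projected_change(13.2, 12.0)
print(round(delta_k, 2), round(delta_kappa, 2))
```

The distinction matters when comparing regions: the same absolute change corresponds to a larger relative change where the baseline prevalence is lower.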