Association between the urban environment and chronic disease to identify communities at risk

Background: With increasing urbanisation rates, assessments must be made on the impact of the built environment on the health of populations. As the bulk of healthcare expenditure in developed countries is borne by the elderly through chronic disease management and treatment costs, intervening using the built environment can have lasting population-wide effects. Methods: Using two cohort studies for training and validation, we quantified each individual’s local context based on their residential address and derived geographical exposures adapted from the International Physical Activity and the Environment Network guidelines. Bayesian inference was used to develop a regression model that examines the impacts of the geographical exposures and predicts mean body mass index and prevalence of type 2 diabetes mellitus, acute myocardial infarction and stroke by communities. Results: The distance to the nearest retail outlet was found to be negatively associated with body mass index. Our prediction model shows good accuracy (AUC > 0.75) for predicting type 2 diabetes mellitus, acute myocardial infarction and stroke. National-level maps were generated that predict the health of communities by mean body mass index and overall chronic disease risk. Conclusions: The predictive model has the ability to predict on a macro scale the overall health of a community. Understanding the geospatial distribution of chronic disease risk allows for evidence-based policymaking with urban-specific interventions that improve overall population health.


Introduction
Chronic diseases are the main cause of mortality and morbidity in developed countries, and constitute a large proportion of the healthcare burden 1 . The direct medical costs of various chronic diseases have been estimated to be US$3,200-4,700 for each type 2 diabetes mellitus (DM) case annually 2 and US$14,000 and US$25,000 for each episode of acute myocardial infarction (AMI) and acute ischaemic stroke respectively 3 . Globally, rising life expectancy has resulted in aging populations in countries such as Japan and Taiwan. From 2006 to 2015, the DM prevalence in Japan increased from 12.3% to 19.5% for men and 8.2% to 9.2% for women 4 . In Taiwan, the prevalence of multiple chronic conditions increased from 9.6% to 17.1% from 2000 to 2010 5 . The prevention of chronic diseases is therefore a priority to mitigate the rising healthcare costs that accompany societal aging. High body mass index (BMI) is associated with increased chronic disease progression and mortality 6 . In Singapore, an island city state, obesity and DM incidence have been increasing since the 1990s, alongside other chronic complications such as AMI and stroke. For residents aged 18-69 in the 1992 National Health Survey, the proportions of obese and overweight individuals were 5.1% and 26.2%, respectively. By 2017, these figures had risen to 8.7% and 36.2%, with DM prevalence also increasing to 8.6% from an estimated 8.3% in the last National Health Survey conducted in 2010 7 . From 2007 to 2016, the annual number of AMI and stroke cases increased from 6,817 to 10,728 8 . Many epidemiological studies explore the relationship between chronic disease and its risk factors, many of which are sociodemographic or behavioural 10,11 . Fewer studies explore the relationship between health outcomes and the built environment, which is particularly relevant for high density cities such as the growing urban centres of Asia.
Greater walkability has previously been shown to be associated with lower obesity and DM risk (odds ratios of 0.83 [0.77-0.91] and 0.86 [0.75-0.99]) 12 . Lack of access to healthy food options can encourage a poor diet, which is a major risk factor for obesity, cardiac conditions and stroke 13 . In a US study, simulations of increased supermarket density to improve access to healthier food options resulted in a neighbourhood-wide reduction of 0.09 (0.02-0.16) BMI points 14 . Those living in food deserts had a hazard ratio of 1.44 (1.06-1.95) for developing an AMI 15 . Similarly, those with fast-food restaurants in their neighbourhood were at higher risk of developing stroke (odds ratios of 1.02 for males and 1.03 for females) in a Swedish study with over 4 million individuals 16 . These studies share consistent directions of effect and a plausible mechanism, suggesting that access to food is associated with obesity and various chronic conditions. These conclusions warrant investigation as to whether the observed pattern holds true for highly urbanised and densely populated places such as Singapore or Hong Kong, which differ both geographically and behaviourally from the settings of many of the primarily western-centric studies 17 .
To answer these questions, this paper has two primary objectives. We aimed to quantify the effect of the urban environment on the health of the resident population of Singapore using a Bayesian approach, and to predict the expected national risk. Using this methodology, we identify communities in Singapore with high chronic disease risk for the purpose of public health intervention planning, which can support the national chronic disease management program set up in 2006 to prepare for the increasing healthcare needs of a rapidly aging population 18 . The identification of these high-risk communities allows for both early interventions to prevent chronic disease onset and the strategic placement of primary care facilities to service the growing healthcare needs of the population.

Methods
The Singapore Multi-Ethnic Cohort (MEC) study is a closed cohort recruited from 2004 to 2010 that has a total of 14,465 adult individuals with oversampling of ethnic Malays and Indians 19 , Singapore's two main ethnic minorities. The intentional oversampling of ethnic minorities allowed for more accurate risk estimates for DM, AMI and stroke, as well as ethnic-specific BMI distributions. The Community Health Study (CHS) focused on recruiting specifically from two mature estates, Queenstown and Bukit Panjang, with a predominantly older demographic (n = 7,844, 3.3% of total population in the area). Information on established risk factors such as age, ethnicity, gender, smoking status and dwelling type was collected for adjustment in the final models 20-22 . Dwelling type, together with the individual's postal code, was further post-processed to derive house price (Supplementary information 1). These two cohorts were selected to be the training and validation datasets because detailed socio-demographic information and health outcomes of interest were available, and the questionnaires were standardised during both time periods to prevent misclassification of risk factors and outcomes. For this study, subsets of the MEC (n = 10,499) and CHS (n = 5,275) with complete data on all risk factors were selected as the training and validation datasets, respectively.

Derivation of exposure explanatory variables
Our procedure for quantifying the built environment was adapted from the International Physical Activity and Environment Network (IPEN) adult study, in particular their definitions for land use and environmental attributes in an urban setting 23,24 . For each individual in the cohort, we used a 500-meter radial buffer based on their residential postal code to define their urban environment; because Singapore is a compact city state with high connectivity, discrepancies between radial buffers and network buffers are small 25 . Data on land use type and facilities were obtained from multiple government agencies, primarily the Urban Redevelopment Authority 26 , National Parks Board, Land Transport Authority and National Environment Agency. Polygonal maps of land use for residential, retail, civic/institutional, public parks and private recreational purposes were generated, and were further supplemented with the locations of public transport access points and food establishments. IPEN guidelines recommend several metrics to quantify the urban environment, of which we selected two for their interpretability: the area of a specific land use type within the 500-meter radial buffer, and the distance to the nearest establishment within a class of premises. IPEN land use types within an individual's urban environment that were relevant to this study included residential, retail, civic/institutional, private recreation and public parks. For the second set of metrics, we measured the distance from the individual's residential address to the nearest retail, civic/institutional and private recreation facility, public park, public transport access point and food establishment.
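As an illustration of the distance-based exposure metrics described above, the following minimal Python sketch computes the straight-line distance from a residence to the nearest establishment and counts the establishments inside a 500-meter radial buffer. It is not the pipeline used in the study: the coordinates are hypothetical projected values in metres, the function names are ours, and the area of each land use type within the buffer (which requires polygon intersection in a GIS) is not shown.

```python
import math

def nearest_distance_m(home, places):
    """Straight-line distance in metres from home to the closest place."""
    return min(math.hypot(px - home[0], py - home[1]) for px, py in places)

def count_within_buffer(home, places, radius_m=500.0):
    """Number of places that fall inside the radial buffer around home."""
    return sum(
        1 for px, py in places
        if math.hypot(px - home[0], py - home[1]) <= radius_m
    )

# Hypothetical projected coordinates (easting, northing) in metres.
home = (1000.0, 1000.0)
retail_outlets = [(1240.0, 1320.0), (1000.0, 1600.0), (2500.0, 1000.0)]

nearest_retail = nearest_distance_m(home, retail_outlets)
n_in_buffer = count_within_buffer(home, retail_outlets)
print(f"nearest retail outlet: {nearest_retail:.0f} m; "
      f"outlets within 500 m: {n_in_buffer}")
```

Planar distances of this kind are only meaningful in a projected coordinate system; latitude and longitude would first need to be projected or handled with a great-circle formula.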

Bayesian inference of health outcomes
Two Bayesian hierarchical models were developed to investigate the relationships between geography and health. The first model elucidates the relationship between geographical exposures and BMI as a continuous variable while controlling for sociodemographic variables. Increased BMI is associated with a higher risk of developing DM, AMI and stroke 27-29 ; the second model therefore includes BMI as an additional variable to control for while simultaneously modelling DM, AMI and stroke as binary outcomes, reducing the need to calibrate several models with highly correlated coefficients.
In our first model we let $y_i$ denote the value of individual $i$'s BMI, and assume $y_i \sim \mathcal{N}(\mu_i, \sigma^2)$, where the mean $\mu_i$ is modelled to be a linear combination of predictors, $\mu_i = \sum_{m=1}^{M} \beta_m x_{i,m}$, in a model with $M$ explanatory variables, with $\beta_m$ denoting the coefficients and $x_{i,m}$ the value of the $m$th explanatory variable for the $i$th individual. For the second model, we let $z_{i,k}$ denote the value of each binary outcome $k$ for each individual $i$, where $z_{i,k}$ is drawn from a Bernoulli distribution with probability $p_{i,k}$. In this model, $p_{i,k}$ is defined as $\operatorname{logit}(p_{i,k}) = \alpha_k + \sum_{j=1}^{J} \gamma_{k,j} x_{i,j}$ with $J$ explanatory variables, $\gamma_{k,j}$ denoting the coefficients for each explanatory variable and $x_{i,j}$ representing the value for each explanatory variable. Each outcome has a specific intercept $\alpha_k$, with $\gamma_{k,j} = \gamma_j$ implying identical regression coefficients for DM, AMI and stroke. A more general structure was initially used before settling on this special case with common effect sizes.
JAGS, a Gibbs sampler that uses a Markov Chain Monte Carlo algorithm, was used to sample all coefficients from the posterior distribution 30 in order to calculate 95% confidence intervals (CIs) for all parameters. Agresti-Coull confidence intervals 31 were used in order to obtain uncertainty estimates for minizones with 0% actual or expected prevalence.
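The structure of the second model, with one intercept per outcome and regression coefficients shared across DM, AMI and stroke, can be sketched in a few lines of Python. This is a hedged illustration rather than the fitted model: a logistic link is assumed, the function name is ours, and the covariates, coefficients and intercepts below are hypothetical values, not estimates from the study (which were sampled in JAGS).

```python
import math

def outcome_probabilities(x, shared_coefs, intercepts):
    """Return P(outcome = 1) for each outcome for one individual.

    x            : covariate values for the individual
    shared_coefs : one coefficient per covariate, common to all outcomes
    intercepts   : dict mapping outcome name -> outcome-specific intercept
    A logistic link is assumed here.
    """
    linear_part = sum(c * v for c, v in zip(shared_coefs, x))
    return {
        name: 1.0 / (1.0 + math.exp(-(alpha + linear_part)))
        for name, alpha in intercepts.items()
    }

# Hypothetical standardised covariates: age, BMI, distance to nearest retail.
x_example = [1.2, 0.5, -0.3]
coefs_example = [0.8, 0.4, 0.0]                                  # illustrative only
intercepts_example = {"DM": -2.2, "AMI": -3.9, "stroke": -4.6}   # illustrative only

for outcome, p in outcome_probabilities(
        x_example, coefs_example, intercepts_example).items():
    print(f"{outcome}: {p:.3f}")
```

Because the coefficients are shared, the three predicted probabilities for an individual differ only through the intercepts, which is what allows a single set of effect sizes to be reported for overall chronic disease risk.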

Model validation
Receiver operating characteristic (ROC) curves were used to evaluate overall predictive accuracy for the training and validation datasets. Administratively, Singapore is divided into 55 planning areas with over 200 subzones. From the perspective of identifying at-risk communities, subzones were too coarse, so we further sub-divided the subzones into smaller areas termed "minizones". Each minizone was created to account for property type and was restricted from spanning multiple subzones and electoral boundaries as policies tend to be implemented within political constituency boundaries. The method used to generate these minizones is described in detail in Supplementary information 2. In this way, the map of Singapore was segregated into 954 minizones, of which 662 were populated by at least one individual from the MEC training dataset. Individuals from the validation dataset were intentionally more spatially concentrated, occupying 33 out of 954 minizones. To validate our models, we predicted the mean BMI and the prevalence of individuals with DM, AMI and stroke in each minizone for the CHS validation dataset, excluding minizones with < 10 people as they would give unstable estimates. These metrics were then evaluated using Hosmer-Lemeshow plots and subsequently visualised.
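The minizone-level comparison described above can be sketched as follows: individuals are grouped by minizone, minizones with fewer than 10 people are dropped, and observed prevalence is compared with the mean predicted risk. The Agresti-Coull interval is included because it remains informative when a minizone has no cases. This is a minimal sketch under our own assumptions: the record layout, function names and example data are hypothetical, and the interval is clipped to [0, 1].

```python
import math
from collections import defaultdict

def agresti_coull(successes, n, z=1.96):
    """Agresti-Coull interval for a proportion, clipped to [0, 1]."""
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)

def minizone_summary(records, min_people=10):
    """Summarise (minizone_id, has_disease, predicted_risk) records by minizone."""
    groups = defaultdict(list)
    for zone, observed, predicted in records:
        groups[zone].append((observed, predicted))
    summary = {}
    for zone, rows in groups.items():
        n = len(rows)
        if n < min_people:  # too few people for a stable estimate
            continue
        cases = sum(obs for obs, _ in rows)
        summary[zone] = {
            "n": n,
            "observed": cases / n,
            "expected": sum(pred for _, pred in rows) / n,
            "ci": agresti_coull(cases, n),
        }
    return summary

# Hypothetical data: zone "A" has 12 people and no cases, zone "B" only 5 people.
records = [("A", 0, 0.05)] * 12 + [("B", 1, 0.20)] * 5
summary = minizone_summary(records)
print(summary)
```

In this example zone "B" is excluded, while zone "A" has an observed prevalence of 0% but still receives an interval with a positive upper bound.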

National-level risk prediction maps
Using population estimates from the publicly available 2015 General Household Survey 32 , we generated a synthetic population with the attributes of race, gender and ethnicity. From the same survey, aggregate tables with geographical information on planning area, subzone and dwelling type were used to construct a hill-climbing algorithm that imputed geographical location and dwelling type by minizone in Singapore. To further investigate the impact of the urban environment on health, we adjusted for all demographic covariates except house price, which was retained because of its dual nature as both a demographic and a geographic factor. Each demographic coefficient was weighted by the known distribution of the population from the 2015 General Household Survey, and we then predicted the mean BMI and chronic disease risk attributable to geographical features.
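The text does not detail the hill-climbing algorithm, so the following Python sketch only illustrates the general idea under our own assumptions: synthetic individuals are assigned to (minizone, dwelling type) cells, and random reassignments are kept whenever they do not increase the total absolute difference from a target table of counts. The cell names, target counts, objective function and acceptance rule are all hypothetical.

```python
import random
from collections import Counter

def total_error(assignment, targets):
    """Sum of absolute differences between assigned and target cell counts."""
    counts = Counter(assignment)
    cells = set(counts) | set(targets)
    return sum(abs(counts.get(c, 0) - targets.get(c, 0)) for c in cells)

def hill_climb(n_people, targets, steps=5000, seed=0):
    """Assign people to cells, keeping random moves that do not worsen the fit."""
    rng = random.Random(seed)
    cells = list(targets)
    assignment = [rng.choice(cells) for _ in range(n_people)]
    initial_error = total_error(assignment, targets)
    error = initial_error
    for _ in range(steps):
        if error == 0:
            break
        person = rng.randrange(n_people)
        previous_cell = assignment[person]
        assignment[person] = rng.choice(cells)
        new_error = total_error(assignment, targets)
        if new_error <= error:
            error = new_error                    # keep the move
        else:
            assignment[person] = previous_cell   # revert the move
    return assignment, initial_error, error

# Hypothetical target table: (minizone, dwelling type) -> number of residents.
targets = {("zone_a", "public"): 6, ("zone_a", "private"): 2, ("zone_b", "public"): 4}
assignment, initial_error, final_error = hill_climb(sum(targets.values()), targets)
print(f"error before: {initial_error}, after: {final_error}")
```

A real application would match several marginal tables at once (for example planning area, subzone and dwelling type) rather than a single joint table, but the accept-if-not-worse loop is the same.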

Software
All analyses were performed in Microsoft R Open 3.5.2, using JAGS 4.3.0. Spatial analyses and visualisation were done using QGIS 3.6.2 30,33,34 .

Results
Table 1 describes the characteristics of the MEC, our training dataset, and the CHS, our validation dataset. Participants in the MEC tended to be younger, with a mean age of 46 as opposed to 55 in the CHS, which was expected since the CHS participants came from mature estates. The proportion of males is comparable at 43% for the MEC and 41% for the CHS. Oversampling in the MEC also resulted in a larger proportion of Malays and Indians (26% and 28% respectively) than in the CHS (13% and 11%), which follows the national ethnic distribution of Singapore more closely. There were significant differences in the average house price between the two datasets. The mature estates that the CHS participants come from predominantly consist of public housing, which tends to be lower in value than the mix of private and public housing that MEC participants reside in, as evidenced by the mean housing price of S$430,000 for the CHS against S$630,000 for the MEC. In terms of geography, there were fewer significant differences given the urban density of Singapore. Land use was comparable for both datasets, as was the distance to the nearest land use type, with the exception of the distances to the nearest private recreation facility and park.
CHS participants were on average 450m from the nearest private recreation outlet compared to 820m for MEC participants, almost a twofold difference. For health outcomes, there were no major differences in the mean and spread of BMI. However, for chronic diseases, CHS participants were slightly unhealthier, likely due to being almost a decade older on average than MEC participants. DM and AMI prevalence among CHS participants was 12% and 4%, in contrast with 10% and 2% in the MEC, while stroke prevalence was equal at 1%.
Investigating the relationship between geography and health (Table 2), BMI demonstrates a significant negative association with the distance to the nearest retail outlet, implying that individuals who live closer to retail outlets tend to have increased BMI. None of the geographical covariates were associated with chronic disease risk, although BMI as a risk factor was found to be highly and positively associated with chronic disease risk, suggesting that geography influences health through BMI as a proxy even after adjusting for demographics. Almost all demographic variables displayed prominent associations with BMI and chronic disease risk. The direction of the associations was consistent for all of the demographic variables save for gender.

Community and national-level predictions
To evaluate the predictive ability of the model for DM, AMI and stroke, receiver operating characteristic (ROC) curves were plotted for the MEC (training) and CHS (validation) datasets (Figure 1). In the training dataset, the area under the curve (AUC) was >0.8 for all three conditions.
Applying the same method to a synthetic population of Singapore and predicting mean BMI and chronic disease risk yields clear clusters of healthy and unhealthy communities (Figure 3). Many of the areas predicted to have increased BMI and higher chronic disease risk correspond to mature estates in Singapore with a large proportion of elderly. In particular, southern and eastern parts of Singapore have higher expected burdens that may warrant intervention. Although demographics are a stronger driver of ill health, geography plays an additional subtler role, with the majority of demography-adjusted relative risks falling within 0.95-1.05, save for a few outlier areas. Some examples are the southernmost area of Singapore, the island of Sentosa, which has a few residential properties for the extremely affluent, and Changi village, the small coastal area at the easternmost point of Singapore. Due to the non-conventional land use in the immediate proximity of those areas, with large amounts of land dedicated to private recreation and almost no other land use type, we observe contrasting effects for BMI and health despite the causal link. Another outlier is Lim Chu Kang, the large area in the northwest of Singapore, which is primarily dedicated to agricultural purposes with almost no other land use type, leading to the estimates being mainly driven by demographics. Nevertheless, such outliers are few, with most of our results being consistent. These demography-adjusted maps therefore provide useful information on the extent to which geographic factors, such as proximity to food and access to public and private facilities, impact health across Singapore.

Discussion
The relationship between geography and health has been established in scientific literature 35 .
Numerous features of the environment, both natural and artificial, have been shown to be associated with health outcomes; for instance, green spaces have been found to be positively associated with perceived community health 36 and increased urban mobility with increased physical activity levels 37 .
In our study, we found that individuals who lived further away from retail premises were expected to have lower BMI, with an expected change of -0.95 points per kilometre (95% CI: -1.71 to -0.23).
One plausible mechanism responsible for this observation is that walking or taking public transport to access retail outlets results in increased physical activity levels, with more distant retail outlets requiring more commuting time and therefore more physical activity; prior research shows that public transport usage was positively associated with physical activity levels through increased walking during commuting 38,39 . This hypothesis is also supported by our reported effect size for the distance to the nearest public transport access point on BMI (-1.36, 95% CI: -2.82 to 0.01), which, despite falling marginally short of significance, suggests that individuals who stay further away from public transport have lower BMI. Geographical covariates previously shown to have a positive impact on health, such as private recreation facilities and parks 40,41 , were included in the analysis, but no significant effect was detected in our study. We hypothesise that park use may be associated with BMI, but that the presence of these facilities does not always translate into usage. Another prominent geographical feature associated with health is access to healthy food 42 . In the National Nutrition Survey 2010, 61.1% of participants reported eating at least one meal daily at hawker centers, locales that serve a variety of affordable but unhealthy food 17,43 .

Identification of high-risk communities
Applying our model to the population of Singapore, our national-level maps provide useful insights into the potential high-risk communities, which may be useful for evidence-driven policy making 35,44 .
From a health services research perspective, maps of chronic disease risk can potentially aid urban planners and regional health systems in ensuring that the facilities in an area are sufficient to support the healthcare needs of the community 45 . From 2009 to 2017, the proportion of those aged 60 and above suffering from three or more chronic conditions increased from 19.8% to 37.0% 46 , reflecting the growing chronic disease burden and prompting the need for adequate medical care. Urban planners choosing potential locations of homes for the elderly, dialysis centres or other primary care facilities would benefit from information on the spatial distribution of elderly and chronic disease risk.
Healthy Urban Planning is one of the themes of the World Health Organisation 47 ; the objectives of this theme are to promote healthy lifestyles, facilitate access to healthy food and increase accessibility to healthcare facilities through proper urban planning.
Our method was applied to Singapore but is generalisable to other populations to estimate the geospatial distribution of chronic disease. Globally, there is increased awareness as to the connection between urban planning and the health of communities. A case study of Dortmund, Germany identified significant associations between green space, air quality and socioeconomically disadvantaged communities 48 . In China, a link between population density and obesity was reported based on data from 450 communities over 30 provinces 49 . At the national level, the Canadian Urban Environmental Health Research Consortium was established in 2015 to consolidate a wealth of geospatial exposure data and cohort studies for healthcare research, including features such as transportation networks and land use 50 . With the appropriate combination of census and geospatial information, geospatial distributions of disease similar to ours can be generated to better inform policy at the city scale that many planners operate in.

Limitations
A major limitation of our study is the cross-sectional study design. Although based on cohort studies, our observations were derived from a cross-section of geographical features, so aetiological conclusions cannot be drawn. Differential usage of retail, recreational and food facilities was also not explored, due to constraints on collecting such detailed information on a large scale. Quantifying the usage of facilities could potentially improve model accuracy in predicting health outcomes, as lifestyle choices such as diet and exercise heavily influence chronic disease risk 13,51 . The land use category of retail is diverse and can positively or negatively affect health, as illustrated by the distribution of healthy food outlets in Australia 52 and fried chicken stores in South Korea 53 . A potential improvement to this study would be to obtain the actual anonymised geographic distribution of the population with detailed health information, as opposed to a synthetically reconstructed population, for additional validation, although this was not possible due to issues of data privacy.

Conclusion
Geographical distributions of chronic disease risk are instrumental in understanding differences in community health at a national level and identifying high-risk communities. Understanding the joint role that demography and geography play in the health of communities helps in constructing predictive models of health, which are critical for evidence-based policy-making.
Healthy Urban Planning initiatives would be able to enhance decision-making using maps of chronic disease risk in order to improve population health through urban design. At a population level, urban interventions allowing for greater access to healthcare and healthier lifestyle options have a small effect across the whole population but may yet prove to be efficacious as urbanisation increases globally.

Ethics approval and consent to participate
This study was approved by the NUS-IRB, reference number S-19-121.

Consent for publication
The authors hereby consent to publication of the article.

Availability of data and materials
The data that support the findings of this study are available from the Saw Swee Hock School of Public Health but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data can however be requested from the following website https://blog.nus.edu.sg/sphs/.