Spatial Distribution of Cardio-Vascular Diseases in India


Objective: Cardio-vascular Diseases (CVDs) are a leading cause of death and disease burden across the world, and the burden is only expected to increase as the population ages. The objective of this paper is to explore the patterns of CVD risk factors among women in the late reproductive ages (35-49 years) across 640 districts in India, and to investigate the association between area-level socioeconomic factors and CVD risk patterns, using a nationally representative sample of 239,729 women aged 35–49 years from all 36 States/UTs under NFHS-4 (2015–16). Methods: Age-standardized prevalence of CVDs has been calculated, along with 95% CIs, among women in their late reproductive ages (35–49 years) in India. The spatial dependence and clustering of CVD burden have been examined using Moran's I indices and bivariate Local Indicator of Spatial Autocorrelation (LISA) cluster and significance maps. Ordinary Least Square (OLS) regression has been employed with CVD prevalence as the outcome variable. To account for spatial dependence, Spatial Autoregressive (SAR) models have been fitted to the data. Diagnostic tests for spatial dependence have also been carried out to identify the best-fitting model. Results: Higher values of Moran's I imply high spatial autocorrelation in CVD among districts of India. Smoking, alcohol consumption, hailing from a Scheduled Caste background, more than 10 years of schooling, and urban place of residence appeared as significant correlates of CVD prevalence in the country. The spatial error model and the spatial lag model are a marked improvement over the OLS model; of the two, the spatial error model shows the greater improvement. Conclusions: A broader course of policy action relating to social determinants can be a particularly effective way of addressing CVD risk.
Social policy interventions related to health, such as reducing inequalities in education, poverty, unemployment, and access to health-promoting physical or built environments, are crucial in tackling the long-term effects of CVD inequalities between geographical areas.


Background
In the last few years, spatial analysis has gained relevance in epidemiological studies and in the identification and management of risk factors associated with diseases [1,2]. A majority of these studies employ spatial statistics to reveal factors playing an essential role in decision-making, planning interventions, and distributing available resources. Although particularly useful for infections requiring vectors [3,4], they are also valuable for studying cardio-vascular diseases (CVDs) and other noncommunicable disorders [5,6,7]. The techniques employed in spatial analysis can be difficult to explain to a non-specialist, but their outcome, e.g., a map, can be immediately grasped. Different spatial techniques are helpful at different levels, and they are complementary to each other.
Cluster detection is an important epidemiological tool, as it can help identify factors spatially closely related to a disease. Positive spatial autocorrelation (SA) implies the rates for a given phenomenon are likely to be similar for neighboring areas compared to the rates of geographically distant regions [8,9]. In the case of health variables, too, the variables tend to be spatially correlated [10,11] with their underlying related factors, which is because of the high probability of closely situated areas having similar underlying factors with regards to different phenomena [9].
There is a difference between the global and local situation when it comes to clustering. When locating global clusters, the focus is on their existence, not location [12,13]; local cluster analysis aims to quantify SA and clustering within small geographical units in the study area [13]. Moran's I is a commonly used spatial statistic for the detection of global clustering [14], with another tool called local indicators of spatial association (LISA) used for finding local clustering [15,16]. Next, regression techniques are employed to determine possible associations between variables of interest and to decipher the strength and direction of those associations. Other techniques utilized for estimating spatial regression are the ordinary least squares (OLS) [17], bivariate LISA [18,19], the generalized additive model [20], the spatial lag model [21], and the spatial error model.
CVDs are a leading cause of death and disease burden across the world, and the burden is only expected to increase as the population ages [22,23,24]. Most commonly used CVD risk prediction algorithms have been derived from the Framingham Risk Equation (FRE), used in general practice to assess risk for individual patients [25]. The trend in primary prevention of CVDs has traditionally been to depart from the relative risk factor assessment but treat these factors as absolute CVD risk [26]. The most effective prevention strategies require knowledge and a contextualized understanding of people, communities, environments, as well as variations in CVD risk. Despite the availability of clinically proven CVD risk factor assessment tools, the most at-risk populations rarely take part in such assessments until disease progression is well underway. Though imprecise proxies for risk can be wielded for community-level risk estimation, a considerable knowledge gap persists due to the unavailability of fine-grained population tools to predict "hotspots" for future CVD risks from general practice clinical data [27].
There are a few studies that have attempted to inspect the spatial variation of NCD risk at a smaller geographic scale worldwide. Noble et al. examined the feasibility of mapping chronic disease risk among the general population. They created a small-area map of diabetes risk from general practice clinical records in the UK [28].
Factors of importance for cardiovascular disease spatial distribution patterns

Socioeconomic status
The association between socioeconomic status and CVD is chronicled in multiple studies [29]. Spatial analysis has only added to the extant knowledge. In Australia, CVD clustering tended to occur in relatively disadvantaged areas [27]. Even in Harris, Texas, geographically weighted regression depicted a correlation between CVD mortality and social deprivation at the community level [17]. Similar results have been found in Strasbourg, France, where high-risk clusters of myocardial infarction (MI) were seen to accumulate in economically disadvantaged areas, despite good access to health amenities [30].
Socioeconomic status can be a significant determinant of the opportunity of access to health services.
Clustering of heart disease mortality before any attempt at transport has been observed in areas of lower socioeconomic status and household amenities [31]. Mapping diseases at the district level and risk analysis with rapid inquiry facility (RIF) techniques showed that people residing in highly deprived areas had a low relative frequency of prescription statin treatments [32].

Education level
It has been found in a study by Pedigo and colleagues [33] that neighborhoods from eastern Tennessee, USA had high-risk clusters of neighborhoods prone to stroke and MI mortality, along with a high prevalence of low educational level. These results have been replicated in another study conducted in Brazil, reporting spatial clustering for ischemic heart disease mortality and relating it with illiteracy in the study population [18].

Rural vs urban residency
Some characteristic peculiarities are unique to rural and urban areas, which may play a role in CVD development. A recent study conducted in Peru highlighted the clustering of obesity among urban children, while the prevalence was low in rural areas [34]. A study examining MI and stroke determinants in Tennessee found rural residency to be an important factor [35]. There is significant clustering among the rural Taiwanese population, characterized by the underutilization of cardiovascular drugs, which is further connected to the low presence of cardiology specialists in some areas [36]. Hence, rural places of residence can entail inadequate access to health services, impeding timely disease management. Spatial regression analysis carried out in Taiwan has further revealed that mortality due to heat or cold waves is more rampant in rural areas, owing to limited access to medical facilities and resources, as opposed to urban metropolitan areas [37].

Alcohol intake
High alcohol intake and heart disease are connected, which is, in fact, also corroborated by a study from Chile, where a particular study region, which was a high-risk zone for CVD deaths, was also associated with alcohol consumption. These results are also supported by the presence of two clusters in the Valparaíso and Biobio areas; the alcohol consumption in Biobio has been among the highest in Chile for the last 45 years [38].

Smoking
Smoking in terms of risk factor clustering has been observed in both United States [39] and China [40]. It has also been observed when studying the CVD incidence among Asian and Caucasian populations [41].
The main objective of this study has been to explore the patterns of CVD risk factors among women in the late reproductive ages (35-49 years) across districts in India, and investigate the association between area-level socioeconomic factors and CVD risk patterns. Hence, the production of fine-grained maps of CVD risk is possible through this approach for clinicians and policymakers to use, enabling geographic targeting of community interventions for CVDs.

Method
The CVD prevalence within a region can be modeled as a spatial process [42], as can all the demographic and socioeconomic variables associated with the disease prevalence. These observed processes are likely to exhibit spatial dependence, as well as non-stationarity.
In the case of disease prevalence, individuals living in close proximity to each other tend to have similar socio-demographic characteristics like age, income, and access to healthcare facilities, and hence similar disease prevalence, thereby begetting positive spatial dependence. The process is non-stationary, though, because disease prevalence is not constant over space: prevalence rates vary from young and wealthy areas to retirement establishments (inconstant mean); variability within a young neighborhood is greater than in a relatively older populace (inconstant variance); and the spatial extent of the spatial dependence varies across regions, from densely populated areas to suburbs (inconstant covariance).
When modeling the aforementioned spatial process, procedures should be designed with a view to reducing model variance, while also accounting for spatial dependence and non-stationarity. It is also worth noting that these spatial processes exhibit both properties simultaneously. Despite the known effects of this relationship [43], most of the existing advanced spatial techniques address only one of the properties: more specifically, spatial autoregressive methods [44] focus on spatial dependence while disregarding non-stationarity, and local, or geographically weighted, methods [45] focus on non-stationarity while disregarding spatial dependence.
The present study is limited to applying spatial autoregressive procedures; the analytical implementation begins with an examination of spatial dependence. Spatial autocorrelation measures based on Moran's I [46] are commonly used to test for the clustering tendency of medical data, even when analysing multivariate specifications [47]. Throughout this study, the traditionally used spatial autocorrelation tool Moran's I [46] has been implemented. Even though the authors are well aware of the limitations of the index [48], the Moran's I results can be used to assess the presence and magnitude of spatial dependence. Using a single index for exploratory analysis, individual variables, and model residuals is vital to deciphering spatial dependence, as seen in the current data.
Computation of this index requires the specification of a spatial dependence model, defined by a spatial weight matrix, which can be something as simple as a binary construct, or a more complex specification, which might include various types of weights accounting for distance decay effects.
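As an illustration of how the index depends on the weight matrix, the following minimal sketch computes global Moran's I for a small binary contiguity matrix. The function name `morans_i`, the four-area layout, and the values are hypothetical and are not taken from the study data; inference (e.g., permutation tests) is omitted.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square weight matrix.

    I = (n / S0) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2,
    where S0 is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    total = sum(d * d for d in dev)
    return (n / s0) * (cross / total)


# Hypothetical layout: four areas in a row, neighbours share a border.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = [10.0, 11.0, 20.0, 21.0]    # similar values sit next to each other
alternating = [10.0, 20.0, 10.0, 20.0]  # neighbours always differ

i_clustered = morans_i(clustered, W)      # positive: similar neighbours
i_alternating = morans_i(alternating, W)  # negative: dissimilar neighbours
```

The same values give a different index under a different weight matrix, which is why the specification of W has to be reported alongside the statistic.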
The identification and quantification of a geographical variable's spatial clustering have been a central issue in all confirmatory and exploratory spatial research. Moran's I provides a summary statistic for overall spatial clustering [14,49,50,51,52]. Local indices, e.g., local Moran's I_i, allow for exploring local disparities in spatial dependence by quantifying each area's relative contribution to the global measure [53,15,44]. These measures are part of a broader attempt to spatialise general statistics, in recognition that regular statistical assumptions do not apply to spatial data. For example, data in geographically referenced datasets are not independent, as is generally assumed in statistical analysis, but are influenced by each other, a phenomenon termed spatial autocorrelation [54]. Spatial distributions are prone to significant local variations, giving rise to discrete spatial patterns in the study area (spatial heterogeneity or non-stationarity) [44,55]. Univariate spatial association measures focus exclusively on the spatial clustering of observations pertaining to a single variable, while bivariate spatial association measures capture the relationship between two variables, in view of the topology of the observations. Hence, we are able to parameterize the bivariate spatial dependence [56].
Spatial contiguity can be accounted for in various ways [57]: common approaches include the definition of k orders of spatial neighbors, the specification of a threshold distance, or a method based on shared borders (for areal units only). Some methods are reliant on the spatial units' topology, but the computation of spatial neighbours is a far more general method. In any case, the extent of spatial dependence can be defined either via a maximum distance parameter or by a maximum number (k) of nearest neighbors.
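A bivariate local statistic of this kind can be sketched as follows. The sketch assumes the common convention of pairing each area's standardized value of the first variable with the row-standardized spatial lag of the second variable; the function names, the six-area layout, and the numbers are hypothetical, and the permutation-based significance test used for LISA maps is left out.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert values to z-scores using the population standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def row_standardize(weights):
    """Scale each row of the weight matrix so that it sums to one."""
    out = []
    for row in weights:
        total = sum(row)
        out.append([w / total if total else 0.0 for w in row])
    return out


def bivariate_local_moran(x, y, weights):
    """Return (I_i, label) per area: z_x[i] times the spatial lag of z_y."""
    zx, zy = standardize(x), standardize(y)
    w = row_standardize(weights)
    results = []
    for i in range(len(x)):
        lag = sum(w[i][j] * zy[j] for j in range(len(y)))
        label = ("High" if zx[i] > 0 else "Low") + "-" + (
            "High" if lag > 0 else "Low"
        )
        results.append((zx[i] * lag, label))
    return results


# Hypothetical example: six areas in a row, neighbours share a border.
n = 6
W = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]
prevalence = [20.0, 22.0, 21.0, 10.0, 9.0, 8.0]   # first variable
covariate = [30.0, 35.0, 33.0, 12.0, 10.0, 11.0]  # second variable

labels = [label for _, label in bivariate_local_moran(prevalence, covariate, W)]
```

In this toy layout the first three areas fall in the High-High quadrant and the last three in the Low-Low quadrant, which is the kind of pattern read off a bivariate cluster map as hotspots and cold spots.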
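The two distance-based options can be sketched in a few lines. This is only an illustration on hypothetical point coordinates (for districts, these would typically be centroids); the function names and the threshold are assumptions, and border-based contiguity is not shown because it requires polygon geometry.

```python
from math import dist


def knn_weights(points, k):
    """Binary weights linking each point to its k nearest neighbours."""
    n = len(points)
    w = [[0] * n for _ in range(n)]
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: dist(points[i], points[j]),
        )
        for j in others[:k]:
            w[i][j] = 1
    return w


def distance_band_weights(points, threshold):
    """Binary weights linking points that lie within a threshold distance."""
    n = len(points)
    return [
        [
            1 if i != j and dist(points[i], points[j]) <= threshold else 0
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical centroids: three close together and one remote.
centroids = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]

w_knn = knn_weights(centroids, k=1)
w_band = distance_band_weights(centroids, threshold=1.5)
```

The design trade-off is visible even here: the nearest-neighbour rule gives every area exactly k neighbours but is not symmetric (the remote point links to its nearest area, which does not link back), whereas the distance band is symmetric but can leave remote areas with no neighbours at all.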
Spatial autoregressive methods use generalized least squares (GLS) and maximum likelihood (ML) models; the covariance structure is generally explained by a conditional autoregressive (CAR), a simultaneous autoregressive (SAR), or a moving average (MA) specification. Generally, a constant covariance structure is assumed, and a spatial weight matrix dictates which spatial units are spatially dependent [42]. In the model specification, ρ (rho) is the autoregressive parameter and W is the spatial weight matrix.
The autoregressive parameter is a correlation coefficient, ranging between -1 and +1. The definition of the spatial weight matrix entails the same specifications as the spatial autocorrelation index. A backward method of model selection is conducted for all the regressions. Once each regression is specified, the spatial autocorrelation index is calculated on the regression residuals in the aforementioned process. The spatial weight matrix plays an important role in the methodology defined, eventually influencing the spatial autocorrelation index value, as well as the efficacy of the spatial autoregressive models. Spatial matrix definition remains subjective, as it rests on an estimate of the spatial dependence in the spatial processes involved [58].
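To make the role of ρ and W concrete, the sketch below assumes the common spatial lag form y = ρWy + Xβ + ε (the symbols y, X, β, and ε are assumptions of this illustration) and solves it for y by fixed-point iteration. The iteration converges when W is row-standardized and |ρ| < 1. The function names and the three-area example are hypothetical, and the sketch shows how dependence propagates rather than how the parameters are estimated.

```python
def spatial_lag(w, y):
    """Return W y: for each area, the weighted average of its neighbours."""
    return [sum(wij * yj for wij, yj in zip(row, y)) for row in w]


def solve_sar(w, base, rho, tol=1e-12, max_iter=10_000):
    """Solve y = rho * W y + base by fixed-point iteration.

    `base` stands for X beta + epsilon. Assumes a row-standardized W
    and |rho| < 1, which makes the iteration a contraction.
    """
    y = list(base)
    for _ in range(max_iter):
        lag = spatial_lag(w, y)
        new = [rho * l + b for l, b in zip(lag, base)]
        if max(abs(a - b) for a, b in zip(new, y)) < tol:
            return new
        y = new
    raise RuntimeError("fixed-point iteration did not converge")


# Hypothetical example: three areas in a row, row-standardized weights.
W = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 1.0, 0.0],
]
base = [1.0, 0.0, 0.0]  # only the first area has a non-zero own effect

y_dependent = solve_sar(W, base, rho=0.5)
y_independent = solve_sar(W, base, rho=0.0)
```

With ρ = 0 the outcome equals the own effect of each area; with ρ = 0.5 the effect in the first area spills over into the second and, more weakly, into the third, which is the sense in which W dictates which units are spatially dependent.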
Spatial dependence can be incorporated in a regression model in one of two ways: a spatial error model or a spatial lag model. Tools are available for assessing which of the models is a better fit for the data, namely, the Lagrange Multiplier tests.
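As a rough illustration of one such diagnostic, the sketch below computes the Lagrange Multiplier statistic for spatial error dependence from a vector of OLS residuals e, using the standard form LM = (e'We / (e'e/n))² / tr(W'W + W²). The function name, the weight matrix, and the residual values are hypothetical; the companion statistic for the spatial lag alternative also requires the design matrix and is not shown.

```python
def lm_error(residuals, w):
    """Lagrange Multiplier statistic for spatial error dependence.

    Under the null hypothesis of no spatial error dependence the statistic
    is asymptotically chi-squared with one degree of freedom.
    """
    n = len(residuals)
    sigma2 = sum(e * e for e in residuals) / n
    ewe = sum(
        residuals[i] * w[i][j] * residuals[j]
        for i in range(n)
        for j in range(n)
    )
    # tr(W'W + W^2) = sum_ij (w_ij^2 + w_ij * w_ji)
    trace = sum(
        w[i][j] ** 2 + w[i][j] * w[j][i] for i in range(n) for j in range(n)
    )
    return (ewe / sigma2) ** 2 / trace


# Hypothetical example: four areas in a row, binary contiguity weights.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
residuals = [1.0, 1.0, -1.0, -1.0]  # made-up OLS residuals

lm_stat = lm_error(residuals, W)
```

In practice the statistic is compared with the chi-squared critical value (about 3.84 at the 5 percent level), and the error and lag versions are read together to decide which specification the data favour.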
The difference between these models is technical and conceptual. A spatial error model basically implies that the: "spatial dependence observed in our data does not reflect a truly spatial process, but merely the geographical clustering of the sources of the behavior of interest. For example, citizens in adjoining neighborhoods may favor the same (political) candidate not because they talk to their neighbors, but because citizens with similar incomes tend to cluster geographically, and income also predicts vote choice. Such spatial dependence can be termed attributional dependence" [59].
On the other hand, a spatially lagged model incorporates spatial dependence by adding a "spatially lagged" variable y on the right-hand side of the regression equation, which essentially places a spatially lagged "dependent" variable among its explanatory factors. Hence, the values of CVD in the areas neighboring observation n_i are an important predictor of CVD in each individual area n_i. In other words, this implies that spatial dependence may result from a process such as the diffusion of behavior between neighboring units: "If so the behaviour is likely to be highly social in nature, and understanding the interactions between interdependent units is critical to understanding the behavior in question. For example, citizens may discuss politics across adjoining neighbors such that an increase in support for a candidate in one neighborhood directly leads to an increase in support for the candidate in adjoining neighborhoods" [59].
However, the estimates available for men and women were provided at the state level. The study aims to explore the spatial differentials of CVDs among women in their late reproductive years (35-49 years).
For the analyses presented in the current study, the variables have been normalized, using the total resident state women population aged 35-49 years as denominators. This normalization turns all the variables into rates, instead of numbers. The data has been stratified by age, and the outliers have been removed, resulting in a sample size of 239,729 women.
For carrying out spatial analysis, districts have been set as the unit of analysis, for which the shapefile for 640 districts has been generated. Next, to examine the spatial dependence and clustering of CVD burden, Moran's I indices and bivariate Local Indicator of Spatial Autocorrelation (LISA) cluster and significance maps have been generated.
Ordinary Least Square (OLS) regression has been employed with CVD prevalence as the outcome variable. To account for spatial dependence, Spatial Autoregressive (SAR) models have been fitted to the data. Diagnostic tests for spatial dependence have also been carried out to identify the best-fitting model.
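The step from age-stratified counts to a comparable rate can be sketched with direct age standardization, a standard technique for removing differences in age structure between areas. The age bands, counts, and standard-population weights below are entirely hypothetical and do not reflect the weights used with the survey data.

```python
def age_standardized_rate(cases, population, standard_weights):
    """Direct age standardization.

    Each age band's crude rate (cases / population) is weighted by the
    share of that band in a chosen standard population.
    """
    total_weight = sum(standard_weights)
    return sum(
        (w / total_weight) * (c / p)
        for c, p, w in zip(cases, population, standard_weights)
    )


# Hypothetical district with three age bands: 35-39, 40-44, 45-49 years.
cases = [10, 20, 30]
population = [100, 100, 100]
standard_weights = [0.40, 0.35, 0.25]  # assumed standard population shares

crude_rate = sum(cases) / sum(population)
standardized_rate = age_standardized_rate(cases, population, standard_weights)
```

Here the crude rate is 0.20, while the standardized rate is lower because the assumed standard population gives more weight to the younger bands, where the hypothetical rates are lower.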

Results
The present study depicting the spatial variation in the prevalence of CVDs is based on 239,729 women in their late reproductive ages (35-49 years) from 36 states/Union Territories of India.
The descriptive statistics encompassing prevalence results for various socio-demographic and behavioral indicators are portrayed in Table 1. Results from NFHS-4 data show 17.4 percent of women in the late reproductive ages of 35-49 years are currently suffering from CVDs. A higher prevalence of CVDs is found among urban women (21.9 percent), as compared to rural (14.9 percent) women. Women with more than 10 years of schooling have a higher prevalence of CVD-affected individuals (22 percent), as compared to their counterparts with less than 10 years of schooling (18.7 percent), or even those who have not had any formal education (14.7 percent). Women practising Islam (20.3 percent) or religions other than Hinduism or Islam (19.5 percent) have a higher prevalence of CVDs, whereas Hindu women have a lower CVD prevalence (16.8 percent). As the wealth index progresses from least well-off to most well-off, the prevalence of CVDs also gradually increases from 10.3 percent for the least well-off category to 22.8 percent for the most well-off category. Non-smoking women have a higher CVD prevalence (17.4 percent), as compared to their counterparts who smoke (15.9 percent), as do women who do not consume alcohol (17.5 percent), as compared to their counterparts who do (13.9 percent).
Table 3 presents the results of the Ordinary Least Square (OLS) model, Spatial Lag Model (SLM), and Spatial Error Model (SEM). Concentrating on the OLS model, it is found that smoking, alcohol consumption, hailing from a Scheduled Caste background, more than 10 years of schooling, as well as urban places of residence appeared as significant correlates of CVD prevalence in the country. However, it would be premature to draw conclusions before model diagnostics are taken into consideration. Once the presence of spatial dependence has been established, the spatial lag model is applied with a maximum likelihood approach, the results of which are presented in Table 3.
An additional indicator in the form of the spatial lag coefficient (Rho, 'ρ') appears in the model. It shows the spatial dependence inherent in the sample data, capturing the average influence of neighboring observations on a particular observation. Inclusion of this term and the fit of the spatial lag model also translate into a higher R-squared value. The effects of other independent variables remain virtually the same.
Next, the spatial error model has been employed. In the SEM, an additional term emerges, i.e., the coefficient on the spatially correlated errors (Lambda, 'λ'). It has been found to have a positive effect and is highly significant, too. Hence, the general fit improved, as evidenced by the R-squared value.
Both the spatial models are a marked improvement over the OLS model; of the two, the spatial error model shows the greater improvement.
Also interesting are those district clusters which have high CVD prevalence among women but a low percentage of women without formal education, like Aurangabad and Nagpur. Almost 14 districts have low CVD prevalence but a high percentage of women without formal education, like Nalgonda, Chittor, Kolam, Idukki.
In case of bivariate results depicting CVD among women aged between 35-49 years and those with less than 10 years of schooling (Fig. 1b), the high-high hotspot cluster in Kashmir remained almost similar to the previous scenario of high CVD prevalence concurrent with a high percentage of women without any formal education. Other districts with the same situation are Bilaspur, Solan, Ludhiana, Bulandshahr, Kaithal, Karnal, Gonda, Maharajganj, Guntur, Sri Potti Sriramulu Nellore, Thiruvallur, Kancheepuram, Ramanathapuram, Virudhanagar, Thottukud, Madurai, Sivaganga, Pudukottai, Thanjavur.
The 106 districts with low CVD prevalence as well as low prevalence of women with less than 10 years of schooling consisted of districts like Kachchh, Banas Kantha, Sirohi, Udaipur, Uttarkashi, Rudraprayag, Chamoli, Tehri Garhwal, Raipur, Bilaspur, Mahasamund, Hingoli, majorly constituted by southern Rajasthan, northern coastal Gujarat, almost the whole of Uttarakhand, southern Uttar Pradesh, southern Bihar, Chhattisgarh, central and eastern Maharashtra.
The high CVD prevalence along with low prevalence of women with less than 10 years of schooling consists of districts like Pune, Nagpur, Neemuch.
Low CVD prevalence along with high prevalence of women with less than 10 years of schooling is found in 19 districts, a few of which are Erode, Salem, Idukki, Hanumangarh.
On the basis of classification by caste (Fig. 1c), bivariate results between CVD among women aged between 35-49 years and women from OBCs show high prevalence of both the phenomena in 63 districts, a few of which are Kupwara, Punch, Srinagar, Ludhiana, Ambala, Baghpat, Sonipat, Panipat, Nellore, Vellore, Ramanathapuram, Virudhanagar, Madurai, Sivaganga, from Jammu, and a few districts of Punjab, Haryana, and coastal Tamil Nadu down south.
Low prevalence of CVD in 35-49 yr women as well as hailing from OBCs has been observed in Nashik, Ahmednagar, Osmanabad, Bijapur, Jaipur, Sirohi, Udaipur, Rajsamand, and a majority of the districts of Chhattisgarh.
There are 15 districts apiece with low prevalence of CVD-affected women but high percentage of women from OBCs and vice versa.
Bivariate results of CVD among women aged between 35-49 years and women from Scheduled Caste households (Fig. 1d) show high-high clusters in 56 districts like West District in Sikkim, West Garo Hills, Tawang, Baghpat, Sonipat, Jind, Kancheepuram, Tiruvallur.
There are a few districts like Neemuch, Dumka, East Khasi Hills with High CVD prevalence, but low percentage of women from Scheduled Caste households.
Women aged between 35-49 years with CVD and hailing from rural areas (Fig. 1e) have high-high clusters in 70 districts like Firozpur, Jalandhar, Ludhiana, Fatehgarh, Kaithal, Guntur, Nellore, and most parts of Tamil Nadu.
Low-low clusters have been found in 112 districts like Koppal, Bellary, Bagalkot, Gadag, Bijapur etc. High CVD prevalence but low rural residential percentage has been found in 12 districts like Dumka, Jamui, Bokaro, Nagpur, Kalahandi.
In case of bivariate results of CVDs among women and women who smoke (Fig. 1f)
A higher CVD prevalence has also been found among the women hailing from the richest wealth quintile as compared to the other wealth quintiles, corroborating a positive social gradient still prevalent in CVD prevalence in India among women in later reproductive ages (35-49 years).
Target areas emerging from studies of this sort should direct the next course of action in combating morbidity. Region-specific steps are another area of action which can help in addressing the issue. Targeting the risk factors that control disease prevalence in one area might not be effective in combating an increased CVD prevalence in another area. In some parts, a high proportion of women living in rural residences coincides with a high prevalence of CVDs, clustered significantly in northern districts of states like Haryana, Punjab, Jammu, but there do exist 16 districts with a high proportion of women hailing from rural areas and a low prevalence of CVDs, like Chittoor, Nalgonda, Erode, Namakkal, Idukki. Program-based interventions can be instrumental in creating momentum in CVD control schemes and awareness regarding behavioural changes to delay the onset of symptoms.

Conclusion
The main objective of this study has been to explore the patterns of CVD risk factors among people across districts in India, as well as investigate the association between area-level socioeconomic factors and CVD risk patterns. Hence, the production of fine-grained maps of CVD risk is possible through this approach for use by clinicians and policy makers, enabling geographic targeting of community interventions for CVDs. Khardha, Jagatsinghpur, Nagpur, Bhadrak are a few of the 26 districts found with high CVD prevalence but low prevalence of women who smoke. Most of the hotspots exist in Jammu, Uttarakhand, Punjab, Haryana, coastal Andhra Pradesh, Tamil Nadu, NCT, while the majority of the cold spots for all the bivariate LISA results are clustered in the stretch of area encompassing districts of Gujarat, Rajasthan, southern Uttar Pradesh, Chhattisgarh, eastern Maharashtra, northern Karnataka, and western Odisha.
Studies of this kind highlighting geographical disparities can rightly shift the focus to rural-urban differentials and provincial or district-level inequalities, hence demonstrating the need for targeted action and population-wide interventions to reduce CVD burden as well as associated behavioural risks. Globalization and urbanisation have been working at the macro-societal level, leading to the developing world increasingly being subjected to risky behaviour like smoking, drinking, low physical activity, as well as unhealthy food habits. Limited access to healthcare facilities, public health education, and prevention plans as compared to their counterparts in the developed world further compounds the problem [68,69].
Disparities in CVD prevalence and CVD health point towards a deeper problem. The need of the hour is comprehensive tobacco control policies, smoking cessation programs, increased access to medical facilities, and physical activity campaigns via Information Education Communication efforts; all of these are ways of decreasing CVD risk while targeting the disadvantaged areas of the country. A broader course of policy action relating to social determinants can be a particularly effective way of addressing CVD risk [70,71,72]. Social policy interventions related to health, such as reducing inequalities in education, poverty, unemployment, and access to health-promoting physical or built environments, are crucial in tackling the long-term effects of CVD inequalities between geographical areas.