Geospatial-Temporal, Explanatory, Demand, and Financial Models for Heart Failure

Background: About 5.7 million individuals in the United States have heart failure, and the disease was estimated to cost about $42.9 billion in 2020. This research provides geospatial-temporal incidence models of this disease in the U.S. and explanatory models to account for hospitals' number of heart failure DRGs using technical, workload, financial, and geospatial-temporal variables. The research also provides updated financial and demand estimates based on inflationary pressures and disease rate increases. Understanding these patterns is important to policymakers and health administrators alike for cost control and planning.

Methods: Geographic information systems (GIS) maps of heart failure diagnosis-related groups (DRGs) from 2016 through 2018 depicted areas of high incidence as well as changes over time. Simple expenditure forecasts were calculated for 2016 through 2018. Linear, lasso, ridge, and Elastic Net models, as well as ensembled tree regressors, were built on an 80% training set and evaluated on a 20% test set. The best traditional regression model explained 75% of the variability in the number of heart failure DRGs per hospital using a small subset of variables including discharges, DRG type, percent Medicare reimbursement, hospital type, and medical school affiliation. The best ensembled tree models achieved R² above .97 on the blinded test set and identified discharges, percent Medicare reimbursement, hospital acute days, affiliated physicians, staffed beds, employees, hospital type, emergency room visits, medical school affiliation, geographical location, and the number of surgeries as highly important predictors.


Introduction
Coronary heart disease (CHD), cardiovascular disease (CVD), and coronary artery disease (CAD) are leading causes of death in the US, taking the lives of 647,457 people in 2017 [1]. Heart disease is the leading cause of death in most developed countries, causing one third of deaths among those over the age of 35 [2] and one quarter of deaths in the US [3]. Heart disease affects all racial and ethnic groups: it accounted for 23.8% of deaths among non-Hispanic whites, 23.8% among non-Hispanic Blacks, 22.2% among Asians or Pacific Islanders, and 18.4% among Native Americans or Alaska Natives [3]. The incidence of total coronary events in the US increases sharply with age [2], which makes heart disease especially dangerous for the elderly. A 2016 update on heart disease and stroke statistics reported that 15.5 million people over 20 years of age have CHD [4], close to 6% of that population in the U.S. [5]. Key risk factors for heart disease are high blood pressure, high cholesterol, and smoking, and 47% of Americans report at least one of these conditions [3]. Heart disease affects men slightly more than women, accounting for 1 in 4 male deaths (347,879) versus 1 in 5 female deaths (299,578) [6], and food insecurity (associated with poverty) is an obvious correlate [7].
Heart disease was not a common cause of death at the turn of the 20th century, but the prevalence of coronary atherosclerosis grew until 1960 [8]. In 1900, heart disease was the fourth leading cause of death, behind infectious conditions [9]. After 1900, longevity in the U.S. increased only because of the decrease in infectious diseases [10]. In 1900, less than 5% of Americans smoked, but by 1960 the prevalence of smoking was 42% [10]. After the 1950s, Americans reduced smoking and lowered their cholesterol levels [8].
Since the 1960s, the age-adjusted incidence of heart disease has declined steadily, yet it is still the number one cause of death in the nation [1]; from 1980 to 2008 alone, the decrease was 64%, from 345 to 123 per 100,000 [11]. Tools to track heart disease and predict admissions would provide another means of controlling this killer of Americans, particularly the elderly, who are more susceptible to the condition [12].
Heart failure, a subset of heart disease, affects about 5.7 million adults in the United States, and one out of 9 deaths in 2009 was attributed at least in part to heart failure. Approximately 50% of individuals diagnosed with heart failure die within 5 years, and the annual cost is estimated at $30.7 billion [13].
A recent study used decision tree algorithms to predict heart disease [14]. Decision tree algorithms are particularly useful when prediction matters more than the direction of variable effects. Other research has used geospatial analysis to examine several aspects of heart disease, such as emergency transport and inter-hospital transfer for myocardial infarction [15], as well as individual and contextual correlates of cardiovascular disease [16]. To date, however, researchers have not conducted a geospatial-temporal analysis of heart failure combined with predictive modeling to provide descriptive and inferential epidemiological and administrative insight, along with economic implications for supply and demand. This research addresses that gap.
Whereas the national average in the United States is 383 people per physician, the number of people per cardiologist is 14,572 [17]. These ratios are somewhat artificial, because while nearly everyone in the U.S. seeks some medical care, far fewer people need specialty care from a cardiologist. The message is nonetheless the same: cardiology is a highly specialized, highly sought area of care.
While the general trend in cardiovascular disease (CVD) is upward, growth in the number of physicians entering cardiology is relatively flat. An estimated 40.5% of the U.S. population will have some form of CVD by 2030, which equates to a 3.1% incidence rate and $818 billion in cost of care [18]. A 2018 study of heart failure incidence from 1990 to 2009 found that heart failure with reduced ejection fraction (HFrEF) declined, while heart failure with preserved ejection fraction (HFpEF) increased [19]. More recent studies are not readily available.
Beyond the elderly, it is also important to note the increased risk among minorities and economically depressed populations. A hospital's payment for a given Diagnosis-Related Group (DRG) is directly affected by several factors, including 1) relative wages in the area and 2) the number of low-income patients, which affects disproportionate share reimbursement. Medicare fee-for-service patients are at greater risk for hospital-based health-care costs. Underserved minority patients and economically depressed areas affect both care and the cost of care: these populations are less healthy than wealthier, non-minority populations, and less healthy people require more intervention to stabilize [20].
This research seeks to understand the geospatial-temporal incidence of this disease in the U.S. and to build explanatory models that might account for hospitals' number of heart failure DRGs using technical, workload, financial, and geospatial-temporal variables. Further, the research provides financial and demand estimates based on inflationary pressures and disease rate increases. Understanding these patterns is important to policymakers, epidemiologists, and health administrators alike for cost control and planning efforts.

Data
Definitive Healthcare provided the heart failure data for this study. Diagnosis-related groups (DRGs) associated with heart failure (DRGs 291, 292, and 293) were selected for inclusion. The Definitive Healthcare datasets contain the Centers for Medicare and Medicaid Services (CMS) Standard Analytical Files (SAF) [21]. Population data for rate calculations were obtained from the Census Bureau [5]. For years 2016, 2017, and 2018, there were 13.66, 13.52, and 13.35 thousand hospital observations in the study, respectively.

Variables
Variables in this study come from the Definitive Healthcare dataset [21]. The primary variable of interest is heart failure as defined by Diagnosis-Related Groups 291, 292, and 293. DRG 291 encompasses "Heart Failure and Shock with Major Complication or Comorbidity (MCC)"; DRG 292 covers "Heart Failure and Shock with Complication or Comorbidity (CC)"; and DRG 293 covers "Heart Failure and Shock without Complication or Comorbidity (CC) / Major Complication or Comorbidity (MCC)". The dependent variable is measured at the hospital level and aggregated by county for geospatial mapping. Inpatient claims for heart failure measure the met demand for services and suggest which areas may need additional funding and resources from health policy decision-makers.
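The sketch below shows one way the hospital-level dependent variable could be assembled from a long-format claims extract. It assumes hypothetical column names (`hospital_id`, `year`, `drg`, `count`) rather than the actual Definitive Healthcare schema, and the data are invented. Hospitals with no heart failure claims are kept with a count of zero so they are not silently dropped from the analysis.

```python
import pandas as pd

# Hypothetical long-format extract: one row per hospital, year, and DRG.
# Column names and values are illustrative, not the Definitive Healthcare schema.
claims = pd.DataFrame({
    "hospital_id": ["A", "A", "A", "A", "B", "B", "C"],
    "year":        [2018, 2018, 2018, 2018, 2018, 2018, 2018],
    "drg":         [291, 292, 293, 470, 291, 293, 470],
    "count":       [120, 45, 30, 300, 60, 10, 80],
})

# Heart failure and shock: with MCC (291), with CC (292), without CC/MCC (293).
HEART_FAILURE_DRGS = {291, 292, 293}

def heart_failure_drgs(claims: pd.DataFrame) -> pd.DataFrame:
    """Sum DRG 291-293 counts per hospital-year; hospitals with none get 0."""
    hf = claims[claims["drg"].isin(HEART_FAILURE_DRGS)]
    totals = hf.groupby(["hospital_id", "year"])["count"].sum()
    # Reindex on every hospital-year present in the extract so that
    # facilities without heart failure claims appear with a zero count.
    all_keys = claims[["hospital_id", "year"]].drop_duplicates()
    idx = pd.MultiIndex.from_frame(all_keys)
    return totals.reindex(idx, fill_value=0).rename("hf_drgs").reset_index()

print(heart_failure_drgs(claims))
```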
Variables evaluated in the explanatory models fell into four groups: financial, workload, technical, and geospatial-temporal. All variables are measured at the hospital level by year. Table 1 provides definitions and scope of the independent variables, excluding the geospatial-temporal group.

Train and Test Sets
For the explanatory analysis, data were randomly divided, using a fixed pseudo-random seed for replicability and consistent model comparison, into an 80% training set and a 20% test set of 32,419 and 8,104 observations, respectively. Models were built on the training set and evaluated on the test set. The primary model selection metric was the root mean squared error (RMSE), which penalizes large forecast errors heavily because errors are squared before averaging.
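A minimal Python sketch of a seeded 80/20 split and the RMSE metric follows. The seed value (42) is hypothetical, and flooring the test-set size is an assumption; applied to 40,523 observations (the sum of the reported training and test sizes) it happens to reproduce the reported 32,419/8,104 split, but the authors' actual splitting routine is not stated.

```python
import numpy as np

def train_test_indices(n: int, test_frac: float = 0.2, seed: int = 42):
    """Shuffle row indices with a fixed seed and split them into train/test."""
    rng = np.random.default_rng(seed)   # hypothetical seed; fixes the shuffle
    idx = rng.permutation(n)
    n_test = int(n * test_frac)         # floor (assumed rounding rule)
    return idx[n_test:], idx[:n_test]

def rmse(y_true, y_pred) -> float:
    """Root mean squared error; squaring lets large misses dominate."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

train_idx, test_idx = train_test_indices(40_523)
print(len(train_idx), len(test_idx))                          # 32419 8104
print(rmse([0, 0, 0, 0], [1, 1, 1, 1]), rmse([0, 0, 0, 0], [0, 0, 0, 4]))  # 1.0 2.0
```

The last line illustrates the outlier penalty: both prediction vectors have the same mean absolute error (1.0), but concentrating the error in a single forecast doubles the RMSE.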

Geospatial Analysis
Geospatial maps of heart failure DRG rates from 2016 through 2018 were generated at the state level. Rates adjust for population changes, allowing incidence to be compared across states and years. Although descriptive only, these maps highlight geographic variation. Heat maps have been used to describe birthing incidence [22], the opioid epidemic [23], and back surgery growth over time [24], among many other health-related applications. The significance of changes in DRG rates from 2016 to 2018 is evaluated with the non-parametric Friedman test. This rank-based test is preferable to, and more conservative than, repeated-measures ANOVA here, because the normality, homogeneity of variance, and independence assumptions do not hold [25].
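As a sketch of how such a test can be run in Python, the example below applies `scipy.stats.friedmanchisquare` to synthetic state-by-year rates. The data are invented, and treating 51 jurisdictions (50 states plus D.C.) as blocks is an assumption; with three years as treatments the statistic has 2 degrees of freedom.

```python
import numpy as np
from scipy.stats import friedmanchisquare

# Synthetic stand-in data: 51 jurisdictions (blocks) by 3 years (treatments)
# of heart failure DRG rates per 1,000. Values are invented for illustration.
rng = np.random.default_rng(0)
base = rng.uniform(2.0, 8.0, size=51)             # jurisdiction baseline rate
rates = np.column_stack([
    base + 0.3 * year + rng.normal(0, 0.1, 51)    # small upward drift per year
    for year in range(3)                          # 2016, 2017, 2018
])

# Each argument holds one year's rates for the same jurisdictions in the same
# order. The test ranks the years within each jurisdiction, so it needs no
# normality assumption; the statistic is chi-square with k - 1 = 2 df.
stat, p = friedmanchisquare(rates[:, 0], rates[:, 1], rates[:, 2])
print(f"chi2(2) = {stat:.3f}, p = {p:.2e}")
```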

Explanatory Analysis
Linear regression, lasso regression, robust regression, Elastic Net regression, extreme gradient-boosted random forests, and bagging regressors were used to estimate heart failure DRG admissions. To investigate the bias-variance trade-off [26], we built multiple models on the 80% training set and evaluated them on the 20% test set. All models are compared on RMSE, which penalizes outliers. The models are exploratory, intended to show which features (workload, financial, technical, and geospatial-temporal) might be explanatory.
Lasso regression is a constrained regression that penalizes overfitting with an L1-norm penalty (the sum of absolute coefficient values), while ridge regression is similar but uses an L2-norm penalty (the sum of squared coefficients). Elastic Net combines the lasso and ridge penalties.
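The difference between the penalties can be seen in a small experiment. The authors ran their regressions in R; the sketch below is a Python analogue using scikit-learn's `Lasso`, `Ridge`, and `ElasticNet` on synthetic data where only three of ten features matter. The `alpha` and `l1_ratio` values are arbitrary choices for illustration, and scikit-learn applies its own scaling conventions to each penalty.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error

# Synthetic data: 10 standardized features, only the first 3 are informative.
rng = np.random.default_rng(1)
n, p = 500, 10
X = rng.normal(size=(n, p))
true_coef = np.array([3.0, -2.0, 1.5] + [0.0] * (p - 3))
y = X @ true_coef + rng.normal(scale=1.0, size=n)
X_train, X_test, y_train, y_test = X[:400], X[400:], y[:400], y[400:]

models = {
    "linear": LinearRegression(),
    "lasso (L1)": Lasso(alpha=0.1),                         # absolute-value penalty
    "ridge (L2)": Ridge(alpha=1.0),                         # squared penalty
    "elastic net (L1+L2)": ElasticNet(alpha=0.1, l1_ratio=0.5),  # mix of both
}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
    # The L1 penalty can set coefficients exactly to zero; L2 only shrinks them.
    n_zero = int(np.sum(np.isclose(model.coef_, 0.0)))
    print(f"{name:22s} test RMSE = {test_rmse:.3f}  zeroed coefs = {n_zero}")
```

The printed counts show the practical distinction: lasso and Elastic Net typically drop uninformative features entirely, while ridge keeps every feature with a shrunken coefficient.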
While coefficients are easily interpreted in regression-type models, the data typically need scaling and transformation, with no single best solution available. Unlike tree ensemble models (forests), regression models cannot find polytomous splits of variables automatically and are not scale invariant. To address these concerns, including collinearity, multivariate Box-Cox methods are applied to all quantitative variables simultaneously after location adjustments that make them strictly positive.
Random forests are ensembles of de-correlated tree models. Every tree produces a forecast, and the forecasts of all trees are then averaged to produce the estimate. Trees are pruned to prevent overfitting [26]. Figure 1 is an example of a tree with three branches. The tree initially splits observations on whether the number of hospital discharges is less than or equal to, or greater than, 12,406, the split that gives the maximum separation (lowest RMSE). Gradient boosted random forests are a special class of ensembled trees. These models minimize a cost function iteratively, fitting each new tree to the (pseudo-)residuals of the current model. Unlike random forests, gradient boosted random forests do not produce uncorrelated trees; instead, the residuals left by earlier trees are re-fitted with the candidate independent variables in subsequent trees. Essentially, the focus is on the residuals. A more complete discussion of gradient boosting is provided in The Elements of Statistical Learning [26].
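To make the residual-fitting idea concrete, here is a from-scratch sketch of least-squares gradient boosting built on scikit-learn's `DecisionTreeRegressor`. It is an illustration of the technique under simple assumptions (squared-error loss, fixed learning rate, shallow trees, synthetic data), not the authors' implementation; the helper names `fit_boosted_trees` and `predict_boosted_trees` are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_boosted_trees(X, y, n_trees=100, learning_rate=0.1, max_depth=3):
    """Least-squares gradient boosting: each tree is fit to current residuals."""
    init = float(np.mean(y))                 # start from the mean prediction
    pred = np.full(len(y), init)
    trees = []
    for _ in range(n_trees):
        # For squared-error loss, the negative gradient is just the residual.
        residuals = y - pred
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)
        pred = pred + learning_rate * tree.predict(X)   # shrunken step
        trees.append(tree)
    return {"init": init, "learning_rate": learning_rate, "trees": trees}

def predict_boosted_trees(model, X):
    """Sum the initial value and every tree's shrunken contribution."""
    pred = np.full(len(X), model["init"])
    for tree in model["trees"]:
        pred = pred + model["learning_rate"] * tree.predict(X)
    return pred

# Synthetic nonlinear target for demonstration only.
rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=(300, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.1, 300)
model = fit_boosted_trees(X, y)
train_rmse = float(np.sqrt(np.mean((y - predict_boosted_trees(model, X)) ** 2)))
print(f"training RMSE after 100 trees: {train_rmse:.3f} (std of y: {np.std(y):.3f})")
```

Because every tree chases what the previous trees missed, the trees are correlated by construction, which is the contrast with random forests drawn in the text.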
Gradient boosted random forests are scale-invariant, can find relationships (splits) that the researcher might miss, and generate importance metrics for explanatory purposes. These models will, however, overfit the data if the researcher does not restrict the growth of the trees, so cross-validation is necessary.
A bagging regressor is an ensemble that fits base regressors on random subsets of the original dataset. The estimates from these regressors are then aggregated by voting or averaging to generate a final prediction. Random sampling and ensembling reduce the variance of otherwise high-variance black-box estimators. A good implementation and discussion of bagging regressors is available in the Python [27] scikit-learn library.
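The sketch below contrasts a single deep regression tree with bagging, implemented once by hand (bootstrap rows, fit a tree, average) and once with scikit-learn's `BaggingRegressor`, whose default base estimator is a decision tree. The data and settings are synthetic and chosen only for illustration; the helper `manual_bagging` is hypothetical.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
X = rng.uniform(0, 10, size=(400, 3))
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(0, 0.5, 400)
X_train, X_test, y_train, y_test = X[:320], X[320:], y[:320], y[320:]

def manual_bagging(X_tr, y_tr, X_new, n_estimators=50, seed=0):
    """Fit one deep tree per bootstrap resample and average the predictions."""
    boot_rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_estimators):
        rows = boot_rng.integers(0, len(y_tr), size=len(y_tr))  # with replacement
        tree = DecisionTreeRegressor(random_state=0).fit(X_tr[rows], y_tr[rows])
        preds.append(tree.predict(X_new))
    return np.mean(preds, axis=0)   # averaging is the regression analogue of voting

def rmse(pred):
    return float(np.sqrt(np.mean((y_test - pred) ** 2)))

single = DecisionTreeRegressor(random_state=0).fit(X_train, y_train).predict(X_test)
manual = manual_bagging(X_train, y_train, X_test)
sk = BaggingRegressor(n_estimators=50, random_state=0).fit(X_train, y_train).predict(X_test)

print(f"single tree {rmse(single):.3f} | manual bagging {rmse(manual):.3f} "
      f"| BaggingRegressor {rmse(sk):.3f}")
```

A single unpruned tree memorizes noise in the training data; averaging many trees grown on bootstrap samples cancels much of that noise, which is the variance reduction described above.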

Software
All analysis was performed in Anaconda Python Release 3.7 [27], R Statistical Software (called from Python using the rpy2 library) [28], and Microsoft Excel 2016 [29]. Python was used primarily for tree models, while R provided regression analysis. Excel's Bing-based 3D mapping software generated the GIS maps.

Missing Observations
About 2% of quantitative observations were missing, so simple mean imputation was used. This is conservative: by pulling missing values toward the mean, it tends to mask effects that might otherwise be statistically detectable. Among the categorical variables, all but ownership were complete; ownership had only 14 missing observations, which were imputed with the mode.
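A minimal pandas sketch of this imputation scheme is shown below, using a hypothetical hospital-level frame with invented column names and values: column means fill quantitative gaps and the most frequent category fills the ownership gaps.

```python
import numpy as np
import pandas as pd

# Hypothetical hospital-level frame with a few missing values.
df = pd.DataFrame({
    "staffed_beds": [120, np.nan, 300, 80],
    "er_visits":    [15000, 22000, np.nan, 9000],
    "ownership":    ["Voluntary", "Proprietary", None, "Voluntary"],
})

numeric_cols = df.select_dtypes(include="number").columns
# Mean imputation for quantitative variables (pulls gaps toward the center).
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())
# Mode imputation for the categorical ownership variable.
df["ownership"] = df["ownership"].fillna(df["ownership"].mode().iloc[0])
print(df)
```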

Descriptive Statistics-Quantitative Data
Descriptive statistics for the quantitative data are provided in Table 2. Year over year, both the number of DRGs and the rate of DRGs per 1,000 population increased, as illustrated in Figure 2.
The increase in DRG counts is a financial consideration; the increase in the DRG rate is an epidemiological one. If the DRG rate is considered a proxy for the incidence rate, then there is either a significant increase in incidence, a coding issue, or something else; these possibilities are addressed in the discussion section. If incidence were stable, one might expect the DRG rate graph to remain horizontal (static). Independent variables remained relatively constant year over year, likely because of repeated measures on the same facilities.

Of the hospital observations, 6,700 were rural (42%) and 9,279 were urban (58%). Most hospitals (8,342, or 52%) were voluntary non-profits, with 29% (4,641) proprietary and 18.7% (2,996) governmental. The vast majority (11,914, or 75%) had no affiliation with a medical school, and most were short-term care facilities (9,604, or 60%). Almost no hospitals were classified as Department of Defense (DoD) or children's hospitals.

Figure 3

In FY 2008, the Centers for Medicare and Medicaid Services (CMS) estimated the national average total costs per case for heart failure DRGs 291, 292, and 293 at $10,235, $6,882, and $5,038, respectively. By FY 2012, CMS had increased those estimates to $11,437, $7,841, and $5,400, respectively. Over those four years, the accumulation rates (1 plus the cumulative inflation rate) were 1.117, 1.139, and 1.072 for DRGs 291, 292, and 293, respectively. Using these accumulation rates, estimates for 2016, 2017, and 2018 were generated; Table 3 shows these extrapolated estimates. Both estimates are fairly close, so to estimate costs we used both tables separately as upper and lower bounds. Because these total costs represent only CMS costs, the actual financial burden across all payers is likely underestimated, as commercial third-party insurers can reimburse up to 90% more than Medicare for the same diagnosis [31]. Figure 4 illustrates the number of DRGs by year, while Figure 5 shows the associated aggregate cost estimates.
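The accumulation rates follow directly from the CMS cost figures quoted above, as the sketch below shows; dividing each FY 2012 cost by its FY 2008 cost gives 1.117 (DRG 291), 1.139 (DRG 292), and 1.072 (DRG 293). The forward projection is only one plausible method (compounding the annualized four-year rate from FY 2012) and is an assumption; it is not necessarily how the values in Table 3 were produced.

```python
# CMS national average total cost per case, in thousands of dollars (from the text).
cost_fy2008 = {291: 10.235, 292: 6.882, 293: 5.038}
cost_fy2012 = {291: 11.437, 292: 7.841, 293: 5.400}

# Four-year accumulation rate = 1 + cumulative inflation over FY2008-FY2012.
accum = {drg: cost_fy2012[drg] / cost_fy2008[drg] for drg in cost_fy2008}
for drg, a in accum.items():
    print(f"DRG {drg}: 4-year accumulation rate {a:.3f}")

def project(drg: int, year: int) -> float:
    """Assumed method: compound the annualized rate forward from FY2012."""
    annual = accum[drg] ** (1 / 4)   # geometric mean annual growth factor
    return cost_fy2012[drg] * annual ** (year - 2012)

for year in (2016, 2017, 2018):
    print(year, {drg: round(project(drg, year), 3) for drg in accum})
```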
Figure 4 shows that DRG 291, the DRG with the highest average reimbursement per case, has increased nonlinearly, while DRG 292 has seen a small drop and DRG 293 is flat. In Figure 5, total cost estimates for 2018 are, on average, nearly $66 billion more than for 2016. DRG 291, the most expensive DRG, has seen reimbursement increases of $92 billion on average. Reasons for such an increase are explored in the discussion section.

Descriptive Statistics-Correlational Analysis
Hierarchical clustered correlation analysis of the quantitative variables (Figure 6) illustrates tight relationships among many variables. This analysis clusters variables using distance measures (e.g., Euclidean), so that the most highly correlated variables are placed close together; the reordered variables are then displayed in a correlation plot, or correlogram. Figure 7 illustrates that discharges, acute days, and staffed beds are most closely associated with the number of diagnoses, our primary variable of interest.
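The reordering step can be sketched with SciPy's hierarchical clustering. The example below uses synthetic variables (three driven by a shared latent "facility size" and two unrelated ones) and assumes a distance of 1 minus the absolute correlation with average linkage; the distance actually used for Figure 6 may differ.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import squareform

# Synthetic data: three variables share a latent "facility size" factor.
rng = np.random.default_rng(4)
n = 200
size = rng.normal(size=n)
df = pd.DataFrame({
    "discharges":   size + rng.normal(0, 0.2, n),
    "er_visits":    rng.normal(size=n),
    "acute_days":   size + rng.normal(0, 0.3, n),
    "pct_medicare": rng.normal(size=n),
    "staffed_beds": size + rng.normal(0, 0.3, n),
})

corr = df.corr()
# Assumed distance: 1 - |r|, so strongly correlated variables (either sign) are close.
dist = 1.0 - corr.abs().to_numpy()
np.fill_diagonal(dist, 0.0)
Z = linkage(squareform(dist, checks=False), method="average")
order = corr.columns[leaves_list(Z)]          # dendrogram leaf order
print(corr.loc[order, order].round(2))        # reordered matrix behind a correlogram
```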

Geospatial Analysis
A descriptive analysis of heart failure over time using geographic information systems (GIS) was conducted to evaluate regional differences. Our primary interest was rates per 1,000 population, with population estimates over time taken from the Census Bureau [5].
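A rate per 1,000 is simply the aggregated DRG count divided by population, times 1,000. The pandas sketch below shows the state-level roll-up; hospital counts and population values are invented for illustration and are not Census figures, and the column names are hypothetical.

```python
import pandas as pd

# Hypothetical hospital-level heart failure DRG counts with a state column.
hospitals = pd.DataFrame({
    "state":   ["WV", "WV", "CA", "CA", "CA"],
    "year":    [2018] * 5,
    "hf_drgs": [4000, 3000, 20000, 15000, 9000],
})
# Illustrative populations, not Census Bureau estimates.
population = pd.DataFrame({
    "state": ["WV", "CA"],
    "year": [2018, 2018],
    "population": [1_800_000, 39_500_000],
})

state = hospitals.groupby(["state", "year"], as_index=False)["hf_drgs"].sum()
# validate="one_to_one" guards against duplicated state-year population rows.
state = state.merge(population, on=["state", "year"], validate="one_to_one")
state["rate_per_1000"] = 1000 * state["hf_drgs"] / state["population"]
print(state)
```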
While DRG rates per 1,000 were not constant over time, their geographic concentrations were fairly consistent.
There is a clear bifurcation in the center of the United States separating high and low rates, suggesting a West-East difference with lower rates in the West. Washington, D.C. has had (on average) the highest rate of heart failure diagnoses, followed by West Virginia, Alabama, Mississippi, Michigan, Louisiana, Kentucky, and North Dakota. Notably, previous studies indicate that these states are also plagued by the opioid crisis [23].

From 2016 through 2018, the average rate of diagnoses per 1,000 population increased for nearly all states. A Friedman rank sum test (a paired, non-parametric ANOVA) of rates by state by year revealed significantly different rates across years (χ²(2) = 70.941, p < .001). Figure 12 illustrates the changes by year and by state.

A Spearman's test for correlation between state obesity prevalence [32] and 2018 DRGs per 1,000 was statistically significant (rho = .689, S = 6,867.7, p < .001). Figure 13 provides the map of obesity and DRG rates.

Figure 13. Map of DRG rates per 1,000 versus obesity prevalence

Explanatory Models

Explanatory models were sought to explain the number of diagnoses by facility. These models matter because they might allow demand to be estimated from workload, technical, financial, and geospatial-temporal variables. A discussion of data preparation and analysis follows.

Box-Cox Multivariate Transformations
To meet regression assumptions, a multivariate Box-Cox transformation was applied to location-shifted variables. The location shift was necessary to ensure that all variables were strictly positive. Box-Cox methods search jointly for the power transformations of all variables that bring the data as close as possible to multivariate normality; a logarithmic transform corresponds to a power of zero. For all candidate transformations to be feasible, the data must be strictly positive, so for each variable with non-positive values, the absolute value of its minimum plus .01 was added to every observation. These transformations are necessary only for non-tree models, because tree models are location-scale invariant. To prevent bias from leaking into the unseen test set, the transformations were estimated only on the training set; the optimal powers from the multivariate Box-Cox optimization were then applied to the test set. See Table 4 for the optimal powers. Using the strictly positive, Box-Cox transformed data, a regression model was fit hierarchically using the following blocks (in order): technical, workload, financial, and geospatial. The multivariate transformation assumes that at least some independent variables cannot be fully observed or that we have incomplete observations on variables that might be fully observed. Thus, the Box-Cox transformations aim for multivariate rather than univariate normality.
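The train-only fitting and the location shift can be sketched in Python as follows. Note that this is a simplification: `scipy.stats.boxcox` estimates one power per variable (univariate maximum likelihood), whereas the paper estimates the powers jointly; a multivariate version is available, for example, in `powerTransform` from R's car package. The data are synthetic and the helper `location_shift` is hypothetical.

```python
import numpy as np
from scipy import stats

def location_shift(train_col: np.ndarray) -> float:
    """Shift making a column strictly positive: |min| + 0.01 if min <= 0, else 0."""
    m = float(np.min(train_col))
    return abs(m) + 0.01 if m <= 0 else 0.0

# Synthetic skewed variables, shifted so that some values are negative.
rng = np.random.default_rng(6)
train = rng.lognormal(mean=2.0, sigma=0.8, size=(800, 2)) - 5.0
test = rng.lognormal(mean=2.0, sigma=0.8, size=(200, 2)) - 5.0

shifts, lambdas = [], []
train_t = np.empty_like(train)
for j in range(train.shape[1]):
    shift = location_shift(train[:, j])                      # learned on training data only
    train_t[:, j], lam = stats.boxcox(train[:, j] + shift)   # per-column MLE power
    shifts.append(shift)
    lambdas.append(lam)

# Apply the training-set shifts and powers to the test set without refitting.
# A test value below the training minimum could still be non-positive, so this
# sketch clips at a tiny positive floor (a pragmatic assumption, not the paper's rule).
test_shifted = test + np.array(shifts)
test_t = np.column_stack([
    stats.boxcox(np.clip(test_shifted[:, j], 1e-6, None), lmbda=lambdas[j])
    for j in range(test.shape[1])
])
print("estimated powers:", np.round(lambdas, 3))
```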
Hierarchical models fit obvious (known) variable blocks first, followed by the blocks of most interest. In our case, all blocks were statistically relevant to the analysis (see Table 5). Linear regression on the training set produced a reasonable fit, accounting for R² = .750, or 75% of the variability (sum of squares). No collinearity problems were present after transformation.
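Blockwise entry can be sketched with ordinary least squares: after each block enters, record R², the change in R², and a partial F-test of the new block against the previous model. The example below uses NumPy and SciPy on synthetic data; the block names mirror the paper's ordering, but the predictors, effect sizes, and the helper functions are hypothetical.

```python
import numpy as np
from scipy import stats

def ols_rss(X, y):
    """Residual sum of squares and parameter count for OLS with an intercept."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    return float(resid @ resid), Xd.shape[1]

def hierarchical_regression(blocks, y):
    """Enter predictor blocks in order; report R^2, its change, and a partial F-test."""
    tss = float(np.sum((y - y.mean()) ** 2))
    entered, prev_rss, prev_k, rows = [], tss, 1, []   # start from intercept-only model
    for name, X_block in blocks:
        entered.append(X_block)
        rss, k = ols_rss(np.column_stack(entered), y)
        q, df_resid = k - prev_k, len(y) - k
        F = ((prev_rss - rss) / q) / (rss / df_resid)
        p = stats.f.sf(F, q, df_resid)
        rows.append((name, 1 - rss / tss, (prev_rss - rss) / tss, F, p))
        prev_rss, prev_k = rss, k
    return rows

# Synthetic blocks of two predictors each, entered in the paper's order.
rng = np.random.default_rng(7)
n = 1000
blocks = [(name, rng.normal(size=(n, 2)))
          for name in ("technical", "workload", "financial", "geospatial")]
y = sum(w * Xb[:, 0] for w, (_, Xb) in zip((2.0, 1.5, 0.5, 0.3), blocks)) + rng.normal(size=n)

rows = hierarchical_regression(blocks, y)
for name, r2, dr2, F, p in rows:
    print(f"{name:10s} R2={r2:.3f}  dR2={dr2:.3f}  F={F:.1f}  p={p:.3g}")
```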
Performance on the training set was informative; however, the real test of explanatory power is how well the training-set model predicts the test set. Applying the parameter estimates from the training set to the test set yielded R² = .749, barely any loss.
Given the model's predictive ability, the linear regression model was re-run on the entire dataset after re-estimating the Box-Cox transformations, which differed only slightly in magnitude from those produced by the training set. The result was again R² = .749. The actual versus predicted plot is shown in Figure 14.

Table 6

Discharges, Medicare percentage, and hospital type are the primary variables of interest.

Tree Ensemble Models
Several tree regressor models were built on the 80% training set and compared on the test set. There was no need to use the transformed data for tree models, as these are location/scale invariant. These tree models included a bagging regressor (BR), a random forest regressor (RFR), an extra trees regressor (ETR), a gradient boosted regressor (GBR), and an extreme gradient boosted model (XGR). Tree models are atheoretic, as each tree developed may differ from the previous one. When trees are ensembled, variable importances emerge that indicate which features matter most for classifying or regressing in a nonlinear (piecewise) fashion. The number of trees for each estimator was tuned along with the maximum depth of the trees (the number of successive splits). A fixed pseudo-random seed ensured that any model improvements were not due to the random number stream. The results of these models on the unseen test set are shown in Table 6. Most importantly, all of these models account for more variance than the regression models, capturing 97.1% or more of the variability (see Table 7). Because these models agree so closely, we ensembled the DRG forecasts from each to produce importance statistics. The most important variables include discharges, Medicare percentage, acute days, affiliated physicians, staffed beds, employees, psychiatric hospital status, ER visits, medical school affiliation status, Puerto Rico status, and surgeries (see Table 8). Comparing the regression models with the ensembled forests, the first two terms are congruent (discharges and Medicare percentage). Interestingly, no financial variables are among the top 10 effect sizes in either the regression or the tree models. Facility technical and workload variables are the most important predictors of a facility's number of heart failure DRGs. In the tree models, piecewise linear effects were identified for states that were not seen in the regression models.
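A compact sketch of this workflow with scikit-learn is shown below: several tree ensembles are fit on a training split, scored on the test split, and their importances averaged into one ranking. Two substitutions are assumptions made for illustration: XGBoost is omitted because it is a separate library, and permutation importance (`sklearn.inspection.permutation_importance`) is used because it works for every model, including `BaggingRegressor`; the paper does not state which importance measure it used. Data, feature names, and hyperparameters are synthetic or arbitrary.

```python
import numpy as np
from sklearn.ensemble import (BaggingRegressor, ExtraTreesRegressor,
                              GradientBoostingRegressor, RandomForestRegressor)
from sklearn.inspection import permutation_importance
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data with hypothetical feature names; "noise" is irrelevant by design.
rng = np.random.default_rng(8)
n = 600
features = ["discharges", "pct_medicare", "acute_days", "er_visits", "noise"]
X = rng.normal(size=(n, len(features)))
y = (3 * X[:, 0] + 1.5 * np.abs(X[:, 1]) + X[:, 2] * (X[:, 3] > 0)
     + rng.normal(0, 0.3, n))
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "BR": BaggingRegressor(n_estimators=100, random_state=42),
    "RFR": RandomForestRegressor(n_estimators=200, max_depth=8, random_state=42),
    "ETR": ExtraTreesRegressor(n_estimators=200, max_depth=8, random_state=42),
    "GBR": GradientBoostingRegressor(n_estimators=200, max_depth=3, random_state=42),
}
importances = []
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(f"{name}: test R2 = {r2_score(y_te, model.predict(X_te)):.3f}")
    # Drop in test-set R^2 when each feature is shuffled, averaged over repeats.
    imp = permutation_importance(model, X_te, y_te, n_repeats=5, random_state=42)
    importances.append(imp.importances_mean)

mean_imp = np.mean(importances, axis=0)          # ensemble the importances
ranking = [features[i] for i in np.argsort(mean_imp)[::-1]]
print("ensembled importance ranking:", ranking)
```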

Discussion
Figure 2 (DRGs per year) shows that the number of heart failure DRGs is increasing over time. We do not have sufficient or monthly data to run time series analyses such as exponential smoothing (error, trend, seasonality) and autoregressive integrated moving average models. Even without those models, the data show an increase in heart failure diagnoses and a change in intensity from 2016. Most interesting is that the intensity changes are largely in the North Central region, while current incidence rates are highest east of the Texas panhandle.
Further, we see variables that explain the number of DRGs at a facility over time. Some are logically associated with facility size (e.g., number of discharges), and one is logically associated with age (Medicare, available to those 65 and older). However, the tree model ensembles Complication or Comorbidity, is flat. A DRG is determined by the principal diagnosis, the principal procedure (if any), and certain secondary diagnoses identified by CMS as comorbidities and complications (CCs) and major comorbidities and complications (MCCs) [33]. A comorbidity is a condition that existed before admission; a complication is any condition occurring after admission, not necessarily a complication of care [34]. Although heart failure DRGs represented the largest cause of hospitalizations among Medicare beneficiaries and were among the costliest to Medicare prior to 2016, our results suggest that total cost estimates for these three diagnosis-related groups in 2018 are nearly $61 billion more than in 2016 [35][36][37]. DRG 291, the most expensive DRG, is associated with $91 billion in cost increases since 2016.
Although our research has demonstrated substantial reliability in the explanatory factors associated with the longitudinal growth trajectory, it does not explain why DRG 291 has grown so much more than DRGs 292 and 293. Given our study results, several potential drivers could meaningfully contribute to the growth in DRG 291 from 2016 to 2018. First, there may have been a significant increase in cardiac patients with additional major comorbidities. This cannot simply be dismissed given the rapid increase in Medicare-eligible beneficiaries (by some estimates as many as ten thousand per day) and the prevalence of obesity, chronic obstructive pulmonary disease, and other age- and lifestyle-related conditions [38][39][40]. However, given the relatively flat or declining rates of DRGs 292 and 293, we do not believe this is the only driver of our findings. Our findings support other predictions that patient demand will soon outpace supply [41,42].
Second, until October 2018, all extracorporeal membrane oxygenation (ECMO) cases were assigned to DRG 003, which typically reimburses roughly $100,000 per case [43]. In fiscal year 2019, which started in October 2018, that methodology changed so that ECMO cases are no longer all assigned to DRG 003. Instead, the DRG assigned depends on the cannulation route: if the ECMO patient is cannulated centrally, DRG 003 still applies, but if cannulated peripherally, the case falls into another, lower-paying DRG, one of which is DRG 291 [44,45]. Although this change overlaps our study period by only three months, this additional volume is likely reflected in our 2018 data.
Third, since 2010 and the passage of the Affordable Care Act, many cardiologists have sought hospital employment rather than private practice. The uncertainty of continued healthcare reform efforts, burdensome electronic health record costs, declining CMS reimbursement rates for physician professional fees for noninvasive testing procedures (e.g., electrocardiograms, nuclear stress tests), and younger clinicians' different expectations about work-life balance have combined to prompt cardiology groups to seek ways to stay financially viable. Today more than 70 percent of cardiologists are employed by hospitals or health systems [46,47]. Hospitals, in turn, seek to maximize utilization and reimbursement from their highly resource-intensive cardiology service lines. Prior research from the National Bureau of Economic Research found that hospitals responded to price changes by up-coding patients to diagnosis codes associated with large reimbursement increases. These authors indicate that hospitals do not alter their treatment or admissions policies based on diagnosis-specific prices; however, they employ sophisticated coding strategies to maximize total reimbursement [48,49].
Fourth, we suspect that the transition from ICD-9 to ICD-10 in October 2015 is a contributing factor. Starting on October 1, 2015, there were 68,069 valid ICD-10-CM diagnosis codes, a nearly 5-fold increase from the 14,025 valid ICD-9-CM diagnosis codes. ICD-10-CM diagnosis codes are structured differently from ICD-9-CM codes and provide more detail [49]. This expansion allows providers to capture the severity and specificity of a condition in much greater detail, which may prompt increased use of DRG 291.
Looking at how often many of the codes are assigned to any particular patient, we see a significant change in how physicians are diagnosing. ICD-9 included some generic diagnosis codes that covered many patients; a very general set of heart failure codes existed under 428.x, with little specificity as to the sidedness or specifics of the disease. ICD-10 allows a very specific diagnosis per code, and code use will continue to change over time as physicians adapt to coding in this manner. For example, the I50.8xx codes did not exist in 2016 but have been used since 2017, with another change adding more subcodes in 2018.
Today, there are very specific codes for very specific diseases and processes within the heart, including acute-on-chronic conditions. The adjustment to ICD-10 has undoubtedly created a learning phase for practitioners in determining the appropriate codes and when and how to use them.
We would expect some year-to-year increase as the Baby Boomer population enters healthcare, since the counts are not age-adjusted. This is reflected in total diagnoses from 2016 to 2018, which rose from 5.39M to 5.61M to 5.69M. However, how the diagnosis codes are applied varies from year to year, including negative year-over-year changes in several codes. Many of the codes with negative changes are for "unspecified" types of heart disease. This suggests a move away from generic diagnoses toward more specific ones, which is one of the purposes of moving to ICD-10.
One could conclude that upcoding is occurring: a monetary free-for-all in which diagnoses are assigned based on what pays the most. However, in many cases the physician is not billing based on a diagnosis code, but on the level and type of the visit. This depends on insurance types, contracts, and other factors outside the scope of this paper.
Of curious note, the GIS maps in this paper show an interesting trend in where heart failure diagnoses are being seen. In areas surrounding oil and gas pipelines, the number of heart failure diagnoses has grown. For our purposes here, this observation is only descriptive; however, there is a marked change in the heat maps in the areas surrounding pipelines. If the reader overlays the route of the Keystone Pipeline from Canada to Galveston, Texas, a curious overlap with the incidence of heart failure is apparent. One author also recently noted increased use of methamphetamine and cocaine by oilfield workers [51]. This is beyond the scope of this research, but it may merit future study, because the consequences of using these illicit drugs include various heart disorders, including heart failure.

Conclusions
The policy implications of this analysis are several. First, a continued population health focus on reducing obesity rates across the country is needed, particularly in the states identified with the highest incidence and prevalence across the study timeline. The large increase in DRGs 291 through 293 shows that there is financial evidence to support shifting funding from chronic disease management to prevention. It can be argued that education is not sufficient to change the lifestyles and behaviors contributing to the rise in heart disease shown here, so it is time to begin exploring a punitive annual health assessment requirement for high-risk individuals who fail to make significant risk factor changes. Health administrators will need to analyze both the volume and scope of services within these DRGs to ensure that capacity will be available to meet the indicated increase in demand, specifically in the identified high-incidence geographic areas. In Certificate of Need (CON) states, this analysis will be beneficial in getting a CON approved based on the increased demand. Evidence shows that CON states for cardiac services, which include most of the high-incidence and high-prevalence states in this study, have higher mortality rates for cardiac services [51]. Another significant potential policy implication is a continued re-evaluation of the need for CONs in general, as multiple researchers question whether they are still needed in today's healthcare environment and suggest that they may restrict services that are in increasing demand, leading to higher mortality [52].

Diagnoses per 1,000 by year by state