3.1 Missing Observations
About 2% of quantitative observations were missing, so simple imputation using the mean was employed. This is conservative, as it tends to hide results that might be statistically relevant by reinforcing mean values. For the categorical variables, only ownership was not fully complete. There were only 14 missing observations for this variable, and these were imputed with the mode.
3.2. Descriptive Statistics-Quantitative Data
Descriptive statistics for the quantitative data are provided in Table 2. The average hospital observation during any given year had 1,635.62 observations of DRG 291, 292, and 293 (median of 383). That same average hospital had 146.39 staffed beds (median of 87), 6,996.58 discharges (median of 2,825), 6,348.76 surgeries (median of 4,490), and 34,181.38 acute days (median of 14,051). The average hospital had positive income (in millions) of $17.232 (median of $2.044), significant cash-on-hand ($20.28, median of $1.99), and positive equity. The typical hospital had 1005.53 employees (median of 437) with 231.37 affiliated physicians (median of 104) and was reimbursed 45% by Medicare (median of 42%). Only 9% reimbursement was from Medicaid (median of 6%).
<Insert Table 2 About Here>
Year over year, both DRGs and rates of DRGs per 1000 population increased as illustrated in Figure 2. The significance of the DRG increase is the financial consideration. The significance of the rate of DRG increase is the epidemiological consideration. If the DRG rate is considered a proxy for incidence rate, then there is either a significant increase, a coding issue, or something else. These considerations are found in the discussion section. One might expect the DRG rate graph to remain horizontal (static). Independent variables remained relatively constant year-over-year likely due to repeated measures on the same facilities.
<Insert Figure 2>
3.3. Descriptive Statistics-Categorical Data
California, Texas, and Florida had the largest number of diagnoses for all years and year-over-year, largely due to population size, with averages of 1,669,210; 1,631,021; 1,490,983; respectively. When adjusted per 1000 population, the District of Columbia, West Virginia, and Delaware dominated the with total rates per 1,000 population of 109.47, 102.86, and 94.15, respectively. Utah, Hawaii, and Colorado had the smallest average rates, 26.17, 28.87, and 34.74, respectively. Appendix Aillustrates the rates by state / territory.
Of the hospital observations, 6700 were rural (42%) while 9279 were urban (58%). Most of the hospitals (8342 or 52%) were voluntary non-profits with 29% (4641) proprietary and 18.7% (2996) governmental. The vast majority (11,914 or 75%) had no affiliation with a medical school and were short-term care facilities (9,604 or 60%). Nearly no hospitals were classified as Department of Defense (DoD) or children’s hospitals. Figure 3 depicts the categorical breakout by year.
<Insert Figure 3>
3.4. Descriptive Statistics-Financial Estimates
In FY 2008, the Centers for Medicare and Medicaid (CMS) estimated that heart failure DRGs 291, 292, and 293 national average total costs per case were $10.235, $6.882, and $5.038 thousand, respectively. By FY 2012, CMS increased those estimates to $11.437, $7.841, $5.400 thousand, respectively. In four years, the accumulation rates (1 plus the inflation rate) were 1.139, 1.117, and 1.072 for the DRGs in ascending order. Using these accumulation rates, estimates for 2016, 2017, and 2018 were generated. Table 3 shows these extrapolated estimates.
<Insert Table 3 About Here>
Another method for estimating these costs involved the use of the Federal Reserve Bank of Saint Louis (FRED) producer price index for general medical and surgical hospitals [46]. The annual accumulation rates for 2013 through 2018 were estimated as 1.022, 1.012, 1,007, 1.013, 1.018, and 1.023, respectively. Applying these to the 2012 total costs from CMS results in Table 4 estimates for 2016 through 2018.
<Insert Table 4 About Here>
Both estimates are fairly close. To estimate costs, we used both of these tables separately as upper and lower bounds. Since these total costs represent only CMS costs, the actual financial burden across all payers is likely underestimated as commercial third-party insurers can reimburse up to 90% more than Medicare for the same diagnosis [47]. Figure 4 illustrates the number of DRGs by year, while Figure 5 shows the associated aggregate cost estimates.
<Insert Figure 4>
<Insert Figure 5>
In Figure 4, it is clear that DRG 291, the DRG with the highest average reimbursement rate per case, has increased nonlinearly, while DRG2 292 has seen a small drop, and DRG 293 is flat. In Figure 5, the total cost estimates for 2018 are nearly $66 billion more than 2016 on average. DRG 291, the most expensive DRG, has seen reimbursement increases of $92 billion on average. Reasons for such an increase are explored in the discussion section.
3.4. Descriptive Statistics-Correlational Analysis
Hierarchical clustered correlation analysis of quantitative variables (Figure 6) illustrate tight relationships among many variables. Hierarchical clustered correlation analysis clusters variables based on distance measures (e.g., Euclidean), so that those which are most highly correlated are close in location. These variables are then placed into a correlation plot or correlogram. Figure 6 illustrates that discharges, acute days, and staffed beds are most closely associated with the number of diagnoses, our primary variable of interest.
<Insert Figure 6>
Analysis of the relationship between some categorical variables and the number of diagnoses also proved interesting. Notched boxplots by year and medical school affiliation reveal that a major affiliation experiences a larger number of diagnoses at the .05 level a result that is to be expected. (See Figure 7). Further, voluntary not-for-profits see a larger number of diagnoses (Figure 8).
<Insert Figure 7>
<Insert Figure 8>
3.5. State Level Geospatial Analysis
A descriptive analysis of heart failure over time using geographical informat systems was conducted to evaluate regional differences. Primarily, we were interested in rates per standardize unit in the population of the geographical area. Populations over time were based on Census Bureau estimates for each geographic region [7,36].
While DRG rates per 1000 were not constant over time, the concentrations were fairly consistent. There is a clear bifurcation in the center of the United States separating high and low rates. That bifurcation suggests a clear West-East difference, favoring the West Coast. Washington, D.C. has had (on average) the highest admission rate for diagnoses of heart failure (perhaps, due to the large presence of military and veteran care facilities) followed by West Virginia, Alabama, Mississippi, Michigan, Louisiana, Kentucky, and North Dakota. Of interest is that previous studies indicate these states also see many admissions due to the opioid crisis [33]. Figure 9 shows the admission rates for diagnoses per 100,00 persons by area for all years and for by years on a standardized scale. These diagrams were produced with R [45] using the usmap package [48].
<Insert Figure 9>
From 2016 through 2018, the average rate of diagnoses per 1,000 population increased for nearly all states. A Friedman rank sum test (paired, non-parametric ANOVA) of rates by state by year revealed significantly different rates by year by state (c22=70.941, p<.001). Figure 10 illustrates the changes by year and by state.
<Insert Figure 10>
When 2018 data are aggregated at the state / territory level, the DRGs per 1000 paint a slightly different picture, with high-intensities in Washington D.C., West Virginia, Delaware, Mississippi, Kentucky, North Dakota, Michigan, and Missouri (listed in descending order.) Further, evaluating obesity prevalence intensity from the Centers for Disease Control and Prevention (CDC) shows significant correlation between obesity and DRGs per 1000 [49]. A Spearman’s test for correlation of obesity prevalence and 2018 DRGs per 1000 was statistically significant with rho=.689, S=6,867.7, p<.001.
An exploratory spatial regression model using a first-order Queen contiguity criterion to evaluate the importance of geography wase performed using rolled, Z-scaled, state-level independent variables on the state-level admission rate variable (admissions per population in each state). The final model (after backwards stepwise regression based on Akaike Information Criterion using the MASS package in R [50] included mean profit margin for hospitals within the state, total surgical cases, total acute hospital days, total staffed beds, total physicians, mean proportion of Medicare patients, proportion of voluntary not-for-profits (NFP), proportion with a major or graduate affiliation with medical schools, and proportion of urban facilities. The regression was of reasonable effect size, Adjusted R2 = .539. Table 5 summarizes the regression coefficient table.
<Insert Table 5 About Here>
Most important to this preliminary analysis was whether state-level spatial data were important to evaluating admission rates. The spatial map of the standardized residuals [51] is shown in Figure 11. The spatial residual shows little spatial correlation. The visual check was confirmed by a global test for spatial relationships, Moran’s I (observed = -0.085, expected = -0.020, p =.510) [52].
<Insert Figure 11>
3.6. County-Level Spatial Analysis
Admissions have significant county variation as one might expect. Figures 12 and 13 depict the sum of the admissions and the admission rate (based on average county populations) for the years 2016 through 2018. The rate of admissions is more useful, as larger populations should likely experience more admissions. The outlier is Winchester County, VA (small population with a major medical center). Charts were generated by the tmap package in R [51].
<Insert Figure 12>
<Insert Figure 13>
The heart failure admissions per county population per state for the top five states (West Virginia, Alabama, Mississippi, Michigan, and Louisiana), based on admission rates (year 2018) are shown in Figures 14 through 18, respectively. There is little change in concentration over time. These county maps show that the admissions are generally (as expected) in large metropolitan areas.
For West Virginia, the densest concentration of heart failure admissions is in Logan County. Logan County has a population of 33,674 (2018) and is located south of Charleston. The admission rate density was 0.252 in 2018. In Alabama, Houston county (home to Dothan and a population of 28,838 in 2018) has the highest density, 0.277. Forrest County, Mississippi, home of Hattiesburg and a population of 31,372 in 2018, saw 75,564 admissions for a rate per person of 0.415. The county also has the second largest medical facility based on discharges [35]. Emmet County (home of Petoskey and 32,875 individuals) had the highest density for Michigan (0.329), perhaps because it has a hospital that serves many counties. Red River Parish in Louisiana, a small parish of 8,621 individuals, had the highest rate of heart failure admissions (0.247).
<Insert Figure 14>
<Insert Figure 15>
<Insert Figure 16>
<Insert Figure 17>
<Insert Figure 18>
Similar to what was done at the state level, an exploratory spatial regression model using a first-order Queen contiguity criterion to evaluate the importance of geography wase performed using rolled, Z-scaled, county-level independent variables on the county-level admission rate variable (admissions per population in each county). The final model included average profit margin for facilities in the geography, total ER visits, total surgeries, total staffed beds, total physicians, proportion Medicare patients, proportion voluntary non-profits, proportion with graduate or major medical school affiliations, and proportion that were short-term acute care hospitals (STACs). Results of the regression are in Table 6, and the residual map is shown in Figure 19. The regression accounted for only a small fraction of the sum of the squares (R2 = .155).
<Insert Table 6 About Here>
<Insert Figure 19>
The residual map is not suggestive of spatial autocorrelation given the residual dispersion by county. Moran’s I again indicated no significant spatial correlation (observed = 0.019, expected=-0.001, p=.140). Future explanatory models can omit spatial correlation.
3.7. Explanatory Models
Several models were leveraged to explain the number of diagnoses admissions by facility level (the unit of observation in the dataset). The importance of these models is that we might estimate demand based on workload, technical, financial, and geospatial variables. A discussion of data preparation and analysis follows.
3.6.1. Box-Cox Multivariate Transformations
To meet required regression assumptions, multivariate transformation using Box-Cox methods was conducted on location-transformed variables. The location transform was necessary to ensure that all variables were positive definite. Box-Cox methods search for the optimal power transform of all variables simultaneously such that the assumption of multivariate normality cannot be rejected. A logarithmic transform is defined as the power of zero. In order to ensure that all possible transformations are feasible, the data must be positive definite. Thus each variable that was non-positive definite had the absolute value of the minumum added to each observations plus .01. Doing so ensured a positive definite location transform. These transformations are necessary only for non-tree models. (Tree models are location-scale invariant.) To prevent bias from being induced into the unknown test set, the transformations are completed only on the training set. The optimal powers found from the optimization associated with Box-Cox multivariate analysis are then applied to the test set. See Table 7 for the optimal powers.
<Insert Table 7 About Here>
3.6.2. Regression Models
Using the postive definite, Box-Cox transformed data, a regression model was fit hierarchially using the following blocks (in order): technical, workload, financial, geo-spatial. The multivariate transformation assumes that at least some independent variables cannot be fully observed or that we have incomplete observations on variables that might be fully observed. Thus, the transformations from the Box-Cox methods attempt to achieve multivariate normality rather than univariate normality. Hierarchical models attempt to fit obvious (known) variable blocks first followed by those of mmost interest. In our case, all blocks were statistically relevant to the analysis (see Table 8).
<Insert Table 8 About Here>
Linear regression on the training set resulted in a reasonable fit that accounted for R2=.750 or 75% of the sum of squared variability. No collinearity problems were present after transformation. Performance on the training set was insightful; however, the proof of model explanatory power rests in the training set estimates of the test set values. Applying the parameter estimates generated from the training set to the test set resulted in an R2 of .749, barely any loss.
Given the model’s ability to predict, the linear regression model was re-run on the entirety of the dataset after re-estimating the Box-Cost transformations, transformations which were only slight different in magnitude than those produced by the training set. The results again produce R2=0.749. The actual versus predicted plot is shown in Figure 20.
<Insert Figure 20>
Further, we evaluated the coefficients and directions of those coefficients for forecasting the transformed dependent variable based on the number of variables included in the model. The top 1- variables in the regression model with their associated R2 are shown in Table 9. Discharges, Medicare percentage, and hospital type are the primary variables of interest.
<Insert Table 9 About Here>
Outside of simple linear regression, we explored constrained regression techniques (lasso, ridge, and Elastic Net). Lasso regression was inferior to linear regression in terms of R2 (.651 vs. .750) on the test set. Ridge regression and Elastic Net were also unable to beat linear regression in terms of effects with both R2 nearly identical to that of lasso, 0.651 and 0.681, respectively. In this case, the linear regression model was not overfit.
3.6.3. Tree Ensemble Models
Several tree regressor models were built and compared on an 80% training set. There was no need to use the transformed data for tree models, as these are location / scale invariant. These tree models included a bagging regressor (BR), a random forest regressor (RFR), an extra trees regressor (ETR), a gradient boosted regressor (GBR), and an extreme-gradient boosted model (XGR). Tree models are atheoretic, as each tree developed may be different from the previous one. When ensembled, variable importances emerge that determine which items are most important to determining how to classify or regress in a nonlinear fashion (piecewise). The number of trees used for each estimator was tuned along with the maximum depth of the trees (number of branches). A pseudo-random number ensured that any model improvements were not due to the random number stream. The results of these models on the unseen test set are shown in Table 6. Most importantly, all of these models account for more variance than regression models. The models predict at 97.1% and above in terms of variability capture. (See Table 10.)
<Insert Table 10 About Here>
Because of the tight congruence of these models, we ensembled the estimates of the number of DRGs forecast by each to produce importance statistics. The variables of most importance includes discharges, Medicare percentage, acute days, affiliated physicians, staffed beds, employees psychiatric hospital status, ER visits, medical school affiliation status, Puerto Rico status and surgeries. See Table 11.
<Insert Table 11 About Here>
When comparing the regression models with the ensembled forests, we see that the first two terms are congruent (discharges and Medicare percent). Interestingly, no financial models are in the top 10 effect sizes of the regression or tree models. Facility technical and workload variables are the most important determinants of heart failure. In the tree models, there were piecewise linear effects identified for states that were not seen in the regression models.