The aim of this study was to determine how clusters (subgroups) of insulin-treated people with diabetes, defined by healthcare resource utilization, selected sociodemographic and clinical characteristics, and diabetes management parameters, are related to health outcomes including acute care (ER and hospital inpatient) visits and total inpatient days.
We conducted this study to help identify groups of patients who may be amenable to emerging diabetes management technologies. Using a large healthcare claims database, we identified seven clusters of insulin-treated people with diabetes that have different patterns of healthcare utilization and diagnosed comorbidities. The most important factors in defining the clusters were the number of endocrinology visits, total inpatient days, concordant comorbidities, number of ER visits, comorbidity burden as measured by CCI and DCSI scores, and percentage of diabetes-related medical claims. Multivariable modeling showed that these clusters are significantly associated with ER visits, inpatient hospitalizations, and total inpatient days, suggesting that this approach may help identify patients in greater need of targeted disease management efforts at the population level. The clusters also offer providers clinically relevant information regarding treatment decisions for a patient population with diabetes.
Cluster analyses can reveal how variables, in our case those derived from administrative claims for people with diabetes, are related in complex datasets. The use of cluster analyses in healthcare decision making is still relatively uncommon but appears to be gaining acceptance [17, 18, 19, 20, 21]. Our work builds upon a few previously published cluster analyses in diabetes, which focused on readiness for CGM and other diabetes-related devices, self-management patterns in a pediatric population, and factors influencing people with poorly controlled diabetes [22, 23, 24, 25, 26]. These previous studies involved smaller numbers of participants from relatively homogeneous populations (e.g., T1DM registry) and/or more controlled conditions (e.g., clinical trial). In contrast, our study used a large healthcare claims database to evaluate whether routinely available data could identify relevant subgroups of insulin users.
The clusters differed not only with respect to the specific variables used to form them (by design), but also on other important characteristics. Clusters 2 and 6 were formed primarily based on the use of endocrinologists. Not surprisingly, these were the clusters with the highest proportions of people with T1DM and of users of diabetes technology (pump, BGM, or CGM). Those in Cluster 2, however, had a higher comorbidity burden and a higher mean number of HbA1c tests than those in Cluster 6, but there was little difference in mean HbA1c values between these two clusters.
Two clusters were identified with high levels of acute care utilization. Those in Cluster 4 had the highest total inpatient days, and everyone in Cluster 5 had an ER visit. These clusters differed, however, in their comorbidity burdens and glycemic control. Interestingly, the lowest mean observed HbA1c value was for Cluster 4, which also had the highest levels of overall medical utilization (median of 43.0 claims) and acute care utilization (99.1% had an inpatient hospitalization), and the highest CCI and DCSI scores. These results could suggest that a high burden of comorbidities or diabetes complications and increased interactions with hospitals facilitated more intensive diabetes management. However, because HbA1c values were only available for a subset of the study population (approximately 30%), additional analyses on datasets with more complete HbA1c data are needed to confirm this finding.
Conversely, higher mean HbA1c values were observed among Clusters 3, 5, and 7 (in order of highest to lowest values). Clusters 3 and 7 differed from Cluster 5 in that they fell into the low utilization grouping (both acute care and overall utilization as measured by number of medical claims) and had among the lowest CCI and DCSI scores. Clusters 3 and 7 differed from each other in one key aspect: the proportion of medical claims that were diabetes-related. Approximately three-fourths of the claims for Cluster 3 were related to diabetes, compared to less than 20% in Cluster 7. Because the CCI and DCSI scores are derived from the presence of diagnosis codes in claims data, it is not surprising that the clusters with the lowest overall number of medical claims also had the lowest scores, since fewer claims provide fewer opportunities to capture those diagnoses. On the other hand, the lack of comorbidity diagnoses in the observed claims, or the lack of encounters altogether, could indicate a healthier underlying population. Either way, the relatively high observed HbA1c values along with the low rates of interaction with healthcare providers suggest suboptimal diabetes self-management.
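The proportion of diabetes-related claims discussed above is a simple per-patient ratio. The following is a minimal sketch of that calculation, assuming a hypothetical list of (patient identifier, diabetes-related flag) claim records; the function name, record layout, and example data are illustrative assumptions and are not taken from the study database.

```python
from collections import defaultdict

def diabetes_claim_share(claims):
    """Return {patient_id: fraction of that patient's claims flagged diabetes-related}.

    `claims` is an iterable of (patient_id, is_diabetes_related) pairs.
    This record layout is a hypothetical simplification of a claims table.
    """
    totals = defaultdict(int)
    related = defaultdict(int)
    for patient_id, is_diabetes_related in claims:
        totals[patient_id] += 1
        if is_diabetes_related:
            related[patient_id] += 1
    return {pid: related[pid] / totals[pid] for pid in totals}

# Illustrative records only (hypothetical patients "A" and "B").
example_claims = [
    ("A", True), ("A", True), ("A", True), ("A", False),
    ("B", True), ("B", False), ("B", False), ("B", False), ("B", False),
]
shares = diabetes_claim_share(example_claims)
```

In this toy example, patient "A" resembles the Cluster 3 pattern (most claims diabetes-related) and patient "B" the Cluster 7 pattern (few claims diabetes-related).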
The current study demonstrated that even after adjusting for other covariates, cluster assignment was significantly predictive of future outcomes. Specifically, cluster assignment was associated with the likelihood of experiencing an ER or hospital inpatient visit and with the total number of inpatient days for those with an admission. These results suggest that the specific combination of variables used to form the clusters sheds additional light on the risk of untoward outcomes beyond traditional risk stratification based on parameters such as diabetes type, age, and HbA1c.
Furthermore, as these clusters were derived from variables routinely found in healthcare claims data, where detailed clinical data are often missing, this approach could aid healthcare payers with population management efforts. We found that some clusters utilizing fewer healthcare resources had higher observed mean HbA1c levels. This finding suggests that population management efforts in diabetes could be targeted at some of the lower healthcare utilizers to improve glycemic control, which could yield better long-term health outcomes for patients and improved quality metric ratings for providers and payers.
This study has limitations that should be considered. The cluster analysis was based on real-world claims data that are proxies for clinical outcomes. Therefore, there may be data coding errors and errors in patient records because of the standardized coding systems used for the identification of medical conditions, procedures, and medications. Additionally, as of October 2015, all claims switched from ICD-9 to ICD-10 (study period January 2015 to June 2018) [27]. This switch could have led to inaccuracies in coding due to providers' unfamiliarity with the new system or mistakes in crosswalking codes from ICD-9 to ICD-10. This potential for error should have had limited impact, since condition identification was based on families of codes with multiple codes within a family and not on any single code. The comparisons of HbA1c values were incomplete, as only a subset of patients (approximately 30%) had values in the database. A number of relevant risk factors, including insulin dosing, diet, and exercise, were not available in the database.
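Identification by code family, as opposed to a single code, can be illustrated with prefix matching. The sketch below is an assumption-laden example, not the study's actual code lists: the family definition, the function names, and the use of simple prefix matching are all hypothetical. The prefixes themselves reflect the standard ICD-9-CM category 250 (diabetes mellitus) and ICD-10-CM categories E10 and E11 (type 1 and type 2 diabetes mellitus).

```python
# Hypothetical code family keyed by coding system. The study's actual
# families are not reproduced here; these prefixes are illustrative.
DIABETES_FAMILY = {
    "ICD-9": ("250",),         # ICD-9-CM 250.xx: diabetes mellitus
    "ICD-10": ("E10", "E11"),  # ICD-10-CM E10.x / E11.x: type 1 / type 2 diabetes
}

def normalize(code):
    """Strip whitespace and the decimal point, and uppercase the code."""
    return code.strip().replace(".", "").upper()

def in_family(code, version, family=DIABETES_FAMILY):
    """True if `code` begins with any prefix in the family for `version`."""
    return normalize(code).startswith(family[version])

def has_condition(claim_codes, family=DIABETES_FAMILY):
    """True if any (code, version) pair on a patient's claims is in the family."""
    return any(in_family(code, version, family) for code, version in claim_codes)
```

Because any code in the family qualifies, a patient coded as 250.01 before the transition and E11.9 afterward is identified consistently, and a single miscoded subcode within the family does not change the result.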
Despite these limitations, this study was based on a well-developed study design and included data from a large number of insulin-using people with either T1DM or T2DM. Evaluating the impact of patient-reported outcomes and additional socioeconomic data on cluster formation would be of interest for future research.