The current study found no evidence of significant differences in the overall CCI, frequency of individual comorbidities, or performance in predicting one-year mortality among hospitalized patients between the newly adapted SNOMED CT and Quan coding algorithms. In contrast, prior research has shown large discrepancies in patient identification and measurement of the CCI between the OHDSI and the Quan adaptations of the CCI.7,8 The improved consistency in patient identification between algorithms was achieved by adapting the SNOMED CT coding algorithm directly from the Quan adaptation of the CCI. Furthermore, all discrepant codes between coding algorithms were carefully vetted by clinical subject matter experts considering the cause and potential impact of each respective discrepant code on patient identification.
While the origins of the ICD system stem from epidemiology, the roots of SNOMED CT may be traced to bioinformatics. Consequently, fundamental differences exist in the constructs of these terminologies. Whereas the ICD system is a taxonomy, SNOMED CT is an ontology and, in contrast to the ICD system, polyhierarchical. For instance, pregnancy-related renal disease is classified under pregnancy, childbirth and the puerperium in the ICD system but is associated with kidney disease, disorders of pregnancy, and complications of pregnancy, childbirth and/or puerperium in SNOMED CT. Due to its polyhierarchical nature, SNOMED CT facilitates the aggregation of related concepts in the development of code-based algorithms. For the translation of coding algorithms from ICD to SNOMED CT, this difference in constructs poses both unique challenges and opportunities.
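The practical consequence of a polyhierarchy can be illustrated with a minimal sketch. The concept labels below are hypothetical stand-ins taken from the example above, not real SNOMED CT identifiers, and the helper names `build_children` and `descendants` are assumptions introduced only for illustration. The sketch shows how a concept with several parents is reached when any one of its ancestors is aggregated.

```python
from collections import defaultdict

# Hypothetical concept labels (not real SNOMED CT identifiers).
# Each concept lists ALL of its parents; more than one parent = polyhierarchy.
PARENTS = {
    "kidney disease": [],
    "disorder of pregnancy": [],
    "complication of pregnancy, childbirth and/or puerperium": [],
    "chronic kidney disease": ["kidney disease"],
    "pregnancy-related renal disease": [
        "kidney disease",
        "disorder of pregnancy",
        "complication of pregnancy, childbirth and/or puerperium",
    ],
}


def build_children(parents):
    """Invert a child -> parents map into a parent -> children map."""
    children = defaultdict(set)
    for child, parent_list in parents.items():
        for parent in parent_list:
            children[parent].add(child)
    return children


def descendants(concept, children):
    """Collect every concept reachable below `concept` (iterative DFS)."""
    seen = set()
    stack = [concept]
    while stack:
        node = stack.pop()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


children = build_children(PARENTS)
renal_set = descendants("kidney disease", children)
pregnancy_set = descendants("disorder of pregnancy", children)
```

In a strict taxonomy, each concept would have a single parent, so the pregnancy-related concept would appear in only one of the two sets; here it appears in both.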
A total of 5,343 ICD-9/10-CM codes mapped to either coding algorithm, of which 695 (13.0%) were inconsistent between algorithms. The primary source of discrepant codes was the mapping of multiple ICD codes to a single SNOMED CT code (n = 560), which was especially prevalent among the code sets for rheumatic disease (n = 130) and diabetes with chronic complications (n = 211). These discrepant codes were due in part to the presence of "unspecified" or "not otherwise specified" diagnosis codes in ICD (e.g., unspecified nephritic syndrome), which are typically represented as higher-level terms within SNOMED CT (e.g., nephritic syndrome). In 24.6% (n = 138) of cases, this was associated with the additional capture of clinically relevant diagnosis codes by the SNOMED CT coding algorithm, leading to information gain. Although the additional capture of these diagnosis codes represents a technical departure from the Quan adaptation, the difference may be due in part to the advantages of the SNOMED CT construct or differences in clinical opinion. Other sources of discrepant codes included a lack of mapping of deprecated ICD codes to SNOMED CT (n = 123) and a lack of specificity in the mapping between ICD and SNOMED CT codes (n = 12).
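As a rough sketch of how many-to-one mappings give rise to discrepant codes, the example below groups ICD codes by their target SNOMED CT concept and lists the ICD codes that a SNOMED CT concept set captures but a reference ICD list does not. All code labels, the toy mapping, and the function names `many_to_one_groups` and `extra_codes_captured` are hypothetical; this is not the vetting procedure used in the study.

```python
from collections import defaultdict


def many_to_one_groups(icd_to_snomed):
    """Group ICD codes by target concept; keep targets with more than one source."""
    groups = defaultdict(set)
    for icd_code, concept in icd_to_snomed.items():
        groups[concept].add(icd_code)
    return {c: codes for c, codes in groups.items() if len(codes) > 1}


def extra_codes_captured(icd_to_snomed, reference_icd_codes, snomed_concepts):
    """ICD codes captured via the SNOMED CT concept set but absent from the reference list."""
    captured = {
        icd_code
        for icd_code, concept in icd_to_snomed.items()
        if concept in snomed_concepts
    }
    return captured - set(reference_icd_codes)


# Hypothetical labels stand in for real ICD and SNOMED CT codes.
icd_to_snomed = {
    "ICD: unspecified nephritic syndrome": "SCT: nephritic syndrome",
    "ICD: nephritic syndrome, specified type": "SCT: nephritic syndrome",
    "ICD: chronic kidney disease, stage 3": "SCT: chronic kidney disease stage 3",
}
# Assumed reference list that omits the unspecified code (illustration only).
reference_icd_codes = {
    "ICD: nephritic syndrome, specified type",
    "ICD: chronic kidney disease, stage 3",
}
snomed_concepts = {"SCT: nephritic syndrome", "SCT: chronic kidney disease stage 3"}

groups = many_to_one_groups(icd_to_snomed)
extra = extra_codes_captured(icd_to_snomed, reference_icd_codes, snomed_concepts)
```

In this toy case the higher-level SNOMED CT term pulls in the unspecified ICD code, which is the kind of discrepancy that would then be passed to clinical reviewers.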
Nevertheless, no significant differences in the overall CCI were observed between the SNOMED CT vs. Quan coding algorithms among inpatient visits occurring in either 2013 (MDCD: 3.75 vs. 3.6; and DOD: 3.63 vs. 3.51) or 2018 (MDCD: 4.04 vs. 3.91; and DOD: 4.55 vs. 4.43). Despite a slight increase in patient identification by the SNOMED CT code sets for dementia, renal disease, rheumatic disease, and diabetes with chronic complications, no significant difference in the frequency of the comorbidities comprising the CCI was observed, as indicated by an SMD less than 0.1. These findings reflect the low prevalence of patient records associated with discrepant codes.
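For readers unfamiliar with the 0.1 threshold, the sketch below computes a standardized mean difference for a binary comorbidity flag using the commonly used pooled-variance formula for two proportions. It is assumed, not confirmed by the text, that this matches the formula used in the study; the prevalences are invented purely for illustration.

```python
import math


def smd_binary(p1, p2):
    """Standardized mean difference between two proportions.

    Uses SMD = (p1 - p2) / sqrt((p1*(1-p1) + p2*(1-p2)) / 2).
    """
    pooled_var = (p1 * (1.0 - p1) + p2 * (1.0 - p2)) / 2.0
    if pooled_var == 0.0:
        # Both proportions are 0 or 1: no variability to standardize by.
        return 0.0 if p1 == p2 else math.inf
    return (p1 - p2) / math.sqrt(pooled_var)


# Invented prevalences of one comorbidity under the two algorithms.
prevalence_snomed = 0.20
prevalence_quan = 0.19

smd = smd_binary(prevalence_snomed, prevalence_quan)
is_negligible = abs(smd) < 0.1  # conventional threshold for a negligible difference
```

A one-percentage-point difference at this prevalence yields an SMD well below 0.1, which illustrates why a small gain in patient identification need not register as a meaningful imbalance.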
In contrast, the currently implemented OHDSI adaptation has been associated with a higher average CCI as compared to the Quan adaptation by both Fortin et al. and Viernes et al.7,8 Specifically, Fortin et al. found several comorbid conditions identified in over 5% of the study population by either only the OHDSI coding algorithm (chronic pulmonary disease, diabetes with chronic complications, renal disease, and malignancy) or only the Quan adaptation (peripheral vascular disease, chronic pulmonary disease, and mild liver disease).7 Viernes et al. hypothesized that the higher average CCI was associated with the mapping of the OHDSI SNOMED CT coding algorithm to additional ICD codes. The current study, however, indicates that the impact of discrepant codes on the CCI is also a function of the prevalence of each respective discrepant code observed in the study population.8
The performance of the two coding algorithms in predicting one-year mortality among hospitalized patients was comparable. However, as indicated by the c-statistic, the predictive performance of the CCI fluctuated between data sources and by calendar year. It follows that the impact of data source, vocabulary, and time-dependent effects on the performance of the CCI was consistent between the newly adapted SNOMED CT and Quan coding algorithms in the current study.
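To make the c-statistic concrete, the following sketch computes it by direct pairwise comparison for a binary outcome, treating the score itself as the risk prediction. This is a simplification: the study's models may have used a fitted regression rather than the raw CCI, and the function name `c_statistic` and the toy data are assumptions for illustration only.

```python
def c_statistic(scores, outcomes):
    """Concordance for a binary outcome.

    Fraction of (event, non-event) pairs in which the event has the higher
    score; tied scores count as half-concordant.
    """
    event_scores = [s for s, y in zip(scores, outcomes) if y == 1]
    nonevent_scores = [s for s, y in zip(scores, outcomes) if y == 0]
    if not event_scores or not nonevent_scores:
        raise ValueError("Need at least one event and one non-event.")

    concordant = 0.0
    for e in event_scores:
        for n in nonevent_scores:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(event_scores) * len(nonevent_scores))


# Toy data: CCI-like scores and one-year mortality flags (1 = died).
toy_scores = [1, 2, 3, 4]
toy_deaths = [0, 1, 0, 1]
toy_c = c_statistic(toy_scores, toy_deaths)
```

Computing the statistic separately for each algorithm within the same data source and year gives the like-for-like comparison described above. The quadratic pairwise loop is adequate for illustration but would be replaced by a rank-based computation on claims-scale data.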
Limitations
The current study was subject to limitations. First, MDCD and DOD do not completely capture patient deaths. As such, the predictive performance of models may have been underestimated. However, the degree of underestimation was expected to be consistent between models, thereby preserving the validity of comparisons of performance between models. Second, the SNOMED CT coding algorithm was validated in two large U.S. administrative claims databases. Estimates of predictive performance of the CCI may not be generalizable to other healthcare databases. Nevertheless, in practice, the CCI is most frequently used as a measure of disease burden as opposed to a predictor of one-year mortality. Third, new releases of SNOMED CT are published every 6 months, and, consequently, additional differences between coding algorithms for the CCI may surface over time. Although the newly proposed SNOMED CT coding algorithm represents a significant advancement in terms of transparency and reproducibility, periodic validation and update of the coding algorithm using the methods outlined in this paper may be warranted.