In this single-institution cohort of CAYA survivors, clinical informatics tools based on discrete data elements from the EHR were leveraged to construct clinically computable phenotypes and evaluate the prevalence of CVRFs prior to diagnosis and during treatment. This represents a feasible approach to identify CVRFs at a population health level for at-risk survivors. No significant associations were observed between CVRFs and CVD in the early survivorship period for this cohort, yet the analysis and methods inform efforts to harness real-world data to drive improvement in survivorship-focused care. Furthermore, the presented analyses showed that survivors at high risk for late effects in general and those with OKHCA coverage were at increased risk of CVD in the early survivorship period. Claims data augmented the detection of cardiac events among survivors with OKHCA coverage, and the analysis of this subcohort suggested that those from rural areas were at increased risk of CVD even after adjustment for late effects risk strata. Rural-urban differences in the general population, particularly inequities in cardiovascular health, underscore the need to prevent CVD, especially for CAYA survivors at risk for late morbidity and mortality.
The disproportionate burden of CVRFs and CVD in rural areas in the United States is well documented. In 2020, the American Heart Association released a call to action to reduce longstanding inequities in CVD among rural populations with a focus on individual factors, social determinants of health, and health delivery systems. Indeed, population-based studies throughout the United States show that adults in rural areas have a higher risk of mortality from heart failure at both the individual and community level.16,32–34 The evidence of recent progress on closing the rural-urban gap is mixed, and the persistence of these geographic disparities in the general population should inform healthcare delivery interventions for survivors.17,35 Stratification by OKHCA coverage, particularly with the highlighted differences in race/ethnicity, rurality, hypertension, and obesity, further controls for these potential confounders and helps characterize this vulnerable population. The observed increased risk of CVD among survivors of CAYA cancer from rural areas in Oklahoma with public insurance suggests that, as early as one to five years after the initial diagnosis, there is an opportunity to intervene and mitigate risk.
Data science and the development of clinical informatics tools have the potential to catalyze improvements in health services research, guide population health management, and drive systems-level changes to promote equity for all survivors of CAYA cancer. The presented methodology, derived from data standards such as RxNorm’s RxCUI codes for antihypertensive medications, and the novel creation of clinically computable phenotypes support the feasibility of such tools to characterize modifiable risk factors among survivors at a population health level.29 The analyses of the Oklahoma cohort failed to identify significant associations between CVRFs and CVD, which may reflect limitations in this cohort or may suggest that further refinement of clinically meaningful phenotypes to predict CVD is needed. Nevertheless, data standards are foundational to ensure the interoperability of key information between health systems, from both a research and a clinical operations viewpoint.36 Moreover, the Childhood Cancer Data Initiative (CCDI) seeks to address the fragmented data ecosystem and has made progress toward an infrastructure to facilitate data sharing to learn from every child, adolescent, and young adult with cancer.37 More than a decade after the Health Information Technology for Economic and Clinical Health (HITECH) Act, lessons from various specialties and domains across the healthcare field offer insights to adapt evidence-based technologies for oncology and survivorship-focused care.38,39
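As an illustration of how a medication-based computable phenotype of this kind might be expressed, the following is a minimal sketch and not the study's implementation. The value set contains placeholder identifiers rather than real RxCUI codes, and the record layout, the function name `classify_hypertension`, and the simple pre/post-diagnosis split are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Set

# Hypothetical value set: placeholder identifiers, NOT real RxCUI codes.
# A real value set would be curated from RxNorm by clinical experts.
ANTIHYPERTENSIVE_RXCUIS: Set[str] = {"RXCUI_PLACEHOLDER_A", "RXCUI_PLACEHOLDER_B"}


@dataclass(frozen=True)
class MedicationOrder:
    """Assumed shape of a discrete medication record exported from the EHR."""
    patient_id: str
    rxcui: str
    order_date: date


def classify_hypertension(
    orders: Iterable[MedicationOrder],
    cancer_dx_dates: Dict[str, date],
    value_set: Set[str] = ANTIHYPERTENSIVE_RXCUIS,
) -> Dict[str, str]:
    """Flag patients with a qualifying order, relative to cancer diagnosis.

    Returns {patient_id: "pre_diagnosis" | "post_diagnosis"}. A patient with
    qualifying orders on both sides of the diagnosis date is labeled
    "pre_diagnosis". Patients with no qualifying order are omitted.
    """
    result: Dict[str, str] = {}
    for order in orders:
        if order.rxcui not in value_set:
            continue  # medication is outside the phenotype's value set
        dx = cancer_dx_dates.get(order.patient_id)
        if dx is None:
            continue  # no diagnosis date, so timing cannot be classified
        timing = "pre_diagnosis" if order.order_date < dx else "post_diagnosis"
        # Do not overwrite an existing pre-diagnosis label.
        if result.get(order.patient_id) != "pre_diagnosis":
            result[order.patient_id] = timing
    return result
```

Because the value set is passed as a parameter, the same function could in principle be reused for other medication-defined risk factors by substituting a different curated set of codes.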
The observations and analyses from the CAYA survivor cohort require contextualization for potential limitations. First, this cohort was drawn from a single institution. While the majority of children in the state are treated at Oklahoma Children’s Hospital, young adults may have received treatment at community-based oncology centers, and there is one other site in Oklahoma that cares for children with cancer. Therefore, the data may not be representative of the state of Oklahoma or generalizable to the national population. Linkage with claims data uncovered rural-urban differences in CVD, which likely reflects detection bias in institutional data, as the absence of diagnosis records does not necessarily mean the absence of disease.40 Alternatively, the observed differences may only exist in the Medicaid population. Underdetection of CVRFs, such as dyslipidemia or diabetes, is also possible if they are not routinely assessed or documented in the EHR. The lack of robust historical data prior to 2009 and the moderate cohort size may have contributed to insufficient power to detect potential associations between CVRFs and CVD. Additionally, acute cardiotoxicity was observed in this cohort, and events within a year of diagnosis were excluded from analysis, as assessment of baseline CVRFs prior to diagnosis was likely incomplete and would have obscured the temporal relationship.
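The exclusion of events within a year of diagnosis can be sketched as a simple landmark filter. This is an illustrative example only: the event layout, the function names, and the choice to keep an event that falls exactly on the one-year anniversary are assumptions, not details reported for the study.

```python
from datetime import date
from typing import Dict, List, Tuple


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Feb 29 in a leap year has no counterpart in a non-leap year.
        return d.replace(year=d.year + years, day=28)


def events_after_landmark(
    events: List[Tuple[str, date]],
    cancer_dx_dates: Dict[str, date],
    landmark_years: int = 1,
) -> List[Tuple[str, date]]:
    """Keep (patient_id, event_date) pairs on or after the landmark date.

    Assumption: an event on the exact anniversary of diagnosis is kept.
    Events for patients without a known diagnosis date are dropped.
    """
    kept = []
    for patient_id, event_date in events:
        dx = cancer_dx_dates.get(patient_id)
        if dx is None:
            continue
        if event_date >= add_years(dx, landmark_years):
            kept.append((patient_id, event_date))
    return kept
```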
The long latency period for heart failure, specifically, poses a significant challenge to capturing enough events to generate real-world evidence for the association between baseline CVRFs and subsequent CVD in the early survivorship period.41–43 One approach to circumvent this long latency period is to identify early markers of cardiac dysfunction, such as echocardiogram parameters and cardiac biomarkers, which are useful predictors of subsequent CVD risk.44–46 Previously developed and validated NLP algorithms, such as EchoExtractor, serve as an example of open-source informatics tools that automatically extract echocardiogram parameters.47 Left ventricular ejection fraction (LVEF) was the most commonly extracted echocardiogram measurement, and the system has subsequently provided key data for population health studies on cardiac function, including studies of its scalability across multiple hospital sites.48–50 The sole reliance on ICD-9/10 coding, while based on methods from large multi-institutional cohorts, may also lead to misclassification of cardiac events, which more precise measurements from echocardiograms could help address.13 Even with the implementation of such tools, underdetection bias may still persist if echocardiogram reports are unavailable. Adolescent survivors in Oklahoma were previously identified as approximately five times more likely to receive echocardiogram surveillance that was suboptimal relative to guidelines.51
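To make the idea of extracting LVEF from free-text echocardiogram reports concrete, the following is a deliberately simple rule-based sketch. It is not EchoExtractor's algorithm or interface; the regular expression, the function name `extract_lvef`, the decision to return the midpoint of a reported range, and the sample report phrasings are all assumptions for illustration, and a validated tool would handle far more variation.

```python
import re
from typing import Optional

# Illustrative pattern: a label, a short digit-free gap, then a percentage
# or a percentage range such as "55-60%" or "35 to 40 %".
LVEF_PATTERN = re.compile(
    r"\b(?:LVEF|EF|ejection fraction)\b"  # measurement label
    r"\D{0,30}?"                          # short gap containing no digits
    r"(\d{1,2})"                          # single value or lower bound
    r"(?:\s*(?:-|to)\s*(\d{1,2}))?"       # optional upper bound of a range
    r"\s*%",
    re.IGNORECASE,
)


def extract_lvef(report_text: str) -> Optional[float]:
    """Return the first LVEF found in the text, or None if none is found.

    A range is summarized by its midpoint (an assumption of this sketch).
    """
    match = LVEF_PATTERN.search(report_text)
    if match is None:
        return None
    low = int(match.group(1))
    high = int(match.group(2)) if match.group(2) else low
    return (low + high) / 2
```

Returning `None` rather than a default value keeps a missing measurement distinguishable from a measured one, which matters given the underdetection concern raised above.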
In conclusion, clinical informatics tools that integrate data from various sources for cohort construction and apply data standards to characterize CVRFs highlight opportunities to leverage data to improve survivorship-focused care for CAYAs impacted by cancer. Survivors from rural areas may be at increased risk for CVD, even in the early survivorship period. Modifiable CVRFs at baseline and during treatment merit additional investigation to determine their impact on later CVD for survivors. This study provides a framework to adapt clinical informatics-based approaches for CAYA survivors to promote interoperability based on data standards, facilitate interinstitutional collaborations to detect relevant predispositions to CVD, and, ultimately, improve care for equitable outcomes among all survivors.