Information technology tools can automate imaging measures for Emergency Department patients with suspected pulmonary embolism

CT pulmonary angiography (CTPA) utilization rates for patients with suspected pulmonary embolism (PE) in the Emergency Department (ED) have increased steadily, with associated radiation exposure, costs, and overdiagnosis. A new measure is needed to more precisely assess the efficiency of CTPA utilization, normalized to the number of patients presenting with suspected PE based on their signs and symptoms. This study used natural language processing (NLP) to develop, automate, and validate SPE ("Suspected Pulmonary Embolism"), a measure of CTPA utilization in ED patients with suspected PE. This retrospective study was conducted from 4/1/2013 to 3/31/2014 in a Level-I ED. An NLP engine processed the "Chief Complaint" sections of ED documentation, identifying patients with PE-suggestive symptoms based on four Concept Unique Identifiers (CUIs: shortness of breath, chest pain, pleuritic chest pain, anterior pleuritic chest pain). SPE was defined as the proportion of ED visits by patients with potential PE in which CTPA was performed. Manual reviews determined specificity, sensitivity, and negative predictive value (NPV). Among 5,768 ED visits with at least one SPE CUI and 795 CTPAs performed, SPE = 13.8% (795/5,768). NLP identified patients with relevant CUIs with specificity = 0.94 (95% CI 0.89-0.96), sensitivity = 0.73 (95% CI 0.45-0.92), and NPV = 0.98. Using NLP on ED documentation can identify patients with suspected PE to compute a more clinically relevant CTPA measure. This measure might then be used in an audit-and-feedback process to increase the appropriateness of imaging of patients with suspected PE in the ED.


Introduction
There is significant interest in the appropriateness of CT pulmonary angiography (CTPA) for patients with suspected acute pulmonary embolism (PE) in the Emergency Department (ED) [1][2][3]. CTPA utilization rates are variable but, overall, have increased steadily [4,5]. This rise in ED CTPA use results in increased radiation exposure [6], cost, and overdiagnosis; the latter can result in more downstream imaging and, potentially, inappropriate treatment with its associated complications [7].
Evidence-based recommendations exist to guide clinicians in the diagnostic workup of patients with suspected PE; the combination of risk stratification using a validated tool (e.g., the Wells criteria [5,8]), supplemented by D-dimer measurement [9], has been used for over 15 years [10] and adopted by a number of professional societies [11,12]. However, current measures of CTPA utilization or adherence to the Wells criteria do not accurately capture providers' adherence to evidence, as patients who are appropriately not imaged are not well represented in existing measures. For example, appropriateness is often determined using a denominator of patients who underwent CTPA, which excludes patients who were not imaged (e.g., those ruled out using the Pulmonary Embolism Rule-Out Criteria [8] or clinical gestalt). Similarly, using overall CTPA use per ED visit does not limit the measure denominator to only patients with suspected PE, so comparisons between EDs with different prevalence of disease are not meaningful.
Thus, there is a need for a new measure that more precisely assesses the efficiency of CTPA utilization, normalized to the number of patients with suspected PE who present to the ED, based on patients' signs and symptoms at presentation. Therefore, the purpose of our study was to develop, automate, and validate a new tool, using unstructured data from clinical notes, to define a cohort of patients with suspected PE, which can then be used to develop an imaging quality measure, Suspected Pulmonary Embolism (SPE).

Study Setting and Compliance with Ethical Standards
This HIPAA-compliant, retrospective cohort study was conducted between April 1, 2013 and March 31, 2014 in the ED of an urban Level-I adult trauma center with approximately 60,000 visits annually. It was approved by the Institutional Review Board (Protocol Number: 2013P000267).

Data Sources
Data sources included the ED information system, the radiology information system (RIS), and the computerized physician order entry (CPOE) system. For each ED visit, we obtained the text of the ED attending notes as well as the text of the "Chief Complaint" field. From diagnostic imaging exam information, we extracted the imaging accession number, medical record number (MRN), and the final report text. All data fields collected were transferred to several tables in a Microsoft SQL Server environment (Microsoft, Redmond, WA).

Defining a PE-suspect Cohort
To construct the denominator for the imaging measure, we sought to quantify the cohort of ED patients with signs and symptoms at presentation suggestive of PE, using standard, ontology-based natural language processing (NLP) tools. After consulting a multi-disciplinary group of clinical, informatics, and imaging experts, we based our algorithm on four of the most common signs and symptoms of PE as represented by Concept Unique Identifiers (CUIs) extracted from the ED note "Chief Complaint" field: shortness of breath (C0013404), chest pain (C0008031), pleuritic pain (C0008033, C0423632), and anterior pleuritic pain (C3532941).
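As a minimal illustration of this cohort definition (the actual extraction used cTAKES, as described below), the lookup can be sketched as a mapping from normalized chief-complaint phrases to the four target CUIs. The phrase strings and function name here are hypothetical, for illustration only:

```python
# Illustrative sketch only: the study used cTAKES for concept extraction.
# Mapping from normalized chief-complaint phrases to the four target CUIs.
TARGET_CUIS = {
    "shortness of breath": "C0013404",
    "chest pain": "C0008031",
    "pleuritic pain": "C0008033",
    "pleuritic chest pain": "C0423632",
    "anterior pleuritic pain": "C3532941",
}

def flag_suspected_pe(chief_complaint):
    """Return the set of PE-suggestive CUIs matched in a chief-complaint string."""
    text = chief_complaint.lower()
    return {cui for phrase, cui in TARGET_CUIS.items() if phrase in text}
```

A visit would count toward the SPE denominator whenever this set is non-empty; real concept extraction additionally handles word order, spelling variants, and negation, which simple substring matching does not.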

NLP Engine and Customizations
cTAKES version 3.0.1 [13] (including YTEX [14,15]) was customized with RadLex [16] and the latest releases of the SNOMED CT vocabulary files, using the NCI-supported knowledge-representation language RDF and process definitions from the MetamorphoSys sub-setting utility [17]. CUIs were extracted using a SQL query with multiple joins on the unique batch name of the job, producing a table in which each row contained a CUI and the ID of the input "Chief Complaint" text snippet. We also included polarity in the extraction query; a polarity of -1 corresponded to a negation of the named entity [13]. In addition, we customized cTAKES to correctly flag CUIs based on a custom list of common medical abbreviations (for example, 'SOB' stands for shortness of breath).
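The two customizations described above, abbreviation expansion before lookup and exclusion of negated mentions via the polarity attribute, can be sketched as follows. This is an illustrative reimplementation, not the study's cTAKES/SQL pipeline; the abbreviation list and function names are hypothetical:

```python
# Hypothetical sketch of the post-processing described above.
# Expand common ED abbreviations before concept lookup.
ABBREVIATIONS = {"sob": "shortness of breath"}  # illustrative subset

def expand_abbreviations(text):
    """Replace known abbreviations with their full forms, token by token."""
    return " ".join(ABBREVIATIONS.get(w, w) for w in text.lower().split())

def keep_affirmed(entities):
    """Drop named entities flagged as negated (polarity == -1, as in cTAKES)."""
    return [e["cui"] for e in entities if e.get("polarity", 1) != -1]
```

For example, a chief complaint of "SOB, denies chest pain" would, after extraction, retain the shortness-of-breath CUI while the negated chest-pain mention is filtered out.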

NLP Validation Process
To assess the accuracy of the NLP-based PE cohort discovery process, we conducted a manual validation in which the results of a human-expert classification were compared to those extracted by the NLP algorithm. A physician research assistant was instructed and trained by an attending emergency physician to perform manual chart-review classification while blinded to the results of the NLP-based classification. A validation sample of 245 cases (5%) was reviewed, and 10% of these were overread by the attending emergency physician.

Outcome Measures
As the primary outcome for this study, we computed the value for the new imaging measure, SPE. Our secondary outcomes were the test characteristics (sensitivity, specificity, positive predictive value [PPV], and negative predictive value [NPV]) of the NLP algorithm.
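The primary outcome reduces to a simple ratio: CTPAs performed divided by ED visits with at least one PE-suggestive CUI. Using the counts reported in this study, a minimal sketch:

```python
# SPE = CTPAs performed / ED visits with >= 1 PE-suggestive CUI.
def spe(n_ctpa, n_suspected_pe_visits):
    """Proportion of suspected-PE visits in which CTPA was performed."""
    return n_ctpa / n_suspected_pe_visits

# Counts from this study: 795 CTPAs among 5,768 visits with a target CUI.
rate = spe(795, 5768)  # ~0.138, i.e., 13.8%
```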

Statistical Analyses
All analyses and visualizations were carried out in the R statistical programming environment [18], version 3.0.2. We used Pearson r and t-test statistics to quantify correlation and similarity of distributions between monthly series of the specific measures. P-values<0.05 were considered statistically significant. The agreement between PE cohort discovery using the NLP algorithm and manual chart review was compared using sensitivity and specificity (and corresponding 95% Confidence Intervals [CIs]), as well as NPV and PPV [19].

Study Cohort
A total of 55,781 ED visits were recorded during the study period. The mean patient age was 49 ± 19.9 years, and 60.0% (33,467/55,781) of patients were women. Of all ED visits, 38.6% (21,521/55,781) included diagnostic imaging. There were 1,159 CTPA exams performed in the ED during the study period.

Distribution of the ED "Chief Complaint" Note Text
Figure 1 displays the frequency distribution of the "Chief Complaint" field content across the mapping between CTPA exams and ED visits. Notably, the large third bar corresponds to an empty "Chief Complaint" field. In addition, the distribution has a long tail corresponding to symptoms non-specific for PE, e.g., "fever" or "weakness".

Limitations
Our study has a number of limitations. First and foremost, our algorithm is dependent on the quality of the data in the electronic health record, notably the presence and completeness of the "Chief Complaint" field in the ED record. For straightforward data gathering, we chose to base the algorithm on a single, pre-parsed text field from the ED notes ("Chief Complaint"), even though additional signs or symptoms might have been present in the free text of the "History of Present Illness" section. In addition, it was conducted in a single academic healthcare center, potentially limiting generalizability.

Discussion
We have introduced and computed a new measure of utilization of CTPA imaging in patients with suspected PE in the ED. In contrast with existing imaging utilization metrics, SPE is normalized to the number of patients in whom PE is suspected, a cohort identified from patients' signs and symptoms at presentation. We automated the calculation of this metric by casting it as an NLP task on unstructured clinical narratives and structured EHR documentation, defining the cohort of PE-suspect patients using four common CUIs. Calculation of the new measure yielded an SPE of 13.8%, i.e., the proportion of patients presenting with symptoms suggestive of PE who underwent CTPA.
Current imaging quality measures fail to capture the appropriate patient populations. Appropriateness-based measures require resource-intensive calculation of pretest probability and D-dimer measurement, but still exclude patients in whom these data are not available, or who were excluded prior to the determination of these values (e.g., by using PERC). Conversely, global utilization measures compute the number of CTPAs performed relative to overall ED visit volume, a method that cannot take into account the local prevalence of PE.
Our validation of the algorithm for detecting patients with suspected PE had a sensitivity of 73% when compared to manual chart review. This is not surprising, given the other illnesses that can present with a "Chief Complaint" of chest pain or shortness of breath. However, the specificity of 94% and the NPV of 98% are reassuring, in that we likely excluded the vast majority of patients in whom PE would not have been suspected by the treating physician.
To determine whether the four CUIs we selected to model SPE patients were an adequate definition for the cohort, we reviewed the most common indications recorded in the "Chief Complaint" field (Figure 1) and found that the terms included in the four-CUI model are the most common indications. Apart from the empty "Chief Complaint" fields (the relatively large third bar in the figure), the remaining indications are much less common and much less clinically relevant to PE. Additional studies with longer time spans and at additional institutions might be needed to elucidate this point further.

Implications
Our findings have the potential to improve the quality of care delivery by more accurately measuring the appropriateness of CTPA use for ED patients with suspected PE. Current measures typically include only patients who have undergone CTPA, entirely missing those patients whom physicians do not image based on clinical criteria. Thus, physicians are unable to accurately determine whether they are appropriately evaluating patients with suspected PE compared with their peers, limiting the utility of audit-and-feedback reporting meant to improve the appropriateness of imaging.
It would be ideal to verify our findings across different institutions, in both community and academic healthcare delivery settings, to determine generalizability prior to widespread adoption of this new imaging metric. However, given the utility of this model for this imaging modality and indication, computing imaging utilization metrics over appropriate patient cohorts, using existing public NLP tools and ontologies, is likely possible for other imaging scenarios as well.
For example, head CT imaging use in ED patients with mild traumatic brain injury (MTBI) has been shown to be disproportionately variable [20]. At the same time, mature guidelines for the use of imaging in MTBI exist (e.g., the Canadian CT Head Rule) [21]. Determining the rate of head imaging for patients with suspected MTBI, using appropriate CUIs, would be much more appropriate than the broad utilization metrics currently being considered [22]. Similarly, magnetic resonance imaging use in adult primary care patients with low back pain [23], for which guidelines [24] and point-of-care clinical decision support implementations [23] both exist, might be an appropriate target as well.

Conclusions
Applying NLP to physician notes in the ED can help identify patients with suspected PE by flagging specific CUIs in the "Chief Complaint" field. This should allow computation of a more clinically relevant measure of the efficiency of CTPA imaging use.

Declarations
Conflict of Interest: The authors declare that they have no conflict of interest.

Tables
Table 1. Results of cohort discovery validation comparing results achieved from natural language processing (cTAKES) vs. manual classification ("chart review") of the "Chief Complaint" field for 245 randomly selected entries from the Emergency Department visits data set.

Figures
Figure 1. Frequency distribution of the top-15 most common "Chief Complaint" text snippets for all CT pulmonary angiography exams performed in the Emergency Department (ED) during the study period. Of note are the large third bar corresponding to an empty "Chief Complaint" field, and the long tail of the distribution corresponding to other Concept Unique Identifiers.