Study Population
In this retrospective cohort study, we randomly selected patients who were at least 18 years of age and were discharged from three acute care facilities in Calgary, Canada, between January 1 and June 30, 2015. Obstetric admissions were excluded because of their typically short lengths of stay and the absence of the conditions of interest. If a patient had multiple discharges during the study period, we randomly selected one hospitalization [15]. Six nurses reviewed charts to determine the presence of CeVD [15].
Data sources
EMR: Sunrise Clinical Manager (SCM)
The EMR data are from SCM, a city-wide, population-level EMR system used in the three acute care hospitals in Calgary. SCM provides patient-level clinical information, including medical and nursing orders, medication records, clinical documentation, diagnostic imaging, and laboratory results [16].
Administrative Discharge Abstract Database: DAD
Inpatients’ administrative, clinical, and demographic information at the time of discharge is coded in the DAD [17]. Clinical coders record up to 25 diagnostic codes for each inpatient based on the information available in patient charts. The DAD, EMR, and chart data were linked using the Personal Health Number (a unique lifetime identifier), the chart number (a unique number associated with a patient’s admission), and the admission date.
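The three-key linkage described above can be sketched as follows. This is a minimal illustration, not the study's actual pipeline; the field names (`phn`, `chart_number`, `admission_date`) are illustrative placeholders, not the real schema.

```python
def link_records(dad, emr, chart):
    """Link DAD, EMR, and chart-review records on the three shared keys.

    Each source is a list of dicts; field names are hypothetical. Only
    hospitalizations present in all three sources are kept (inner join).
    """
    def key(r):
        return (r["phn"], r["chart_number"], r["admission_date"])

    emr_by_key = {key(r): r for r in emr}
    chart_by_key = {key(r): r for r in chart}
    linked = []
    for r in dad:
        k = key(r)
        if k in emr_by_key and k in chart_by_key:
            # Merge the three records for this hospitalization into one row.
            linked.append({**r, **emr_by_key[k], **chart_by_key[k]})
    return linked
```

An inner join is the natural choice here, since a hospitalization missing from any one source cannot be used for training or evaluation.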
Phenotyping Algorithm Framework
We trained, validated, and tested an EMR data-driven phenotyping algorithm using NLP techniques to detect CeVD. NLP techniques are used to process and analyze human language and encompass a wide range of tasks, including named entity recognition (NER), information extraction, and text classification [11, 16]. We applied them to analyze free-text clinical notes and derive a CeVD phenotype that detects the disease automatically. As depicted in Figure 1, the general framework consists of 1) input document selection from patients’ clinical notes, 2) model training, and 3) performance evaluation using chart review as the reference standard.
Document selection and feature engineering
Many types of clinical notes can be generated during a patient’s hospitalization, such as nursing transfer reports, inpatient consultations, discharge summaries, and surgical assessments and histories. However, not all document types contribute equally to CeVD detection: noise and redundant information can hamper the performance of ML models [19]. The first step is therefore to identify and select the document type(s) most sensitive for CeVD identification.
We used a forward sequential selection method [20] that iteratively adds the document type contributing most to model performance until performance stops increasing or a predefined criterion is reached, as shown in Figure 2. All documents are first converted into vectors by 1) extracting relevant medical concepts from the text and 2) turning those concepts into numeric features. To examine extraction performance, we compared two commonly used concept extraction methods: Bag of Words (BOW) using ScispaCy [21] and Concept Unique Identifiers (CUIs) from the Unified Medical Language System using cTAKES (see Appendix 1 for a detailed explanation) [22]. We also compared two feature construction methods: Term Frequency-Inverse Document Frequency (TF-IDF) and word count. The resulting vectors are fed into the ML models and evaluated by model performance. To better estimate the generalization of the selected document types, 5-fold cross-validation was applied to the selected patients (i.e., 80% training, n = 2,429, and 20% test, n = 607). Model development is detailed in the following section.
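The forward sequential selection loop can be sketched generically. In this illustration, `score` stands in for the cross-validated model performance obtained when training on a given set of document types; the stopping rule and function names are assumptions for exposition, not the authors' exact implementation.

```python
def forward_select(doc_types, score, min_gain=0.0):
    """Greedy forward sequential selection over document types.

    `score(selected_set)` is assumed to return cross-validated model
    performance for a candidate set of document types. At each step the
    single type yielding the largest improvement is added; selection stops
    when no candidate improves the score by more than `min_gain`.
    """
    selected = []
    remaining = set(doc_types)
    best = float("-inf")
    while remaining:
        # Score every remaining document type added to the current selection.
        cand, cand_score = max(
            ((d, score(frozenset(selected + [d]))) for d in remaining),
            key=lambda pair: pair[1],
        )
        if cand_score <= best + min_gain:
            break  # no document type improves performance enough
        selected.append(cand)
        remaining.remove(cand)
        best = cand_score
    return selected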
Model development
The model outcome is a binary classification in which hospitalized patients with CeVD are the positive cases. Two supervised ML methods, random forest (RF) and XGBoost, were trained, validated, and tested using the input vectors and chart review labels [19, 20]. Both methods are known for handling high-dimensional datasets with missing data and outliers, and for providing accurate, reliable predictions, particularly in NLP tasks involving thousands of concept features [21, 22].
Combining the methods of concept extraction, vectorization, and ML modeling yields 8 model variations, such as “BOW + TF-IDF + RF” and “CUI + TF-IDF + XGBoost.” Because both RF and XGBoost use decision trees as base models, we set the number of trees to 100 for each. Model performance was then estimated by 5-fold cross-validation, maintaining the same proportion of positive and negative patients in each fold.
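The stratified split that preserves the positive/negative ratio in each fold can be illustrated with a minimal stdlib-only sketch (in practice a library routine such as scikit-learn's `StratifiedKFold` would typically be used):

```python
import random
from collections import defaultdict

def stratified_kfold(labels, k=5, seed=0):
    """Split sample indices into k folds, preserving class proportions.

    A minimal sketch of stratified k-fold cross-validation: indices are
    grouped by class label, shuffled, and dealt round-robin into the folds,
    so each fold keeps (approximately) the overall positive/negative ratio.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % k].append(i)  # deal this class evenly across folds
    return folds
```

Each fold then serves once as the validation set while the remaining folds train the model, and the metrics are averaged across the k runs.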
Performance metrics
To evaluate and compare the models developed, we calculated their sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1 score using the chart data as the reference standard. We also calculated binomial proportion confidence intervals for all the metrics.
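These metrics follow directly from the confusion matrix counts. The sketch below uses the normal-approximation (Wald) binomial interval as one common choice; the text does not specify which interval construction was used.

```python
import math

def binom_ci(successes, n, z=1.96):
    """Normal-approximation (Wald) binomial proportion confidence interval.

    One common construction for a 95% CI (z = 1.96); an assumption here,
    since the interval type is not specified in the text.
    """
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)

def metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV, NPV, and F1 from confusion-matrix counts."""
    sens = tp / (tp + fn)          # true positive rate
    spec = tn / (tn + fp)          # true negative rate
    ppv = tp / (tp + fp)           # precision
    npv = tn / (tn + fn)
    f1 = 2 * ppv * sens / (ppv + sens)  # harmonic mean of PPV and sensitivity
    return {"sensitivity": sens, "specificity": spec,
            "ppv": ppv, "npv": npv, "f1": f1}
```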
We compared the results with ICD-based CeVD identification algorithms applied to the DAD, defining CeVD using ICD-10 codes (e.g., G45-46, I60-69, H34; see Table S1 in the Appendix) [3]. Performance metrics of the developed NLP models were reported at the same specificity level as the ICD-based algorithm.
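One standard way to report model metrics at a matched specificity (the text does not detail the exact procedure used) is to tune the decision threshold on the model's predicted scores until the specificity on the negative class reaches the comparator's level:

```python
import math

def threshold_for_specificity(neg_scores, target_spec):
    """Choose a decision threshold achieving at least `target_spec` specificity.

    `neg_scores` are the model's predicted scores for the true-negative
    patients; a sample is classified positive when its score exceeds the
    threshold. A sketch of threshold matching, not the authors' exact method.
    """
    s = sorted(neg_scores)
    # At least ceil(target_spec * n) negatives must fall at or below the cutoff.
    k = max(1, math.ceil(target_spec * len(s)))
    return s[k - 1]
```

Sensitivity, PPV, and the other metrics can then be recomputed at that threshold, making the comparison with the ICD-based algorithm apples-to-apples.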