The usual procedure of the medical record review (our “gold standard”)
Since 2008, a team of nurses trained according to the EMGO/NIVEL standards18 has screened the medical records of all deceased inpatients (approximately 700 annually) for the presence of one or more of fifteen triggers (figure 1). Triggers are clues that alert screeners to potential adverse events (AEs), for example “unplanned transfer to the intensive care unit”. To support this process, a database facilitating the necessary steps in the procedure was introduced in 2010 (Medirede®, Clinical File Search; Mediround BV). We used the triggers originally proposed in 1991 by the Harvard Medical Practice Study (HMPS)19, slightly adjusted to fit the group of deceased patients: the triggers regarding transfer to another acute care hospital and unplanned inappropriate discharge to home were omitted, as they have no relevance in deceased patients.
The medical records with at least one trigger were forwarded to the review committee. This committee consisted of both active and retired medical specialists with considerable clinical experience in the field of quality and safety in healthcare and in medical record review. After a thorough review by a member specialised in the field of medicine related to the main diagnosis of the case (e.g. a surgeon reviews surgical patients), the case was presented to the other members of the review committee at a regular meeting. A first conclusion on the potential presence of an AE was then established. Subsequently, after consulting the specialists involved, the committee made a final decision on the presence of an AE and its potential preventability. Since 2012, the composition of the review committee has been stable.
Previous research showed that nurses needed on average 38 minutes for the manual screening of triggers (they had no time restrictions), and that reviewers needed on average 60 minutes (excluding the time needed for the discussion in a meeting). Thus, the total review of a single medical record takes approximately 1.5 hours.
Figure 1: Presentation of the regular medical record review procedure
Data
The total dataset (originating from our “gold standard” procedure) consisted of 2987 medical records of patients who died during their hospitalization. All records between 2011 and 2016 were included. Of these records, 59% contained one or more triggers after the screening by the nurses. In 742 (42%) of the medical records with a trigger, one or more AEs were detected by the review committee. Of these AEs, 208 were classified as potentially preventable. For all records, there was full access to surgical reports, discharge letters, patient records, nursing reports, use of medication, radiology reports, lab results and the medical history.
Definitions
An AE was defined as an unintended outcome arising from the (non)-action of a caregiver and/or the health care system with damage to the patient resulting in temporary or permanent disability or death of the patient.20
The patient file was defined as the document including all reports, letters, lab results, scans and medical history.
Modified data
From the basic dataset, several subsets of the data were selected. The machine learning approach used here is based on natural language processing (NLP), a component of artificial intelligence concerned with the ability of a computer program to process human language as it is spoken and written. In machine learning, it is important to select data with a high signal-to-noise ratio, so that the model fits the “signal” in the data rather than the noise. In this case, the signal was the information in the medical record that points towards an AE, and the noise was the information that does not help in locating the AE. To gain insight into the signal-to-noise ratio of the various sources, several subsets of our basic dataset were tested (A-F).
A: Last general letter to the general practitioner (GP)
This letter describes the last communication from the hospital to the GP of the patient.
In this selection, all records without a last general GP letter were excluded. There were 476 medical records without this letter, leaving 2511 (84%) for inclusion in the analysis. We chose the last GP letter because we assumed it contained the most useful information regarding the hospitalization of the patient.
B: Last letter
In this selection, the last general GP letter was used, but for the 476 cases in which this letter was missing, the last written document was used instead. As a result, all 2987 original records were included in this analysis.
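The selection rules for datasets A and B can be sketched as follows. This is a minimal illustration only: the `Document` structure, the `kind` labels and the example dates are hypothetical and do not reflect how the study database stores its records.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Document:
    kind: str   # hypothetical label, e.g. "gp_letter" or "nursing_report"
    date: str   # ISO date, so string order equals chronological order
    text: str


def last_gp_letter(docs: List[Document]) -> Optional[Document]:
    """Dataset A: the most recent general GP letter, or None if absent."""
    letters = [d for d in docs if d.kind == "gp_letter"]
    return max(letters, key=lambda d: d.date) if letters else None


def last_letter(docs: List[Document]) -> Optional[Document]:
    """Dataset B: the last GP letter, else the last written document."""
    letter = last_gp_letter(docs)
    if letter is not None:
        return letter
    return max(docs, key=lambda d: d.date) if docs else None


# Hypothetical records: one with a GP letter, one without.
record_with = [
    Document("gp_letter", "2015-03-01", "letter to the GP"),
    Document("nursing_report", "2015-03-02", "nursing notes"),
]
record_without = [
    Document("nursing_report", "2014-07-10", "nursing notes"),
    Document("radiology_report", "2014-07-11", "chest x-ray report"),
]
```

Under these assumptions, `record_without` is dropped from dataset A (no GP letter) but contributes its radiology report to dataset B.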
C: Last three letters
In this analysis, the last three letters of all medical records were included.
D: Patient file
For this selection, the full patient file was used.
E: General GP letter combined with patient file
In this selection, the general GP letter was combined with the patient file, for every record with a general GP letter.
F: Last (general GP) letter combined with the edited patient file
For this dataset, the patient record was electronically edited so that only the 20% rarest words in the patient record were retained. This was done by a preprocessing script, which first evaluated the whole text and then selected the rare words. After the editing, the patient record was combined with the last letter (dataset B).
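The editing step for dataset F could look roughly like the sketch below. The study's preprocessing script is not described in detail, so this assumes one plausible reading: word frequencies are counted over all patient records, the vocabulary is ranked from rarest to most frequent, and only the rarest 20% of distinct words are kept. The function names and the tokenizer are illustrative choices, not the original implementation.

```python
import re
from collections import Counter
from typing import Iterable, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase and split on non-alphanumeric characters (an assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rare_vocabulary(texts: Iterable[str], percent: int = 20) -> Set[str]:
    """Return the `percent` rarest distinct words across all texts."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    # Rank by ascending frequency; ties are broken alphabetically.
    ranked = sorted(counts, key=lambda w: (counts[w], w))
    if not ranked:
        return set()
    keep = max(1, len(ranked) * percent // 100)
    return set(ranked[:keep])


def keep_rare_words(text: str, vocab: Set[str]) -> str:
    """Drop every word that is not in the rare vocabulary."""
    return " ".join(tok for tok in tokenize(text) if tok in vocab)


# Hypothetical patient records.
records = [
    "patient stable fever nausea",
    "patient stable fever nausea",
    "patient stable fever sepsis",
    "patient stable",
]
vocab = rare_vocabulary(records, percent=20)
edited = [keep_rare_words(r, vocab) for r in records]
```

The edited text would then be concatenated with the last letter of the same record to form dataset F.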
Outcome measures
The following outcome measures were determined:
- Accuracy (agreement): calculated as the sum of true positives and true negatives divided by the total population.
- Precision: calculated as the number of true positives divided by the number of predicted condition positives (in this case AE present).
- Recall: calculated as the number of true positives divided by the total number of medical records identified by the gold standard as containing an AE.
- Negative predictive value (NPV): calculated as the number of true negatives divided by the number of predicted condition negatives (in this case no AE present).
- Specificity: calculated as the number of true negatives divided by the total number of medical records identified by the gold standard as not containing an AE.
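The outcome measures above follow directly from the four cells of the confusion matrix. The sketch below computes them from gold standard and predicted labels; the function names and the example labels are illustrative and are not study data.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(gold: Sequence[bool],
                     predicted: Sequence[bool]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN); True means 'AE present'."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)
    fp = sum(1 for g, p in pairs if not g and p)
    tn = sum(1 for g, p in pairs if not g and not p)
    fn = sum(1 for g, p in pairs if g and not p)
    return tp, fp, tn, fn


def outcome_measures(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute the five outcome measures; undefined ratios become NaN."""
    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),     # TP / predicted positives
        "recall": ratio(tp, tp + fn),        # TP / gold standard positives
        "npv": ratio(tn, tn + fn),           # TN / predicted negatives
        "specificity": ratio(tn, tn + fp),   # TN / gold standard negatives
    }


# Hypothetical labels for five records (gold standard vs. algorithm).
gold = [True, True, False, False, True]
pred = [True, False, False, True, True]
measures = outcome_measures(*confusion_counts(gold, pred))
```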
Computer NLP algorithms
We tested different computer algorithms and explored the feasibility of this software (Open Mines Platform supplied by “the Praktijk Index”).
The following NLP algorithms have been used:
- Naive Bayes (NB) with n-gram input and term frequency–inverse document frequency (tf-idf) scores
- Fast-text (FT) 2-layer neural network with hierarchical softmax output
- Linear Support Vector Machine (SVM)
- Convolutional neural network (CNN) based on pre-trained word vectors21,22
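To illustrate the first algorithm in the list, the following is a small, self-contained sketch of a multinomial Naive Bayes classifier on tf-idf weighted n-grams. It is not the Open Mines Platform implementation: the n-gram range (unigrams and bigrams), the smoothed idf formula, the Laplace smoothing parameter `alpha` and the example texts are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import List


def ngrams(text: str, n_max: int = 2) -> List[str]:
    """Word n-grams of length 1..n_max."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


class TfidfNaiveBayes:
    def fit(self, texts: List[str], labels: List[str], alpha: float = 1.0):
        docs = [Counter(ngrams(t)) for t in texts]
        n_docs = len(docs)
        doc_freq = Counter(term for d in docs for term in d)
        # Smoothed idf (an assumed weighting scheme).
        self.idf = {t: math.log((1 + n_docs) / (1 + c)) + 1
                    for t, c in doc_freq.items()}
        self.classes = sorted(set(labels))
        # Sum the tf-idf weight of every term per class.
        weights = {c: defaultdict(float) for c in self.classes}
        for doc, label in zip(docs, labels):
            for term, tf in doc.items():
                weights[label][term] += tf * self.idf[term]
        self.log_prior = {c: math.log(labels.count(c) / n_docs)
                          for c in self.classes}
        self.log_like = {}
        for c in self.classes:
            total = sum(weights[c].values()) + alpha * len(self.idf)
            self.log_like[c] = {t: math.log((weights[c][t] + alpha) / total)
                                for t in self.idf}
        return self

    def predict(self, text: str) -> str:
        counts = Counter(ngrams(text))

        def score(c: str) -> float:
            # Terms unseen during training are ignored.
            return self.log_prior[c] + sum(
                tf * self.idf[t] * self.log_like[c][t]
                for t, tf in counts.items() if t in self.idf)

        return max(self.classes, key=score)


# Hypothetical training texts, not study data.
train_texts = [
    "unplanned transfer to intensive care after bleeding",
    "postoperative bleeding required unplanned reoperation",
    "patient died peacefully of known metastatic cancer",
    "expected death palliative care for metastatic cancer",
]
train_labels = ["AE", "AE", "no AE", "no AE"]
model = TfidfNaiveBayes().fit(train_texts, train_labels)
```

Calling `model.predict("unplanned reoperation after bleeding")` classifies a new text. In practice, an established library implementation would normally be preferred over a hand-written one.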
Experiments
As a first step, the six datasets (described in the section Modified data) were provided to the NB algorithm to select the dataset that yielded the highest performance in predicting an AE. Due to time and computing power restrictions, we chose to test only the fast NB algorithm on all selections. To correct for variation in initialization, this experiment was repeated 28 times. The selected dataset was then used for the subsequent experiments. In the second experiment (scalability), we tested whether performance would decrease if a smaller training set was available. In the third experiment, we attempted to predict the preventability of an AE; AEs were therefore categorized as “probably not preventable” or “potentially preventable”. The results of these tests were compared with the outcome of the gold standard (committee review).
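The repetition and scalability experiments can be sketched as a small evaluation harness. Several details are assumptions: the text does not state how records were split into training and test sets, so a random 80/20 split with a different seed per repetition is used here, and `majority_baseline` is a hypothetical stand-in for the actual classifier.

```python
import random
from statistics import mean, stdev
from typing import Callable, List, Tuple

Record = Tuple[str, int]  # (text, label), label 1 = AE present


def repeated_scores(records: List[Record],
                    train_and_score: Callable[[List[Record], List[Record]], float],
                    n_repeats: int = 28,
                    train_fraction: float = 1.0,
                    test_fraction: float = 0.2,
                    seed: int = 0) -> Tuple[float, float]:
    """Repeat a train/test run and return the mean and standard deviation."""
    scores = []
    for repeat in range(n_repeats):
        rng = random.Random(seed + repeat)   # new seed for every repetition
        shuffled = records[:]
        rng.shuffle(shuffled)
        n_test = max(1, int(len(shuffled) * test_fraction))
        test, train = shuffled[:n_test], shuffled[n_test:]
        # Scalability: keep only part of the available training records.
        train = train[:max(1, int(len(train) * train_fraction))]
        scores.append(train_and_score(train, test))
    return mean(scores), stdev(scores)


def majority_baseline(train: List[Record], test: List[Record]) -> float:
    """Hypothetical model: always predict the most common training label."""
    labels = [label for _, label in train]
    majority = max(set(labels), key=labels.count)
    return sum(1 for _, label in test if label == majority) / len(test)


# Hypothetical dataset: 25 records with an AE, 75 without.
records = [("record text", 1)] * 25 + [("record text", 0)] * 75
learning_curve = {
    fraction: repeated_scores(records, majority_baseline, train_fraction=fraction)
    for fraction in (1.0, 0.5, 0.25)
}
```

Comparing the mean score across the entries of `learning_curve` shows whether performance degrades as the training set shrinks, while the standard deviation reflects the variation between repetitions.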