2.1 VAP annotation
The first step in building our model is to curate a dataset of MV episodes with ground-truth labels for VAP and no-VAP events. For prediction tasks, we need to mark the onset of VAP events to enable the development of models that estimate VAP risk ahead of the VAP onset time (i.e., across a prediction gap). While there are some necessary criteria for diagnosing VAP, no clinical definition provides sufficient, sensitive, and specific criteria for identifying VAP events [11]. The absence of ground truth for VAP events and, more importantly, for their timing poses a significant challenge to the development of predictive models.
A common approach to identifying adverse events or diagnoses in EMRs is to use ICD codes. This approach does not provide the onset time of the events, as ICD codes are typically assigned only at discharge. Even if prediction is limited to a one-shot estimate at a pre-set time (e.g., 48hrs after intubation [8]), the prediction time might overlap with the VAP window (the temporal window in which VAP is present and has already been clinically observed). More importantly, ICD codes are not a reliable indicator of adverse events during patient stays; in particular, ICD codes used to chart VAP diagnoses have a positive predictive value of 0.57, with a low sensitivity ranging from 0.18 to 0.35 [12]. On the other hand, the clinical approach to diagnosing VAP requires at least two serial chest radiographs with new or progressive and persistent infiltrates, accompanied by at least two clinical parameters that all require instrumental diagnostic tests (e.g., pulse oximetry, blood and sputum tests, arterial blood gas tests) [5]. Some of these tests are costly and labor intensive and therefore are not widely or frequently available across healthcare settings; hence, their absence could lower the sensitivity of VAP diagnosis. Additionally, chest radiographs are not consistently used and, when available, suffer from high interrater disagreement on radiologic observations [13]. The subjective nature of radiology assessments is one of the main reasons the CDC introduced a new guideline for tracking complications during invasive MV, collectively referred to as ventilator-associated events (VAE) [11].
Proposed EMR criteria for VAP annotation
The IDSA/ATS[1] guidelines define hospital-acquired pneumonia (HAP) as a pneumonia that occurs 48hrs or more after admission to the hospital in patients who did not appear to be incubating it at the time of admission, thereby ruling out community-acquired pneumonia (CAP). Both the CDC[2] and the IDSA/ATS guidelines define VAP as a HAP that arises more than 48-72hrs after endotracheal intubation [14], [5].
We present a set of criteria for VAP annotation (Figure 1) that consolidates clinical guidelines (IDSA/ATS for hospital-acquired infections (HAIs) and VAP [14], NHSN[3] pneumonia [5], and CDC VAE guidelines [11]):
(a) For adult (≥18y) patients on invasive mechanical ventilation, mark suspicions of nosocomial respiratory infection as events comprising 1) administration of a new antibiotic agent temporally contiguous with 2) an ordered microbiological test (culture):
- Within the temporal window starting at 48hrs after intubation and extending to 72hrs after extubation, search for antibiotic agents from the list of eligible agents and administration routes for VAEs [11].
- Ensure that the detected agents are new; an agent is new if it is initiated ≥48hrs after intubation and was not administered in the 2 days preceding its current start date.
- Search for cultures within the temporal window of 72hrs prior to and 24hrs after the detected new antibiotic agent.
- Limit the culture types to those obtained from the respiratory tract [11] (e.g., sputum, endotracheal aspirate, bronchoalveolar lavage, lung tissue, protected specimen brush) and blood cultures [15].
- Mark the onset of a suspicion of infection as the earlier of the new antibiotic administration time and its accompanying culture order time.
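As an illustration, the antibiotic-culture pairing in (a) can be sketched as follows (a minimal sketch; the function and argument names are hypothetical, and the eligible-agent and culture-type filters from the bullet points above are assumed to have been applied upstream):

```python
from datetime import datetime, timedelta

def find_suspicion_onsets(intubation, extubation, antibiotics, cultures):
    """Pair new antibiotic starts with temporally contiguous culture
    orders, per criteria (a). `antibiotics` is a list of
    (agent, start_time) tuples; `cultures` is a list of order times.
    Agent-eligibility and culture-type filtering are assumed done."""
    win_start = intubation + timedelta(hours=48)   # 48hrs after intubation
    win_end = extubation + timedelta(hours=72)     # 72hrs after extubation
    onsets = []
    for agent, start in antibiotics:
        if not (win_start <= start <= win_end):
            continue
        # "New" agent: the same agent was not administered in the
        # 2 days preceding this start date.
        prior = [s for a, s in antibiotics
                 if a == agent and timedelta(0) < start - s <= timedelta(days=2)]
        if prior:
            continue
        # Cultures within 72hrs before to 24hrs after the antibiotic start.
        paired = [c for c in cultures
                  if start - timedelta(hours=72) <= c <= start + timedelta(hours=24)]
        if paired:
            # Suspicion onset: earlier of antibiotic start and culture order.
            onsets.append(min([start] + paired))
    return onsets
```

Under this sketch, a new agent started 84hrs post-intubation with a culture ordered 12hrs earlier yields the culture-order time as the suspicion onset.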
(b) Mark the VAP events from within the suspected infections identified in (a) as those satisfying the following criteria:
- Positive culture results;
- No indication that the suspected infection is associated with community-acquired pneumonia (CAP):
  - no indication of pneumonia in the admission diagnosis;
  - no indication of CAP in the discharge ICD codes;
- Onset of clinical suspicion ≥48hrs after intubation, excluding early infections.
The use of antibiotic administration temporally adjacent to culture orders has previously been reported as the criterion for marking suspicion of infection, notably in the Sepsis-3 definition [16]. Patients with suspected infections as defined in Sepsis-3 were found to have a worse composite outcome, defined as in-hospital mortality and/or ICU length of stay ≥3 days (46.30% vs 19.90%) [17]. Specific to VAP, antibiotic administration at least 48hrs after intubation has previously been used as the criterion to identify patients with presumed VAP in EMRs [18], [19]. The latest IDSA/ATS guidelines [14] also recommend initiating empiric antibiotic treatment as soon as VAP is suspected, accompanied by a culture order to help tailor the antibiotic treatment to the identified pathogens. In another study, the presence of an antibiotic administration and the ordering of a sputum culture were among the features most salient to VAP. Finally, in a recent systematic review, 73% (32/44) of studies used microbiological confirmation of infection as the sole reference standard for VAP diagnosis [20].
Given the lack of a standard diagnosis for VAP, the VAP cases identified in our study using the proposed criteria can be regarded as presumed VAP cases. The concept of presumed infection has previously been used and protocolized in other infection surveillance areas, including presumed serious infection (PSI) in sepsis surveillance systems, defined as at least one blood culture (regardless of the result) temporally contiguous with the start of a new antibiotic treatment that is continued for 4 consecutive days [21]. Computational models for PSI prediction and the importance of such prediction for early interventions have been previously reported (e.g., [22]).
2.2 Prediction model design
Data
The patient data used in this study are derived from the EMR data in the Philips eICU Research Institute (eRI) [23]. The eRI dataset includes EMR data from intensive care units (ICUs) in 459 United States hospitals of various sizes and teaching statuses and was collected over 12 years, from 2004 to 2016. Among the eRI hospitals, 292 have mechanically ventilated (invasively ventilated) patients. We further excluded hospitals that do not report culture data, bringing the number of hospitals to 118. Additionally, for the remaining hospitals, we excluded the years with no culture or antibiotic prescription data.
The methods of data acquisition, validation, aggregation, security procedures, and a description of the ICUs in eRI are reported in [23]. All analyses were conducted after removal of patient and institutional identifying information under a waiver of the requirement to informed consent by the appropriate institutional review boards from participating institutions that contributed to the eRI database. Furthermore, the study and the use of eRI in this work have been approved by the Internal Committee of Biomedical Experiments at Philips.
Sample definition
Samples are defined following a one-shot prediction approach in which one sample is defined per mechanical ventilation episode. The sample definition is governed by three parameters: 1) reference point, 2) prediction gap, and 3) observation window. For MV episodes with a VAP event, the reference point is the time of the event. For each MV episode without VAP events (control patients), we select a random timepoint in the episode, starting at 48hrs post-intubation, as the reference point, with the constraint that the distribution quartiles of the resulting control reference points match those of the VAP samples.
A prediction gap defines how early (in hours) the model is predicting an impending VAP event (see the prediction gap illustration in Figure 1 and 5). In this work, we set the prediction gap to 24hrs prior to the reference point. An observation window (ObW) immediately precedes the prediction gap and is the temporal interval over which we observe the patient and compute an abstract feature-based representation of their instantaneous status based on their EMR entries.
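The relationship between the three sample parameters can be made concrete with a small sketch (function and parameter names are illustrative; the 24hr defaults match the settings used in this work):

```python
from datetime import datetime, timedelta

def observation_interval(reference_point, gap_hr=24, obw_hr=24):
    """Return the (start, end) of the observation window: the ObW
    immediately precedes the prediction gap, which ends at the
    reference point (VAP onset, or a matched random time for controls)."""
    obw_end = reference_point - timedelta(hours=gap_hr)
    obw_start = obw_end - timedelta(hours=obw_hr)
    return obw_start, obw_end
```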
Feature engineering
The features used to compute a patient's status in an observation window include: 1) demographics, 2) vitals, 3) labs, 4) ventilator settings, and 5) derived features including VAE-related features (Table 1). We did not use admission diagnoses as features due to discrepancies between hospitals in how they chart these diagnoses. The VAE features are inspired by the CDC VAE guidelines [11]. These features capture the counts of stability and oxygen worsening events normalized by the temporal distance (in hours) of the observation window from the intubation time. The stability count (“stable_n”) captures the normalized number of times from intubation where the daily minimum of PEEP or FiO2 remains unchanged or decreases over two consecutive days. The oxygen worsening feature (“OW_n”) captures the normalized number of times from intubation where the daily minimum of PEEP or FiO2 increases over the span of two consecutive days by 3 cmH2O and 20% for PEEP and FiO2, respectively. Finally, when available for computation, we include 2 additional features: body mass index (BMI) and PaO2/FiO2 (pf-ratio).
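The two VAE-inspired counts can be sketched as follows (a hedged sketch: the text does not specify whether PEEP and FiO2 day pairs are counted jointly or separately, so counting them separately is an assumption, and all names are illustrative):

```python
def vae_counts(daily_min_peep, daily_min_fio2, hours_since_intubation):
    """Counts of stability and oxygen-worsening day pairs since
    intubation, normalized by the distance (hours) of the observation
    window from intubation. Thresholds follow the text: +3 cmH2O for
    PEEP and +20% for FiO2 over two consecutive days."""
    stable = worsening = 0
    for series, threshold in ((daily_min_peep, 3), (daily_min_fio2, 20)):
        for prev, curr in zip(series, series[1:]):
            if curr <= prev:                 # unchanged or decreased: stable
                stable += 1
            elif curr - prev >= threshold:   # increased past the threshold
                worsening += 1
    return stable / hours_since_intubation, worsening / hours_since_intubation
```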
Different features have different temporal profiles in EMRs; some are available routinely at fixed intervals, others are measured intermittently, and yet others are rare or completely missing for some patients. This is caused by differences in patient needs and ICU journeys as well as differences in healthcare processes in different hospitals. We define feature-specific observation windows to account for differences in measurement frequencies (column “ObW” in Table 1). Vitals and labs that fell outside their standard reference ranges were excluded. Standard reference ranges were defined by clinical experts and based on the published laboratory test reference ranges from the American Board of Internal Medicine. We used a set of descriptive statistics (average, minimum, maximum, last observed value) for each vital, lab, or ventilator setting computed over their corresponding ObW as features.
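Per feature, the summarization over its observation window might look like the sketch below (names and the numeric-hour timestamps are illustrative, and treating out-of-range values as simply dropped is our reading of the exclusion rule):

```python
def summarize_feature(timestamped_values, obw_start, obw_end, lo, hi):
    """Summarize one vital/lab/setting over its feature-specific ObW:
    keep measurements inside the window and the [lo, hi] reference
    range, then emit (mean, min, max, last observed value).
    Returns None when no valid measurement remains."""
    vals = [(t, v) for t, v in timestamped_values
            if obw_start <= t <= obw_end and lo <= v <= hi]
    if not vals:
        return None
    vs = [v for _, v in vals]
    last = max(vals)[1]  # value carried by the latest timestamp
    return sum(vs) / len(vs), min(vs), max(vs), last
```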
Table 1
Features used for VAP prediction modeling.
| Feature category | Feature name | ObW (hr) | MWS (hr) |
| --- | --- | --- | --- |
| Demographics | Age | - | - |
| | Gender | - | - |
| | Weight | - | - |
| | Height | - | - |
| Vitals | HR | 12 | 4 |
| | RR | 12 | 4 |
| | Systolic BP | 12 | 6 |
| | Diastolic BP | 12 | 6 |
| | Mean BP | 12 | 6 |
| | Temperature | 12 | 4 |
| | SpO2 | 12 | 4 |
| | EtCO2 | 12 | 24 |
| Ventilator setting | PEEP | 24 | 24 |
| | FiO2 | 24 | 24 |
| | Tidal volume | 24 | 48 |
| | TV/Kg IBW | 24 | 48 |
| VAE | stable_n | 24 | - |
| | OW_n | 24 | - |
| Labs | Hematocrit | 24 | 24 |
| | Hemoglobin | 24 | 24 |
| | Glucose | 24 | 24 |
| | Lactate | 24 | 48 |
| | Phosphate | 24 | 48 |
| | Magnesium | 24 | 48 |
| | PaO2 | 24 | 48 |
| | pH | 24 | 24 |
| | PaCO2 | 24 | 48 |
| | BaseExcess | 24 | 48 |
| | AnionGap | 24 | 48 |
| | Creatinine | 24 | 24 |
| | Bilirubin | 24 | 48 |
| | PTT | 24 | 48 |
| | Potassium | 24 | 24 |
| | Sodium | 24 | 24 |
| | ALP | 24 | 48 |
| | AST | 24 | 48 |
| | ALT | 24 | 48 |
| | Chloride | 24 | 24 |
| | Calcium | 24 | 24 |
| | TotalCO2 | 24 | 24 |
| | INR | 24 | 48 |
| | Albumin | 24 | 24 |
| | Platelets | 24 | 24 |
| | BUN | 24 | 24 |
| | WBC | 24 | 24 |
ObW: observation window, MWS: backward search window for missing features, BMI: body-mass index, HR: heart rate, RR: respiratory rate, BP: blood pressure, EtCO2: end-tidal carbon dioxide, PEEP: baseline corrected positive end-expiratory pressure where baseline is defined as the initial value of PEEP at the intubation time, FiO2: baseline corrected fraction of inspired oxygen where baseline is defined as the initial value of FiO2 at the intubation time, OW: oxygen worsening, PaO2: partial pressure of oxygen, pH: potential hydrogen, PaCO2: partial pressure of carbon dioxide, PTT: partial thromboplastin time, ALP: alkaline phosphatase, ALT: alanine transaminase, AST: aspartate aminotransferase, INR: international normalized ratio, BUN: blood urea nitrogen, WBC: white blood cell count, TV/Kg IBW: Mean tidal volume per kilogram ideal body weight.
When a feature is missing in an observation window, we search the temporal interval immediately preceding the observation window and use the most recent measurement for the feature if it exists in that interval. Due to the dynamically evolving status of mechanically ventilated patients, in the search for missing features, we assume that observations too far away from the prediction time are invalid. As such, we define an upper limit (in hours) for the backward search for missing features to ensure clinical and physiological relevance to the current patient status. Furthermore, we make the upper-limit feature-specific to account for differences in measurement frequencies, e.g., vitals are more frequently charted as compared to labs (column “MWS” in Table 1).
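The backward search can be sketched as follows (names are illustrative; timestamps are hours relative to intubation):

```python
def fill_missing(timestamped_values, obw_start, mws_hr):
    """When a feature is absent from its ObW, return the most recent
    measurement from the MWS-hour interval immediately preceding the
    window, or None if no such measurement exists."""
    candidates = [(t, v) for t, v in timestamped_values
                  if obw_start - mws_hr <= t < obw_start]
    if not candidates:
        return None
    return max(candidates)[1]  # value of the most recent measurement
```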
Risk prediction model
The risk prediction model predicts the risk of an impending VAP within the next 24hrs. We trained an ensemble of decision trees using the XGBoost gradient boosting algorithm [24]. In our experiments, the XGBoost model surpassed other classifiers (random forest, logistic regression, AdaBoost, KNN) in performance and risk-score interpretability as well as in the ability to handle correlated features. The XGBoost parameters were empirically selected through cross-validation experiments optimizing the ROC AUC on an internal validation set: 500 estimators with early stopping set at 100 rounds, a maximum depth of 3, and a learning rate of 0.05. In each boosting round, 90% of training samples and 50% of features were used to mitigate overfitting. We used SHAP (SHapley Additive exPlanations [25]) values to quantify the local and global impact of every variable on VAP risk.
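The reported hyperparameters map onto the common XGBoost scikit-learn wrapper roughly as follows (a config sketch: the argument names assume that API, and the binary objective is our assumption, not stated in the text):

```python
# Hyperparameters reported above, expressed against the XGBoost
# scikit-learn wrapper (argument names assume that API).
xgb_params = dict(
    n_estimators=500,             # boosting rounds
    early_stopping_rounds=100,    # early stopping
    max_depth=3,
    learning_rate=0.05,
    subsample=0.9,                # 90% of training samples per round
    colsample_bytree=0.5,         # 50% of features per round
    objective="binary:logistic",  # assumed binary risk objective
)
# model = xgboost.XGBClassifier(**xgb_params)
# model.fit(X_train, y_train, eval_set=[(X_val, y_val)])
```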
Model evaluation
The model was trained and tested using a hold-out cross-validation experiment in which the data was split at the patient level into 80% training patients and 20% holdout test patients. The experiment was repeated 10 times, where in each run, a model was trained using the training patients and then, the resulting model was tested on the hold-out test patients.
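The repeated patient-level hold-out protocol can be sketched as (function and parameter names are illustrative):

```python
import random

def holdout_splits(patient_ids, n_runs=10, test_frac=0.2, seed=0):
    """Yield (train, test) patient-ID splits: 80/20 at the patient
    level, repeated n_runs times with fresh shuffles, so no patient
    appears in both sides of any one run."""
    rng = random.Random(seed)
    ids = list(patient_ids)
    n_test = int(len(ids) * test_frac)
    for _ in range(n_runs):
        rng.shuffle(ids)
        yield ids[n_test:], ids[:n_test]
```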
Missing data handling
In this work, we use XGBoost, which learns to assign missing cases to the tree branch that optimizes the loss function, which can be regarded as implicit imputation. However, missingness patterns in EMR data might not be random (e.g., laboratory tests are ordered more frequently for sick patients [26], or a measurement might be missing only at some hospitals), and simply imputing them might result in the model learning the missingness pattern as a salient feature rather than the actual pathophysiological feature underpinning the condition. To protect our model from selection biases caused by non-random missingness patterns, we implemented the following:
- A missingness threshold of 30%: a feature is eliminated if it is missing in more than 30% of the training patients.
- Balancing class-specific missingness: it is more likely for negative samples to have missing features [26]. In order to avoid situations where a model associates a missing feature with a particular class, we implement class-specific balancing anchored at the ‘temperature’ measurement to ensure that temperature is present/missing uniformly between the two classes. We select temperature as it is a frequently measured and important vital sign in infection. Ensuring equal class-specific missingness rate for temperature results in more similar rates of missingness in all other features.
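One way to realize the temperature-anchored balancing is sketched below; the exact mechanics (dropping surplus anchor-missing samples from the class with the higher missing rate) are our assumption, as the text does not spell them out, and all names are illustrative:

```python
import random

def balance_anchor_missingness(pos, neg, anchor="temperature", seed=0):
    """Subsample so the anchor feature is missing at (approximately)
    the same rate in both classes. `pos`/`neg` are lists of feature
    dicts; a feature absent from the dict (or None) counts as missing."""
    rng = random.Random(seed)

    def split(group):
        present = [s for s in group if s.get(anchor) is not None]
        missing = [s for s in group if s.get(anchor) is None]
        return present, missing

    groups = [split(pos), split(neg)]
    # Target = the lower of the two class-specific missing rates.
    target = min(len(m) / (len(p) + len(m)) for p, m in groups)
    balanced = []
    for p, m in groups:
        # Keep k missing samples so that k / (len(p) + k) ~= target.
        k = round(target * len(p) / (1 - target)) if target < 1 else len(m)
        balanced.append(p + rng.sample(m, min(k, len(m))))
    return balanced  # [balanced_pos, balanced_neg]
```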
Modeling safeguards
There are differences in care practices at different institutions that could significantly impact the performance of AI models built using EMR datasets, which, in turn, has direct implications for the applicability of the models in different types of hospitals. We implemented safeguards to manage and control, as feasible, the irrelevant sources of variation in the data:
- Balanced hospital representation: intrinsic differences between hospitals (size, teaching status, care practices) lead to unequal amounts of data available for model training from different hospitals. We clip the number of training samples from a hospital to 100 to avoid a situation where a large-size hospital dominates our training samples.
- MV-length matched VAP and control samples: it is critical to ensure that the control class is matched with the VAP class in terms of MV length (i.e., where in the patient journey a sample is extracted from to train the prediction model) to avoid a selection bias where the model learns the episode length as a predictive feature, associating longer episodes with a higher risk of VAP (as seen in [10]), which, albeit true and well known, is a trivial feature that can obscure other salient pathophysiological features from being learned by the model. In this work, we match the distribution quartiles of no-VAP samples to those of VAP samples.
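The per-hospital clipping in the first safeguard amounts to the following sketch (names are illustrative):

```python
import random

def clip_per_hospital(samples_by_hospital, max_per_hospital=100, seed=0):
    """Cap the number of training samples drawn from any one hospital
    so that large hospitals do not dominate the training set.
    `samples_by_hospital` maps hospital_id -> list of samples."""
    rng = random.Random(seed)
    clipped = []
    for hospital_id in sorted(samples_by_hospital):
        group = samples_by_hospital[hospital_id]
        if len(group) > max_per_hospital:
            group = rng.sample(group, max_per_hospital)
        clipped.extend(group)
    return clipped
```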
Hospital phenotyping model
We present a hospital clustering approach based on the frequencies of measured vitals and labs as well as reported ventilator settings to capture differences in healthcare practices for mechanically ventilated patients at different hospitals. Vitals are consistently measured across hospitals; however, their measurement frequency could reflect hospital-specific practices. Labs are important because 1) there are reports on associations between the presence of laboratory tests and patient outcomes [26], 2) a large amount of laboratory test data is available in EMRs, 3) lab results are time-tagged and commonly available in EMR data, whereas other healthcare process variables, such as doctor experience, specialty, and hospital policies, are more difficult to quantify or obtain from EMRs, and 4) the type and frequency of labs vary between hospitals, which, in part, could reflect the healthcare processes in use as well as the available facilities at different hospitals. Finally, since our target cohort is mechanically ventilated, we include reported ventilator settings. We are interested in macro-level differences, and as such we use aggregate measures across all mechanically ventilated patients at a hospital to capture them (Figure 3). Macro-level differences reflect inter-hospital variability in healthcare practices and protocols, whereas micro-level differences are at the patient level; e.g., laboratory tests with short repeat intervals could indicate anomalies or clinical concerns specific to the ongoing patient status.
Given EMR datasets from different hospitals and a target patient population (mechanically ventilated patients), we extract EMR records for all vitals, labs, and ventilator settings during patient stays. We exclude MV episodes shorter than 48hrs and retain only hospitals with at least 10 MV episodes. Next, we compute the time intervals between subsequent measurements of each variable for each patient. It is important to emphasize that we do not use the actual vital, lab, or ventilator setting values; rather, we use the temporal intervals between subsequent measurements. Next, we compute a hospital-level representation for each hospital: for each hospital and measurement pair (e.g., heart rate), we compute statistics descriptive of the distribution of patient-level intervals (25th, 50th, and 75th percentiles) for that measurement at that hospital. We then project the resulting hospital-level representations into a low-dimensional space using uniform manifold approximation and projection (UMAP) [27] to obtain a 2D representation of the hospitals. Using K-means clustering, we then search for clusters of similar hospitals in the resulting 2D UMAP space. We use silhouette coefficients and the elbow method on the sum of squared errors (the squared Euclidean distance of a hospital to its cluster centroid) to find an optimal number of clusters. Larger cluster numbers result in more homogeneous clusters, while fewer clusters have a larger amount of within-cluster variability. The latter enables training cluster-specific predictive models that are more generalizable than those trained on a smaller and more homogeneous cluster.
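The interval-based hospital representation described above can be sketched as follows (the nearest-rank percentile estimator and all names are assumptions; the subsequent UMAP projection and K-means step are omitted):

```python
def hospital_representation(measurement_times):
    """Build one hospital's representation: for each measurement type,
    pool between-measurement intervals across all MV patients at the
    hospital and summarize with the 25th/50th/75th percentiles.
    `measurement_times` maps measurement -> list of per-patient sorted
    timestamp lists (hours)."""
    def percentile(sorted_vals, q):
        # simple nearest-rank percentile (estimator not specified in text)
        idx = min(len(sorted_vals) - 1, int(q / 100 * len(sorted_vals)))
        return sorted_vals[idx]

    rep = {}
    for name, patients in measurement_times.items():
        intervals = sorted(b - a
                           for times in patients
                           for a, b in zip(times, times[1:]))
        if intervals:
            rep[name] = tuple(percentile(intervals, q) for q in (25, 50, 75))
    return rep
```

The per-hospital vectors produced this way would then be stacked across hospitals, embedded into 2D with UMAP, and clustered with K-means.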
The identified clusters were then used to build cluster-specific VAP prediction models and evaluate in-cluster and out-of-cluster prediction performances. The latter constitutes an external validation of our algorithm where a model trained on a particular hospital cluster is validated on other hospital clusters.