Study design
The Rennes and Tours hospitals are academic hospitals with similar activity and organisation (Table 1). The CDW data model and technical design are the same in both hospitals [13]. The same algorithms for the selection criteria, readmission criteria, and data management, and the same prediction algorithms, were run separately on the Rennes and Tours CDW data.
Table 1 – Number of hospital beds in Rennes and Tours academic hospitals in 2020.
| Specialty | Rennes | Tours |
|---|---|---|
| Medicine | 917 | 748 |
| Surgery | 464 | 537 |
| Gynaecology/Obstetrics | 107 | 126 |
| Total | 1488 | 1411 |
Patients with at least one hospital admission in Rennes or Tours between 1 January 2013 and 31 December 2018 were included in the study. Each dataset was then divided into a training set (admissions up to 31 December 2016) and a test set (admissions from 1 January 2017).
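The temporal split can be sketched as follows. This is a minimal illustration that assumes each stay is a dictionary with a hypothetical `admission` date field. Stays are assigned by admission date alone, so a patient with stays on both sides of the cut-off could appear in both sets. The text does not say how such patients were handled.

```python
from datetime import date

# Dates taken from the text; the `admission` field name is hypothetical.
STUDY_START = date(2013, 1, 1)
STUDY_END = date(2018, 12, 31)
TEST_START = date(2017, 1, 1)  # admissions from this date go to the test set

def temporal_split(stays):
    """Split stays into training and test sets according to admission date."""
    train, test = [], []
    for stay in stays:
        admission = stay["admission"]
        if not (STUDY_START <= admission <= STUDY_END):
            continue  # outside the 2013-2018 study window
        (train if admission < TEST_START else test).append(stay)
    return train, test

stays = [
    {"id": 1, "admission": date(2014, 5, 2)},
    {"id": 2, "admission": date(2016, 12, 31)},
    {"id": 3, "admission": date(2017, 1, 1)},
    {"id": 4, "admission": date(2019, 3, 10)},
]
train, test = temporal_split(stays)
print([s["id"] for s in train], [s["id"] for s in test])
```

A split on dates rather than a random split mimics prospective use: the models are evaluated on stays that occur after the period used for fitting.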
The Rennes and Tours CDWs contain most of the electronic health record (EHR) data, including clinical notes, drug prescriptions, laboratory test records, and claims data. Each patient's successive stays are linked within a de-identified record. All information about a patient's hospital stay is de-identified and organised as structured and unstructured data in the CDW.
Diagnoses and comorbidities were coded using the French version of the International Classification of Diseases, 10th Revision (ICD-10) [14], and stays were grouped into French diagnosis-related groups (F-DRG) [15]. Medical and surgical procedures were coded according to the French classification of clinical procedures (Classification Commune des Actes Médicaux, CCAM) [16]. In the Rennes CDW, drug prescriptions were coded with Anatomical Therapeutic Chemical (ATC) codes. Drug data were not yet available in the Tours CDW at the time of the analysis. Hospital departments and laboratory data are currently coded according to a local thesaurus.
Inclusion criteria
To ensure that the results are comparable with the French national indicator for 30-day readmission (RH-30), the selection criteria were those of the ATIH RH-30 national methodology [8]: patients aged ≥18 years who received obstetric/gynaecological, surgical, or medical care.
Non-inclusion criteria
Patients without geolocation in mainland France and patients in palliative care settings were excluded. Hospital stays with an admission mode other than from home were also excluded. So were iterative stays: chemotherapy or radiotherapy sessions, transplant context, renal haemodialysis sessions, and cataract surgery. Readmissions within 30 days after an iterative stay were not considered unplanned readmissions.
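The inclusion and non-inclusion rules can be expressed as a single predicate over a stay record. This sketch assumes each stay has already been annotated with hypothetical fields (`age`, `care_type`, `geolocated_mainland_france`, `palliative`, `entry_mode`, `stay_type`). The ATIH RH-30 methodology works on coded hospital discharge data and may involve more detailed coding rules than the ones shown here.

```python
# Hypothetical category labels; real data would use coded values from the CDW.
CARE_TYPES = {"medicine", "surgery", "obstetrics_gynaecology"}
ITERATIVE_STAY_TYPES = {
    "chemotherapy", "radiotherapy", "transplant", "haemodialysis", "cataract_surgery",
}

def is_eligible(stay: dict) -> bool:
    """Return True if a stay meets the inclusion criteria and no non-inclusion criterion."""
    return (
        stay["age"] >= 18                                   # adults only
        and stay["care_type"] in CARE_TYPES                 # medical, surgical or obstetric/gynaecological care
        and stay["geolocated_mainland_france"]              # geolocated in mainland France
        and not stay["palliative"]                          # not a palliative care setting
        and stay["entry_mode"] == "home"                    # admitted from home
        and stay["stay_type"] not in ITERATIVE_STAY_TYPES   # not an iterative stay
    )

base = {
    "age": 70, "care_type": "medicine", "geolocated_mainland_france": True,
    "palliative": False, "entry_mode": "home", "stay_type": "standard",
}
print(is_eligible(base), is_eligible({**base, "stay_type": "haemodialysis"}))
```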
Unplanned readmission definition
An unplanned readmission was defined as a hospital stay beginning within 30 days after discharge from the index stay. An index stay was defined as a hospital stay that ended with a discharge to home and was not preceded by another hospitalisation in the 30 days before its admission date. A stay was included if it met the inclusion criteria, met none of the non-inclusion criteria, and belonged to a patient with at least one index stay. The aim was to predict readmission within 30 days of the index stay.
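The labelling of index stays can be sketched for a single patient's stays. This sketch makes several assumptions:

- the field names `admission`, `discharge` and `discharge_mode` are hypothetical;
- "within 30 days" is read as at most 30 days;
- the rule about readmissions after iterative stays is left out for brevity.

```python
from datetime import date, timedelta

WINDOW = timedelta(days=30)  # assumption: "within 30 days" means at most 30 days

def label_index_stays(stays):
    """Return (stay, readmitted) pairs for the index stays of one patient."""
    stays = sorted(stays, key=lambda s: s["admission"])
    labelled = []
    for i, stay in enumerate(stays):
        if stay["discharge_mode"] != "home":
            continue  # an index stay must end with a discharge to home
        # Not an index stay if a previous stay ended in the 30 days before this admission.
        if any(stay["admission"] - prev["discharge"] <= WINDOW for prev in stays[:i]):
            continue
        # Readmitted if a later stay starts within 30 days after this discharge.
        readmitted = any(
            timedelta(0) <= nxt["admission"] - stay["discharge"] <= WINDOW
            for nxt in stays[i + 1:]
        )
        labelled.append((stay, readmitted))
    return labelled

history = [
    {"admission": date(2015, 1, 1), "discharge": date(2015, 1, 5), "discharge_mode": "home"},
    {"admission": date(2015, 1, 20), "discharge": date(2015, 1, 22), "discharge_mode": "home"},
    {"admission": date(2015, 6, 1), "discharge": date(2015, 6, 3), "discharge_mode": "home"},
]
result = label_index_stays(history)
print([(s["admission"].isoformat(), r) for s, r in result])
```

In this toy history, the second stay counts as the readmission of the first and is therefore not itself an index stay. The June stay is a new index stay with no readmission.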
Data extraction
Covariates extracted from the two CDWs [13] were: age, sex, length of stay, number of previous hospitalisations, illness severity (F-DRG classification), major diagnostic categories, medical diagnoses and comorbidities (ICD-10), medical and/or surgical procedures (CCAM classification), hospital department, and available laboratory data.
ICD-10 and CCAM codes were grouped by their first three characters, which indicate the diagnostic category (ICD-10) and the procedure and organ (CCAM), respectively. Laboratory data were coded as binary variables: a test was considered abnormal if at least one of its values during the stay was outside the normal range.
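The two encodings can be sketched as follows. The helper names are hypothetical, `DZQM006` is used only as an example of the CCAM code format, and the laboratory normal ranges are made-up values rather than those of the hospitals' local thesauri. Tests not measured during the stay simply do not appear in the output, because the text does not say how they were coded.

```python
def group_code(code: str, n: int = 3) -> str:
    """Truncate an ICD-10 or CCAM code to its first n characters."""
    return code.strip().upper()[:n]

def lab_abnormal_flags(results, normal_ranges):
    """Binary abnormality flag per laboratory test for one stay.

    results: iterable of (test_name, value) pairs measured during the stay.
    normal_ranges: {test_name: (lower_bound, upper_bound)}.
    A test is flagged 1 if at least one of its values is outside the range.
    """
    flags = {}
    for test, value in results:
        low, high = normal_ranges[test]
        outside = value < low or value > high
        flags[test] = flags.get(test, 0) | int(outside)
    return flags

diagnoses = {group_code(c) for c in ["I21.4", "E11.9", "I21.0"]}  # {"I21", "E11"}
procedures = {group_code(c) for c in ["DZQM006"]}                 # {"DZQ"}
ranges = {"potassium": (3.5, 5.0), "sodium": (135.0, 145.0)}       # illustrative ranges
labs = lab_abnormal_flags(
    [("potassium", 4.1), ("potassium", 5.6), ("sodium", 139.0)], ranges
)
print(sorted(diagnoses), sorted(procedures), labs)
```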
Socio-demographic data were aggregated at the municipality level, or at the district level for municipalities with more than 10,000 inhabitants [17]. They were linked to each patient using the geographic map background and the patient's geolocation:
- Median annual pre-tax household income
- Proportion of the population aged ≥15 years by education level
- Unemployment rate in the population aged 15 to 64 years
- Socio-professional categories of the population aged 15 to 64 years
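The linkage rule can be sketched as a lookup. The sketch assumes each patient's geolocation has already been resolved to a municipality identifier and, where applicable, a district identifier, for example by a point-in-polygon query against the map background. All identifiers, field names and values below are hypothetical.

```python
DISTRICT_THRESHOLD = 10_000  # municipalities above this size use district-level data

def area_indicators(patient, municipality_population, district_data, municipality_data):
    """Return the area-level indicators to attach to a geolocated patient."""
    municipality = patient["municipality"]
    if municipality_population[municipality] > DISTRICT_THRESHOLD:
        # Large municipality: use the finer district-level aggregates.
        return district_data[patient["district"]]
    # Smaller municipality: only municipality-level aggregates are available.
    return municipality_data[municipality]

# Hypothetical reference tables.
population = {"town_A": 250_000, "village_B": 1_200}
districts = {"district_A1": {"median_income": 20_500, "unemployment_rate": 0.11}}
municipalities = {"village_B": {"median_income": 22_800, "unemployment_rate": 0.07}}

urban = {"municipality": "town_A", "district": "district_A1"}
rural = {"municipality": "village_B", "district": None}
print(area_indicators(urban, population, districts, municipalities))
print(area_indicators(rural, population, districts, municipalities))
```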
Data Processing
Missing data were imputed using the k-nearest neighbours (k-NN) method, taking the mean of the five nearest neighbours; missing numerical values were assumed to be missing at random (MAR). Features with an event frequency below 1% were removed to avoid unstable and uninterpretable model behaviour driven by rare events. For the logistic regression models, only covariates significant in univariate analysis (p ≤ 0.05) were retained. A multivariable model with stepwise variable selection was then fitted to obtain the most parsimonious model maximising the area under the receiver operating characteristic (ROC) curve (AUC). The prediction algorithms were those most frequently described in the literature: logistic regression, Random Forest, Gradient Boosting, Naive Bayes, and Neural Networks [9, 18–21].
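The imputation and rare-feature filtering steps can be sketched in plain Python. The imputation replaces a missing value with the mean of that variable over the k = 5 nearest records in which it is observed. Euclidean distance is computed on the variables observed in both records. The sketch assumes the variables are already on comparable scales. In practice a library implementation would normally be used, for instance scikit-learn's `KNNImputer`. The filter keeps a binary feature only if its frequency is at least 1%. The function names are hypothetical.

```python
import math

def knn_impute(rows, k=5):
    """Replace None entries with the mean of that variable in the k nearest records.

    Distance is Euclidean over the variables observed in both records; only
    records where the target variable is observed are used as donors.
    """
    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue  # no common variable to compute a distance on
                distance = math.sqrt(sum((a - b) ** 2 for a, b in shared))
                donors.append((distance, other[j]))
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed

def drop_rare_features(rows, names, min_frequency=0.01):
    """Keep binary (0/1) features whose frequency is at least min_frequency."""
    n = len(rows)
    keep = [j for j in range(len(names))
            if sum(row[j] for row in rows) / n >= min_frequency]
    return [[row[j] for j in keep] for row in rows], [names[j] for j in keep]

data = [[1.0, 2.0], [1.1, None], [0.9, 2.2], [5.0, 10.0]]
print(knn_impute(data, k=2)[1])
```

With `k=2`, the missing value in the second record is filled from its two closest records (first and third). The distant fourth record is ignored.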
The primary outcome was the AUC. Secondary outcomes were the sensitivity, specificity, and positive and negative predictive values at the cut-point closest to the top-left corner of the ROC space. Computation time was measured after the data management step, from the start of feature selection to the end of model fitting.
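The evaluation metrics can be sketched as follows:

- AUC is computed as the probability that a randomly chosen readmitted stay receives a higher score than a non-readmitted one, with ties counted as one half.
- The cut-point is the score threshold minimising the Euclidean distance to the top-left corner, where sensitivity = 1 and specificity = 1.
- A stay is predicted positive when its score is at least the threshold.

The function names are hypothetical and the scores are toy values.

```python
import math

def auc(scores, labels):
    """AUC as P(score of a positive > score of a negative), ties counting 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def metrics_at(scores, labels, threshold):
    """Sensitivity, specificity, PPV and NPV when predicting positive if score >= threshold."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        predicted = s >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
    }

def closest_to_top_left(scores, labels):
    """Threshold minimising the distance to (sensitivity=1, specificity=1)."""
    def distance(t):
        m = metrics_at(scores, labels, t)
        return math.hypot(1 - m["sensitivity"], 1 - m["specificity"])
    best = min(sorted(set(scores)), key=distance)
    return best, metrics_at(scores, labels, best)

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0]
threshold, metrics = closest_to_top_left(scores, labels)
print(auc(scores, labels), threshold, metrics)
```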
As a secondary objective, model explainability was assessed by identifying the covariates each algorithm considered most important. Importance was measured with a criterion suited to each algorithm: odds ratio (OR) for logistic regression, relative influence for Gradient Boosting, Gini index for Random Forest, and Garson's algorithm for Neural Networks. The most relevant covariates were then compared between hospitals.
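Garson's algorithm can be sketched for a network with a single hidden layer and a single output. Each input's contribution through a hidden neuron is the absolute product of the input-to-hidden and hidden-to-output weights. These contributions are normalised within each hidden neuron, summed per input, and rescaled to sum to one. Bias weights are ignored, as in the usual formulation. The architecture and weights below are assumptions for illustration, not the study's networks.

```python
def garson(input_hidden, hidden_output):
    """Relative importance of each input (sums to 1).

    input_hidden[i][j]: weight from input i to hidden neuron j.
    hidden_output[j]: weight from hidden neuron j to the single output.
    """
    n_in, n_hidden = len(input_hidden), len(hidden_output)
    # Absolute contribution of input i through hidden neuron j.
    contrib = [[abs(input_hidden[i][j] * hidden_output[j]) for j in range(n_hidden)]
               for i in range(n_in)]
    # Normalise contributions within each hidden neuron.
    shares = [[0.0] * n_hidden for _ in range(n_in)]
    for j in range(n_hidden):
        total = sum(contrib[i][j] for i in range(n_in))
        for i in range(n_in):
            shares[i][j] = contrib[i][j] / total if total else 0.0
    # Sum over hidden neurons, then rescale so importances sum to one.
    per_input = [sum(row) for row in shares]
    grand_total = sum(per_input)
    return [value / grand_total for value in per_input]

# Hypothetical 2-input, 2-hidden-neuron network.
print(garson([[1.0, -2.0], [3.0, 2.0]], [1.0, 0.5]))
```

Because only absolute weights are used, the resulting importances indicate how strongly each covariate influences the output, not the direction of the effect. Odds ratios in logistic regression do convey direction.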
Data handling and pre-processing were performed with R version 3.6.0 in RStudio [22].
Ethics and Consent
The clinical data warehouses have been authorised by the Commission Nationale de l'Informatique et des Libertés (CNIL, the French national commission for information technology and civil liberties). This authorisation requires that patients be informed both individually and collectively. Patients have the right to object to the reuse of their data.