Data collection and study design:
-
SAFE-ICU Framework: All data reported in this study were collected from the Paediatric ICU of AIIMS, New Delhi, a tertiary care hospital in India. The Paediatric ICU had 8 beds, including neonatal beds. We set up our own servers to collect the physiological vital signs, which are streamed periodically in real time. We also warehoused laboratory investigations, daily doctors’ and nurses’ notes, treatment charts, and thermal imaging.
-
Ethical Approval and Patient Consent: The study was carried out with the approval of the Pediatric Intensive Care Unit of All India Institute of Medical Sciences, New Delhi, India. Since thermal images capture only infrared radiation, they do not reveal patient identity, and the study did not involve any contact with the patient or any change in routine patient care. Hence a waiver of consent was sought and granted by the Institute Ethics Committee (Ref. No. IEC/NP-211/08.05.2015, AA-2/09.02.2017). All experiments were performed in accordance with relevant guidelines and regulations as approved by the ethics committee.
-
Vital records from multi-parameter monitoring: For monitoring the patients, the Paediatric ICU is equipped with Mindray™ monitors. We installed a dedicated server for querying and storing the streaming vitals data from the Central Monitoring Station (CMS). A client socket program was written against the device protocols, and in-house software queried the CMS using the Health Level 7 (HL7) Standards[22]. The vital data were received at a resolution of 15 seconds for unsolicited data and 1 second for real-time data. A character array of 64*1024 bytes was used as the buffer for receiving the stream. The software automatically logged the data to a pipe-delimited text file, with a new file generated every day at 00:00 hours. These high-resolution vital data have been warehoused for all ICU patients from February 2016 to May 2020.
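As a rough illustration of the logging step, the sketch below splits an HL7 v2 message into segments, pulls out the observation (OBX) fields, and formats them as pipe-delimited records for a daily file. The sample message, the function names, and the file-naming scheme are assumptions made for illustration; they are not the in-house software or the actual monitor output.

```python
from datetime import datetime

BUFFER_SIZE = 64 * 1024  # bytes read per socket call, matching the buffer size in the text


def parse_obx(message: str):
    """Return (identifier, value, units) for each OBX segment of an HL7 v2 message."""
    rows = []
    # HL7 v2 separates segments with carriage returns and fields with pipes.
    for segment in message.strip().split("\r"):
        fields = segment.split("|")
        if fields[0] == "OBX" and len(fields) > 6:
            # OBX-3 = observation identifier, OBX-5 = value, OBX-6 = units
            rows.append((fields[3], fields[5], fields[6]))
    return rows


def daily_log_name(ts: datetime) -> str:
    """Hypothetical naming scheme: one pipe-delimited file per calendar day."""
    return f"vitals_{ts:%Y%m%d}.txt"


def to_log_lines(ts: datetime, bed: str, message: str):
    """Format each observation as one pipe-delimited record."""
    return [
        "|".join([ts.isoformat(), bed, ident, value, units])
        for ident, value, units in parse_obx(message)
    ]


# Made-up message, for illustration only.
sample = (
    "MSH|^~\\&|CMS|||||20160201000015||ORU^R01|1|P|2.3.1\r"
    "OBX|1|NM|HR^Heart Rate||128|bpm\r"
    "OBX|2|NM|NIBP_SYS^Systolic BP||92|mmHg\r"
)
```

Because the file name depends only on the calendar date of the timestamp, records arriving after midnight go to a new file without any explicit rollover logic.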
-
Cohort based on Binary shock index: The SAFE-ICU described earlier has warehoused over 1.5 million patient-hours of monitoring data from the PICU. It was used to extract time-stamped heart rate and blood pressure recordings for each patient over the 0–6 hour window. Shock index was calculated as the ratio of the median heart rate to the median systolic blood pressure (non-invasive or arterial). The medians were computed over moving sequential windows of 30 data points at a resolution of 15 s. The Shock-Index Paediatric Age-Adjusted (SIPA) was then used to compute an age-specific, binarized shock/no-shock outcome for each patient [16].
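The calculation can be sketched as follows. This is a minimal illustration under stated assumptions: the windows are taken as consecutive and non-overlapping, the per-window values are summarised by their median before thresholding, and the `cutoff` argument stands in for the age-specific SIPA threshold from [16], which is not reproduced here.

```python
from statistics import median


def windowed_medians(values, window=30):
    """Median of each consecutive, non-overlapping window of `window` samples."""
    return [
        median(values[i:i + window])
        for i in range(0, len(values) - window + 1, window)
    ]


def shock_index(heart_rate, systolic_bp, window=30):
    """Shock index per window: median heart rate / median systolic blood pressure."""
    hr_med = windowed_medians(heart_rate, window)
    sbp_med = windowed_medians(systolic_bp, window)
    return [hr / sbp for hr, sbp in zip(hr_med, sbp_med)]


def binarize(si_values, cutoff):
    """Return 1 (shock) if the median shock index exceeds the age-specific cutoff, else 0.

    `cutoff` is a placeholder for the SIPA threshold of the patient's age group.
    """
    return int(median(si_values) > cutoff)


# Two windows of 30 samples each at 15 s resolution (made-up values).
hr = [120] * 30 + [150] * 30
sbp = [100] * 60
si = shock_index(hr, sbp)  # one value per 30-sample window
```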
-
Thermal Imaging: Standard procedures for capturing thermal video and operating the camera were followed to minimize disturbance from extraneous factors such as patient positioning and device handling (Supplementary Methods S1). Thermal cameras capture only infrared radiation, so the recordings do not reveal the patient's identity. The camera was placed at a sufficient distance from the patient so that there was no direct contact and no change in routine patient care. The thermal videos were captured in a standard color scale, with the full body of the infant visible. Thermal videos of each patient were collected with a Seek Thermal® camera attached to an Android smartphone, at different time points on different days. Each patient can therefore have different values of shock status, which in turn eliminates bias due to the patient's propensity characteristics such as gender and age. Vital data corresponding to the time stamps of the videos were extracted from the data warehouse (SAFE-ICU)[9] at 15 s intervals. The shock and non-shock groups were compared using either the Wilcoxon rank-sum test or the two-tailed Student’s t-test, after testing for normality with the D'Agostino-Pearson test, using GraphPad Prism version 6.00, GraphPad Software, La Jolla California USA, www.graphpad.com.
Classification into Covered and Uncovered
Patients in the ICU are kept under observation for long durations. Because it is a critical-care area, the patients are covered by a blanket most of the time. The blankets are generally removed only for short periods, when a nurse or a doctor comes to provide care. To train the model, images were augmented and normalized by a mean and variance computed specifically from the thermal data. A ResNet-152 architecture was trained using the PyTorch[23] framework in Python3[24] to classify each frame as covered or uncovered, where uncovered means that the abdomen and feet are visible. The model was finally applied to videos sampled at 1 fps.
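The dataset-specific normalization can be illustrated with a small, framework-free sketch. It assumes single-channel frames stored as nested lists and a single mean and standard deviation pooled over all pixels; the actual pipeline may compute per-channel statistics, and the function names are hypothetical.

```python
from math import sqrt


def dataset_stats(images):
    """Mean and standard deviation pooled over every pixel of every image."""
    pixels = [p for img in images for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return mean, sqrt(var)


def normalize(img, mean, std):
    """Shift and scale one image with the dataset-level statistics."""
    return [[(p - mean) / std for p in row] for row in img]


# One made-up 2x2 thermal frame.
frames = [[[0.0, 2.0], [4.0, 6.0]]]
mean, std = dataset_stats(frames)
normalized = normalize(frames[0], mean, std)
```

Computing the statistics from the thermal frames themselves, rather than reusing statistics from natural-image datasets, keeps the inputs centred for this particular intensity distribution.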
Multiple Person Detection
In the intensive care unit, caregivers regularly attend to the patient and may enter the field of view of the camera mounted over the bed. For the CPD extraction task, this additional person must be filtered out so that the algorithm does not confuse the caregiver with the patient. A variety of images of the patient alone, and of the patient together with a caregiver, were collected and augmented. Such frames could simply have been discarded, but a few videos in the dataset contained the caregiver throughout their entire duration. The now visible area could be further used for CPD extraction. The thermal images were manually annotated for training the multiple person detector.
We used YOLOv3[25] in PyTorch, with DarkNet53 as its base architecture. Finally, the trained detectors were evaluated using Intersection over Union (IoU), and the best-performing detector was used for detecting and masking the caregiver.
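For reference, IoU between a predicted and an annotated bounding box can be computed as in the sketch below. The `(x_min, y_min, x_max, y_max)` box format is an assumption; detectors often use other conventions, such as centre coordinates with width and height.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle, if any.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1) = 1/7.
overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
```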
Segmentation and CPD Extraction from Abdomen and Feet
Nagori A. et al.[15] showed that the probability of shock depends directly on the Center-to-Peripheral Difference (CPD). For this study, the abdomen was taken as the center and the foot as the periphery. The images were manually annotated pixel-wise using js-annotator-tool. The target maps contained 3 one-hot encoded layers corresponding to the abdomen, feet, and background. The input images were normalized with the mean and variance computed from the distribution of the dataset in use. Images were padded so that their aspect ratio was preserved whenever the input dimension changed. To compensate for the small number of pixel-wise segmented training images, a ResUNet with a ResNet-18 encoder pre-trained on ImageNet was used. UNet[26], which was introduced specifically for segmenting scarce medical data, gathers both local and global information even when data are limited, and thus segments the images efficiently. The skip connections from the encoder to the decoder let the model take the original pixels at each scale into account during reconstruction at the decoder, and thus learn finer details efficiently. The shorter skip connections within the ResNet-18 encoder help mitigate the vanishing gradient problem and thus make learning more efficient[23]. A cutoff threshold was applied to the predicted outputs to remove weakly predicted pixels. The detected area was used as a region of interest in the original image, and the mode of the detected probabilities was taken as the point of temperature extraction within the segmented abdomen and feet. The difference was divided by the abdomen value to keep the CPD robust to thermal noise.
Difference percent = \(\frac{\text{Abdomen Intensity} - \text{Foot Intensity}}{\text{Abdomen Intensity}} \times 100\)
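A minimal sketch of the extraction step follows. It is a simplification under stated assumptions: the cutoff of 0.5 is a placeholder (the threshold used in the study is not given), and the representative value of each region is taken here as the mode of the pixel intensities inside the thresholded mask, which is only one possible reading of the procedure described above.

```python
from statistics import mode


def region_intensity(image, prob_map, threshold=0.5):
    """Mode of the image intensities where the predicted probability passes the cutoff.

    Returns None when no pixel survives the threshold.
    """
    kept = [
        pixel
        for img_row, prob_row in zip(image, prob_map)
        for pixel, prob in zip(img_row, prob_row)
        if prob >= threshold
    ]
    return mode(kept) if kept else None


def cpd_percent(abdomen, foot):
    """Center-to-peripheral difference as a percentage of the abdomen intensity."""
    return (abdomen - foot) / abdomen * 100


# Made-up 2x2 example: three pixels pass the cutoff, two of them with intensity 10.
image = [[10, 10], [20, 30]]
abdomen_probs = [[0.9, 0.8], [0.2, 0.7]]
abdomen_value = region_intensity(image, abdomen_probs)

# (200 - 150) / 200 * 100 = 25.0
example_cpd = cpd_percent(200, 150)
```

Dividing by the abdomen intensity makes the measure relative, so a uniform scaling of the whole frame leaves the CPD unchanged.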
LSTM Time Series Sequence Classification
The videos were sampled at 1 fps to extract CPD data from every available uncovered window. Windows of 256 data points, corresponding to 256 s (4.27 min), were taken as input to the LSTM-based classifier. Windows shorter than 256 points were padded with 0s, and windows longer than 256 points were split in an overlapping fashion where necessary. Each CPD value, together with the heart rate at the corresponding time point, was labelled with the shock index and hence with the presence of shock/non-shock. Missing heart rate data were imputed by linear interpolation if less than 10% of the time series was missing. Since the data are highly imbalanced, with more non-shock sequences, the training data were augmented with the SMOTE[27] oversampling method. The LSTM sequence classifier was followed by a series of dense layers with a dropout of 0.2, and then by a sigmoid layer that outputs the binary shock index and hence the occurrence of shock/no-shock.
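The windowing and imputation rules can be sketched as below. The stride of 128 samples, the alignment of the last window to the end of the series, and the handling of gaps at the series edges (nearest known value) are assumptions made for illustration; the text specifies only the window length, the zero padding, the overlap, and the 10% limit.

```python
def make_windows(series, length=256, stride=128):
    """Split a series into windows of `length`; zero-pad a series that is too short."""
    if len(series) <= length:
        return [list(series) + [0.0] * (length - len(series))]
    windows = []
    start = 0
    while start + length < len(series):
        windows.append(list(series[start:start + length]))
        start += stride
    # Align the final window to the end so that no samples are dropped.
    windows.append(list(series[-length:]))
    return windows


def interpolate_missing(series, max_missing_frac=0.10):
    """Linearly interpolate None entries; return None if 10% or more are missing."""
    missing = [i for i, v in enumerate(series) if v is None]
    if not missing:
        return list(series)
    if len(missing) >= max_missing_frac * len(series):
        return None
    known = [i for i, v in enumerate(series) if v is not None]
    out = list(series)
    for i in missing:
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:        # gap at the start: hold the first known value
            out[i] = series[right]
        elif right is None:     # gap at the end: hold the last known value
            out[i] = series[left]
        else:
            weight = (i - left) / (right - left)
            out[i] = series[left] + weight * (series[right] - series[left])
    return out


# A 300-sample series yields two overlapping windows: samples 0-255 and 44-299.
windows = make_windows(list(range(300)))
```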
Linear Mixed-Effects and Random Forest sequence classification on tsfresh features
The tsfresh[28] features were extracted from the 256-length sequences, and the models were trained on the same train and validation distributions as the previous LSTM model. The ‘Boruta’[29] package in R was used for feature selection. The Variance Inflation Factor (VIF) was then used to reduce multicollinearity in the data: if the VIF of a variable exceeds 10, the collinearity is considered problematic and that variable is removed. The remaining features were used to train the linear mixed-effects and random forest models.
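The VIF filter can be illustrated with a dependency-free sketch. It relies on the identity that the VIF of each feature equals the corresponding diagonal entry of the inverse correlation matrix. Dropping the single worst feature and recomputing until every VIF is at most 10 is an assumed procedure, since the text does not state how removal was iterated, and the function names and data are hypothetical. Constant (zero-variance) columns are not handled.

```python
from math import sqrt


def correlation_matrix(columns):
    """Pearson correlation matrix for a dict of equal-length feature columns."""
    unit = {}
    for name, col in columns.items():
        mean = sum(col) / len(col)
        dev = [v - mean for v in col]
        norm = sqrt(sum(d * d for d in dev))
        unit[name] = [d / norm for d in dev]
    names = list(columns)
    return [[sum(a * b for a, b in zip(unit[i], unit[j])) for j in names] for i in names]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * c for v, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF of each feature: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return {name: inverse[i][i] for i, name in enumerate(columns)}


def drop_collinear(columns, threshold=10.0):
    """Drop the feature with the largest VIF until all VIFs are <= threshold."""
    kept = dict(columns)
    while len(kept) > 1:
        scores = vif(kept)
        worst = max(scores, key=scores.get)
        if scores[worst] <= threshold:
            break
        del kept[worst]
    return list(kept)


# Made-up features: x2 is almost a copy of x1, x3 is only weakly related to both.
features = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "x2": [1.1, 1.9, 3.2, 3.8, 5.1, 5.9],
    "x3": [2.0, -1.0, 0.0, 1.0, -2.0, 0.5],
}
kept = drop_collinear(features)
```

In this example one of the two near-duplicate features is removed and `x3` is retained.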
Direct Classification of thermal images/videos for future risk of shock
Apart from CPD extraction, an attempt was made to classify shock/no-shock directly from whole images or videos. In one direction, we classified each video frame individually and experimented with several modern convolutional neural network (CNN) architectures. Total-variation (Chambolle) denoising, data augmentation, and undersampling/oversampling were applied, giving a best shock detection AUROC of 0.60 with ResNet-50. The information that can be extracted from a single image frame is, however, very limited. We therefore also used continuous video samples of length 256 s as direct input to combinations of various CNN and LSTM models, trained in a time-distributed manner. As a direct extension of the image classification problem, this approach suffered from similar limitations.
Outcome variable - Binary shock index
The SAFE-ICU initiative enabled this research to gather the PICU data and extract the vitals and the corresponding time stamps at the 0th hour (the time of video capture) and over the next 6 hours. Shock index was taken as the ratio of the median heart rate to the median systolic blood pressure (non-invasive or arterial), for moving sequential windows of 30 data points at a resolution of 15 s. The Shock-Index Paediatric Age-Adjusted (SIPA) was then used to compute the age-specific binary outcome for each patient.
Time Points
The time at which the video was captured was taken as the 0th hour, and the predictions of shock/no-shock were performed for the next 6 hours.
Model Evaluation: The video data were first partitioned patient-wise, so that no patient appeared in more than one of the train, validation, and test sets. For the 10-fold cross-validation, the data were partitioned into these three sets in the ratio 60:20:20 in a stratified manner, i.e. keeping the proportion of the low-prevalence shock class comparable in all three sets. The training data were augmented for the under-represented shock class using the SMOTE oversampling method; the validation and test sets were left unchanged in size in each fold. The models were assessed mainly on the Area Under the Precision-Recall Curve (AUPRC) and the Area Under the Receiver Operating Characteristic curve (AUROC). Other standard metrics, namely F1-score, PPV, NPV, specificity, and sensitivity, were evaluated at the Youden’s Index (J)[30]. Because class prevalence strongly affects threshold-dependent metrics in the medical domain, it is important to report them at the Youden’s Index.
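Youden's Index is J = sensitivity + specificity - 1, and the operating threshold is the one that maximizes J. The sketch below selects that threshold from the observed scores and reports the threshold-dependent metrics there. The labels and scores are made up, and the choice of candidate thresholds (every distinct score) is an assumption.

```python
def confusion(y_true, scores, threshold):
    """Counts of TP, FP, TN, FN when predicting positive for score >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        pred = int(s >= threshold)
        if pred and y:
            tp += 1
        elif pred and not y:
            fp += 1
        elif not pred and not y:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def ratio(num, den):
    """Safe division that returns 0.0 for an empty denominator."""
    return num / den if den else 0.0


def youden_threshold(y_true, scores):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1."""
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp, fp, tn, fn = confusion(y_true, scores, t)
        j = ratio(tp, tp + fn) + ratio(tn, tn + fp) - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


def metrics_at(y_true, scores, threshold):
    """Sensitivity, specificity, PPV, NPV and F1 at a fixed threshold."""
    tp, fp, tn, fn = confusion(y_true, scores, threshold)
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


y = [0, 0, 0, 1, 0, 1, 1]
p = [0.1, 0.2, 0.3, 0.4, 0.5, 0.7, 0.9]
threshold, j = youden_threshold(y, p)   # threshold 0.4, J = 0.75
report = metrics_at(y, p, threshold)
```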