Recently, continuous physiological monitoring integrated with time series analysis and forecasting has drawn biomedical researchers’ attention as a way to prevent respiratory and cardiac complications and death, especially in the postoperative period (ElMoaqet et al., 2016). This is a vital field: 1–6% (thousands annually) of children cared for in an intensive care unit (ICU) will experience a cardiac arrest (Kennedy and Turley, 2011). These arrests happen despite continuous ECG and pulse oximetry monitoring and frequent blood pressure measurements. Most active research provides predictions based on classification algorithms, which map the input physiological signal(s) to output values for diagnosis and labeling without modeling the underlying dynamics of the physiological data. In contrast, building accurate and reliable multi-step forecasting systems remains an open problem. Furthermore, deep learning algorithms for multi-step forecasting models still need to be explored (Masum et al., 2018, Lim and Zohren, 2021). Continuous monitoring should be accompanied by adjustable long-horizon forecasting to support timely clinical decision making. Accurate forecasting of physiological time series can equip treatment procedures and enable healthcare professionals to intervene early and prevent adverse clinical events. Time series forecasting has been utilized to design intelligent threshold-based alarm monitoring, which is the state-of-the-art technology for hospitalized patients (Masum, 2019). These systems, adjusted for high-sensitivity alarms, forecast an adverse event from the monitored signal(s) to distinguish anomalous behavior from a normal stream.
However, although the desire for intelligent monitoring and its benefits has grown recently, current systems have not been shown to reproducibly improve outcomes in hospitalized patients (Watkinson et al., 2006). New techniques for state detection, such as the fusion of physiological signals from multiple channels, have been developed (Tarassenko et al., 2006). Nevertheless, these methods have not yet been proven to improve patient outcomes. Two common challenges with intelligent monitoring systems are i) the relative rarity of adverse events, especially in the early postoperative period (Watkinson et al., 2006, Watkinson and Tarassenko, 2012); and ii) the stationarity condition, the most important assumption of many time series models, which requires non-stationary time series to be transformed into stationary ones (Hyndman and Athanasopoulos, 2018, Dickey, 1984).
The first challenge limits the clinical precision of intelligent systems. Additionally, an investigation of patterns of in-hospital deaths shows that late detection of clinical instability results in delayed recognition and less successful clinical intervention (Lynn and Curry, 2011). ElMoaqet et al. (2016) developed a framework of multi-step-ahead prediction models, together with a performance metric, to compensate for and resolve this challenge in intelligent monitoring systems (ElMoaqet et al., 2016). They proposed a performance metric that evaluates near-term predictions of critical anomaly levels in physiological time series and then used it to build a framework of multi-step-ahead prediction models capable of forecasting those critical levels. As for the second challenge, in the analysis of scaling signals, whether a time series is stationary or non-stationary determines not only the form of its auto-correlations and moments but also the selection of estimators (Kristoufek, 2014). A time series is stationary when its main properties, including mean, variance, and auto-correlation structure, remain constant over time. Conversely, if any of these properties varies over time, the series is categorized as non-stationary (Hyndman and Athanasopoulos, 2018). Additionally, raw acquired data may require preprocessing and/or decomposition ahead of scaling processes (Jebb et al., 2015).
Whether decomposition models are used to better understand the underlying dependencies of the time series (Hyndman and Athanasopoulos, 2018) or only routine data cleaning is performed, classifying the signal as stationary or non-stationary is necessary for further analysis and forecasting. The theory of scaling processes has meaningfully affected the performance of analysis techniques as well as forecasting models in several fields of applied science (Pacheco et al., 2012). Aspects of scaling behavior have been demonstrated in finance (Beran, 1992, Beran, 2017), in the analysis of heart rate variability and EEGs as examples of physiological time series (Cannon et al., 1997, Eke et al., 2000), in mood characterization and other psychological behavioral variables (Delignieres et al., 2006, Jebb et al., 2015), in the modeling of computer network traffic and its lags in Local Area Networks and Wide Area Networks (Lee and Fapojuwo, 2005), and in most signals in physiology and neuroscience (Gujral et al., 2020, Xue et al., 2012), among others. Determining whether a series is stationary or non-stationary is crucial for analysis and estimation purposes, as many techniques have been developed specifically for stationary time series forecasting. Applying an analysis or modeling technique designed for stationary conditions to a non-stationary signal results in ambiguity or a drastic reduction in performance, respectively. Formal statistical tests for stationarity are unit root tests. The most common approach is the augmented Dickey–Fuller (ADF) test (Said and Dickey, 1984), which tests the null hypothesis that a unit root is present in an auto-regressive (AR) model of the series (i.e., under the null hypothesis, the series is non-stationary). Another widely used test is the Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test, which determines whether a time series is non-stationary because of a unit root or stationary around a mean or linear trend (Kwiatkowski et al., 1992).
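Both tests are implemented in standard statistical software. The following is a minimal sketch using statsmodels, with a toy random-walk series standing in for a monitored vital sign (the series and parameter choices are illustrative assumptions, not data or settings from this study); note that the two null hypotheses are complementary.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller, kpss

# Toy random-walk series standing in for a monitored vital sign (illustrative only).
rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=1000)) + 80

adf_stat, adf_p, *_ = adfuller(x)              # H0: a unit root is present (non-stationary)
kpss_stat, kpss_p, *_ = kpss(x, nlags="auto")  # H0: the series is stationary around a constant

# A small ADF p-value rejects non-stationarity; a small KPSS p-value rejects stationarity.
print(f"ADF p = {adf_p:.3f}, KPSS p = {kpss_p:.3f}")
```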
Non-stationarity in a time series may arise from systematic components (trend and seasonal effects) as well as from the underlying dependencies on the past. A procedure is therefore required to remove the systematic trend and seasonal effects that are not of interest from the mean level of the series. The most important method is differencing, which makes a non-stationary time series stationary in its mean. In the simplest case of a linear trend, a series of first differences can effectively “detrend” the original series (Hyndman and Athanasopoulos, 2018). When the time series itself exhibits a varying trend, even the first difference may not yield a completely stationary series; therefore, higher orders of differencing are applied. In practice, first or second differences will nearly always make the mean stationary, and it is almost never necessary to go further (Chan and Cryer, 2008). Over-differencing the time series should be avoided in order to reach the lowest variance of the transformed series (Jebb et al., 2015). Integer-order differencing makes the series stationary, but the cost is removing all memory from the original series (Lopez de Prado, 2018). Although stationarity is a necessary property for inferential purposes, this creates a dilemma, especially for time series forecasting, where memory preservation is the basis of predictive models.
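As a small illustration with toy values (not patient data), first differencing removes a linear trend, and repeating the difference handles a slowly varying one:

```python
import pandas as pd

# Toy series with a linear trend (illustrative values only).
s = pd.Series([80, 83, 86, 89, 92, 95])

d1 = s.diff()          # first difference: 3, 3, 3, ... -> the linear trend is removed
d2 = s.diff().diff()   # second difference: used when the trend itself varies over time
```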
The main shortcoming of Auto-Regressive Integrated Moving Average (ARIMA) modeling of time series is imposed by the differencing used to achieve stationarity, which erases the memory of the series (Sutcliffe, 1994). As a mathematical remedy, fractional-order differencing ensures the stationarity of the data while preserving as much memory as possible (Hosking, 1981). It allows a much wider and more realistic range of behavior for the trend and seasonal components than traditional integer-order differencing. Nevertheless, fractionally differencing a stochastic process carries a computational burden. Surprisingly, after Hosking’s paper in the 1980s, there was a pause in the use of the fractional scheme during which the literature on the subject was scarce, i.e., only eight journal articles were published (Lopez de Prado, 2018).
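For illustration, the binomial-weight expansion of (1 − B)^d can be implemented directly. The sketch below is a simplified fixed-threshold truncation (the threshold and toy series are assumptions, not the scheme used in this study); it reduces to the ordinary first difference at d = 1 and preserves long memory for 0 < d < 1.

```python
import numpy as np

def frac_diff_weights(d, num_weights):
    """Binomial weights of the (1 - B)^d expansion (Hosking, 1981):
    w_0 = 1, w_k = -w_{k-1} * (d - k + 1) / k."""
    w = [1.0]
    for k in range(1, num_weights):
        w.append(-w[-1] * (d - k + 1) / k)
    return np.array(w)

def frac_diff(x, d, threshold=1e-4):
    """Fractionally difference a 1-D series, truncating the weights once
    |w_k| falls below `threshold` (a fixed-width approximation)."""
    x = np.asarray(x, dtype=float)
    w = frac_diff_weights(d, len(x))
    width = int(np.sum(np.abs(w) > threshold))  # limit the computational burden
    w = w[:width]
    out = np.full(len(x), np.nan)
    for t in range(width - 1, len(x)):
        out[t] = np.dot(w, x[t::-1][:width])    # w_0*x_t + w_1*x_{t-1} + ...
    return out

# d = 1 recovers the ordinary first difference; 0 < d < 1 preserves long memory.
hr = np.cumsum(np.random.randn(500)) + 80       # toy non-stationary "heart rate"
hr_fd = frac_diff(hr, d=0.4)
```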
Once the time series has been made stationary by fractional differencing, so that its memory is preserved, the next step is choosing a time series forecasting model. Multi-step forecasting over a long horizon is very challenging. Linear statistical models such as ARIMA are used to predict linear data; however, most real-world applications involve nonlinear data, and nonlinear time series forecasting and analysis are still developing (Kam, 2014). Alternatively, machine learning models are data-driven: they learn patterns within the data to create forecasting models for either linear or nonlinear data. Machine learning has recently been used in healthcare, especially in decision support (Frizzell et al., 2017). In particular, Taieb et al. and Masum et al. described and compared five different forecast strategies: the Recursive strategy, Direct strategy, Direct-Recursive (DirRec) strategy, Multi-Input Multi-Output (MIMO) strategy, and Direct Multi-Output (DIRMO) strategy (Bontempi et al., 2013, Masum et al., 2019). All strategies were evaluated with combinations of long short-term memory networks (LSTM), bidirectional LSTMs (Bi-LSTM), and convolutional neural networks (CNN). However, they did not use fractional differencing as a preprocessing step to make the physiological data stationary. Masum et al. showed that the forecast model using Bi-LSTM with the DIRMO strategy is the most reliable for forecasting heart rate (HR) and blood pressure (BP) time series (Masum et al., 2019). In 2019, Liu and Motani proposed a new approach called generative boosting, which consists of two parts: a generative model and a predictive model. Generative boosting uses LSTM for both parts, leading to a scheme called generative LSTM (GLSTM). The generative model produces synthetic data for the next few time steps, and the predictive model makes long-range predictions based on the observed and generated data. Generative boosting mitigates error propagation in the generative model and reduces the effective prediction horizon of the predictive model. They showed that GLSTM outperforms strong benchmark models, achieving mean absolute percentage errors (MAPE) of 7.41% and 6.17% when predicting HR and systolic blood pressure (SBP) 20 minutes in advance, respectively (Liu et al., 2019). In 2020, Youssef Ali Amer et al. proposed a hybrid KNN-LS-SVM machine learning algorithm, instead of LSTM-based models, for real-time early warning score (EWS) estimation and vital sign time series prediction. They used statistical attributes (e.g., minimum, mean) of the different vital signs over at least one-hour windows as input data to forecast the same attributes one, two, and three hours in advance (Youssef Ali Amer et al., 2020). For cardiology patients, they achieved MAPEs of 4.1%, 4.5%, and 5% when predicting the one-hour average heart rate one, two, and three hours ahead, respectively.
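As an illustration of the MIMO idea with a Bi-LSTM, the sketch below maps a window of past samples of one physiological channel to all future steps jointly (the window length, horizon, and layer sizes are hypothetical, not the configuration of any cited study):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical settings: 64 past samples in, 16 future samples out (MIMO: all
# horizons emitted at once), a single physiological channel (e.g., HR).
LOOKBACK, HORIZON, N_CHANNELS = 64, 16, 1

def make_windows(series, lookback=LOOKBACK, horizon=HORIZON):
    """Slice a 1-D series into (input window, multi-step target) pairs."""
    X, y = [], []
    for t in range(len(series) - lookback - horizon + 1):
        X.append(series[t:t + lookback])
        y.append(series[t + lookback:t + lookback + horizon])
    return np.array(X)[..., None], np.array(y)

model = tf.keras.Sequential([
    layers.Input(shape=(LOOKBACK, N_CHANNELS)),
    layers.Bidirectional(layers.LSTM(64)),   # summarizes the whole input window
    layers.Dense(HORIZON),                   # emits all future steps jointly
])
model.compile(optimizer="adam", loss="mae")

# Toy usage on a synthetic series.
series = np.sin(np.linspace(0, 60, 2000)) + 0.05 * np.random.randn(2000)
X, y = make_windows(series)
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
```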
This study presents a preprocessing step based on fractional-order differencing, followed by a deep learning architecture with direct and iterative steps that utilizes U-Net convolutional networks (Ronneberger et al., 2015) and multi-layer Bi-LSTMs. The U-Net structure, with its skip connections, helps the model transform the raw time series and extract more informative features, which are then fed to multi-layer Bi-LSTMs for long-horizon prediction. The obtained results are reported in the Results section, followed by the Discussion and Conclusion.
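For orientation, the sketch below outlines a 1-D U-Net-style encoder-decoder with skip connections feeding multi-layer Bi-LSTMs; the depth, filter counts, window length, and horizon are illustrative assumptions rather than the configuration proposed in this study.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Illustrative shapes only; not the exact architecture of this study.
LOOKBACK, HORIZON, N_CHANNELS = 64, 16, 1

def conv_block(x, filters):
    x = layers.Conv1D(filters, 3, padding="same", activation="relu")(x)
    return layers.Conv1D(filters, 3, padding="same", activation="relu")(x)

inputs = layers.Input(shape=(LOOKBACK, N_CHANNELS))

# Encoder: convolutions + downsampling, keeping feature maps for the skips.
e1 = conv_block(inputs, 32)
p1 = layers.MaxPooling1D(2)(e1)
e2 = conv_block(p1, 64)
p2 = layers.MaxPooling1D(2)(e2)

# Bottleneck.
b = conv_block(p2, 128)

# Decoder: upsampling + skip connections (concatenate encoder features).
u2 = layers.UpSampling1D(2)(b)
d2 = conv_block(layers.Concatenate()([u2, e2]), 64)
u1 = layers.UpSampling1D(2)(d2)
d1 = conv_block(layers.Concatenate()([u1, e1]), 32)

# Multi-layer Bi-LSTM head on the transformed sequence, then a joint
# multi-step output over the forecast horizon.
h = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(d1)
h = layers.Bidirectional(layers.LSTM(64))(h)
outputs = layers.Dense(HORIZON)(h)

model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="mae")
```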