The data set used
The data used in this study were daily counts of suspected pertussis cases from 25 February 2012 to 23 March 2018. The data are collected nationally through the national registry of the Department of Vaccine-Preventable Diseases at the Iranian Ministry of Health (Fig. 1).
According to the information provided by the national health authorities, no outbreak was detected during the study period. Simulated outbreaks were therefore used to assess the performance of the WOD, EWMA and Poisson regression-based methods in outbreak detection. Two approaches were used to simulate outbreaks. In the first, daily pertussis outbreaks reported in the literature were used as the source of outbreak data (17–24); the extracted outbreaks were injected into the real series of reported suspected pertussis cases in Iran, and the injected days were considered the gold standard. The second approach used outbreaks created by the researcher, which differed in type, duration and size. Three types of outbreak were simulated and injected into the real data: exponential (2,4,8,16,8,4,2), linear (2,4,6,8,6,4,2) and uniform (6,6,6,6,6,6) increases in cases over time. Overall, 560 outbreak days were injected into the real dataset, comprising 10 exponential, 9 linear and 9 uniform outbreaks. The duration of these outbreaks was between 1 and 5 weeks. The number of injected cases was at most 3δ of the mean of the real data, and the interval between outbreaks was 2 months (Figs. 2 and 3).
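The injection step can be sketched in a few lines of Python. This is a minimal illustration only: the function name `inject_outbreak`, the baseline counts and the start index are hypothetical, and only the three outbreak shapes come from the text.

```python
# Outbreak shapes as listed in the text (extra cases per day).
EXPONENTIAL = [2, 4, 8, 16, 8, 4, 2]
LINEAR = [2, 4, 6, 8, 6, 4, 2]
UNIFORM = [6, 6, 6, 6, 6, 6]


def inject_outbreak(baseline, shape, start):
    """Add `shape` to a copy of `baseline` starting at index `start`.

    Returns the new series and a list of booleans marking the injected
    days, which can serve as the gold standard for evaluation.
    """
    if start < 0 or start + len(shape) > len(baseline):
        raise ValueError("outbreak does not fit inside the series")
    series = list(baseline)            # leave the input untouched
    flags = [False] * len(baseline)
    for offset, extra in enumerate(shape):
        series[start + offset] += extra
        flags[start + offset] = True
    return series, flags


# Made-up baseline counts, for illustration only.
baseline = [1, 0, 2, 1, 0, 1, 3, 0, 1, 2, 1, 0]
series, flags = inject_outbreak(baseline, LINEAR, start=3)
```

Returning the day-level flags together with the series keeps the gold standard aligned with the data on which the detectors are run.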
Methods Of Outbreak Detection
Wavelet-based outbreak detector (WOD)
In this study, we used a discrete wavelet transform (DWT) model introduced by Aradhye et al (25) and Shemuli et al (14), referred to as multi-scale statistical process control (MSSPC). In the first step, the time series under study was decomposed using the chosen wavelet (in the current study, the db10 and Haar wavelets were used). This decomposition produces the approximation and detail coefficients at the first level. In the next step, the approximation coefficients of the first level were decomposed to produce the approximation and detail coefficients of level 2. The decomposition was continued up to level 5. A Shewhart control chart was then applied to all detail coefficients and to the last approximation coefficients (level 5): coefficients within the upper and lower limits of the Shewhart chart were set to zero, and coefficients outside the limits were kept. The time series was then reconstructed and monitored by a Shewhart control chart to detect the simulated outbreaks. The upper and lower control limits of the chart were calculated as µ ± kσ, where µ is the mean of the time series under study, σ is its standard deviation, and k is a fixed parameter that can range from 0 to 3; values from 0.5 to 2 were considered in this study. The DWT can be computed with different wavelets; more details about the types of wavelets are given elsewhere (14, 25–27).
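The decomposition, thresholding and reconstruction steps can be sketched as follows. This is a minimal standard-library Python sketch, not the authors' MATLAB implementation. It uses only the Haar wavelet and assumes the series length is a multiple of 2^levels. It computes each coefficient vector's Shewhart limits from that vector's own mean and standard deviation, and flags only values above the upper limit of the original series; both choices are assumptions. All function names are hypothetical.

```python
import math
import statistics

SQRT2 = math.sqrt(2.0)


def haar_step(x):
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Invert one Haar level, interleaving the rebuilt sample pairs."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def keep_outside_limits(coeffs, k):
    """Zero coefficients inside mu +/- k*sigma and keep the rest."""
    mu = statistics.fmean(coeffs)
    sigma = statistics.pstdev(coeffs)
    lo, hi = mu - k * sigma, mu + k * sigma
    return [0.0 if lo <= c <= hi else c for c in coeffs]


def wod_reconstruct(series, k, levels=5):
    """Decompose, threshold all details and the last approximation, rebuild."""
    if len(series) % (2 ** levels) != 0:
        raise ValueError("length must be a multiple of 2**levels")
    approx = [float(v) for v in series]
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(keep_outside_limits(detail, k))
    approx = keep_outside_limits(approx, k)
    for detail in reversed(details):
        approx = haar_inverse_step(approx, detail)
    return approx


def wod_alarms(series, k, levels=5):
    """Indices where the rebuilt series exceeds mu + k*sigma of the input."""
    upper = statistics.fmean(series) + k * statistics.pstdev(series)
    rebuilt = wod_reconstruct(series, k, levels)
    return [t for t, v in enumerate(rebuilt) if v > upper]


# Made-up series: flat baseline with a single spike on day 20.
demo = [2] * 64
demo[20] = 30
demo_alarms = wod_alarms(demo, k=0.5)
```

A longer-support wavelet such as db10 needs boundary handling that this sketch leaves out.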
The Exponentially Weighted Moving Average (EWMA)
The EWMA statistic on day t was defined as follows:
$\mathrm{EWMA}_t = \lambda Y_t + (1 - \lambda)\,\mathrm{EWMA}_{t-1}$ (12)
where Yt is the number of suspected pertussis cases on day t and λ is the weighting parameter, with 0 < λ ≤ 1. Values of λ close to 1 give more weight to the newest data, and small values of λ (close to 0) give more weight to older data (28). In this study, λ was set to 0.46 for the literature-based outbreaks and 0.3 for the researcher-made outbreaks.
The upper control limit, or alarm threshold, for this method was calculated as follows:
$\text{Upper Control Limit} = \mathrm{EWMA}_0 + k \times \sigma_{\mathrm{EWMA}}$ (28–30)
where k is chosen to give the desired confidence level. In the current study, k = 0.5, 1, 1.5, 2 and 3 were considered. σEWMA is the standard deviation of the calculated EWMA statistics at times t to tn, and EWMA0 is the mean of the non-outbreak days. An EWMA statistic above the upper control limit was considered an alarm for an outbreak or aberration. To remove explainable patterns and bring the data closer to a normal distribution, a 13-point moving average (MA) was used; that is, each observation was replaced by the mean of the previous 13 observations.
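The EWMA recursion, the upper control limit and the moving-average smoothing can be sketched together. This is a minimal Python sketch under stated assumptions: the recursion is started at EWMA0 (the mean of non-outbreak days), σEWMA is taken as the population standard deviation of all computed EWMA values, and the first 13 days, which have no full history, are dropped by the smoother. The function names are hypothetical.

```python
import statistics


def trailing_mean(series, window=13):
    """Replace each value by the mean of the previous `window` values.

    The first `window` observations have no full history and are dropped
    (an assumption of this sketch).
    """
    return [statistics.fmean(series[t - window:t])
            for t in range(window, len(series))]


def ewma(series, lam, start):
    """EWMA_t = lam * Y_t + (1 - lam) * EWMA_{t-1}, with EWMA_0 = start."""
    values = []
    prev = start
    for y in series:
        prev = lam * y + (1 - lam) * prev
        values.append(prev)
    return values


def ewma_alarms(series, lam, k, baseline_mean):
    """Return (alarm indices, upper control limit).

    UCL = EWMA_0 + k * sigma_EWMA, where EWMA_0 is `baseline_mean`.
    """
    stats = ewma(series, lam, baseline_mean)
    ucl = baseline_mean + k * statistics.pstdev(stats)
    alarms = [t for t, s in enumerate(stats) if s > ucl]
    return alarms, ucl


# Made-up counts: a flat baseline with one high day.
counts = [1] * 20 + [10] + [1] * 5
alarms, ucl = ewma_alarms(counts, lam=0.5, k=3, baseline_mean=1.0)
```

With λ = 1 the recursion returns the raw series unchanged, which is the limiting case of weighting only the newest observation.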
Poisson Regression-based Method
The Upper control limits of Poisson regression
To determine the upper control limit, the expected mean number of suspected pertussis cases was estimated as follows:
$\log \lambda_j = \alpha + \beta X$
where λj is the expected mean number of cases at time j, X is a factor that affects the expected mean of pertussis cases (such as day or month), α is the intercept and β is the coefficient of X. After these parameters were estimated, the alarm threshold was estimated as follows:
where Vt is the variance of the estimated mean, equal to:
$V_t = \operatorname{var}(\alpha) + t^2 \operatorname{var}(\beta) + 2t \operatorname{cov}(\alpha, \beta)$ (31)
The Z0.90, Z0.95 and Z0.99 values were used to calculate the different thresholds.
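The variance term Vt can be illustrated with a small Poisson regression on time alone. The sketch below uses only the Python standard library and fits log λt = α + βt by Newton-Raphson. The covariance of (α, β) is taken from the inverse Fisher information. The threshold form exp(α + βt + z·sqrt(Vt)) is one common choice and is an assumption, not necessarily the expression used in the study. All function names and the example counts are hypothetical.

```python
import math
from statistics import NormalDist


def _score_and_information(counts, alpha, beta):
    """Score vector and Fisher information sums for log(mu_t) = alpha + beta*t."""
    s0 = s1 = s2 = u0 = u1 = 0.0
    for t, y in enumerate(counts):
        mu = math.exp(alpha + beta * t)
        s0 += mu
        s1 += mu * t
        s2 += mu * t * t
        u0 += y - mu
        u1 += (y - mu) * t
    return (u0, u1), (s0, s1, s2)


def _invert(s0, s1, s2):
    """Inverse of [[s0, s1], [s1, s2]] as (var_alpha, var_beta, cov)."""
    det = s0 * s2 - s1 * s1
    return s2 / det, s0 / det, -s1 / det


def fit_poisson_trend(counts, max_iter=50, tol=1e-10):
    """Newton-Raphson fit; returns alpha, beta and their covariance terms."""
    alpha, beta = math.log(sum(counts) / len(counts)), 0.0
    for _ in range(max_iter):
        (u0, u1), info = _score_and_information(counts, alpha, beta)
        var_a, var_b, cov_ab = _invert(*info)
        step_a = var_a * u0 + cov_ab * u1
        step_b = cov_ab * u0 + var_b * u1
        alpha, beta = alpha + step_a, beta + step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    # Recompute the information at the final estimates for the covariance.
    _, info = _score_and_information(counts, alpha, beta)
    return alpha, beta, _invert(*info)


def variance_at(t, cov):
    """V_t = var(alpha) + t^2 var(beta) + 2 t cov(alpha, beta)."""
    var_a, var_b, cov_ab = cov
    return var_a + t * t * var_b + 2 * t * cov_ab


def upper_limit(t, alpha, beta, cov, level=0.95):
    """Assumed threshold form: exp(alpha + beta*t + z_level * sqrt(V_t))."""
    z = NormalDist().inv_cdf(level)
    return math.exp(alpha + beta * t + z * math.sqrt(variance_at(t, cov)))


# Made-up daily counts, for illustration only.
example = [3, 2, 4, 3, 5, 4, 6, 5, 7, 6]
a_hat, b_hat, cov_hat = fit_poisson_trend(example)
limits = [upper_limit(t, a_hat, b_hat, cov_hat, level=0.95)
          for t in range(len(example))]
```

Changing `level` to 0.90 or 0.99 gives the other two thresholds mentioned in the text.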
Measures Of The Algorithm's Performance
The performance of these methods was measured using sensitivity, specificity, false alarm rate, likelihood ratio and area under the receiver operating characteristic (ROC) curve (AUC). The set of outbreak days was considered the gold standard for calculating these measures and evaluating the performance of the different algorithms.
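These day-level measures can be computed from a list of alarm flags and the gold-standard outbreak flags. The Python sketch below assumes that the likelihood ratio is the positive likelihood ratio (sensitivity divided by false alarm rate) and that the AUC is obtained by the trapezoidal rule over the operating points of one method at different thresholds. Both are assumptions, as are the function names.

```python
import math


def detection_performance(alarms, outbreak_days):
    """Day-level measures from two equal-length lists of booleans.

    Assumes at least one outbreak day and one non-outbreak day.
    """
    if len(alarms) != len(outbreak_days):
        raise ValueError("inputs must have the same length")
    pairs = list(zip(alarms, outbreak_days))
    tp = sum(1 for a, o in pairs if a and o)
    fp = sum(1 for a, o in pairs if a and not o)
    fn = sum(1 for a, o in pairs if not a and o)
    tn = sum(1 for a, o in pairs if not a and not o)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    false_alarm_rate = fp / (fp + tn)
    # Positive likelihood ratio; infinite when there are no false alarms.
    lr_positive = sensitivity / false_alarm_rate if fp else math.inf
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_alarm_rate": false_alarm_rate,
        "lr_positive": lr_positive,
    }


def auc_trapezoid(points):
    """Area under a ROC curve given (false_alarm_rate, sensitivity) points."""
    pts = sorted([(0.0, 0.0)] + list(points) + [(1.0, 1.0)])
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Made-up flags for six days.
alarm_flags = [True, True, False, False, True, False]
truth_flags = [True, False, False, False, True, True]
perf = detection_performance(alarm_flags, truth_flags)
```

Evaluating one detector at several values of k or Z gives one (false alarm rate, sensitivity) point per threshold, which `auc_trapezoid` then combines into a single AUC.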
All analyses were performed using MATLAB R2018a, STATA 15 (StataCorp LLC) and Excel 2010.