DOI: https://doi.org/10.21203/rs.2.22567/v1
Background
Early detection of outbreaks is very important for surveillance systems. Given the importance of the subject and the lack of similar studies in Iran, the aim of this study was to determine the performance of the Wavelet-Based Outbreak Detection method (WOD) in detecting outbreaks and to compare its performance with a Poisson regression-based model and the exponentially weighted moving average (EWMA), using data of simulated pertussis outbreaks in Iran.
Methods
Data on suspected cases of pertussis in Iran from 25th February 2012 to 23rd March 2018 were used. The performance of the WOD (Daubechies 10 and Haar wavelets), the Poisson regression-based method, and EWMA was compared in terms of timeliness and detection of outbreak days using simulated outbreaks (literature-based and researcher-made). Sensitivity, specificity, false alarm and false-negative rates, positive and negative likelihood ratios, area under the ROC curve, and median timeliness were used to assess the performance of the methods.
Results
In the literature-based outbreak simulation, the highest sensitivity and lowest false-negative rate in the detection of injected outbreaks were seen with Daubechies 10 (db10), with sensitivity 0.59 (0.56-0.62), followed by Haar wavelets with 0.57 (0.54-0.60). In the researcher-made outbreaks, the EWMA (K=0.5), with sensitivity 0.92 (0.90-0.94), had the best performance. In terms of timeliness, the WOD methods showed the best performance in early warning of outbreaks in both simulation approaches.
Conclusions
The performance of the WOD in early warning of outbreaks was appropriate. However, it is better to use this method along with other methods in public health surveillance systems.
Outbreaks of infectious diseases are one of the main public health challenges (1). Early detection of outbreaks, timely response, and control of these aberrations are very important for public health surveillance systems (2). The main purpose of a public health surveillance system is the ongoing collection, analysis, and interpretation of data and their dissemination for use in public health practice. Such a system contributes to reducing the morbidity and mortality due to health-related events (3). Additionally, using a suitable method in a surveillance system for early detection of naturally occurring or bioterrorism-related outbreaks plays a very important role in reducing the time between outbreak occurrence and detection (4). Because traditional surveillance systems have many restrictions in the early detection of outbreaks, syndromic surveillance is recommended in such conditions. Syndromic surveillance includes the collection, analysis, interpretation, and dissemination of health-related data defined in public health sectors (5–7). Its objective is to provide early warning of public health threats in near-real-time (8). The important feature of a syndromic surveillance system is early warning or detection of health-related aberrations or outbreaks, which leads to a reduction in morbidity and mortality among affected people (9). Two main tools, temporal and spatial models, are used in syndromic surveillance to detect outbreaks as early as possible (10, 11). Different methods, including Cumulative Sum (CUSUM), EWMA, the Shewhart chart, and time-series models, are the main tools used by syndromic surveillance to detect outbreaks (12). All of these methods follow a two-step procedure for the detection of outbreaks. First, the alarm threshold is determined using baseline values or a non-outbreak period. Second, different algorithms are used for aberration detection with the defined alarm threshold.
The above-mentioned methods have two main problems. The first is related to the baseline period, especially when a real-world data set is used, because the baseline may itself include an outbreak period. The second is related to the nature of surveillance data, which in most conditions are non-stationary and noisy (13, 14). Non-stationarity means that the mean and the variance of the data are not stable over time, which changes the behavior of the time series and can increase the false alarm rate. Limitations such as the effect of non-stationary data on the performance of syndromic surveillance necessitate an appropriate method to overcome the problem. Some researchers have used the wavelet-based outbreak detector (WOD) method in the detection of outbreaks (14–16), but previous literature indicates that very few studies have used this method. To our knowledge, there is no evidence that the WOD has been used in any study conducted in Iran. Given the importance of timely detection in surveillance systems, the aim of this study was to determine the performance of the WOD method in detecting outbreaks and to compare its performance with the Poisson regression-based model and EWMA, using data of simulated pertussis outbreaks in Iran.
The data set used
The data used in this study included suspected daily cases of pertussis from 25th February 2012 to 23rd March 2018. The data is nationally collected from the national registry at the department of vaccine-preventable diseases, in the Iranian Ministry of Health (Fig. 1).
Outbreak Simulation
According to the information provided by the national health authorities, no outbreak was detected during the study period. Therefore, simulated outbreaks were used to assess the performance of the WOD, EWMA, and Poisson regression-based methods in outbreak detection. Two approaches to simulation were considered. In the first, daily pertussis outbreaks reported in the literature were used as the data source (17–24). The extracted outbreaks were injected into the real number of reported suspected pertussis cases in Iran, and the injected outbreak days were considered the gold standard. The second approach used outbreaks created by the researchers, which differed in type, duration, and size. Three types of outbreaks were simulated and injected into the real data: exponential (2,4,8,16,8,4,2), linear (2,4,6,8,6,4,2), and uniform (6,6,6,6,6,6) increases in cases over time. Overall, a total of 560 outbreak days were injected into the real dataset, comprising 10 exponential, 9 linear, and 9 uniform outbreaks. The duration of these outbreaks was between 1 and 5 weeks. The number of injected cases was at most 3δ of the mean of the real data, and the interval between outbreaks was 2 months (Figs. 2 and 3).
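The injection step can be illustrated with a minimal Python sketch. It assumes the three daily patterns quoted above are simply added to a baseline series of daily counts; the function name `inject_outbreak`, the `SHAPES` dictionary, and the 0/1 label vector are hypothetical names introduced for illustration and are not taken from the study's own MATLAB or STATA code.

```python
# Extra daily cases for each simulated outbreak type, copied from the text.
SHAPES = {
    "exponential": [2, 4, 8, 16, 8, 4, 2],
    "linear": [2, 4, 6, 8, 6, 4, 2],
    "uniform": [6, 6, 6, 6, 6, 6],
}


def inject_outbreak(baseline, start, shape):
    """Add one outbreak pattern to a copy of `baseline`, beginning at index `start`.

    Returns (series, labels), where labels[i] is 1 on injected outbreak days
    and 0 otherwise. The labels play the role of the gold standard.
    """
    extra = SHAPES[shape]
    if start < 0 or start + len(extra) > len(baseline):
        raise ValueError("outbreak does not fit inside the series")
    series = list(baseline)
    labels = [0] * len(baseline)
    for offset, cases in enumerate(extra):
        series[start + offset] += cases
        labels[start + offset] = 1
    return series, labels


# Example: a flat baseline of one suspected case per day (illustrative data only).
series, labels = inject_outbreak([1] * 20, start=5, shape="linear")
```

Longer outbreaks (up to 5 weeks in the study) would need longer patterns than the seven-day examples shown here.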
Methods Of Outbreak Detection
Wavelet-based outbreak detector (WOD)
In this study, we used a discrete wavelet transform (DWT) model introduced by Aradhye et al (25) and Shemuli et al (14), referred to as multi-scale statistical process control (MSSPC). In the first step of this method, the time series under study was decomposed using the desired wavelet (in the current study, db10 and Haar wavelets were used). This decomposition produces the approximation and detail coefficients of the first level. In the next step, the approximation coefficients of the first level were decomposed to produce the approximation and detail coefficients of level 2. The decomposition of the series continued up to 5 levels. Then, the Shewhart control chart was applied to monitor all detail coefficients and the last approximation coefficients (level 5). Coefficients within the upper and lower limits of the Shewhart chart were set to zero, and coefficients outside the limits were kept. The time series was then reconstructed and monitored with the Shewhart control chart to detect the simulated outbreaks. The chart used in statistical process control had upper and lower control limits calculated as µ ± kσ, where µ is the mean of the time series under study, σ is its standard deviation, and k is a fixed parameter that can range from 0 to 3 and was set between 0.5 and 2 in this study. The DWT can be used with different wavelets; more details about the types of wavelets are described in other sources (14, 25–27).
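The steps described above can be sketched in plain Python for the Haar wavelet. This is a minimal illustration under stated assumptions, not the study's implementation: the function names are hypothetical, the series length is assumed to be a multiple of 2^levels, each set of coefficients is thresholded with limits computed from its own mean and standard deviation, and the final alarm is one-sided (above µ + kσ of the original series). The db10 wavelet would need a wavelet library (for example PyWavelets) and is not shown.

```python
import math
from statistics import mean, pstdev

SQRT2 = math.sqrt(2.0)


def haar_step(x):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Invert haar_step, rebuilding the finer-level signal."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def keep_out_of_control(coeffs, k):
    """Shewhart filter: zero coefficients inside mu +/- k*sigma, keep the rest."""
    mu, sigma = mean(coeffs), pstdev(coeffs)
    lower, upper = mu - k * sigma, mu + k * sigma
    return [c if (c < lower or c > upper) else 0.0 for c in coeffs]


def wod_alarms(series, levels=5, k=1.0):
    """Return (alarms, reconstructed) for a daily count series."""
    if len(series) % (2 ** levels) != 0:
        raise ValueError("series length must be a multiple of 2**levels")
    # Decompose: keep the detail coefficients of every level.
    approx, details = list(series), []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    # Filter the last approximation and all detail coefficients.
    approx = keep_out_of_control(approx, k)
    details = [keep_out_of_control(d, k) for d in details]
    # Reconstruct from the coarsest level back to the original resolution.
    for detail in reversed(details):
        approx = haar_inverse_step(approx, detail)
    # Assumption: alarm when the reconstructed value exceeds the upper
    # Shewhart limit computed from the original series.
    upper = mean(series) + k * pstdev(series)
    return [value > upper for value in approx], approx
```

For example, a flat series of 128 days with a single large spike on one day yields an alarm on that day only, because the baseline coefficients fall inside the control limits and are zeroed before reconstruction.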
The Exponentially Weighted Moving Average (EWMA)
The EWMA statistic on day t was defined as follows:
$\mathrm{EWMA}_t = \lambda Y_t + (1 - \lambda)\,\mathrm{EWMA}_{t-1}$ (12)
where $Y_t$ is the number of suspected cases of pertussis on day t and λ is the weighting parameter, with 0 < λ ≤ 1. A value of λ close to 1 gives more weight to the newest data, and a small value of λ (closer to 0) gives more weight to older data (28). In this study, λ was set to 0.46 and 0.3 for the literature-based and researcher-made outbreaks respectively.
The upper control limit, or alarm threshold, for this method was calculated as follows:
$\text{Upper Control Limit} = \mathrm{EWMA}_0 + k \times \sigma_{\mathrm{EWMA}}$ (28–30)
where k is specified in a way that results in the desired confidence interval. In the current study, k = 0.5, 1, 1.5, 2 and 3 were considered. $\sigma_{\mathrm{EWMA}}$ is the standard deviation of the calculated EWMA statistics from times t to tn, and $\mathrm{EWMA}_0$ is the mean of non-outbreak days. If the calculated EWMA statistic exceeded the upper control limit, it was considered an alarm for an outbreak or aberration. To remove explainable patterns and bring the data closer to a normal distribution, a 13-point moving average (MA) was used, meaning that each observation was replaced by the mean of the previous 13 observations.
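The EWMA recursion and its upper control limit can be sketched in a few lines of Python. This is an illustration under assumptions: the function names are hypothetical, the recursion is started at the mean of the non-outbreak days, the population standard deviation is used for the EWMA statistics, and the 13-point moving-average pre-smoothing is omitted.

```python
from statistics import mean, pstdev


def ewma_series(y, lam, start):
    """EWMA_t = lam * Y_t + (1 - lam) * EWMA_{t-1}, with EWMA_0 = start."""
    stats, previous = [], start
    for value in y:
        previous = lam * value + (1 - lam) * previous
        stats.append(previous)
    return stats


def ewma_alarms(y, baseline, lam=0.3, k=1.0):
    """Return (alarms, upper_control_limit) for daily counts `y`.

    `baseline` holds counts from non-outbreak days; their mean is EWMA_0.
    """
    ewma0 = mean(baseline)
    stats = ewma_series(y, lam, ewma0)
    upper = ewma0 + k * pstdev(stats)  # EWMA_0 + k * sigma_EWMA
    return [s > upper for s in stats], upper


# Illustrative data only: 20 quiet days, 5 outbreak days, 20 quiet days.
counts = [2] * 20 + [20] * 5 + [2] * 20
alarms, upper = ewma_alarms(counts, baseline=[2] * 20, lam=0.3, k=1.0)
```

Raising k lifts the control limit, which is why the tables show specificity rising and sensitivity falling as K increases.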
Poisson Regression-based Method
The Upper control limits of Poisson regression
To determine the upper control limit, the expected mean of suspected pertussis cases was estimated as follows:
where log λj is the logarithm of the expected mean number of cases at time j, X denotes other factors, such as day and month, that affect the expected mean number of pertussis cases, and β is the coefficient of X. After estimation of these parameters, the alarm threshold was estimated as follows:
where Vt is the variance of the estimated mean, equal to:
$V_t = \operatorname{var}(\alpha) + t^2 \operatorname{var}(\beta) + 2t\,\operatorname{cov}(\alpha, \beta)$ (31)
The Z 0.90, Z 0.95, Z 0.99 were used to calculate different thresholds.
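As a worked step behind the variance term, and assuming that the linear predictor at time t has the form α + βt (the fitted model equation is not reproduced in the text), the expression for Vt is the usual variance of a linear combination of two estimated coefficients:

$$
\operatorname{var}(\hat{\alpha} + \hat{\beta} t) = \operatorname{var}(\hat{\alpha}) + t^{2}\operatorname{var}(\hat{\beta}) + 2t\,\operatorname{cov}(\hat{\alpha}, \hat{\beta})
$$

The hats, which mark estimated coefficients, are notation added for this sketch; the source writes α and β without them.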
Measures Of The Algorithm's Performance
The performance of these methods was measured using sensitivity, specificity, false alarm rate, likelihood ratios, and area under the receiver operating characteristic (ROC) curve (AUC). The injected outbreak days were considered the gold standard for calculating these measures and evaluating the performance of the different algorithms.
Software Used
All analyses were performed using MATLAB R2018a, STATA 15 (StataCorp LLC) and Excel 2010.
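These day-level measures follow from a 2×2 comparison of alarms against the gold standard. The sketch below is illustrative: the function name is hypothetical, both outbreak and non-outbreak days are assumed to be present, and the single-threshold area is taken as (sensitivity + specificity) / 2. That last choice is an assumption; it appears consistent with the tabulated values (for example, 0.49 and 0.86 give 0.675), but the text does not state how the area was computed.

```python
def performance(alarms, truth):
    """Day-level performance of 0/1 `alarms` against 0/1 gold-standard `truth`."""
    if len(alarms) != len(truth):
        raise ValueError("alarms and truth must have the same length")
    pairs = list(zip(alarms, truth))
    tp = sum(1 for a, t in pairs if a and t)          # alarm on an outbreak day
    fp = sum(1 for a, t in pairs if a and not t)      # alarm on a quiet day
    fn = sum(1 for a, t in pairs if not a and t)      # missed outbreak day
    tn = sum(1 for a, t in pairs if not a and not t)  # quiet day, no alarm
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_alarm_rate": 1 - specificity,
        "false_negative_rate": 1 - sensitivity,
        # LR+ is undefined when specificity is exactly 1 (shown as "-" in the tables).
        "lr_pos": sensitivity / (1 - specificity) if specificity < 1 else None,
        "lr_neg": (1 - sensitivity) / specificity if specificity > 0 else None,
        # Assumed single-point area under the ROC curve.
        "auc_single_point": (sensitivity + specificity) / 2,
    }


# Illustrative data only: 4 outbreak days followed by 4 non-outbreak days.
result = performance([1, 1, 0, 0, 1, 0, 0, 0], [1, 1, 1, 1, 0, 0, 0, 0])
```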
Literature-based outbreaks simulation
Sensitivity, specificity, positive and negative likelihood ratios, and area under the ROC curve
According to this approach, the highest sensitivity and lowest false-negative rate in the detection of injected outbreaks were seen with Daubechies 10 (db10), with sensitivity 0.59 (0.56–0.62), followed by Haar wavelets with 0.57 (0.54–0.60), both with K = 0.5 for the control limits in the Shewhart control charts. In terms of specificity, the EWMA with k = 1.5, 2 and 3 and the Haar wavelet with k = 1.5 and 2 had the optimum specificity (100%), so these algorithms had the lowest false alarm rate in the detection of outbreaks. The highest positive likelihood ratio (LR+) was seen in the EWMA with k = 1.5 (LR+: 129.35), followed by the Haar wavelet with k = 1.5 (LR+: 28.65). According to the area under the ROC curve, the EWMA algorithm with K = 0.5 and 1 had the best performance in distinguishing outbreak from non-outbreak days, with areas of 0.68 and 0.65 respectively. The area under the curve ranges from 0 to 1, and larger values (closer to 1) indicate better performance of the algorithm (Table 1 and Fig. 4).
Timeliness of methods
In terms of timeliness, all algorithms with lower values of k (0.5, 1) had at least one alarm on the first day of outbreaks, but the WOD methods showed the best performance in early warning of outbreaks. The Haar and db10 wavelets with K = 0.5 had the best performance in generating an alarm on the first day, doing so in 7 of the injected outbreaks. As K increased, the probability of generating an alarm on the first day of an outbreak decreased because the alarm threshold rose. The median (minimum to maximum) timeliness, in days, for the Haar (k = 0.5) and db10 (k = 0.5) wavelet-based methods was 2 (1 to 14) and 2 (1 to 14) respectively. This was less than the median timeliness of the other algorithms and means that 50% of the first alarms occurred on or before the second day of the outbreak. The WOD generated an alarm in all 19 outbreaks, which was the best performance among all algorithms. More information is shown in Table 3.
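Timeliness as reported here, where a value of 1 corresponds to an alarm on the first outbreak day, can be sketched as follows. The function names and the representation of outbreaks as inclusive (start, end) index pairs are hypothetical choices made for illustration.

```python
from statistics import median


def timeliness(alarms, outbreaks):
    """Days from outbreak start to the first alarm inside each outbreak.

    `outbreaks` is a list of inclusive (start, end) day indices. The result
    holds one entry per outbreak: 1 for an alarm on the first day, 2 for the
    second day, and so on, or None if the outbreak triggered no alarm.
    """
    delays = []
    for start, end in outbreaks:
        first = next((day for day in range(start, end + 1) if alarms[day]), None)
        delays.append(None if first is None else first - start + 1)
    return delays


def summarize(delays):
    """Summary in the style of the timeliness tables (detected outbreaks only)."""
    detected = [d for d in delays if d is not None]
    return {
        "outbreaks_with_alarm": len(detected),
        "alarm_on_first_day": sum(1 for d in detected if d == 1),
        "minimum": min(detected) if detected else None,
        "maximum": max(detected) if detected else None,
        "median": median(detected) if detected else None,
    }


# Illustrative data only: three outbreaks, the last one missed entirely.
alarms = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
delays = timeliness(alarms, [(2, 4), (6, 8), (10, 11)])
summary = summarize(delays)
```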
Algorithm | Sensitivity | Specificity | False alarm rate | False-negative rate | LR+ | LR- | Area under the ROC curve |
---|---|---|---|---|---|---|---|
EWMA | |||||||
K = 0.5 | 0.49 (0.46–0.52) | 0.86 (0.84–0.88) | 0.14 (0.12–0.16) | 0.51 (0.48–0.54) | 3.60 | 0.59 | 0.68 |
K = 1 | 0.34 (0.32–0.37) | 0.95 (0.94–0.96) | 0.05 (0.04–0.06) | 0.66 (0.63–0.68) | 7.05 | 0.69 | 0.65 |
K = 1.5 | 0.24 (0.21–0.26) | 1.00 (1.00–1.00) | 0.00 (0.00–0.00) | 0.76 (0.74–0.79) | 129.35 | 0.76 | 0.62 |
K = 2 | 0.18 (0.15–0.20) | 1.00 (1.00–1.00) | 0.00 (0.00–0.00) | 0.82 (0.80–0.85) | - | 0.82 | 0.59 |
K = 3 | 0.09 (0.07–0.10) | 1.00 (1.00–1.00) | 0.00 (0.00–0.00) | 0.91 (0.90–0.93) | - | 0.91 | 0.54 |
Poisson Regression | |||||||
Z 1-α/2 = 90 | 0.20 (0.18–0.22) | 0.95 (0.93–0.96) | 0.05 (0.04–0.07) | 0.80 (0.78–0.82) | 3.72 | 0.85 | 0.57 |
Z 1-α/2 = 95 | 0.16 (0.13–0.18) | 0.96 (0.95–0.97) | 0.04 (0.03–0.05) | 0.84 (0.82–0.87) | 4.20 | 0.88 | 0.56 |
Z 1-α/2 = 99 | 0.12 (0.10–0.14) | 0.98 (0.97–0.98) | 0.02 (0.02–0.03) | 0.88 (0.86–0.90) | 4.92 | 0.90 | 0.55 |
WOD(Haar) | |||||||
K = 0.5 | 0.57 (0.54–0.60) | 0.52 (0.49–0.55) | 0.48 (0.45–0.51) | 0.43 (0.40–0.46) | 1.19 | 0.82 | 0.55 |
K = 1 | 0.23 (0.21–0.26) | 0.92 (0.90–0.94) | 0.08 (0.06–0.10) | 0.77 (0.74–0.79) | 2.88 | 0.83 | 0.58 |
K = 1.5 | 0.08 (0.06–0.10) | 1.00 (0.99–1.00) | 0.00 (0.00–0.01) | 0.92 (0.90–0.94) | 28.65 | 0.92 | 0.54 |
K = 2 | 0.02 (0.01–0.03) | 1.00 (1.00–1.00) | 0.00 (0.00–0.00) | 0.98 (0.97–0.99) | 21.01 | 0.98 | 0.51 |
Db10 | |||||||
K = 0.5 | 0.59 (0.56–0.62) | 0.44 (0.41–0.47) | 0.56 (0.53–0.59) | 0.41 (0.38–0.44) | 1.06 | 0.93 | 0.52 |
K = 1 | 0.26 (0.24–0.29) | 0.88 (0.86–0.90) | 0.12 (0.10–0.14) | 0.74 (0.71–0.76) | 2.22 | 0.84 | 0.57 |
K = 1.5 | 0.15 (0.13–0.17) | 0.97 (0.96–0.98) | 0.03 (0.02–0.04) | 0.85 (0.83–0.87) | 4.92 | 0.88 | 0.56 |
K = 2 | 0.10 (0.08–0.12) | 0.99 (0.98–0.99) | 0.01 (0.01–0.02) | 0.90 (0.88–0.92) | 7.13 | 0.91 | 0.54 |
Algorithm | Sensitivity | Specificity | False alarm rate | False-negative rate | LR+ | LR- | Area under the ROC curve |
---|---|---|---|---|---|---|---|
EWMA | |||||||
K = 0.5 | 0.92 (0.90–0.94) | 0.92 (0.90–0.93) | 0.08 (0.07–0.10) | 0.08 (0.06–0.10) | 10.87 | 0.09 | 0.92 |
K = 1 | 0.83 (0.80–0.86) | 0.97 (0.96–0.98) | 0.03 (0.02–0.04) | 0.17 (0.14–0.20) | 29.84 | 0.17 | 0.90 |
K = 1.5 | 0.69 (0.65–0.73) | 0.99 (0.99–1.00) | 0.01 (0.00–0.01) | 0.31 (0.27–0.35) | 114.25 | 0.31 | 0.84 |
K = 2 | 0.49 (0.45–0.53) | 1.00 (1.00–1.00) | 0.00 (0.00–0.00) | 0.51 (0.47–0.55) | 269.19 | 0.51 | 0.74 |
K = 3 | 0.07 (0.05–0.09) | 1.00 (1.00–1.00) | 0.00 (0.00–0.00) | 0.93 (0.91–0.95) | - | 0.93 | 0.53 |
Poisson Regression | |||||||
Z 1-α/2 = 90 | 0.58 (0.54–0.62) | 0.96 (0.95–0.97) | 0.04 (0.03–0.05) | 0.42 (0.38–0.46) | 15.62 | 0.43 | 0.78 |
Z 1-α/2 = 95 | 0.46 (0.42–0.50) | 0.98 (0.97–0.98) | 0.02 (0.02–0.03) | 0.54 (0.50–0.58) | 19.52 | 0.55 | 0.72 |
Z 1-α/2 = 99 | 0.28 (0.24–0.32) | 0.99 (0.98–0.99) | 0.01 (0.01–0.02) | 0.72 (0.68–0.76) | 21.14 | 0.73 | 0.63 |
WOD(Haar) | |||||||
K = 0.5 | 0.79 (0.76–0.83) | 0.51 (0.48–0.53) | 0.49 (0.47–0.52) | 0.21 (0.17–0.24) | 1.61 | 0.41 | 0.65 |
K = 1 | 0.50 (0.46–0.54) | 0.84 (0.82–0.85) | 0.16 (0.15–0.18) | 0.50 (0.46–0.54) | 3.02 | 0.60 | 0.67 |
K = 1.5 | 0.14 (0.11–0.17) | 0.99 (0.98–0.99) | 0.01 (0.01–0.02) | 0.86 (0.83–0.89) | 10.37 | 0.87 | 0.56 |
K = 2 | 0.03 (0.01–0.04) | 1.00 (0.99–1.00) | 0.00 (0.00–0.01) | 0.97 (0.96–0.99) | 7.90 | 0.97 | 0.51 |
Db10 | |||||||
K = 0.5 | 0.73 (0.69–0.76) | 0.46 (0.43–0.48) | 0.54 (0.52–0.57) | 0.27 (0.24–0.31) | 1.34 | 0.60 | 0.60 |
K = 1 | 0.54 (0.50–0.59) | 0.81 (0.79–0.83) | 0.19 (0.17–0.21) | 0.46 (0.41–0.50) | 2.85 | 0.56 | 0.68 |
K = 1.5 | 0.31 (0.27–0.35) | 0.95 (0.94–0.96) | 0.05 (0.04–0.06) | 0.69 (0.65–0.73) | 5.93 | 0.73 | 0.63 |
K = 2 | 0.16 (0.13–0.19) | 0.98 (0.97–0.98) | 0.02 (0.02–0.03) | 0.84 (0.81–0.87) | 6.44 | 0.87 | 0.57 |
Table 3. Performance of the methods under study in early detection (timeliness) of literature-based simulated outbreaks
Method | Presence of alarm in outbreaks (n = 19) | Minimum timeliness | Maximum timeliness | Median timeliness | Alarm on the first day (n = 19) |
---|---|---|---|---|---|
EWMA | |||||
K = 0.5 | 18 (0.95) | 1 | 35 | 4 | 4 (0.21) |
K = 1 | 15 (0.79) | 1 | 44 | 10 | 5 (0.26) |
K = 1.5 | 12 (0.63) | 3 | 44 | 14 | 0 (0.00) |
K = 2 | 7 (0.37) | 3 | 83 | 35 | 0 (0.00) |
K = 3 | 5 (0.26) | 18 | 83 | 68 | 0 (0.00) |
Poisson Regression | |||||
Z 1-α/2 = 90 | 17 (0.89) | 1 | 36 | 5 | 2 (0.11) |
Z 1-α/2 = 95 | 17 (0.89) | 1 | 44 | 11 | 1 (0.05) |
Z 1-α/2 = 99 | 16 (0.84) | 2 | 44 | 16 | 1 (0.05) |
WOD | |||||
K = 0.5 | 19 (1.00) | 1 | 14 | 2 | 7 (0.37) |
K = 1 | 15 (0.79) | 2 | 35 | 8 | 0 (0.00) |
K = 1.5 | 7 (0.37) | 2 | 56 | 13 | 0 (0.00) |
K = 2 | 5 (0.26) | 3 | 83 | 66 | 0 (0.00) |
Db10 | |||||
K = 0.5 | 19 (1.00) | 1 | 14 | 2 | 7 (0.37) |
K = 1 | 16 (0.84) | 1 | 35 | 10 | 1 (0.05) |
K = 1.5 | 9 (0.47) | 3 | 35 | 13 | 0 (0.00) |
K = 2 | 7 (0.37) | 3 | 74 | 35 | 0 (0.00) |
The Researcher-made Outbreaks
Sensitivity, specificity, positive and negative likelihood ratios, and area under the ROC curve
According to this approach, the highest sensitivity and lowest false-negative rate in the detection of outbreaks were seen in the EWMA (K = 0.5), with sensitivity equal to 0.92 (0.90–0.94). Furthermore, the EWMA with k = 2 and 3 and the Haar wavelet with k = 2 had the optimum specificity (100%), so these algorithms had the lowest false alarm rate in the detection of outbreaks. The highest positive likelihood ratio (LR+) was seen in the EWMA with k = 2 (LR+: 269.19), followed by the EWMA with k = 1.5 (LR+: 114.25). In terms of the area under the ROC curve, the EWMA algorithm with K = 0.5 and 1 had the best performance in distinguishing outbreak from non-outbreak days, with areas of 0.92 and 0.90 respectively (Table 2 and Fig. 5).
Timeliness Of Methods
In terms of timeliness, all algorithms with K less than 1 had at least one alarm on the first day of the outbreaks. However, the WOD methods showed the best performance in early warning of outbreaks. The Haar wavelet with k = 0.5 and 1 and the db10 wavelet with K = 0.5 had the best performance in generating an alarm on the first day, with 18, 16 and 14 first-day alarms among the injected outbreaks respectively. As K increased, the probability of generating an alarm on the first day of an outbreak decreased because the alarm threshold rose. The median (minimum to maximum) timeliness, in days, for the Haar (k = 0.5) and db10 (k = 0.5) wavelet-based methods was 1 (1 to 3) and 1 (1 to 4) respectively. This was less than the median timeliness of the other algorithms and means that 50% of the first alarms produced by these methods occurred on the first day of the outbreak. In addition, the Poisson regression with Z 1−α/2 = 90 generated alarms in all 27 simulated outbreaks, which was better than the other methods (Table 4).
Method | Presence of alarm in outbreaks (n = 27) | Minimum timeliness | Maximum timeliness | Median timeliness | Alarm on the first day (n = 27) |
---|---|---|---|---|---|
EWMA | |||||
K = 0.5 | 26(0.96) | 1 | 4 | 3 | 5 (0.19) |
K = 1 | 24(0.89) | 2 | 5 | 4 | 0 (0.00) |
K = 1.5 | 24(0.89) | 1 | 11 | 4 | 1 (0.04) |
K = 2 | 20(0.74) | 4 | 11 | 6 | 0 (0.00) |
K = 3 | 6(0.22) | 6 | 28 | 14 | 0 (0.00) |
Poisson Regression | |||||
Z 1-α/2 = 90 | 27(1.00) | 1 | 5 | 3 | 7 (0.26) |
Z 1-α/2 = 95 | 26(0.96) | 1 | 6 | 3 | 7(0.26) |
Z 1-α/2 = 99 | 23(0.85) | 1 | 6 | 3 | 4(0.15) |
WOD | |||||
K = 0.5 | 26(0.96) | 1 | 3 | 1 | 18(0.67) |
K = 1 | 26(0.96) | 1 | 12 | 1 | 16(0.59) |
K = 1.5 | 13(0.48) | 1 | 21 | 7 | 2(0.07) |
K = 2 | 5(0.19) | 2 | 22 | 12 | 0(0.00) |
Db10 | |||||
K = 0.5 | 26 (0.96) | 1 | 4 | 1 | 14(0.52) |
K = 1 | 21(0.78) | 1 | 5 | 2 | 7(0.26) |
K = 1.5 | 18(0.67) | 1 | 19 | 4 | 5(0.19) |
K = 2 | 14(0.52) | 1 | 19 | 8 | 1(0.04) |
Given the increasing number of reported outbreaks in the world (32), timely detection is very important for surveillance systems, and using appropriate algorithms or methods plays an important role in early detection in any surveillance system. In the current study, the performance of three outbreak detection methods (WOD, EWMA, and Poisson regression-based methods) was assessed in the detection of simulated outbreaks in pertussis cases in Iran. The performance of algorithms such as CUSUM and EWMA in generating outbreak alarms has been examined in different studies addressing different infectious diseases (33–37). The use of wavelet-based outbreak detector methods in surveillance systems is very rare, so assessing the performance of this method is important as a new approach in surveillance systems. There are three main approaches to assessing outbreak detection: real data testing, fully synthetic simulation, and semi-synthetic simulation. The use of real data provides the most valid results, but such information is usually unavailable or inaccessible (38). For that reason, the majority of studies use synthetic and semi-synthetic simulation to assess the performance of outbreak detection methods (39–42). One of the advantages of simulation is that it provides a true gold standard that can be used to calculate and evaluate performance indices (3).
In this study, two approaches were used for outbreak simulation. The first was based on pertussis outbreaks reported in the literature, and the second used researcher-made outbreaks injected into the actual daily number of pertussis cases. The performance of the methods under study was measured by their ability to detect outbreak and non-outbreak days and by early warning of outbreaks, or timeliness. Timeliness was defined as the time interval between the first day of the outbreak and the first true alarm produced by the methods.
According to our results, the highest sensitivity in detecting literature-based outbreaks belonged to the wavelet-based methods, whereas in the researcher-made outbreaks it belonged to the EWMA algorithms with lower values of K for the threshold. In both simulation approaches, the highest specificity in the detection of non-outbreak days belonged to the wavelet-based method and the EWMA with a high value of K. The Poisson regression had moderate performance in both approaches. Overall, with low threshold values the sensitivity of the methods increased but the specificity decreased, and vice versa. Increasing the sensitivity of outbreak detection methods by lowering the threshold can therefore reduce their specificity. In terms of the area under the ROC curve, the largest areas belonged to the EWMA with lower values of K, which means that the EWMA distinguished outbreak from non-outbreak days better than the other methods. According to another study, the discrete wavelet transform-based model had similar sensitivity and specificity to the autoregressive (AR) model in the detection of outbreaks (16). The different results may be due to studies on different diseases or different datasets. According to our results, the performance of the methods under study differed markedly between the two approaches to outbreak simulation. These results are consistent with other studies conducted in this field (42, 43). The performance of outbreak detection methods is affected by many factors, including the type, duration, and magnitude of outbreaks. According to the results of another study, the ability of outbreak detection algorithms was higher with larger outbreaks and lower thresholds (40, 44). The performance of a method may therefore differ under different conditions with different properties.
Additionally, outbreak detection algorithms may have different sensitivity and specificity in different outbreak settings (1). In terms of early warning of outbreaks, or timeliness, the WOD methods had the best performance in both outbreak simulation approaches, which is consistent with another study (16). Early warning and detection of outbreaks and aberrations of infectious diseases are very important to surveillance systems, because they provide an opportunity to stop the spread of outbreaks from one region to another and can prevent an epidemic from becoming a global pandemic threat (45). Many factors can affect the early detection of outbreaks. One of these is the refinement of the outbreak alarm threshold (46). A lower alarm threshold can lead to earlier warning, but the increase in early alarms can also increase the false alarm rate, and this should be considered in outbreak detection. Therefore, in determining the outbreak alarm threshold, factors such as case fatality, contagiousness, and confirmation costs must be considered.
Finally, when these methods are compared, the WOD method can work as an early warning method for outbreak detection. Moreover, it can be applied without any assumptions about the data set, so researchers can use it as an early warning tool to identify outbreaks of other infectious diseases. According to the results of the study, the ability of the methods under study to detect outbreak days with k = 2 and 3 was not more than fifty percent. This performance may be affected by the low incidence of pertussis in Iran, and it may change according to the type of infectious disease. The effect of the nature of the outbreaks on the performance of outbreak detection methods must therefore be considered. Since an outbreak alarm leads to the formation of an outbreak investigation team, the accuracy of the alarm is very important. It is therefore recommended to use combined algorithms rather than a single one (47). This study had two main limitations. First, we used simulated outbreaks, which could differ from real outbreaks; to increase the validity of the results, we simulated different outbreaks of different sizes. Second, the number of outbreaks reported with daily cases in the literature was small, and the reported outbreaks might have different patterns from those in Iran's setting; for a better assessment of the methods, we used two approaches to simulation. Finally, it is recommended that the methods under study, especially wavelet-based outbreak detection, be applied to other infectious diseases or data sets with different incidence and patterns.
According to the results of the study, the wavelet-based outbreak detector had appropriate timeliness in outbreak detection. However, given the importance of the problem and the effect of the nature of outbreaks, such as duration, size, and type, on the performance of outbreak detection methods, it is better to use this method along with others in public health surveillance systems.
WOD: Wavelet-based outbreak detection
EWMA: Exponentially weighted moving average
db: Daubechies
CUSUM: Cumulative sum
MA: Moving average
ROC: Receiver operating characteristic
LR: Likelihood ratio
Ethics approval and consent to participate
The study was approved by the ethical committee of the University, with the ID: IR.TUMS.SPH.REC.1397.276. Consent to participate was not applicable.
Consent for publication
Not applicable
Availability of data and materials
The data are not publicly available. However, they are available from the authors upon reasonable request and with permission of the Committee. Anyone who wants to access the raw data should contact the corresponding author.
Competing interests
The authors declare that they have no competing interests.
Funding
This article was extracted from the Ph.D. thesis by Yousef Alimohamadi and was financially supported by Tehran University of Medical Sciences.
Authors' contributions
YA performed the data analysis and interpretation and wrote the manuscript; SMZ contributed to the data analysis and the study concept and design, and provided supervision, data extraction, and expert insight; MK, MY and ML contributed to the study design, the data analysis, the study quality evaluation, and manuscript preparation; KHN provided supervision, data analysis, and expert insight and contributed to writing the manuscript. All authors read and approved the final manuscript.
Acknowledgements
The authors would like to express their appreciation to the Iranian Ministry of Health and the Center for Communicable Diseases Control for their constant support and collaboration. This article was extracted from the Ph.D. thesis by Yousef Alimohamadi and was financially supported by Tehran University of Medical Sciences. This study was also approved by the ethical committee of Tehran University of Medical Sciences with ID: IR.TUMS.SPH.REC.1397.276.
Author details
1 Department of Epidemiology and Biostatistics, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran. 2 Center for Communicable Diseases Control, Ministry of Health and Medical Education, Tehran, Iran. 3 Research Center for Health Sciences, Hamadan University of Medical Sciences, Hamadan, Iran. 4 Assistant Professor at the School of Electrical & Computer Engineering, Tarbiat Modares University, Tehran, Iran.