Timely detection of Pertussis outbreaks in Iran: The comparison performance of Wavelet-based outbreak detector, Exponential weighted moving average, and Poisson regression-based methods

Abstract

Background
Early detection of outbreaks is very important for surveillance systems. Given the importance of the subject and the lack of similar studies in Iran, the aim of this study was to determine the performance of the wavelet-based outbreak detection (WOD) method in detecting outbreaks and to compare its performance with the Poisson regression-based model and the exponentially weighted moving average (EWMA), using data of simulated pertussis outbreaks in Iran.

Methods
Data on suspected cases of pertussis in Iran from 25th February 2012 to 23rd March 2018 were used. The performance of the WOD (Daubechies 10 and Haar wavelets), the Poisson regression-based method, and EWMA was compared in terms of timeliness and detection of outbreak days, using simulations of different outbreaks (literature-based and researcher-made outbreaks). Sensitivity, specificity, false alarm and false negative rates, positive and negative likelihood ratios, area under the ROC curve, and median timeliness were used to assess the performance of the methods.

Results
In the literature-based outbreak simulation, the highest sensitivity and lowest false negative rate in the detection of injected outbreaks were seen with the wavelet-based methods: Daubechies 10 (db10), with sensitivity 0.59 (0.56-0.62), and Haar, with sensitivity 0.57 (0.54-0.60). In the researcher-made outbreaks, the EWMA (k = 0.5), with sensitivity 0.92 (0.90-0.94), had the best performance. In terms of timeliness, the WOD methods showed the best performance in early warning of outbreaks in both simulation approaches.

Conclusions
The performance of the WOD in giving early alarms for outbreaks was appropriate. However, it is better to use the method alongside other methods in public health surveillance systems.

Background
Outbreaks of infectious diseases are one of the main public health challenges (1). Early detection of outbreaks, timely response, and control of these aberrations are very important for public health surveillance systems (2). The main purpose of a public health surveillance system is the ongoing collection, analysis, interpretation, and dissemination of data for use in public health practice. Such a system contributes to reducing the morbidity and mortality due to health-related events (3). Additionally, using a suitable method in a surveillance system for early detection of naturally occurring or bioterrorism-related outbreaks plays a very important role in reducing the time between outbreak occurrence and detection (4). Because traditional surveillance systems have many restrictions in the early detection of outbreaks, syndromic surveillance is recommended in such conditions. Syndromic surveillance includes the collection, analysis, interpretation, and dissemination of health-related data defined in public health sectors (5-7). Its objective is to provide an early warning of public health threats in near real time (8). The important feature of a syndromic surveillance system is the early warning or detection of a health-related aberration or outbreak, which leads to a reduction in morbidity and mortality among affected people (9). Two main tools, temporal and spatial models, are used in syndromic surveillance to detect outbreaks as early as possible (10,11). Different methods, including the Cumulative Sum (CUSUM), the exponentially weighted moving average (EWMA), the Shewhart chart, and time-series models, are the main tools used by syndromic surveillance to detect outbreaks (12). All of these methods follow a two-step procedure for the detection of outbreaks. First, the alarm threshold is determined from baseline values or a non-outbreak period. Second, an algorithm is applied to detect aberrations using the defined alarm threshold.
These methods have two main problems. The first is related to the baseline period: when a real-world data set is used, the baseline may itself include an outbreak period. The second is related to the nature of surveillance data, which in most conditions are non-stationary and noisy (13,14). Non-stationarity means that the mean and the variance of the data are not stable over time, which changes the behavior of the time series and can increase the false alarm rate. Limitations such as the effect of non-stationary data on the performance of syndromic surveillance systems call for a method that can overcome the problem. Some researchers have used the wavelet-based outbreak detector (WOD) to detect outbreaks (14-16), but very few studies have used this method. To our knowledge, the WOD has not been used in any study conducted in Iran. Given the importance of timely detection in surveillance systems, the aim of this study was to determine the performance of the WOD method in detecting outbreaks and to compare its performance with the Poisson regression-based model and EWMA, using data of simulated pertussis outbreaks in Iran.

Methods
The data set used
The data used in this study comprised daily suspected cases of pertussis from 25th February 2012 to 23rd March 2018. The data were collected nationally through the national registry of the department of vaccine-preventable diseases in the Iranian Ministry of Health (Fig. 1).

Outbreak Simulation
According to the information provided by the national health authorities, no outbreak was detected during the study period. Simulated outbreaks were therefore used to assess the performance of the WOD, EWMA, and Poisson regression-based methods in outbreak detection. Two simulation approaches were considered. In the first, daily pertussis outbreaks reported in the literature were used as the data source (17-24). The extracted outbreaks were injected into the real number of reported suspected pertussis cases in Iran and treated as the gold standard. The second approach used outbreaks created by the researchers, which differed in type, duration, and size. Three types of outbreaks were simulated and injected into the real data: exponential (2,4,8,16,8,4,2), linear (2,4,6,8,6,4,2), and uniform (6,6,6,6,6,6) increases in cases over time. Overall, a total of 560 outbreak days were injected into the real data set, comprising 10 exponential, 9 linear, and 9 uniform outbreaks. The duration of these outbreaks was between 1 and 5 weeks. The number of injected cases was at most 3δ of the mean of the real data, and the interval between outbreaks was 2 months.
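The injection step for researcher-made outbreaks can be illustrated with a minimal Python sketch. It uses the three case patterns quoted above; the function name `inject_outbreak`, the `SHAPES` table, and the flat toy baseline are hypothetical and are not taken from the study, which also used outbreaks lasting up to 5 weeks.

```python
# Daily extra cases for each outbreak type, as listed in the text.
SHAPES = {
    "exponential": [2, 4, 8, 16, 8, 4, 2],
    "linear": [2, 4, 6, 8, 6, 4, 2],
    "uniform": [6, 6, 6, 6, 6, 6],
}


def inject_outbreak(baseline, start, kind):
    """Add one simulated outbreak to a copy of the baseline daily counts.

    Returns the modified series and the list of outbreak-day indices,
    which serve as the gold standard for later evaluation.
    """
    extra = SHAPES[kind]
    if start < 0 or start + len(extra) > len(baseline):
        raise ValueError("outbreak does not fit inside the series")
    series = list(baseline)  # leave the original data untouched
    for offset, cases in enumerate(extra):
        series[start + offset] += cases
    outbreak_days = list(range(start, start + len(extra)))
    return series, outbreak_days


# Toy baseline of one suspected case per day (assumption, not real data).
baseline = [1] * 20
series, days = inject_outbreak(baseline, 5, "linear")
```

Keeping the outbreak-day indices alongside the modified series is what makes a simulated data set usable as a gold standard.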

Methods Of Outbreak Detection
Wavelet-based outbreak detector (WOD)
In this study, we used the discrete wavelet transform (DWT) model introduced by Aradhye et al (25) and Shmueli et al (14), referred to as multi-scale statistical process control (MSSPC).
In the first step, the time series under study was decomposed using the chosen wavelet (in the current study, the db10 and Haar wavelets were used). This decomposition produces the approximation and detail coefficients of the first level.
In the next step, the approximation coefficients of the first level were decomposed to produce the approximation and detail coefficients of level 2. The decomposition continued up to level 5. Next, the Shewhart control chart was applied to monitor all detail coefficients and the last approximation coefficients (level 5). Coefficients within the upper and lower limits of the Shewhart chart were set to zero, and coefficients outside the limits were kept. The time series was then reconstructed and monitored with the Shewhart control chart to detect the simulated outbreaks. The chart, as used in statistical process control, had upper and lower control limits calculated as µ ± kσ, where µ is the mean of the time series under study, σ is its standard deviation, and k is a fixed parameter that can range from 0 to 3; values from 0.5 to 2 were considered in this study. The DWT can use different wavelets; more details about the types of wavelets are given in other sources (14, 25-27).
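The decompose, threshold, and reconstruct steps described above can be sketched in Python as follows. This is an illustrative sketch rather than the study's implementation: it covers only the Haar wavelet, written by hand so that it needs no external library, and it requires the series length to be a multiple of 2^levels. It also assumes that the final Shewhart limits are computed from the reconstructed series and that only values above the upper limit raise an alarm. All function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def haar_step(x):
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Invert one Haar level, interleaving the recovered sample pairs."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


def keep_out_of_control(coeffs, k):
    """Shewhart step: zero coefficients inside mu +/- k*sigma, keep the rest."""
    mu, sigma = mean(coeffs), pstdev(coeffs)
    lower, upper = mu - k * sigma, mu + k * sigma
    return [c if (c < lower or c > upper) else 0.0 for c in coeffs]


def wod_alarms(series, k=0.5, levels=5):
    """Return (alarm day indices, reconstructed series) for a daily count series."""
    if len(series) % (2 ** levels) != 0:
        raise ValueError("series length must be a multiple of 2**levels")
    approx = [float(v) for v in series]
    details = []
    for _ in range(levels):  # successive decomposition of the approximation
        approx, detail = haar_step(approx)
        details.append(detail)
    approx = keep_out_of_control(approx, k)  # last approximation level
    details = [keep_out_of_control(d, k) for d in details]
    for detail in reversed(details):  # reconstruct from coarsest to finest
        approx = haar_inverse_step(approx, detail)
    recon = approx
    # Assumption: final chart limits come from the reconstructed series.
    upper = mean(recon) + k * pstdev(recon)
    return [t for t, v in enumerate(recon) if v > upper], recon


# Toy series (assumption): flat baseline of 2 cases with a single spike.
toy = [2] * 64
toy[20] = 30
alarms, recon = wod_alarms(toy, k=0.5, levels=5)
```

For the db10 wavelet, a wavelet library such as PyWavelets would normally replace the hand-written Haar steps.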

The Exponentially Weighted Moving Average (EWMA)
The EWMA statistic on day t was defined as follows:

$$EWMA_t = \lambda Y_t + (1 - \lambda)\,EWMA_{t-1}$$

where $Y_t$ is the number of suspected cases of pertussis on day t and $\lambda$ is the weighting parameter, with $0 < \lambda \le 1$. A value of $\lambda = 1$ puts all the weight on the newest observation, and a small value of $\lambda$ (closer to 0) gives more weight to older data (28). In this study, $\lambda$ was set to 0.46 and 0.3 for the literature-based and researcher-made outbreaks, respectively. The upper control limit, or alarm threshold, was calculated as follows:

$$\text{Upper Control Limit} = EWMA_0 + k \times \sigma_{EWMA}$$ (28-30)

where k is chosen to give the desired confidence level. In the current study, k = 0.5, 1, 1.5, 2 and 3 were considered. $\sigma_{EWMA}$ is the standard deviation of the calculated EWMA statistics from time t to $t_n$, and $EWMA_0$ is the mean of the non-outbreak days. If the calculated EWMA statistic exceeded the upper control limit, it was considered an alarm for an outbreak or aberration. To remove explainable patterns and bring the data closer to a normal distribution, a 13-day moving average (MA) was used: each value was replaced by the mean of the previous 13 values.
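A minimal Python sketch of the EWMA statistic and its upper control limit is given below. It assumes the standard EWMA recursion, starts the recursion at the first observation, and uses the population standard deviation of the whole EWMA series for σ_EWMA; these choices and the function names are assumptions, not details confirmed by the study. The 13-day moving average pre-processing is omitted.

```python
from statistics import pstdev


def ewma_series(counts, lam):
    """EWMA_t = lam * Y_t + (1 - lam) * EWMA_{t-1} for each day t."""
    if not 0 < lam <= 1:
        raise ValueError("lam must satisfy 0 < lam <= 1")
    previous = counts[0]  # assumption: start the recursion at the first value
    smoothed = []
    for y in counts:
        previous = lam * y + (1 - lam) * previous
        smoothed.append(previous)
    return smoothed


def ewma_alarms(counts, baseline_mean, lam=0.46, k=1.0):
    """Flag days whose EWMA statistic exceeds EWMA_0 + k * sigma_EWMA."""
    smoothed = ewma_series(counts, lam)
    upper_limit = baseline_mean + k * pstdev(smoothed)
    alarm_days = [t for t, z in enumerate(smoothed) if z > upper_limit]
    return alarm_days, upper_limit


# Toy data (assumption): 2 cases per day with a 3-day excess starting on day 20.
counts = [2] * 20 + [10, 10, 10] + [2] * 20
alarm_days, upper_limit = ewma_alarms(counts, baseline_mean=2, lam=0.46, k=1.0)
```

Because the statistic carries memory of past days, alarms can persist for a few days after the excess cases stop; lowering λ lengthens that memory.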

Poisson Regression-based Method
The upper control limits of Poisson regression
To determine the upper control limit, the expected mean of suspected pertussis cases was estimated as follows:

$$\log \lambda_j = \alpha + \beta X_j$$

where $\lambda_j$ is the expected mean number of cases at time j, $\alpha$ is the intercept, X is a factor that affects the expected mean of pertussis cases, such as day or month, and $\beta$ is the coefficient of X. After these parameters were estimated, the alarm threshold was calculated from the estimated mean and its variance $V_t$, where

$$V_t = \mathrm{var}(\alpha) + t^2\,\mathrm{var}(\beta) + 2t\,\mathrm{cov}(\alpha, \beta)$$ (31)

The values $Z_{0.90}$, $Z_{0.95}$ and $Z_{0.99}$ were used to calculate the different thresholds.
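As a worked illustration of the variance term, the Python sketch below computes V_t from the variances and covariance of the two coefficients and then forms an upper limit. The threshold form used here, exp(α + βt + Z·sqrt(V_t)), is an assumption made for illustration only, because the exact threshold expression is not given in the text. The coefficient values are invented.

```python
import math
from statistics import NormalDist


def linear_predictor_variance(t, var_alpha, var_beta, cov_alpha_beta):
    """V_t = var(alpha) + t^2 * var(beta) + 2 * t * cov(alpha, beta)."""
    return var_alpha + t ** 2 * var_beta + 2 * t * cov_alpha_beta


def poisson_upper_limit(t, alpha, beta, var_alpha, var_beta,
                        cov_alpha_beta, level=0.95):
    """Assumed threshold: upper bound of the log-mean, mapped back to counts."""
    z = NormalDist().inv_cdf(level)  # e.g. about 1.645 for level = 0.95
    v_t = linear_predictor_variance(t, var_alpha, var_beta, cov_alpha_beta)
    return math.exp(alpha + beta * t + z * math.sqrt(v_t))


# Invented coefficient estimates, used only to exercise the formula.
v = linear_predictor_variance(2, 0.04, 0.01, -0.005)
limit_90 = poisson_upper_limit(2, 0.5, 0.1, 0.04, 0.01, -0.005, level=0.90)
limit_99 = poisson_upper_limit(2, 0.5, 0.1, 0.04, 0.01, -0.005, level=0.99)
```

With these invented values, V_t = 0.04 + 4(0.01) + 4(-0.005) = 0.06, and a higher Z level gives a higher threshold and therefore fewer alarms.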

Measures Of The Algorithm's Performance
The performance of these methods was measured using sensitivity, specificity, false alarm rate, likelihood ratios, and the area under the receiver operating characteristic (ROC) curve (AUC). The total number of outbreak days was used as the gold standard for calculating these measures and evaluating the performance of the different algorithms. According to the AUC, the EWMA algorithm with k = 0.5 and 1 had the best performance in the detection of both outbreak and non-outbreak days, with AUCs of 0.92 and 0.90, respectively. The AUC ranges from 0 to 1, and a larger value (closer to 1) indicates better performance of the algorithm (Table 1 and

Timeliness of methods
In terms of timeliness, all algorithms with lower values of k (0.5, 1) generated at least one alarm on the first day of outbreaks. However, the WOD methods showed the best performance in early warning of outbreaks. The Haar and db10 wavelets with k = 0.5 had the best performance in generating an alarm on the first day of outbreaks, with 7 alarms on the first day of injected outbreaks. As k increased, the probability of generating an alarm on the first day of outbreaks decreased because the alarm threshold rose. The median (minimum to maximum) timeliness, in days, for the Haar (k = 0.5) and db10 (k = 0.5) wavelet-based methods was 2 (1-14) and 2 (1-14), respectively.
This was less than the median timeliness of the other algorithms. It means that 50% of the alarms were raised by the second day of the outbreak. The WOD generated an alarm in all 19 outbreaks, which was the best performance among all algorithms. More information is shown in Table 3.
Table 2 and Fig. 5).

Timeliness Of Methods
In terms of timeliness, all algorithms with k less than 1 generated at least one alarm on the first day of the outbreaks. However, the WOD methods showed the best performance in early warning of outbreaks. The Haar wavelet with k = 0.5 and 1 and the db10 wavelet with k = 0.5 had the best performance in generating an alarm on the first day of outbreaks, with 18, 16 and 14 alarms on the first day of injected outbreaks, respectively. As k increased, the probability of generating an alarm on the first day of outbreaks decreased because the alarm threshold rose. The median (minimum to maximum) timeliness, in days, for the Haar (k = 0.5) and db10 (k = 0.5) wavelet-based methods was 1 (1-3) and 1 (1-4), respectively. This was less than the median timeliness of the other algorithms. It means that at least 50% of the alarms raised by these methods occurred on the first day of the outbreak. In addition, the Poisson regression method with $Z_{1-\alpha/2}$ = 90% generated alarms in all 27 simulated outbreaks, which was better than the other methods (Table 4).
Table 4. Performance of the methods under study in early detection (timeliness) of researcher-made outbreaks
results. But such information is usually unavailable or inaccessible (38). For that reason, the majority of studies use synthetic and semi-synthetic simulation to assess the performance of outbreak detection methods (39-42). One advantage of this approach is that it provides a true gold standard that can be used to calculate and evaluate performance indices (3).
In this study, two approaches were used to simulate outbreaks. The first was based on pertussis outbreaks reported in the literature, and the second used researcher-made outbreaks injected into the actual daily number of pertussis cases. The performance of the methods under study was measured by their ability to detect outbreak and non-outbreak days and by their early warning of outbreaks, or timeliness. Timeliness was defined as the time interval between the first day of the outbreak and the first true alarm produced by a method.
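The day-level measures and the timeliness definition above can be sketched in Python as follows. The sketch assumes per-day 0/1 labels for the gold standard and the alarms, and counts timeliness from 1, so that an alarm on the first outbreak day has timeliness 1, which matches how the medians are interpreted in this paper. The function names and toy data are hypothetical.

```python
from statistics import median


def day_level_measures(truth, alarms):
    """Compare per-day alarms (0/1) with gold-standard outbreak days (0/1).

    Assumes the series contains both outbreak and non-outbreak days.
    """
    tp = sum(1 for t, a in zip(truth, alarms) if t and a)
    fn = sum(1 for t, a in zip(truth, alarms) if t and not a)
    fp = sum(1 for t, a in zip(truth, alarms) if not t and a)
    tn = sum(1 for t, a in zip(truth, alarms) if not t and not a)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "false_alarm_rate": 1 - spec,
        "false_negative_rate": 1 - sens,
        "lr_positive": sens / (1 - spec) if spec < 1 else float("inf"),
        "lr_negative": (1 - sens) / spec if spec > 0 else float("inf"),
    }


def timeliness(outbreaks, alarm_days):
    """Days from outbreak start to first alarm inside it (1 = first day).

    `outbreaks` is a list of (start, end) day indices, inclusive.
    Undetected outbreaks yield None.
    """
    alarm_set = set(alarm_days)
    result = []
    for start, end in outbreaks:
        hit = next((d for d in range(start, end + 1) if d in alarm_set), None)
        result.append(None if hit is None else hit - start + 1)
    return result


# Toy example (assumption): 10 days, outbreak days are 2, 3 and 7.
truth = [0, 0, 1, 1, 0, 0, 0, 1, 0, 0]
alarms = [0, 1, 1, 0, 0, 0, 0, 1, 0, 0]
measures = day_level_measures(truth, alarms)
delays = timeliness([(5, 11), (20, 26)], [6, 7, 20, 30])
median_delay = median(d for d in delays if d is not None)
```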
According to our results, the highest sensitivity in detecting literature-based outbreaks was achieved by the wavelet-based methods. In the researcher-made outbreaks, however, the highest sensitivity was achieved by the EWMA algorithms with lower values of k in threshold determination.
In both simulation approaches, the highest specificity in the detection of non-outbreak days was achieved by the wavelet-based method and the EWMA with a high value of k. The  (42,43). The performance of outbreak detection methods is affected by many factors, including the type, duration, and magnitude of outbreaks. According to the results of another study, the ability of outbreak detection algorithms was higher with larger outbreaks and lower thresholds (40,44). The performance of a method may therefore differ under conditions with different properties. Additionally, outbreak detection algorithms may have different sensitivity and specificity in different outbreak settings (1).
In terms of early warning of outbreaks, or timeliness, the WOD methods had the best performance in both outbreak simulation approaches. This is consistent with another study (16). Early warning and detection of outbreaks and aberrations of infectious diseases are very important to surveillance systems, because they provide an opportunity to stop the spread of outbreaks from one region to another and can prevent an epidemic from becoming a global pandemic threat (45). Many factors can affect the early detection of outbreaks. One of these is the choice of the outbreak alarm threshold (46). A lower alarm threshold can lead to earlier warning of outbreaks. However, earlier alarms can also raise the false alarm rate, and this should be considered in outbreak detection. In determining the outbreak alarm threshold, factors such as case fatality, contagiousness, and confirmation costs must therefore be considered.
Finally, comparing these methods, the WOD method can work as an early warning method for outbreak detection. It can be applied without any presumption about the data set, so researchers can use it as an early warning tool to identify outbreaks of other infectious diseases. According to the results of the study, the ability of the methods under study to detect outbreak days with k = 2 and 3 is not more than fifty percent. This performance may be affected by the low incidence of pertussis in Iran, and it may change according to the type of infectious disease. The effect of the nature of the outbreaks on the performance of outbreak detection methods must therefore be considered. Since an outbreak alarm leads to the formation of an outbreak investigation team, the accuracy of the alarm is very important. It is therefore recommended to use combined algorithms rather than a single one (47). This study had two main limitations. First, we used simulated outbreaks, which could differ from real outbreaks; to increase the validity of these methods, we tried to simulate different outbreaks with different sizes. Second, few outbreaks with daily case counts have been reported, and the outbreaks reported in the literature might have different patterns from those in Iran's setting; for a better assessment of the methods, we used two simulation approaches. Finally, it is recommended that the methods under study, especially wavelet-based outbreak detection, be applied to other infectious diseases or data sets with different incidence and patterns.

Conclusions
According to the results of the study, the wavelet-based outbreak detector had appropriate timeliness in outbreak detection. However, given the importance of the problem and the effect of the nature of outbreaks, such as duration, size, and type, on the performance of outbreak detection methods, it is better to use the method alongside others in public health surveillance systems.

Consent for publication
Not applicable.
These data are not publicly available. However, the data are available from the authors upon reasonable request and with the permission of the Committee. Anyone who wants to access the raw data should contact the corresponding author.

The literature-based outbreaks simulation
The researcher-made outbreaks simulation
The result of the wavelet outbreak detector in the detection of researcher-made outbreaks. The top row shows the levels of decomposition using the discrete wavelet transform; the rows below show the original signal (top), the detected alarms (middle), and the absolute value of the detected alarms (bottom).