A multi-level weighted concept drift detection method

The concept drift detection method is an online learner. Its main task is to determine the positions of drifts in a data stream, so that the classifier can be reset after a drift is detected to improve learning performance, which is very important in practical applications such as user interest prediction and financial transaction fraud detection. Existing drift detection methods cannot balance detection delay, false positives, false negatives, and space–time efficiency. To address this, a new level transition threshold parameter is proposed, and a multi-level weighted mechanism of "Stable Level-Warning Level-Drift Level" is introduced into concept drift detection. The instances in the window are weighted by level, and a double sliding window is also applied. On this basis, a multi-level weighted drift detection method (MWDDM) is proposed. In particular, two variants, MWDDM_H and MWDDM_M, are derived from the Hoeffding inequality and the McDiarmid inequality, respectively. Experiments on artificial datasets show that MWDDM_H and MWDDM_M detect abrupt and gradual concept drift faster than the comparison algorithms while maintaining low false positive and false negative ratios. Experiments on real-world datasets show that MWDDM achieves the highest classification accuracy in most cases while maintaining good space–time efficiency.


Introduction
In recent years, big data, Internet of Things technology, and artificial intelligence have developed rapidly. All walks of life continuously generate large amounts of data, which keep growing at an alarming rate [1]. Such data, for example network data, weather forecast data, wireless sensor data, and financial and power grid data, are called data streams because of their characteristics. Traditionally, machine learning algorithms have assumed a stationary data distribution. However, the underlying distribution in an evolving data stream may change over time. This phenomenon, known as concept drift, means that the data distributions at two time points x and y satisfy D_x ≠ D_y [3]. Real-life examples of concept drift include changing user interest preferences, monitoring systems, weather forecasting, and financial fraud detection [4][5][6]. When concept drift occurs, old learning models are no longer effective, resulting in decreased classification performance. Therefore, adapting to changing data distributions becomes crucial for maintaining high learning performance.
Currently, many adaptive learning algorithms use concept drift detection methods to detect drift in evolving data streams. Typically, when the drift detector signals a drift, the classification model is updated or retrained to accommodate the new concept. Over the past decades, many concept drift detection methods have been proposed, mainly including statistical-based methods [7][8][9][10][11][12], window-based methods [13][14][15][16], and sequence analysis-based methods. Many of them either require large time and memory costs, or cannot detect concept drift quickly while maintaining low false negative ratios. A good drift detection method should therefore detect concept drift in nonstationary data stream environments as soon as possible. Besides, it should distinguish noise from drift, that is, keep a low false positive ratio when detecting drifts. Moreover, drifts should be detected with low time and space consumption while keeping high accuracy. Based on these requirements, this paper proposes a multi-level weighted concept drift detection method (MWDDM) to address the inability of existing methods to balance detection delay, false positives, false negatives, and space–time efficiency when detecting abrupt and gradual concept drift.
In this paper, we introduce a multi-level weighted drift detection mechanism of "stable level-warning level-drift level" into concept drift detection by proposing a threshold parameter for level transition. In addition, MWDDM uses a window mechanism in which a long sliding window overlaps with a short sliding window. During the "stable level," the algorithm assigns weights to the instances in the two windows: the newest instances receive larger weights and old, outdated instances receive smaller weights, with relatively small differences between the weights at this level. At the same time, the weighted average of correct predictions and the maximum weighted average of correct predictions within each window are calculated. After entering the "warning level," the algorithm increases the difference between instance weights to detect drift faster, and updates the weighted average and the maximum weighted average of correct predictions. Finally, at the "drift level," MWDDM_H and MWDDM_M use the Hoeffding bound generated by the Hoeffding inequality and the McDiarmid bound generated by the McDiarmid inequality, respectively, to determine whether the difference between the maximum weighted average and the current weighted average of correct predictions exceeds a pre-defined threshold. If so, a drift is reported, and the classifier is reset for retraining. Experiments show that MWDDM_H and MWDDM_M detect concept drift with lower detection delay and lower false positive and false negative ratios than the comparison algorithms, while consuming less time and memory and maintaining high classification accuracy.
The main contributions of this paper are as follows.
1. To the best of our knowledge, we propose the multi-level weighted mechanism and apply it to concept drift detection for the first time.
2. We discuss drift detection methods with different window mechanisms, covering both single-window and two-window types, and propose a drift detection method named MWDDM that can effectively detect abrupt and gradual concept drift in data streams, together with two variants, MWDDM_H and MWDDM_M, based on the Hoeffding inequality and the McDiarmid inequality, respectively.
3. We analyze the proposed method on 4 artificial and 3 real-world datasets against 10 recent and prevalent concept drift detection methods, and perform a comprehensive evaluation showing that our method outperforms the others in drift detection performance.
The remainder of this paper is organized as follows. Section 2 presents the basic concepts and related work on concept drift detection methods. Section 3 gives the details of the multi-level weighted mechanism and the MWDDM detection method. The experimental results and analysis on artificial and real-world datasets are presented in Sect. 4, and the conclusion and future work are presented in Sect. 5.

Concept drift
Concept drift is a widespread problem in data stream mining, caused by the change or evolution of streaming data over time. Changes in the underlying distribution cause the feature vectors of arriving instances to no longer reflect their class labels. This may negatively impact the reliability and accuracy of classifiers making predictions on the streaming data. Suppose the data stream consists of consecutive instances (x_t, y_t), where t = 1, 2, 3, …, x_t is a feature vector, and y_t belongs to a set of n class labels, that is, y_t ∈ {y_1, y_2, ⋯, y_n}. A prediction obtained by the predictor from the feature vector x_t at a specific time is denoted ŷ_t. Then, concept drift between times t_0 and t_1 can be defined as formula (1) [18]: p_t0(x, y) ≠ p_t1(x, y). Here, p_t represents the joint probability distribution of the feature vector x_t and the target class label y_t at time t. A change in this joint probability distribution of the data stream is concept drift.
In the recent literature, concept drift is described further. At a certain moment, p(x_t, y_t) can be obtained from the class-conditional concept distribution by formula (2): p(x_t, y_t) = p(y_t) p(x_t | y_t).
Then, when the input x_t is to be predicted, the posterior probability distribution can be obtained according to Bayesian decision theory as shown in formula (3): p(y_t | x_t) = p(y_t) p(x_t | y_t) / p(x_t).
The above is the general definition of concept drift. In addition, the number of time steps over which a new target concept replaces the old target concept is usually referred to as the duration of the concept drift: the shorter the duration, the more abrupt the drift [19]. Figure 1 shows the difference between these four types of concept drift.

Drift detection methods of different window mechanisms
The window mechanism has been widely used to deal with the concept drift problem. Its proponents argue that the most recently observed instances carry the most useful information, and changes are estimated incrementally over time or over data windows. A window is defined as a short in-memory data structure that stores informative data or summarizes statistics about model behavior or the data distribution in order to describe the current concept. The sliding window has become one of the most commonly used window mechanisms for drift detection methods. A sliding window is generally a first-in-first-out (FIFO) data structure of size n: as a new instance arrives, the oldest instance is discarded [21]. Its mechanism is shown in Fig. 2. At present, sliding window mechanisms are mainly divided into single-window and double-sliding-window types.
Firstly, among single-window methods, the DDM drift detection method proposed by Gama et al. [7] uses the binomial distribution over a single window. For each instance, DDM computes the error rate, that is, the probability of misclassifying a given instance, and its corresponding standard deviation, to detect concept drift. DDM is better suited to abrupt drift, because gradual drift can easily pass without triggering a warning. EDDM [8] improves on DDM by comparing the distance between two consecutive errors: when the data stream is in a steady state, the distance between consecutive errors grows; when it shrinks, warnings and drifts are triggered, so EDDM is better suited to gradual concept drift. The FHDDM drift detection method proposed by Pesaranghader et al. [15] uses a single sliding window and the Hoeffding inequality to compare the difference between the maximum prediction accuracy observed so far and the current prediction accuracy within the window.

Double-sliding-window methods can be divided into separate, adjacent, and overlapping types, shown in Figs. 3, 4, and 5, respectively. STEPD [9] applies a statistical test of equal proportions with continuity correction to the data in two separate windows, and signals a warning and a drift when a significant difference in accuracy between the recent and older windows is detected. Adaptive Sliding Window (ADWIN) [13] is one of the most classical drift detection methods using adjacent double windows. Its main idea is that when the averages over the two sub-windows w_1 and w_2 of the latest window W differ sufficiently, the corresponding expected values are inferred to be different; the old sub-window is then dropped whenever the difference between the two sub-window means exceeds a threshold defined according to the Hoeffding bound.
Based on ADWIN, SEED [14] also compares two sub-windows and discards the old sub-window when the difference between the sub-window averages exceeds a selected threshold; its test statistic uses Hoeffding's inequality with Bonferroni correction. The FHDDMS [23] drift detection method uses two overlapping sliding windows over the prediction results to detect concept drift.
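The overlapping double-window idea used by FHDDMS (and later by MWDDM) can be sketched as follows. This is an illustrative example only; the class name and window sizes are our own, not code from any of the cited methods.

```python
from collections import deque

class OverlappingWindows:
    """Two FIFO windows over the same stream of prediction bits:
    the short window reacts quickly to abrupt accuracy drops, while
    the long window accumulates evidence of gradual drops. The short
    window is always a suffix of the long one (overlapping type)."""

    def __init__(self, short_size=25, long_size=100):
        self.short = deque(maxlen=short_size)
        self.long = deque(maxlen=long_size)

    def add(self, correct):
        bit = 1 if correct else 0
        self.short.append(bit)
        self.long.append(bit)

    def means(self):
        """Current accuracy estimate in each window."""
        s = sum(self.short) / len(self.short)
        l = sum(self.long) / len(self.long)
        return s, l
```

Feeding 100 correct predictions followed by 25 errors drives the short-window mean to 0 while the long-window mean only falls to 0.75, illustrating why the short window detects abrupt drift first.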

Proposed algorithm
In this paper, abrupt and gradual concept drift in data streams is taken as the research object. A level transition threshold parameter is proposed for the concept drift detection process, and a multi-level weighted mechanism of "stable level-warning level-drift level" is introduced. The weighting mechanism changes the differences between instance weights. Finally, combined with the double sliding window mechanism, a Multi-level Weighted Drift Detection Method (MWDDM) is proposed, along with two variants based on the Hoeffding inequality and the McDiarmid inequality, respectively: MWDDM_H and MWDDM_M.

Multi-level weighted mechanism
Many drift detection methods based on sliding window have been proposed. ADWIN [13], DDM [7], STEPD [9], FHDDM [15] are all classical drift detection methods using sliding windows. Most of the above algorithms compare the differences in two sub-windows within a window to detect drift.
Overall, a shorter sliding window can detect changes in the data distribution more quickly when abrupt concept drift occurs, warn of a drift in time, and let the learner adapt accordingly. For gradual concept drift with a long drift duration, however, a short sliding window may not adapt well to the slowly changing stream, so a sliding window with a larger length is more suitable. Based on these observations, this paper uses a combination in which a long sliding window overlaps a short sliding window, to adapt simultaneously to abrupt and gradual concept drifts in the data stream. The double sliding window mechanism is shown in Fig. 6.
Also, in the data stream environment, the old instances are considered obsolete or no longer valid. Therefore, incremental learners should be trained using the most recent instances, as the latter are more reflective of the current situation in the context of the data stream. Online learning algorithms typically use fading factors or weighting methods to increase the weight of recent instances. This is very important from an adaptive learning perspective, especially when a transition between two concepts in a data stream occurs, i.e., concept drift. Therefore, according to this observation, we give more weight to the newest instances in the window to help to detect concept drift faster.
On this basis, this paper proposes a multi-level weighted concept drift detection mechanism. The drift detection process is divided into three levels: "stable level," "warning level," and "drift level." First, the data stream consists of paired instances (x⃗_i, y_i), where x⃗_i is the attribute vector and y_i is its corresponding class. For each instance, a classifier such as Naive Bayes or Hoeffding tree makes a prediction ŷ_i, which is compared with the actual label y_i to decide whether the prediction is correct (ŷ_i = y_i?). If the prediction is correct, a 1 is inserted into both the long and the short sliding window; otherwise a 0 is inserted.
During the "stable level," we assign weights to the instances within the two windows. The weighting method used in this paper is linear, and its mechanism is shown in Fig. 7. As instances arrive, the weight of the newest instance grows linearly relative to the weights of older instances. Let w_i denote the weight assigned to the i-th instance in the window. In the linear weighting method, consecutive weights satisfy w_(i+1) − w_i = diff, which gives formula (8) for the weight of an instance in the window; diff is assigned the value 0.01 during the "stable level." This paper then defines the weighted average classification prediction accuracies u_s and u_l within the short and the long sliding window, respectively, calculated by formulas (4) and (5), where |W_s| and |W_l| denote the lengths of the short and the long sliding window, respectively.
At the same time, before the next concept drift is reported, the paper maintains the maximum weighted average classification prediction accuracies observed so far in the short and the long sliding window, u_s^max and u_l^max, calculated analogously. Then, to decide when the detection method enters the "warning level," this paper defines level transition thresholds for the two windows: when the monitored statistic s of the short window drops below its threshold, or the statistic l of the long window drops below its threshold, the method enters the "warning level"; the thresholds are set to 0.78 and 0.85, respectively. The determination of these pre-defined thresholds is discussed in detail in Sect. 4.
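The linear weighting just described can be sketched as follows. We assume the weighted average is normalized by the sum of weights; the paper's exact normalization in formulas (4)-(5) may differ, and the `base` weight and function names are our own.

```python
def linear_weights(n, diff=0.01, base=1.0):
    """Weights rise linearly with recency: w_(i+1) - w_i = diff.
    Index 0 is the oldest instance, index n-1 the newest."""
    return [base + i * diff for i in range(n)]

def weighted_accuracy(bits, diff=0.01):
    """Weighted average of correct-prediction bits (oldest first),
    normalized by the total weight so the result stays in [0, 1]."""
    w = linear_weights(len(bits), diff)
    return sum(wi * b for wi, b in zip(w, bits)) / sum(w)
```

Because recent bits carry more weight, a window whose newest instances are correct scores higher than one whose oldest instances are correct, even though both contain the same number of 1s.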
During the "warning level," our method increases the difference between the instance weights in the long and short sliding windows to emphasize the importance of the newest instances, so that drift can be detected faster. Accordingly, after entering the "warning level," the weighted averages of correct predictions u_s and u_l within the short and long sliding windows are updated to u′_s and u′_l. Similarly, the maximum weighted averages of correct predictions u_s^max and u_l^max are updated to u_s^max′ and u_l^max′, as calculated below.
The maximum weighted averages of correct predictions within the short and long sliding windows, u_s^max′ and u_l^max′, are updated as follows. Finally, at the "drift level," MWDDM_H and MWDDM_M use the Hoeffding bound and the McDiarmid bound generated by the Hoeffding inequality and the McDiarmid inequality, respectively. If the difference between the maximum weighted average and the current weighted average of correct predictions in the long or the short sliding window exceeds the pre-defined threshold, the occurrence of a concept drift is reported. The classifier is then reset and retrained to adapt to the new data distribution.
The Hoeffding inequality used in MWDDM_H is shown in Theorem 1 [17].
Theorem 1 (Hoeffding inequality) Let X_1, X_2, …, X_n be n independent random variables bounded in [0, 1], with empirical mean X̄ = (1/n) Σ_(i=1)^n X_i. For any ε > 0, formula (11) holds: Pr(|X̄ − E[X̄]| > ε) ≤ 2 exp(−2nε²). According to this theorem, at a given significance level δ, the estimated error of the mean X̄, that is, the Hoeffding bound, is shown in Eq. (12): ε_H = sqrt(ln(2/δ) / (2n)). Therefore, MWDDM_H defines two thresholds ε_(s·H) and ε_(l·H) for the short and long sliding windows, respectively, calculated by formulas (13) and (14). MWDDM_H defines the differences between the maximum weighted average and the current weighted average of correct predictions in the short and long sliding windows as Δ_s and Δ_l, where Δ_s = u_s^max − u_s and Δ_l = u_l^max − u_l. Then, when Δ_s exceeds the pre-defined threshold ε_(s·H) or Δ_l exceeds the pre-defined threshold ε_(l·H), the occurrence of concept drift is reported.
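As a concrete sketch, the Hoeffding bound and the drift test of MWDDM_H can be written as follows. We use the one-sided form ε = sqrt(ln(1/δ) / (2n)); the δ value in the example is an assumed illustration, not the paper's setting.

```python
import math

def hoeffding_epsilon(n, delta):
    """Hoeffding bound: with probability at least 1 - delta, the
    empirical mean of n independent [0, 1] variables lies within
    epsilon of its expectation."""
    return math.sqrt(math.log(1.0 / delta) / (2.0 * n))

def drift_detected(u_max, u_now, n, delta=1e-7):
    """MWDDM_H-style test: report drift when the drop from the best
    observed weighted accuracy exceeds the Hoeffding bound."""
    return (u_max - u_now) > hoeffding_epsilon(n, delta)
```

For a window of 100 bits at δ = 0.002, the bound is about 0.176, so a drop from a best accuracy of 0.95 to 0.70 triggers a drift while a drop to 0.90 does not.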

The McDiarmid inequality used by MWDDM_M is shown in Theorem 2.
Theorem 2 (McDiarmid inequality) Let X_1, X_2, …, X_n be n independent random variables taking values in a set X, and let f: X^n → R satisfy the bounded difference condition of formula (15): for all i and all x_1, …, x_n, x′_i ∈ X, |f(x_1, …, x_i, …, x_n) − f(x_1, …, x′_i, …, x_n)| ≤ c_i. This means that replacing x_i with an arbitrary value changes f by at most c_i. Then, for all ε > 0, formula (16) holds: Pr(f − E[f] ≥ ε) ≤ exp(−2ε² / Σ_(i=1)^n c_i²).
Finally, given a confidence level δ_M, the McDiarmid bound ε_M obtained from formula (16) is shown in formula (17): ε_M = sqrt((Σ_(i=1)^n c_i² / 2) ln(1/δ_M)).
Therefore, MWDDM_M defines two thresholds ε_(s·M) and ε_(l·M) for the short and long sliding windows, respectively; their calculation formulas are shown in formulas (18) and (19).
MWDDM_M uses the same differences between the maximum weighted average and the current weighted average of correct predictions in the short and long sliding windows as MWDDM_H, namely Δ_s = u_s^max − u_s and Δ_l = u_l^max − u_l. Then, when Δ_s exceeds the pre-defined threshold ε_(s·M) or Δ_l exceeds the pre-defined threshold ε_(l·M), the occurrence of concept drift is reported.
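For a weighted mean of prediction bits with normalized weights v_i (Σ v_i = 1), changing one bit changes the mean by at most c_i = v_i, so Theorem 2 specializes as sketched below. This is our reading of how a McDiarmid bound applies to a weighted mean; MWDDM_M's exact formulas (18)-(19) may differ, and the δ value in the example is illustrative.

```python
import math

def mcdiarmid_epsilon(weights, delta):
    """McDiarmid bound for a weighted mean of independent 0/1 bits.
    With normalized weights v_i, flipping one bit moves the mean by
    at most c_i = v_i, so eps = sqrt(sum(v_i^2) * ln(1/delta) / 2)."""
    total = sum(weights)
    v = [w / total for w in weights]
    return math.sqrt(sum(vi * vi for vi in v) * math.log(1.0 / delta) / 2.0)
```

With uniform weights this bound coincides with the Hoeffding bound for an unweighted mean; any non-uniform (e.g., linearly increasing) weighting makes the bound slightly larger, the price paid for emphasizing recent instances.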

Multi-level weighted drift detection method (MWDDM)
Based on the multi-level weighted mechanism proposed above, this paper will analyze the predictions produced by the learner and store them in a double sliding window, and then apply a decision model to try to detect changes in the data distribution and indicate the occurrence of concept drift.
Specifically, given a set of paired instances (x⃗_i, y_i), where x⃗_i is an attribute vector and y_i is its corresponding class, the base learner makes a prediction ŷ_i for each instance, which is compared with the actual label y_i to decide whether the prediction is correct (ŷ_i = y_i?). The prediction results are stored in the sliding windows for the detection model to use. In general, most existing drift detectors analyze the classification accuracy (or error rate) and its corresponding standard deviation from the prediction results, and look for differences between windows. Different drift detection methods use different strategies or statistics to monitor the performance of the base classifier and decide when concept drift occurs.
Based on the PAC learning model, MWDDM assumes that as long as the sample distribution is stationary, the error rate decreases as the number of samples increases, that is, the accuracy tends to increase. Therefore, an increase in the error rate, or a decrease in classification accuracy, indicates a change in the data distribution, and the learning performance of existing learners is likely to degrade. Based on this idea, the classification accuracy (or error rate) of the classifier can be used to reflect distribution changes in the current data stream. Specifically, this paper uses overlapping long and short sliding windows to collect the classification prediction results. Based on the Hoeffding inequality and the McDiarmid inequality, two variants of the algorithm are proposed: MWDDM_H and MWDDM_M.
The specific flow of MWDDM is shown in Algorithm 1. Lines 1-3 initialize the sizes of the two sliding windows, assign the parameter values, and then compute ε_(s·H), ε_(l·H) for MWDDM_H and ε_(s·M), ε_(l·M) for MWDDM_M. Lines 4-7 check whether the windows are full; if so, the oldest instance is discarded and the newest inserted. Lines 8-13 handle the "stable level": the weighted averages of correct predictions u_s, u_l within the windows are calculated, and u_s^max and u_l^max are updated. Lines 14-22 judge whether the algorithm has entered the "warning level"; if so, u_s, u_l are updated to u′_s, u′_l, the maximum weighted averages are updated to u_s^max′, u_l^max′, and the differences Δ_s and Δ_l between the maximum and the current weighted averages are calculated. Lines 24-29 handle the "drift level": MWDDM_H and MWDDM_M determine whether Δ_s or Δ_l exceeds the corresponding pre-defined threshold, and if so, report a drift and reset the classifier for retraining.
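The three-level flow of Algorithm 1 can be illustrated with the following simplified sketch. It is not the authors' pseudocode: it keeps a single window instead of two, uses the Hoeffding-style bound, and all parameter values and names are assumed for illustration.

```python
import math
from collections import deque

class MWDDMSketch:
    """Illustrative three-level detector: stable -> warning -> drift."""

    STABLE, WARNING = 0, 1

    def __init__(self, size=100, delta=1e-3, lam=0.78,
                 stable_diff=0.01, warning_diff=0.05):
        self.window = deque(maxlen=size)   # FIFO of prediction bits
        self.delta = delta
        self.lam = lam                     # level transition threshold
        self.stable_diff = stable_diff     # weight step at stable level
        self.warning_diff = warning_diff   # larger step at warning level
        self.level = self.STABLE
        self.u_max = 0.0                   # best weighted accuracy seen

    def _weighted_mean(self, diff):
        # Linear weights: oldest instance weight 1.0, newest the largest.
        w = [1.0 + i * diff for i in range(len(self.window))]
        return sum(wi * b for wi, b in zip(w, self.window)) / sum(w)

    def add(self, correct):
        """Feed one prediction outcome; return True when drift is reported."""
        self.window.append(1 if correct else 0)
        if len(self.window) < self.window.maxlen:
            return False                   # window not full yet
        diff = self.stable_diff if self.level == self.STABLE else self.warning_diff
        u = self._weighted_mean(diff)
        self.u_max = max(self.u_max, u)
        if self.level == self.STABLE and u < self.lam:
            self.level = self.WARNING      # increase recency emphasis
        eps = math.sqrt(math.log(1.0 / self.delta) / (2.0 * len(self.window)))
        if self.level == self.WARNING and (self.u_max - u) > eps:
            self.window.clear()            # reset after reporting drift
            self.level = self.STABLE
            self.u_max = 0.0
            return True
        return False
```

Feeding a stream that is correct for 200 instances and then wrong makes the sketch pass through the warning level and report a drift shortly after position 200.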


Experiments
In this section, to verify the effectiveness of the proposed MWDDM, experimental evaluations are conducted on both artificial and real-world datasets. The experimental platform is the Massive Online Analysis (MOA) framework [25]. This paper compares MWDDM with recent drift detection algorithms, including the Drift Detection Method (DDM) [7], Early Drift Detection Method (EDDM) [8], Reactive Drift Detection Method (RDDM) [24], Fast Hoeffding Drift Detection Method (FHDDM) [15], Stacking Fast Hoeffding Drift Detection Method (FHDDMS) [23], McDiarmid Drift Detection Method (MDDM) [16], Drift Detection Method based on Hoeffding's bound (HDDM) [22], and Bhattacharyya distance-based Drift Detection Method (BDDM) [12]. The experiments were run on an Intel(R) Core(TM) i5-4200H CPU @ 2.80 GHz with 8 GB of RAM. Sect. 4.1 introduces the evaluation metrics used in the experiments, Sect. 4.2 introduces the datasets, and Sect. 4.3 presents and analyzes the experimental results.

Evaluation metrics
Currently, the mainstream evaluation metric for concept drift detection in a data stream is detection delay. When drift occurs at a certain moment, the detection algorithm cannot detect it immediately; that is, there is usually a delay. To evaluate timeliness effectively, Detection Delay (DD) is introduced as the number of instances between the actual position of a drift and the detected position. We define i_true as the instance position where a drift actually occurs, and i_detect as the instance position where the drift detection method detects it; the detection delay of a single drift is then i_detect − i_true. The detection delay DD on a dataset is defined in formula (20): DD = (1/m) Σ_(n=1)^m (i_detect^n − i_true^n), where i_true^n is the actual instance position of the n-th drift in the dataset, i_detect^n is the instance position at which the n-th drift is detected, and m is the total number of drifts in the dataset. In addition, the True Positive Ratio (TPR), False Positive Ratio (FPR), and False Negative Ratio (FNR) are defined according to the maximum detection delay Δd introduced in [15]. The maximum detection delay Δd is a threshold that determines how far a detected drift may be from the true drift position and still count as a true detection. In this paper, Δd is set to 250 for datasets containing abrupt concept drift and to 1000 for datasets containing gradual concept drift, and the false positive and false negative ratios are defined accordingly. Finally, classification accuracy (Accuracy), memory usage (RAM-hours), and running time (CPU seconds) on real-world datasets are also important evaluation metrics.
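The delay and error-count computation can be sketched as follows, assuming each true drift is matched to the first detection within Δd after it (the paper's exact matching rule is not spelled out, and the function name is our own).

```python
def evaluate_detections(true_drifts, detections, max_delay):
    """Match each true drift to the first unused detection within
    max_delay after it; return mean detection delay, the number of
    false positives, and the number of false negatives. Both input
    lists are assumed sorted by instance position."""
    delays, used = [], set()
    for t in true_drifts:
        match = next((d for d in detections
                      if d not in used and t <= d <= t + max_delay), None)
        if match is not None:
            used.add(match)
            delays.append(match - t)
    fn = len(true_drifts) - len(delays)   # missed drifts
    fp = len(detections) - len(used)      # detections matching no drift
    dd = sum(delays) / len(delays) if delays else float("nan")
    return dd, fp, fn
```

For example, with true drifts at 20,000 and 40,000, detections at 20,100, 35,000, and 40,050, and Δd = 250, the mean delay is 75 with one false positive and no false negatives.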

Datasets
Many studies have evaluated proposed algorithms on artificial datasets with specific types of concept drift. One advantage of artificial datasets is that details such as the drift positions are known. The real-world datasets used in this paper are listed below; they are frequently used in concept drift detection and adaptive learning on data streams. All artificial and real-world datasets used in this paper are summarized in Table 1.
Sine This dataset contains abrupt drifts. It has two attributes (x and y) uniformly distributed in [0, 1], and instances are classified by the function y = sin(x): any instance below the curve is classified as positive and the others as negative, until the first drift occurs, after which the classification is reversed. The dataset contains 100,000 instances with a drift every 20,000 instances, giving four drifts at 20,000, 40,000, 60,000, and 80,000 instances, with 10% noise.

Circles This dataset contains gradual drifts and has two continuous attributes x and y. Four circle equations represent four different concepts; instances inside the circle are classified as positive and instances outside as negative, giving two classes. Drift is created by gradually changing the circle equation at the drift point. The dataset contains 100,000 instances with a gradual drift every 25,000 instances, and 10% noise.
LED This dataset contains gradual drifts. The goal is to predict the digit shown on a seven-segment display, where each digit has a 10% chance of being displayed. The dataset has 7 class-relevant attributes and 17 irrelevant attributes; concept drift is simulated by swapping relevant attributes. The dataset contains 100,000 instances with a gradual drift every 25,000 instances, and 10% noise.

Electricity It contains 45,312 instances with 8 input attributes, recorded every half hour over two years by the NSW Electricity Company in Australia. The classifier must predict the rise (Up) or fall (Down) of the electricity price. Concept drift may stem from changes in consumption habits or from emergencies.
Forest covertype It consists of 54 attributes and 581,012 instances describing 7 forest cover types in 30 × 30 m cells, obtained from the United States Forest Service (USFS) information system for four wilderness areas of the Roosevelt National Forest in Northern Colorado.
Pokerhand It consists of 1,000,000 instances, where each instance is an example of five cards drawn from a standard 52-card deck. Each card is described by two attributes (suit and rank), for a total of ten predictive attributes.

Experiments of parameter analysis
First, the determination of the level transition thresholds for s and l in MWDDM is analyzed experimentally. If s and l were collected over all instances of the real-world and artificial datasets, the number would be too large to show their trends clearly. Therefore, this paper collects the parameter values over the 1000 instances before and the 1000 instances after the first drift point in each artificial dataset, using Naive Bayes and Hoeffding tree as classifiers, respectively. Figures 8 and 10 show the trends of s around the first drift point in the abrupt and gradual drift datasets, respectively, and Figs. 9 and 11 show the corresponding trends of l.
Figures 8 and 10 show that the value of s lies between 0.78 and 1.0 in most cases, in both the abrupt drift datasets (Sine, Mixed) and the gradual drift datasets (Circles, LED). The range of s fluctuates continuously, and near the drift point (instance = 20,000) s drops sharply from 0.78 to about 0.4. This means that when 0.78 ≤ s ≤ 1, the algorithm infers that the data distribution is at the "stable level," and when s < 0.78, concept drift may be occurring and the algorithm enters the "warning level." Therefore, this paper sets the threshold for s to 0.78. In addition, Figs. 9 and 11 show that l varies within 0.85-1.0 in most cases, while near the drift point the value of l drops sharply from 0.85 to around 0.7; this change is particularly evident in Fig. 12, which shows the trend of l under the long sliding window in the gradual drift datasets, a case particularly important for detecting gradual concept drift. Therefore, this paper sets the threshold for l to 0.85.

Experiments of drift detection performance
In this section, the proposed method MWDDM and the comparison algorithms are tested on artificial datasets with abrupt concept drift, namely SINE and MIXED, and on artificial datasets with gradual concept drift, namely CIRCLES and LED. Experiments were carried out with NB and Hoeffding tree (HT) as learners, respectively. The experiments report Detection Delay (DD), True Positive Ratio (TPR), False Positive Ratio (FPR), and False Negative Ratio (FNR) over the tested datasets. The drift detection performance of the algorithms is summarized and analyzed, and the best results are highlighted in the tables.
In this paper, the maximum detection delay Δd is set to 250 on the datasets with abrupt drift (SINE, MIXED) and to 1000 on the datasets with gradual drift (CIRCLES, LED), because the drift width of gradual drift must be taken into account; if Δd is set too small, it leads to a higher false positive ratio.

Table 2 shows the drift detection performance of MWDDM_H, MWDDM_M, and the comparison algorithms, using Naive Bayes and Hoeffding tree as learners respectively, on the LED artificial dataset, which contains gradual concept drift. Regardless of whether Naive Bayes or Hoeffding tree is used as the learner, MWDDM_H and MWDDM_M achieve the lowest DD among all algorithms, followed by MDDM, BDDM, FHDDM, FHDDMS, and HDDM_W. Specifically, when using Naive Bayes as the learner, the DD of MWDDM_H is reduced by 9.07 compared with MDDM_E and by 45.73 compared with FHDDMS. In particular, MWDDM_H reduces DD by 15.02 compared with the recent drift detection method BDDM. EDDM and DDM both have the highest DD, and EDDM also has the highest FPR and FNR.

Table 3 shows the drift detection performance on the CIRCLES artificial dataset, which also contains gradual concept drift. Among all algorithms, MWDDM_M has the lowest DD on this dataset, followed by MWDDM_H, MDDM, and HDDM_W, with MWDDM_H further reducing DD compared with the next best algorithm, MDDM_E. In addition, the MWDDM algorithms proposed in this paper also produce false detections to some extent.

Table 4 shows the drift detection performance on the SINE artificial dataset, which contains abrupt concept drift. When using Naive Bayes as the classifier, MWDDM_H achieves the lowest DD and FNR among all algorithms, though with a certain FPR, followed by MWDDM_M; EDDM and DDM have the highest DD and the lowest TPR. When using Hoeffding tree as the classifier, HDDM_W achieves the lowest DD, followed by MWDDM_H, MWDDM_M, and BDDM, while EDDM and DDM again have the highest DD.
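The evaluation scheme used above can be made concrete. The sketch below is a plausible scoring procedure under the stated Δd convention, not necessarily the paper's exact one: a detection falling within Δd instances after a true drift counts as a true positive, any other detection is a false positive, and an unmatched drift is a false negative; DD is the mean delay of matched detections.

```python
def drift_detection_metrics(true_drifts, detections, max_delay):
    """Score detected drift positions against known drift positions.

    Illustrative scheme: each true drift at t is matched to the first
    unmatched detection d with t < d <= t + max_delay.
    """
    detections = sorted(detections)
    matched, delays = set(), []
    for t in true_drifts:
        for d in detections:
            if t < d <= t + max_delay and d not in matched:
                matched.add(d)
                delays.append(d - t)
                break
    tp = len(delays)
    fp = len(detections) - len(matched)
    fn = len(true_drifts) - tp
    dd = sum(delays) / len(delays) if delays else float(max_delay)
    return {
        "DD": dd,
        "TPR": tp / len(true_drifts),
        "FPR": fp / len(detections) if detections else 0.0,
        "FNR": fn / len(true_drifts),
    }
```

For example, with a true drift at instance 1000, Δd = 250, and detections at 1100 and 5000, the detection at 1100 is a true positive with delay 100 and the one at 5000 is a false positive.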

Finally, the drift detection performance on the MIXED artificial dataset, which also contains abrupt drift, is shown in Table 5. Similarly, MWDDM_M achieves the lowest DD and the highest TPR, followed by MWDDM_H, and both outperform algorithms such as BDDM, FHDDMS, and HDDM_W. EDDM and DDM have the highest FNR.
To sum up, the experiments with MWDDM_H, MWDDM_M, and the comparison algorithms on artificial datasets show that the proposed methods outperform all other comparison algorithms on both the datasets with abrupt drift and the datasets with gradual drift. In most cases, MWDDM_H and MWDDM_M have the lowest DD, the highest TPR, and the lowest FNR. The main reason is the multi-level weighted mechanism that MWDDM uses in the concept drift detection process: it gives more weight to the instances in the data stream where concept drift is more likely to occur, so MWDDM can detect concept drift with a lower DD. Besides, the buffering effect of this mechanism makes MWDDM more robust to noise, which should be distinguished from genuine concept drift, meaning that MWDDM can detect drift with low DD and low FPR at the same time.
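One way to picture the multi-level weighting is as a decay schedule over the window whose steepness depends on the current level. The sketch below is an illustrative geometric scheme, not the paper's exact formula: the decay rates 0.99 and 0.90 are assumed values chosen only to show that the warning level widens the gap between new and old instances.

```python
def instance_weights(window_size: int, level: str):
    """Return normalized per-instance weights, oldest first (illustration).

    Stable level: decay close to 1, so weight differences are small.
    Warning level: smaller decay, so recent instances dominate and an
    emerging drift influences the test statistic sooner.
    """
    decay = 0.99 if level == "stable" else 0.90  # assumed decay rates
    raw = [decay ** (window_size - 1 - i) for i in range(window_size)]
    total = sum(raw)
    return [w / total for w in raw]
```

At the stable level the newest-to-oldest weight ratio for a window of 10 is about 1.09, while at the warning level it grows to about 2.58, which is the intended "increase the difference in weights" behavior.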

Experiments of accuracy
In this paper, the proposed algorithms MWDDM_H and MWDDM_M are tested for accuracy on real-world datasets, namely Poker hand, Electricity, and Forest Covertype. For real-world datasets, the exact location and duration of concept drift are unknown, so evaluation indicators such as detection delay, true positive ratio, false positive ratio, and false negative ratio cannot be computed. Therefore, on the real-world datasets we consider classification accuracy as well as running time and memory consumption. Figure 12 shows the classification accuracy of MWDDM_H, MWDDM_M, and the comparison algorithms on the three real-world datasets using Naive Bayes as the classifier. In Fig. 12a, showing the performance on the POKER HAND dataset, BDDM, DDM, HDDM_A, and MWDDM_H achieve the highest classification accuracy. In Fig. 12b, EDDM, HDDM_A, MWDDM_H, and MWDDM_M achieve the highest classification accuracy. In Fig. 12c, EDDM and MWDDM_H achieve the highest classification accuracy. Figure 13 shows the classification accuracy of MWDDM_H, MWDDM_M, and the comparison algorithms on the three real-world datasets using Hoeffding tree as the classifier. In Fig. 13a, DDM, HDDM_A, MWDDM_H, and MWDDM_M have the highest classification accuracy. In Fig. 13b, RDDM has the highest classification accuracy in most cases over the first half of the ELECTRICITY dataset, while MWDDM_H and MWDDM_M are more accurate in most cases over the second half. Finally, in Fig. 13c, MWDDM_H and MWDDM_M have higher classification accuracy in most cases, followed by RDDM and BDDM. In addition, it can be seen from Figs. 12 and 13 that the classification accuracy of MWDDM_H and MWDDM_M rises faster than that of all other algorithms, that is, they recover faster from the concept drift that may exist in the real-world datasets. This also means that MWDDM_H and MWDDM_M have lower detection latency for concept drift, enabling faster detection of drift and resetting of the classifier.
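The detect-then-reset behavior described above follows the standard prequential (test-then-train) protocol. The sketch below is a generic loop under assumed interfaces, not MOA's or the paper's code: `classifier` is assumed to expose `predict`/`learn`/`reset`, and `detector` to expose `add_result`/`detected_drift`.

```python
def prequential_accuracy(stream, classifier, detector):
    """Test-then-train loop (sketch): predict each instance first, feed the
    0/1 outcome to the drift detector, reset the classifier on a signalled
    drift, then train on the instance."""
    correct = 0
    n = 0
    for x, y in stream:
        n += 1
        hit = classifier.predict(x) == y
        correct += hit
        detector.add_result(hit)
        if detector.detected_drift():
            classifier.reset()  # discard the outdated model after a drift
        classifier.learn(x, y)
    return correct / n
```

A faster-detecting method resets sooner after a drift, which is exactly why the accuracy curves of MWDDM_H and MWDDM_M rise faster in Figs. 12 and 13.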
Finally, this paper summarizes and analyzes the evaluation time (CPU seconds) and memory usage (RAM-Hours) of MWDDM_H, MWDDM_M, and the comparison algorithms on the real-world datasets. CPU seconds measures the time taken to execute the mining algorithm on a CPU running at full power; compared with the wall-clock running time of the whole process, it describes the time consumption of a drift detection method more reasonably. RAM-Hours measures the computational cost of the mining process, where 1 RAM-Hour corresponds to using 1 GB of random access memory (RAM) for one hour. Both indicators are obtained under the MOA framework [28] and are widely used in the current data stream mining literature. Tables 6 and 7 show the space-time consumption of MWDDM_H, MWDDM_M, and the other algorithms on the three real-world datasets with Naive Bayes and Hoeffding tree as learners, respectively. For convenience of display, the three real-world datasets Poker hand, Electricity, and Forest covertype are denoted PH, ELE, and FC in the tables. In terms of running time, MWDDM_H and MWDDM_M spend less time on the POKER HAND and ELECTRICITY datasets than most of the other comparison algorithms. In terms of memory consumption, although MWDDM_H and MWDDM_M use double sliding windows, they generally achieve lower memory consumption due to the way the prediction results are accessed.
In summary, the experiments with MWDDM_H, MWDDM_M, and the comparison algorithms on the three real-world datasets show that MWDDM_H and MWDDM_M achieve the highest, or among the highest, classification accuracy in most cases, while also performing well in time and space consumption. The main reason for these results is that MWDDM can detect drift in the datasets earlier than the comparison algorithms, and can therefore adjust the learning model in a more timely manner and reduce the loss of accuracy. In particular, it recovers from the concept drift that may exist in real-world datasets faster than the other algorithms, which shows that MWDDM_H and MWDDM_M can detect drift fast enough to let the learner react.

Conclusion
In many real-world application scenarios such as user preference modeling, monitoring systems, weather forecasting, and financial fraud detection, concept drift has become an urgent problem. To better address concept drift in data streams, this paper proposes a multi-level weighted concept drift detection method (MWDDM), which introduces a threshold parameter for level transitions and a "Stable level-Warning level-Drift level" multi-level weighted mechanism into the concept drift detection process; a double sliding window mechanism is also applied. During the "stable level," MWDDM assigns weights to the instances within the window: the newest instances receive larger weights, outdated instances receive lower weights, and the differences between instance weights are small at this level. After entering the "warning level," the algorithm increases the weight differences between instances within the windows to detect concept drift faster. Finally, at the "drift level," the two variants of the algorithm, MWDDM_H and MWDDM_M, use the Hoeffding inequality and the McDiarmid inequality, respectively, to determine whether concept drift has occurred. The proposed method detects both abrupt and gradual concept drift in data streams faster than the comparison algorithms while keeping the false positive and false negative ratios low, achieves high classification accuracy on real-world datasets, and performs well in terms of space-time consumption. In future work, we will consider an adaptive windowing mechanism and measures to enhance robustness to noise in the data stream so as to reduce the false positive ratio.