This section presents the experimental evaluation of the proposed AEDDM method for drift detection. The approach is tested in three sets of experiments from three different perspectives: first, evaluating it for sudden and gradual drift detection on synthetic datasets; second, testing it on drift-induced real-world datasets from both the real/virtual and the sudden/gradual drift points of view; and third, comparing it with similar state-of-the-art methods in an online classification scenario using the Hoeffding Tree classifier. The evaluation aims to show empirically that the proposed method detects sudden and gradual drifts and that the detected drift is real, thus minimizing false alarms. Section 5.1 describes the datasets used in this research along with other experimental settings, Section 5.2 presents experiments on synthetic datasets, Section 5.3 details the experiments on real-world drift-induced datasets, and Section 5.4 compares the classification performance of AEDDM using the Hoeffding tree classifier in an online setting.
5.1 Datasets and Experimental Settings
The proposed approach is evaluated on four synthetic datasets, namely Rotating Hyperplane (Haixun Wang et al., 2003; Fan, 2004), Moving RBF (Losing et al., 2017; Menon & Gressel, 2021), GAUSSIAN and Varying Distributions (VD), and four real-world datasets, namely NOAA, Covertype, KDDCUP99 and ELEC2. These datasets are briefly described below:
Rotating Hyperplane
A hyperplane in d dimensions is represented by the equation \(\sum_{i=1}^{d} w_i x_i = w_0\). If \(\sum_{i} w_i x_i \ge w_0\), instances are labeled positive, otherwise negative (Hulten et al., 2001a). A change in the classification boundary is introduced by gradually changing the feature weights. We used scikit-multiflow to generate a hyperplane dataset with 10 features. The dataset contains slow gradual drift in five features with a 60% change magnitude.
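The labeling rule above can be sketched directly. The following is a minimal numpy sketch of the rule and a simplified weight-shift drift; the paper itself uses the scikit-multiflow generator, so the function names and the drift mechanism here are illustrative only:

```python
import numpy as np

def hyperplane_labels(X, w, w0):
    """Label instances by the hyperplane rule: positive (1) when
    sum_i w_i * x_i >= w0, otherwise negative (0)."""
    return (X @ w >= w0).astype(int)

def drift_weights(w, n_drift_features, step):
    """Simplified drift: shift the first n_drift_features weights by
    `step`, changing the classification boundary (illustrative only)."""
    w = w.copy()
    w[:n_drift_features] += step
    return w

rng = np.random.default_rng(0)
d = 10
X = rng.uniform(0.0, 1.0, size=(1000, d))
w = rng.uniform(-1.0, 1.0, size=d)
w0 = 0.5 * w.sum()                       # threshold near the data's center
y_before = hyperplane_labels(X, w, w0)
y_after = hyperplane_labels(X, drift_weights(w, 5, 0.6), w0)
```

After the weight shift, the labels of instances near the old boundary flip, which is exactly the gradual concept drift the generator produces.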
Moving RBF Dataset
This dataset is also generated using scikit-multiflow and consists of Gaussian distributions whose centroids move with constant speed, producing gradual drift. Concept drift can be introduced by changing the position or number of centroids. The generated dataset has 2 classes, 30 attributes and 50 centroids. The drifted data is generated using the scikit-multiflow random RBF drift generator with a change speed of 0.6.
Gaussian Dataset
A synthetic dataset with 20 features is drawn from normal distributions based on specified ranges of mean and standard deviation. For each feature in the negative samples, the mean range is (0.1, 0.6) and the standard deviation range is (0.05, 0.45), with 30,000 samples. For the positive samples, the mean range is (2, 7) and the standard deviation range is (1.5, 2.5), with 30,000 samples. In the drifted data, the mean range for the positive class is changed to (4, 9) with a standard deviation range of (1.5, 3), while for the negative class the mean range is changed to (0.3, 0.9) with a standard deviation range of (0.1, 0.5).
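The sampling scheme above can be sketched as follows (a numpy sketch with illustrative function names; each feature's mean and standard deviation are drawn uniformly from the stated ranges):

```python
import numpy as np

def gaussian_class(n, n_features, mean_range, std_range, rng):
    """Sample one class: each feature gets its own mean and std drawn
    uniformly from the given ranges, then n samples per feature."""
    means = rng.uniform(*mean_range, size=n_features)
    stds = rng.uniform(*std_range, size=n_features)
    return rng.normal(means, stds, size=(n, n_features))

rng = np.random.default_rng(42)
n, d = 30_000, 20
neg = gaussian_class(n, d, (0.1, 0.6), (0.05, 0.45), rng)   # negative class
pos = gaussian_class(n, d, (2.0, 7.0), (1.5, 2.5), rng)     # positive class
# Drifted versions with the shifted ranges described in the text.
neg_drift = gaussian_class(n, d, (0.3, 0.9), (0.1, 0.5), rng)
pos_drift = gaussian_class(n, d, (4.0, 9.0), (1.5, 3.0), rng)
```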
Varying Distributions (VD) Dataset
In this dataset, class 1 instances are sampled from a binomial distribution (n = 10, p = 0.05) while class 0 instances are sampled from a logistic distribution (loc = 0.38). The VD dataset consists of 19,200 data points, five features and a balanced class distribution. The drifted stream contains 30 batches, where the initial 20 batches contain non-drifted data. In the last 10 batches, the distributions of both classes change progressively: in batches 31 and 32 the distribution of both classes changes in one column, in batches 33 and 34 in two columns, in batches 35 and 36 in three columns, and so on.
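The VD sampling and the column-wise drift scheme can be sketched as follows (a numpy sketch; `drift_columns` and the shift magnitude are hypothetical illustrations of "the distribution changes in k columns", not the paper's exact mechanism):

```python
import numpy as np

rng = np.random.default_rng(7)
n_per_class, n_features = 9600, 5        # 19,200 points total, balanced

# Class 1: binomial(n=10, p=0.05); class 0: logistic(loc=0.38).
class1 = rng.binomial(10, 0.05, size=(n_per_class, n_features)).astype(float)
class0 = rng.logistic(loc=0.38, size=(n_per_class, n_features))

X = np.vstack([class0, class1])
y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])

def drift_columns(X, n_cols, shift=1.0):
    """Hypothetical column-wise drift: shift the first n_cols features,
    mimicking the 'changes in k columns' scheme described above."""
    X = X.copy()
    X[:, :n_cols] += shift
    return X
```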
NOAA Weather Dataset: The NOAA dataset (Ditzler & Polikar, 2013b) contains weather measurements in eight dimensions, with daily records covering 50 years. There are 18,159 records, and the task is to predict whether it will rain or not; the class distribution is 12,461 records for no rain and 5,698 for rain. The eight attributes are temperature, dew point, sea level pressure, visibility, average wind speed, maximum sustained wind speed, minimum temperature, and maximum temperature. The data has been made available by the National Oceanic and Atmospheric Administration (NOAA), hence the name.
Covertype
The Forest Covertype dataset (Cabral & Barros, 2018; Frías-Blanco et al., 2015) assigns forest land to 7 classes based on physical attributes of a 30 m × 30 m region, such as elevation, soil type, wilderness area and slope. The dataset has been made public by the US Forest Service. It contains 581,012 records and 54 attributes, both numerical and categorical. This dataset has been converted into a binary classification dataset.
KDDCUP99
KDDCUP99 is a network intrusion detection dataset used in the KDDCUP 1999 competition. It is a multi-class dataset in which instances are labeled as normal or as one of several known attack types (Pinagé et al., 2020). The original dataset contains more than 4M records. We used a subset containing 494,021 records and 41 dimensions, converted to a binary classification dataset with class values normal and attack.
ELEC2
The electricity dataset, commonly known as ELEC2 (Harries & Wales, 1999), records daily supply, demand and scheduled electricity transfer between the states of New South Wales and Victoria. It contains 45,312 instances with 8 features. The class label indicates whether the price in New South Wales was up or down on a given day relative to a moving average of the last 24 hours. This dataset has been used in various research papers on drift detection (Pinagé et al., 2020; Gama et al., 2004; Costa et al., 2018; A. Liu et al., 2018).
Table 3 summarizes the datasets used in this research. In the case of real-world datasets, it is usually not known whether drift is present, and if it is, where it is located. For experimental evaluation, we explicitly introduced drift into these datasets, as explained in Section 5.3. All the datasets used and generated in this research work are available at https://github.com/Usman07442/ConceptDrift_IJMLC.
Table 3: Summary of the datasets used in this research.

| Dataset | Instances | # of Features | Type | Drift Type / Drift Induced | Size of Drifted Data |
|---|---|---|---|---|---|
| Gaussian | 60,000 | 20 | Synthetic | Sudden | 6,000 |
| VD | 19,200 | 5 | Synthetic | Gradual | 960 |
| Hyperplane | 40,000 | 10 | Synthetic | Gradual | 2,560 |
| Moving RBF | 40,000 | 30 | Synthetic | Gradual | 2,560 |
| NOAA | 18,159 | 8 | Real | Sudden, Gradual | 1,816 |
| Covertype | 581,012 | 54 | Real | Sudden, Gradual | 58,102 |
| KDD99 | 494,021 | 41 | Real | Sudden, Gradual | 49,403 |
| ELEC2 | 45,312 | 8 | Real | Sudden, Gradual | 4,531 |
Experimental Settings
In the offline training phase, the available dataset is divided into three distinct subsets: the initial 70% is used for training (and validation of) the autoencoders, the next 20% for threshold computation, and the last 10% for testing the autoencoder on a normal, non-drifted data stream. The simplest deep autoencoder architecture, with one hidden layer in an undercomplete setting, is used in all experiments; it compresses the input to a lower dimension at the bottleneck. The hidden layer is roughly one-third the size of the input, and the bottleneck roughly one-third the size of the preceding hidden layer in the encoder, while the decoder is an exact mirror of the encoder. Since the input is real-valued, mean squared error (MSE) is used as the loss function with the Adam optimizer. The ReLU activation function is used at all layers except the output layer, where sigmoid is used. For threshold computation, results are averaged over 10 runs to avoid experimental bias. To test the drift detection effectiveness of AEDDM, drift is introduced in the stream (the last 10% of the dataset) starting at batch 20 (the 21st batch); the initial 20 batches contain normal (non-drifted) data. For all datasets, a batch size of 32 is used both for autoencoder training and for windowing. The reference window (the training data with its computed threshold) is fixed, while the detection window moves in batches. The next section describes the experiments on synthetic datasets.
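The 70/20/10 split and the one-third layer-sizing rule described above can be sketched as follows (a minimal Python sketch with illustrative function names; the actual autoencoder training with MSE loss and Adam is omitted):

```python
import numpy as np

def offline_split(X):
    """70/20/10 split of the offline phase: autoencoder training,
    threshold computation, and a clean non-drifted test stream."""
    n = len(X)
    i70, i90 = int(0.7 * n), int(0.9 * n)
    return X[:i70], X[i70:i90], X[i90:]

def layer_sizes(n_inputs):
    """One-third sizing rule for the undercomplete AE: the hidden layer
    is about a third of the input, the bottleneck about a third of the
    hidden layer; the decoder mirrors these sizes."""
    hidden = max(1, round(n_inputs / 3))
    bottleneck = max(1, round(hidden / 3))
    return hidden, bottleneck

X = np.arange(1000).reshape(-1, 1)        # stand-in dataset
train, thresh, test = offline_split(X)
```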
5.2 Experiments on Synthetic Datasets
To show the effectiveness of AEDDM in detecting sudden and gradual drifts, the most common scenarios, four synthetic datasets are used: Gaussian, VD, Hyperplane and Moving RBF. The former two are newly designed in this research work, while the latter two have been used in various research papers in the drift detection domain. The focus of this set of experiments is to determine the best set of parameters for the AEDDM framework, i.e., those that detect drift with minimum delay and the fewest false positives. These parameters include:
k parameter:
The k parameter is the sensitivity parameter that defines the spread of the data considered non-drifted. It is used in threshold computation and can be determined empirically by applying the AEDDM framework to normal non-drifted data as well as to drifted data. We evaluated k = 1, 2, 3 to determine the best value for use in the threshold \(\mu + k\sigma\) (see Tables 4 and 5).
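The threshold rule can be sketched as follows (a numpy sketch; `batch_threshold` is an illustrative name, and the reconstruction errors here are synthetic stand-ins):

```python
import numpy as np

def batch_threshold(recon_errors, k=3):
    """Threshold mu + k*sigma computed over the reconstruction errors
    of the held-out (non-drifted) threshold-computation data."""
    return recon_errors.mean() + k * recon_errors.std()

rng = np.random.default_rng(1)
errors = rng.normal(0.05, 0.01, size=2000)   # synthetic reconstruction errors
thr1 = batch_threshold(errors, k=1)
thr3 = batch_threshold(errors, k=3)
```

A larger k widens the band of errors treated as normal, so k = 3 yields fewer false positives than k = 1 on noisy non-drifted data.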
Alpha Parameter
To signal a batch as drifted or normal, AEDDM uses a batch threshold as well as a count threshold. The average reconstruction error of a batch from the respective AE is compared with the batch threshold, while the exceed count of a batch (explained in Section 4) is compared with the count threshold. The average reconstruction error hides the reconstruction errors of individual instances in a batch; the count threshold counters this by counting the instances whose error exceeds the instance threshold (explained in Section 4). The count threshold can be computed either as the median of batch-wise exceed counts on normal data or as their maximum. The median makes the count threshold more sensitive, producing more false positives, while the maximum makes it less sensitive to noise and false positives. The best fit can be determined empirically for a chosen dataset. We name this the alpha parameter, with two possible values: median or maximum.
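The two alpha settings can be sketched as follows (illustrative names; the per-batch reconstruction errors are synthetic):

```python
import numpy as np

def exceed_counts(errors_per_batch, instance_threshold):
    """Per-batch count of instances whose reconstruction error exceeds
    the instance threshold (the 'exceed count' of Section 4)."""
    return np.array([(batch > instance_threshold).sum()
                     for batch in errors_per_batch])

def count_threshold(counts, alpha="median"):
    """Alpha parameter: median is more sensitive (more false positives),
    maximum is more robust to noise."""
    return np.median(counts) if alpha == "median" else counts.max()

rng = np.random.default_rng(2)
batches = [rng.normal(0.05, 0.01, size=32) for _ in range(20)]  # 20 normal batches
counts = exceed_counts(batches, instance_threshold=0.07)
```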
The logical parameter or “Beta Parameter”
Another parameter of the AEDDM framework is the logical parameter (see Algorithm 1, Step 3a), which determines whether the two thresholds are used in conjunction or disjunction. Whether to combine the batch threshold and count threshold in an AND setting or an OR setting to signal a batch as normal or drifted can be established empirically. Using AND is expected to be more robust to noise and false alarms but may delay drift detection in some cases. The best-fit values of the above three parameters for a dataset can be determined empirically. We name this the beta parameter, with two possible values: "AND" or "OR".
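The combination of the two tests under the beta parameter can be sketched as a single decision function (illustrative names):

```python
def is_drifted(batch_error, exceed_count, batch_thr, count_thr, beta="AND"):
    """Beta parameter: combine the batch-threshold test and the
    count-threshold test in conjunction (robust to noise, possibly
    delayed) or disjunction (more sensitive)."""
    over_batch = batch_error > batch_thr
    over_count = exceed_count > count_thr
    if beta == "AND":
        return over_batch and over_count
    return over_batch or over_count
```

For example, a batch whose average error exceeds the batch threshold but whose exceed count stays below the count threshold is flagged under OR but not under AND.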
Table 4 summarizes the results for all the selected values of the AEDDM parameters on non-drifted data, while Table 5 shows the results on drifted data for the four synthetic datasets. In the case of non-drifted data, warnings and false positives are reported over the entire test batch stream, while in the drifted case both are counted only up to the detection point. In all datasets, drift starts from batch 20. The best set of parameters can be determined from the performance on both the non-drifted and the drifted data. On non-drifted data, the best set of parameters is expected to generate a minimum number of warnings and false positives. In the case of the Gaussian and VD datasets, there is no clear distinction in performance across different values of k, since the two classes are far apart in terms of data distribution (see Table 4). This distinction is much clearer for the Hyperplane and Moving RBF datasets at k = 3, where there are comparatively fewer warnings and false positives on non-drifted data. Based on this, we limited our search for the best parameters to k = 3 across all four datasets. On drifted data, the best set of parameters is the one with which AEDDM detects drift with minimum delay, warnings, and false positives, delay being the most important. Considering performance on both drifted and non-drifted data, Table 6 summarizes the best set of parameters across all four datasets.
Table 4: AEDDM Parameters Calibration on Non-Drifted Data.

| Dataset | k | Count Threshold Measure | Logical Parameter | Warnings | False Positives |
|---|---|---|---|---|---|
| Gaussian (187 batches) | 1 | Median | AND | 0 | 0 |
| | | | OR | 0 | 0 |
| | | Maximum | AND | 0 | 0 |
| | | | OR | 0 | 0 |
| | 2 | Median | AND | 0 | 0 |
| | | | OR | 27 | 0 |
| | | Maximum | AND | 0 | 0 |
| | | | OR | 0 | 0 |
| | 3 | Median | AND | 0 | 0 |
| | | | OR | 16 | 0 |
| | | Maximum | AND | 0 | 0 |
| | | | OR | 2 | 0 |
| VD (60 batches) | 1 | Median | AND | 0 | 0 |
| | | | OR | 2 | 0 |
| | | Maximum | AND | 0 | 0 |
| | | | OR | 0 | 0 |
| | 2 | Median | AND | 0 | 0 |
| | | | OR | 5 | 0 |
| | | Maximum | AND | 0 | 0 |
| | | | OR | 0 | 0 |
| | 3 | Median | AND | 0 | 0 |
| | | | OR | 4 | 0 |
| | | Maximum | AND | 0 | 0 |
| | | | OR | 0 | 0 |
| Hyperplane (124 batches) | 1 | Median | AND | 30 | 77 |
| | | | OR | 27 | 83 |
| | | Maximum | AND | 25 | 2 |
| | | | OR | 46 | 44 |
| | 2 | Median | AND | 46 | 13 |
| | | | OR | 46 | 40 |
| | | Maximum | AND | 7 | 0 |
| | | | OR | 16 | 0 |
| | 3 | Median | AND | 49 | 7 |
| | | | OR | 55 | 10 |
| | | Maximum | AND | 0 | 0 |
| | | | OR | 2 | 0 |
| RBF (125 batches) | 1 | Median | AND | 2 | 123 |
| | | | OR | 2 | 123 |
| | | Maximum | AND | 5 | 117 |
| | | | OR | 2 | 123 |
| | 2 | Median | AND | 17 | 97 |
| | | | OR | 4 | 120 |
| | | Maximum | AND | 24 | 3 |
| | | | OR | 48 | 29 |
| | 3 | Median | AND | 46 | 37 |
| | | | OR | 31 | 70 |
| | | Maximum | AND | 10 | 0 |
| | | | OR | 24 | 0 |
Table 5: AEDDM Parameters Calibration on Drifted Data.

| Dataset | k | Count Threshold Measure | Logical Parameter | Warnings | Detection Delay | False Positives |
|---|---|---|---|---|---|---|
| Gaussian | 1 | Median | AND | 0 | 0 | 0 |
| | | | OR | 0 | 0 | 0 |
| | | Maximum | AND | 0 | 0 | 0 |
| | | | OR | 0 | 0 | 0 |
| | 2 | Median | AND | 0 | 0 | 0 |
| | | | OR | 4 | 0 | 0 |
| | | Maximum | AND | 0 | 0 | 0 |
| | | | OR | 0 | 0 | 0 |
| | 3 | Median | AND | 0 | 0 | 0 |
| | | | OR | 2 | 0 | 0 |
| | | Maximum | AND | 0 | 0 | 0 |
| | | | OR | 0 | 0 | 0 |
| VD (30 batches) | 1 | Median | AND | 0 | 4 | 0 |
| | | | OR | 0 | 0 | 0 |
| | | Maximum | AND | 0 | 4 | 0 |
| | | | OR | 0 | 2 | 0 |
| | 2 | Median | AND | 0 | 5 | 0 |
| | | | OR | 2 | 1 | 0 |
| | | Maximum | AND | 0 | 5 | 0 |
| | | | OR | 0 | 2 | 0 |
| | 3 | Median | AND | 0 | 6 | 0 |
| | | | OR | 2 | 1 | 0 |
| | | Maximum | AND | 0 | 6 | 0 |
| | | | OR | 0 | 2 | 0 |
| Hyperplane (80 batches) | 1 | Median | AND | 8 | -20 | 9 |
| | | | OR | 6 | -20 | 12 |
| | | Maximum | AND | 5 | -1 | 1 |
| | | | OR | 10 | -20 | 4 |
| | 2 | Median | AND | 7 | -19 | 4 |
| | | | OR | 8 | -19 | 5 |
| | | Maximum | AND | 3 | No Det | 0 |
| | | | OR | 5 | No Det | 0 |
| | 3 | Median | AND | 4 | 1 | 0 |
| | | | OR | 5 | 1 | 0 |
| | | Maximum | AND | 1 | No Det | 0 |
| | | | OR | 2 | No Det | 0 |
| RBF | 1 | Median | AND | 0 | -20 | 20 |
| | | | OR | 0 | -20 | 20 |
| | | Maximum | AND | 0 | -20 | 20 |
| | | | OR | 0 | -20 | 20 |
| | 2 | Median | AND | 2 | -20 | 14 |
| | | | OR | 2 | -20 | 17 |
| | | Maximum | AND | 5 | 0 | 0 |
| | | | OR | 9 | -14 | 2 |
| | 3 | Median | AND | 8 | 0 | 3 |
| | | | OR | 6 | 0 | 8 |
| | | Maximum | AND | 4 | 22 | 0 |
| | | | OR | 6 | 21 | 0 |
Based on the results, it is evident that using k = 3, the median as the count threshold measure, and combining the batch threshold and count threshold in the AND setting gives better results. We chose this as the default setting of AEDDM and follow it in all remaining experiments.
Table 6: Best Set of Parameters for Synthetic Datasets.

| Dataset | k | Alpha | Beta |
|---|---|---|---|
| Gaussian | 3 | Median | AND |
| VD | 3 | Maximum | OR |
| Hyperplane | 3 | Median | AND |
| Moving RBF | 3 | Median | AND |
To evaluate the results graphically, Fig. 6 plots the reconstruction errors and exceed counts at the outputs of the layer 1 and layer 2 autoencoders for all four datasets. For batches that are not passed to the layer 2 autoencoder, the layer 2 exceed count is symbolically set to -1. The reconstruction error plots for both autoencoders (Fig. 6(a) and 6(b)) show a clear difference between the non-drifted data (the first 20 batches) and the drifted data starting at batch 20 in three of the four datasets. In the case of the Hyperplane dataset, the difference in reconstruction error is less clear, as there is only a slight difference between the distributions of the positive and negative class data (Fig. 6(iii) a and b). Similarly, the layer 2 exceed count plots (Fig. 6(d)) show a clearer separation between drifted and non-drifted data for the Gaussian, VD and RBF datasets than for the Hyperplane dataset; this distinction is less clear in the layer 1 exceed count plots (Fig. 6(c)). Since drift is detected based on the reconstruction error and exceed counts at layer 2, we are specifically interested in the layer 2 outputs. As we have used a simple deep vanilla autoencoder with a default set of hyperparameters, there is considerable room to improve detection performance on these datasets by calibrating different types of autoencoders with different architectures and hyperparameters. The next section details further experimentation of AEDDM on real-world datasets.
5.3 Experiments on Real World Datasets
The drift detection performance of AEDDM is evaluated on four benchmark real-world datasets: the NOAA weather data, Forest Covertype, the KDDCUP99 intrusion detection dataset and the ELEC2 electricity dataset. In real-world datasets, changes in the distribution of the most relevant features are likely to shift the decision boundary substantially, and such changes are expected to degrade the accuracy of pre-trained classifiers (real drift). Conversely, changes in the distribution of less informative features should not affect the decision boundary, with little or no impact on the performance of pre-trained classifiers (virtual drift). An ideal drift detector should respond to distributional changes in the more informative features and ignore changes in the less informative ones.
To test AEDDM on real-world datasets, we introduced drift by interchanging the values of the top 25% (30% or 40%) of attributes in one setting, and of the bottom 25% (30% or 40%) in another, following the approach of (Sethi & Kantardzic, 2017) and (Castellani et al., 2021). The top and bottom attributes are selected based on feature importance measures such as information gain or mutual information. The drift detection results of AEDDM are summarized in Table 7. For this set of experiments, k = 3 is used for the batch threshold, the median as the count threshold measure (alpha parameter), and AND as the logical parameter (beta parameter). The number of batches in the non-drifted and gradual drift cases depends on the dataset size, while in the sudden drift case the initial 55 batches are considered, based on the size of the smallest dataset. On non-drifted data, AEDDM produces no false positives on any of the four datasets and only a few warnings for NOAA and Covertype (see Table 7). A two-sample t-test at the 5% significance level is used to validate the outcome of AEDDM: the null hypothesis (H0) holds when there is no drift in the data and H1 when there is. The t-test also confirms that there is no drift in the normal, non-drifted batch stream for all four datasets. In the sudden drift top 25% case, AEDDM detects the drift with zero delay for three datasets and with a delay of four batches for ELEC, whereas in the sudden drift bottom 25% case there are only a few warnings (NOAA = 3, ELEC = 5, Covertype = 16), only one false positive each for Covertype and ELEC, and no detections for any dataset, as confirmed by the t-test over the initial 55 batches.
This result strengthens our proposition that a drift detector should detect real drift (drift in important features) while ignoring virtual drift (changes in less important features), which AEDDM effectively demonstrates in these experiments.
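Assuming the interchange is performed between pairs of top-ranked feature columns (the exact pairing used in the cited approach is not specified here, so this is a hedged sketch with hypothetical names), the injection could look like:

```python
import numpy as np

def inject_swap_drift(X, ranked_features, frac=0.25):
    """Hypothetical swap-based drift injection: take the top `frac` of
    features by an importance ranking (e.g. mutual information) and
    interchange the column values of adjacent pairs. Illustrative only;
    the paper's exact interchange scheme may differ."""
    X = X.copy()
    k = max(2, int(len(ranked_features) * frac))
    top = ranked_features[:k]
    for a, b in zip(top[0::2], top[1::2]):   # swap adjacent pairs
        X[:, [a, b]] = X[:, [b, a]]
    return X

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 8))
# ranked_features is a stand-in ranking: feature 0 most important, etc.
drifted = inject_swap_drift(X, ranked_features=list(range(8)), frac=0.25)
```

Swapping the most informative columns perturbs the decision boundary (real drift), while the same operation on the least informative columns should leave the boundary largely intact (virtual drift).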
To induce gradual drift in the real-world datasets, we introduce a new mechanism incorporating gradual drift (sudden at discrete steps): the initial 10% of the data is left unchanged; for the next 10%, the values of the top 25% of attributes are increased by 10% (relative to the original non-drifted values); the next 10% is increased by 20%, and so on, so that in the last chunk the values of the top 25% of attributes are increased by 100%. In the top 25% gradual case, drift is detected in all four datasets, while in the bottom 25% gradual case no drift is detected in NOAA and Covertype. Although the results vary across datasets in the bottom 25% gradual case, the performance of AEDDM is encouraging and can be explored further.
Table 7: AEDDM Drift Detection Results on Real Datasets. For sudden drift, the initial 55 batches are considered with drift starting at batch 20; for gradual drift, the whole batch stream is considered.

| Dataset | Batches (non-drifted) | Warnings (non-drifted) | False Positives (non-drifted) | t-test, α = 5% (non-drifted) | Detection Delay (sudden, top 25/30/40%) | t-test (sudden, top) | Warnings (sudden, bottom 25/30/40%) | False Positives (sudden, bottom) | t-test (sudden, bottom) | Drift Point (gradual, top 25%) | Detect Point (gradual, top 25%) | t-test (gradual, top) | Detect Point (gradual, bottom 25%) | t-test (gradual, bottom) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NOAA | 55 | 3 | 0 | H0 | 0 (Top 40%) | H1 | 3 | 0 | H0 | 6 | 25 | H1 | No Detection | H0 (L1), H1 (L2) |
| Covertype | 1815 | 51 | 0 | H0 | 0 (Top 25%) | H1 | 16 | 1 | H0 | 182 | 584 | H1 | No Detection | H0 |
| KDDCUP | 1543 | 0 | 0 | H0 | 0 (Top 30%) | H1 | 0 | 0 | H0 | 155 | 616 | H1 | 467 | H1 |
| ELEC2 | 142 | 0 | 0 | H0 | 4 (Top 30%) | H1 | 5 | 1 | H0 | 15 | 77 | H1 | 35 | H1 |
To demonstrate the impact of drift on classifier performance and the effectiveness of AEDDM in detecting real drift while ignoring virtual drift, we experimented with seven widely used batch classifiers: logistic regression, random forest, KNN, SVM, XGB, decision tree and MLP (for the KDDCUP99 and Covertype datasets SVM is omitted due to its long training time). In all four datasets, sudden and gradual drift is introduced using the top 25% and bottom 25% approach; in the sudden case drift starts from batch 20, while in the gradual case it starts after the initial 10% of the batch stream. Classification performance is measured by F1 score, with results averaged over 5 batches, shown in Fig. 7. For the sudden drift scenarios (Fig. 7(a) and (b)) results are reported only for the first 55 batches, while for the gradual drift scenarios (Fig. 7(c) and (d)) results cover the entire batch stream. In the top 25% sudden case (Fig. 7(a)), there is a clear drop in F1 score after index 3 (batch indices start from zero and results are averaged over 5 batches, so index 3 corresponds to batch 20), indicating that the drift is real.
In the bottom 25% sudden case (Fig. 7(b)), there is far less degradation, and the F1 score follows almost the same pattern as in the first 20 non-drifted batches. Here AEDDM shows its robustness to distributional changes in less informative features (virtual drift) and does not signal this drift (see Table 7). Similarly, in the top 25% gradual case (Fig. 7(c)), the F1 score falls gradually at discrete intervals, in accordance with how the drift is introduced, indicating real drift, which AEDDM successfully detects in three datasets (NOAA, Covertype and KDDCUP99) with some delay. In the bottom 25% gradual case (Fig. 7(d)), the distributional changes have little impact on classification performance (virtual drift), and AEDDM ignores these changes in three of the datasets. These results show the effectiveness of the proposed AEDDM method in detecting the distributional changes in real-world datasets that are most likely to affect classifier performance. Apart from testing AEDDM with batch classifiers, we also tested it with a well-known online classifier, the Hoeffding tree. The next section briefly describes these experiments.
5.4 Performance Comparison
In real-world applications, the drift detection mechanism should be transparent to the machine learning task (classification, in our case) working in tandem with the arrival of new data. If a drift is detected that is likely to degrade the classifier's performance, the pre-trained classifier should be retrained on the new data so that it can maintain acceptable predictive performance. To test and compare the proposed AEDDM approach in an online learning environment, we use the Hoeffding tree (Hulten et al., 2001b), an online incremental learning algorithm, as the base classifier. A Hoeffding tree adapts to changes in the data with the addition of every new sample and gives performance comparable to a non-incremental batch learner with unlimited data availability (Montiel et al., 2018).
For this set of experiments, we use the same four real datasets in the top 25% sudden drift setting. AEDDM detects the drift at batch 20 with zero delay for NOAA, Covertype and KDD, and at batch 26 for ELEC, with a delay of six batches. Comparison is made across the following:
- Static Model (NoChange): A Hoeffding tree is trained on the available training data. It is assumed that no drift occurs, so no drift detector is employed and no model update takes place. This acts as the lower baseline.
- Prequential HT: A prequential Hoeffding tree classifier with a pre-train size and batch size of 32. Performance measures such as accuracy and the kappa statistic are averaged over 32 instances. It acts as an upper baseline, since labels are readily available: the HT first predicts each instance and then updates itself with the correct label.
- AEDDM (the proposed method): A Hoeffding tree is trained on the entire available labeled data and used for making predictions until a drift occurs. Each drifted batch becomes part of the training data once labels are available, and the model is retrained. We have not yet formalized the complete adaptation mechanism for AEDDM; that is left for future work. The current focus is the effectiveness of real drift detection, with a limited demonstration of adaptation.
- KS Test: The Kolmogorov-Smirnov test is used for drift detection in an unsupervised manner. It compares an empirical distribution (the incoming batch stream) with a reference distribution (non-drifted data) to test whether both batches come from the same distribution. A p-value below 0.05 at the 5% significance level indicates drift between the current and reference distributions (Z. Wang & Wang, 2020a). A Hoeffding tree is trained on the available training data and used to predict the incoming batches. After the drift point, each incoming batch becomes part of the training data and the Hoeffding tree is retrained.
- ADD: ADD is another autoencoder-based batch drift detection method that uses a thresholding mechanism. It also uses a Hoeffding tree as the base classifier, initially trained on the available training data along with the autoencoders. ADD uses two different thresholds for gradual and sudden drift and signals drift if a batch's reconstruction error exceeds either threshold. On drift, the base classifier is retrained on the current batch, each drifted batch becomes part of the training data to retrain the autoencoders, and the training loss is recomputed.
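The KS-based comparison described above can be sketched with a direct implementation of the two-sample statistic (a numpy sketch; in practice `scipy.stats.ks_2samp` provides the same statistic together with a p-value for the 5% significance test):

```python
import numpy as np

def ks_two_sample(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|
    over the pooled sample points. A large D suggests the two batches
    come from different distributions."""
    a, b = np.sort(a), np.sort(b)
    xs = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, xs, side="right") / len(a)
    cdf_b = np.searchsorted(b, xs, side="right") / len(b)
    return np.abs(cdf_a - cdf_b).max()

rng = np.random.default_rng(5)
# Same distribution vs. a mean-shifted (drifted) distribution.
same = ks_two_sample(rng.normal(size=500), rng.normal(size=500))
shifted = ks_two_sample(rng.normal(size=500), rng.normal(loc=1.0, size=500))
```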
We selected KS-test-based drift detection (Z. Wang & Wang, 2020a) and ADD (Jaworski et al., 2020) for comparison, as both are batch-based drift detection methods that work in batch incremental settings. Results are reported for the first 40 batches, where the initial 20 batches contain non-drifted data and the last 20 contain drifted data, with drift starting at batch 20 in all four datasets. This equal split of drifted and non-drifted batches provides a fair basis for comparing accuracy over the entire batch stream.
In this set of experiments our focus is the drift detection performance of AEDDM followed by an adaptation mechanism, so that the pre-trained classifier can recover from drift-induced performance degradation. The prequential Hoeffding tree and ADD each have their own adaptation mechanism (described above), while the KS test and AEDDM use a similar approach (described above); the static model employs no detection and hence no adaptation. The plots in Fig. 8 show that the accuracy of the base classifier falls from batch 20 and then recovers as the model is retrained through each method's detection and adaptation mechanism. The fall in accuracy is much clearer for the KDDCUP99 and Covertype datasets than for the other two. The no-update model shows, in all four cases, how sharply accuracy falls after drift occurs at batch 20 when there is no detection and adaptation mechanism. The proposed AEDDM approach detects this drift with zero delay (see Table 7), and the adapted base classifier quickly recovers, giving the best average accuracy over the batch stream for KDDCUP99, Forest Covertype and ELEC, and sharing the top rank with the KS approach on NOAA. The average accuracy scores over the entire batch stream are summarized in Table 8; AEDDM outperforms or matches the other methods on all four datasets. Access to the experiments can be provided on reasonable request to the corresponding author.
Table 8: Average Accuracy Over 40 Batches.

| Dataset | AEDDM | ADD | KS | Prequential HT | No Update |
|---|---|---|---|---|---|
| NOAA | 0.68 | 0.67 | 0.68 | 0.66 | 0.48 |
| COVERTYPE | 0.70 | 0.62 | 0.61 | 0.58 | 0.57 |
| KDDCUP99 | 0.94 | 0.91 | 0.92 | 0.93 | 0.49 |
| ELEC | 0.85 | 0.63 | 0.83 | 0.73 | 0.71 |