Supervised and Unsupervised Deep Learning Approaches for EEG Seizure Prediction

Epilepsy affects more than 50 million people worldwide, making it one of the world's most prevalent neurological diseases. The main symptom of epilepsy is seizures, which occur abruptly and can cause serious injury or death. The ability to predict the occurrence of an epileptic seizure could alleviate many of the risks and stresses people with epilepsy face. We formulate the problem of detecting the preictal (or pre-seizure) state, with reference to normal EEG, as a precursor to an incoming seizure. To this end, we developed several supervised deep learning approaches to distinguish preictal EEG from normal EEG. We further develop novel unsupervised deep learning approaches that train models on only normal EEG and detect preictal EEG as an anomalous event. These deep learning models were trained and evaluated on two large EEG seizure datasets in a person-specific manner. We found that both supervised and unsupervised approaches are feasible; however, their performance varies with the patient, approach, and architecture. This new line of research has the potential to drive the development of therapeutic interventions and save human lives.


Introduction
Epilepsy is one of the most prevalent neurological disorders in the world, affecting approximately 1% of the world's population [1][2][3]. Epilepsy is characterized by spontaneously occurring seizures, which can lead to bodily injuries, fractures, burns [4], and, in many cases, death [5]. People with epilepsy are most concerned with the fear of incoming seizures [6]. Therefore, there is a dire need to reduce the unpredictability of seizures in order to lower the risk of injuries and improve patients' quality of life.
Electroencephalography (EEG) is commonly used to analyze brain activity pertaining to seizures [7]. Brain activity in people with epilepsy can be separated into four states: regular brain activity (interictal), brain activity before a seizure (preictal), brain activity during a seizure (ictal), and brain activity immediately after a seizure (postictal). The preictal state can contain observable physiological changes prior to the onset of a seizure [8] that can be used to predict an incoming seizure. The capability to predict an epileptic seizure could alleviate the risks patients face [9]; it would give patients time to get help and greatly reduce the risk of injury. However, the biggest challenge in designing seizure prediction approaches is that there is no universally agreed-upon preictal period length (PPL). Bandarabadi et al. [10] investigated the optimal PPL for seizure prediction using statistical analysis and found that it varies across patients and even across seizures within the same patient.
Most of the work in this area concerns seizure detection [11], which involves detecting a seizure after its occurrence. Although this is important, contemporary work must aim to predict seizures before their onset, as doing so can save patients' lives and improve their quality of life. Our main hypothesis is that the correct detection of the preictal state against normal brain activity (through supervised or unsupervised approaches) can be a strong indicator of an incoming epileptic seizure. In the supervised setting, a binary classifier can be trained to separate interictal and preictal periods. In the unsupervised setting, a model can be trained on only normal (interictal) EEG, and the preictal state can be identified as an anomaly. Our main contributions are:
• Presented supervised and new unsupervised deep learning approaches to predict epileptic seizures.
• Experimentally determined the PPL and window size, rather than relying on heuristics or domain knowledge.
• Performed leave-one-seizure-out cross-validation for better generalization of results.
• Performed all experiments in a patient-specific manner to avoid data leakage and overestimation of results, and to emphasize individualized outcomes.
Our results showed that the unsupervised approaches obtained results comparable to supervised seizure prediction in many patients. However, across all implementations there was no single best-performing model. This paper is an extension of our preliminary work [12], which introduced a supervised convolutional neural network (CNN) on the SWEC-ETHZ dataset [13]. In this paper, we present two new supervised approaches, a CNN-Long Short-Term Memory (LSTM) network and a Temporal Convolutional Network (TCN), and three new unsupervised approaches (CNN, CNN-LSTM, and TCN autoencoders). We developed new seizure prediction baselines for the SWEC-ETHZ dataset [13] and additionally included the CHB-MIT dataset [14].

Related Work
Seizure prediction using supervised machine learning has been used to distinguish the interictal and preictal states [15]. Typical supervised machine learning seizure prediction approaches involve signal pre-processing and feature extraction and selection, followed by a classifier [15]. Common signal processing techniques include high-pass, low-pass, or band-pass filtering, as well as artifact removal [15]. Feature extraction is typically done by a bio-signals or epilepsy expert examining a patient's EEG and deciding on appropriate features for separating the preictal and interictal states [15]. These features are often patient-specific and include statistical, non-linear, frequency-domain, and time-frequency-domain features [15,16]. Common classifier choices include support vector machines (SVM), k-nearest neighbours, and random forests [15]. Such machine learning approaches are limited by their reliance on handcrafted features, which can be suboptimal and time-consuming to design. Deep learning approaches can overcome some of these challenges by learning features from data with little to no pre-processing, generating high-level representations of data, and learning complex functions [17]. However, deep learning approaches need vast amounts of data and computing resources to train and deploy, which can be time-consuming and costly. They also require careful tuning of hyperparameters to avoid overfitting and underfitting. An overview of preictal-interictal classification seizure prediction methods (on human subjects) using deep learning is shown in Table 1.
Many of the reviewed deep learning methods performed some type of pre-processing of the EEG data before passing it to the classifier, typically through filtering [1,21], artifact removal [35], or time-frequency analysis [21,22]. Common deep learning architectures used for seizure prediction include CNNs [1,22], LSTM networks [29,32], and feed-forward multilayer perceptrons (MLP) [18]. We observed that the majority of the studies use CNNs, LSTMs, and/or their combinations to benefit from learning spatial and temporal features. The window size (the fixed duration of data to analyze) and PPL were kept fixed in most of the studies, and they varied even between studies on the same dataset and patients. This is an issue in building seizure prediction classifiers because the optimal PPL varies across patients (as concluded by Bandarabadi et al. [10]). Only four of the studies reported experimenting with the PPL [18,21,29,35], while the others did not present any rationale for their choices. Some of the studies (e.g., [14]) also found different optimal PPLs, showing that the optimal PPL varies depending on the method's implementation. These studies show that it is better to determine the PPL empirically at a patient-specific level, rather than using a generic or pre-determined average over a population.
We extend the existing supervised methods by obtaining the PPL and window size using a leave-one-seizure-out (LOSO) evaluation, and introduce a new supervised TCN classifier for this task. An unsupervised deep learning approach was used by Daoud et al. [31] in conjunction with a classification approach to predict epileptic seizures. They trained a deep autoencoder on unlabelled EEG segments (balanced data of preictal and interictal segments) and then placed the trained autoencoder in their classification pipeline. Their intent in using an autoencoder was to leverage transfer learning by giving the training process a good starting point instead of a random initialization of the parameters, which drastically reduces training time. In contrast, we introduce three different autoencoder models in a fully unsupervised manner, separate from the classification approaches, and study their performance on this problem.

Supervised Seizure Prediction
Preictal-interictal classification for seizure prediction is performed with three different architectures: a convolutional neural network (CNN), used in our previous work [12], and two new architectures, CNN-LSTM and TCN. We briefly discuss them below.

CNN
The CNN model takes in EEG samples that have been time-frequency transformed using a short-time Fourier transform (STFT) [22] (see Section 5.3). This helps the model extract time and frequency features and puts the data into a suitable format for 2D convolutions [22].
The CNN architecture takes advantage of spatial information in the data to learn relevant features. Each sample was converted into a 2D matrix of size F × T using an STFT, where F is the number of sample frequencies and T the number of segment times. The matrix was then resized to a 128 × 128 "image" using bilinear interpolation so that image sizes were consistent regardless of the window size. The time-frequency transform was applied independently to each channel, resulting in each sample being of dimensions C × 128 × 128, where C is the total number of channels. The samples were then passed to the CNN model, which is made up of three convolutional blocks (see Figures 1a and 1b), followed by three fully connected layers with ReLU activation functions. Table 2 shows the model hyperparameters used for the CNN.

CNN-LSTM

The CNN-LSTM model takes in STFT images similar to the CNN model, with the input being a consecutive series of images as one sample. The input sequence is divided into smaller sub-sequences, which are independently time-frequency transformed and resized into 64 × 64 images, leading to dimensions C × n × 64 × 64, where n is the number of sub-sequences in a sample and is equal to the sequence length divided by the sub-sequence length. Each sub-window is passed into a CNN model with two convolutional blocks that outputs a feature vector. The feature vectors are then concatenated into a sequence and passed into a 2-layer LSTM, whose outputs are passed to a fully connected layer that outputs the final scores. An overview of the CNN-LSTM architecture and its hyperparameters is shown in Figure 2 and Table 3a. We also experimented with an LSTM on raw EEG data; however, the results were not satisfactory and are not discussed in this paper. Converting the EEG to STFT images highlighted the frequencies and enabled the CNN to learn spatial information, while the LSTM modelled the temporal dependencies between them. Vanilla RNNs were not used because LSTMs support longer memory and better handle the vanishing gradient problem [38]. On the flip side, LSTMs generally require more parameters to train, making them computationally expensive and memory-intensive. We addressed this by training the LSTM models on a high-performance GPU cluster (see Section 6).
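The sub-sequencing step above can be sketched in a few lines of NumPy; the function name and the 18-channel, 256 Hz example are hypothetical illustrations, not taken from the paper's code:

```python
import numpy as np

def split_into_subsequences(sample, subseq_len):
    """Split a (C, S) EEG window into n = S // subseq_len sub-sequences.

    Returns an array of shape (C, n, subseq_len); each sub-sequence would
    then be time-frequency transformed and resized to a 64 x 64 image
    before entering the per-sub-window CNN encoder.
    """
    C, S = sample.shape
    n = S // subseq_len
    return sample[:, : n * subseq_len].reshape(C, n, subseq_len)

# Example: an 18-channel, 30 s window at 256 Hz split into 5 s sub-windows.
window = np.zeros((18, 30 * 256))
subs = split_into_subsequences(window, 5 * 256)  # shape (18, 6, 1280)
```

Here n = 30 s / 5 s = 6, matching the "sequence length divided by the sub-sequence length" definition.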

TCN
The TCN model takes in scaled sequences of size C × S/4, where S is the sequence length; the sequences were down-sampled by a factor of 4. The TCN model [39] consists of TCN blocks (see Figure 3). Each TCN block is two consecutive sub-blocks that contain a causal 1D convolution layer with a dilation, a weight normalization layer, a ReLU activation function, and a dropout layer [39]. The TCN blocks have skip connections, where the input to the block is added to its output [39]. The model contained 6 TCN blocks with 32 channels each, followed by a 1D convolution layer and a fully connected layer. The dilation factor of the n-th block was 2^(n−1). Figure 3 and Table 3b show the TCN architecture and hyperparameters.
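A minimal NumPy sketch of the causal dilated convolution at the heart of each TCN block (the helper name is ours; real implementations use framework layers such as weight-normalized 1D convolutions):

```python
import numpy as np

def causal_dilated_conv1d(x, kernel, dilation):
    """Causal 1D convolution: left-pad with (k - 1) * dilation zeros so that
    the output at time t depends only on inputs at times t' <= t."""
    k = len(kernel)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), np.asarray(x, float)])
    return np.array([
        sum(kernel[j] * xp[t + pad - j * dilation] for j in range(k))
        for t in range(len(x))
    ])

# With dilation factor 2^(n-1) for block n, the receptive field grows
# exponentially with depth: blocks 1..6 use dilations 1, 2, 4, 8, 16, 32.
dilations = [2 ** (n - 1) for n in range(1, 7)]
```

For a kernel of size 2 with dilation 2, the output is y[t] = w0·x[t] + w1·x[t−2], so an impulse at time t appears in the output only at times t and t+2, never earlier.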

Unsupervised Seizure Prediction
The reliance on preictal data remains a challenge for supervised seizure prediction methods. Preictal data is typically scarce, and deep learning methods require a considerable amount of data from both classes to work well. Preictal-interictal classification methods cannot be used effectively on patients with little preictal data, and class imbalance remains a persistent problem. An unsupervised approach (anomaly detection) to seizure prediction could remedy these problems. Anomaly detection for seizure prediction requires only interictal data (and no preictal data) to train, making it accessible to a larger population. Autoencoders (AEs) and their variants are well suited to this framework, with the reconstruction error used as an anomaly score [40]. To our knowledge, this is one of the first works to use an unsupervised deep learning approach for epileptic seizure prediction without utilizing preictal data. We implemented the following autoencoder approaches for this task.
• CNN autoencoder [41]. Similar to the supervised CNN, it takes STFT images as input. The encoder is made up of three convolutional blocks followed by a fully connected layer that generates an embedding state of size 64. The decoder is a mirrored version of the encoder (see Figure 4a).
• CNN-LSTM autoencoder [42]. Similar to the supervised CNN-LSTM, the input sequence was divided into smaller sub-sequences and an STFT was performed on each one. The encoder consisted of an individual CNN encoder for each sub-sequence, followed by an LSTM that generated an embedding state of size 64. The decoder has the reverse architecture of the encoder (see Figure 4b).
• TCN autoencoder [43]. It takes in raw scaled sequences, as with the supervised TCN. The encoder was a TCN with three layers of 16 channels each, followed by a 1D convolution and a fully connected layer. The size of the embedding state was 64. The decoder was an exact mirror of the encoder (see Figure 4c).
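The scoring rule shared by all three autoencoders can be sketched as follows; this is a toy illustration (the function name is ours), assuming x_hat is whatever the trained decoder reconstructs:

```python
import numpy as np

def anomaly_scores(x, x_hat):
    """Per-sample mean squared reconstruction error, used as the anomaly
    score: preictal windows, never seen during interictal-only training,
    are expected to reconstruct poorly and therefore score higher."""
    return ((x - x_hat) ** 2).reshape(len(x), -1).mean(axis=1)

# Toy example: the second "window" is reconstructed poorly and scores higher.
x = np.array([[0.0, 0.0], [1.0, 1.0]])
x_hat = np.array([[0.0, 0.0], [0.0, 0.0]])
scores = anomaly_scores(x, x_hat)  # scores[1] > scores[0]
```

Ranking test windows by this score is what the threshold-free AUC ROC/AUC PR evaluation later operates on.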
5 Data Processing

Datasets
We used two EEG epilepsy seizure datasets: the Sleep-Wake Epilepsy Centre ETH Zurich (SWEC-ETHZ) dataset [13] and the Children's Hospital Boston Massachusetts Institute of Technology (CHB-MIT) dataset [14]. Both datasets are publicly available, easy to access, and contain raw human EEG recordings in which no seizure states have been pre-selected. This is important because it lets us define and experiment with different preictal and interictal regions. The SWEC-ETHZ dataset is an iEEG dataset containing over 2,500 hours of recordings across 18 patients with a sampling rate of either 512 Hz or 1024 Hz [13]. The CHB-MIT dataset contains scalp EEG recordings from 22 patients sampled at 256 Hz with at least 22 EEG electrodes [14]. Note that one patient had their recordings taken on two separate occasions 1.5 years apart; the two cases are treated as two separate patients for the rest of this paper [14]. We define a "lead seizure" as any seizure that occurs at least 30 minutes after a preceding seizure [22]. Only preictal periods from lead seizures were considered, owing to the lack of interictal and preictal data to train the models. Patients with fewer than three lead seizures were withheld from the experiments, because at least three lead seizures were required to perform test partitioning combined with an internal leave-one-seizure-out (LOSO) cross-validation step (see Figure 5b). Six of the 18 patients in the SWEC-ETHZ dataset were excluded due to this condition. All patients in the CHB-MIT dataset had at least three lead seizures. A description of the dataset attributes for all patients used from the SWEC-ETHZ and CHB-MIT datasets is shown in Tables 4 and 5, respectively.

Data Preprocessing
The length and location of the preictal period are defined by the PPL and the intervention time (IT). The IT is the time between the preictal state and the seizure onset. Interictal data is defined as any data that is not preictal, ictal, or postictal, and is a distance d away from the preictal state, as shown in Figure 5a. The data was divided into samples of a fixed window size, which were labelled as either interictal or preictal. We set d = 0 to evaluate the model's ability to classify interictal and preictal samples in close temporal proximity to actual seizures. The IT was also set to 0; increasing it is a possible future experiment once a baseline has been generated. In the SWEC-ETHZ dataset, interictal samples were randomly selected with a down-sampling factor of 8 because interictal data was overly abundant and the classes were significantly imbalanced (patients ID04, ID09, and ID10 used a down-sampling factor of 2 instead because they had less interictal data). The number of preictal samples was artificially increased by using 50% overlapping windows. The size of each sample was (s·f) × C, where s was the window size in seconds, f the sampling rate, and C the number of EEG electrodes.
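The preictal augmentation with 50% overlapping windows can be sketched as follows (hypothetical function name; 18 channels at 256 Hz chosen only for illustration):

```python
import numpy as np

def overlapping_windows(signal, win_len, overlap=0.5):
    """Cut a (C, T) recording into windows of win_len samples with the
    given fractional overlap (0.5 = 50%, as used to augment preictal
    data). Returns an array of shape (num_windows, C, win_len)."""
    step = int(win_len * (1 - overlap))
    starts = range(0, signal.shape[1] - win_len + 1, step)
    return np.stack([signal[:, s : s + win_len] for s in starts])

# A 60 s preictal segment at 256 Hz cut into 5 s windows with 50% overlap:
# roughly twice as many windows as the non-overlapping case.
segment = np.zeros((18, 60 * 256))
wins = overlapping_windows(segment, 5 * 256)
```

With a 5 s window and a 2.5 s step, a 60 s segment yields 23 windows instead of the 12 that non-overlapping windows would produce.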
The dataset was partitioned into a training set and a testing set using LOSO partitioning. We used the last lead seizure's preictal data as the test set, while all other preictal data was part of the training set. As shown in Figure 5b, LOSO partitioning is a better way to evaluate a model's ability to generalize to a new seizure's preictal data. Standard test partitioning, where samples are randomly assigned to the training or test set, may overestimate the actual performance of the classifier.
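A minimal sketch of this partitioning logic, with hypothetical seizure identifiers:

```python
def loso_partition(seizure_ids):
    """Hold out the last lead seizure as the test set; the rest form the
    training seizures. The internal cross-validation then rotates one
    validation seizure within the training seizures (one fold per seizure).
    """
    train, test = seizure_ids[:-1], seizure_ids[-1]
    folds = [(train[:i] + train[i + 1:], train[i]) for i in range(len(train))]
    return train, test, folds

# Four lead seizures: sz4 is withheld for testing, and the three internal
# folds each validate on one of sz1..sz3 while training on the other two.
train, test, folds = loso_partition(["sz1", "sz2", "sz3", "sz4"])
```

This also makes the three-lead-seizure requirement concrete: with fewer than three, there is no way to have both a held-out test seizure and at least two internal folds.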

Time-Frequency Transform
We transformed the EEG data from a time-series input into the time-frequency domain [44,45] using the short-time Fourier transform (STFT). The STFT converts a one-dimensional time-series signal into a two-dimensional matrix of values with axes of time and frequency [46]. It splits the signal into a series of smaller sequences and then performs a Fourier transform on each one individually, providing a way to see changes in the frequency domain at various points in time [47]. In this work, the sequence is split into segments of 128 samples before the Fourier transforms are performed. In the CNN-based models used in this work, an STFT was used to pre-process the input before passing samples to the model. Other time-frequency analysis methods, such as the continuous wavelet transform [21] and phase-amplitude coupling [48], were experimented with in our preliminary work but did not provide better results.
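A simplified, self-contained version of this transform, using non-overlapping Hann-windowed segments of 128 samples (the paper's exact windowing and overlap may differ):

```python
import numpy as np

def stft_magnitude(x, nperseg=128):
    """Magnitude STFT of a 1D signal: split into non-overlapping segments
    of nperseg samples, apply a Hann window, and take the real FFT of each.
    Returns an (F, T) matrix with F = nperseg // 2 + 1 frequency bins and
    T = len(x) // nperseg time bins."""
    n_seg = len(x) // nperseg
    segs = x[: n_seg * nperseg].reshape(n_seg, nperseg) * np.hanning(nperseg)
    return np.abs(np.fft.rfft(segs, axis=1)).T  # transpose to (F, T)

# A single 5 s channel at 256 Hz becomes a 65 x 10 time-frequency "image";
# each channel is transformed independently, giving C such images.
spec = stft_magnitude(np.random.randn(5 * 256))
```

The resulting F × T matrix is what gets resized (to 128 × 128 for the CNN, or 64 × 64 per sub-window for the CNN-LSTM) via bilinear interpolation.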

Experimental Setting and Results
A grid search was performed to find the optimal window size and PPL for each patient. We ran the model with varying window size (5, 10, 15, 30, 60 seconds) and PPL (30, 60, 120 minutes) values. We used an internal LOSO cross-validation to tune the parameters without looking at test data. This was done by dividing the training set into folds, where each fold was a different seizure's preictal and interictal data. One fold was the validation set, while the others were used for training. Each fold was used as the validation set once, and the performance across all runs for a patient was averaged. An example of the cross-validation method used is shown in Figure 6. The area under the Receiver Operating Characteristic curve (AUC ROC) [49] was used as the performance metric for hyperparameter tuning. The test set was completely withheld from this process. All the models were trained using an NVIDIA V100S-PCIe GPU with 32 GB of memory. A class-weighted cross-entropy loss function (with class weights varying per patient) was used with the Adam optimizer, and each model was trained for 100 epochs with a batch size of 128 and a learning rate of 0.0001. All implementations were done in the PyTorch framework [50]. After the final parameters for a model were set, it was evaluated on the test set using the AUC ROC and the area under the precision-recall curve (AUC PR); AUC PR is more appropriate for imbalanced classification problems [51]. The reported performance metrics are AUC ROC and AUC PR. The calculation of false positives, false negatives, and derived metrics such as accuracy, specificity, and sensitivity depends on the choice of the threshold applied to the reconstruction error or classification probabilities [21,31]. ROC and PR analysis is more thorough because it does not depend on the threshold choice; instead, it analyzes the specificities and sensitivities at all possible thresholds and provides an overall summary metric in terms of AUC ROC/AUC PR. For imbalanced datasets, AUC PR is the better metric, as it accounts for precision and recall at all thresholds and does not inflate the results. Another important point to note is that in unsupervised deep learning models there is no validation set; therefore, calculating an operating threshold is non-trivial.
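The threshold-free character of AUC ROC can be made concrete through its rank-based (Mann-Whitney) formulation; the following is an illustrative sketch, not the paper's evaluation code:

```python
import numpy as np

def auc_roc(scores, labels):
    """Threshold-free AUC ROC via the rank (Mann-Whitney U) formulation:
    the probability that a randomly chosen positive sample scores higher
    than a randomly chosen negative sample, with ties counted as 0.5."""
    scores = np.asarray(scores, float)
    labels = np.asarray(labels)
    pos, neg = scores[labels == 1], scores[labels == 0]
    gt = (pos[:, None] > neg[None, :]).sum()  # positive strictly above negative
    eq = (pos[:, None] == neg[None, :]).sum()  # ties
    return (gt + 0.5 * eq) / (len(pos) * len(neg))

# Perfect ranking of preictal (1) above interictal (0) gives AUC = 1.0;
# chance-level ranking gives 0.5.
auc = auc_roc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
```

Because the score is computed over all positive-negative pairs, no operating threshold is ever chosen, which is exactly why it suits the unsupervised models that lack a validation set for threshold calibration.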

Supervised Prediction
Hyperparameter tuning results using the supervised CNN are shown in Tables 6 and 7 for the SWEC-ETHZ and CHB-MIT datasets, respectively. The window size and PPL obtained using cross-validation, as well as the AUC ROC, vary considerably across patients in both datasets. More than half of the patients in each dataset show AUC ROC values greater than 0.7. In the SWEC-ETHZ dataset, six of the patients had a test AUC ROC at least 0.1 lower than their validation AUC ROC, while in the CHB-MIT dataset it was eight patients. This is consistent with Bandarabadi et al. [10], who found that the optimal preictal period for seizure prediction varies even across seizures within the same patient. The best way to account for this is to train and test on as many lead seizures' preictal data as possible. The fixed-parameter CNN model architecture is identical to that of the optimized-parameter implementation. This was done to explore the benefits of optimizing hyperparameters for seizure prediction. Figures 7 and 8 show the comparison of the two methods on the SWEC-ETHZ and CHB-MIT datasets, respectively. In general, for the SWEC-ETHZ dataset, the optimized hyperparameter implementation performed slightly better than the fixed-parameter one. For patient ID09, the optimized hyperparameter implementation performed much better than the fixed-parameter implementation: the hyperparameter tuning found a window size of 30 seconds and a PPL of 2 hours, and it is likely that there was additional preictal information in the extra hour of data not used in the fixed-parameter implementation. For the CHB-MIT dataset, most patients had similar results for both the fixed and optimized hyperparameter implementations. A few patients (ID 5, 16, 17, 18) had much better results with the optimized model. However, there were also patients (ID 9, 22, 23) who performed better with the fixed hyperparameter implementation; for these patients, the last seizure's optimal hyperparameters were likely different from the optimal hyperparameters for the preceding seizures in the patient's data. Figures 9 and 10 show the comparison between the optimized and fixed hyperparameter implementations for the SWEC-ETHZ and CHB-MIT datasets, respectively, using AUC PR instead. It can be observed that the optimized implementation generally performs better on the SWEC-ETHZ dataset in both metrics, and that the difference is marginal on the CHB-MIT dataset. These experiments indicate that hyperparameter tuning can improve performance compared to fixed parameters.

Unsupervised Prediction
In the unsupervised approach, the training set contained only interictal data. For these experiments, the hyperparameters were fixed, with a window size of 30 seconds and a PPL of 60 minutes. The models were trained for 500 epochs with a batch size of 128 and a learning rate of 0.0005. After training, the models were evaluated on a test set that contained both interictal and preictal samples. Both AUC ROC and AUC PR were used to evaluate performance. For the SWEC-ETHZ dataset, the CNN AE performed the worst across most patients, while the CNN-LSTM and TCN AEs performed relatively better and even surpassed the supervised implementation in some patients. In the CHB-MIT dataset, the results vary even more, with no clear winner.

Best Implementations
Tables 8 and 9 show the best-performing implementation (across all experiments with supervised and unsupervised approaches) for each patient in the SWEC-ETHZ and CHB-MIT datasets, together with the corresponding AUC ROC and AUC PR. For the SWEC-ETHZ dataset, an unsupervised approach was the best-performing implementation for 7 out of 12 patients. For the CHB-MIT dataset, a supervised approach was the best-performing implementation for 16 out of 23 patients. For this, it is important to have data from as many lead seizures per patient as possible, since preictal data is typically scarce. A limitation of our work is that a patient requires three lead seizures in their data for this method to be applicable. It may not always be feasible for a patient's data to contain at least three lead seizures, especially considering the difficulty of data acquisition. Anomaly detection seizure prediction performance varied significantly across architectures. Although supervised preictal-interictal classification performed better overall, there were many patients for whom an unsupervised approach was the best implementation. Additionally, in the SWEC-ETHZ dataset, an unsupervised approach was the best implementation for the majority of patients. This is likely because the SWEC-ETHZ dataset had a much larger recording duration and interictal-preictal ratio. In autoencoder-based unsupervised approaches, the model is trained on only interictal (normal) EEG data, and the reconstruction error is used to detect the onset of seizures (or preictal events). If the interictal data is contaminated with noise, the reconstruction error may be high even for interictal training data, which could result in misidentifying pre-seizure events (which may also have a higher reconstruction error because they were not seen before). Alternatively, if the interictal data is not diverse enough, then a slight variation in test interictal data could lead to it being misclassified as pre-seizure, which may lead to a higher false-alarm rate. Nevertheless, anomaly detection seizure prediction shows promise, and it implies that, with improved signal processing and predictive modelling, it may not be necessary to collect substantial preictal data to predict a seizure. Figure 16 shows the average performance in terms of AUC PR across all patients. It can be observed that the supervised CNN and the supervised CNN-LSTM performed the best on average. However, the difference in performance across models is not large, and given the large standard deviation, it is not possible to make a statistical claim about the best-performing model. In general, the supervised approaches performed better than the unsupervised approaches, with results varying across individual patients. Our results also showed the potential of using unsupervised approaches for seizure prediction. A major advantage is that they use only unlabelled interictal EEG data, which is easier to acquire and does not depend on an expert to annotate.

Conclusions and Future Directions
We developed several supervised approaches and introduced new unsupervised deep learning approaches for predicting epileptic seizures. In each approach, the main goal was to identify a preictal state (either as a binary class or as an anomaly) to predict the onset of an incoming seizure. We accounted for the variability of EEG and of the preictal period by tuning the window size and PPL using a grid search. We trained personalized models and tuned hyperparameters using a LOSO approach for better generalization of results. This method achieved good results on more than half of the patients. We experimented with different supervised and unsupervised deep learning architectures on two large EEG datasets. Our results vary across implementations depending on the patient. The advantage of unsupervised methods is that they do not require preictal data to train the models, thus alleviating the challenges around data acquisition and the effort and time spent on labelling. However, due to the absence of a validation set in unsupervised (anomaly detection) approaches, it is non-trivial to obtain a threshold for detecting pre-seizure events. Previous work on creating proxy outliers from normal data can be extended to obtain an operating threshold for predicting seizures [52,53]. We found that in many cases an unsupervised approach was able to achieve similar or even better performance than a supervised approach; however, there was no single best-performing model. Our extensive experiments show the feasibility of supervised and unsupervised deep learning approaches for seizure prediction. However, the amount of preictal data per patient appears to be a crucial factor in training generalized models.
A future extension would be to experiment with a larger range of hyperparameters. These parameters can also vary across implementations, so optimized hyperparameter implementations with the CNN-LSTM or TCN architecture as the base could be valuable. Another direction is to obtain a (person-specific) operating threshold for deploying these algorithms in a real-world setting. However, there are several limiting factors, including data imbalance and the unequal (and potentially unknown) costs of false positives and false negatives in this application. These costs must be informed by clinical practices and guidelines, as they pertain to human life. Generative adversarial networks [54] and autoencoders trained in an adversarial manner [55] are another potential unsupervised deep learning approach to predicting epileptic seizures. A further extension would be to try different signal processing methods and advanced CNN and sequential models, including ResNets and Transformers. A breakthrough in reducing the intervention time before the onset of a seizure would enable the development of therapeutic interventions that empower people with epilepsy to live without the fear of adverse outcomes.

Fig. 5 :
Fig. 5: (a) Labelling of the preictal and interictal periods with parameters. (b) Simplified visualization of LOSO test partitioning by withholding the last seizure.

Fig. 6 :
Fig. 6: LOSO cross-validation example with four seizures. One seizure is used for validation, while the others are used for model training.

Fig. 8 :
Fig. 8: AUC ROC Comparison of CNN models using optimized hyperparameters vs fixed hyperparameters on the CHB-MIT dataset.

Fig. 9 :
Fig. 9: AUC PR Comparison of CNN models using optimized hyperparameters vs fixed hyperparameters on the SWEC-ETHZ dataset.

Fig. 10 :
Fig. 10: AUC PR Comparison of CNN models using optimized hyperparameters vs fixed hyperparameters on the CHB-MIT dataset.

Figs. 12 and 13:
Figures 13, 14a, and 14b show the anomaly-detection-based seizure prediction AUC PR results for the CNN, CNN-LSTM, and TCN AEs on the SWEC-ETHZ and CHB-MIT datasets, respectively. We also show the supervised CNN with fixed hyperparameters for comparison. It can be observed that performance varies significantly across architectures and patients.

Table 1 :
Overview of deep learning EEG seizure prediction methods.

Table 6 :
Validation and test results for preictal-interictal classification with optimized hyperparameters on the SWEC-ETHZ dataset.

Comparison with Fixed Parameters

We implemented a preictal-interictal classification model with a fixed window size of 30 seconds and a PPL of 1 hour to compare against our tuned hyperparameter model.

Table 7 :
Validation and test results for preictal-interictal classification with optimized hyperparameters on the CHB-MIT dataset.