The methodology followed in this work is shown in Fig. 3.
Table 1
Seizure types and descriptions

| Seizure code | Name | Description | Preictal samples | Preictal NE | Ictal samples | Ictal NE | Interictal samples | Interictal NE |
|---|---|---|---|---|---|---|---|---|
| aABSZ | Absence Seizure | Absence discharges are noticed on EEG. Symptom: loss of consciousness. | 271989 | 18 | 96568 | 56 | 745239 | 46 |
| aCPSZ | Complex Partial Seizure | Blank staring and daydreaming, along with loss of awareness of the surroundings. | 317195 | 28 | 920447 | 52 | 753664 | 46 |
| bFNSZ | Focal Seizure | Starts in one part of the brain but can spread to other parts, i.e. become generalized. | 426127 | 28 | 324887 | 55 | 786432 | 48 |
| bGNSZ | Generalized Non-Specific Seizure | Generalized seizures that cannot be further classified. Symptoms: unconsciousness with spasms, stiffening, etc. | 323400 | 22 | 447487 | 54 | 712084 | 44 |
| aSPSZ | Simple Partial Seizure | Awareness is not affected; body muscles are affected. | 298618 | 18 | 541405 | 52 | 786432 | 48 |
| aTNSZ | Tonic Seizure | Stiffening of muscles. | 353294 | 28 | 1089507 | 52 | 786432 | 48 |
| aTCSZ | Tonic-Clonic Seizure | Stiffening followed by body jerking (grand mal). | 411635 | 28 | 240279 | 45 | 700249 | 44 |

*NE = number of events
Seizure Types And Descriptions
Epileptic seizures have four major types, namely febrile, generalized, partial, and unclassified, which are subdivided into 20 different classes. In this work, seven types of seizures have been chosen, each with different symptoms at the onset of a seizure attack. Two manifestations are used, electroclinical and electrographic, denoted by the prefixes 'a' and 'b' respectively. The seizure types and their descriptions are given in Table 1.
Dataset
The dataset used in this work is taken from Temple University Hospital's EEG repository [42], specifically the TUH EEG Seizure Corpus (TUSZ), as shown in Fig. 4. The repository contains 16,986 sessions from 10,874 unique subjects. The number of EEG channels varies from 20 to 31, and the signals are of varying lengths (up to 300,000 columns). The 10–20 electrode configuration was used.
Pre Processing
The obtained EEG recordings were preprocessed before being fed into the machine learning models. Since each file had a varying number of channels (from 21 to 30), the first 18 channels of every file were taken and labelled with the seizure type. The duration of the ictal region was taken directly from the corpus annotations shown in Fig. 1. For interictal regions, the interval between the stop time of one event and the start time of the next chronological event in the same file was used as the start and stop times. The signal before the first event of each file was taken as the preictal duration.
Feature extraction using Fast Fourier Transform (FFT)
For feature extraction, the FFT in Eq. (1) was applied to each signal and the power spectrum was analyzed. The frequencies with maximum power were extracted as features, and appropriate labels (the seizure type, for each of the ictal, preictal, and interictal segments) were added.
$$X\left(k\right)=\sum _{n=0}^{N-1}x\left(n\right){e}^{-j2\pi kn/N} \quad (1)$$
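As a concrete illustration of this step, the sketch below is a minimal example assuming NumPy; the helper name `fft_peak_features` and the sampling parameters are hypothetical, not the authors' implementation. It computes the power spectrum of one channel and returns the frequencies carrying the most power:

```python
import numpy as np

def fft_peak_features(signal, fs, n_peaks=5):
    """Return the n_peaks frequencies with the highest spectral power.

    signal : 1-D array of EEG samples (one channel)
    fs     : sampling frequency in Hz
    """
    n = len(signal)
    spectrum = np.fft.rfft(signal)           # FFT of a real-valued signal
    power = np.abs(spectrum) ** 2            # power spectrum
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)   # frequency of each bin
    top = np.argsort(power)[::-1][:n_peaks]  # indices of the strongest bins
    return freqs[top]

# Sanity check: a pure 10 Hz sine sampled at 250 Hz peaks at 10 Hz
fs = 250
t = np.arange(0, 2, 1 / fs)
x = np.sin(2 * np.pi * 10 * t)
print(fft_peak_features(x, fs, n_peaks=1))  # [10.]
```

The real-input FFT (`rfft`) is used because EEG samples are real-valued, so only the non-negative half of the spectrum is needed.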
The extracted features were fed into multiple classification algorithms to identify the models giving better results. The better-performing models were then tuned to improve performance further. These models are explained below.
Machine Learning Algorithms
A. Bagged Trees
Bagging, or bootstrap aggregation, is a general procedure for reducing the variance of a statistical learning method. In the bagged tree (BT) method, B regression trees are constructed from B bootstrapped training sets and the resulting predictions are combined. Because the trees are grown deep and are not pruned, each individual tree has high variance and low bias; averaging across the trees reduces this variance. The predicted value for an observation is the mode (classification) or the mean (regression) of the individual trees' predictions. The difference between BT and a Decision Tree (DT) is that BT may contain multiple trees, each with different features, terminal node counts, data, etc., whereas a DT contains a single tree. Bagged Trees consist of the following steps.

Sample m data sets D1, D2, …, Dm from the training set D with replacement.

For each Dj, train a classifier hj(x), where x represents the features.

The final classifier is given by Eq. (2):
$$\text{h}\left(\text{x}\right) = \frac{1}{m} \sum _{j=1}^{m}{h}_{j}\left(x\right) \quad (2)$$
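The three steps above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: a deliberately simple nearest-centroid rule stands in for the decision trees, and the function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_centroid_classifier(X, y):
    """A deliberately simple base learner h_j: nearest class centroid.
    (Stands in for the unpruned trees of the bagged-tree method.)"""
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    def predict(x):
        return classes[np.argmin(np.linalg.norm(centroids - x, axis=1))]
    return predict

def bagged_predict(X, y, x_query, m=25):
    """Steps 1-3: bootstrap m data sets, train one learner per set,
    and return the majority vote (the mode, for classification)."""
    votes = []
    for _ in range(m):
        idx = rng.integers(0, len(X), size=len(X))  # sample with replacement
        h_j = train_centroid_classifier(X[idx], y[idx])
        votes.append(h_j(x_query))
    values, counts = np.unique(votes, return_counts=True)
    return values[np.argmax(counts)]

# Two well-separated clusters: class 0 near the origin, class 1 near (5, 5)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
print(bagged_predict(X, y, np.array([4.8, 5.2])))  # 1
```

For regression the vote would be replaced by the mean of the learners' outputs, matching Eq. (2).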
The advantages of Bagged Trees are:

It can reduce the variance within the algorithm (helpful with high-dimensional data).

It is easy to implement.

It forms a strong learner yielding better performance.

Disadvantages of Bagged Trees are:

The algorithm is computationally intensive.

It is difficult to interpret (low explainability).
B. K-Nearest Neighbors
The KNN algorithm works on the assumption that similar data points tend to lie close to each other in feature space, as seen in Fig. 5. The algorithm classifies a data point based on its 'K' nearest neighbors. For example, if K = 1, the point is assigned the class of its closest neighbor. In Fig. 5, the blue hexagon is the test sample, and the two concentric circles represent the K = 3 and K = 5 neighborhoods. For K = 3, two red balls and only one green ball lie close to the test sample, so it is classified as red. For K = 5, three green balls are present compared to two red balls, so the test sample is classified as green. Because of this sensitivity to K, researchers often prefer K = 1, which in many cases shows better results than larger values of K. The nearest neighbors are found based on the Euclidean distance, as in Eq. (4).
An input x is assigned to the class j with the largest probability, given as in Eq. (3):
$$P\left(y=j\mid X=x\right)=\frac{1}{K}{\sum }_{i\in A}I\left({y}^{\left(i\right)}=j\right) \quad (3)$$
where A is the set of the K nearest neighbors of x and I(·) is the indicator function.
The distance metric is given by the Euclidean distance as follows (where p = 2):
$$d\left(x,{x}^{{\prime }}\right)=\sqrt{{\left({x}_{1}-{x}_{1}^{{\prime }}\right)}^{p}+{\left({x}_{2}-{x}_{2}^{{\prime }}\right)}^{p}+\dots +{\left({x}_{N}-{x}_{N}^{{\prime }}\right)}^{p}} \quad (4)$$
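The decision rule of Eq. (3) with the Euclidean distance of Eq. (4) can be sketched as below; this is a minimal NumPy illustration with a hypothetical helper name, not the authors' implementation.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Assign x the majority class among its k nearest neighbors
    under the Euclidean distance (Eq. 4)."""
    dists = np.linalg.norm(X_train - x, axis=1)   # distance to every sample
    A = np.argsort(dists)[:k]                     # indices of the k nearest
    classes, counts = np.unique(y_train[A], return_counts=True)
    return classes[np.argmax(counts)]             # class maximising Eq. (3)

# Two tight clusters; a query near the first cluster is labelled 0
X_train = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.1, 0.0]), k=3))  # 0
```

Note that no training phase is needed: all computation happens at query time, which is the source of KNN's high prediction cost discussed below.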
Commonly used distance metrics are the Euclidean, Manhattan, and Minkowski metrics. d(p, q) is the distance between points p and q, as shown in Eq. (5), Eq. (6), and Eq. (7) respectively. Here n is the number of dimensions of the training set, and pi and qi are the values of the ith dimension of points p and q respectively.

Euclidean distance metric
$$d\left(p,q\right)= \sqrt{\sum _{i=1}^{n}{\left({p}_{i}-{q}_{i}\right)}^{2}} \quad (5)$$

Manhattan distance metric
$$d\left(p,q\right)= \sum _{i=1}^{n}\left|{p}_{i}-{q}_{i}\right| \quad (6)$$

Minkowski distance metric
$$d\left(p,q\right)= {\left({\sum }_{i=1}^{n}{\left|{p}_{i}-{q}_{i}\right|}^{p}\right)}^{1/p} \quad (7)$$
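The three metrics can be written directly from Eqs. (5)–(7). The minimal NumPy sketch below (function names are illustrative, not from the source) also checks that the Minkowski metric reduces to the Euclidean and Manhattan metrics for orders 2 and 1:

```python
import numpy as np

def euclidean(p, q):
    return np.sqrt(np.sum((p - q) ** 2))          # Eq. (5)

def manhattan(p, q):
    return np.sum(np.abs(p - q))                  # Eq. (6)

def minkowski(p, q, r):
    return np.sum(np.abs(p - q) ** r) ** (1 / r)  # Eq. (7), order r

p = np.array([1.0, 2.0, 3.0])
q = np.array([4.0, 6.0, 3.0])
print(euclidean(p, q))     # 5.0
print(manhattan(p, q))     # 7.0
print(minkowski(p, q, 2))  # 5.0 -> reduces to the Euclidean distance
print(minkowski(p, q, 1))  # 7.0 -> reduces to the Manhattan distance
```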
The merits of KNN are:

Implementation is simple.

It is robust to noisy training data.

It can be more effective for large training data.
The demerits of the KNN algorithm are:

Determining the value of K is sometimes complex.

High computation cost, because the distance from the query point to all training samples must be calculated.

Standardization is required for training sets.
C. Weighted KNN
The weighted K-Nearest Neighbors (WKNN) algorithm differs from the conventional KNN algorithm (as in Fig. 6) in that weights are assigned to the neighbors, and the decision depends on these weights. Data points located close to the query point are assigned higher weights. The results obtained from this model are easy to interpret. The weighted Euclidean distance d between points p and q with weights wi is calculated as in Eq. (8).
$$d\left(p,q\right)= \sqrt{{\sum }_{i=1}^{n}{w}_{i}{\left({p}_{i}-{q}_{i}\right)}^{2}} \quad (8)$$
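Eq. (8) can be sketched as follows; this is a minimal NumPy illustration with a hypothetical helper name. Uniform weights recover the ordinary Euclidean distance, while a zero weight removes a feature's contribution entirely:

```python
import numpy as np

def weighted_euclidean(p, q, w):
    """Eq. (8): Euclidean distance with a per-feature weight w_i."""
    return np.sqrt(np.sum(w * (p - q) ** 2))

p = np.array([0.0, 0.0])
q = np.array([3.0, 4.0])

# Uniform weights reduce Eq. (8) to the ordinary Euclidean distance
print(weighted_euclidean(p, q, np.array([1.0, 1.0])))  # 5.0
# Zero weight on the second feature removes its contribution
print(weighted_euclidean(p, q, np.array([1.0, 0.0])))  # 3.0
```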
The advantages of the weighted KNN algorithm follow from the description above: closer, more relevant neighbors carry greater influence, and the results are easy to interpret. Disadvantages of weighted KNN are:

It is sensitive to noise, missing data, and outliers.

It does not work well for high-dimensional data.

It needs feature scaling.
D. Subspace KNN
In the random subspace method, the features ('attributes', 'independent variables', 'predictors') are sampled randomly for each learner, unlike in BT, where the training points are sampled. Informally, this prevents individual learners from over-focusing on features that appear highly predictive in the training set but fail to be as predictive for points outside it. For this reason, random subspaces are an attractive choice for high-dimensional problems where the number of features is much larger than the number of training points. The selection of a subspace is shown mathematically in Eq. (9) and Eq. (10), and the process is represented pictorially in Fig. 7.

Given a set of N points in an n-dimensional feature space
$$\{\left({x}_{1},{x}_{2},\dots ,{x}_{n}\right) \mid {x}_{i} \text{ is real for all } 1\le i\le n\} \quad (9)$$

we consider the m-dimensional subspaces
$$\{\left({x}_{1},{x}_{2},\dots ,{x}_{n}\right) \mid {x}_{i} \text{ is real for } i\in I,\; {x}_{i}=0 \text{ for } i\notin I\} \quad (10)$$

where I is an m-element subset of {1, 2, …, n} and m < n.
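The projection of Eq. (10), keeping the features indexed by I and zeroing the rest, can be sketched as below; this is a minimal NumPy illustration with hypothetical names, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_subspace(X, m):
    """Project the n-dimensional points in X onto a random m-dimensional
    subspace: keep the features in I, zero the rest (Eq. 10)."""
    n = X.shape[1]
    I = rng.choice(n, size=m, replace=False)  # m-element subset of {0..n-1}
    X_sub = np.zeros_like(X)
    X_sub[:, I] = X[:, I]                     # retained features pass through
    return X_sub, I

X = np.arange(12.0).reshape(3, 4)   # 3 points in a 4-dimensional space
X_sub, I = random_subspace(X, m=2)
print(sorted(I))
print(X_sub)
```

In subspace KNN, one such projection would be drawn per base learner, and each learner measures neighbor distances only within its own subspace.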
The advantages of Subspace KNN (SKNN) are:
Disadvantages of Subspace KNN are:
E. Random Forest
A Random Forest (RF) algorithm can be used for classification as well as regression, as is the case with Support Vector Machines. The algorithm works by creating multiple decision trees and assigning a new data point to the class that wins the majority of the trees' votes, as described in Fig. 8. This model finds applications in many sectors and is often used for high-dimensional data.
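The voting idea can be sketched as follows; this is a minimal NumPy illustration, not the authors' implementation. Each learner sees a bootstrap sample of the rows and a random subset of the features (the trait that distinguishes RF from plain bagging), with a simple nearest-centroid rule standing in for the decision trees, and the forest returns the majority vote:

```python
import numpy as np

rng = np.random.default_rng(2)

def rf_predict(X, y, x_query, n_trees=15, m_features=1):
    """Random-forest-style majority vote over simple base learners."""
    n = X.shape[1]
    votes = []
    for _ in range(n_trees):
        rows = rng.integers(0, len(X), size=len(X))            # bootstrap rows
        feats = rng.choice(n, size=m_features, replace=False)  # random features
        Xs, ys = X[rows][:, feats], y[rows]
        classes = np.unique(ys)
        centroids = np.array([Xs[ys == c].mean(axis=0) for c in classes])
        d = np.linalg.norm(centroids - x_query[feats], axis=1)
        votes.append(classes[np.argmin(d)])                    # one tree's vote
    values, counts = np.unique(votes, return_counts=True)
    return values[np.argmax(counts)]                           # majority class

# Two well-separated clusters: class 0 near the origin, class 1 near (4, 4)
X = np.vstack([rng.normal(0, 0.4, (20, 2)), rng.normal(4, 0.4, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
print(rf_predict(X, y, np.array([3.9, 4.1])))  # 1
```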