Methodology proposal of ADHD classification of children based on cross recurrence plots

Dealing with electroencephalogram (EEG) signals is often not easy. The lack of predictability and the complexity of such non-stationary, noisy and high-dimensional signals are challenging. Cross recurrence plots (CRP) have been used extensively to detect subtle changes in signals, even when noise is embedded in the signal. In this contribution, a total of 121 children performed visual attention experiments, and a proposed methodology using CRP and a Welch power spectral distribution was used to classify them between those who have ADHD and the control group. Additional tools are presented to determine to what extent this methodology is able to classify accurately and avoid misclassifications, thus demonstrating that this methodology is feasible for classifying EEG signals from subjects with ADHD. The experimental results indicate that the proposed methodology shows higher accuracy in comparison with methods proposed by other authors, provided that the correct recurrence tools are selected. Also, this methodology does not require extensive training, unlike the methods based on classical machine learning algorithms. Furthermore, the proposed methodology shows that it is not necessary to manually discriminate events among the EEG electrodes, since CRP can detect even the smallest changes in the signal when it has embedded noise. Lastly, the results were compared with baseline machine learning methods to show experimentally that this methodology is consistent and that the results are repeatable. Given the right CRP metrics, an accuracy of up to 97.25% was achieved, indicating that this methodology outperformed many of the state-of-the-art techniques.


Introduction
There are many methods to cope with highly nonlinear time series. For instance, Altan et al. [6] used deep neural networks (LSTM, in this case) with a grey-wolf optimizer for wind speed forecasting. Also, Cao et al. [13] used the same type of deep recurrent neural network to forecast financial time series, while Karasu et al. [27] used multi-objective particle swarm optimization (MOPSO) for crude oil forecasting. Lastly, Altan et al. [5] used a methodology based on LSTM, wavelet decomposition and a swarm intelligence method to forecast the price of digital currency. Electroencephalograph (EEG) signals have been investigated extensively and pose many challenges [54]. For instance, the nonlinear nature of such signals makes it difficult to classify them using statistical and machine learning techniques. Some of the difficulties when trying to classify such signals are:
• Statistical approaches are often not accurate enough due to the high nonlinearity of the EEG signals.
• Some statistical and machine learning methods process the signal knowing there is an event (an attention event, in this case), but the method has problems locating the event itself.
• Surface EEG signals have a large amount of noise embedded in the signal, which may considerably increase the complexity of the classification.
• Most artificial intelligence (AI) and machine learning methods require extensive training for pattern recognition of the signals.
To deal with these difficulties, a methodology is presented to classify EEG signals for both children with ADHD and control groups using recurrence plots (RP). RP is a graphical representation of the amount of time at which two states of a system exist in the same phase-space neighborhood [37]. With these graphical representations, the dynamics of a highly nonlinear set of signals may be studied [2,20].
Furthermore, since RP and cross recurrence plots (CRP) are visual representations of the data, an analysis called recurrence quantification analysis (RQA) must be carried out in order to quantify the length and number of the recurrences, the phase trajectories and the system's Entropy, among others [17,52].

Background

EEG signals
The electroencephalogram (EEG) signals are composed of a series of electrical potentials that change over time on different channels, according to the international 10-20 standard [20]. An example of a typical output from the EEG is shown in Fig. 1.
Many methods have been used for EEG signal classification. For instance, Zhou et al. [59] used a radial basis function (RBF) support vector machine (SVM) to classify EEG signals, while Richhariya and Tanveer [48] used another variation of an SVM to classify EEG. Also, Dose et al. [15] used an algorithm called mutual information to classify EEG signals with a brain-computer interface (BCI) and Satapathy et al. [50] used a popular swarm intelligence method called particle swarm optimization (PSO) for epilepsy identification using EEG signals.
Furthermore, deep neural networks (DNN) have also been used to classify EEG signals. For instance, Kumar et al. [30] used a DNN-based EEG classification for motor imagery and Nagabushanam et al. [40] used a recurrent network called LSTM to classify such signals.

ADHD
According to some authors [19,47], attention-deficit/hyperactivity disorder (ADHD) is the most common behavioral disorder in children. For a correct diagnosis of ADHD in children, many factors must be considered. Some of them are: poor attention, distractibility, hyperactivity, impulsiveness, poor academic performance or behavioral problems at home or at school [19,47].
Also, Eslami and Saeed [18] used singular value decomposition (SVD) to classify ADHD patients based upon their dissimilarities. Furthermore, Sadatnezhad et al. [48] used linear discriminant analysis to detect features of ADHD in EEG signals, while Ghassemi et al. [22] used a similar approach to detect features in EEG signals. Lastly, Kuang and He [29] used a deep neural network to attempt to classify ADHD subjects using fMRI images.
In terms of the accuracy of the methods for ADHD classification, the following works are noteworthy. Jayawardena et al. [26] and Marcano et al. [33] classified patients with ADHD and the control group using EEG signals, achieving accuracies between 85 and 95%. Another popular algorithm for this type of application is the support vector machine (SVM), with which Mueller et al. [39] showed an accuracy of 90% using a linear SVM. Furthermore, an extreme learning machine (ELM) has been applied to predict ADHD by analyzing magnetic resonance images (MRI) [43]; in that study the precision achieved was 90.18%. Tenev also used SVM, together with a voter algorithm, to classify ADHD in adults, achieving similar accuracy.
Lastly, Hui et al. [24] obtained different results for classification of ADHD in adults using machine learning algorithms. Their study reported an accuracy of 90.04% for decision trees, 87.74% for AdaBoost, 82.9% for bagged trees and 87.28% for SVM.

Cross recurrence plots
Recurrence plots have been widely used for a number of applications. For instance, Aboofazeli and Moussavi [1] used RP and CRP to distinguish features of breath and swallowing sounds. Also, Litak and Rusinek [31] and Litak et al. [32] used RP and CRP to describe the features of the vibration in a milling process, while Demos and Chaffin [14] used RP to evaluate musical performance.
Also, Bastos and Caiado [9] used RP to determine trends in stock markets, while Addo et al. [4] also used RP for financial applications. Furthermore, Rashvandi and Nasrabadi [46] were able to distinguish between different breath sounds using RQA, and Nalband et al. [41] determined features to classify knee disorders using cross recurrence plots (CRP).
Finally, Villamor (2017) has used CRP to classify novice and expert programmers by tracking eye movements.
In terms of EEG signals with the use of RP, Khodabakhshi and Saba [28] used RP for the analysis of emotions, Ngamga et al. [42] and Torse et al. [56] were able to classify patients that suffered epileptic seizures. Also, Becker et al. [10] used RP strategies to monitor patients under anesthesia. At present, the authors are unaware of any studies using RP or CRP for the classification of ADHD in children with similarities in comparison with the methodology presented in this contribution.
A cross recurrence plot (CRP) is a two-dimensional figure that represents the occurrences between two different dynamic systems in an m-dimensional phase space. The cross recurrence matrix is defined by Eckmann et al. [16] as:
$$CR_{i,j} = H\left(\varepsilon - \lVert x_i - \tilde{y}_j \rVert\right) \tag{1}$$
where $x_i$ and $\tilde{y}_j$ represent the trajectories in an m-dimensional space, $\varepsilon$ is the recurrence threshold, $H(\cdot)$ is the Heaviside function and $\lVert \cdot \rVert$ is the Euclidean norm. The RP has a main diagonal line, since $CR_{i,i} = 1$ ($i = 1, \ldots, N$), with the length depending on the largest Lyapunov exponents as explained by Aceves-Fernandez et al. [3].
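As an illustration of the definition above, the following minimal sketch builds a cross recurrence matrix from two short trajectories. The function name `cross_recurrence_matrix`, the toy trajectories and the threshold value are assumptions made for illustration only and are not taken from the experiments; the Heaviside function is assumed to return 1 when its argument is zero.

```python
import math

def cross_recurrence_matrix(xs, ys, eps):
    """Return CR[i][j] = 1 if ||xs[i] - ys[j]|| <= eps, else 0.

    xs and ys are sequences of phase-space vectors of equal dimension.
    Assumption: H(0) = 1, i.e. a distance exactly equal to eps counts
    as a recurrence.
    """
    return [[1 if math.dist(x, y) <= eps else 0 for y in ys] for x in xs]

# Toy two-dimensional trajectories (illustrative values, not EEG data).
xs = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ys = [(0.0, 0.5), (1.0, 1.2), (5.0, 5.0)]

cr = cross_recurrence_matrix(xs, ys, eps=1.0)
for row in cr:
    print(row)
```

Only the pairs of states that fall within the threshold of each other are marked as recurrent, which is what the black points of a CRP represent.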
In order to accurately calculate the recurrence quantification analysis (RQA) features for the problem at hand, the embedding parameters must be determined beforehand. According to several authors [11,21,35,36], the success or failure of RQA measures depends upon the correct calculation of the parameters, which are: time delay, norm, recurrence threshold and embedding dimension.
The time delay must be calculated because the noise is likely to increase as the dimension increases, due to the nonlinearity of the signal. A time delay that is too large will not accurately reconstruct the signal in its original phase space. For that reason, the time delay was calculated using the average mutual information method and was set to 1, as shown in Fernandez-Fraga et al. [20].
The norm chosen in this contribution was the Euclidean norm, since it has been demonstrated that this is the best type of norm for this type of problem [34,53,60]. Furthermore, the recurrence threshold must be at least five times the standard deviation of the observational noise, i.e. $\varepsilon > 5\sigma$ [23,44], which in this case was set to 2.
Lastly, the embedding dimension was calculated using the false nearest neighbors algorithm as shown by Zou et al. [60], which in this case yielded m = 5, using Takens' theorem as demonstrated by Huke [25].
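To make the role of the time delay and the embedding dimension concrete, the sketch below reconstructs phase-space vectors from a scalar series by time-delay embedding, using the values reported in this section (m = 5, delay = 1) as defaults. The helper name `delay_embed` and the toy series are hypothetical; the estimation of the parameters themselves (average mutual information, false nearest neighbors) is not shown.

```python
def delay_embed(signal, m=5, tau=1):
    """Return the m-dimensional delay vectors of a scalar time series.

    Vector i is (s[i], s[i + tau], ..., s[i + (m - 1) * tau]).
    """
    n_vectors = len(signal) - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("signal too short for the requested m and tau")
    return [
        tuple(signal[i + k * tau] for k in range(m))
        for i in range(n_vectors)
    ]

# Toy series (illustrative only): 7 samples give 3 vectors for m = 5, tau = 1.
series = [0, 1, 2, 3, 4, 5, 6]
vectors = delay_embed(series, m=5, tau=1)
for v in vectors:
    print(v)
```

The resulting vectors are the trajectories that enter the cross recurrence matrix; a larger delay or dimension shortens the reconstructed trajectory accordingly.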
It is important to define the measures to quantify the recurrence structures, called cross recurrence quantification analysis (CRQA). The measures considered in this contribution are: Recurrence Rate, Determinism, Entropy, Laminarity, Trapping Time and Trend [37] (Ngamba 2016).

Recurrence Rate
The Recurrence Rate is a measure of the density of the points or recurrences in the system [21]. Stochastic behavior causes short diagonal lines, whereas deterministic behavior causes longer diagonal lines. The Recurrence Rate is given by:
$$RR = \frac{1}{N^2} \sum_{l=1}^{N} l\,P(l) \tag{2}$$
where $P(l)$ is the frequency distribution of the lengths $l$ of the diagonal lines.
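Since every recurrence point belongs to exactly one diagonal line (possibly of length one), the density of recurrences can equivalently be obtained by counting the recurrent cells of the matrix. The following sketch, with a hypothetical function name and a toy matrix, illustrates this under that assumption.

```python
def recurrence_rate(cr):
    """Fraction of cells of a (cross) recurrence matrix that are recurrent."""
    n_cells = sum(len(row) for row in cr)
    n_recurrent = sum(sum(row) for row in cr)
    return n_recurrent / n_cells

# Toy 4 x 4 recurrence matrix (illustrative only): 6 recurrent cells out of 16.
cr = [
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
]
rr = recurrence_rate(cr)
print(rr)  # 0.375
```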

Determinism
The Determinism (DET) corresponds to the local predictability of a system. Determinism also measures the discrepancies of the diagonal lines [37]. The Determinism of a system is calculated as:
$$DET = \frac{\sum_{l=l_{min}}^{N} l\,P(l)}{\sum_{l=1}^{N} l\,P(l)} \tag{3}$$
where $l_{min}$ is the minimum length that forms diagonal structures from recurrence points.
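A possible way to compute the Determinism is to first build the histogram of diagonal line lengths and then take the share of recurrence points that lie on lines of at least the minimum length. In the sketch below, the function names, the toy matrix and the choice of a minimum length of 2 are assumptions for illustration; the value of the minimum length used in the experiments is not stated here.

```python
from collections import Counter

def diagonal_line_histogram(cr):
    """Return {length l: number of diagonal lines of length l}."""
    n_rows, n_cols = len(cr), len(cr[0])
    hist = Counter()
    # Each offset selects one diagonal parallel to the main diagonal.
    for offset in range(-(n_rows - 1), n_cols):
        run = 0
        for i in range(n_rows):
            j = i + offset
            if 0 <= j < n_cols and cr[i][j]:
                run += 1
            else:
                if run:
                    hist[run] += 1
                run = 0
        if run:
            hist[run] += 1
    return hist

def determinism(cr, l_min=2):
    """Share of recurrence points lying on diagonal lines of length >= l_min."""
    hist = diagonal_line_histogram(cr)
    all_points = sum(l * count for l, count in hist.items())
    line_points = sum(l * count for l, count in hist.items() if l >= l_min)
    return line_points / all_points if all_points else 0.0

# Toy matrix (illustrative only): one diagonal of length 3, two isolated points.
cr = [
    [1, 0, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
]
det = determinism(cr, l_min=2)
print(dict(diagonal_line_histogram(cr)), det)
```

In this toy case three of the five recurrence points belong to a diagonal line, so the Determinism is 3/5.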

Entropy
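This measure refers to the Shannon Entropy of the frequency distribution of the diagonal line lengths, that is, the variability in the lengths of the diagonal lines [58]. The Entropy of a system is given by:
$$ENT = -\sum_{l=l_{min}}^{N} p(l) \ln p(l), \qquad p(l) = \frac{P(l)}{\sum_{l=l_{min}}^{N} P(l)} \tag{4}$$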
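The Entropy can be sketched directly from a histogram of diagonal line lengths. In the example below, the function name, the histogram values and the minimum line length of 2 are hypothetical, and the natural logarithm is assumed, so the result is expressed in nats.

```python
import math

def line_length_entropy(hist, l_min=2):
    """Shannon entropy (in nats) of the diagonal line-length distribution.

    hist maps a line length l to the number of lines of that length.
    Lines shorter than l_min are ignored.
    """
    counts = [c for l, c in hist.items() if l >= l_min and c > 0]
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts)

# Hypothetical histogram: two lines of length 2, one of length 3, one of length 5.
hist = {2: 2, 3: 1, 5: 1}
ent = line_length_entropy(hist)
print(ent)
```

With probabilities 0.5, 0.25 and 0.25, the Entropy equals 1.5 ln 2; a plot whose diagonal lines all share one length would give an Entropy of zero.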

Laminarity
The Laminarity (LAM) may be defined from the frequency distribution of the lengths that form vertical lines [36]. Laminarity is also evidence of chaotic transitions and is related to the number of laminar phases in the system, which is represented by the occurrence of vertical lines in the recurrence plot.
The laminar phases of a system are given by:
$$LAM = \frac{\sum_{v=v_{min}}^{N} v\,P(v)}{\sum_{v=1}^{N} v\,P(v)} \tag{5}$$
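Laminarity can be sketched in the same way as Determinism, replacing diagonal lines by vertical ones. The function names, the toy matrix and the minimum vertical length of 2 used below are assumptions for illustration only.

```python
from collections import Counter

def vertical_line_histogram(cr):
    """Return {length v: number of vertical lines of length v}."""
    n_rows, n_cols = len(cr), len(cr[0])
    hist = Counter()
    for j in range(n_cols):
        run = 0
        for i in range(n_rows):
            if cr[i][j]:
                run += 1
            else:
                if run:
                    hist[run] += 1
                run = 0
        if run:
            hist[run] += 1
    return hist

def laminarity(cr, v_min=2):
    """Share of recurrence points lying on vertical lines of length >= v_min."""
    hist = vertical_line_histogram(cr)
    all_points = sum(v * count for v, count in hist.items())
    line_points = sum(v * count for v, count in hist.items() if v >= v_min)
    return line_points / all_points if all_points else 0.0

# Toy matrix (illustrative only): vertical runs of length 3, 2, 1 and 1.
cr = [
    [1, 0, 0],
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
]
lam = laminarity(cr, v_min=2)
print(dict(vertical_line_histogram(cr)), lam)
```

Here five of the seven recurrence points lie on vertical lines of length two or more, so the Laminarity is 5/7.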

Trapping time
Trapping Time measures the average length of the vertical lines:
$$TT = \frac{\sum_{v=v_{min}}^{N} v\,P(v)}{\sum_{v=v_{min}}^{N} P(v)} \tag{6}$$
where $v$ is the length of the vertical lines, $v_{min}$ is the shortest length that is considered a line segment and $P(v)$ is the distribution of the corresponding lengths.
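Given a histogram of vertical line lengths, the Trapping Time reduces to a weighted average. The sketch below uses a hypothetical function name, a hypothetical histogram and an assumed minimum vertical length of 2.

```python
def trapping_time(hist, v_min=2):
    """Average length of the vertical lines of length >= v_min.

    hist maps a vertical line length v to the number of lines of that length.
    """
    n_lines = sum(c for v, c in hist.items() if v >= v_min)
    if n_lines == 0:
        return 0.0
    return sum(v * c for v, c in hist.items() if v >= v_min) / n_lines

# Hypothetical histogram: one line of length 3, one of length 2, two isolated points.
hist = {3: 1, 2: 1, 1: 2}
tt = trapping_time(hist, v_min=2)
print(tt)  # 2.5
```

The isolated points are excluded by the minimum length, so only the two vertical lines contribute to the average.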

Trend
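The trend is a linear regression coefficient over the recurrence point density of the diagonals parallel to the line of identity (LOI). The trend measurement is given by:
$$TREND = \frac{\sum_{i=1}^{\tilde{N}} \left(i - \tilde{N}/2\right)\left(RR_i - \langle RR_i \rangle\right)}{\sum_{i=1}^{\tilde{N}} \left(i - \tilde{N}/2\right)^2} \tag{7}$$
where $RR_i$ is the recurrence point density of the diagonal at distance $i$ from the LOI, $\langle RR_i \rangle$ is its mean and $\tilde{N}$ is the number of diagonals considered.

Materials and methods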
In this context, the participants were 61 children diagnosed with ADHD and 60 healthy controls, boys and girls aged 7-12 years old. The children with ADHD were diagnosed by an experienced psychiatrist using DSM-IV criteria [38].
As shown by Samavati et al. [51], the children looked at pictures that may be attractive for them to watch according to their age (e.g. cartoons) and were then asked to respond how many characters they saw.

Proposed methodology
The proposed methodology follows these steps:
• The signals are acquired for each electrode and each subject (both ADHD and control groups), as explained by Mohammadi et al. [38].
• The relevant electrodes are selected. In this case, an experiment was performed to determine whether the accuracy decreases for certain electrodes or whether the accuracy is region or electrode dependent.
• If the signals contain readable information, they are normalized using the z-score method; otherwise, they are discarded.
• Once the signals are normalized, the recurrence features are selected. To ensure repeatability and consistency, the embedding dimension m = 5, the delay = 1 (by the mutual information method) and the Euclidean norm were fixed.
• The recurrence quantification parameters are calculated for each electrode of each subject (both ADHD and control groups), unless the signal was discarded in the third step. The recurrence parameters calculated are: Recurrence Rate (Rec), Determinism (Det), Entropy (Ent), Laminarity (Lam), Trapping Time (TT) and Trend.
• The power spectral density (PSD) is calculated for each electrode and each subject using the Welch method [57] (Barbe 2009).
• Once the RQA parameters are calculated, the results are interpreted.
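The normalization step of the list above can be sketched as follows. The criterion used here to decide that a signal has no readable information (too few samples, non-finite values or zero variance) is an assumption made only for illustration, since the exact criterion is not specified; the function name `z_score` and the sample values are likewise hypothetical.

```python
import math

def z_score(signal):
    """Return the z-scored signal, or None if the signal should be discarded.

    Assumption: a signal is treated as unreadable when it has fewer than
    two samples, contains non-finite values or has zero variance.
    """
    n = len(signal)
    if n < 2 or any(not math.isfinite(v) for v in signal):
        return None
    mean = sum(signal) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in signal) / n)
    if std == 0.0:
        return None
    return [(v - mean) / std for v in signal]

# Illustrative values (not EEG data): mean 5, population standard deviation 2.
normalized = z_score([2, 4, 4, 4, 5, 5, 7, 9])
flat = z_score([1.0, 1.0, 1.0])  # constant signal is discarded
print(normalized, flat)
```

After this step every retained signal has zero mean and unit variance, so a single fixed recurrence threshold can be applied across electrodes and subjects.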
The methodology described in the present section is shown graphically in Fig. 2.

Experimental results
The interpretation of these non-stationary, high-dimensional, nonlinear EEG signals is often not straightforward. In this contribution, the Welch method is used to determine the segments for the power spectral density, called periodograms, as explained by Welch et al. [57] and Rahi and Mehra [45], for each electrode and each RQA feature. Figure 3a shows the Recurrence Rate for all tests for each electrode for the ADHD group using the Welch method, while Fig. 3b shows the Recurrence Rate for the control group. Also, Fig. 3c, d shows the Determinism for the ADHD and control groups, respectively. Furthermore, Fig. 3e, f shows the RQA Entropy calculated for both ADHD and control groups. Lastly, Fig. 3g, h shows Laminarity for both ADHD and control groups, Trapping Time is shown in Fig. 3i, j and Trend is shown in Fig. 3k, l (also for the ADHD and control groups, respectively).
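The idea behind the Welch method, averaging the periodograms of overlapping windowed segments, can be sketched with the standard library alone. The segment length, the 50% overlap, the Hann window and the simplified scaling below are assumptions for illustration and do not reproduce the settings of the experiments; a dedicated signal-processing library would normally be used instead of this naive discrete Fourier transform.

```python
import cmath
import math

def welch_psd(signal, seg_len=16, overlap=0.5):
    """Average Hann-windowed periodograms of overlapping segments.

    Returns (freqs, psd) with freqs in rad/sample on [0, pi].
    Scaling is simplified: |DFT|^2 divided by the window energy.
    """
    step = max(1, int(seg_len * (1 - overlap)))
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / seg_len) for n in range(seg_len)]
    energy = sum(w * w for w in window)
    n_bins = seg_len // 2 + 1
    psd = [0.0] * n_bins
    n_segments = 0
    for start in range(0, len(signal) - seg_len + 1, step):
        seg = signal[start:start + seg_len]
        mean = sum(seg) / seg_len
        tapered = [(v - mean) * w for v, w in zip(seg, window)]
        for k in range(n_bins):
            coeff = sum(
                t * cmath.exp(-2j * math.pi * k * n / seg_len)
                for n, t in enumerate(tapered)
            )
            psd[k] += abs(coeff) ** 2 / energy
        n_segments += 1
    if n_segments == 0:
        raise ValueError("signal shorter than one segment")
    psd = [p / n_segments for p in psd]
    freqs = [2 * math.pi * k / seg_len for k in range(n_bins)]
    return freqs, psd

# Synthetic test tone at pi/2 rad/sample (illustrative, not EEG data).
tone = [math.cos(math.pi * n / 2) for n in range(64)]
freqs, psd = welch_psd(tone, seg_len=16, overlap=0.5)
peak = max(range(len(psd)), key=lambda k: psd[k])
print(freqs[peak])  # frequency of the strongest bin, in rad/sample
```

Applied to a series of RQA values per electrode, the same routine would give the power as a function of frequency in rad/sample, which is the axis on which the ADHD and control groups are compared.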
Figure 3a, b shows the difference between the Recurrence Rate for the ADHD and control groups. This is especially so at frequencies greater than 0.5 rad/sample. Also, Fig. 3c, d shows the Determinism for the ADHD and control groups, respectively. In these figures, the difference between the frequencies for both groups starts to be evident from 0.5 rad/sample onwards. In terms of Laminarity (Fig. 3g, h), it can be noted that, for all frequencies, the signals for the ADHD group seem more similar with respect to the control group. Finally, both Trapping Time (Fig. 3i, j) and Trend (Fig. 3k, l) show similar values for frequencies smaller than 0.7 rad/sample. In most cases, the differences between the ADHD and control groups become evident as the frequency increases. From these figures, it may be concluded that the differences between both groups are more evident at higher frequencies.
As shown in Fig. 3, there are significant differences between the ADHD and control groups in terms of the power frequency for each electrode. However, the visual differences do not quantify the extent to which the proposed methodology can classify both groups. Hence, accuracy (AC), sensitivity (SE) and specificity (SP) were calculated for each RQA metric, and experiments were carried out to ensure that this methodology is consistent. A total of 1000 experiments were performed using all the data available for each electrode, validating the results with the 80-20 method. That is, in each experiment, 80% of the data was used as the model, whereas the remaining 20% was used for testing. Each correct detection counts as a true positive (TP) or a true negative (TN), whereas each mis-detection counts as a false positive (FP) or a false negative (FN). AC, SE and SP are calculated as follows:
$$AC = \frac{TP + TN}{TP + TN + FP + FN} \tag{8}$$
$$SE = \frac{TP}{TP + FN} \tag{9}$$
$$SP = \frac{TN}{TN + FP} \tag{10}$$
Tables 3, 4, 5 and 6 show the SE, SP and AC for Entropy, Laminarity, Trapping Time and Trend, respectively. Figures 4, 5 and 6 show the sensitivity, specificity and accuracy, respectively, for all metrics (Recurrence, Determinism, Entropy, Laminarity, Trapping Time and Trend).
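The accuracy, sensitivity and specificity used in this section, together with one random 80-20 partition, can be sketched as follows. The function names, the confusion counts and the random seed are hypothetical and serve only to show the arithmetic; they are not results of the experiments, and the classification rule itself is not reproduced.

```python
import random

def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, sensitivity, specificity) from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity

def split_80_20(items, rng):
    """Shuffle a copy of items and split it into 80% model and 20% test data."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

# One partition of 121 subject indices (the seed is arbitrary).
model_set, test_set = split_80_20(range(121), random.Random(0))
print(len(model_set), len(test_set))  # 96 25

# Hypothetical confusion counts for a 25-subject test set.
ac, se, sp = classification_metrics(tp=12, tn=11, fp=1, fn=1)
print(ac, se, sp)
```

Repeating the partition with different seeds and averaging the three measures corresponds to the repeated-experiment protocol described in this section.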

Discussion of the results
Table 1 shows that, for Recurrence Rate, sensitivity is reliable, ranging from 0.9201 for electrode P4 to 0.9724 for electrode Cz, with a mean of 0.949. This seems to indicate that, regardless of the experiment and the data used for testing, most of the electrodes show high accuracy. SP shows a mean of 0.9461, with a highest value of 0.9663 for electrode O1. This seems to indicate that, for Recurrence Rate, the results tend not to produce false positives and false negatives. Lastly, high values of AC are also shown, ranging from 0.9144 to 0.9675 with a mean of 0.9493.
Table 2 shows the SE, SP and AC of Determinism for the experiments performed. In this table, the sensitivity for Determinism ranges from 0.9306 to 0.9669 with a mean of 0.9497. In comparison with the baseline result of 0.9228 given by Mohammadi et al. [38] using a multilayer perceptron (MLP), the results shown in this contribution are higher for every electrode. In terms of specificity, Determinism also shows high values for all electrodes, ranging from 0.9412 (P3 electrode) to 0.9651 (Cz), which also seems to indicate that the proposed methodology tends not to produce false positives and false negatives. Finally, accuracy for RQA Determinism ranges from 0.9372 (P4 electrode) to 0.9613 (both for Fz and F7 electrodes). This seems to indicate that the results are reliable for every experiment performed. Also, the results from all the electrodes outperformed most of the methods used for comparison, as shown in Table 7, which strongly suggests that the proposed methodology is a feasible tool to accurately classify children with ADHD from EEG signals.
Also, Table 3 shows the SE, SP and AC for Entropy for the experiments carried out. In this table, it is shown that sensitivity is slightly lower than [...]
Table 4 shows the SE, SP and AC for Laminarity for the experiments. In this table, all metrics show a high value of true positives and true negatives, which seems to indicate that Laminarity is a good RQA metric to separate the ADHD group from the control group. More specifically, sensitivity ranges from a lowest value of 0.938 (T6 electrode), which is not a low value by any means, to 0.9666 (Fp1 electrode). Also, the experimental results for specificity reach a mean of 0.9529, whereas the mean for accuracy is 0.9527, which seems to indicate that Laminarity is a reliable tool to classify ADHD using the proposed methodology.
Table 5 shows SE, SP and AC for Trapping Time. In this table, the sensitivity is also higher than in the baseline work, ranging from 0.9379 (T4) to 0.9632 (O1) with a mean of 0.9511. Also, specificity shows a mean of 0.953 and accuracy a mean of 0.954, which seems to indicate consistency with the metrics for Laminarity and a high probability of correct classification between the ADHD and control groups.
Finally, Table 6 shows SE, SP and AC for Trend. In this table, it is shown that this is the RQA metric with the lowest values. For instance, SE ranges from 0.8811 (T3 electrode) to 0.9148 (F8 electrode) with a mean of 0.9055. Also, SP shows a lowest value of 0.882 (F8) and a highest value of 0.9164 (Cz) with a mean of 0.8984, while AC shows a lowest value of 0.8834 (C4) and a highest value of 0.9154 (O1) with a mean of 0.9001. This seems to indicate that, although Trend is fairly constant and does not show a low value, it is lower than the baseline value that the methodology aims to exceed.
On the other hand, to show to what extent this methodology may be used to classify EEG signals for ADHD in children, Fig. 4 shows sensitivity for all quality metrics (Recurrence, Determinism, Entropy, Laminarity, Trapping Time and Trend). In this figure, it is shown that, in comparison with the baseline value reported by Mohammadi et al. [38], all metrics except Trend are higher in the experiments shown in this contribution. Although Trend does not show a considerably low value, it may be discarded for sensitivity if a high classification rate is to be achieved using the proposed methodology. Figure 5 shows the specificity for all quality metrics. This figure shows similarities with sensitivity, in that Recurrence, Determinism, Laminarity and Trapping Time show high values, whereas Entropy shows slightly lower values and Trend also shows a lower specificity than the baseline. This also seems to indicate that, given the right RQA features, a high classification rate for detecting ADHD in EEG signals can be achieved.
Furthermore, Fig. 6 shows the accuracy for all quality metrics. In this figure, it is shown that the accuracy is high for most RQA features, with a few exceptions, and that the means of most of these features are higher than the baseline (again, the exception being Trend). This also seems to indicate that this methodology tends not to produce false positives and shows high rates of correct classification when using the correct features. The experimental results indicate that RQA Trend is not recommended if higher accuracy is to be achieved with the proposed methodology.
Finally, Table 7 shows a comparison between many methods to classify ADHD with the proposed methodology. This shows that although many authors have methods that work consistently, the proposed methodology reports a higher accuracy and has many advantages over current state-of-the-art methods.

Conclusions and future work
The proposed methodology shows that it is feasible to classify ADHD using EEG signals with high accuracy. It also shows that the classification rate improves when compared with other techniques used in the literature. Also, the reliability of this method is considered to be high, as the experimental results show consistency. Some RQA features prove to give better results than others. Therefore, the right tools must be selected in order to increase the classification rate. The experimental results demonstrate that this methodology may be used to accurately classify ADHD in EEG signals.
Furthermore, this methodology presents many advantages with respect to existing machine learning algorithms. For instance, machine learning requires effective training in order to classify accurately, which is not required when using the proposed methodology. Also, the classification rate is high regardless of the EEG electrode placed on the subject, and there is no need to discriminate between electrodes, as shown in the experimental procedure.
Lastly, this work shows that the proposed methodology tends not to produce false positive and false negative classifications, which is also a contribution of the present work.
For future work, it may be worthwhile to explore the reason for trend showing a lower classification rate for sensitivity, specificity and accuracy.