The framework of the proposed method is shown in Fig. 1, which illustrates how effective connectivity is used to distinguish ADHD from HC subjects. First, 19-channel raw EEG data are collected from ADHD and HC subjects during a visual attention task. Pre-processing is then applied to remove artifacts and noise from the raw data. The MVAR model and the PSI algorithm are applied to construct a connectivity matrix (\(19\times 19\times 4\)) covering the delta (0.5–4 Hz), theta (4–8 Hz), alpha (8–13 Hz) and beta (13–30 Hz) frequency bands for each subject. Graph theoretical analysis is then used to extract features, such as global efficiency and clustering coefficient, from the connectivity matrix (graph). Finally, statistical analysis is conducted by applying the SEM algorithm.

## 2.1 Dataset

To validate the classification between ADHD and HC, this study used the EEG data for ADHD and HC from IEEE DataPort (DOI: 10.21227/rzfh-zn36). The participants were 30 children with ADHD diagnosed according to DSM-IV criteria (15 boys and 15 girls, ages 7–12), provided by Roozbeh hospital in Tehran, Iran, and 30 HC children (15 boys and 15 girls, ages 7–12) recruited from a primary school. None of the children in the HC group had a history of psychiatric or brain disorders such as epilepsy, major medical illness, or any report of high-risk behaviors.

The raw EEG data were recorded with a 19-channel electrode cap following the 10–20 standard, with electrodes at FP1, FP2, F3, F4, C3, C4, P3, P4, O1, O2, F7, F8, T7, T8, P7, P8, Fz, Cz and Pz. The data were sampled at 128 Hz, with two reference electrodes, A1 and A2.

## 2.2 EEG pre-processing

EEG signals were band-pass filtered between 0.5 Hz and 50 Hz using a zero-phase finite impulse response (FIR) filter. Independent component analysis (ICA), which assumes statistically independent sources, was then used to remove blinks, eye movements and other artifacts. Finally, the data were re-referenced to the average of all scalp channels.
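The filtering and re-referencing steps can be illustrated with a minimal NumPy sketch. It is not the authors' pipeline: the windowed-sinc design, the Hamming window, the tap count of 513 and the function names are all assumptions chosen for illustration, and the ICA step is not sketched. Zero phase is obtained here by centring a symmetric FIR kernel on each sample.

```python
import numpy as np

def lowpass_taps(cutoff_hz, fs, numtaps):
    """Windowed-sinc low-pass kernel with unit gain at DC (numtaps must be odd)."""
    n = np.arange(numtaps) - (numtaps - 1) / 2
    h = np.sinc(2 * cutoff_hz / fs * n) * np.hamming(numtaps)
    return h / h.sum()

def bandpass_zero_phase(eeg, fs=128.0, low=0.5, high=50.0, numtaps=513):
    """Band-pass each row of eeg (channels x samples) with a symmetric FIR kernel.

    The kernel is the difference of two low-pass kernels, so its DC gain is zero.
    mode="same" centres the symmetric kernel, which cancels its group delay.
    """
    taps = lowpass_taps(high, fs, numtaps) - lowpass_taps(low, fs, numtaps)
    return np.stack([np.convolve(ch, taps, mode="same") for ch in eeg])

def average_reference(eeg):
    """Subtract the mean over channels from every sample."""
    return eeg - eeg.mean(axis=0, keepdims=True)
```

Samples within about half a kernel length of either end of the recording are affected by edge effects and would normally be discarded or handled with padding.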

## 2.3 Multivariate autoregressive (MVAR) models

The MVAR model, an extension of the AR model to multi-dimensional variables, can infer the direction and causal relationships of brain connections and therefore underlies effective connectivity methods [21]. The MVAR model is defined as follows:

$$X\left(n\right)={\sum }_{k=1}^{p}A\left(k\right)\times X\left(n-k\right)+W\left(n\right) \tag{1}$$

where \(X\left(n\right)={[{x}_{1}\left(n\right),\dots ,{x}_{M}\left(n\right)]}^{T}\) is the value of the pre-processed EEG signal at time \(n\), and \(M\) is the number of channels (here \(M=19\)). \(p\) is the model order, and \(A\left(k\right), k=1,\dots ,p,\) are the \(M\times M\) coefficient matrices of the MVAR model, whose element \((i,j)\) describes the linear interaction at lag \(k\) from \({x}_{j}(n-k)\) to \({x}_{i}\left(n\right), (i,j=1,\dots ,M)\). \(W\left(n\right)\) is a vector of zero-mean Gaussian noise with covariance matrix \({\Sigma }\). We use the multichannel Yule-Walker equations to estimate the coefficient matrices \(A\left(k\right)\) and the covariance matrix \({\Sigma }\) because of their simple calculation and good performance [27]. Thus, the fitted model for each subject is a \(19\times 19\times p\) array of coefficients.
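As an illustration of the multichannel Yule-Walker approach, the following sketch estimates \(A(k)\) and \({\Sigma }\) from lagged sample covariances. It is a minimal version under stated assumptions, not the authors' implementation: the function name, the biased covariance estimator and the direct solution of the block system (rather than a recursive solver) are choices made here for brevity. The coefficients are returned with the lag on the first axis, i.e. with shape \(p\times M\times M\).

```python
import numpy as np

def fit_mvar_yule_walker(x, p):
    """Fit X(n) = sum_k A(k) X(n-k) + W(n) to x of shape (M, N).

    Returns A with shape (p, M, M) and the noise covariance Sigma (M, M).
    """
    x = x - x.mean(axis=1, keepdims=True)
    M, N = x.shape
    # Biased lagged covariances R[k] ~ E[X(n) X(n-k)^T] for k = 0..p
    R = [x[:, k:] @ x[:, :N - k].T / N for k in range(p + 1)]

    def r(k):
        # R(-k) = R(k)^T
        return R[k] if k >= 0 else R[-k].T

    # Yule-Walker equations: R(k) = sum_l A(l) R(k - l) for k = 1..p,
    # written as [A(1) ... A(p)] G = [R(1) ... R(p)]
    G = np.block([[r(k - l) for k in range(1, p + 1)] for l in range(1, p + 1)])
    rhs = np.hstack(R[1:])
    A_flat = np.linalg.solve(G.T, rhs.T).T
    A = np.stack([A_flat[:, l * M:(l + 1) * M] for l in range(p)])
    # Noise covariance: Sigma = R(0) - sum_l A(l) R(l)^T
    Sigma = R[0] - sum(A[l] @ R[l + 1].T for l in range(p))
    return A, Sigma
```

For the data described above, `x` would be a \(19\times N\) array of pre-processed samples for one subject.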

Another key parameter of the MVAR model is its order, which strongly affects the quality of the fit. If the order is too small, the model cannot make full use of the information in the observed data for an accurate fit. If the order is too large, the model overfits and the computational cost increases.

In this study, the Akaike Information Criterion (AIC) is used to select the order of the MVAR model [28]:

$$AIC\left(p\right)=\ln\left|{\Sigma }\left(p\right)\right|+\frac{2}{N}p{M}^{2} \tag{2}$$

where \({\Sigma }\left(p\right)\) is the covariance matrix of the fitting error of the \(p\)-order model, \(\left|\cdot \right|\) denotes the determinant, \(M\) is the number of channels, and \(N\) is the total number of samples used for model fitting. The order \(p=5\) was selected according to the AIC.
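The order selection can be sketched as a scan over candidate orders. This example makes two assumptions that are not taken from the text: the residual covariance is obtained from an ordinary least-squares fit (used here only to keep the block short and self-contained, in place of the Yule-Walker estimator), and the penalty term is taken as \(2p{M}^{2}/N\), the usual multichannel form. All function names are hypothetical.

```python
import numpy as np

def residual_covariance(x, p):
    """Least-squares MVAR(p) fit of x (M, N); returns (Sigma, samples used)."""
    M, N = x.shape
    Y = x[:, p:]                                                  # targets X(n)
    Z = np.vstack([x[:, p - k:N - k] for k in range(1, p + 1)])   # lagged X(n-k)
    coef, *_ = np.linalg.lstsq(Z.T, Y.T, rcond=None)
    E = Y - coef.T @ Z                                            # fitting error
    return E @ E.T / E.shape[1], E.shape[1]

def aic(x, p):
    """AIC(p) = ln|Sigma(p)| + 2 p M^2 / N (assumed penalty form)."""
    M = x.shape[0]
    Sigma, n_used = residual_covariance(x, p)
    _, logdet = np.linalg.slogdet(Sigma)
    return logdet + 2.0 * p * M ** 2 / n_used

def select_order(x, max_order):
    """Return the order with the smallest AIC and all scores."""
    scores = {p: aic(x, p) for p in range(1, max_order + 1)}
    return min(scores, key=scores.get), scores
```

A call such as `select_order(x, 10)` returns the candidate order with the lowest score together with the full AIC curve, which is worth inspecting because the minimum can be shallow.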

We also need frequency-domain quantities for the coherence spectrum estimation, so the MVAR model is converted to the frequency domain through the Fourier transform. The transfer matrix of the MVAR model, \(H\left(f\right)\), and the cross-spectrum matrix, \(S\left(f\right)\), are estimated as follows:

$$H\left(f\right)={\left(\sum _{k=0}^{p}-{A}_{k}{e}^{-jk2\pi f}\right)}^{-1} \tag{3}$$

$$S\left(f\right)=H\left(f\right){\Sigma }{H}^{H}\left(f\right) \tag{4}$$

where \({H}^{H}\left(f\right)\) is the conjugate transpose of \(H\left(f\right)\), \({\Sigma }\) is the noise covariance matrix, \({A}_{k}=A\left(k\right)\) are the \(M\times M\) coefficient matrices with \({A}_{0}=-I\), and \(p\) is the model order.

Fitting the MVAR model gives a more refined spectral estimate, which supports a more accurate calculation of the effective connectivity coefficients. The spectral power values were calculated over the whole frequency band. After fitting the MVAR model, we obtained a \(19\times 19\times 128\) matrix for each subject, which is the input to the PSI algorithm through Eq. (6).
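Eqs. (3) and (4) can be evaluated directly on a frequency grid. The sketch below assumes that frequencies are given in Hz and divided by the sampling rate `fs`, so that the exponent matches the normalized frequency in Eq. (3); the function name and the choice to put the frequency on the first axis are illustrative.

```python
import numpy as np

def mvar_spectrum(A, Sigma, freqs_hz, fs):
    """Transfer matrix H(f) and cross-spectrum S(f) of an MVAR model.

    A: (p, M, M) coefficients A(1..p); Sigma: (M, M) noise covariance.
    Returns H and S, each of shape (len(freqs_hz), M, M).
    """
    p, M, _ = A.shape
    lags = np.arange(1, p + 1)
    S = np.empty((len(freqs_hz), M, M), dtype=complex)
    H = np.empty_like(S)
    for idx, f in enumerate(freqs_hz):
        phase = np.exp(-2j * np.pi * f / fs * lags)          # e^{-j 2 pi f k / fs}
        # With A_0 = -I, the sum in Eq. (3) equals I - sum_{k>=1} A(k) e^{...}
        A_bar = np.eye(M) - np.tensordot(phase, A, axes=1)
        H[idx] = np.linalg.inv(A_bar)                        # Eq. (3)
        S[idx] = H[idx] @ Sigma @ H[idx].conj().T            # Eq. (4)
    return H, S
```

With 128 frequency bins and 19 channels, `S` has shape `(128, 19, 19)`, which holds the same values as the \(19\times 19\times 128\) matrix with the axes reordered.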

## 2.4 Effective connectivity analysis

The phase slope index (PSI) is used in our study as the effective connectivity measure. The PSI between two signals \(i\) and \(j\) is defined as:

$${PSI}_{ij}=\mathfrak{I}\left(\sum _{f\in F}{C}_{ij}^{\text{*}}\left(f\right){C}_{ij}(f+{\delta }_{f})\right) \tag{5}$$

where \(\mathfrak{I}\) denotes the imaginary part, \(F\) is the set of frequencies of interest, which determines the bandwidth of the integration across frequencies, \(C\) is the normalized coherent spectrum (coherency), and \({\delta }_{f}\) is the incremental step in the frequency domain. The normalized coherent spectrum \(C\) is defined as:

$${C}_{ij}\left(f\right)=\frac{{S}_{ij}\left(f\right)}{\sqrt{{S}_{ii}\left(f\right){S}_{jj}\left(f\right)}} \tag{6}$$

where \({S}_{ij}\left(f\right)\) is the complex cross-spectrum between \(i\) and \(j\), obtained from Eq. (4).

As defined in Eq. (5), the algorithm uses the imaginary part of the coherent spectrum, because the imaginary part is not changed by aliasing (instantaneous mixing) between the signals [29]. In other words, PSI avoids erroneous estimates of effective connectivity caused by signal aliasing.

In this study, we used PSI to construct the effective connectivity in four frequency bands, namely delta (0.5–4 Hz), theta (4–8 Hz), alpha (8–13 Hz) and beta (13–30 Hz), for both ADHD and HC subjects. The average effective connectivity matrices in the delta band for an ADHD subject and an HC subject are shown in Fig. 2 and Fig. 3.
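The band-wise PSI computation can be sketched as follows. The sketch assumes a complex cross-spectrum array with the frequency on the first axis, takes \({\delta }_{f}\) to be the spacing of the frequency grid, and treats each band as a half-open interval so that boundary frequencies are not counted twice; these choices and the function names are assumptions, not details given in the text.

```python
import numpy as np

# Band edges in Hz, as listed in the text
BANDS = {"delta": (0.5, 4.0), "theta": (4.0, 8.0),
         "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}

def coherency(S):
    """Complex coherency C_ij(f) = S_ij(f) / sqrt(S_ii(f) S_jj(f)); S is (F, M, M)."""
    power = np.real(np.diagonal(S, axis1=1, axis2=2))        # (F, M) auto-spectra
    return S / np.sqrt(power[:, :, None] * power[:, None, :])

def phase_slope_index(S, freqs_hz, band):
    """PSI_ij = Im( sum_{f in band} conj(C_ij(f)) * C_ij(f + delta_f) )."""
    C = coherency(S)
    lo, hi = band
    idx = np.where((freqs_hz >= lo) & (freqs_hz < hi))[0]
    idx = idx[idx + 1 < len(freqs_hz)]                       # f + delta_f must exist
    return np.imag(np.sum(np.conj(C[idx]) * C[idx + 1], axis=0))

def band_psi(S, freqs_hz):
    """Stack the PSI matrices of the four bands into an (M, M, 4) array."""
    return np.stack([phase_slope_index(S, freqs_hz, b) for b in BANDS.values()],
                    axis=-1)
```

Because the cross-spectrum is Hermitian, the resulting matrix is antisymmetric, \({PSI}_{ij}=-{PSI}_{ji}\), with a zero diagonal. For 19 channels, `band_psi` returns the \(19\times 19\times 4\) connectivity array for one subject.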

## 2.5 Graph theoretical analysis

The two basic elements of a graph are its nodes and edges. To detect ADHD, we use three graph theory measures: clustering coefficient, nodal efficiency and degree centrality.

**Threshold method**

Because not every connection needs to be included in the graph analysis, a threshold method is applied to the weighted graph. The difficulty of this method lies in selecting the threshold: a threshold that is too high may leave too few connections to construct a brain network, whereas a threshold that is too low may yield connectivity measures that are not meaningful [30]. Empirically, a threshold of 0.2 produced the best results.

**Clustering coefficient**

The clustering coefficient assesses the segregation ability of a graph and is one of the most important measures in research on cognitive problems of the brain [31]. It is defined as follows [22]:

$${C}^{W}=\frac{1}{N}\sum _{i}\frac{2{{t}_{i}}^{w}}{{k}_{i}\left({k}_{i}-1\right)} \tag{7}$$

where \({{t}_{i}}^{w}\) is the weighted number of triangles around node \(i\) (a triangle is a subgraph with three nodes and three edges), \({k}_{i}\) is the degree of node \(i\), and \(N\) is the number of nodes (here \(N=19\)). \({{t}_{i}}^{w}\) and \({k}_{i}\) are given in Eqs. (8) and (9).

$${{t}_{i}}^{w}=\frac{1}{2}\sum _{j,h\in N}{\left({w}_{ij}{w}_{ih}{w}_{jh}\right)}^{\frac{1}{3}} \tag{8}$$

$${k}_{i}=\sum _{j\in N}{w}_{ij} \tag{9}$$

where \({w}_{ij}\) is the connection weight between node \(i\) and node \(j\). When an edge value in the graph is greater than the threshold, the connection is considered to exist and \({w}_{ij}\) equals the edge value; otherwise \({w}_{ij}=0\).
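The thresholding rule and Eqs. (7) and (8) can be sketched together. The example assumes a symmetric weight matrix with values in \([0,1]\) and no self-connections. One deliberate deviation is flagged: \({k}_{i}\) is taken here as the number of supra-threshold edges of node \(i\) (the binary degree), which keeps \({k}_{i}({k}_{i}-1)\) non-negative, rather than the weight sum written in Eq. (9). Nodes with fewer than two edges contribute zero. Function names are hypothetical.

```python
import numpy as np

def apply_threshold(W, threshold=0.2):
    """Keep weights above the threshold, set the rest (and the diagonal) to zero."""
    W = np.where(W > threshold, W, 0.0)
    np.fill_diagonal(W, 0.0)
    return W

def weighted_clustering(W):
    """Mean weighted clustering coefficient of a symmetric weight matrix W."""
    W13 = np.cbrt(W)
    # Eq. (8): t_i = 1/2 * sum_{j,h} (w_ij w_ih w_jh)^(1/3)
    t = 0.5 * np.einsum("ij,ih,jh->i", W13, W13, W13)
    k = np.count_nonzero(W, axis=1)          # assumption: binary degree
    denom = k * (k - 1)
    c = np.zeros(len(W))
    np.divide(2.0 * t, denom, out=c, where=denom > 0)
    return c.mean()                          # Eq. (7): average over nodes
```

For one frequency band, `weighted_clustering(apply_threshold(W))` yields a single scalar feature per subject.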

**Nodal efficiency**

Nodal efficiency of a graph measures the ability of each node to exchange information, and is defined as [22]:

$${E}_{nodal}\left(i\right)= \frac{1}{N-1} \sum _{j\ne i}\frac{1}{{l}_{ij}} \tag{10}$$

where \(N\) is the number of nodes in the graph, and \({l}_{ij}\) is the shortest path length between nodes \(i\) and \(j\), i.e. the minimum number of edges on a path connecting them.
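Eq. (10) can be sketched with an all-pairs shortest-path computation on the thresholded graph. The example below counts hops on the binary graph (an edge exists wherever the weight is non-zero), uses the Floyd-Warshall recursion, and treats unreachable pairs as contributing zero; the function name and these conventions are assumptions.

```python
import numpy as np

def nodal_efficiency(W):
    """Nodal efficiency of each node from hop-count shortest paths.

    W: (N, N) thresholded weight matrix; W[i, j] > 0 means an edge from i to j.
    """
    W = np.asarray(W, dtype=float)
    n = len(W)
    dist = np.where(W > 0, 1.0, np.inf)      # one hop per existing edge
    np.fill_diagonal(dist, 0.0)
    for k in range(n):                       # Floyd-Warshall update via node k
        dist = np.minimum(dist, dist[:, [k]] + dist[[k], :])
    np.fill_diagonal(dist, np.inf)           # exclude j == i from the sum
    return (1.0 / dist).sum(axis=1) / (n - 1)
```

The function returns one value per node, so each band contributes a 19-element feature vector per subject.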

**Degree Centrality**

Degree centrality measures the direct impact of a brain region on its adjacent brain regions [22]. It is defined as follows:

$${C}_{d}\left(i\right)= \sum _{j\in N}{w}_{ij} \tag{11}$$

where \({w}_{ij}\) is the normalized connection weight, with \(0\le {w}_{ij}\le 1\).
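Eq. (11) reduces to a row sum of the normalized weight matrix. The text does not state how the weights are normalized, so the sketch below assumes non-negative weights divided by the largest one; that choice and the function names are illustrative only.

```python
import numpy as np

def normalize_weights(W):
    """Scale non-negative weights to [0, 1] by the maximum (assumed normalization)."""
    W = np.asarray(W, dtype=float)
    peak = W.max()
    return W / peak if peak > 0 else W

def degree_centrality(W):
    """Eq. (11): C_d(i) = sum_j w_ij for each node i."""
    return np.asarray(W, dtype=float).sum(axis=1)
```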

Thus, each subject has four graphs, one for each EEG frequency band, and three graph theory measures are extracted from each graph to detect ADHD.