In recent years, numerous classical machine learning methods have been developed for many different applications in medicine, for instance disease classification and diagnosis [36], medical imaging [37], smart health records [38], personalized treatment [39], epidemic control [40], or artificial intelligence surgery [41], to name a few. Among these methods and applications, the use of deep learning algorithms for diagnosis and disease identification has become of key importance in healthcare and medical services. In this regard, artificial neural networks (ANN) employing RBF architectures have been tested on complex classification problems, showing good performance in nonlinear scenarios [23, 42]. In the present study, a novel RBF technique that improves accuracy by means of a fuzzy means algorithm was developed. The algorithm was applied directly to the pre-processed real EEG signals to classify two different clusters, namely schizophrenic patients and healthy controls. Figure 2 shows the architecture of the ANN, with the input layer, a single hidden layer, and the output layer, as mentioned in the introduction section.

The input layer corresponds to the vectors \({e}_{p}=[{e}_{p,1}, {e}_{p,2}, \dots ,{e}_{p,32}]\) recorded by the 32 electrodes for each patient p. The hidden layer is formed by N neurons, each with an associated radial basis function \(\mu \left(r\right)\) evaluated at the Euclidean distance (denoted r and shown in Eq. (1)) between the input vector and the center \({c}_{s}=[{c}_{s,1}, {c}_{s,2}, \dots ,{c}_{s,32}]\) of the sth RBF neuron (for \(s=1, 2,\dots , N\)), see Fig. 3:

\(r=r\left({e}_{p}\right)=‖{e}_{p}-{c}_{s}‖=\sqrt{\sum _{n=1}^{32}{({e}_{p,n}-{c}_{s,n})}^{2}}\) (1)

The RBF function \(\mu \left(r\right)\) can be of several types depending on the patterns to be classified. The most common choices are:

Gaussian function

\(\mu \left(r\right)={e}^{-{\left(\frac{r}{{\sigma }_{s}}\right)}^{2}}\)

Multiquadric function

\(\mu \left(r\right)=\sqrt{1+{\left(\frac{r}{{\sigma }_{s}}\right)}^{2}}\)

Inverse multiquadric function

\(\mu \left(r\right)=\frac{1}{\sqrt{1+{\left(\frac{r}{{\sigma }_{s}}\right)}^{2}}}\)

Polyharmonic spline function

\(\mu \left(r\right)={r}^{k}, k=1, 3, 5, \dots\)

\(\mu \left(r\right)={r}^{k}\text{ln}\left(r\right), k=2, 4, 6, \dots\)

where \({\sigma }_{s}\) is the width of the sth RBF neuron, shown in Fig. 2. Finally, the output of the network is calculated as a function of the RBF activations and the output weights \({w}_{s}\) associated with each neuron as follows:

\(O\left({e}_{p}\right)=\sum _{s=1}^{N}{w}_{s}\mu \left(‖{e}_{p}-{c}_{s}‖\right)\) (2)
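The computation of Eqs. (1) and (2) can be sketched as follows, using the Gaussian basis function. This is a minimal sketch, not the authors' implementation: the 32-feature input matches the 32 electrodes in the text, while the number of hidden nodes and all numeric values are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of the RBF forward pass of Eqs. (1)-(2) with a Gaussian
# basis function. The number of centers and all values are illustrative.

def rbf_forward(e_p, centers, widths, weights):
    """O(e_p) = sum_s w_s * exp(-(||e_p - c_s|| / sigma_s)^2)."""
    r = np.linalg.norm(centers - e_p, axis=1)   # Eq. (1): distance to each center
    mu = np.exp(-(r / widths) ** 2)             # Gaussian activations
    return float(weights @ mu)                  # Eq. (2): weighted sum

rng = np.random.default_rng(0)
e_p = rng.normal(size=32)            # one patient's 32-electrode vector
centers = rng.normal(size=(5, 32))   # N = 5 hidden-node centers (assumed)
widths = np.ones(5)                  # sigma_s for each node
weights = rng.normal(size=5)         # output weights w_s
output = rbf_forward(e_p, centers, widths, weights)
```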

## 3.1. Training of the proposed Neural Network

The training process employed for the proposed RBF neural network includes two separate phases:

1.- The hidden RBF layer parameters are obtained. To this aim, a fuzzy means algorithm [21–24] has been employed to initialize the parameter values \({c}_{s}, {\sigma }_{s}\) and to determine the network structure, i.e., the number of hidden nodes \(N\). Generally, \(N\) is selected by trial and error or by applying the k-means clustering algorithm [43–44]. However, the use of a fuzzy means system increases accuracy and is faster. The process establishes a fuzzy partition by defining several triangular fuzzy sets whose centers define a multidimensional grid for the input data. The nodes of the grid are then established as candidate centers for the hidden nodes, where the distance between any pair of center positions is always equal to or greater than the smallest edge of the grid. In addition, it is guaranteed that at least one hidden node is assigned to every input data point.

Specifically, each input variable is divided into \({a}_{s}\) triangular fuzzy sets named \({T}_{s}^{1}, {T}_{s}^{2},\dots , {T}_{s}^{{a}_{s}}\) with membership functions:

\({\mu }_{{T}_{s}^{l}}\left({e}_{p}\right)=\left\{\begin{array}{ll}1-\frac{\left|{e}_{p}-{t}_{s}^{l}\right|}{{d}_{s}^{l}} & \text{if } {e}_{p}\in \left[{t}_{s}^{l}-{d}_{s}^{l},{t}_{s}^{l}+{d}_{s}^{l}\right], l=1,\dots ,{a}_{s} \\ 0 & \text{otherwise} \end{array}\right.\) (3)

where \({t}_{s}^{l}\) is the central element with membership value equal to unity and \({d}_{s}^{l}\) is half of the respective width. It is worth mentioning that, for each input variable, the membership values sum to unity. The aforementioned fuzzy partition properties are represented for an input vector \({e}_{1}\) in Fig. 3.
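The membership function of Eq. (3) and the unit-sum property can be illustrated for a single input dimension; the partition below (three evenly spaced sets on [0, 1]) is an assumed example, not taken from the study.

```python
# Sketch of the triangular membership function of Eq. (3) for one input
# dimension. The partition (3 evenly spaced sets on [0, 1]) is illustrative.

def triangular_membership(e, t, d):
    """1 - |e - t|/d inside [t - d, t + d], 0 otherwise."""
    if t - d <= e <= t + d:
        return 1.0 - abs(e - t) / d
    return 0.0

centers = [0.0, 0.5, 1.0]   # t_s^l for a_s = 3 fuzzy sets
half_width = 0.5            # d_s^l, half of each set's width
memberships = [triangular_membership(0.3, t, half_width) for t in centers]
# For an evenly spaced partition, the memberships of any e in [0, 1] sum to one.
```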

Finally, the widths \({\sigma }_{s}\) of the RBF activation functions are calculated. For each node i, the width was estimated using the g-nearest-neighbour heuristic:

\({\sigma }_{i}={\left(\frac{1}{g}\sum _{j=1}^{g}{‖{c}_{i}-{c}_{j}‖}^{2}\right)}^{1/2}\) (4)

where \({c}_{1}, {c}_{2}, \dots , {c}_{g}\) are the g node centers closest to hidden node i. The value of g was chosen so that entering an input vector into the system activates a large number of nodes.
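Equation (4) amounts to a root-mean-square over pairwise center distances, as in this sketch; the example centers and the choice g = 2 are illustrative assumptions.

```python
import numpy as np

# Sketch of the g-nearest-neighbour width heuristic of Eq. (4): each width
# sigma_i is the RMS distance from center c_i to its g closest centers.

def rbf_widths(centers, g):
    diffs = centers[:, None, :] - centers[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)            # all pairwise distances
    widths = np.empty(len(centers))
    for i, row in enumerate(dists):
        nearest = np.sort(row)[1:g + 1]              # drop the zero self-distance
        widths[i] = np.sqrt(np.mean(nearest ** 2))   # Eq. (4)
    return widths

centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # illustrative
sigmas = rbf_widths(centers, g=2)
```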

2.- The weights \({w}_{s}\) are calculated by means of a linear regression based on Eq. (2). The solution of the linear regression implies N equations with N unknown weights, which can be expressed in matrix form as follows:

\(\left[\begin{array}{cccc}{\mu }_{11}& {\mu }_{12}& \cdots & {\mu }_{1N}\\ {\mu }_{21}& {\mu }_{22}& \cdots & {\mu }_{2N}\\ \vdots & \vdots & \ddots & \vdots \\ {\mu }_{N1}& {\mu }_{N2}& \cdots & {\mu }_{NN}\end{array}\right]\left[\begin{array}{c}{w}_{1}\\ {w}_{2}\\ \vdots \\ {w}_{N}\end{array}\right]=\left[\begin{array}{c}{O}_{1}\\ {O}_{2}\\ \vdots \\ {O}_{N}\end{array}\right]\) (5)

where \({\mu }_{ks}=\mu \left(‖{e}_{k}-{c}_{s}‖\right)\).

Moreover, a simpler solution corresponding to exact interpolation, obtained as \(w={\mu }^{-1}O\), can be applied.
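With the centers and widths fixed, the system of Eq. (5) is linear in the weights. The sketch below uses a least-squares solve, which reduces to the exact-interpolation solution when the activation matrix is square and invertible; Gaussian activations and all sizes and values are illustrative assumptions.

```python
import numpy as np

# Sketch of the second training phase: build the activation matrix
# mu[k, s] = mu(||e_k - c_s||) and solve the linear system of Eq. (5).

def solve_weights(inputs, targets, centers, widths):
    r = np.linalg.norm(inputs[:, None, :] - centers[None, :, :], axis=2)
    mu = np.exp(-(r / widths) ** 2)                    # Gaussian activation matrix
    w, *_ = np.linalg.lstsq(mu, targets, rcond=None)   # least-squares solve
    return w, mu

rng = np.random.default_rng(1)
inputs = rng.normal(size=(4, 3))   # K = 4 training input vectors (illustrative)
targets = rng.normal(size=4)       # known outputs O_k
centers = inputs.copy()            # N = K centers: exact interpolation case
weights, mu = solve_weights(inputs, targets, centers, np.ones(4))
# Here mu is square and invertible, so mu @ weights reproduces the targets.
```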

Lastly, in order to train the hidden layer of the neural network, a group of known training pairs of inputs \({e}_{k}\) and outputs \({O}_{k}\) (with \(k=1, 2,...,K\)) has been employed.

## 3.2. Validation

The validation of the proposed model was performed by means of a K-fold cross-validation process [45] to assess its predictive capability, see Fig. 4.

It is an iterative procedure that randomly divides the recorded input data from the electrodes into K groups or folds of approximately the same size. K−1 groups are used to train the model and the remaining one is used for validation. This process is repeated K times, using a different group for validation in each iteration. The process generates K estimates of the error, whose average is taken as the final estimate. In the present study, the recorded input dataset was divided into 70% for training and 30% for testing. To avoid overtraining, the cross-validation analysis was performed without sharing data across training and validation groups. A comparison with different classical machine learning algorithms, i.e., Support Vector Machine (SVM), Bayesian Linear Discriminant Analysis (BLDA), Gaussian Naive Bayes (GNB), K-Nearest Neighbour (KNN) and AdaBoost, was also included in the study to check the advantages of the proposed model. All the methods were implemented through the Matlab machine learning toolbox [46].
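The cross-validation loop described above can be sketched as follows; the classifier is abstracted behind a user-supplied scoring function, and all names are illustrative.

```python
import numpy as np

# Sketch of K-fold cross-validation: shuffle the data, split into K folds,
# train on K-1 folds and validate on the held-out one, then average the K
# error estimates. `fit_and_score` stands in for any of the classifiers.

def k_fold_error(data, labels, k, fit_and_score, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    folds = np.array_split(idx, k)            # K disjoint validation folds
    errors = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        errors.append(fit_and_score(data[train], labels[train],
                                    data[val], labels[val]))
    return float(np.mean(errors))             # average of the K error estimates
```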

As is well known, the algorithms need to be adjusted during the training process by means of different hyperparameters such as the number of splits, learners, neighbours, distance metric, distance weight, kernel, box constraint level, multiclass method, etc. The hyperparameters of each model were optimized with a Bayesian approach. In this regard, the Bayesian optimization generates a short sequence of simulated experiments with different combinations of the hyperparameters, keeping the values that present the best area under the curve (AUC) and balanced accuracy.
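Bayesian optimization proper requires a surrogate model of the objective; as a deliberately simplified stand-in, the sketch below shows only the outer search structure: sample hyperparameter combinations, score each (e.g., by balanced accuracy), and keep the best. All hyperparameter names and the scoring function are illustrative assumptions, not the study's configuration.

```python
import numpy as np

# Simplified random-search stand-in for the hyperparameter optimization
# loop: sample combinations from candidate grids, score each, keep the best.

def search_hyperparameters(candidates, score, n_trials=20, seed=0):
    rng = np.random.default_rng(seed)
    best_params, best_score = None, -np.inf
    for _ in range(n_trials):
        params = {name: rng.choice(values)
                  for name, values in candidates.items()}
        s = score(params)                     # e.g., balanced accuracy
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score
```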

Finally, the metrics used to measure performance are:

\(Recall \left(\%\right)=\frac{TP}{TP+FN}\times 100\) (6)

\(Specificity \left(\%\right)=\frac{TN}{TN+FP}\times 100\) (7)

\(Precision \left(\%\right)=\frac{TP}{TP+FP}\times 100\) (8)

\(Balanced\ accuracy \left(\%\right)=\frac{Recall+Specificity}{2}\) (9)

In these equations, *TP* indicates the number of true positives, *TN* the true negatives, *FN* the false negatives and *FP* the false positives.

In addition, the \({F}_{1}\) score and the Matthews correlation coefficient (*MCC*) were employed during the study. The \({F}_{1}\) score is defined as:

\({F}_{1}\ score \left(\%\right)=\frac{2\bullet Precision\bullet Recall}{Precision+Recall}\) (10)

and the *MCC* [47], which measures the overall model performance, is described as:

\(MCC \left(\%\right)=\frac{TP\bullet TN-FP\bullet FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}\times 100\) (11)
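Equations (6)–(11) can be computed directly from the confusion-matrix counts, as in this sketch; the example counts are illustrative.

```python
# Sketch of the performance metrics of Eqs. (6)-(11), computed from the
# confusion-matrix counts; all results are percentages.

def classification_metrics(tp, tn, fp, fn):
    recall = tp / (tp + fn) * 100                          # Eq. (6)
    specificity = tn / (tn + fp) * 100                     # Eq. (7)
    precision = tp / (tp + fp) * 100                       # Eq. (8)
    balanced_acc = (recall + specificity) / 2              # Eq. (9)
    f1 = 2 * precision * recall / (precision + recall)     # Eq. (10)
    mcc = (tp * tn - fp * fn) / (
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5 * 100  # Eq. (11)
    return recall, specificity, precision, balanced_acc, f1, mcc

metrics = classification_metrics(tp=40, tn=30, fp=10, fn=20)  # illustrative counts
```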

Lastly, two additional metrics assessing the overall model performance, namely Cohen's Kappa (*CK*) and the degenerated Youden's index (*DYI*) [47], have also been included in the study.