Accurate neural network classification model for schizophrenia disease based on electroencephalogram data

The electroencephalogram is a useful interface system that translates the electrical activity of the human brain into voltage signals. The recorded brain waves can therefore be employed to characterize, classify, or diagnose mental disorders. A novel neural network model to classify patients with schizophrenia based on electroencephalograms is presented. The proposed model decomposes the multichannel electroencephalogram records into a group of novel multivariate radial basis functions using a fuzzy means algorithm. The decomposition makes it possible to extract information from the different electroencephalogram channels and to distinguish between two classes, i.e., schizophrenic patients and healthy controls. Results show improved accuracy compared to classical algorithms reported in the literature, i.e., Support Vector Machine, Bayesian Linear Discriminant Analysis, Decision Tree, Gaussian Naive Bayes, Random Forest, K-Nearest Neighbour, Convolutional Neural Network, and Adaboost. The method presented in this paper achieves the highest balanced accuracy, recall, precision and F1 score values, close to 93% in all cases. The model may be integrated into real-time tools used during the diagnosis of schizophrenia.


Introduction
Schizophrenia is a disorder of crucial public health significance affecting more than 21 million people worldwide, being more common in men (12 million) than in women (9 million) [1]. Schizophrenia may cause a combination of serious disturbances in thinking and behavior, hallucinations, and delusions [2,3]. It usually involves lifelong treatment, where early detection can help to control symptoms and can improve the long-term prognosis [4,5]. The exact causes of schizophrenia are unknown, although current research suggests a combination of hereditary and environmental factors [6,7]. Genetic predisposition may be the primary cause, but external factors such as stressful life situations or substance abuse can act as triggers. Diagnosis of schizophrenia is generally based on a comprehensive evaluation of the person's illness history and a clinical interview carried out by psychiatrists. In this interview, signs and symptoms are carefully analyzed, and all available close sources are consulted: family, friends, neighbors, or colleagues, to name a few. Two manuals that classify psychiatric diseases are used for diagnosis, i.e., the Diagnostic and Statistical Manual of Mental Disorders (DSM) of the American Psychiatric Association, and the International Classification of Diseases (ICD) of the World Health Organization (WHO). The latest versions of these manuals are DSM-V and ICD-10, respectively.
Currently, there are no objective examinations that can confirm the disease. Nevertheless, the monitoring of neurophysiological activity through electroencephalograms (EEGs) provides decisive information that can lead to a possible diagnosis [8][9][10][11]. EEGs offer an effective means of evaluating brain function and remain an essential tool in neurology, particularly in serious neurological conditions. Besides, they can be employed with relatively low risk to patients due to their non-invasive nature [12][13][14][15]. Although EEG does not precisely determine the etiology of the brain dysfunction, the subsequent processing of the data recorded by the electrodes can be very helpful for diagnosis.
Recent research focused on machine learning has significantly helped to develop new classification techniques using EEG records [16][17][18][19]. In this way, different brain diseases, including schizophrenia, have been successfully distinguished from healthy controls by examining the neuronal activity of the brain. Generally, machine learning is classified into supervised learning, unsupervised learning, deep learning, and reinforcement learning [13,20], but in recent years the use of deep learning methods for classification has increased significantly due to their interesting properties. For instance, deep learning systems can create new features from datasets without human intervention, can work with unstructured data, and can learn complex features efficiently thanks to the use of multiple hidden layers.
More specifically, several algorithms and models have been developed for the classification of schizophrenia in recent years [21][22][23][24][25][26]. Among them, several machine and deep learning algorithms such as Decision Tree, Random Forest, Naïve Bayes, Support Vector Machine, k-Nearest Neighbor, Gradient Boosting and Convolutional Networks have shown encouraging results, with accuracy levels above 80-90% in many studies. These results indicate that computational algorithms can be efficiently employed in clinical environments during the diagnosis of schizophrenia, reducing subjectivity and costs. However, new methods and models, such as the one proposed in this work, are still needed to improve computational requirements, complexity, and classification properties.
In this study, a novel deep learning method based on radial basis functions (RBF) using a fuzzy means algorithm is presented. An RBF network is a non-linear artificial neural network that uses complex functions to predict the possible outputs from the input data [27][28][29][30]. The RBF network has three layers in total: the input layer (which transmits the input signals to the hidden neurons without performing any processing operation), the hidden layer (which performs non-linear and local transformations by means of the radial basis functions) and the output layer (which combines the activations of the hidden neurons to obtain the outputs). Thus, these types of networks build approximations that are linear combinations of multiple non-linear local functions. In addition, the training phase is extremely fast, the analysis of the hidden layer is simple, and the network is easy to configure. On the other hand, fuzzy clustering is a class of clustering algorithms in which each element can belong to more than one cluster [31][32][33]. This kind of clustering was developed to overcome the limitations of exclusive grouping (which considers that each element can be unambiguously grouped with the elements of a specific cluster). The similarity between new elements and clusters is obtained by an analytical function named membership function. Values close to one denote maximum similarity, whilst values close to zero indicate minimum similarity. Thus, the main aim of fuzzy clustering is to find the optimal membership function. More specifically, the fuzzy C-means algorithm is a fuzzy grouping method based on objective functions (normally least squares error functions) [34][35][36][37]. These algorithms define a grouping criterion based on objective functions that are iteratively minimized to obtain the optimal fuzzy clusters.
In this way, the use of RBF deep learning methods combined with fuzzy grouping has shown high accuracy in classifying patients with schizophrenia and healthy controls compared with other classical methods. As a result, the proposed algorithm improves on the results obtained to date and could be used in current psychiatry for real-time clinical diagnosis. The paper is organized as follows: Sect. 2 introduces the materials used in this study. Section 3 presents our proposed classification approach. The description of the experiments and the discussion of the results are given in Sects. 4 and 5, respectively. Finally, the conclusions of this paper are summarized in Sect. 6.
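As an illustration of the fuzzy C-means idea described above, the following minimal sketch alternates the two standard update rules (membership update and weighted center update) on one-dimensional toy data. The data, the initial centers, the fuzzifier m = 2 and all function names are assumptions made for the example; it is not the implementation used in this study.

```python
def fcm_memberships(points, centers, m=2.0):
    """Standard fuzzy C-means membership update for 1-D points."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [abs(x - c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point coincides with a center: full membership there.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            memberships.append([h / sum(hits) for h in hits])
            continue
        row = [1.0 / sum((dj / dk) ** exponent for dk in dists) for dj in dists]
        memberships.append(row)
    return memberships


def fcm_centers(points, memberships, m=2.0):
    """Centers are means of the points weighted by membership ** m."""
    centers = []
    for j in range(len(memberships[0])):
        weights = [u[j] ** m for u in memberships]
        centers.append(sum(w * x for w, x in zip(weights, points)) / sum(weights))
    return centers


# Toy data (assumed): two well-separated groups around 0.2 and 2.2.
points = [0.0, 0.2, 0.4, 2.0, 2.2, 2.4]
centers = [0.0, 2.4]  # assumed initial guess
for _ in range(20):
    u = fcm_memberships(points, centers)
    centers = fcm_centers(points, u)

print(centers)
print(u[0])
```

Each row of `u` sums to one, so a point close to one center receives a membership near one for that cluster and near zero for the other, which matches the interpretation of the membership function given in the text.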

Materials and equipment
This section describes the data used in this study. EEGs were recorded continuously through a 32-channel brain vision system employing sintered Ag/AgCl electrodes, see Fig. 1. The sampling rate was 500 Hz and the electrodes were placed following the International 10-20 System [38] with reference points Fpz/Afz/Fz/Cz/Pz/Qz. External interferences and artifacts present in the EEG signals due to the electrical distribution network, breathing, eye-blinking, body movements, or sweating were removed via filtering [39,40]. More specifically, a notch filter at 50 Hz and a low-pass filter with a 40 Hz cut-off frequency were applied.
The DSM-IV clinical interview from the Diagnostic and Statistical Manual of Mental Disorders was administered to all patients for schizophrenia diagnosis, after obtaining informed consent and completing baseline assessments. The interview was designed to be administered by healthcare personnel with experience in performing unstructured diagnostic evaluations. All patients and controls resided in Cuenca, Spain. In addition, all of them were enrolled in the Severe Mental Disorder Program of the Psychiatric Service of Virgen de la Luz Hospital, Cuenca, Spain. All the evaluations and results obtained were approved by the Clinical Research Ethics Committee of the Health Area. The study was conducted between May 2013 and April 2020. Three hundred and twelve subjects with schizophrenia and three hundred and twenty healthy controls were examined during the training of the proposed network in order to confirm the classification models proposed in this paper. Additionally, an external dataset composed of one hundred and twenty patients with schizophrenia and one hundred and five healthy subjects, provided by the hospital Klinikum Bremen-Ost (Germany), was also employed to validate the method. Inclusion criteria comprised symptoms lasting at least 6 months and age between 10 and 60 years. Exclusion criteria were medical instability and pregnancy or lactation.

Methodology
In recent years, numerous classic machine learning methods have been developed for many different applications in medicine, for instance disease classification and diagnosis [42], medical imaging [43], smart health records [44], personalized treatment [45], epidemic control [46], or artificial intelligence surgery [47], to name a few. Among all these methods and applications, the use of deep learning algorithms for diagnosis and disease identification has become of key importance in healthcare and medical services. In this regard, artificial neural networks (ANN) employing RBF architectures have been tested to solve complex classification problems, showing good performance in nonlinear scenarios [29,48]. In the present study, a novel RBF technique that uses a fuzzy C-means algorithm to improve accuracy was developed. The algorithm was directly applied to the pre-processed real EEG signals to classify two different clusters, namely schizophrenic patients and healthy controls. Figure 2 shows the architecture of the ANN with the input layer, just one hidden layer, and the output layer, as mentioned in the introduction.
The input layer corresponds to the vectors $e_p = [e_{p,1}, e_{p,2}, \ldots, e_{p,32}]$ recorded by the 32 electrodes for each patient $p$. The hidden layer is formed by $N$ neurons, each with an associated radial basis function $\phi(r)$ evaluated on the Euclidean distance $r$ (Eq. 1) between the input vector and the center $c_s = [c_{s,1}, c_{s,2}, \ldots, c_{s,32}]$ of the $s$-th RBF neuron (for $s = 1, 2, \ldots, N$), see Fig. 3:
$$r = \lVert e_p - c_s \rVert = \sqrt{\sum_{i=1}^{32} \left(e_{p,i} - c_{s,i}\right)^2} \quad (1)$$
The RBF function $\phi(r)$ can be of several types depending on the patterns to be classified. The most common choices include the poly-harmonic spline function:
$$\phi(r) = r^k, \quad k = 1, 3, 5, \ldots$$
$$\phi(r) = r^k \ln(r), \quad k = 2, 4, 6, \ldots$$ [41]
For the choices that depend on a width parameter, $\sigma_s$ is the width of the $s$-th node of the RBF neuron, shown in Fig. 2. Finally, the output of the network can be calculated as a function of the RBF functions and the output weights $w_s$ associated to each neuron as follows:
$$O_p = \sum_{s=1}^{N} w_s \, \phi\left(\lVert e_p - c_s \rVert\right) \quad (2)$$
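The forward pass described above (Euclidean distance to each center, radial basis function, weighted sum of the hidden activations) can be sketched as follows. The Gaussian form of the basis function, the two-dimensional toy centers, the widths and the weights are assumptions chosen only for illustration; the function names are hypothetical.

```python
import math


def euclidean_distance(e, c):
    """Distance r between an input vector e and a node center c (Eq. 1)."""
    return math.sqrt(sum((ei - ci) ** 2 for ei, ci in zip(e, c)))


def gaussian_rbf(r, sigma):
    """Assumed Gaussian basis function with width sigma."""
    return math.exp(-(r ** 2) / (2.0 * sigma ** 2))


def rbf_output(e, centers, sigmas, weights):
    """Network output: weighted sum of the hidden-node activations."""
    return sum(
        w * gaussian_rbf(euclidean_distance(e, c), s)
        for c, s, w in zip(centers, sigmas, weights)
    )


# Toy network (assumed values): two hidden nodes in a 2-D input space.
centers = [[0.0, 0.0], [1.0, 1.0]]
sigmas = [0.5, 0.5]
weights = [1.0, -1.0]

out_at_first = rbf_output([0.0, 0.0], centers, sigmas, weights)
out_at_second = rbf_output([1.0, 1.0], centers, sigmas, weights)
print(out_at_first, out_at_second)
```

With opposite weights, an input located at the first center yields a positive output and an input at the second center yields the mirrored negative output, which shows how the sign of the output could separate two classes.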

Algorithm of the proposed neural network
The algorithm employed for the proposed RBF neural network includes two separate steps that use a collection of training inputs $e_p$ and outputs $O_p$ (with $p = 1, 2, \ldots, K$):
1. The hidden RBF layer parameters are obtained. To this aim, a fuzzy means algorithm [31][32][33] has been employed to initialize the parameter values $c_s$, $\sigma_s$ and to determine the network structure, such as the number of hidden nodes $N$. Generally, $N$ is selected by a trial-and-error method or by applying the k-means clustering algorithm [49,50]. However, the use of a fuzzy means system increases accuracy and is faster.
The proposed process establishes a fuzzy partition (FP) defining several triangular fuzzy sets whose centers define a multidimensional grid for the input data. Specifically, each input variable $s$ is divided into $a_s$ triangular fuzzy sets named $T_s^1, T_s^2, \ldots, T_s^{a_s}$ with membership functions:
$$\mu_{T_s^l}(e_s) = \begin{cases} 1 - \dfrac{\lvert e_s - t_s^l \rvert}{d_s^l}, & e_s \in \left[t_s^l - d_s^l,\; t_s^l + d_s^l\right] \\ 0, & \text{otherwise} \end{cases}$$
where $t_s^l$ is the central element, with membership value equal to unity, and $d_s^l$ is half of the respective width. It is worth mentioning that, for each input variable, the membership values sum to unity. These fuzzy partition properties are represented for an input vector $e_1$ in Fig. 3.
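A minimal sketch of such a triangular fuzzy partition for a single input variable is given below. It assumes evenly spaced centers whose half-width equals the spacing between them, which is one simple way to make the memberships sum to unity; the range, the number of sets and the function names are hypothetical.

```python
def triangular_membership(x, center, half_width):
    """Membership is 1 at the center and falls linearly to 0 at +/- half_width."""
    distance = abs(x - center)
    if distance >= half_width:
        return 0.0
    return 1.0 - distance / half_width


def uniform_partition(low, high, n_sets):
    """Evenly spaced centers; the half-width equals the spacing (assumption)."""
    spacing = (high - low) / (n_sets - 1)
    centers = [low + l * spacing for l in range(n_sets)]
    return centers, spacing


# Assumed example: five fuzzy sets covering the interval [-1, 1].
centers, half_width = uniform_partition(-1.0, 1.0, 5)
x = 0.3
memberships = [triangular_membership(x, t, half_width) for t in centers]

# Index of the fuzzy set with the best (highest) membership for x.
best_set = max(range(len(centers)), key=lambda l: memberships[l])
print(memberships, best_set)
```

Here the value 0.3 belongs partly to the set centered at 0 and partly to the set centered at 0.5, and the latter is selected as the best fuzzy set for that variable.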
At this point, the following procedure is carried out in order to find the best fuzzy subspace:
I. For a given input $e_p$, the optimal fuzzy subspace is generated in two phases:
i. Phase 1: The fuzzy set that gives the best membership value for $e_j$, with $j = 1, 2, \ldots, N$, is obtained.
ii. Phase 2: The fuzzy subspace $T$ is generated from the optimal fuzzy sets selected in phase 1.
II. The center $c_s$ and the width $\sigma_s$ values of the hidden layer are estimated in five phases, based on the results calculated in step I:
i. Phase 1: An auxiliary rule number $B$ is set to 0.
ii. Phase 2: Step I is executed to construct the fuzzy subspace $T^1 = \{t^1, d^1\}$. Then, the first run of the algorithm is carried out for the input data $e(1)$ and $B$ is set to 1.
iii. Phase 3: For the input data $e(k)$, it is presumed that the lowest distance belongs to the fuzzy subspace $T^{B_0} = \{t^{S_0}, d^{S_0}\}$, and a rule on this distance is checked: if it holds, phase 4 is skipped; if not, phase 4 is carried out.
iv. Phase 4: $B$ is changed to $B = B + 1$.
v. Phase 5: If $k = K$, the algorithm makes the last calculations and ends. Otherwise, the next input data is included, and the process goes back to phase 3. In this final step, the widths $\sigma_s$ of the RBF activation functions are calculated. For each node $i$, the width was estimated using the $g$-nearest-neighbour heuristic, computed from the distances between the center of node $i$ and $c_1, c_2, \ldots, c_g$, the node centers closest to the hidden node $i$. The $g$ value was chosen so that entering an input vector into the system activates a large number of nodes.
Then, the nodes of the grid are established as node centers for the hidden layer, where the distance between any pair of center positions is always equal to or greater than the smallest edge in the grid. In addition, it is guaranteed that at least one hidden node is assigned to any input data.
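2. The weights $w_s$ are calculated by means of a linear regression based on Eq. (2). The linear regression leads to $N$ equations with $N$ unknown weights, which can be expressed in matrix form as follows:
$$\Phi \, w = O$$
where $\Phi$ is the matrix of hidden-node activations with entries $\Phi_{ps} = \phi\left(\lVert e_p - c_s \rVert\right)$, $w = [w_1, w_2, \ldots, w_N]^T$ and $O$ is the vector of outputs. Moreover, a simpler solution corresponding to exact interpolation, obtained as $w = \Phi^{-1} O$, can be applied.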
Lastly, in order to train the hidden layer of the neural network, a group of known training pairs of inputs e p and outputs O p (with p = 1, 2, ..., K) has been employed.
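The last two calculations of the algorithm, the node widths and the output weights, can be sketched together on a toy problem. Several assumptions are made here: the width is taken as the root mean square distance to the g nearest centers (one common form of the nearest-neighbour heuristic), the basis function is Gaussian, the centers coincide with the training inputs so that exact interpolation applies, and the small Gaussian-elimination solver and all names are hypothetical helpers.

```python
import math


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbour_widths(centers, g):
    """Assumed heuristic: RMS distance from each center to its g nearest centers."""
    widths = []
    for i, c_i in enumerate(centers):
        dists = sorted(
            euclidean_distance(c_i, c_j) for j, c_j in enumerate(centers) if j != i
        )
        nearest = dists[:g]
        widths.append(math.sqrt(sum(d ** 2 for d in nearest) / len(nearest)))
    return widths


def design_matrix(inputs, centers, widths):
    """Activation of every hidden node (columns) for every input (rows)."""
    return [
        [math.exp(-euclidean_distance(e, c) ** 2 / (2.0 * s ** 2))
         for c, s in zip(centers, widths)]
        for e in inputs
    ]


def solve_linear_system(matrix, rhs):
    """Gaussian elimination with partial pivoting for a square system."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


# Toy data (assumed): three 1-D training inputs used directly as centers.
inputs = [[0.0], [1.0], [2.0]]
targets = [0.0, 1.0, 0.0]
centers = inputs
widths = nearest_neighbour_widths(centers, g=1)
phi = design_matrix(inputs, centers, widths)
weights = solve_linear_system(phi, targets)
predictions = [sum(w * a for w, a in zip(weights, row)) for row in phi]
print(widths, weights, predictions)
```

Because there are as many centers as training inputs, the system is square and the network reproduces the training targets exactly; with fewer centers than inputs, a least-squares solution would be used instead.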

Validation
The validation of the proposed model was performed by means of a K-fold cross-validation process [51] to assess its predictive capability. In the present study, the recorded input dataset was divided into 70% for training and 30% for testing. To avoid overtraining, the cross-validation analysis was performed without sharing data across training and validation groups. Additionally, an external dataset not used during the training phase of the algorithm was used to validate the method, see Fig. 4.
A comparison with different classical machine learning algorithms, i.e., Support Vector Machine (SVM), Bayesian Linear Discriminant Analysis (BLDA), Decision Tree (DT), Gaussian Naive Bayes (GNB), Random Forest (RF), K-Nearest Neighbour (KNN), Convolutional Neural Network (CNN), and Adaboost, was also included in the study to check the advantages of the proposed model. All the methods were implemented through the machine learning Matlab toolbox [52].
As is well known, the algorithms need to be adjusted during the training process by means of different hyperparameters such as kernel functions, number of splits, iteration limits, tolerances, number of instances, number of learners, number of neighbours, number of neurons, etc. The hyperparameters of each model were optimized with a Bayesian approach. The Bayesian optimization generates a short sequence of simulated experiments with different combinations of the hyperparameters, keeping the values that present the best area under the curve (AUC) and balanced accuracy. Table 1 shows the main hyperparameters of the machine learning algorithms evaluated in the study.
For the SVM algorithm, a Gaussian kernel function is employed with the following parameters: C = 1.0, sigma = 0.5, numerical tolerance = 0.001, and iteration limit = 100. A Gaussian kernel is also employed in the BLDA method. In the case of DT, the minimum number of instances on the leaves and internal nodes are 4 and 6, respectively. The maximum depth is 100, and the method stops when the nodes reach 95%. The GNB algorithm sets the parameter usekernel to False in order to assume a Gaussian distribution, the Laplace correction factor (fL) is 0, and the parameter adjust was set to 0. The Random Forest algorithm has 20 trees, where the maximum number of features and the maximum tree depth are unlimited, and node splitting stops at a maximum of five instances. For the k-NN algorithm, the distance metric is Euclidean and 20 neighbors are used. Concerning the CNN network, it employs a uniform weight with learning rate and network section depth equal to 0.1 and 3, respectively. The Adaboost procedure makes use of a tree base estimator to train the model, employing a maximum of 20 splits, a learning rate of 0.1 and 50 learners. Finally, the RBF method proposed in this paper is configured using 3 fully connected layers, a maximum of 60 centers, and a maximum of 100 neurons, with an iteration limit set to 500.
An ablation study has been performed to observe the model performance under different parameter configurations and to optimize the RBF model. The number of neurons, the number of centers, and the number of hidden layers have been considered in the analysis, see Table 2. Thus, after completing all the study cases, the best performing configuration of our proposed model can be identified as the one with the highest accuracy and the lowest complexity [53]. In this case, the values of 100 neurons, 50 centers, and 1 hidden layer provide the best network performance.
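One way to keep training and testing data separate is to split at the subject level, so that no subject contributes recordings to both groups. The sketch below illustrates a 70/30 split followed by a K-fold partition of the training subjects. It uses the 632 training-cohort subjects reported in the text (312 patients and 320 controls); the subject identifiers, the random seed, K = 5 and the function names are assumptions for illustration only.

```python
import random


def split_subjects(subject_ids, train_fraction=0.7, seed=0):
    """Shuffle unique subject identifiers and split them into train/test groups."""
    ids = sorted(set(subject_ids))
    random.Random(seed).shuffle(ids)
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


def k_folds(ids, k):
    """Partition identifiers into k disjoint folds of near-equal size."""
    return [ids[i::k] for i in range(k)]


# Hypothetical identifiers for 312 patients and 320 controls.
subjects = [f"S{i:03d}" for i in range(312 + 320)]
train_ids, test_ids = split_subjects(subjects)
folds = k_folds(train_ids, k=5)  # K = 5 is an assumed value

print(len(train_ids), len(test_ids), [len(f) for f in folds])
```

Each fold serves once as the validation group while the remaining folds are used for training, and the held-out 30% is only used for the final test.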
Furthermore, a deeper description of the RBF performance is presented in Table 3. More specifically, the mean number N of adaptation cycles required to train several RBFs with c radial basis functions and the mean percentage error produced in the training set E tr and test set E ts are presented. The number in parentheses represents the standard deviation.
As it can be observed, if the number of radial functions increases the number of adaptation cycles also increases, obtaining different mean percentage errors for the training and test datasets. These errors generally decrease if the number of radial functions is increased until the optimum value is reached. If the number of radial functions is still augmented (i.e., value of 60), the neuronal network is overtrained and the error rises as it loses precision.

Results
The results obtained for schizophrenia classification employing the proposed method and different classical algorithms are presented below. The experiments were carried out on an Intel Xeon dual-core computer with 32 GB RAM. The machine learning toolbox included in MATLAB was used for preprocessing the EEG data and developing the models. The metrics obtained are given in Table 4.
The proposed method achieved the highest metrics. After the proposed RBF method, the CNN, Adaboost, KNN, RF and DT algorithms had the next best metric values. Their balanced accuracy, recall, precision and F1 score were in all cases between 3% and 8% below those of the proposed method based on radial basis functions. SVM, BLDA and GNB presented lower values in all the parameters evaluated. On the other hand, the same metrics, shown in Table 5, have been obtained for the external dataset provided by the hospital Klinikum Bremen-Ost (Germany) to validate the proposed method. As can be observed, the same trends are obtained for all methods and all figures of merit, thus confirming the improved accuracy of the proposed RBF system.
Additionally, the results for the AUC, MCC, DYI and Kappa index are given in Table 6. Again, the best performance is obtained for the model presented in this study with parameter values AUC and DYI index close to 93% and MCC and Kappa close to 83%. The models, CNN, Adaboost, RF, and KNN, behaved with lower performance values and DT, SVM, BLDA, GNB exhibited considerably lower classification capability.
Furthermore, all indexes have been calculated for the validation dataset provided by the hospital of Klinikum Bremen-Ost (Germany), see Table 7, showing similar results to those obtained in Table 6. Therefore, it can be concluded that the RBF model was significantly more accurate than classical systems employed currently in machine learning classification approaches.
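For reference, most of the metrics and indexes reported in the tables can be derived from a binary confusion matrix using their standard definitions, as sketched below. The counts are invented for illustration and are not results of this study; the DYI index is left out because its exact definition is not given in the text, and AUC is omitted because it requires the classifier scores rather than the confusion matrix.

```python
import math


def classification_metrics(tp, fn, fp, tn):
    """Standard binary metrics computed from confusion-matrix counts."""
    total = tp + fn + fp + tn
    recall = tp / (tp + fn)              # sensitivity
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    balanced_accuracy = (recall + specificity) / 2
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    observed = (tp + tn) / total
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return {
        "balanced_accuracy": balanced_accuracy,
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "mcc": mcc,
        "kappa": kappa,
    }


# Hypothetical counts: 100 patients and 100 controls.
metrics = classification_metrics(tp=90, fn=10, fp=8, tn=92)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Note that MCC and Kappa correct for agreement expected by chance, which is why they take lower values than balanced accuracy for the same confusion matrix.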
For clarity, Fig. 5 displays the values of the metrics and indexes as radar charts, divided into three graphs corresponding to the training, test and external datasets. As can be observed, among all the algorithms analyzed the RBF model has the best shape, close to the maximum values in all cases. The rest of the methods showed lower accuracy, in the following order (from highest to lowest accuracy): CNN, Adaboost, RF, KNN, SVM, DT, BLDA, and GNB.
Additionally, the overall diagnostic accuracy of the method has been checked by means of the receiver operating characteristic (ROC), which represents the sensitivity versus (1-specificity). The ROC is a probability curve that is closely related to the area under it (AUC), so that values close to 1 denote a perfect predictive capability of the method analyzed. In this regard, the higher the AUC, the better the prediction accomplished by the model. An AUC value of 0 indicates a totally inaccurate prediction, and an AUC value of 1 reflects a completely accurate test. Likewise, an AUC value of 0.5 indicates that the method is not able to discriminate between schizophrenic patients and healthy controls, and the ROC curve falls on the diagonal. AUC values between 0.7 and 0.8 are considered acceptable, values between 0.8 and 0.9 are considered excellent, and values beyond 0.9 are considered outstanding. See Fig. 6 for the ROC curves obtained for the data presented in Table 6. As expected, the proposed RBF system reaches the best prediction accuracy for schizophrenia.
Finally, Big-O notation (used in computer science to describe the performance or complexity of an algorithm as a function of the input data) has been applied to the proposed and the classic machine learning methods studied. Big-O notation specifically describes the worst-case scenario, and it can be used to show the execution time required or the space used in computer memory or on disk [54,55]. Table 8 shows the complexity in seconds for the systems studied, where N is the number of samples used in the input vector. From the data obtained in the analysis, it can be observed that the RBF method presented in this paper has the lowest complexity (it is important to note the simplicity of the proposed method, with just one hidden layer). The RBF algorithm possesses a logarithmic growth O(log(N)) (as in the case of the DT, RF, and Adaboost algorithms).
In contrast, the SVM system takes the longest processing time, of the order of O(N^2), for high

Discussion
In this analysis, a novel method based on neural networks for schizophrenia classification has been presented. More specifically, a method employing RBF functions combined with fuzzy C-means clusters was developed. It showed high accuracy in the diagnostic prediction, improving the classification capability compared to other classical algorithms. In the analysis, the configuration of the RBF system was first determined using the machine/deep learning toolbox of MATLAB following the expressions described in Sect. 3.1. To this aim, the method was trained with three hundred and twelve patients affected by schizophrenia and three hundred and twenty controls. An external dataset with one hundred and twenty patients with schizophrenia and one hundred and five healthy subjects was also used to validate the proposed method. Furthermore, similar models based on machine learning algorithms generally employed in disease detection, namely SVM, BLDA, DT, GNB, RF, KNN, CNN, and Adaboost, were generated. The proposed method was tested employing eight factors (balanced accuracy, recall, precision, F1 score, AUC, MCC, DYI and Kappa index) and the test data to check the performance during the classification of patients diagnosed with schizophrenia and healthy individuals. The best metrics were obtained for the proposed model (balanced accuracy = 93.40%, recall = 93.49%, precision = 93.30%, F1 score = 92.73%, AUC = 93%, MCC = 83.13%, DYI = 93.40%, and Kappa = 83.02%). The other classic methods mentioned previously always showed lower values, i.e., between 3 and 18 percentage points lower, for the metrics analyzed. Additionally, ROC results for the methods included in the analysis indicated that the best prediction algorithm was the proposed RBF system. Therefore, the outcomes achieved indicate that the proposed classification algorithm improves on the existing classical classifiers.
Finally, Table 9 shows a comparison of the proposed RBF method with other recent similar studies that classify schizophrenia through the most widely used machine and deep learning algorithms available in the literature. As can be observed, the network proposed in this paper possesses the highest accuracy. Across similar recent studies (between 2018 and 2022) applying different techniques (Decision Tree, Random Forest, Naïve Bayes, Support Vector Machine, k-Nearest Neighbor, or Gradient Boosting), the classification accuracy obtained ranges from 58.2% (obtained with an SVM technique) to 90.69% (obtained with a Naïve Bayes algorithm). In the rest of the cases, random forest and neural networks also provide good performance, with values of 85.1% and 84.8%, respectively, but below the results obtained with the RBF network.
In summary, the enhanced classification properties of RBF algorithms combined with fuzzy C-means can provide important advantages for classification. First, RBF systems have good generalization ability, stability, and robust tolerance to noise, and they simplify the configuration of the network as they only have one hidden layer. This implies very fast training and rapid classification. Furthermore, the algorithm integrates a fuzzy initialization of the network that improves accuracy and capabilities, and it does not require a fixed number of clusters prior to the training stages.

Conclusions
In conclusion, in this study an RBF model initialized with a fuzzy C-means algorithm has been developed. This analysis proposes the combination of both methods to improve the classification properties when discriminating schizophrenic patients from healthy individuals. Moreover, other recent machine learning methods, i.e., Support Vector Machine, Bayesian Linear Discriminant Analysis, Decision Tree, Gaussian Naive Bayes, Random Forest, K-Nearest Neighbour, Convolutional Neural Network, and Adaboost, have also been included in the study to compare the performance of the model presented in this paper. The results obtained show that RBF algorithms combined with fuzzy clusters provide the best accuracy in classifying patients affected by schizophrenia and healthy controls. In particular, balanced accuracy, recall, precision, F1 score, AUC and DYI values around 93%, and MCC and Kappa indexes around 83%, were achieved. The proposed RBF classifier presented important advantages compared to other classic methods, such as simplicity, good generalization ability and robust tolerance to noise. These results indicate that the application of RBF artificial neural network techniques to data acquired by means of electroencephalograms can potentially help during the automatic classification of patients in medical environments.