Classification of vulnerability levels using multivariate biomarkers in schizophrenia: a machine-learning approach

Background Schizophrenia is a heterogeneous neurodevelopmental disease involving cognitive and motor impairments. Motor dysfunctions, such as abnormal eye movements or neurological soft signs (NSS), have been proposed as endophenotypic markers. Methods A machine-learning method was applied to oculomotor performance, measured with comprehensive testing (prosaccades, antisaccades, memory-guided saccades and smooth pursuit), and to NSS assessment in order to discriminate patients with schizophrenia (SZ), full siblings of patients (FS) and healthy volunteers (C). Results The most reliable classification was between C and SZ, with error rates of only 15% and 12% for validation and test, respectively, whereas the SZ vs. FS classification yielded the highest error rates (32% in both validation and test). Interestingly, NSS were selected as the best predictor, together with a combination of measures, for two classifications: C vs. SZ and SZ vs. FS. In addition, memory-guided saccades were consistently selected among the best two multimodal features for the classifications involving the control group (C vs. SZ and C vs. FS). Conclusions Taken together, these results emphasize the importance of neurological soft signs and sensitive oculomotor parameters, especially memory-guided saccades. This classification provides promising avenues for improving early detection of, and early intervention in, psychosis.

Recently, our group (Caldani et al., 2017a) explored Memory Guided Saccades (MGS) and antisaccades (AS) in SZ, FS and controls (C) and we found a higher error rate in MGS in SZ and FS compared to C, while the error rate in AS was significantly higher only in SZ compared to C. Based on these findings, we suggested that MGS could be more accurate than AS to detect deficient inhibitory processes. Concerning prosaccade tasks (PS), normal performance was reported in SZ (Gooding et al., 2008), while a deficit in accuracy was found in unmedicated SZ (Crawford et al., 1995); a reduced latency was also observed in naïve patients (Krebs et al., 2010).
Regarding SPEM, poor gain as well as an elevated rate of catch-up saccades and intrusive saccades were reported in SZ with a large effect size (O'Driscoll et al., 2008; Franco et al., 2014). Finally, studies using fixation tasks found inconsistent results, with preserved performance in SZ in some studies (Gooding et al., 2000; Hutton et al., 2002), and alterations in others (Curtis et al., 2001; Raemaekers et al., 2002). Recently, our group found more intrusive saccades in SPEM and fixation tasks with distractors in SZ compared to C and FS, and more abnormalities in SZ with high NSS compared to SZ with low NSS (Caldani et al., 2017b). Using multivariate modelling techniques with eye-movement tests, Benson et al. (2012) found that patients with SZ differed from C in SPEM, fixation stability and free viewing tasks, with free viewing being the best single discriminator.
To our knowledge, no study has explored oculomotor markers using a machine-learning approach with the aim of discriminating different profiles within the schizophrenia spectrum. The support vector machine (SVM) modelling strategy has several advantages over other computational methods: it can be easily applied to multi-modal data analysis, and it is not constrained by a priori assumptions or abstractions on the data. Rather, machine learning is about the construction and study of systems that can learn from data (Dreyfus, 2005). Machine-learning models such as neural networks or SVM can be used to design data-driven models. The model is built by matching input vectors (in the present case, multimodal features extracted from eye pursuit or saccades) to expected output vectors (in our case, the group to which the subject belongs). Once the model has been built, it is tested on a new, independent data set to estimate its validity.
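The build-then-confront procedure described above relies on keeping some data aside. A minimal sketch of such a partition is given below; the function name `three_way_split`, the 60/20/20 proportions and the fixed seed are illustrative assumptions, not the partition actually used in this study.

```python
import random


def three_way_split(samples, valid_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle samples and split them into train, validation and test subsets.

    The fractions and the seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_valid = int(len(shuffled) * valid_frac)
    test = shuffled[:n_test]                      # held out until the very end
    valid = shuffled[n_test:n_test + n_valid]     # used to compare candidate models
    train = shuffled[n_test + n_valid:]           # used to fit each candidate model
    return train, valid, test


train, valid, test = three_way_split(range(10))
```

The test subset is never consulted while models are compared, so the error measured on it estimates performance on unseen subjects.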
By assessing different eye movement paradigms such as MGS, AS, SPEM, prosaccade and fixation tasks, and by adding NSS characterization, the present study aimed to integrate the data in a modelling strategy by means of these classifying algorithms, in three groups of subjects belonging to the schizophrenia spectrum, namely SZ, FS and healthy controls. Our hypothesis is that this kind of multivariate analysis will identify eye-movement biomarkers with good discriminative power, and that NSS will be a major discriminator, in line with our previous studies (Caldani et al., 2017a,b; Caldani, 2017).

The Diagnostic Interview for Genetic Studies (Nurnberger et al., 1994) was used to ascertain the diagnosis of SZ and to exclude any DSM-IV Axis I diagnosis in siblings and controls. Patients with SZ were mainly under atypical antipsychotics, and had been stable for more than three months. For details see Table 1.

Subjects
All participants were examined using the Neurological Soft Signs Examination (Krebs et al., 2000), which encompasses five dimensions of NSS (23 items), as well as an assessment of extra-pyramidal symptoms (Simpson and Angus, 1970), the abnormal involuntary movement scale (Guy, 1976) and lateralization (adapted from the Edinburgh Inventory). Lastly, patients were clinically assessed with the Brief Psychiatric Rating Scale (BPRS).
For all participants in the present study the exclusion criteria were the following: history of neurological/cerebral/ophthalmological disorder, history of substance dependence during the last year and/or for a period of more than five years and recent cannabis abuse, intellectual deficiency, Simpson-Angus score > 3, abnormal involuntary movement scale score > 3 (Le Seac'h et al., 2012).
The principles of the Declaration of Helsinki were followed and the protocol was approved by our Institutional Human Experimentation Committee (N°2010-A00149-30). All subjects gave their written informed consent before their participation and received 50 euros.

Oculomotor paradigms
Stimuli were presented on a 22-inch PC screen. The stimulus was a white filled square subtending a visual angle of 0.5°. Eye movements were recorded using the Mobile EBT Tracker (SuriCog), a CE-marked medical eye-tracking device. The recording frequency was set to 300 Hz. The precision of this system was 0.25°. Four paradigms were used: prosaccades, antisaccades (AS), memory-guided saccades (MGS) and SPEM (for details see Caldani et al., 2017a,b).

Procedures
Calibration factors for each eye were determined from the eye positions during the calibration procedure (Bucci and Seassau, 2013). For prosaccades we calculated the gain (ratio of eye amplitude to target amplitude) as well as the number of anticipatory saccades (latency < 80 ms) and of express saccades (latency between 80 and 130 ms). For AS we also calculated the error rate, that is, the number of saccades directed toward the stimulus. For MGS, we calculated the error rate (the number of erroneous saccades made before the extinction of the fixation point). The latency of erroneous saccades was classified according to four different time windows corresponding to the precise time the subject initiated the saccade (1st: 80-320 ms; 2nd: 350-500 ms; 3rd: 500-750 ms; 4th: > 750 ms). Each window was established in reference to the histograms of these erroneous MGS latencies, and for each (classification) window these classes were compared using a two-sample Kolmogorov-Smirnov test.
A low p-value indicates that two histograms effectively correspond to different distributions.
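For illustration, the two-sample Kolmogorov-Smirnov comparison can be sketched as follows. This is a minimal standard-library version using the usual asymptotic approximation for the p-value; the function names `ks_statistic` and `ks_pvalue` are hypothetical, and the software actually used for the study is not specified here.

```python
import math


def ks_statistic(a, b):
    """Largest absolute gap between the empirical CDFs of two samples."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value <= x (handles ties).
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic p-value for statistic d with sample sizes n and m."""
    ne = n * m / (n + m)  # effective sample size
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        return 1.0  # the alternating series converges too slowly here; p is ~1
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


# Made-up latencies (ms) for two windows, for demonstration only.
early = [90 + 10 * k for k in range(20)]
late = [760 + 10 * k for k in range(20)]
d = ks_statistic(early, late)
p = ks_pvalue(d, len(early), len(late))
```

With fully separated samples such as these, the statistic reaches its maximum of 1 and the p-value is very small, in line with the interpretation given above.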
For SPEM, we measured the number of saccades with amplitude ≥ 2° and the corresponding gain (which corresponds to the ratio between eye velocity and the target velocity).
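The measures defined in this section can be sketched in a few lines. The function names, the "regular" label and the handling of boundary values (for example, whether a latency of exactly 80 ms or 500 ms falls in the lower or upper class) are assumptions made for illustration; latencies falling outside the stated windows (such as 320-350 ms) are left unclassified rather than guessed.

```python
def gain(eye, target):
    """Ratio of eye amplitude (or velocity) to target amplitude (or velocity)."""
    if target == 0:
        raise ValueError("target amplitude or velocity must be non-zero")
    return eye / target


def latency_category(latency_ms):
    """Classify a prosaccade by latency; boundary handling is an assumption."""
    if latency_ms < 80:
        return "anticipatory"
    if latency_ms < 130:
        return "express"
    return "regular"


def mgs_error_window(latency_ms):
    """Return the window (1-4) of an erroneous MGS latency, or None if unspecified."""
    if 80 <= latency_ms <= 320:
        return 1
    if 350 <= latency_ms < 500:
        return 2
    if 500 <= latency_ms <= 750:
        return 3
    if latency_ms > 750:
        return 4
    return None  # e.g. below 80 ms or in the 320-350 ms gap
```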

Data analysis
(see Data analysis in the Supplementary Files)

Results
Results obtained with 1-, 2- and 3-feature classifications systematically led to a better model when three features were used, with lower validation errors in this condition. We then estimated the generalization error of this model on the independent test subset. Low test error rates were obtained, in the 12-32% range (Table 2). The C vs. SZ classification was the most reliable, with 15% validation and 12% test error rates. The SZ vs. FS classification provided the highest error rates, with 32% of errors in both validation and test (note that a random classification in these two-class classifications would yield a 50% error). Neurological soft signs were selected as the best predictor, together with a combination of measures, for two classifications: C vs. SZ and SZ vs. FS. In addition, memory-guided saccades (4th window) were consistently selected among the best two multimodal features for all classifications involving the control group (C vs. SZ and C vs. FS).
We computed the three-class classification resulting from the combination of these two-class classifiers (see Table 3). The detection rate is the probability for a given sample belonging to Class c to be effectively detected as a member of Class c. In two-class classifications, the detection rate of each class corresponds to sensitivity and specificity.
The predictive value (P.V.) is the probability, when a sample is detected as a member of Class c, for this sample to indeed belong to Class c. In two-class classifications, the predictive value of each class corresponds to negative predictive value and positive predictive value.
Group C had the highest detection rate (74%), and the SZ group had the highest predictive value (75%).
Overall, the classification was reliable for all groups C, SZ and FS with both detection rates and predictive values above 50%.
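The detection rate and predictive value defined above can be read directly from a confusion matrix. The sketch below assumes rows are true classes and columns are predicted classes; the counts are invented for demonstration and are not those of Table 3.

```python
def detection_rates(matrix):
    """Per-class detection rate: diagonal count divided by the row (true class) total."""
    return [row[c] / sum(row) for c, row in enumerate(matrix)]


def predictive_values(matrix):
    """Per-class predictive value: diagonal count divided by the column (predicted class) total."""
    columns = list(zip(*matrix))
    return [columns[c][c] / sum(columns[c]) for c in range(len(matrix))]


# Hypothetical counts, rows = true C, SZ, FS; columns = predicted C, SZ, FS.
toy = [
    [8, 1, 1],
    [1, 7, 2],
    [3, 2, 5],
]
dr = detection_rates(toy)
pv = predictive_values(toy)
```

In this toy matrix the first class has a detection rate of 8/10 but a predictive value of only 8/12, which shows why the two quantities are reported separately.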

Discussion
In this study, our aim was to develop a machine-learning model, using an automatic learning method, to discriminate three groups of subjects. Neurological soft signs were selected as the best predictor, together with a combination of measures, for two classifications: C vs. SZ and SZ vs. FS. Finally, memory-guided saccades were consistently selected among the best two multimodal features for all classifications involving the control group (C vs. SZ and C vs. FS).
The success of machine-learning approaches relies on two key factors. Firstly, weak variables cannot be used to design an efficient model, as machine-learning approaches only extract the information present in the data. Model performance is therefore bounded by the statistical power of the input variables to predict the expected outcome. Secondly, an uncontrolled increase in model complexity can lead to severely biased performance when the model overfits the data (thereby learning non-reproducible features). As numerous indicators have to be compared, there is indeed a high risk of developing a biased model. This is the well-known "curse of dimensionality": the more complex the data are, the fewer rules can be extracted using either data mining or classifiers. We addressed this issue by reducing the input feature space through supervised feature selection.
Low test error rates were obtained for all group comparisons, confirming the value of neurological soft signs and saccadic measurements for the diagnosis of schizophrenia. NSS emerged as the best predictor for the classifications. MGS was consistently selected: the error latency in the 4th window (> 750 ms) was selected for almost all classifications involving the control group. NSS appeared as very sensitive markers of vulnerability for schizophrenia, reflecting a deviance in the maturation of neurodevelopmental brain structures throughout foetal life, infancy and adolescence. This was shown at different stages of the disease (Krebs and Mouchet, 2007; Chan et al., 2017; Caldani et al., 2017a), in siblings of patients (Gourion et al., 2004), and in first-episode subjects (Chan et al., 2017). In addition, NSS were also shown to be linked to ocular movement anomalies (Picard et al., 2009). MGS have been proposed as a useful endophenotype (Calkins et al., 2008), with error rates showing good sensitivity in patients and relatives (Landgraf et al., 2008; Caldani et al., 2017a). In our findings, errors occurring early with anticipatory saccades in AS, combined with NSS, were good classifiers for SZ versus FS, but less powerful than MGS, in line with our previous analysis (Caldani et al., 2017a).
Anticipatory or express saccades in prosaccades, in combination with other features, were also good classifiers for C vs. SZ. These oculomotor features suggest a defect in inhibitory control, pointing to the involvement of crucial cerebral structures such as the dorsolateral prefrontal cortex and the frontal eye field (Leigh and Zee, 2015).
The reader should bear in mind that the features used for these classifications contain measurement noise, which cannot be reduced to zero; higher classification rates would therefore not be possible without flaws in the model design. In the three-class classification, all classes are detected with high accuracies (random classification accuracy in a three-class classifier would be about 33%). In two-class classifications, the highest error rates are obtained for SZ vs. FS (32% error) and C vs. FS (27% error), which is satisfactory (random classification error in a two-class classifier would be 50%). The C vs. FS classification task is a comparison of control subjects with asymptomatic subjects with genetic risk factors. In this condition, a low error rate would be surprising, and indeed the confusion matrix illustrates that this classification error is mainly due to FS samples detected as controls. As can be observed in the predicted FS column of Table 3, 24% of SZ are detected as FS by the model. The FS class identified by our model shares some traits with the SZ class. On the other hand, 28% of FS are detected as control subjects (C column of the table). FS is a heterogeneous group, with oculomotor performances that could be tightly linked to the performances of their related probands (Mazhari et al., 2011; Curtis et al., 2001) via shared genetic background, but that also depend on the personal history, psychopathology or personality profiles of the subjects (Morgan et al., 2015).
Multi-modal biomarkers consistently outperformed univariate classifications. This is a well-known phenomenon in machine learning: two apparently weak predictors can become very effective when combined (Dreyfus, 2005).
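A textbook toy case makes this phenomenon concrete. In the sketch below, which uses constructed data unrelated to the study measurements, each feature alone performs at chance with a one-threshold rule, whereas a feature combining the two separates the classes perfectly. The function name `stump_accuracy` is hypothetical.

```python
def stump_accuracy(values, labels):
    """Best accuracy of a single-threshold rule (either polarity) on one feature."""
    best = 0.0
    for t in sorted(set(values)):
        pred = [int(v >= t) for v in values]
        acc = sum(p == y for p, y in zip(pred, labels)) / len(labels)
        best = max(best, acc, 1.0 - acc)  # 1 - acc covers the reversed rule
    return best


# XOR-like toy data: the label is 1 only when the two features disagree.
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
y = [0, 1, 1, 0]

acc_x1 = stump_accuracy(x1, y)
acc_x2 = stump_accuracy(x2, y)
combined = [(a - b) ** 2 for a, b in zip(x1, x2)]  # uses both features jointly
acc_combined = stump_accuracy(combined, y)
```

Here `acc_x1` and `acc_x2` are both 0.5, while `acc_combined` is 1.0.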

Limitations
Our sample is rather small. Our patients with schizophrenia were nearly all under treatment, and antipsychotic medications could influence the results. Further studies are warranted to confirm and replicate our results.

Conclusion
The use of different oculomotor paradigms, together with discrete neurological and clinical parameters, combined in a three-class classification model has revealed the existence of different profiles, with good predictive values according to the different stages of the schizophrenia spectrum. Many of these oculomotor features support a global inhibitory control defect in the schizophrenia spectrum. The machine-learning methodology allowed us to emphasize the importance of taking into account both neurological soft signs and oculomotor parameters, especially MGS, in early detection strategies. This study provides converging evidence that biomarkers sensitive to neurodevelopmental abnormalities could constitute useful endophenotypes in schizophrenia.

When the subject was a minor, consent to participate was given by the parent or legal guardian.

Consent to publish
Written informed consent for publication of their clinical details was obtained from the patient/parent/guardian/relative of the patient.

Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Competing interests
The authors have no financial and personal relationships with other people or organizations that could inappropriately influence or bias their work in this study.

Authors' contributions
All authors participated in critically revising all sections of the manuscript, and have approved the final version. In addition, RG, MOK, IA, conceptualized and designed the study; IA, MPB and FBV directed the study implementation, including designing literature search processes and data analyses, drafting the manuscript; MOK, AI, MPB and FBV completed the literature search and study selection; NB and CMLF contributed to patient recruitment and FBV and AB to analysis tools.