Participants
All participating patients (n = 171) met DSM-IV-TR criteria for schizophrenia spectrum disorders (SSDs) according to the Structured Clinical Interview for DSM-IV (SCID) [34, 35]. Individuals with alcohol- or drug-use disorders within the past 6 months, intellectual disability (IQ ≤ 70), current or historical neurological disorders, pregnancy, and claustrophobia were excluded from the study. HCs were required to have no previous or current psychiatric disorders, neurological disorders, or significant medical conditions. All participants were presented with a detailed description of the study design to ensure that they fully understood the procedures and gave written informed consent. The study was approved by the Ethics Committee of Jeonbuk National University. All procedures were performed in accordance with relevant guidelines.
Clinical assessment
The severity of symptoms was evaluated within a week of fMRI scanning using the positive and negative syndrome scale [36] and, the Calgary Depression Scale for Schizophrenia7. The PANSS and CDSS were administered by trained psychiatrists.
Image Acquisition And Preprocessing
Resting-state functional and structural MRI (rsfMRI and sMRI) data were obtained at the Jeonbuk National University Hospital on a 3T Verio scanner (Siemens Magnetom Verio, Erlangen, Germany) using a 12-channel standard quadrature head coil. We collected a 5-min resting-state scan consisting of 150 contiguous echo-planar imaging functional images (TR: 2000 ms; TE: 30 ms; flip angle: 90°; FOV: 240 mm; image matrix: 64 × 64 mm; voxel size = 1·0 × 1·0 × 1·0 mm [3]; 176 slices). MRI data preprocessing was conducted in a standard way using the Statistical Parametric Mapping software package, ver·12. The criteria for excessive head motion were translation > 2 mm or rotation > 2° in any direction. Participants for whom more than 10% of volumes showed excessive head motion were excluded from the analysis. The linear trend was removed through the time course, and the band-pass filter (0·008 < f < 0·09 Hz) was applied.
Functional Connectivity Analysis
Time series of the voxels within the ROI were averaged to generate the regional time series for the automated anatomical labeling (AAL) atlas. The FC matrix was computed by correlating time series data between every pair in the AAL atlas using the CONN toolbox. Group comparison was performed using ANCOVA with education as covariate. For the contrast map, we applied the cluster-level extent threshold of p < 0·01, which was corrected for multiple comparisons using the false discovery rate. Partial correlations were carried out controlling for age, sex, education, duration of illness, chlorpromazine equivalent doses and head motion (framewise displacement) on the relationship between the rsFC z-values of brain regions showing significant between- group differences and PANSS. The significance level was set at a cluster-level of p < 0·05, and data were not corrected for multiple comparison because of the exploratory nature of the evaluation.
Graph Covariance Pooling Block
The brain FC can be expressed as the complete graph \(G=\left(E, B\right),\) where \(B\) is a set of nodes reflecting 116 brain regions and \(E\) is a weighted adjacency matrix of edges. To capture graph representations of a functional connectome, we adopted graph-wise convolutional filters in the BrainNetCNN [5], which were composed of E2E, E2N, and N2G. However, the block was modified by applying two more methods, i.e., second-order pooling and the self-attention mechanism, and was named the graph covariance pooling (GCP) block (Fig. 1).
Unlike the BrainNetCNN [5] second-order pooling was inserted before the row convolutional filter in the E2N layer. To this end, the 3D output tensor \({\mathbf{x}}_{\text{e}\text{e}}{\in \mathbb{R}}^{h\times w\times {c}^{{\prime }}}\) of the E2E layer was reshaped to the two-dimensional matrix \({\mathbf{F}}_{\text{r}\text{e}}:{\mathbf{x}}_{\text{e}\text{e}}\to {\mathbf{x}}_{\mathbf{e}{\mathbf{e}}^{{\prime }}}{\in \mathbb{R}}^{h\times M}\) where the i-th row indicates the representations of i-th brain regions. Given the matrix \({\mathbf{x}}_{\text{e}{\text{e}}^{{\prime }}}\) consisting of M-samples and h-dimensional features, the sample covariance matrix of \({\mathbf{x}}_{\text{e}{\text{e}}^{{\prime }}}\) can be written as follows:
$${\mathbf{F}}_{\text{c}\text{o}\text{v}}: {\mathbf{x}}_{\text{e}{\text{e}}^{{\prime }}}\to {\mathbf{x}}_{\text{c}\text{o}\text{v}}={\mathbf{x}}_{\text{e}{\text{e}}^{{\prime }}}\mathbf{A}{\mathbf{x}}_{\text{e}{\text{e}}^{{\prime }}}^{\text{T}} , \mathbf{A}=\frac{1}{M}\left(\mathbf{I}-\frac{1}{M}{\mathbf{J}\mathbf{J}}^{\text{T}}\right) \left(5\right)$$
where \(\mathbf{I}\) is the\(M\times M\) identity matrix, \(\mathbf{J}\) represents the \(M\)-dimensional vector, which is composed of one, and \(\text{T}\) denotes the matrix transpose. We performed a row-wise group convolutional filter by considering each row as a group, \({\mathbf{F}}_{\text{r}\text{c}\text{o}\text{n}\text{v}}:{{\mathbf{x}}_{\text{c}\text{o}\text{v}}\to \mathbf{x}}_{\text{e}\text{n}}{\in \mathbb{R}}^{h\times 1}\), to maintain characteristics of the functional connectome data. Through the proposed E2N layer, the input tensor from the E2E layer was transformed into region-wise sparse representations corresponding to the number of brain regions, which can be defined as follows:
$${\mathbf{F}}_{\text{E}2\text{N}}={{\mathbf{F}}_{\text{r}\text{c}\text{o}\text{n}\text{v}}\circ \mathbf{F}}_{\text{c}\text{o}\text{v}}\circ {\mathbf{F}}_{\text{r}\text{e}} :{\mathbf{x}}_{\text{e}\text{e}}\to {\mathbf{x}}_{\text{e}\text{n}}{\in \mathbb{R}}^{h\times 1} \left(6\right)$$
The GCP block is a computational module that can build the enhanced tensor \({\tilde{\mathbf{X}}\in \mathbb{R}}^{h\times w\times c}\) from its original tensor \({\mathbf{X}\in \mathbb{R}}^{h\times w\times c}\), which can be defined as follows:
$${\mathbf{F}}_{\text{G}\text{B}\text{C}\text{P}}={\mathbf{F}}_{\text{E}\text{X}}\circ {\mathbf{F}}_{\text{S}\text{E}} :\mathbf{X} \to \tilde{\mathbf{X}} \left(3\right)$$
where\({ \mathbf{F}}_{\text{S}\text{E}}={\mathbf{F}}_{\text{E}2\text{N}}{\circ \mathbf{F}}_{\text{E}2\text{E}}\) is the squeeze function and \({ \mathbf{F}}_{\text{E}\text{X}}= {\mathbf{F}}_{\text{N}2\text{G}}\) denotes the excitation function. For the squeeze operation, an input tensor was fed to the E2E layer to encode the edge strengths over a pair of brain regions. To decrease the computational cost of second-order pooling at the following layer, we also reduced the number of channels from \(c\) to \({c}^{{\prime }}\), and the E2E layer of the proposed squeeze operation is defined as follows:
$${\mathbf{F}}_{\text{E}2\text{E}}:\mathbf{X}\to {\mathbf{x}}_{\text{e}\text{e}} {\in \mathbb{R}}^{h\times w\times {c}^{\text{'}}} \left(4\right)$$
For the excitation operation, we employed the N2G layer. This aims to summarize the responses of all brain regions into a single response. In the N2G layer, the dimensionality of input vector \({\mathbf{x}}_{\text{e}\text{n}}\) was decreased by passing the bottleneck layer with a reduced ratio. We then increased the vector from the reduced size to its original size, and activated the output vector using the sigmoid function, \({\mathbf{F}}_{\text{f}\text{c}}: {\mathbf{x}}_{\text{e}\text{n}}\to {\mathbf{x}}_{\text{n}\text{g}}{\in \mathbb{R}}^{h\times 1}\). The final enhanced tensor \(\tilde{\mathbf{X}}\) computed by the proposed excitation operation can be obtained by
\({\mathbf{F}}_{\text{N}2\text{G}}={\mathbf{F}}_{\text{r}\text{m}\text{u}\text{l}} :\mathbf{X}, { \mathbf{x}}_{\text{n}\text{g}}\to\) \(\tilde{\mathbf{X}}{\in \mathbb{R}}^{h\times w\times c} \left(7\right)\)
where\({ \mathbf{F}}_{\text{r}\text{m}\text{u}\text{l}}\) denotes a row-wise multiplication between the input tensor \(\mathbf{X}=[{\mathbf{x}}_{1},{\mathbf{x}}_{2},\dots ,{\mathbf{x}}_{h}{\in \mathbb{R}}^{w\times c}]\) and the weight vector\({ \mathbf{x}}_{\text{n}\text{g}}=[{a}_{1},{a}_{2},\dots ,{a}_{h}]\). The output tensor \(\tilde{\mathbf{X}}\) was highlighted, helping to boost representation discriminability.
Braingcpnet Architecture
The BrainGCPNet consists of a typical convolutional layer, GCP block, and fully connected classification layer (Fig. 1). For a detailed description, see the Supplementary Material.
Experiments
We conducted an ablation analysis on the proposed BrainGCPNet and quantitative performance comparisons with competing methods using the nested 10-fold cross validation strategy. To avoid possible bias caused by the random dataset partitioning, the cross-validation was repeated 10 times independently, and the average score was reported. Hyperparameters such as varying regularization factors, weight decay, and network architecture, were empirically tuned and optimized. We optimized two important hyperparameters, an initial learning rate and a weighting factor of L1 regularization, using Bayesian optimization. The performance was evaluated using accuracy, sensitivity, and specificity. Also, we plotted the receiver operating characteristic curve of BrainGCPNet and other competing methods, including SVM, fully connected neural network, CNN, squeeze and excitation network (SENet) [8], and BrainNetCNN [5]. For a detailed description, see the Supplementary Material.
Discriminative connections
To discover discriminative functional connections between the brain regions that make significant contributions to the recognition of SDDs, we used the gradient-based explanation method [37]. To obtain an explainable saliency map, after choosing a target class (SSDs or HCs), we fed validation data to the explanation method, and entire saliency maps were linearly integrated and normalized. Connectivity strength between the nodes and nodal strength (sum of edge weights attached to a node within a network) were estimated.