2.1 Simulated brain MR spectra
Dataset simulation followed the approach by Lee and Kim 14, constructing ground truth target spectra from a spectral basis set encompassing key brain metabolites. The metabolites included in the basis set were aspartate (Asp), creatine (Cr), gamma-aminobutyric acid (GABA), glutamate (Glu), glutamine (Gln), glutathione (GSH), glycine (Gly), glycerophosphocholine (GPC), glycerophosphoethanolamine (GPE), Myo-inositol (Ins), lactate (Lac), N-acetyl-aspartate (NAA), N-acetyl-aspartyl glutamate (NAAG), phosphocholine (PCh), phosphocreatine (PCr), phosphorylethanolamine (PE), scyllo-Inositol (Scy), serine (Ser), and taurine (Tau).
For each simulated spectrum, concentrations for each metabolite were randomly selected within the range of metabolite concentrations in normal adult brain according to the literature 14,16 based on their intensity in the spectrum. Augmentation was then introduced to the spectra through phase modulation, Lorentzian line broadening, and white noise. These modulated spectra served as training input for the deep learning model. Augmentation not only increased the dataset's size but also mitigated model overfitting, rendering the synthetic spectra more realistic. The white noise, with magnitude [0.05 1], aimed to achieve an SNR ranging from 1 to 3. Subsequently, after Fourier transformation, the simulated spectra were sampled to 1024 points and normalized to the range [0,1] to enhance efficiency during model training. A total of 50,000 MR spectra were simulated, with 80% (40,000) allocated for model training, and 10% (5,000) reserved for validation and testing.
2.2 Brain phantom experiments
An in-house made structural-metabolic phantom was assessed, featuring six tubes each with a 25 mm diameter, symmetrically positioned within a larger cylindrical container measuring 110 mm in diameter. These six tubes were filled with buffered solutions (pH = 7) of brain metabolites: 6 mM of NAA, 8 mM of glutamate, 1 mM of GABA, 4 mM of creatine, 5 mM of choline, 8 mM of Myo-inositol, and 4 mM of lactate. Each small tube contain slightly different concentrations and combination of the above-mentioned metabolites. To expedite T1-relaxation, the tube solutions were doped with Magnevist at a concentration of 1 ml/L. Contrarily, the larger cylindrical container was filled with an aqueous solution devoid of metabolites and remained free from Magnevist doping. MRSI data were acquired using a 3 Tesla Philips MRI system (Ingenia, Best, the Netherlands) with following acquisition parameters: PRESS sequence, TR/TE = 2000/35, matrix size of 32×32, and voxel size of 1×1×5 mm3. MR spectra were augmented with phase and intensity modulations, Lorentzian line broadening, and by adding white noise, which resulted in total 10,000 MR spectra all with 1024 complex data points.
2.3 Human subjects
Training dataset: Multi-institutional clinical data from healthy volunteers (n = 21) and glioma patients (n = 4) including MR spectroscopic data sourced openly from Hingerl et al. 17 and Archibald et al. 18, and from national/international hospital partners were used in the current study. The data spans various magnetic field strengths, including 7 and 3 Tesla, across Philips and Siemens MRI systems, and involves different acquisition sequences. Hingerl L. and collegues 17 acquried 3-dimentional free induction decay (FID)-MRSI from glioma (n = 4) and healthy subjects (n = 6) using 7 T (Magnetom, Siemens Healthcare, Erlangen, Germany, maximum gradient strength = 40 mT/m, maximum slew rate = 200 mT/m/ms) with following acquisition parameters: TR/TE = 280/1.3 milliseconds; flip angle of 30°, 2778 Hz spectral bandwidth, and one average. Archibald J. and colleagues 18 obtained MRSI data from 15 healthy subjects using a 3 Tesla Philips (Achieva scanner, Best, the Netherlands). The acquisition involved a single-channel Transmit-Receive (T/R) head coil and utilized the Point RESolved Spectroscopy (PRESS) sequence with parameters TE/TR = 22/4000 ms, 32 averages, scan time = 3:12. A total of 16 non-water suppressed spectra were acquired as a baseline, and these spectra were utilized in the current study. In total, we prepared about 20,000 augmented in vivo MR spectra for training and validation.
Test dataset: MRSI data was also acquired from 40 patients with glioma (grades 2–4, 20 females, ages 25–74 years) from a national hospital. Anatomical MRI and proton 2D MRSI were acquired using PET/MRI systems (Siemens Biograph mMR, software version Syngo MR, Erlangen, Germany). Standard MRI sequences were acquired, including pre- and post-contrast enhanced (ce) 3D T1 magnetization prepared rapid gradient echo imaging (MPRAGE), 3D FLAIR, and axial T2-weighted imaging. MRSI data parameters included: 2-dimentional PRESS sequence, TR/TE = 1700/30 ms, flip angle of 90˚, three averages, matrix size of 16 × 16, vector size of 1024, and slice thickness of 16 mm.
Ethical approval
for the study was granted by the Regional Ethics Committee (REC Central Norway, reference numbers 2016/279 and 2018/2243), aligning with the principles of the Helsinki Declaration and national guidelines. All patients signed written informed consent.
2.4 Architecture of Neural Network
In this study, two models were trained to perform parallel tasks: the transformation of time-series free induction decay signals into frequency domain MR spectral representation and the classification of glioma versus healthy data. The primary objective was to develop a model capable of transforming time domain complex series into frequency domain MR spectra, while simultaneously identifying anomalies and excluding non-usable time series data. For model training and validation, a dataset comprising 80,000 MR spectra with 1024 time points were utilized, created through a combination of simulated spectra and data from a phantom containing crucial brain metabolites. The ground truth was established using standard spectral processing techniques, including phase modulation and Fourier Transformation in MATLAB (MathWorks, Natick, MA, USA). Notably, the data were not water suppressed, implying that the water signal would be significantly larger than that of the metabolites of interest, assuming imperfect water suppressions in the clinical data.
For Model 1, we developed a CNN inspired by the AUTOMAP network 19, customized for one-dimensional data (see Fig. 1). The model comprises three fully connected layers with hyperbolic tangent activation, succeeded by two convolutional layers with ReLU activation 20, and a deconvolution layer. Training utilized mean-squared-error loss and the Adam optimizer optimizer 21, employing a batch size of 20, a learning rate of 5e− 6, and 200 epochs. Notably, the real and imaginary parts of the data were trained separately within the same model.
Next, we aimed to train a CNN model (Model 2) for distinguishing healthy from glioma voxels. Figure 2 illustrates the architecture of the CNN model inspired by DenseNet 22, tailored for one-dimensional time series data. DenseNet's distinctive densely connected structure enables the training of deep models without excessive computational demands, minimizing overfitting. The model incorporates ReLU activation 20, and to adapt it for binary classification a sigmoid activation were appended at the model's conclusion.
2.3 Training
For training the classification model (Model 2), the training dataset underwent partitioning MR spectra from either glioma or healthy tissues into distinct training (80%) and validation datasets (20%), with strict measures to prevent the model from exposure to the test dataset during training. The test dataset for evaluating the performance of the classification model was reserved from one institute (40 patients with gliomas), and their data were not used in the training phases of either Model 1 or Model 2. The training dataset encompasses 20,000 MR spectra from healthy subjects and patients with glioma 17. Each spectrum, comprising real and imaginary components, featuring 1024 time points. To distinguish between glioma and healthy tissues, each spectrum is labeled using one-hot encoding.
For training the classification model, binary cross entropy served as the loss function, and optimization was achieved using the Adam optimizer 21. A batch size of 200, a learning rate of 5e− 5, and 100 epochs were employed during the training process. The model takes in raw time domain data, featuring separate channels for the real and imaginary components of the complex spectra. The final layer of the model is a fully connected layer with 1024×2 nodes, reconstructing the complex frequency spectra. The implementation utilized TensorFlow 2.5 23 and Keras 24, and the training process occurred on a fourth-generation MacBook Pro with an Apple M1 chip, featuring an 8-core CPU, 8-core GPU, 16-core Neural Engine, and 16.0 GB RAM.
2.4 Evaluation
During each epoch, the training set underwent processing through the model to generate frequency domain spectra. Subsequently, the mean squared error (MSE) was calculated, and the model's weights were updated through backpropagation. This iterative process continued throughout the epochs, refining the model's performance over time.
$$\text{M}\text{S}\text{E} = \frac{1}{N}\sum _{i=1}^{N}{\left({y}_{i}-{\widehat{y}}_{i}\right)}^{2}$$
1
To evaluate Model 1, where the true frequency domains were known, the test dataset was processed, and the MSE was calculated to assess performance and monitor for overfitting. Following the last epoch, the weights were finalized, and the test dataset underwent evaluation to provide a comprehensive assessment of the final model.
In evaluating the performance of Model 2, we leveraged Receiver Operating Characteristic (ROC) curves, a widely accepted metric in medical imaging and classification tasks. The ROC curves provided a comprehensive assessment of the model's ability to discriminate between different classes, in the context of voxels containing brain tumor tissue (“Glioma”) versus healthy (non-tumor) tissue (“Healthy”) using MRSI raw data. The utilization of ROC analysis enhances the robustness of our findings, offering insights into the model's sensitivity and specificity, crucial factors in clinical decision-making and diagnostic accuracy.
Also, we utilized confusion matrices to further assess the performance of our model. The confusion matrix allowed for a detailed examination of the model's classification outcomes, providing insights into true positives, true negatives, false positives, and false negatives. This comprehensive evaluation strategy enhances the reliability of our findings, offering a nuanced understanding of the model's precision, recall, and overall classification accuracy. The combined use of ROC curves and confusion matrices strengthens the robustness of our results, contributing to a thorough characterization of the CNN's performance in the context of glioma vs. non-tumor tissue classification through in vivo MRSI data.
To investigate and visualize how our deep learning classification model (‘Model 2’) predictes tumors from healthy non-tumor tissues, we utilized Grad-CAM (Gradient-weighted Class Activation Mapping) 25. Grad-CAM helps us understand and interpret the model’s decisions by highlighting crucial areas in the input data that significantly affect the model’s output. This involves performing a backward pass to compute the gradients of the target class score concerning the final convolutional layer's feature maps. We prepared 50 consecutive MR spectra for each group of healthy and glioma tissue classes and averaging them to enhance SNR.