Comparison of Convolutional Neural Networks Based Method and LCmodel on the Quanti�cation of in Vivo Magnetic Resonance Spectroscopy

Quanti�cation of metabolites concentrations in institutional unit (IU) for between subject and long-term comparison is considered important strategy in the applications of magnetic resonance spectroscopy (MRS). The aim of this study is to investigate if metabolite concentrations quanti�ed by convolutional neuronal network (CNN) based method associated with a proposed scaling procedure can re�ect variations of the metabolite concentrations in institution unit (IU) at different brain regions with different signal-to-noise-ratio (SNR) and linewidth (LW). An error index based on standard error (SE) is proposed to indicate the con�dence levels on the prediction for metabolites. In vivo MRS spectra were collected at 3 brain regions from 44 subjects at 3T system. Metabolite concentrations in IU quanti�ed by LCModel and CNN from 44 subjects were compared. For in vivo spectra characterized under different spectral quality in terms of SNR and LW, line narrowing and noise free spectra were successfully exported by CNN. Concentrations of �ve metabolites quanti�ed by CNN and LCModel are in similar range with statistically signi�cant Pearson’s correlation coe�cients (0.28~0.70). SE of the metabolites show positive correlation with Cramer-Rao lower bound (CRLB) (r=0.60) and with absolute CRLB (r=0.84). In conclusion, the CNN based method with the proposed scaling procedures can be used to quantify in vivo MRS spectra. The concentrations of �ve major metabolites were reported in IU, which are in the same range as those quanti�ed using a routine MRS quanti�cation procedures by LCModel. The SE can be used as error index indicating predicted uncertainties for metabolites with the information similar to the absolute CRLB.


Introduction
Magnetic resonance spectroscopy (MRS) is a noninvasive tool to measure the biochemical information in the brain. It has been widely applied to study various neurological and psychiatric disorders [1][2][3][4] . The challenges on the quanti cation of MRS are the low signal to noise ratio (SNR), overlapping metabolic signals, contamination of macromolecules, lipids and artifacts 5-7 . Therefore, comprehensive spectral processing procedures are developed for the quanti cation of metabolites [8][9][10][11] . Even though, the development of methods to detect metabolites characterized with low concentration, complex spectral shape, overlapped spectral line is still necessary for MRS at clinical eld strength.
Another important property of spectral quanti cation is to estimate proper error index for each quanti ed metabolite. The error index is used exclude unreliable quanti ed metabolites concentrations in the statistical analysis, which is considered essential in clinical MRS data. In one of the commonly used software packages, LCModel, the error metrics are Cramer-Rao lower bound (CRLB), which indicates the uncertainty in spectral tting. The CRLB of individual metabolite, associated with SNR and LW, have been already widely used for quality assurance in the analysis of MRS data for decades. A recent study has further suggested that absolute CRLB (abs CRLB) is a better choice than CRLB for quality assurance to avoid possible bias on the statistic test especially in the clinical studies 23 . For DL method, an error index based on SNR, LW and signal to background ratio (SBR) had recently been demonstrated to successfully re ect quanti cation uncertainties as other nonlinear based model tting methods 18 . Here, an error metric estimated from for individual metabolite was proposed for the DL based method and we compared the proposed error metrics to CRLB based error index reported by LCModel.
In this study, a CNN based method was implemented to process the spectra 19 . To increase the clinical feasibility of this CNN based method, a scaling method was proposed to quantify the processed spectra in IU 24 . We investigated if metabolite concentrations quanti ed by the CNN based method can re ect variations of the metabolite concentrations among a group of subjects as those quanti ed by LCModel in IU. The performance of this CNN based method was evaluated on in vivo spectra collected from 3 brain regions for 44 subjects, characterized by different level of linewidth (LW) and SNR. As the multivariable regression was implemented in the CNN based method of this study, we proposed to use the standard error (SE) of each estimator as an error index to indicate the con dence levels on the CNN prediction for metabolites. The performance of SE of each metabolite was evaluated by correlating SE with the CRLB and abs CRLB.

Simulation and CNN prediction
The distributions of SNR and LW of the simulated spectra were shown in the gure 1. For all data (N=50000) and test data (N=5000), SNR distributed in the range of 17~192 (67.2±29.1) and LW ranged from 3.5~19.1 Hz (7.9±2.9 Hz). Representative simulated spectra with different SNR, LW and relative metabolic amplitudes were shown in the gure 2, along with CNN predicted spectra, ground truth (GT) and residue. The capability of CNN on noise reduction and LW narrowing can be clearly seen on the predicted spectra. The predicted spectra are very close to the GT in terms of LW and peak heights. The difference between the predicted spectra and GT shows observable residues at spectral range of major metabolites such as tNAA, Cre, tCho and Glu. The %MSE is 0.65% for spectra with relative low SNR and large LW (SNR=32, LW=15.2 Hz) and is as low as 0.07% for spectra with relative good spectral quality (SNR=126, LW=3.8 Hz). In 10-fold cross validation, the mean and standard deviation of MSE among 10 folds are 2.91×10 10 ±5.58×10 8 (%MSE=0.20% ± 0.17%) in the training/validation data (N=45000), and are Page 4/20 2.47×10 10 ±8.69×10 9 (%MSE=0.21% ±0.19%) in the test data (N=5000). The 1st training model of CNN was adopted for the evaluation of CNN and for in vivo spectra.
For the test sets (N=5000), the scatter plots of the predicted metabolic amounts and GT were shown in the gure 3. All metabolites show positive correlations between the predicted metabolic amounts and GT.
For all metabolites except NAAG, Gln and GABA, the predicted values spread evenly over the GT line (slope=1) with the Pearson's correlation coe cient, r, in the range of 0.92 ~0.99. For NAAG and Gln, the predicted values spread over a larger range than other metabolites associate with lower correlations (r=0.73 for Gln, r=0.81 for NAAG). For GABA, the predicted values only spread in a small metabolic range (1.3~1.7 mM) compared to GT (1.0~2.0mM), leading to low correlation (r=0. 21

In vivo results
The tting of metabolite, macromolecules (MMs) and baseline using LCModel for all subjects were shown in gure 5. Using default LCModel setting, the MMs at 0.9~3.11 ppm can be well tted and the tted MMs is similar to those in the previous reports 25,26 . For the baselines estimated by spline, we found relatively large distortion across subjects in 3.5 ~ 4.0 ppm, which account for the residue water signals and contribution of MM37, MM38 and MM40 listed in table S2. Representative in vivo spectra tted by LCModel from 3 brain regions were shown gure S1. Among three regions, DACC shows lower variations in tted lines across subjects, where DSTRI shows largest variations, especially in the tted baseline. The SNR and LW calculated based on NAA peak and by LCModel were summarized in Table 1. Among these 3 brain regions, DACC located in a relative homogeneous brain region has lowest LW, and DSTRI located in the deep brain has largest LW and lowest SNR. DLPFC with largest VOI has highest SNR among three regions. Representative in vivo spectra under different SNR and LW were shown in the gure 6 with the corresponding CNN predicted spectra. In vivo spectra exported as line narrowing, noise free spectra can be clearly seen on the CNN predicted spectra. Table 1 SNR and LW of in vivo spectra for 3 brain regions from 44 subjects. SNR NAA and LW NAA were calculated in the frequency domain based on the peak of N-acetyl Aspartate (NAA) at 2.0 ppm. Signal is de ned as the peak height of NAA without removing baseline and overlapping peaks. Noise is de ned as the standard deviation of spectra in the range of -1.0 ~ -3.0 ppm. The LW of spectra is de ned as full width half maximum of the NAA peak. SNR LCModel  coe cient between SE and abs CRLB (r=0.84) is higher than that between SE and CRLB (r=0.60). Table 2 Metabolite concentrations of tNAA, Cr, tCho, mI, Glu, Gln, Glx quanti ed using CNN and LCModel from 3 brain regions. Concentrations are shown as mean ± standard deviation (standard deviation/mean) from 44 subjects. Metabolites concentrations were reported in institutional unit (IU). The standard error (SE) of estimators were estimated in multiple linear regression. SE is less than the CNN predicted concentrations by an order of 10 −7 . CRLBs were exported by LCModel. N is the number of subjects included in the calculation of CRLBs using the criterion: CRLB>10% for tNAA, Cr, tCho; CRLB > 20% for mI, Glu, Gln, Glx. For GABA and NAAG, tting on these two metabolites using LCModel was not reliable (CRLB > 20% for GABA; CRLB>40% for NAAG) and NAAG can only be quanti ed in about half of the subjects ( Table 3). As for CNN, consistent GABA concentrations were quanti ed with the between subject variation around 10% in 3 brain regions. Low NAAG concentrations were reported by CNN (< 1 IU) with large between subject variations (>50%).

Discussion
In this study, the feasibility of a CNN based method on the quanti cation of metabolite concentrations was demonstrated on in vivo MRS data collected at 3 brain regions from 44 subjects. With the proposed scaling procedure, we have shown that this CNN based method can quantify the metabolites concentrations in IU with the values of 5 major metabolites tNAA, Cr, tCho, mI and Glu in compatible range with those quanti ed by LCModel ( Table 2). The between subject variations of these metabolite are also in compatible range with the previous studies [27][28][29] . For the CNN model used in this study, the optimized parameters such as initial learning rate, decay factor, L2 regularization etc. differ from the model used in the previous study 18, 19 , which implies that this CNN based method can be adapted for different MRS protocols and in vivo data sets after optimization of the models.
For the subject by subject comparison between CNN and LCModel, positive correlation (p<0.005) were found for these metabolites ( gure 7). The ndings on in vivo results are in agreement with the ndings in the evaluation of CNN model using the test set, where these metabolites showed good prediction compared to GT ( gure 3) and they had lower MAPEs (<5%) compared with other metabolites ( gure 4). Discrepancies were found on the quanti cation of in vivo Gln concentrations between CNN and LCModel. For the CNN, the prediction of Gln show larger errors ( Table 2) and lower correlation to GT ( gure 3) compared to other 5 major metabolites. As for LCModel, the separation of Glu and Gln on the spectra collected at 3T is a well-known di culty. Therefore, if the CNN can serve a better tool to differentiate Gln form Glu on 3T system as claimed by the previous studies 19 needs further investigation. It is necessary to have a reliable quanti cation strategy on Gln. As Glx is the combination of Glu and Gln, Glx by CNN also differs from that by LCModel. Despite the discrepancy in concentrations, Glx has similar within subject variations as Glu ( Table 2).
The metabolic tting was carried out using multivariable linear regression and the SE of each estimator is proposed as an error index to indicate the con dence level on quanti ed metabolic signal from the predicted spectra. For all subjects, SEs of the metabolites were lower than the quanti ed concentrations by an order of 10 −7 , which implies that the predicted spectra were very close to the basis in terms of spectral shape and LW. Even though, we found higher SEs on metabolites with complex spectral lines (Glu, Gln and Glx) than SEs of three singlets (NAA, Cr, tCho), which is in accordance with the distribution of CRLBs (Table 2). A further examination using the correlation shows that SE has higher correlation (r=0.84) with the abs CRLB than that with CRLB (r=0.60) ( gure 8). This indicates that SEs could potentially serve as an error index indicating the con dence levels on the CNN prediction with information close to abs CRLB. For CNN based quanti cation, it is necessary to accumulate more data to set up a quality assurance standard in terms of SE, SBR, LW and SNR in further investigation. This could be helpful for the possible applications of CNN based MRS quanti cation.
One interested issue on the CNN based MRS quanti cation is the possibility to differentiate metabolites with low concentrations from the overlapping metabolites, such as NAAG and GABA. These two metabolites are not quanti able using LCModel ( . Although good GABA prediction has been shown in a previous study 18 , it is reasonable as the results were reported at 9.4T. As GABA is a very important neurotransmitter, a further modi cation of this CNN model can thus be expected to improve the quanti cation of GABA at 3T. The possibility to quantify GABA on the spectra collected with PRESS sequence instead of the spectra editing sequence 30-32 will be a great advantage for the CNN based MRS method. The limitation of this study is on the estimation of MMs on in vivo spectra. Recently, it is suggested that experimental MM model can improve the accuracy and stability on the estimation of MMs for in vivo spectra 25,26 . However, experimental MM models were not available in our site. Therefore, default LCModel setting was adopted, where only MMs estimated in the spectral range less than 3.2 ppm can be acceptable. The MMs at 3.5 to 4.0 ppm were then handled only by the spline baseline tting. Large variations in baseline at this spectral range were reported in the previous reports 5, 25,26 , which can be also seen in our results ( gure 5 and gure S1). The uncertainty in the estimation of MMs can lead to incorrect metabolite-to-MM ratio even we have corrected the contribution of MMs according to their relative amplitude (table S2). Nevertheless, the simulated MMs actually vary in amplitude and metabolite-to-MM ratio also alters in a range in the preparation of simulated spectra. CNN model is therefore trained using the simulated spectra with a wide range of MMs. We think the CNN model used in this study can still deal with in vivo spectra even under some bias on the estimation of metabolite-to-MM ratio. Another issue is that in vivo spectra tted with default LCModel setting have signi cant in uence in the estimation of baselines and MMs, which leads to errors in the quanti ed concentrations 26 , especially for mI, Glu, Gln. Among 7 metabolites shown in gure 7, the low correlation coe cients were indeed found in mI and Gln (0.29 and 0.28). The proper model on MMs and baseline setting in LCModel tting can certainly improve the metabolite quanti cation using LCModel, which should be enrolled in the future study.
In conclusion, we have shown that CNN based method have potential to serve as spectral processing tool as LCModel on the quanti cation of in vivo MRS spectra. The proposed scaling procedure can be integrated with the CNN model to report the metabolite concentrations in IU, which are in the same range as those quanti ed using a routine MRS quanti cation procedures, LCModel. The SE of the estimator can be used as error index indicating predicted uncertainties for metabolites with the information similar to abs CRLB. The ability to isolate GABA and Gln needs further investigation with improved MM models and baseline tting in LCModel.
Firstly, simulated brain spectra (N=50000) were generated by combining 13 simulated metabolites with the concentrations evenly distributed between upper and lower bounds of metabolite concentrations listed in table S1 19,33 . These metabolite-only spectra were used as the ground truth (GT) in the training and testing of CNN. Secondly, MMs were simulated using 13 Gaussian model functions at spectral range of 0.5 ~4.5 ppm 5, 34 . The simulation parameters of MMs are listed in table S2. The MMs was generated by varying relative amplitude and LW of 13 MMs within 10% and 20% range, respectively. MMs were added to the metabolite-only spectra according to metabolite-to-MM ratio, which is determined by the in vivo results of LCModel using default setting ( gure S1). MMs from LCModel accounts for the contribution of MMs at range from 0.90 ppm to 3.11ppm 25 . The total contribution of MM was then corrected for the contribution of MMs in 3.5 ppm to 4.0 ppm. The metabolite-to-MM for all subjects in three brain regions is 2.2~4.2 (3.2±0.58). In the simulation, the MMs were added to the metabolite-only spectra by varying the metabolite-to-MM ratio evenly distributed in the range of 2.2~4.2. Finally, LW broadening was applied by varying time constant of the exponential function such that the LW of the spectra was in the range of 3.5 ~19.1 Hz (7.9±2.9 Hz). Random noise was added to the spectra such that SNR of the spectra was in the range of 17~ 192 (67.2±29.1). The LW of spectra was calculated as full width half maximum of the NAA peak. SNR was calculated in the frequency domain. Signal is de ned as the peak height of NAA without removing baseline and overlapping peaks. Noise is de ned as the standard deviation of spectra in the range of -1.0 ~ -3.0 ppm. The simulated spectra were cropped at spectra range 0.5 ~ 4.2 ppm for the input of CNN. As the simulated spectrum is 2500 Hz (~19.57 ppm), the cropped simulated spectra corresponds to 774 spectra points (4096´3.7/19.57).

Design and evaluation of CNN
The CNN model was constructed in python (v3.6); Keras (v2.2.4) with Tensor ow (v1.13.1, CUDA 10.1) on graphic processing unit (GPU; NVIDIA Titan RTX). In brief, CNN consists of one input layer, three convolution blocks, two max pooling layer, one fully connected layer with linear activation. Each convolution block was composed of 4 repetitions of convolution lter, batch normalization and activation layer 18, 19 . The size of convolution lter is 15x1 with pad size of 7. The number of convolution lter is 32 for 1 st block, 64 for 2 nd block and 128 for 3 rd block. The recti ed linear unit (ReLU) function was used in the convolution blocks as activation. The parameters for CNN structure and training were initially set as the implementation in other studies 19 . We optimized these parameters based on the loss of training and validation. For max pooling, the size and stride is 2x1 and 2. Stochastic gradient decent with momentum (SGDM) algorithm was used for the training with initial learning rate of 0.01 and momentum of 0.8. The loss function was mean square error (MSE) with L2 regularization parameter of 0.001. The schematic of the CNN is shown in gure S2. The batch size is 32 and maximum epoch is 110. The learning rate is reduced on plateau by a decay factor of 0.8 with a minimum of 10 -7 when the validation loss did not improve through 5 epochs. The training time is around 70 minutes.
The evaluation of CNN was performed using 10-fold cross validation. Simulated brain spectra (N=50000) were randomly split into 10 groups. Each group was assigned as a test set (N=5000) and remaining spectra (N=45000) were assigned as validation set (N=4500) and training set (N=40500). The dimension of the input data is 40500 × 774 × 1 (array × height × width) for the training set. The CNN was trained using training set and validation set. Errors between the GT and CNN predicted spectra on the test set were calculated. The performance of CNN model was evaluated by MSE between the predicted spectra and GT, and %MSE is the MSE normalized by mean square values of GT spectra. In addition, the accuracy on the estimation of the metabolic amounts was evaluated using mean absolute error (MAE) and mean absolute percentage error (MAPE) between CNN predicted metabolic amounts and GT.

In vivo Experiments
All procedures performed in this study involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. Before being included in this study, all participants gave their informed consent to undergo a protocol, which was approved by the Research suppressed (NWS) reference scan was acquired using 4 averages as the reference for phase correction, calibration of metabolic signal. The total acquisition time for each subject is around 50 minutes.
Scaling the spectra for CNN model For in vivo data, zero order phase correction was applied. All spectra were resampled to 2500 Hz using linear interpolation in the spectral domain, and then zero-padding to 4096 points in the time domain. The water scaling was performed before the input to CNN. The scaling factor (Fac) for the simulated spectra and in vivo spectra is calculated according to the known concentration of the simulated spectrum of NAA Where S NAA is the area of NAA peak at 2.0 ppm in the basis; Conc water is the concentration of water, which is set to 55556 mM; Conc NAA is the NAA concentration of the basis, which is 50 mM; NH water and NH NAA is the number of protons of water and NAA, which is 2 and 3; S water-invivo is water area of in vivo spectra, which can be calculated from the NWS spectra. In vivo spectra were multiplied by the Fac, and cropped at spectra range 0.5-4.2 ppm (774 spectral points) for the input of CNN. The diagram of water scaling procedures and CNN for in vivo data is illustrated in gure S3.

Quanti cation of metabolite concentrations
The CNN predicted spectra can be taken as linear combination of the spectra in the basis without line broadening, noise and baseline. The amounts of 13 metabolites were solved using multiple linear regression: Where Y is a 774×1 matrix containing CNN predicted spectra; X is a 13×774 matrix containing the spectra of 13 metabolites in the basis; β is a 13×1 matrix representing relative amounts of 13 metabolites. These estimated metabolite amounts, β, were scaled to metabolite concentrations according to known metabolite concentrations in the basis and the standard error (SE) of the estimated β was exported. In vivo spectra were quanti ed by LCModel (http://s-provencher.com/lcmodel.shtml) using the same basis as CNN. Spectra were tted between 0.5 ~ 4.2 ppm. Metabolic signals were calibrated to NWS data using the water scaling method. Partial volume and relaxation correction was applied for quanti ed metabolites from both CNN and LCModel as described in the previous study 1,31 . In brief, tissue probability maps of gray matter, white matter and cerebral spinal uids were generated using the segmentation toolbox provided by SPM8 (www. l.ion.ucl.ac.uk/spm) on T1 images. Corrections for water density in tissues and for relaxation effects were carried out using the published parameters 22 . The metabolite concentrations were reported in IU. Here, the comparison between CNN and LCModel was made on major metabolites typically reported on the spectra collected at clinical eld strength. These metabolites are: tNAA=NAA+NAAG; Cr; tCho= GPC+PCh; mI; Glu; Gln; Glx= Glu+Gln. The metabolites were excluded for the comparison using the Cramer-Rao lower bounds (CRLB) with the following criterion: CRLB>10% for tNAA, Cr, tCho; CRLB > 20% for mI, Glu, Gln, Glx. To further examine the potential implications of SE, SE was correlated with the CRLB and abs CRLB 23,37 , which is calculated as the CRLBs multiplied by the quanti ed metabolite concentrations.
Declarations Figure 1 Histogram of SNR and LW of the simulated spectra for CNN training. The distribution of SNR and LW are in the range of 17~192 (67.2±29.1) and 3.5~19.1 Hz (7.9±2.9 Hz) for all data (N=50000). The SNR and LW of data used for test (N=5000) has similar distribution as all data. Representative spectra used for the training of CNN model. Simulated spectra (simulated) with different SNR, LW and relative metabolic amplitudes, CNN predicted spectra (predicted), Ground truth (GT), difference of CNN predicted spectra and GT were shown. The predicted spectra are all noise free, line narrowing and very close to the GT with observable residues at spectral range of major metabolites. The largest %MSE (0.65%) is in spectra with low SNR and large LW (SNR=32, LW=15.2 Hz) and lowest %MSE (0.16%) is in spectra with good spectral quality (SNR=126, LW=7.9 Hz).

Figure 3
Scatter plots of the predicted metabolite concentrations (pred.) and GT for test data in CNN evaluation.
For each plot, GT line (solid red) and regression line (solid black) with one standard deviation (dash black) were plotted. Pearson's correlation coe cient (r) were calculated for each metabolite. All metabolites have positive correlations. The lowest correlation is found in GABA (r=0.21). Other metabolites have correlations >0.9 except NAAG (r=0.81) and Gln (r=0.73).

Figure 4
Mean absolute percent error (MAPE) of all metabolites in the basis. MAPEs were calculated by the errors between the predicted metabolic amounts and ground truth, where multiple linear regression was used to estimate the predicted metabolite amounts. The MAPEs and the error bars representing the standard deviation were estimated over the simulated brain spectra in the test set (N = 5000).
The tted spline baselines were relatively distorted, with large deviations across volunteers in the range from 3.5 ppm to 4.0 ppm, which is responsible for the tting of residue water and contribution of MM37, MM38 and MM40 in table S2.

Figure 6
Representative in vivo spectra (black) and CNN predicted spectra (red) from (a) DACC, (b) DLPFC, (c) DSTRI. For each region, spectra under different spectral quality in term of SNR and LW were shown. The properties of noise reduction, LW narrowing, baseline removal can be clearly seen. The CNN predicted spectra have LW similar to the spectra in the basis.

Figure 7
Scatter plots of the metabolite concentrations quanti ed by CNN and by LCModel. For each plot, metabolite concentrations in IU were excluded by CRLB thresholds using the criterion: CRLB>10% for tNAA, Cr, tCho; CRLB > 20% for mI, Glu, Gln, Glx. For Gln, 4 subjects were excluded in DSTRI and 2 subjects were excluded in DLPFC. For Glu, 1 subject was excluded in DLPFC. For each plot, regression line (dash gray) was plotted and Pearson's correlation coe cient (r) was calculated. All metabolites have statistically signi cant correlation (p<0.005). Among 7 metabolites, lowest correlation is found in Gln (r=0.21). The concentrations of tNAA, Cr, tCho, mI, Glu from CNN and from LCModel are distributed over similar range.