Prediction of Adolescents' Fluid Intelligence Scores based on Deep Learning with Reconstruction Regularization

Abstract
Objective: The aim of this study was to develop a predictive model for uncorrected (actual) fluid intelligence scores in 9–10-year-old children using T1-weighted magnetic resonance imaging, and to explore the predictive performance of an autoencoder model based on reconstruction regularization for fluid intelligence in adolescents.
Methods: We collected actual fluid intelligence scores and T1-weighted MRIs of 11,534 adolescents who completed baseline tasks from ABCD Data Release 3.0. A total of 148 ROIs were selected and 604 features were derived by FreeSurfer segmentation. The data were divided into training and testing sets at a 7:3 ratio. To predict fluid intelligence scores, we used an autoencoder (AE), a multilayer perceptron (MLP), and classic machine learning models, and compared their performance on the test set. In addition, we explored their performance across gender subpopulations, and we evaluated feature importance using the SHapley Additive exPlanations (SHAP) method.
Results: The proposed model achieved the best performance on the test set for predicting actual fluid intelligence scores (PCC = 0.209 ± 0.02, MSE = 105.212 ± 2.53). The results show that an autoencoder with reconstruction regularization is significantly more effective than MLPs and classical machine learning models. In addition, all models performed better on female adolescents than on male adolescents. Further analysis of the relevant features in the two populations suggested that this may be related to gender differences in the mechanisms underlying fluid intelligence.
Conclusions: We construct a weak but stable correlation between brain structural features and raw fluid intelligence using autoencoders. Future research may need to explore ensemble regression strategies utilizing multiple machine learning algorithms on multimodal data in order to improve the predictive performance of fluid intelligence based on neuroimaging features.


Introduction
The early understanding of children's cognitive development can lead to improved health outcomes throughout adolescence. It is therefore crucial to identify the neural mechanisms underlying general intelligence. Fluid intelligence (Gf) refers to the ability to think logically and solve problems in novel situations, independent of acquired knowledge (Supekar, Swigart et al. 2013). According to general consensus, fluid intelligence peaks in late adolescence and declines thereafter. Its quantification and accurate prediction are therefore essential for adolescents, because fluid intelligence forecasts creative achievement, academic performance, employment prospects, socioeconomic status, and more. Magnetic resonance imaging (MRI) of the brain's structural and functional architecture is a powerful tool for predicting fluid intelligence. The ABCD dataset provides MRI images and data on a large number of adolescent participants with which to accurately predict fluid intelligence scores. In the past, fluid intelligence has been studied in order to identify the mechanisms that underlie cognitive abilities. It has been shown that brain volume is strongly correlated with intelligence, and that this effect can be large (Gignac and Bates 2017). Linking brain structure to function at the neural level and at the level of phenotypic expression is a continuing challenge in neuroscience. Despite new neuroimaging methods, such as functional magnetic resonance imaging (fMRI) and diffusion tensor tractography, it remains challenging to link fundamental structural properties with complex behavioral expressions (Cole, Yarkoni et al. 2012). Researchers have used T1-weighted MRI to correlate brain structure with autism, Alzheimer's disease, and Parkinson's disease (Wang, Wee et al. 2015, Pominova, Kuzina et al. 2019, Sämann, Iglesias et al. 2022). There are, however, few structural MRI studies investigating more subtle differences in routine brain function, despite such differences often being associated with gross differences in brain organization.
Frontoparietal connectivity properties and other brain traits have been associated with fluid intelligence (Pohl, Thompson et al. 2019). Recently, structural information in MRI has been found to correlate with fluid intelligence (Cole, Yarkoni et al. 2012). A machine learning approach for predicting fluid intelligence from brain MRI data is described in (Kao, Zhang et al. 2019). In general, fluid intelligence scores are predicted using existing computer-aided tools and a machine learning model trained on the extracted features (Pölsterl, Gutiérrez-Becker et al. 2019). In recent years, deep learning methods have emerged as state-of-the-art solutions to many problems across multiple domains, including natural language processing, bioinformatics, and medical imaging (Ahmad, Eckert et al. 2018, Rajkomar, Dean et al. 2019). As a deep learning model, autoencoders (Kramer 1991, Yang, Xu et al. 2021) learn efficient encodings of inputs and have become effective feature extraction tools. In this study, we propose an autoencoder approach to predict fluid intelligence scores of adolescents by analyzing ROI shape features.

Materials and Methods
Figure 1 illustrates our pipeline for predicting fluid intelligence from T1-weighted MRI scans. The scans were performed in accordance with the protocol of the Adolescent Brain Cognitive Development (ABCD) study. To quantify brain morphometry, FreeSurfer segmentation assigns one of 74 neuroanatomical labels (Fischl, Salat et al. 2002) to each voxel, yielding volumetric information for tissue classes including the ventricles and subcortical grey and white matter structures. We used the mean value of each region of interest (ROI) defined by the Destrieux atlas (Desikan, Ségonne et al. 2006), which divides the cortex into a total of 148 ROIs across the left and right hemispheres, with one metric each for the left hemisphere, right hemisphere, and whole brain. We therefore extracted a total of 604 volume, area, depth, and thickness measurements, all produced by FreeSurfer. These brain measurements form the input to our deep autoencoders. Finally, we evaluate the feature importance of the final model using the recently proposed SHapley Additive exPlanations (SHAP) method (Lundberg and Lee 2017).

Dataset and participants
In this study, data were obtained from ABCD Data Release 3.0 (https://nda.nih.gov/abcd). More than 11,000 adolescents aged 9-10 from 21 research centers across the country participated in the ABCD Study, a longitudinal study of brain and behavioral development and child health in the United States (Fan, Marshall et al. 2021). Written and oral informed consent was obtained from parents and children, respectively. Further information is available on the ABCD website (https://abcdstudy.org) and elsewhere. The neuroimaging and demographic data of 11,534 adolescents were screened according to whether they completed the baseline tasks. Fluid intelligence scores recorded in the ABCD study were measured using the NIH Toolbox Neurocognition battery.

Data preprocessing
All measurements were normalized while accounting for outliers by subtracting the median and dividing by the range between the 5th and 95th percentiles. In this way, we reduce the impact of outliers while still obtaining approximately centered features of equal scale. All samples were then divided into a training set and a test set at a 7:3 ratio. Lastly, the fluid intelligence scores from the training data were standardized to zero mean and unit variance; the same transformation was applied to the validation and test data.
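A minimal sketch of this preprocessing, assuming the 604 FreeSurfer features sit in a NumPy array `X` and the scores in `y` (function names are ours; here the scaling statistics are estimated on the training split, a common variant of the procedure described above):

```python
import numpy as np

def robust_scale(X_train, X_test):
    """Subtract the median and divide by the 5th-95th percentile range."""
    med = np.median(X_train, axis=0)
    p5, p95 = np.percentile(X_train, [5, 95], axis=0)
    scale = np.where(p95 - p5 == 0, 1.0, p95 - p5)  # guard against constant features
    return (X_train - med) / scale, (X_test - med) / scale

def standardize_targets(y_train, y_test):
    """Zero-mean, unit-variance targets, using training-set statistics."""
    mu, sigma = y_train.mean(), y_train.std()
    return (y_train - mu) / sigma, (y_test - mu) / sigma

# 7:3 train/test split on stand-in data with the paper's dimensions
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 604))        # stand-in for the 604 morphological features
y = rng.normal(91.6, 10.6, size=100)   # stand-in for fluid intelligence scores
idx = rng.permutation(len(X))
cut = int(0.7 * len(X))
tr, te = idx[:cut], idx[cut:]
X_tr, X_te = robust_scale(X[tr], X[te])
y_tr, y_te = standardize_targets(y[tr], y[te])
```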

Model
The autoencoder consists of two components: an encoder and a decoder. Let X ⊆ R^m be the space of decoded messages and Z ⊆ R^n be the space of encoded messages, where m and n are the dimensions of the input feature vector and the hidden vector, respectively. The encoder and decoder are denoted E_φ: X → Z and D_θ: Z → X, where φ and θ are the parameters to be learned. For any x ∈ X, we write z = E_φ(x) for the encoding, and for any z ∈ Z, x̂ = D_θ(z) is the decoded information. As described in Fig. 2, both the encoder and decoder are modeled as multilayer perceptrons. Additionally, we established an additional regression branch for predicting fluid intelligence scores. This branch takes the encoded hidden vector z as input and outputs the predicted fluid intelligence score. We denote this branch network as R_ψ, where ψ is the parameter to be learned and ŷ = R_ψ(z) is the predicted value. The motivation for our approach is that optimizing the reconstruction of the input features yields an effective representation of the original features, i.e., the informative signal that remains after redundant information has been eliminated. This remaining information is then used to predict fluid intelligence scores.
To prevent the model from overfitting, we imposed an L2-norm regularization term on all parameters. It is worth noting that D_θ and R_ψ share the output of the encoder. According to the theory of multi-task machine learning, jointly learning multiple related tasks with a shared hidden-layer representation lets each task help the others learn better feature representations, thereby improving performance on each individual task. Finally, the objective function of the whole model is

L(θ, φ, ψ) = L_MSE(x, D_θ(E_φ(x))) + α · L_MSE(y, R_ψ(E_φ(x))) + β · (‖θ‖₂² + ‖φ‖₂² + ‖ψ‖₂²),   (1)

where L_MSE is the mean squared error (MSE) loss function, and α and β are hyperparameters that control the weight of the regression loss and the degree of regularization.
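The architecture and objective above can be sketched in PyTorch as follows. This is an illustrative sketch, not the paper's exact configuration: the layer sizes, activation, and class name `RegAE` are our assumptions (the paper's layer layout is given in Fig. 2).

```python
import torch
import torch.nn as nn

class RegAE(nn.Module):
    """Autoencoder with a regression head that shares the encoder output."""
    def __init__(self, in_dim=604, hid_dim=100):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())
        self.decoder = nn.Linear(hid_dim, in_dim)
        self.regressor = nn.Linear(hid_dim, 1)

    def forward(self, x):
        z = self.encoder(x)                      # shared hidden representation
        return self.decoder(z), self.regressor(z).squeeze(-1)

def objective(model, x, y, alpha=0.1, beta=0.1):
    """Reconstruction MSE + alpha * regression MSE + beta * L2 penalty."""
    x_hat, y_hat = model(x)
    mse = nn.functional.mse_loss
    l2 = sum(p.pow(2).sum() for p in model.parameters())
    return mse(x_hat, x) + alpha * mse(y_hat, y) + beta * l2
```

In practice the L2 term is often applied via the optimizer's `weight_decay` argument; it is written out explicitly here only to mirror the objective function above.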

Feature importance
Although deep learning models exhibit the potential to solve a wide range of prediction tasks, their black-box nature often prohibits their application in clinical settings (Ribeiro, Singh et al. 2016, Ahmad, Eckert et al. 2018). To alleviate the problem of black-box predictions, we applied the SHapley Additive exPlanations (SHAP) algorithm to our predictive models. A Shapley value, a concept from cooperative game theory, is a model-independent way of expressing the influence of features on a particular prediction. Based on the current set of feature values, the Shapley value illustrates how an individual feature contributes to the difference between the actual and average predictions (Winter 2002, Lundberg, Nair et al. 2018).

Models for comparison
Several classical machine learning methods have been employed to predict continuous variables from a set of features, including support vector regression, random forest, XGBoost, etc. In some previous studies (Srivastava, Eitel et al. 2019, Tamez-Pena, Orozco et al. 2019, Vang, Cao et al. 2019), these machine learning models were used to predict fluid intelligence scores from manually extracted brain morphological features. We compared these models with ours for predicting fluid intelligence scores. A multilayer perceptron is also used as a baseline model, to verify the effectiveness of using the reconstruction loss of the autoencoder as a constraint when predicting fluid intelligence scores.

Experiment settings
As a result of the random division into training and test sets, 8073 subjects were assigned to the training set and 3461 subjects to the test set. To calculate 95% confidence intervals, 100 bootstrapped samples were used to estimate model performance on the test set. The number of neurons in the hidden layer of the autoencoder was pre-set to 50, 100, 200, or 400. Grid search showed that a model with 100 neurons performed best on the validation set, so we set the number of hidden-layer neurons to 100. We initialized the learning rate to 0.1 and decayed it to 0.99 of its previous value every 5 epochs. Based on pre-experiments, we found that setting the hyperparameters α and β in the objective function to 0.1 produced the best validation-set performance. Except for the decoder, the structure of the multilayer perceptron is the same as that of the autoencoder. During training, we adopted early stopping, where the stopping point is the first time the validation set's MSE increases. The AE and MLP are implemented on the PyTorch platform; the classic machine learning algorithms in this research are implemented using the scikit-learn library.
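The early-stopping rule described above (stop the first time the validation MSE rises) can be sketched framework-agnostically; `step` and `val_mse` are hypothetical callables standing in for one training epoch and a validation pass:

```python
def train_with_early_stopping(step, val_mse, max_epochs=1000):
    """Run `step()` once per epoch; stop the first time validation MSE rises.

    Returns (number of epochs kept, best validation MSE).
    """
    best = float("inf")
    for epoch in range(1, max_epochs + 1):
        step()
        mse = val_mse()
        if mse > best:          # first increase -> stop, keep previous model
            return epoch - 1, best
        best = mse
    return max_epochs, best
```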

Evaluation metrics
To evaluate the performance of the AE and MLP, we use MSE as an evaluation metric. In statistics, the MSE is defined as the mean of the squared differences between the predicted and true values. It is calculated by Eq. 2:

MSE = (1/N) Σᵢ₌₁ᴺ (yᵢ − ŷᵢ)²,   (2)

where N is the total number of subjects, yᵢ is the real intelligence score, and ŷᵢ is the score predicted by the model. Meanwhile, we use the Pearson correlation coefficient (PCC) to measure how close the predicted values are to the actual values. It has been widely used in previous research and is considered superior to MSE for the evaluation of model performance. In statistics, the PCC represents the linear correlation between two variables, with values in [−1, 1]. It is calculated by Eq. 3:

PCC = Σᵢ₌₁ᴺ (yᵢ − ȳ)(ŷᵢ − ŷ̄) / ( √(Σᵢ₌₁ᴺ (yᵢ − ȳ)²) · √(Σᵢ₌₁ᴺ (ŷᵢ − ŷ̄)²) ),   (3)

where yᵢ and ŷᵢ represent the true and predicted values, respectively, and ȳ and ŷ̄ represent their sample means.
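Both metrics reduce to a few lines of NumPy (function names are ours):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between true and predicted scores (Eq. 2)."""
    y, yh = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y - yh) ** 2))

def pcc(y_true, y_pred):
    """Pearson correlation coefficient between true and predicted scores (Eq. 3)."""
    y, yh = np.asarray(y_true, float), np.asarray(y_pred, float)
    yc, yhc = y - y.mean(), yh - yh.mean()
    return float(np.sum(yc * yhc) / np.sqrt(np.sum(yc ** 2) * np.sum(yhc ** 2)))
```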
To compare the PCC of the models between subjects of different genders, we used a two-independent-samples t test for the difference in PCC. The statistic is calculated as shown in Eq. 4:

t = (r₁ − r₂) / √(s₁²/n₁ + s₂²/n₂),   (4)

where r₁ and r₂ represent the Pearson correlation coefficients between the predicted and true values of the model in the two gender groups, s₁ and s₂ represent their standard deviations, and n₁ and n₂ represent the number of people of each gender.
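The statistic of Eq. 4 is a one-liner; the function name is ours, and the symbols follow the equation above:

```python
import math

def pcc_diff_t(r1, s1, n1, r2, s2, n2):
    """Two-independent-samples t statistic for a difference in PCC (Eq. 4)."""
    return (r1 - r2) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
```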

Basic information on the study population
The demographics of the 11,534 adolescents we studied are summarized in Table 1. There were 5504 males (47.7%) and 6030 females (52.3%). The mean age of the interviewed subjects was 130.47 ± 14.3 months. Across all samples, the mean uncorrected fluid intelligence score was 91.6 ± 10.62. The demographic characteristics of the training and test samples are balanced. Figure 3 shows the distribution of uncorrected fluid intelligence scores among subjects of different genders. As shown in Table 2, our method achieves the best performance on the test set (MSE = 105.212 ± 2.53, PCC = 0.209 ± 0.02). In comparison with MLP, LR, RF, SVR, and Xgboost, our method improves MSE scores by 0.016, 0.041, 0.37, 0.04, and 0.085, respectively. The results show that our proposed method is significantly superior to classical machine learning models in predicting fluid intelligence scores. We then tested the performance of each model on the male and female adolescent populations. Figure 5 illustrates the PCC of the AE and MLP, while Fig. 6 shows the PCC of the classical machine learning models.
According to the results, all models performed better on female adolescents than on male adolescents.

Feature importance
To better understand which features drive the predictions, we examined the feature importance of each individual model. For the AE and MLP separately, we calculated the SHAP value of each feature for each sample in the training set, took the absolute value of each SHAP value, and averaged it over the population. We then sorted the features by their mean absolute SHAP value in descending order. Table 3 summarizes the top 10 features and their mean absolute SHAP values for the AE and MLP. Among the features listed in Table 3, the sulcal depth of the right hemisphere temporal pole, the volume of the left hemisphere temporal pole, the sulcal depth of the left hemisphere middle occipital gyrus, and the volume of the left hemisphere middle temporal gyrus are jointly selected by the AE and MLP. The AE model gave the highest priority to the sulcal depth of the right hemisphere temporal pole, where an increase in depth correlated with an increase in fluid intelligence scores. Figure 7 shows in more detail the top 20 features selected by the AE model for predicting fluid intelligence. Figure 8 shows the relationship between the values of these 20 features and their SHAP values. Not only is the sulcal depth of the right hemisphere temporal pole positively correlated with the fluid intelligence score, but the volume of the left hemisphere temporal pole and the volume of the left hemisphere anterior transverse collateral sulcus are also positively correlated with it.
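Given a matrix of per-sample SHAP values (e.g. as returned by the `shap` package, one row per sample and one column per feature), the ranking described above reduces to a mean of absolute values per column. A sketch with illustrative names:

```python
import numpy as np

def rank_features(shap_values, feature_names, top_k=10):
    """Return the top_k (feature, mean |SHAP|) pairs, largest first."""
    importance = np.mean(np.abs(shap_values), axis=0)   # average |SHAP| per feature
    order = np.argsort(importance)[::-1][:top_k]
    return [(feature_names[i], float(importance[i])) for i in order]
```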

Discussion
Fluid intelligence has been an established metric in psychology and education research since the early 1970s (Jensen 1974). In standard Gf assessments, multiple-choice questions are administered nonverbally (Raven 1983). Some studies have explored how functional MRI can predict the Gf score, but no study has examined whether structural T1-weighted MRI can predict it. Several studies have used 3D MRI images directly to predict fluid intelligence; however, these methods have higher prediction variance than classical machine learning models based on manual features. Using a CNN directly to predict the fluid intelligence score may require a greater number of samples during the training phase (Li, Jiang et al. 2022). As a new approach to fluid intelligence score prediction, we propose to investigate whether individual brain anatomical properties, as revealed by T1-weighted MRI, can be used to predict fluid intelligence scores. Our study evaluates the performance of two deep learning models in predicting Gf in order to investigate this hypothesis.
Our motivation for using an AE to predict fluid intelligence scores is its ability to compress high-dimensional features. Furthermore, because neural networks can approximate nonlinear functions, the regression branch in our proposed model can learn the nonlinear relationship between the morphological features of the cortex and fluid intelligence scores. The core contribution of this paper is to use the AE to reconstruct the features and to use the reconstruction loss as an additional constraint that assists the prediction of the regression branch. Results from this approach appear to be successful. To avoid overfitting in MLPs, regularization terms based on the L1 and L2 norms are often used in previous studies. Moreover, dropout layers have also been demonstrated to be effective in preventing overfitting.
Unfortunately, these methods cannot make the model learn the relationships between features. As can be seen from Fig. 4, the morphological features of the brain are correlated: the Pearson correlation coefficient exceeds 0.3 for 12.4% of the feature pairs. These correlations have a strong impact on the extrapolation of the model. Neural networks extract features adaptively, and what they extract is determined by the objective function. In an MLP, the model only needs to capture the relationship between the features and the fluid intelligence scores when adaptively extracting features. In the AE, however, the model must also take the correlations between features into account. This additional constraint allows the model to generalize more effectively.
Our experimental results demonstrate this: the AE outperforms the MLP on the test set.
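The feature-pair correlation statistic quoted above (the fraction of pairs whose PCC exceeds a threshold) can be computed from the correlation matrix; a sketch with our own naming:

```python
import numpy as np

def frac_correlated_pairs(X, threshold=0.3):
    """Fraction of distinct feature pairs whose Pearson correlation exceeds threshold."""
    corr = np.corrcoef(X, rowvar=False)
    iu = np.triu_indices_from(corr, k=1)   # each unordered pair counted once
    return float(np.mean(corr[iu] > threshold))
```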
Based on previous observations of sex differences in the growth and maturation of different brain structures in children and adolescents (Sowell, Trauner et al. 2002, Giedd and Rapoport 2010), we paid special attention to differences in the predictive power of the model between the sexes. According to the t test, whether for the AE or the MLP, the PCC of the model's predictions was always better for female adolescents than for male adolescents (P < 0.001, P < 0.001). This result is consistent with previous studies (Ranjbar, Singleton et al. 2019). From Fig. 3, the distributions of fluid intelligence scores across genders are roughly the same. We therefore reasoned that the significant difference in PCC between the sexes may stem from sex differences in the mechanisms underlying fluid intelligence during adolescent brain development, and we further analyzed the feature importance of the AE model in each gender group.
Indeed, Table 4 shows that there are differences in the distribution of important features selected by the AE model across gender groups. The sulcal depth of the right hemisphere temporal pole, the volume of the left hemisphere temporal pole, and the volume of the left hemisphere anterior transverse collateral sulcus are the top three features for both genders. From the fourth place onward, however, the distribution differs between the sexes. Differences in the ranking of feature importance in developing brains may be due to sex differences in the mechanisms underlying fluid intelligence. A further study is needed to assess the radiographic predictors of Gf in developing young males and females. Furthermore, we note that the variance of the predicted values is smaller than that of the true values, indicating that the distribution of the predicted values is significantly narrower than that of the true values. In our analysis, the reason for this phenomenon is that these characteristics can only explain part of the variation in fluid intelligence, so the predictions cluster tightly together. Clearly, more information is needed to better understand the relationship between structural imaging and fluid intelligence.

Conclusion
We constructed a weak but stable correlation between brain structural features and raw fluid intelligence using autoencoders. The results of our analysis indicate that predicting fluid intelligence solely on the basis of these morphological characteristics is challenging. Despite this, our results indicate that incorporating a reconstruction loss into deep learning models can improve their generalization when predicting fluid intelligence scores. In order to improve the predictive performance of fluid intelligence based on neuroimaging features, future research should explore ensemble regression strategies utilizing multiple machine learning algorithms. Additionally, we discovered that the distribution of feature importance for predicting fluid intelligence scores in adolescent populations differed across genders. This evidence suggests a need for further investigation of the relationship between structural imaging and fluid intelligence.

Figure 3
Distribution of uncorrected fluid intelligence scores among subjects of different genders.

Figure 7
Top 20 features selected by the AE model for predicting fluid intelligence.

Table 1
Summary of Demographic Characteristics of the Sample.

Table 2
Performance of Different Models on the Training and Test Sets. AE: Autoencoder; MLP: Multilayer Perceptron; LR: Logistic Regression; RF: Random Forest; SVR: Support Vector Regression; Xgboost: Extreme Gradient Boosting.

Table 3
The Mean Absolute SHAP of the Top 10 Features Selected by AE and MLP.

Table 4
The Mean Absolute SHAP of the Top 10 Features Selected by AE in Subjects of Different Genders.