**Network definition.** To test whether rs-FC can predict an individual’s magnitude of self-prioritization, we asked 348 participants to complete the perceptual matching task (see Fig. 1a) outside the scanner. Briefly, the participants were instructed to associate themselves and a stranger with different shapes (e.g., “a circle represents you, and a square represents a stranger”) and then to repeatedly judge whether the label-shape combination displayed on the screen was correct. Each participant’s tendency for self-prioritization was quantified as the difference in reaction time between the stranger- and self-matching conditions 32.

Fig. 1b depicts the processing pipeline. The whole-brain functional network nodes were defined using the Human Brainnetome Atlas 33, which consists of 210 cortical and 36 subcortical regions of interest (ROIs). This fine-grained atlas reliably integrates information from anatomical and functional connections and has been used in many neuroimaging studies to predict individual differences 34. For each participant, time series within each node were computed by averaging the blood oxygenation level-dependent (BOLD) signal over all voxels in the ROIs. We then calculated the Pearson correlation for each pair of the 246 resulting time series to obtain a 246 × 246 symmetric FC matrix for each participant. Each element in the matrix characterizes the FC strength between two nodes.
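The node-wise correlation step reduces to a few lines once the ROI-mean time series are extracted. A minimal sketch, with random data standing in for real preprocessed BOLD signals:

```python
import numpy as np

def fc_matrix(roi_timeseries):
    """Pearson-correlate every pair of ROI time series.

    roi_timeseries: (n_timepoints, n_rois) array; each column is the mean
    BOLD signal over all voxels of one ROI.
    Returns an (n_rois, n_rois) symmetric FC matrix.
    """
    return np.corrcoef(roi_timeseries.T)  # corrcoef expects variables in rows

# Toy data: 200 timepoints x 246 ROIs (random stand-in for real BOLD data)
rng = np.random.default_rng(0)
fc = fc_matrix(rng.standard_normal((200, 246)))
```

The resulting matrix is symmetric with a unit diagonal, so only its upper triangle carries unique information, which motivates the edge vectorization used later.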

However, when combining neuroimaging and behavioral data in machine learning algorithms, an inescapable challenge is the curse of dimensionality: the number of features far exceeds the number of samples 35. The many unique features in the FC matrix therefore entail a feature-selection step before model building 25, 36. To identify potentially predictive features, we computed the Pearson correlation between each feature (i.e., edge) and the SPE score (based on the training set only; see Methods for details). A commonly used threshold (*p* < 0.01) was then applied to remove noisy edges and retain those significantly correlated with behavior 25. As a result, the number of selected features varied from 317 to 383 across iterations, representing less than 2% of the 30,135 unique edges defined by the atlas.
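A sketch of this selection step, assuming the FC matrices have already been vectorized into their unique upper-triangle edges (variable names are illustrative, not from the paper):

```python
import numpy as np
from scipy import stats

iu = np.triu_indices(246, k=1)          # 246 * 245 / 2 = 30,135 unique edges

def select_edges(edges_train, y_train, p_thresh=0.01):
    """Keep edges whose Pearson correlation with the behavioral score is
    significant at p < p_thresh, computed on the training set only.

    edges_train: (n_subjects, n_edges) vectorized FC values
    y_train:     (n_subjects,) SPE scores
    """
    keep = np.zeros(edges_train.shape[1], dtype=bool)
    for j in range(edges_train.shape[1]):
        _, p = stats.pearsonr(edges_train[:, j], y_train)
        keep[j] = p < p_thresh
    return keep

# Toy check: one truly predictive edge among 20 (synthetic, not real data)
rng = np.random.default_rng(1)
E = rng.standard_normal((50, 20))
y = E[:, 0] + 0.1 * rng.standard_normal(50)
mask = select_edges(E, y)
```

Restricting the correlation to the training fold is what keeps the selection from leaking information about the held-out subject.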

**Predictability of SPE.** The selected features were subsequently fed into a linear SVR model. Through leave-one-out cross-validation (LOOCV), we obtained a predicted SPE value for each participant. Notably, as suggested by previous studies 37, we also employed an inner loop of 5-fold cross-validation to tune the hyperparameters of the linear SVR and avoid data leakage. Overall, in the unseen data (i.e., data not used to train the model), the true SPE values were significantly correlated with the predicted values (*r* = 0.41, *p* < 0.001; Fig. 1c), indicating that our linear SVR model successfully predicted individuals’ SPE.
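The nested scheme can be sketched with scikit-learn as follows. This is a simplified illustration: the hyperparameter grid is an assumption (the paper’s actual settings are not specified here), and in the full pipeline feature selection would also be refit inside each outer training fold:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, LeaveOneOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVR

def loocv_predict(X, y, c_grid=(0.01, 0.1, 1.0, 10.0)):
    """Outer LOOCV; an inner 5-fold CV tunes C on each training fold only,
    so the held-out subject never influences model selection."""
    preds = np.empty(len(y))
    for train, test in LeaveOneOut().split(X):
        inner = GridSearchCV(
            make_pipeline(StandardScaler(), LinearSVR(max_iter=10000)),
            param_grid={"linearsvr__C": list(c_grid)},
            cv=5,
        )
        inner.fit(X[train], y[train])
        preds[test] = inner.predict(X[test])
    return preds

# Toy demonstration on synthetic linear data (30 "subjects", 5 features)
rng = np.random.default_rng(2)
X = rng.standard_normal((30, 5))
y = 2 * X[:, 0] + 0.1 * rng.standard_normal(30)
r = np.corrcoef(loocv_predict(X, y), y)[0, 1]
```

Model performance is then the correlation between the stacked held-out predictions and the true scores, as reported above.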

A common pitfall in machine learning is that cross-validation folds are not independent of one another; the best practice for assessing the significance of model performance is therefore permutation testing 35. Accordingly, we randomly shuffled the phenotype (i.e., the SPE score) and reran the prediction pipeline 1000 times to obtain a null distribution of model performance. The *p*-value was calculated as (1 + the number of *r*-values in the null distribution greater than or equal to the *r*-value obtained from the unshuffled data)/1001. The permutation test confirmed the significance of the predictive model (permutation *p* < 0.001).
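The *p*-value computation is simple enough to state directly; a sketch matching the formula above:

```python
import numpy as np

def permutation_p(observed_r, null_rs):
    """p = (1 + #{null r >= observed r}) / (n_permutations + 1)."""
    null_rs = np.asarray(null_rs)
    return (1 + np.sum(null_rs >= observed_r)) / (null_rs.size + 1)
```

Adding 1 to both numerator and denominator counts the unshuffled result as one member of the null distribution, so with 1000 shuffles the smallest attainable *p* is 1/1001 ≈ 0.001.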

**Model robustness and specificity.** Previous studies have suggested that the predictive power of models may be affected by confounding factors 35; in the worst case, a model predicts not the desired phenotype but confounders such as demographic variables or head movement. To verify the robustness of the identified network to these factors, we performed several control analyses. First, instead of using Pearson’s correlation, we selected the most relevant features based on partial correlation coefficients that controlled for age, gender, or head movement (defined as mean framewise displacement). The resulting networks still significantly predicted participants’ SPE scores (*r*s > 0.25, permutation *p*s < 0.001). Second, before feature selection, we removed any edges associated with age or head movement or that differed between genders (*p* < 0.01). The networks built from the remaining edges still significantly predicted the SPE (*r*s > 0.31, permutation *p*s < 0.001). Third, we constructed a model using only these covariates as features and compared its performance with that of the main model. The covariate-only model could not predict participants’ SPE scores (*r* = -0.16, permutation *p* = 0.99), and Steiger’s Z test showed that its performance was significantly worse than that of the main model (*z* = 7.78, *p* < 0.001).
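The partial-correlation variant of feature selection can be sketched by residualizing both the edge and the behavioral score on the covariates before correlating; the OLS-residual form below is equivalent to the partial correlation coefficient (variable names are illustrative):

```python
import numpy as np
from scipy import stats

def partial_corr(x, y, covars):
    """Pearson r between x and y after regressing out the covariate
    columns (e.g. age, gender, mean framewise displacement) from both."""
    Z = np.column_stack([np.ones(len(x)), covars])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return stats.pearsonr(rx, ry)[0]

# Toy check: y = x + 3z; controlling for z recovers the pure x-y relation
rng = np.random.default_rng(3)
z = rng.standard_normal(200)
x = rng.standard_normal(200)
y = x + 3 * z
r_partial = partial_corr(x, y, z.reshape(-1, 1))
```

Ranking edges by this statistic instead of the raw Pearson r is what prevents confound-driven edges from dominating the selected feature set.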

Another consideration for robustness is the choice of algorithm. Although SVR is one of the most widely used algorithms 38, we replaced it with several other common linear models to further test robustness; the feature-selection and cross-validation (CV) strategies were kept the same. The predictions of the least absolute shrinkage and selection operator (LASSO) and ridge regression remained significant after replacing the linear SVR (*r* = 0.21 and 0.23, respectively, permutation *p*s < 0.001), whereas ordinary least squares (OLS) regression failed to predict individuals’ SPE scores (*r* = 0.08, permutation *p* = 0.07). The poor performance of OLS can be attributed to overfitting caused by its lack of regularization 39.

Finally, to examine the specificity of the prediction model, we used the identified consensus network to predict participants’ self-construal 40, which is associated with the SPE 41. Although the consensus network successfully predicted SPE scores, it could not significantly predict participants’ independent self (*r* = -0.16, permutation *p* > 0.99) or interdependent self (*r* = 0.04, permutation *p* = 0.14). These results demonstrate that the consensus network is specific to the SPE, a behavior-level self-bias, and does not generalize to higher-level self-reflections such as self-construals. Interestingly, a previous study demonstrated that the interdependent self has a stronger impact on individuals’ self-prioritization than the independent self 41. This is consistent with our results, in which the interdependent self was predicted slightly better than the independent self (*z* = 2.70, *p* < 0.01).
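Steiger’s test, used here and in the control analyses above, compares two dependent correlations that share one variable (the observed score versus two sets of predictions). One common formulation, following Steiger (1980); the exact variant used in the paper should be treated as unspecified:

```python
import numpy as np

def steiger_z(r12, r13, r23, n):
    """Z statistic for H0: rho12 == rho13, where variables 2 and 3 are
    themselves correlated r23 (Steiger's 1980 Z-bar test, one shared
    variable). Fisher z-transforms the two correlations, then scales
    their difference by a term accounting for the r23 dependence."""
    z12, z13 = np.arctanh(r12), np.arctanh(r13)
    rbar = (r12 + r13) / 2
    psi = r23 * (1 - 2 * rbar**2) - 0.5 * rbar**2 * (1 - 2 * rbar**2 - r23**2)
    cbar = psi / (1 - rbar**2) ** 2
    return (z12 - z13) * np.sqrt((n - 3) / (2 - 2 * cbar))
```

The two-tailed *p*-value then follows from the standard normal distribution of the returned Z.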

Overall, we demonstrated that the identified networks were SPE-specific and that the trained model was robust. The predictions were not confounded by covariates such as head movements and could be achieved with different algorithms.

**Contributing networks to prediction.** As described above, each iteration of LOOCV may select slightly different features; we therefore first selected the edges that appeared in all iterations to construct a consensus network (Fig. 1d) for interpretation 25. Consistent with previous prediction studies 23, 36, the highest-degree nodes (i.e., nodes with the most connections) in the consensus network were widely distributed across the brain, including the frontal, occipital, parietal, and temporal lobes, as well as subcortical regions (see Table S1); most of these nodes were located in the right hemisphere. To better characterize the neural substrates of self-prioritization prediction, we leveraged the weights assigned to each feature by the SVR model. As recommended by Haufe *et al.* 42, we first transformed the raw coefficients into interpretable activation patterns (see Methods for details). We then summarized the resulting connectivity patterns in two complementary ways.
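For a single-output linear model, the Haufe transformation reduces to multiplying the weight vector by the feature covariance and rescaling by the variance of the model output. A minimal sketch of that single-target case:

```python
import numpy as np

def haufe_pattern(X, w):
    """Convert backward-model weights w into a forward activation pattern:
    a = Cov(X) w / Var(X w)  (Haufe et al., 2014, single-target case)."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(X) - 1)
    s = Xc @ w                       # the model's latent output per subject
    return cov @ w / s.var(ddof=1)

# For (approximately) uncorrelated unit-variance features, the activation
# pattern is simply a rescaled copy of the weights
rng = np.random.default_rng(5)
X = rng.standard_normal((5000, 10))
w = rng.standard_normal(10)
a = haufe_pattern(X, w)
```

The point of the transform is interpretability: unlike raw SVR coefficients, the sign and magnitude of each activation value can be read as how that edge covaries with the predicted score.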

First, we grouped the 246 nodes into 24 macroscale brain regions anatomically defined by the Brainnetome Atlas 33. The contribution of connectivity for each pair of macroscale regions was characterized as the sum of the activation patterns of all edges connecting them 43. As shown in Fig. 2, the FCs between the thalamus and the parahippocampal gyrus (PhG), the lateral occipital cortex and superior temporal gyrus (STG), the thalamus and insular gyrus, and the superior frontal gyrus (SFG) and inferior frontal gyrus (IFG) were the primary predictors of stronger self-prioritization. Conversely, the FCs between the PhG and middle temporal gyrus, the PhG and middle frontal gyrus, and within the IFG predicted weaker self-prioritization.
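Aggregating edge-level patterns into region pairs is a bookkeeping step; a sketch assuming each selected edge carries one activation value and each node has a macroscale-region label:

```python
import numpy as np

def region_pair_contributions(pattern, edges, node_region, n_regions):
    """Sum edge activation values into a symmetric region-by-region matrix.

    pattern:     (n_edges,) activation value of each selected edge
    edges:       (n_edges, 2) node indices of each edge
    node_region: (n_nodes,) region label (0..n_regions-1) for each node
    """
    M = np.zeros((n_regions, n_regions))
    for a, (i, j) in zip(pattern, edges):
        ri, rj = sorted((node_region[i], node_region[j]))
        M[ri, rj] += a
    return M + np.triu(M, k=1).T    # mirror upper triangle; keep diagonal

# Toy example: 3 nodes in 2 regions, two selected edges
M = region_pair_contributions(
    np.array([1.0, 2.0]),           # activation values
    np.array([[0, 1], [0, 2]]),     # edge (0,1) within region 0; (0,2) across
    np.array([0, 0, 1]),            # node-to-region labels
    n_regions=2,
)
```

Diagonal entries capture within-region connectivity (as for the within-IFG edges above), off-diagonal entries capture between-region pairs.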

Second, we regrouped the nodes into canonical networks based on the Yeo 17-network parcellation 44; the mapping between nodes and networks was defined by the atlas development team 45, 46. For clarity, we merged related sub-networks among the 17 networks 47 and assigned the subcortical regions, which the parcellation does not cover, to a subcortical network. For the resulting nine canonical networks, we evaluated the contribution of within- and between-network connectivity analogously to the macroscale-region analyses (Fig. 3a). The results demonstrated the predictive power of connectivity between the dorsal attention network (DAN) and somatomotor network (SMN), the DMN and salience network (SN), and the DMN and DAN.

To assess the role of each canonical network individually, we further performed a computational lesion analysis (see Fig. 3b): lesioning a canonical network means erasing the signals of all its constituent nodes and rerunning the pipeline on the resulting FC matrix 25. The model could still significantly predict individuals’ SPE scores without any single canonical network, consistent with findings in other fields that machine learning models tend to utilize whole-brain signals 48. Nevertheless, Steiger’s Z test identified several canonical networks whose lesioning caused significant degradation in model performance, including the subcortical network (*z* = 5.74, *p* < 0.001), limbic network (*z* = 3.30, *p* < 0.001), SMN (*z* = 4.15, *p* < 0.001), DMN (*z* = 2.85, *p* < 0.01), DAN (*z* = 2.20, *p* = 0.03), and SN (*z* = 2.35, *p* = 0.02).
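A computational lesion amounts to deleting the lesioned nodes’ rows and columns from each FC matrix, removing every edge that touches the network, before rerunning the pipeline. A sketch (the node indices are illustrative, not the atlas’s actual network assignment):

```python
import numpy as np

def lesion_fc(fc, lesion_nodes):
    """Drop all rows/columns belonging to the lesioned network's nodes,
    i.e. every edge touching them, and return the reduced FC matrix."""
    keep = np.ones(fc.shape[0], dtype=bool)
    keep[list(lesion_nodes)] = False
    return fc[np.ix_(keep, keep)]

# Lesioning a hypothetical 36-node subcortical block from a 246-node matrix
reduced = lesion_fc(np.eye(246), range(210, 246))
```

Feature selection, training, and permutation testing are then repeated on the reduced matrices, and the lesioned model’s performance is compared with the full model via Steiger’s Z.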