Patients’ characteristics
A total of 1,161 patients with early-stage invasive breast cancer were retrospectively recruited, of whom 1,088 were eligible for this study. The study design is shown in Supplementary Fig. 1 and the study workflow in Fig. 1. Table 1 shows the clinicopathological characteristics of patients in the training cohort (n=803), the prospective-retrospective validation cohort (n=106), and the external validation cohort (n=179). Of the 1,088 patients, 389 (35.75%) were initially diagnosed with positive ALN status by radiologists through MRI, but 106 (27.25%) of these 389 patients did not have ALN metastasis (ALNM) and were confirmed as ALN-negative by pathological examination. Conversely, 190 (28.27%) of 672 patients with clinically negative ALN status were found to have pathologically positive ALN status.
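The proportions reported above follow directly from the stated counts. As a minimal arithmetic check (the helper name `pct` is illustrative and not part of the study):

```python
def pct(numerator, denominator):
    """Return a percentage rounded to two decimal places."""
    return round(100 * numerator / denominator, 2)

# Counts exactly as reported in the text.
clinically_positive = pct(389, 1088)  # MRI-positive among eligible patients
overcalled = pct(106, 389)            # pathologically negative among MRI-positive
undercalled = pct(190, 672)           # pathologically positive among MRI-negative

print(clinically_positive, overcalled, undercalled)  # 35.75 27.25 28.27
```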
Distinguishing ALN metastasis by tumor and lymph node radiomic signature
The key radiomic features of the ALN and tumor regions were selected from a total of 5,178 quantitative features by the random forest algorithm. The overall distribution of key radiomic features from the contrast-enhanced T1-weighted imaging (T1+C), T2-weighted imaging (T2WI), and diffusion-weighted imaging with quantitatively measured apparent diffusion coefficient (DWI-ADC) sequences among patients with and without ALNM in the training cohort is shown in Fig. 2A. Incorporating the key features of the ALN region to predict ALNM yielded AUC values of 0.85, 0.61, and 0.81 in the training cohort, the prospective-retrospective validation cohort, and the external validation cohort, respectively (Supplementary Fig. 2). Likewise, incorporating the three-sequence key features of the tumor region for ALNM prediction achieved AUCs of 0.78, 0.59, and 0.63 in the training cohort, the prospective-retrospective validation cohort, and the external validation cohort, respectively (Supplementary Fig. 3). The AUC values for ALNM prediction were higher when multi-sequence features were combined than when single-sequence features were used, in both the training and the validation cohorts (Supplementary Table 1).
When features of both the ALN and tumor regions were combined, the resulting ALN-tumor radiomic signature showed good performance in detecting ALNM in the training cohort (AUC, 0.88), the prospective-retrospective validation cohort (AUC, 0.87), and the external validation cohort (AUC, 0.87), outperforming the ALN or tumor radiomic signature alone (Fig. 2B). Detailed performance metrics, including sensitivity and specificity, are summarized in Table 2.
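For readers less familiar with the metric, the AUC values reported in this section can be read as the probability that a randomly chosen ALNM patient receives a higher score than a randomly chosen non-ALNM patient. The sketch below computes that quantity directly from pairwise comparisons. It is an illustration only: the function name `auc` and the toy data are assumptions, and the study's own software is not described here.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for ALNM, 0 for non-ALNM. Tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy data invented for illustration (not study data).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.3, 0.6, 0.2, 0.1]
print(auc(toy_labels, toy_scores))  # 8 of 9 pairs are ranked correctly
```

The pairwise form is quadratic in cohort size, which is acceptable at the cohort sizes reported here; a rank-based implementation would give the same value.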
Distinguishing ALN metastasis by multiomic radiomic signature
In the univariate analysis (Supplementary Table 2), five clinical characteristics were found to be associated with ALN status in the training cohort: age (P = 0.012), clinical T stage (P < 0.001), clinical N stage (P < 0.001), Ki67 expression (P = 0.010), and molecular subtype (P = 0.033). To develop a more precise and clinically applicable method for predicting an individual’s ALN status, we built the multiomic radiomic signature, which incorporated all key radiomic features of the ALN and tumor regions together with the clinical characteristics, pathological characteristics, and molecular subtype that were significantly associated with ALNM. This signature showed better performance in ALNM prediction, achieving higher AUCs in the training cohort (AUC, 0.90), the prospective-retrospective validation cohort (AUC, 0.93), and the external validation cohort (AUC, 0.91) (Fig. 2C). The sensitivity and specificity of the multiomic radiomic signature are listed in Table 2. By comparison, the combination of these clinicopathological characteristics and molecular subtype achieved AUCs of 0.74, 0.68, and 0.72 for predicting ALNM in the training cohort, the prospective-retrospective validation cohort, and the external validation cohort, respectively (Supplementary Fig. 4).
The multiomic radiomic signature was also able to discriminate ALNM patients with 1, 2, and 3 positive nodes (AUCs of 0.88, 0.89, and 0.92 in the training cohort; 0.79, 1.00, and 0.93 in the prospective-retrospective validation cohort; and 0.97, 0.93, and 0.87 in the external validation cohort; Supplementary Table 3).
Furthermore, to assess the added value of the multiomic radiomic signature for determining ALN status, we conducted subgroup analyses among patients with different molecular subtypes. Encouragingly, the multiomic radiomic signature could identify ALNM patients in the Luminal A (AUC, 0.91 and 0.91), Luminal B (AUC, 0.89 and 0.92), human epidermal growth factor receptor 2 (Her-2)-positive (AUC, 0.90 and 0.76), and triple-negative breast cancer (TNBC) (AUC, 1.00 and 1.00) subgroups in the training and external validation cohorts, respectively (Supplementary Table 3). In addition, when patients were stratified by other factors such as age, Ki67 expression level, and clinical T stage, the AUC of the multiomic radiomic signature remained between 0.86 and 1.00 in these subgroups (Supplementary Table 3).
Decision curve analysis (DCA) was conducted to determine the clinical usefulness of the multiomic radiomic signature by quantifying the net benefit at different threshold probabilities. The decision curve showed that, at threshold probabilities > 5%, using the multiomic radiomic signature to predict ALNM added more net benefit than either the ALN-tumor radiomic signature or the radiologists’ diagnosis (Fig. 2D).
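As background on how a decision curve is built, net benefit at a threshold probability p_t is conventionally defined as TP/N − (FP/N) × p_t/(1 − p_t), where TP and FP are the numbers of true and false positives when every patient with predicted probability at or above p_t is classified as positive. The sketch below implements this standard definition; the function names and toy data are assumptions for illustration and do not reproduce the study's analysis code.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of calling positive every case with prob >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)

def net_benefit_treat_all(labels, threshold):
    """Reference strategy: classify every patient as positive."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Toy data invented for illustration (not study data).
toy_labels = [1, 1, 0, 0]
toy_probs = [0.8, 0.4, 0.6, 0.1]
for pt in (0.05, 0.2, 0.5):
    print(pt, net_benefit(toy_labels, toy_probs, pt),
          net_benefit_treat_all(toy_labels, pt))
```

A decision curve plots these values over a range of thresholds; a model is preferred wherever its curve lies above the treat-all and treat-none (net benefit 0) references and above competing models.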
Based on the radiomic score of the multiomic radiomic signature, an optimal cutoff value (0.334) was determined to classify patients into high- and low-score groups in the training cohort. High-score patients, who were at high risk of ALNM, had significantly shorter DFS than low-score patients (HR 0.43, 95% CI 0.21-0.86, P=0.014; Supplementary Fig. 5).
In clinical practice, a patient’s clinical ALN status as judged on preoperative MRI is commonly inconsistent with the pathological status, and even senior radiologists sometimes make mistakes. The multiomic radiomic signature could precisely recognize ALNM among patients with different clinical tumor stages in the training and the validation cohorts (Supplementary Table 3). As shown in Fig. 3, patient 1 had pathologically positive ALN status but was considered a non-ALNM patient by radiologists through MRI before surgery. In contrast, patient 2 was initially diagnosed with ALNM by radiologists but was found to be a non-ALNM patient by pathological examination. The ALN status of both patients could be accurately assessed with the multiomic radiomic signature using the cutoff value of 0.334.
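Applying the cutoff amounts to a simple thresholding rule on the radiomic score. The sketch below illustrates it under two assumptions that are not stated in the text: a score exactly equal to the cutoff is assigned to the high-score group, and the example scores are hypothetical rather than those of the patients in Fig. 3.

```python
CUTOFF = 0.334  # optimal cutoff reported for the training cohort

def score_group(radiomic_score, cutoff=CUTOFF):
    """Assign a patient to the high- or low-score group.

    Assumption: a score equal to the cutoff counts as high-score.
    """
    return "high" if radiomic_score >= cutoff else "low"

# Hypothetical scores for illustration only.
for score in (0.12, 0.334, 0.71):
    group = score_group(score)
    status = "predicted ALNM" if group == "high" else "predicted non-ALNM"
    print(f"score={score:.3f} -> {group}-score group ({status})")
```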
Radiomics associated with tumor microenvironment
According to the cutoff values of the radiomic scores from the T1+C and T2WI sequence signatures of the tumor region in the training cohort, 91 breast cancer patients from The Cancer Genome Atlas (TCGA) and The Cancer Imaging Archive (TCIA) with T1+C and T2WI MRI sequences were classified into two groups. In total, 1,381 T1+C and T2WI sequence-based differentially expressed genes (DEGs) were identified between low-score and high-score patients. Next, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were performed to further examine the biological functions of the identified radiomic-based genes. The GO enrichment analysis indicated that the radiomic-based genes were enriched in various physiological metabolic processes, such as those affecting transmembrane transporter activity, NADP binding, and ATPase complex (Supplementary Table 4 and Supplementary Fig. 6). The KEGG pathway enrichment analysis found that these genes were involved in the oxidative phosphorylation pathway (Supplementary Table 4).
We further explored the association between MRI radiomic features and the tumor microenvironment, including 22 types of immune cells, long non-coding RNAs, and types of methylated sites, in these patients. The expression levels of 395 long non-coding RNAs and the enrichment levels of 1,784 types of methylated sites differed significantly between ALNM and non-ALNM patients, and the top 30 were selected via the random forest algorithm for further analysis. The overall distribution of key tumor-microenvironment features among patients with and without ALNM is shown in Fig. 4A. The key radiomic features of the ALN and tumor regions were found to be markedly linearly correlated with immune cells such as M0 macrophages, naïve B cells, and neutrophils; long non-coding RNAs such as P11.563P16.1 and RP11.888D10.3; and types of methylated sites such as cg14681629 and cg02784848 (Fig. 4B-D).
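A linear correlation between a radiomic feature and a microenvironment feature is commonly quantified with the Pearson correlation coefficient. The text does not name the statistic used, so Pearson's r is an assumption here, as are the function name and the toy values in this minimal sketch.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

# Toy values invented for illustration: one radiomic feature versus the
# estimated fraction of one immune cell type across five patients.
radiomic_feature = [0.10, 0.25, 0.40, 0.55, 0.80]
immune_fraction = [0.02, 0.05, 0.07, 0.11, 0.15]
print(round(pearson_r(radiomic_feature, immune_fraction), 3))
```

Values near +1 or -1 indicate a strong linear relationship; with many feature pairs tested, p-values would normally be adjusted for multiple comparisons.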