Development of the REO-basedgrade signature
The flowchart of this study was described in Fig. 1. For each of the seven training datasets (Table 1), with FDR < 0.1, we firstly identified gene pairs with significantly stable REOs in the pHG1 and pHG3 groups, respectively, and then identified gene pairs with reversal REOs between the two groups. Then, 437 gene pairs were commonly identified from the seven datasets and they consistently showed the same reversal REO patterns between the pHG1 and pHG3 groups in the seven datasets. Next, we performed a forward-stepwise selection procedure to search a set of gene pairs that achieved the highest F-score according to the classification rule as follows: a sample was classified into the HG3 group if at least a half of the REOs of the set of gene pairs within the sample voted for HG3; otherwise, into the HG1 group. Finally, we obtained 10 gene pairs, denoted as 10-GPS (Table 2), to distinguish different histological grades with the highest F-score (0.8884). In the training data, the apparent specificity for HG1 samples was 90.77% and the apparent sensitivity for HG3 samples was 86.99%. The performance of the transcriptional grade signature in each training dataset can be found in Additional file 1: Table S1. Notably, the apparently imperfect performance should be reasonable because HG evaluation based on the pathological Nottingham grading system is highly subjective with only 50%–85% inter-observer agreements [17-20]. We speculated that the 10-GPS could provide a more objective and clinically relevant measure of tumor grade with prognostic significance.
We validated the above speculation based on the knowledge that HG3 patients were with lower survival rate than HG1 patients [3, 7]. Here, we collected another four independent datasets (Table 1) including samples with RFS or OS data of early stage ER-positive breast cancer patients who accepted surgery only. When the 10-GPS was applied to these datasets, the averaged apparent sensitivity for all HG3 samples was 83.1% and the average apparent specificity for all HG1 samples was 78.4%. In a merged dataset from the three validation datasets with the RFS information, the 12 pHG3 patients reclassified as HG1 by the signature had significantly higher 10-years RFS rate than that of the 46 HG3 patients confirmed by the signature (p=0.0143; HR=8.17, 95% CI: 1.10-60.6; C-index = 0.61, Fig. 2a). And, we also compared 10-years RFS rates between the 13 pHG1 patients reclassified as HG3 by the 10-GPS and the 93 HG1 patients confirmed by the 10-GPS. Despite no statistical difference was, there was trend to be different between the two groups (p=0.3583; HR=1.65, 95% CI: 0.56-4.86; C-index = 0.54, Fig. 2b). There was also trend of difference between the RFS rate of 12 pHG3 patients reclassified as HG1 by the signature and that of 13 pHG1 patients reclassified as HG3 by the 10-GPS (p=0.1847; HR=3.95, 95% CI: 0.44-35.35; C-index = 0.65, Fig. 2c).
In each of the three validation datasets with the RFS information, we also compared the survival between the pHG1 and pHG3 patients diagnosed by the pathological Nottingham grading system and the survival between the HG1 and HG3 patients reclassified by the 10-GPS from the pHG1 and pHG3 patients. Significant difference of RFS between the pHG1 and pHG3 patients was observed only in GSE6532 and GSE4922 dataset (Additional file 2: Fig. S1). However, the HG1 patients showed significantly better RFS than those of HG3 patients in all the three datasets (Fig. 3). No significant difference of prognosis was observed between pHG1 and HG1 cohorts for each of the four validation datasets. Similar for pHG3 and HG3 cohorts for each of the four datasets (Additional file 3 Fig. S2). The majority histological grade labels of pHG1 and pHG3 sample were correct, which resulting no significant difference of survival mentioned above. These results demonstrated that the 10-GPS can more accurately and objectively stratify samples into distinct prognostic groups.
Application of the signature to reclassification of HG2 samples
Then, we used the 10-GPS to reclassify the pHG2 samples of the above four validation datasets with RFS or OS information (Table 1) into the HG1 and HG3 groups, respectively, and evaluated their prognostic differences.
Firstly, for the 68 pHG2 samples of the GSE7390 dataset, the 10-GPS signature allocated 38 and 30 patients into the HG1 and HG3 groups, respectively. And, the former ones had significantly higher RFS rate than the latter ones (p=5.55E-03; HR=2.53, 95% CI: 1.28-4.97; C-index = 0.64; Fig. 4a). Then, in the 91 pHG2 patients combined from the datasets of GSE6532 and GSE4922 with small sample sizes, the RFS rate of the 65 patients stratified into the HG1 group was significantly higher than that of the 26 patients stratified into the HG3 group (p=9.06E-03; HR=2.64, 95% CI: 1.24-5.62; C-index = 0.61; Fig. 4b). In the EGA dataset, the 71 HG1 patients classified by the 10-GPS also displayed significant higher OS rate than that of the 49 HG3 patients classified by the 10-GPS (p=6.92E-03; HR=2.10, 95% CI: 1.21-3.64; C-index = 0.61; Fig. 4c).
Transcriptional characteristics of the low-HG and high-HG samples recognized by the 10-GPS
In the TCGA-BRCA dataset, we used the limma algorithm and found 6,194 differentially expressed genes (DEGs) between the 58 pHG1 and 170 pHG3 samples diagnosed by the pathological Nottingham grading system (FDR<0.05, Additional file 4: table S2). Applying the 10-GPS to these samples, 94 samples were allocated into the HG1 group and the other 134 samples were allocated into the HG3 group. We identified 8,087 DEGs between the two reclassified groups with the same FDR control (Additional file 4: table S3). And of these genes, up-regulated genes were significantly associated with proliferation and down-regulated were significantly associated with extracellular signal transduction (Fig. 5). When comparing the two DEG lists, we found that 5,472 (88.34%) of the 6,194 DEGs between the original HG1-HG3 groups were also included in the DEGs identified after sample reclassification and the dysregulation directions of the overlapped genes reached up to 100% (binomial test, p < 1.10E–16). We also identified 3,126 DEGs between 145 HG1 (denoted as LHG2) samples and 71 HG3 (denoted as HHG2) samples recognized from the pHG2 samples with the aid of 10-GPS (Additional file 4: table S4). About 31.14% of the 2,519 DEGs were also included in the 8,087 DEGs. The concordance score of the 1,164 overlapped DEGs was 99.92%, which was unlikely to happen by chance (binomial test, p < 1.10E–16). Moreover, after reclassifying HG status of samples for TCGA dataset, we identified differential expressed genes identified from pHG1 and pHG3 samples, from HG1 and HG3 samples reclassified from pHG2 samples, and HG1 and HG3 samples reclassified from overall samples respectively (Fig. 6). In all of the three two-way clustering heatmaps in Fig. 6, each of the two reclassified histological grade sample subclasses (two child nodes under the root node in sample clustering tree) contained both HG1 and HG3 samples although HG1 (or HG3) samples were in the majority. This may be caused by the difference between quantitative and qualitative expression relationship essentially. After identified enriched pathway lists by differential expressed genes for all the four datasets, the most common pathways were associated with proliferation and extracellular signal transduction (Additional file 5: Fig. S3) The clearer transcriptional differences between the two reclassified groups indicated that the 10-GPS could more accurately and objectively stratify samples into distinct histological grade groups.
Comparison of 10-GPS prognostic significance with GGI, Oncotype DX and PAM50
For each of the four validation datasets, we obtained relapse risk by using existing signatures such as Gene expression Grade Index (GGI)[41], PAM50 [42]and Oncotype DX[43]. Chi-square test were conducted, revealing that there were significant differences in relapse risks generated by almost all the three existing signatures between HG1 and HG3 cohorts (Fig.7, Additional file 1: Table S5). This indicating that histological grade status reclassified by 10-GPS were with prognostic significance. Meanwhile, no prognostic differences between low-risk group identified by GGI and HG1 group identified by 10-GPS were observed (Fig. 8 a-d). It was similar for high-risk group identified by GGI and HG3 group identified by 10-GPS were observed (Fig. 8 e-h). Significant prognostic difference between HG1 samples who were identified as low-risk by GGI and HG1 samples who were identified as high-risk by GGI was observed only in GSE4922, which might result from unbalanced samples (Additional file 6 Fig. S4 a-d). No significant prognostic difference between HG3 samples who were identified as low-risk by GGI and HG3 samples who were identified as high-risk by GGI was observed (Additional file 6 Fig. S4 e-h). It showed that the prognostic significance of 10-GPS was more aligned with the survival of patients. Moreover, HG1 samples who were identified as high-risk and HG3 samples who were identified as low-risk were mainly from pHG2 cohort. The consistency between the result and prior knowledge that pHG2 are with low inter-observer agreements was reasonable. It implied that 10-GPS was feasible for reclassifying histological grade status, especially for pHG2 samples. All these comparisons indicated that 10-GPS could effectively reclassify into distinct histological grade groups with significantly prognostic difference.