In this study, by inviting 21 pathologists to assess the Ki67 scores of the same set of Luminal-like breast cancer specimens, we were able to evaluate the practicability of three guidelines (IKWG, NHCC and NHCCa9), as well as that of Quantitative Dot Blot (QDB)-based Ki67 measurement, in daily clinical practice.
Our results demonstrated that consistency among pathologists was easier to achieve with the NHCC guideline and harder to achieve with the NHCCa9 guideline. However, if the QDB results are used as the reference, the IKWG guideline best identified patients in the low-risk group, i.e., those who may be spared chemotherapy, while NHCCa9 produced significantly more false negative results.
We were also able to investigate the intra-rater ICC in this study by assigning random numbers from 1 to 120 to the triplicate sections of the 40 samples. We believe this design best reveals the potential subjectivity of IHC analysis in real-world practice. Admittedly, a few of the pathologists may have been alerted to the repeated images during the evaluation. However, judging from our results, even if this did occur, it had minimal impact on the overall assessment.
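The blinding scheme described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `assign_blinded_labels` and the fixed seed are hypothetical, and the study's actual randomization procedure is not described in the text.

```python
import random

def assign_blinded_labels(n_samples=40, n_replicates=3, seed=0):
    """Give every (sample, replicate) section a unique random label.

    Labels run from 1 to n_samples * n_replicates (1 to 120 by default),
    so a rater cannot tell which sections come from the same sample.
    The seed is a hypothetical choice made only for reproducibility.
    """
    sections = [(s, r)
                for s in range(1, n_samples + 1)
                for r in range(1, n_replicates + 1)]
    labels = list(range(1, len(sections) + 1))
    random.Random(seed).shuffle(labels)
    return dict(zip(sections, labels))

# Key that maps each section back to its blinded label
blinding_key = assign_blinded_labels()
```

Keeping the key as a dictionary makes it straightforward to regroup the three scores of each sample once the evaluation is complete.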
The intra-rater ICC was as low as 0.639, with the 25th percentile at 0.76, the median at 0.848 and the 75th percentile at 0.9225. We interpret this to mean that, even for an experienced pathologist in China, repeated scoring of the same IHC slide was only about 85% consistent on average.
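For readers who wish to reproduce this kind of intra-rater calculation, a minimal sketch is given here. It assumes a one-way random-effects, single-measurement ICC, often written ICC(1,1); the ICC form actually used in the study is not stated in this section, and the example scores are made up for illustration.

```python
def icc_oneway(scores):
    """One-way random-effects, single-measurement ICC (an assumed form).

    scores: one row per specimen, each row holding the k repeated
    scores that a single rater gave to that specimen.
    """
    n = len(scores)
    k = len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need >= 2 specimens with the same k >= 2 repeats")
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    # Between-specimen mean square
    msb = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    # Within-specimen mean square (repeat-to-repeat variation)
    msw = sum((x - m) ** 2
              for row, m in zip(scores, row_means)
              for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Made-up Ki67 scores (%) for three specimens, each scored three times
example_scores = [[10, 12, 11], [30, 28, 32], [50, 55, 52]]
example_icc = icc_oneway(example_scores)
```

With 40 specimens in triplicate, `scores` would hold 40 rows of three values for each pathologist, yielding one ICC per rater.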
The current study is limited to the evaluation of a set of pre-stained slides. Thus, it could not evaluate the potential variations associated with pre-analytical factors in individual institutions. In addition, the invited pathologists received no extensive training beyond a general instruction. We therefore consider that this study faithfully reflects real-world practice for these pathologists.
We were surprised to find that 20% (8/40) of the specimens were categorized as intermediate risk based on the C5-C95 interval of the proposed cutoff of 2.31 nmole/g identified in the previous study using the QDB method. We interpret this to mean that the precision of the QDB method remains to be improved, as improved precision should narrow the intermediate-risk window. It should also be cautioned that the proposed 2.31 nmole/g cutoff remains to be validated in a much larger study. However, we expect that any adjustment of this cutoff would have minimal impact on the overall conclusion.
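The three-way grouping around the cutoff can be illustrated with a short sketch. The C5 and C95 bounds below are hypothetical placeholders, since their actual values are not given in this section, and treating boundary values as intermediate is likewise an assumption. The example levels are made up.

```python
def classify_qdb(value, c5, c95):
    """Return 'low', 'intermediate' or 'high' for one QDB Ki67 level (nmole/g).

    c5 and c95 bound the interval around the cutoff. Values falling
    exactly on a bound are treated as intermediate (an assumption).
    """
    if c5 > c95:
        raise ValueError("c5 must not exceed c95")
    if value < c5:
        return "low"
    if value > c95:
        return "high"
    return "intermediate"

# Hypothetical bounds around the proposed 2.31 nmole/g cutoff
C5, C95 = 2.0, 2.6

# Made-up QDB levels, for illustration only
levels = [0.8, 2.1, 2.31, 2.7, 5.4]
groups = [classify_qdb(v, C5, C95) for v in levels]
intermediate_fraction = groups.count("intermediate") / len(groups)
```

A narrower C5-C95 interval, as expected from improved assay precision, would move specimens out of the intermediate group without changing the cutoff itself.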
One unexpected observation is that, while the overall agreement between the QDB and IHC methods was satisfactory (r = 0.78 by Pearson), there was a clear discrepancy for two specimens, #23 and #29. Both were assigned to the high-risk group by the QDB method but to the low-risk group by the IHC method under all three guidelines. One putative explanation is the negative influence of heavy staining on the nuclear antigen, as suggested by Rudbeck (7). Another possibility is incorrect staining due to poor pre-staining treatment. However, this point was debatable even among the invited pathologists. The IHC images of these two specimens are provided in the supplemental data (Supplemental Fig. 2), as this clear discrepancy between the two methods warrants further discussion.
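A comparison of this kind can be sketched in a few lines. The helper names `pearson_r` and `discordant_specimens` are hypothetical, and the inputs are made-up placeholders rather than study data. The sketch only shows how a Pearson coefficient and a list of QDB-high, IHC-low specimens could be derived from paired results.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def discordant_specimens(ids, qdb_groups, ihc_groups):
    """IDs called high risk by QDB but low risk by IHC."""
    return [i for i, q, h in zip(ids, qdb_groups, ihc_groups)
            if q == "high" and h == "low"]

# Made-up placeholder data, for illustration only
ids = ["A", "B", "C"]
qdb_groups = ["high", "low", "high"]
ihc_groups = ["low", "low", "high"]
flagged = discordant_specimens(ids, qdb_groups, ihc_groups)
```

A high correlation across the whole set does not rule out individual specimens that fall on opposite sides of the cutoffs, which is why the two checks are kept separate.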
It should also be pointed out that the results from QDB analysis and IHC analysis differ in nature. In QDB, total protein lysates are extracted from FFPE slices through disruption of the tissue structure. Thus, QDB measures the averaged protein content, minimizing the effect of heterogeneity within the tissue slice. In contrast, Ki67 scores reflect the localized Ki67 protein level with the tissue structure fully preserved, and thus better reflect the heterogeneity of the tissue slice. The results from these two methods should be highly correlated, as demonstrated by the current study, but cannot be expected to be identical.
It remains unclear which method provides more relevant results for the prognosis and prediction of patients. While some argue that tissue heterogeneity is better reflected through IHC analysis, it is equally arguable that the QDB method minimizes the negative influence of tissue heterogeneity on prognosis and prediction. Clearly, a final judgement can only be reached through properly controlled prospective clinical trials in the future.
Another limitation of the current study is that we invited 21 pathologists for IHC analysis, yet only three technicians performed the QDB analysis. The limited number of technicians may lead to underestimation of the variation among technicians in the QDB results. On the other hand, QDB analysis is an objective biochemical assay that is tightly controlled internally. The C5/C95 analysis also takes into account the variation among technicians on a large scale. Thus, we consider that including more technicians for QDB analysis would not fundamentally change the overall conclusion of the current study.
In conclusion, by inviting 21 experienced pathologists to score the Ki67 levels of the same set of IHC slides from 40 ER+ breast cancer specimens, we were able to compare the practicability of three clinical guidelines (IKWG, NHCC and NHCCa9) in daily clinical practice. We were also able to compare the Ki67 scores with the results of QDB measurements, which suggests that QDB may significantly improve the consistency of Ki67 assessment in daily clinical practice. Our results also showed that agreement among pathologists was hard to achieve under the IKWG guideline, yet, if the QDB results are used as the reference, it gave the most trustworthy guidance on chemotherapy for Luminal-like patients.