Background: The Bland-Altman plot with limits of agreement has been widely used as a visual tool for assessing test-retest reliability or reproducibility between two measurements. We have observed, however, that in certain circumstances the limits of agreement approach may mislead practitioners. In particular, when no acceptable difference is available and two readers are highly concordant but the common variance of the data is large, the wide limits of agreement may incorrectly indicate a lack of agreement.
Methods: This paper proposes novel guidance, based on a scaled index, for the graphical evaluation of reproducibility or reliability. We construct a reference band from the two measurements, based on the concordance correlation coefficient.
Results: Simulation studies were carried out to demonstrate the benefits of our method over the limits of agreement. We also consider applications to real examples, including the peak expiratory flow rate data in Bland and Altman's paper and test-retest reproducibility data from a radiomics study.
Conclusions: In the absence of an acceptable difference, we found that the limits of agreement can lead to subjective inference and may not be consistent with the concordance correlation coefficient. Our simulation results and real data applications show that the proposed method provides practitioners with a novel graphical evaluation method that is more consistent with results from the concordance correlation coefficient approach than the limits of agreement approach is.
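The mismatch described above between the two criteria can be illustrated with a minimal sketch. It is not the proposed reference band. It only computes the standard definitions of Lin's concordance correlation coefficient and the Bland-Altman 95% limits of agreement on simulated data. The function names `lins_ccc` and `limits_of_agreement`, the simulation settings, and the 1.96 multiplier are assumptions chosen for illustration. Rescaling both measurements by a common factor inflates the common variance and widens the limits of agreement, while the concordance correlation coefficient is unchanged.

```python
import numpy as np


def lins_ccc(x, y):
    """Lin's concordance correlation coefficient (population-moment form)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    s_xy = np.mean((x - x.mean()) * (y - y.mean()))
    return 2.0 * s_xy / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)


def limits_of_agreement(x, y, z=1.96):
    """Bland-Altman limits: mean difference +/- z * SD of the differences."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    half_width = z * d.std(ddof=1)
    return d.mean() - half_width, d.mean() + half_width


# Hypothetical simulated readers: a shared true value plus small independent noise.
rng = np.random.default_rng(0)
truth = rng.normal(50.0, 10.0, size=200)
x = truth + rng.normal(0.0, 1.0, size=200)
y = truth + rng.normal(0.0, 1.0, size=200)

# A common rescaling widens the limits of agreement but leaves the CCC unchanged.
for scale in (1.0, 10.0):
    lower, upper = limits_of_agreement(scale * x, scale * y)
    ccc = lins_ccc(scale * x, scale * y)
    print(f"scale={scale:5.1f}  CCC={ccc:.3f}  LoA width={upper - lower:.2f}")
```

Without a prespecified acceptable difference, nothing in the plot itself says whether the wider interval at the larger scale reflects poor agreement or simply a larger measurement range.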