In this study, we developed and reported an AI-based pathological diagnostic model for transurethral resection specimens of BCa, designated PAIDM. The PAIDM not only showed excellent performance in identifying MIBC but also performed well in distinguishing high-grade from low-grade BCa. More importantly, the PAIDM performed well at both the WSI level and the patch level, suggesting promising applications in BCa staging and grading.
Accurate diagnosis of invasion depth and histologic grade is critical for the clinical management of patients with BCa. However, pathological diagnosis still faces challenges, such as the misdiagnosis of MIBC and interobserver variability. Erroneous staging of BCa can lead to omitted or delayed optimal treatment, resulting in disease progression and tumour recurrence39. For example, if a patient with MIBC is misdiagnosed as having NMIBC, optimal treatment such as radical cystectomy and neoadjuvant chemotherapy cannot be delivered in time, which is likely to lead to a poor clinical outcome. In our study, the PAIDM identified muscle invasion at the WSI level with an accuracy of 0.920 and a specificity of 0.943, but its sensitivity was relatively low (0.743). One possible explanation is that some specimens lacked a complete muscle layer, so the model could not extract informative features and missed some cases of MIBC. Notably, the PAIDM also performed well at the patch level, with an AUC of 0.904, suggesting potential clinical value for specimens containing minimal tissue.
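The gap between the high accuracy (0.920) and the modest sensitivity (0.743) reflects class balance: accuracy is a prevalence-weighted average of the two class-wise rates, accuracy = p × sensitivity + (1 - p) × specificity, where p is the fraction of MIBC slides. The minimal Python sketch below, assuming that all three WSI-level figures were computed on the same test set, inverts this identity. With the reported values it gives p ≈ 0.115, that is, roughly 11 to 12% MIBC once rounding of the published figures is allowed for. This is an inference from the reported metrics, not a number stated in the study, and the function names are illustrative.

```python
def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from a 2 x 2 confusion matrix (MIBC = positive)."""
    return {
        "sensitivity": tp / (tp + fn),  # MIBC slides correctly flagged
        "specificity": tn / (tn + fp),  # NMIBC slides correctly cleared
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


def implied_prevalence(accuracy, sensitivity, specificity):
    """Solve accuracy = p * sensitivity + (1 - p) * specificity for p.

    Only meaningful when sensitivity != specificity and all three metrics
    come from the same set of slides (an assumption, not stated in the study).
    """
    return (specificity - accuracy) / (specificity - sensitivity)


if __name__ == "__main__":
    # WSI-level figures reported in the text.
    p = implied_prevalence(accuracy=0.920, sensitivity=0.743, specificity=0.943)
    print(f"implied MIBC fraction: {p:.3f}")  # prints "implied MIBC fraction: 0.115"
```

With hypothetical counts tp = 3, fn = 1, tn = 9 and fp = 1, `binary_metrics` returns a sensitivity of 0.75 and a specificity of 0.90 but an accuracy of about 0.86, the same pattern on a small scale: when negatives dominate, accuracy tracks specificity.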
With rising cancer incidence, there is a significant shortage of pathologists to meet the growing demand for diagnosis. In clinical practice, pathologists handle many cases and review the associated pathology slides every day to confirm cancer diagnoses, which is labour-intensive and time-consuming. Diagnoses of atypical or complex cases are subjective, show significant inter- and intra-observer variability, and depend heavily on the skill and experience of the pathologist. There is therefore a strong need for automated analytic systems that reduce this burden and increase diagnostic consistency and reliability. Our system enables a fully automated and integrated diagnostic workflow: pathologists only need to load a batch of stained slides into the scanner, which automatically scans them and uploads the WSIs to the diagnostic platform, where end-to-end diagnostic output is produced without further manual involvement. The system can process large numbers of images efficiently, is not subject to fatigue, and offers better reproducibility and stability. Furthermore, it not only provides diagnostic outcomes but also overlays prediction masks on the images, allowing pathologists to visualize the inference results and focus on suspicious regions, thereby improving diagnostic efficiency.
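Prediction masks of this kind are commonly built by tiling the WSI into patches, scoring each patch and writing the scores back onto the patch grid. The Python sketch below assumes a regular, non-overlapping patch grid, a single probability per patch and a hypothetical decision threshold of 0.5. The helpers `patch_heatmap`, `prediction_mask` and `upsample_mask` are hypothetical and are not the PAIDM's actual rendering pipeline.

```python
import numpy as np


def patch_heatmap(coords, probs, grid_shape):
    """Write per-patch probabilities onto a slide-level grid (hypothetical helper).

    coords: (row, col) indices of tissue patches on a regular, non-overlapping grid.
    probs: model probabilities in [0, 1], one per patch.
    Grid cells without a tissue patch stay NaN so they are not read as negatives.
    """
    heatmap = np.full(grid_shape, np.nan, dtype=float)
    for (row, col), p in zip(coords, probs):
        heatmap[row, col] = p
    return heatmap


def prediction_mask(heatmap, threshold=0.5):
    """Boolean mask of patches at or above an assumed decision threshold."""
    return np.nan_to_num(heatmap, nan=0.0) >= threshold


def upsample_mask(mask, patch_size):
    """Expand the patch-level mask to pixel resolution for overlay on a slide thumbnail."""
    return mask.repeat(patch_size, axis=0).repeat(patch_size, axis=1)


if __name__ == "__main__":
    # Toy 2 x 2 grid: three tissue patches and one background cell.
    hm = patch_heatmap([(0, 0), (0, 1), (1, 1)], [0.9, 0.2, 0.7], grid_shape=(2, 2))
    mask = prediction_mask(hm)  # expected: [[True, False], [False, True]]
    overlay = upsample_mask(mask, patch_size=4)
    print(overlay.shape)  # (8, 8)
```

Keeping background cells as NaN rather than 0 lets a viewer render them as transparent, so pathologists can tell "no tissue" apart from "tissue scored as negative".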
Additionally, the PAIDM is a practical tool for narrowing the diagnostic gap between national hospitals and primary care hospitals, as well as between experienced and junior pathologists. In the comparison with pathologists, the PAIDM was non-inferior to the pathologists' average diagnostic performance and reached the level of intermediate pathologists, suggesting that it might improve the diagnostic accuracy of less experienced pathologists, particularly junior pathologists. Although the PAIDM did not reach the level of senior experts, all the intermediate pathologists in the comparison worked at a first-rate hospital in China and had more than ten years of clinical experience; we therefore consider them no less competent than experts in municipal or grass-roots hospitals. Hence, in China, where medical resources are unevenly distributed between urban and rural areas, the PAIDM could be applied in developed areas to improve the diagnostic efficiency of experienced experts and in remote areas to improve the diagnostic accuracy of inexperienced pathologists, thereby promoting more uniform medical care. We emphasize that AI-based diagnostic models should currently serve as an adjunct to pathologists rather than a replacement.
Few previous studies have applied machine learning to the pathological grading and staging of NMIBC. Ilaria Jansen et al. developed an automated model for detecting BCa and grading it as low-grade or high-grade33, and Peng-Nien Yin et al. used six machine learning approaches to distinguish stage Ta from stage T1 BCa34. However, the evidence for these models was limited by small sample sizes and relatively low diagnostic accuracy, which restricted their clinical application. Compared with these models, our model was trained on a larger dataset and achieved higher accuracy in the pathological grading of BCa. Additionally, the training images were fully annotated, making full use of the information in each pathological image. More importantly, the PAIDM performed well in identifying MIBC, which had not been achieved in previous studies but is vital for clinical decision-making. To our knowledge, this is the first study to apply AI to the pathological diagnosis of muscle invasion in BCa.
Although our model achieved promising results, several limitations must be addressed. First, because this was a single-centre retrospective study, the possibility of overfitting must be carefully considered. Although we applied data augmentation strategies such as translation, rotation, scaling, flipping and colour jitter to improve robustness, multi-centre prospective studies are still needed for further validation, and the generalizability of the PAIDM could be improved by increasing the size and diversity of the sample. Second, this study used full annotation, which made full use of the information in each pathological image; however, labelling was time-consuming, making it difficult to include more images for training. Incorporating data labelled through partial annotation and weak supervision40 will be critical to further improving performance.
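The augmentations listed above can be sketched with NumPy alone. The ranges below (random flips, 90-degree rotations, a 0.9 to 1.1 zoom, shifts of up to 10% of the patch side, and brightness, contrast and per-channel jitter of a few percent) are illustrative assumptions, since the study does not report its settings, and `augment` is a hypothetical helper rather than the PAIDM's training code.

```python
import numpy as np


def augment(patch, rng):
    """Randomly augment a square RGB patch (H x W x 3, floats in [0, 1]).

    Hypothetical helper; all parameter ranges are illustrative assumptions.
    """
    h, w, _ = patch.shape
    if h != w:
        raise ValueError("square patches are required for 90-degree rotations")
    out = patch

    # Flipping: horizontal and vertical, each with probability 0.5.
    if rng.random() < 0.5:
        out = out[:, ::-1]
    if rng.random() < 0.5:
        out = out[::-1, :]

    # Rotation: random multiple of 90 degrees (no interpolation needed).
    out = np.rot90(out, k=int(rng.integers(0, 4)))

    # Scaling: nearest-neighbour zoom by 0.9-1.1 about the centre; edge pixels are repeated.
    s = rng.uniform(0.9, 1.1)
    idx = np.clip(np.floor((np.arange(h) - h / 2) / s + h / 2).astype(int), 0, h - 1)
    out = out[idx[:, None], idx[None, :]]

    # Translation: shift by up to 10% of the side, filling the border by reflection.
    m = int(0.1 * h)
    dy, dx = rng.integers(-m, m + 1, size=2)
    padded = np.pad(out, ((m, m), (m, m), (0, 0)), mode="reflect")
    out = padded[m + dy : m + dy + h, m + dx : m + dx + w]

    # Colour jitter: contrast, brightness and a small per-channel (stain-like) scale.
    mean = out.mean()
    out = (out - mean) * rng.uniform(0.9, 1.1) + mean
    out = out * rng.uniform(0.9, 1.1) * rng.uniform(0.95, 1.05, size=3)
    return np.clip(out, 0.0, 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    patch = rng.random((64, 64, 3))
    augmented = augment(patch, rng)
    print(augmented.shape)  # (64, 64, 3)
```

Libraries such as torchvision offer equivalent transforms (for example `RandomAffine`, `RandomHorizontalFlip` and `ColorJitter`); the NumPy version simply makes each operation explicit. Restricting rotations to multiples of 90 degrees avoids interpolation artefacts on square patches.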