Pancreatic cancer is one of the most lethal malignancies with a poor 5-year survival rate of around 5–9% that has remained almost stagnant since the 1960s [1, 2]. Among pancreatic cancers, more than 85% of the cases are pancreatic ductal adenocarcinoma (PDAC), which mostly originates from pancreatic ductal epithelium in the pancreatic head [2]. Chronic pancreatitis (CP), though an entirely different disease with a distinct prognosis compared to PDAC, can present morphological features similar to PDAC ranging from radiographical level to histopathological level [3–7]. Therefore, differentiating PDAC and CP is a challenge in pathology and the consequence of misdiagnosis can be severe due to the rapid progression of PDAC and the high frequency of distant metastases [8–10]. Moreover, multiple attempts at diagnosis with fine-needle aspiration or needle biopsy techniques are frequently required due to the small amounts of tissue recovered with these procedures.
In clinics, computed tomography and ultrasound are often used to detect irregular masses in the pancreas, but histopathological assessment based on tissue sections is required to ensure the accurate differential diagnosis of PDAC and CP [3–5, 11, 12]. CP can present histomorphological features that mimic PDAC, such as inflammatory infiltration, dense stroma, angulated glands, cytologic atypia, and tumor-like duct organization. The difficulty in distinguishing PDAC from CP is further aggravated by the fact that PDAC may induce CP in the surrounding pancreatic tissue, posing additional challenges to tissue sampling and diagnosis [3–6].
One important hallmark of the tissue microenvironment for both PDAC and CP is a dense desmoplastic stroma, characterized by the increased deposition of fibrillar collagen. In PDAC, the extensive desmoplasia present in the tumor has been shown to have a negative impact on stromal vascularization and immune cell infiltration, and it may thus contribute to resistance to radiotherapy and hinder drug delivery [13–15]. Besides PDAC, evidence in a variety of cancer types has also shown that stromal properties are important factors for disease diagnosis, cancer progression, and tumor response to therapy [16–21]. In differentiating PDAC and CP, the collagen fiber topology at the stromal-epithelial interfaces has been shown to be a statistically significant discriminating feature [22]. In PDAC, ordered periductal stroma characterized by highly aligned and elongated collagen fibers is a negative prognostic factor [18]. Quantification of the size, shape, and patterns of collagen fibers shows that these factors might impact tissue stiffness and are associated with an increased risk of cancer progression [23]. Many of these findings are derived from studies using label-free collagen-sensitive imaging modalities such as second-harmonic generation (SHG) microscopy and polarization-based optical microscopy, which can enable the quantification of collagen fibers and desmoplasia [24–28]. With SHG, new insights into tissue properties associated with stroma have been obtained to inform biological understanding, diagnosis, prognostication, drug development, and therapeutic innovations [13–15, 29, 30].
In the last 30 years, the advent of digital slide scanners and whole slide imaging (WSI) has made digital pathology a fruitful research field for image feature analysis with strong potential for clinical applications [31–34]. The potential for WSI-based computational analysis is further increasing due to the recent success of deep neural networks in computer vision and language modeling [35, 36]. The ability to digitize routinely processed tissue on glass slides paved the way for computer algorithms to quantify histological image features. Now, deep learning has demonstrated state-of-the-art performance when applied to many computational histopathology analysis tasks such as cancer detection, tissue segmentation, disease prognostication, and spatial omics analysis [37–39].
In this paper, we present a computational pipeline for the detection and differentiation of histological samples of PDAC and CP, empowered by advances in artificial intelligence and collagen-targeted tissue imaging. The design of our analysis methods is motivated by the observation that discriminative histological features of PDAC and CP span multiple fields and multiple magnifications, from cell morphology to duct organization, and both diseases are characterized by dense stroma with potentially different collagen topography. Thus, a model with the ability to learn image features across different scales and incorporate collagen-based image features would be favored.
The analysis pipeline includes a deep learning model built on graph neural networks (GNNs) [40, 41] that can be trained with manually classified, coarsely annotated regions of varying sizes, along with a region proposal algorithm that generates candidate regions in unannotated slides. Built on the canonical scheme of WSI data processing, the proposed method extracts local features from image patches but further models the patch-to-patch interactions by constructing graphs from the patch sets and utilizing graph convolutions [41, 42]. GNNs increase the expressivity of the model by letting information flow between adjacent image patches and can thus capture histomorphological features that span multiple patches [43–45]. This more closely mimics histopathological examinations conducted by pathologists, in which relationships between various tissue features are frequently characterized. The concept of using GNNs aligns well with the fact that the major diagnostic criteria of PDAC and CP include features ranging from the cellular level (e.g., variation of epithelial nuclei sizes) to the tissue level (e.g., spatial organization of ducts in comparison to nerves and arteries).
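To make the idea of patch-to-patch message passing concrete, the following is a minimal NumPy sketch of one graph convolution over a patch graph. It is an illustration under stated assumptions, not the model used in this work: the 8-neighbor grid connectivity, the symmetric normalization, the single ReLU layer, the mean pooling, and all function names (`build_patch_graph`, `normalize_adjacency`, `graph_conv`, `region_embedding`) are hypothetical choices made for the example.

```python
import numpy as np


def build_patch_graph(coords, radius=1):
    """Connect patches whose grid positions differ by at most `radius`
    in both row and column (assumed connectivity, self-loops included)."""
    coords = np.asarray(coords)
    # Chebyshev distance between every pair of patch grid positions.
    dist = np.abs(coords[:, None, :] - coords[None, :, :]).max(axis=-1)
    return (dist <= radius).astype(float)


def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}; every degree is at
    least 1 because the adjacency matrix contains self-loops."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def graph_conv(features, adj_norm, weight):
    """One graph convolution: mix each patch with its neighbors,
    apply a linear map, then a ReLU."""
    return np.maximum(adj_norm @ features @ weight, 0.0)


def region_embedding(features, coords, weight):
    """Region-level descriptor: graph convolution followed by mean pooling."""
    adj_norm = normalize_adjacency(build_patch_graph(coords))
    return graph_conv(features, adj_norm, weight).mean(axis=0)


# Toy usage: a 3 x 3 region of patches with random 16-dimensional features
# standing in for the output of a patch-level feature extractor.
rng = np.random.default_rng(0)
coords = [(r, c) for r in range(3) for c in range(3)]
features = rng.normal(size=(9, 16))
weight = rng.normal(size=(16, 8))  # would be learned in practice
embedding = region_embedding(features, coords, weight)
print(embedding.shape)  # (8,)
```

After one such layer, each patch representation depends on its immediate neighbors; stacking layers widens the receptive field, which is how a graph model can respond to structures larger than a single patch.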
The proposed GNN-based method is evaluated on two types of datasets, one collected from human tissue microarrays (TMAs) and the other from human tissue sections, both consisting of PDAC, CP, and normal pancreas tissue samples. The resulting model outperforms the widely used multiple-instance learning (MIL) framework [46, 47], achieving 86.4% accuracy with an average area under the curve (AUC) of 0.954 on the TMA dataset and 88.9% accuracy with an average AUC of 0.957 on the tissue section dataset. Furthermore, we demonstrate that the incorporation of collagen-based features extracted from SHG images leads to higher classification accuracy compared to using brightfield hematoxylin and eosin (H&E) features alone, with classification accuracy increasing from 88.9% to 91.3%. This result confirms the diagnostic potential of utilizing characteristic collagen topology in differentiating PDAC, CP, and normal pancreas tissue.