Requirement of image standardization for AI-based macroscopic diagnosis for surgical specimens of gastric cancer

The pathological diagnosis of surgically resected gastric cancer involves both a macroscopic diagnosis by gross observation and a microscopic diagnosis by microscopy. Macroscopic diagnosis determines the location and stage of the disease, the involvement of other organs, and the surgical margins. Lesion recognition is thus an important diagnostic step that requires a skilled pathologist. Artificial intelligence (AI) technologies could allow even inexperienced doctors and laboratory technicians to examine surgically resected specimens without a pathologist. However, organ imaging conditions vary across hospitals, and an AI algorithm created in one setting may not work properly in another. We therefore identified and standardized factors affecting the quality of pathological macroscopic images, which in turn could affect lesion identification by AI. We examined the image standardization necessary for developing cancer detection AI for surgically resected gastric cancer by varying the following imaging conditions: focus, resolution, brightness, and contrast. For focus, brightness, and contrast, the farther the test data deviated from the training macro-images, the less likely the inference was to be correct. Little change was observed for resolution, even when the training and test conditions differed. For focus, brightness, and contrast, there were conditions best suited to AI; contrast, in particular, differed markedly from the conditions suitable for humans. Standardizing focus, brightness, and contrast is therefore important in developing AI methodologies for lesion detection in surgically resected gastric cancer, and this standardization is essential for AI to be implemented across hospitals.


Introduction
Macroscopic observation is important for surgically resected gastric cancers. One of the standard treatments for gastric cancer is resection, for which pathology reports are prepared, and staging is performed. The pT factor included in the Union for International Cancer Control and American Joint Committee on Cancer staging systems for gastric cancer is determined by the depth of the lesion (Edge et al. 2017). Therefore, the extent of the lesion must be carefully identified upon macroscopic examination of the specimen. Otherwise, the tissue used for processing may not be taken from the appropriate area, hindering the staging process.
The process of pathological diagnosis includes gross observation, sectioning, specimen preparation, and microscopic observation. Among these processes, two observations must be conducted by a specialist such as a pathologist: macroscopic diagnosis by gross observation and microscopic diagnosis by microscopic observation. These steps are particularly important for the pathological diagnosis and staging of surgically resected gastric cancer.
Macroscopic diagnosis determines the location and nature of a lesion (e.g., early or advanced), its relationship with adjacent organs, and the surgical margin, which, in turn, determine the site of specimen preparation. In practice, lesion identification is an essential aspect of macroscopic observation in the pathological diagnosis of gastric cancer and requires a highly skilled pathologist. If the location of the lesion is unclear, the specimen cannot be grossed appropriately, and the margins cannot be examined.
Artificial intelligence (AI) technologies are advancing rapidly and have been applied to the field of pathology.
Most algorithms diagnose disease, detect genetic mutations, and predict prognosis from microscopic specimens, primarily serving as diagnostic aids for pathologists (Echle et al. 2022; Coudray et al. 2018; Skrede 2022). However, we believe that, in addition to image classification, AI algorithms can help improve lesion identification as part of the macroscopic diagnostic workup of gastric cancer based on gross pathological examination.
Lesion identification using AI can simultaneously identify the type and location of lesions based on images. This technology has been commercialized and is being used in various medical fields, such as radiology (Shimazaki et al. 2022; Lee et al. 2021) and endoscopy (Joseph et al. 2021; Nam et al. 2022; Xu et al. 2021). While it is also applicable to the field of pathology, there has only been one report of lesion identification using AI for surgical tissues in gastric cancer. If AI can accurately identify a lesion, tissue samples used for processing and microscopic examination may be prepared from the identified site following standard grossing guidelines without the need for a specialist.

Fig. 1 Schematic diagram of factors affecting macro photography. Macro photography is affected by focus, shooting conditions, light source, shooting platform, shooting distance, camera model, lens model, image sensor, and image processing. Focus is affected by the camera lens and settings. Brightness is affected by camera settings and the light source. Resolution is affected by camera settings and shooting distance
However, the application of AI for lesion identification is not without caveats, as the macroscopic images used have not been standardized. Each hospital uses different cameras and conditions for imaging, such as lighting and focus, while image data processing also varies. Furthermore, pathologists have their individual methods for taking photographs, resulting in discrepancies even within hospitals. Thus, since the quality of images differs between hospitals, an AI model trained in one hospital may not work properly in another. Image standardization is, therefore, necessary to solve this problem. For this, the conditions that affect lesion identification by AI must be identified, thus allowing its appropriate standardization for widespread use.
Camera models, lenses, shooting platforms, light sources, shooting distances, image sensors, and image processing all have an impact on the resulting macro-images. Lens and shooting conditions affect focus; light source, shooting conditions, platform, and distance affect brightness; shooting conditions and distance affect resolution; and camera models and shooting conditions affect image processing (Fig. 1).

Fig. 2 Model and cases for gastric cancer detection using AI learning. a Yolo v5x was used to develop the AI technologies. Yolo shows the location of the lesion as a bounding box and the probability of cancer as a confidence value. b Gastric cancer resection material from 2017–2019 was used for training and produced a model that identified lesions with a probability of 81.8%. This model was also able to identify cases collected at different times with a probability of 60.9%
In this study, we investigated the role of focus, resolution, brightness, and contrast in image processing. We aimed to identify and standardize the factors that affect the quality of pathological macroscopic images, which could further affect lesion identification using AI.

Dataset
Training and test data were collected using a picture database. Surgical tissues were resected by surgeons at the National Cancer Center East (Kashiwa City, Japan) from January 2017 to December 2019, and the pictures were taken by a pathologist following fixation. A total of 415 cases (467 lesions) were categorized as the training set, 68 (78 lesions) as the validation set, and 33 (37 lesions) as the test data. Next, 23 cases (25 lesions) that were resected during a different period (from January 2020 to March 2020) were included as additional test data (Fig. 2a). This retrospective study was approved by the Institutional Review Board of the National Cancer Center East (Kashiwa City, Japan) (IRB approval number: 2020-344), and informed consent was obtained from all patients.

Annotation
Annotation of the cancer portion was performed by an expert pathologist using the Visual Object Tagging Tool software program (Microsoft, WA, USA). Histological findings were also considered in the identification of lesions. After bounding boxes were added to the cancerous areas, they were converted to YOLO format on the Roboflow web page (https://roboflow.com) (Roboflow, Inc., Des Moines, IA, USA).
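For reference, the YOLO label format stores one line per bounding box as `class x_center y_center width height`, with all coordinates normalized to the image size. A minimal sketch of this conversion (the function name and example values are illustrative, not taken from the study):

```python
def to_yolo(box, img_w, img_h, cls=0):
    """Convert a pixel-space bounding box (xmin, ymin, xmax, ymax)
    into a YOLO label line: class x_center y_center width height,
    all normalized to the image dimensions."""
    xmin, ymin, xmax, ymax = box
    xc = (xmin + xmax) / 2 / img_w
    yc = (ymin + ymax) / 2 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return f"{cls} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

# A 200 x 100 px box centered at (400, 300) in an 800 x 600 image:
print(to_yolo((300, 250, 500, 350), 800, 600))
# -> 0 0.500000 0.500000 0.250000 0.166667
```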

Model and graphics processing unit
Yolo v5x (G & J. 2020. Available from: https://github.com/ultralytics/yolov5) was used for training and lesion detection inference. The hyper-parameter settings were left at their defaults. This model outputs a bounding box to suggest the lesion area together with a confidence value, which indicates the probability of the area being cancerous (Fig. 2a). The graphics processing unit used was an NVIDIA A100.

Statistical analysis
For the calculation of lesion detection accuracy, the cutoff value of confidence was arbitrarily set at 0.10. For the comparison of model accuracy, cases with a confidence of > 0.75 were selected. One-way analysis of variance and Dunnett's multiple comparison tests were performed using the GraphPad Prism 9 software program (San Diego, CA, USA). p < 0.05 was considered statistically significant.
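As a sketch of how these two cutoffs apply (function names and example scores are hypothetical, not the study's data):

```python
def identification_rate(confidences, cutoff=0.10):
    """A lesion counts as identified when its highest predicted
    confidence reaches the cutoff (0.10 in this study); an entry of
    0.0 represents a lesion with no detection at all."""
    detected = sum(1 for c in confidences if c >= cutoff)
    return detected / len(confidences)

# Hypothetical best-confidence scores for five annotated lesions
scores = [0.92, 0.81, 0.07, 0.55, 0.0]
print(identification_rate(scores))            # 3 of 5 reach 0.10 -> 0.6

# Cases retained for the model-accuracy comparison: confidence > 0.75
high_conf = [c for c in scores if c > 0.75]   # [0.92, 0.81]
```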

Image pre-processing
Trimming was performed manually using the Photoshop software package (Adobe Inc., San Jose, CA, USA). One side of each picture was set to 25 cm using the included scale. All images were then standardized to a resolution of 800 pixels. For the resolution analysis, images of 720, 640, 560, and 480 pixels were generated using the Python-OpenCV library (version 4.5.3). Gaussian processing, used as a surrogate for defocusing, was also performed with the Python-OpenCV library, as were the brightness and contrast adjustments.
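The pre-processing steps above can be sketched as follows. This is an illustrative NumPy stand-in (the study used the OpenCV calls noted in the comments), and the array sizes and pixel values are hypothetical:

```python
import numpy as np

def resize_width(img, new_w):
    """Nearest-neighbour downscaling to a target width, keeping the
    aspect ratio (a stand-in for cv2.resize, which the study used)."""
    h, w = img.shape[:2]
    new_h = round(h * new_w / w)
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return img[np.ix_(rows, cols)]

def adjust(img, alpha=1.0, beta=0.0):
    """Brightness/contrast adjustment mirroring cv2.convertScaleAbs:
    new_pixel = clip(alpha * old_pixel + beta, 0, 255)."""
    return np.clip(alpha * img.astype(float) + beta, 0, 255).astype(np.uint8)

img = np.full((600, 800), 100, dtype=np.uint8)  # dummy 800-px-wide image
small = resize_width(img, 480)                  # 800 -> 480 pixels wide
bright = adjust(img, beta=50)                   # Beta 50: +50 brightness
dark = adjust(img, alpha=0.2)                   # Alpha 0.2: compressed, darker
print(small.shape, bright[0, 0], dark[0, 0])    # (360, 480) 150 20
```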

Model development for the standardization study
Yolo v5x was used for lesion detection, and a lesion detection model was developed (training set, 415 cases (467 lesions); validation set, 68 cases (78 lesions); and test set, 33 cases (37 lesions)) (Fig. 2b). The identification rate was defined as the proportion of lesions detected with a confidence of ≥ 0.10. The identification rate was 81.1% (31/37). There was one case in which a non-cancerous area was identified as a cancer lesion. We subsequently validated the results using the 23 cases resected during a different period; the identification rate was 64.0% (16/25), with no false positives. Next, the test cases for the standardization study were selected; specifically, we used cases with a confidence of ≥ 75%.

Focus
Instead of actual out-of-focus images, we used Gaussian blur-processed photographs. The higher the number (e.g., Gauss 9), the stronger the blur, i.e., the more out of focus the image appears (Fig. 3a). We first created models under various conditions and compared their accuracies. As expected, the model trained on the in-focus control images had the highest accuracy (Fig. 3b, Control vs. Gauss 3: p = 0.18, Control vs. Gauss 5: p = 0.02, Control vs. Gauss 7: p = 0.005, Control vs. Gauss 9: p = 0.12). However, even the model trained on control images performed poorly when the quality of the test images was poor (Fig. 3c, Control vs. Gauss 3: p = 0.13, Control vs. Gauss 5: p = 0.09, Control vs. Gauss 7: p = 0.012, Control vs. Gauss 9: p = 0.0058). Interestingly, the model trained on Gauss 5 images correctly inferred the test data processed with Gauss 3 (Fig. 3d, Gauss 5 vs. Control: p = 0.33, Gauss 5 vs. Gauss 3: p = 0.047, Gauss 5 vs. Gauss 7: p = 0.048, Gauss 5 vs. Gauss 9: p = 0.013). This suggests that the model can tolerate a modest mismatch between the training and test data.
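A one-dimensional toy version shows why a larger Gaussian kernel stands in for a stronger defocus: a sharp edge becomes progressively softer as the kernel grows. The sigma rule below follows OpenCV's default for sigma = 0; everything else is illustrative:

```python
import numpy as np

def gauss_blur1d(signal, ksize):
    """Toy 1-D Gaussian blur; cv2.GaussianBlur with a (ksize, ksize)
    kernel is the 2-D analogue used in the study."""
    sigma = 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8   # OpenCV's default rule
    x = np.arange(ksize) - (ksize - 1) / 2
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    return np.convolve(signal, k, mode="same")

# A sharp step edge; its maximum slope drops as the kernel grows,
# mimicking increasing defocus (Gauss 3 -> Gauss 9).
edge = np.r_[np.zeros(20), np.ones(20)]
slopes = [np.abs(np.diff(gauss_blur1d(edge, ks))).max() for ks in (3, 5, 7, 9)]
print([round(s, 3) for s in slopes])  # monotonically decreasing
```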

Resolution
Resolution was examined by generating images of 720, 640, 560, and 480 pixels from an 800-pixel image (Fig. 4a). The accuracy declined as the resolution was lowered from 800 to 720 and then to 640 pixels. However, when the resolution was further decreased to 560 and then to 480 pixels, the accuracy increased; thus, no consistent trend was observed (Fig. 4b: 800 vs. 720 pixels, p = 0.012; 800 vs. 640 pixels, p = 0.0009; 800 vs. 560 pixels, p = 0.93; 800 vs. 480 pixels, p = 0.51). We also tested the models trained using 800- and 640-pixel images. In both cases, the results were not affected by the resolution of the test data (Fig. 4b, c: 800 pixels, p = 0.26; 640 pixels, p = 0.73).

Brightness
Beta processing with OpenCV was used to examine the effect of brightness (Fig. 5a). The principle of Beta treatment is the translation of the histogram data (Online Resource 1). The models trained with untreated (Beta 0) and Beta 50-treated cases were more accurate than the other models, although the difference was not statistically significant (Fig. 5b: p = 0.10). For the model trained with Beta 0, test images transformed to be darker showed significantly worse accuracy (Fig. 5c: Beta 0 vs. Beta −100, p = 0.0019; Beta 0 vs. Beta −50, p = 0.049). For test data transformed to be slightly brighter (Beta 50), the accuracy was slightly higher; when the images were too bright (Beta 100), the accuracy decreased slightly but not significantly (Fig. 5c: Beta 0 vs. Beta 50, p = 0.64; Beta 0 vs. Beta 100, p = 0.079).
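The translation can be illustrated in a few lines of NumPy on a synthetic image (not study data; the study applied the equivalent OpenCV beta processing):

```python
import numpy as np

# Beta processing adds a constant to every pixel, translating the
# intensity histogram without changing its shape (values clipped to 0-255).
rng = np.random.default_rng(0)
img = rng.integers(40, 160, size=(100, 100))   # synthetic mid-gray image

beta50 = np.clip(img + 50, 0, 255)             # Beta 50: shift right (brighter)

# The mean shifts by exactly beta, while the spread is unchanged (no
# clipping occurs for this value range): brightness changes, contrast does not.
print(beta50.mean() - img.mean(), beta50.std() - img.std())
```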

Contrast
Alpha processing with OpenCV was used to examine the effect of contrast (Fig. 6a). The principle of Alpha treatment is the compression of the histogram data (Online Resource 1); the lower the value, the lower the contrast. Surprisingly, the lowest-contrast condition, which appeared dark to the human eye, yielded the highest confidence scores, and lower-contrast images in general tended to produce higher accuracy. Models trained with Alpha 0.2-processed images were significantly more accurate than models trained with Alpha 1.4- and Alpha 1.8-processed images (Fig. 6b: Alpha 0.2 vs. Alpha 0.6, p = 0.23; Alpha 0.2 vs. Alpha 1.0, p = 0.91; Alpha 0.2 vs. Alpha 1.4, p = 0.050; Alpha 0.2 vs. Alpha 1.8, p = 0.0093). The model trained with Alpha 0.2-processed images did not correctly infer test images processed with the other settings (Fig. 6c: Alpha 0.2 vs. Alpha 0.6, p = 0.047; Alpha 0.2 vs. Alpha 1.0, p < 0.0001; Alpha 0.2 vs. Alpha 1.4, p < 0.0001; Alpha 0.2 vs. Alpha 1.8, p < 0.0001). Likewise, the model trained with Alpha 1.0-processed images did not correctly infer images subjected to the other settings. Nevertheless, as with the other factors, the decrease in accuracy was smaller for test images shifted toward the conditions suitable for AI (e.g., a darker, lower-contrast image) (Fig. 6d: Alpha 1.0 vs. Alpha 0.2, p = 0.0005; Alpha 1.0 vs. Alpha 0.6, p = 0.98; Alpha 1.0 vs. Alpha 1.4, p = 0.082; Alpha 1.0 vs. Alpha 1.8, p = 0.0028).
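The compression can be illustrated the same way (synthetic values, not study data; the study applied the equivalent OpenCV alpha processing):

```python
import numpy as np

# Alpha processing multiplies every pixel by alpha: alpha < 1 compresses
# the intensity histogram toward zero (darker, lower contrast -- the
# condition that scored best for AI here), while alpha > 1 stretches it.
rng = np.random.default_rng(0)
img = rng.integers(40, 160, size=(100, 100))   # synthetic mid-gray image

alpha02 = 0.2 * img                            # no clipping needed: max is 31.8

# Both the mean and the spread scale by alpha, so the image is darker AND flatter.
print(round(alpha02.mean() / img.mean(), 2), round(alpha02.std() / img.std(), 2))
# -> 0.2 0.2
```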

Discussion
In diagnostic surgical pathology, macro-diagnosis is important for obtaining appropriate specimens, so the application of AI in this context could support accurate diagnoses. For example, in organs of the gastrointestinal system such as the stomach, the deepest depth of invasion is important because it determines Union for International Cancer Control/American Joint Committee on Cancer staging (Edge et al. 2017). In contrast, superficial lesions require extensive sampling to demonstrate a "lack of invasion". It is therefore essential to identify the extent of the lesion from the surface in advance for proper pathological staging. However, only one study has reported the development of AI for lesion recognition using macro photographs of surgical materials. This may be because macro-diagnosis is generally performed by a pathologist's naked-eye observation rather than from photographs.
In recent years, the advancement of whole-slide imaging technology (CAP secures remote work waiver for pathologists [Internet]. Available from: https://www.cap.org/advocacy/latest-news-and-practice-data/march-26-2020) has made telemedicine for micro-diagnosis a reality. Remote macro-diagnosis is thus expected to follow. With the aid of AI technology, an inexperienced doctor or technician could gross specimens like a skilled pathologist. Recently, a number of convolutional neural network-based models for object detection have been developed. We chose Yolo v5x, a technology already used in medical procedures such as radiography, mammography, and endoscopy (Wan et al. 2021; Luo et al. 2021; Mohiyuddin et al. 2022) that could also be applicable to the field of pathology.
In this study, we assessed the importance of standardizing imaging conditions, with an emphasis on focus, resolution, brightness, and contrast. Gaussian processing was employed to generate out-of-focus images. As predicted, the model trained with focused pictures achieved high accuracy but had limited use with out-of-focus images. In addition, finding the best resolution was difficult as it depended on the model; nevertheless, our model made correct predictions when the resolution of the test data was lower than that of the training data. Next, for brightness examination, we used Beta treatment, which moved the histogram in parallel. We successfully determined the optimal brightness for lesion detection using AI. Our dataset tended to be darker than the best brightness. The model trained under the Beta 0 condition inferred the test data under Beta 0 and 50 conditions. For contrast analysis, we used Alpha processing. Surprisingly, the Alpha 0.2-processed image, which appeared as a dark picture to humans, had the highest accuracy score.
These findings emphasize that focus, brightness, and contrast must be quantified and standardized to improve the application and effectiveness of lesion recognition by AI. It is also crucial to include a control color palette when taking the photographs. Issues in these processes may be overcome using engineering techniques.
This study had limitations. First, we investigated the impact of our model but not the application of AI in general. There may be AI technologies that are not affected by brightness, contrast, and focus as investigated here. Nevertheless, for AI technologies to be successfully developed and used, the type of training images must be specified. Otherwise, the accuracy of AI cannot be guaranteed when applied in medical settings. Second, this study did not examine the usefulness of AI in pathology but only evaluated whether standardization of images was necessary. To demonstrate the effectiveness of such technologies in the future, grossing with and without AI assistance could be compared.
In conclusion, this study revealed the factors that affect lesion detection by AI in the gross diagnosis of gastric cancer using pathological macroscopic images. The AI achieved its expected accuracy only when the test images matched the quality of the training images. This AI-based lesion detection system could assist inexperienced pathologists or technicians in performing surgical pathology dissection. We highlight the importance of image standardization for the development of an AI platform that can be utilized universally across hospitals.