Automatic Machine-Learning Classification of the Mode of Invasion of Oral Squamous Cell Carcinoma Using Digital Microscopic Images: A Retrospective Study

Kunio Yoshizawa (yoshizawak@yamanashi.ac.jp), University of Yamanashi, https://orcid.org/0000-0003-4779-9940
Hidetoshi Ando, Yamanashi Daigaku Kogakubu Daigakuin Ikonogaku Sogo Kyoikubu Kogaku Senko
Yujiro Kimura, Yamanashi Daigaku Igakubu Daigakuin Sogo Kenkyubu Igakuiki
Shuichi Kawashiri, Kanazawa Daigaku Daigakuin Iyaku Hokengaku Sogo Kenkyuka Iyaku Hoken Gakuiki Igakurui
Akinori Moroi, Yamanashi Daigaku Igakubu Daigakuin Sogo Kenkyubu Igakuiki
Koichiro Ueki, Yamanashi Daigaku Igakubu Daigakuin Sogo Kenkyubu Igakuiki


Background
Oral squamous cell carcinoma (OSCC) accounts for approximately 90% of all cases of oral cancer.
Despite improvements in treatment options over the past few decades, the 5-year survival rates have remained fairly low (50-55%) among OSCC patients [1,2]. Treatment failure in a case of OSCC is mainly ascribed to the highly invasive nature of the tumor [3,4]. As the tumor becomes more invasive, the invasion front progresses from the epithelium through the stroma to infiltrate the lymphatic and blood vasculature [5]. This phenomenon is associated with a poor survival prognosis.
The degree of tumor invasion and metastasis can be evaluated using high-performance imaging modalities such as computed tomography (CT) and magnetic resonance imaging (MRI). However, a histopathological analysis of tumor specimens remains the most important component of an accurate determination of invasiveness [6]. Particularly, an accurate diagnosis of the invasiveness of an OSCC is a very important component of treatment planning and prognostic predictions.
Yamamoto and colleagues initially proposed a method to distinguish and classify the invasive ability of an OSCC [7]. The resulting Yamamoto-Kohama (YK) classification was developed for the microscopic evaluation of the invasion front in a pathological tissue section of OSCC and is now frequently used in Japan [7]. This classification system appears to be a powerful predictor of regional metastasis in a patient with clinically node-negative OSCC. A YK classification-based evaluation mainly involves biopsy and excised tissues, and the results are used to determine the prognosis and select treatments. However, this evaluation method is based on subjective visual findings and has led to significant differences in determinations between evaluators and facilities. Therefore, the YK classification is not a sufficiently objective index. Furthermore, no report has described the relationship between the visual aspects of infiltration on images of pathological tissues and the results of an objective image evaluation based on samples from patients with OSCC.
Many recent studies have demonstrated the effectiveness of pathological image analysis methods that incorporate artificial intelligence (AI). One experimental study that compared the diagnostic accuracy of a pathologist and AI with respect to pathological images of breast cancer lymph node metastasis determined that the latter was more time-efficient [8]. In another example, studies from various countries have reported that the Gleason score, an index of prostate cancer malignancy, is poorly reproducible among pathologists. In that context, Arvaniti et al. demonstrated the use of AI to match the accuracy rate of the Gleason index with its reproducibility among pathologists [9]. Several recent reports have described approaches that used machine learning to detect various types of carcinomas from photographic images of lesions, radiological images and pathological specimens [10-18]. However, no published report has described an automatic method for determining the invasion activity based on the computer processing of a digital image of the invasion front. Therefore, in this study, we aimed to develop a method in which medical image processing could be used to determine the mode of invasion automatically based on digital images of the invasion front of an OSCC.
The borderlines between the tumor and stromal tissues could be distinguished clearly down to fine detail and were clearly binarized on the IHC image corresponding to each YK classification.

Distributional observation of the number of epithelial areas (feature value 1)
The vertical axis of the graph in Fig. 2 indicates the numerical value of the feature and the horizontal axis indicates the serial number of the data. In Grades 1 and 2, most of the feature value 1 results were distributed near 1, which was consistent with the single tumor masses observed on the images. In contrast, most of the data sets for Grade 3, 4C and 4D cases yielded values greater than 1, which was consistent with the appearance of multiple tumor masses on the images. In particular, feature value 1 was 15 or higher in approximately half of the Grade 4D specimens. This phenomenon was not observed in the other Grades (Fig. 1). Feature value 1 provided a good distinction of Grades 1 and 2 from Grades 3, 4C and 4D (Fig. 2).

Distributional observation of the "disturbance of borderline" (feature values 2 and 3)
Most Grade 4D specimens yielded a feature value 2 of 0.08 or greater. Accordingly, this feature value could effectively discriminate Grade 4D tumors (Fig. 2). In contrast, feature value 3 was difficult to use for classification. For example, one Grade 4C case received a uniquely high numerical value of 30 (Fig. 3).

Distributional observation of the borderline length (feature value 6)
Grade 1 tumors tended to yield low values for feature value 6, and most cases with values less than 1000 met the criteria for this grade. Therefore, feature value 6 can effectively discriminate Grade 1 tumors (Fig. 6).

Confusion matrix-based performance evaluation
As shown in Table 1, the test data of Grades 1 and 4D yielded high classification accuracy values, whereas the data of Grade 2 yielded a low value. Among Grade 1 cases, only 2 of 23 specimens were misjudged as Grade 2. Among Grade 4D cases, only 1 of 18 specimens was misjudged as Grade 4C. However, 4 of 12 Grade 2 specimens were misjudged as Grade 1.

Precision-recall
The precision-recall was calculated using a confusion matrix and reported using F values, as shown in Table 2. The overall F value was 0.87. In an analysis stratified by classification, Grade 2 received the lowest F value of 0.67, whereas Grades 1 and 4D received the highest F values of 0.93 and 0.94, respectively.
In the survival analysis (Fig. 7), a Grade 2 classification via machine learning was associated with a lower survival rate than the same classification assigned by a clinician (p < 0.05). No other significant differences in classification accuracy were observed for the other YK grades (Fig. 7).
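Per-class F values of the kind reported in Table 2 can be derived from a confusion matrix such as Table 1. The following is a minimal sketch of that calculation, assuming the usual definition of the F value as the harmonic mean of precision and recall. The function name `f_values` and the three-class toy matrix are illustrative assumptions and do not reproduce the data of this study.

```python
def f_values(confusion, labels):
    """Return {label: (precision, recall, F)} from a square confusion matrix.

    confusion[i][j] is the number of specimens whose true class is
    labels[i] and whose predicted class is labels[j].
    """
    results = {}
    for k, label in enumerate(labels):
        true_pos = confusion[k][k]
        predicted = sum(row[k] for row in confusion)  # column sum
        actual = sum(confusion[k])                    # row sum
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        denom = precision + recall
        f_value = 2 * precision * recall / denom if denom else 0.0
        results[label] = (precision, recall, f_value)
    return results


# Hypothetical toy matrix (NOT the data of Table 1).
toy_labels = ["A", "B", "C"]
toy_matrix = [
    [8, 2, 0],
    [1, 6, 1],
    [0, 1, 9],
]
scores = f_values(toy_matrix, toy_labels)
for label in toy_labels:
    precision, recall, f_value = scores[label]
    print(f"{label}: precision={precision:.2f} recall={recall:.2f} F={f_value:.2f}")
```

A class with few specimens, such as Grade 2 in this study, is sensitive to this measure because a handful of misjudged cases changes both its row and column sums noticeably.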

Discussion
OSCC is characterized by a high degree of invasion into the surrounding tissues, as well as a high incidence of lymph node metastasis [19]. Therefore, it is important to determine the invasive ability of the tumor in each case, as this will inform the establishment of a treatment strategy. In this study, we developed an automatic machine learning-based method for differentiating OSCC cases according to the YK classification criteria. Overall, this system yielded relatively accurate results, as indicated by a high F value of 0.87. However, a further analysis of individual grades yielded a relatively low F value for Grade 2, suggesting that our method may not accurately distinguish these tumors. When we analyzed the survival rates according to the YK grade, the survival rate decreased as the grade determined by the clinician increased. In contrast, however, the machine learning-determined YK Grade 2 cases had the second-worst survival rate after Grade 4D. Moreover, only two-thirds of Grade 2 cases were correctly assigned by the machine learning system, and three-quarters of the mismatched cases actually met the criteria of a higher grade.
Grade 2 may be particularly easy to misjudge via machine learning because these lesions have an unclear borderline and a cord-like shape and are easily misclassified as more invasive tumors (e.g., Grade 4C), even during a subjective clinician-based analysis. Grade 2 cases also comprised the smallest subpopulation in this study. Consequently, the machine learning was likely inadequate for this grade, and many cases were misinterpreted. We consider that this explains why the machine learning-determined Grade 2 cases were associated with the second-worst survival rate.
Although machine learning and AI are being promoted in the medical field, particularly as diagnostic approaches to head and neck cancers, our findings suggest that clinicians should consider the risk of misjudgment when using machine learning that is trained using human-determined features [13].
This study was complicated by the fact that hematoxylin and eosin (HE)-stained images were largely not used, despite the desirability of such an approach from the perspectives of cost and convenience. However, as this research was a first attempt at this technology, we performed IHC to detect claudin-7, which specifically stains OSCC tumor cells, to further clarify the borderline between the tumor and the stroma and ensure clear binary images [20,21]. The use of HE specimens alone would have made it particularly difficult to capture the sparsely scattered tumor cells in the stromal tissue of Grade 4D specimens. However, the inclusion of a claudin-7 IHC analysis facilitated the detection of tumor cells even in these Grade 4D cases [20]. In the future, advances are needed to ensure that machine learning can produce binary images from simpler and more convenient HE samples.
To improve the classification accuracy using deep learning, it is necessary to include a substantially high number of cases; however, we did not have the required number of cases. Instead, a classifier can be created from a limited number of cases if a clinician supplies a minimal amount of supervised image data. Therefore, we initially aimed to facilitate the creation of classifiers via machine learning by setting the features used by clinicians to determine the YK classification. The good overall F value suggests that good feature values were extracted. Moreover, this approach might also be useful for constructing an automatic YK classification discrimination method, although the accuracy must be improved.
The accuracy of machine learning could potentially be improved by dramatically increasing the number of cases. Although many pathological image findings and clinical information can be obtained from The Cancer Genome Atlas database, this information is provided in a pathological image format and the contents are not uniform [22]. Consequently, it is difficult to apply these data in a machine learning setting. In the future, it will be necessary to collect a larger number of cases through a multi-center collaboration. The advances in deep learning and inter-pathologist reproducibility, including for the YK classification, that these efforts encourage could lead to a breakthrough in the field. Furthermore, increasing numbers of patients will benefit when clinicians and pathologists use a more effective AI system. We should continue to cooperate with the field of AI analysis to develop diagnostic tools.

Conclusion
In this study, we developed an automatic machine learning-based classifier system to discriminate the mode of invasion of OSCC. Notably, this classifier was confirmed to generate decisions similar to those made by a clinician. Our results suggest that an automatic medical diagnostic imaging system could feasibly and accurately determine the mode of OSCC invasion.

Specimens
Sixty-seven primary OSCC biopsy specimens were obtained from patients who underwent surgical resection at the Department of Oral and Maxillofacial Surgery, Kanazawa University Hospital between 1989 and 2009. The patients (38 male and 29 female subjects) ranged in age from 32 to 91 years (mean age, 60 years). Informed consent for the experimental use of the samples was obtained from the patients according to the hospital's ethical guidelines.
The engineering department of Yamanashi University performed the imaging analysis of the pathological specimens as a third-party assessment organization to eliminate evaluator bias. A total of 101 specimens were evaluated and assigned the following YK grades: Grade 1, 23 specimens; Grade 2, 12 specimens; Grade 3, 27 specimens; Grade 4C, 21 specimens and Grade 4D, 18 specimens (Table 3). The retrospective study protocol was approved by the ethics committees of Yamanashi University (approval number: 1267) and Kanazawa University (approval number: 1647-1). This study was conducted in accordance with the Declaration of Helsinki.

Staining methods
Immunohistochemistry (IHC) of deparaffinized and rehydrated sections was performed according to the labeled streptavidin-biotin (LSAB) method as described by Nozaki et al. [23]. To clearly detect tumor cells at the borderline, the sections were reacted overnight at 4 °C with a primary monoclonal antibody specific for claudin-7 (Invitrogen Corp., Camarillo, CA, USA; 200-fold dilution in phosphate-buffered saline [PBS]). This tight junction component was proven to distinguish OSCC in an immunohistochemical analysis of pathological tissue specimens according to the YK classification [20]. The sections were then reacted with a secondary antibody (biotin-labeled goat anti-rabbit immunoglobulin polyclonal antibody; Dako Japan, Kyoto, Japan) at room temperature for 60 min. A section of normal oral epithelium previously identified to stain strongly for claudin-7 was used as a positive control in each batch of stained samples. Sections treated with PBS instead of the primary antibody were used as the negative controls.

Yamamoto-Kohama (YK) classification
In Japan, the departments of oral-maxillofacial surgery at many institutions use the YK classification. This method is used for the histological evaluation of malignant tissues and is focused on the invasion pattern at the tumor-host tissue border. The YK classification was previously shown to be strongly correlated with the risk of lymph node metastasis and prognosis [7]. The YK evaluation criteria are presented in Table 4. In this study, the clinician observed the histopathological specimens of OSCC under a microscope and assessed the demarcation and deformation of the border between the tumor and stroma. Figure 8 presents images of typical OSCCs that were stained to detect claudin-7 and classified according to the YK system.

Overview
Two approaches to the automatic determination of the OSCC invasion pattern were applied in this study. First, IHC images of OSCCs were obtained at 100× magnification. Second, machine learning was applied to cases for which a clinician had previously evaluated the invasion patterns based on the YK classification, and the images were classified by a computer. Here, we considered a shape characterization of the invasive front in the image to be effective for discriminating the invasion mode. The characterized shape was then extracted to create a feature vector.
The proposed processing method was performed as described by Inoue et al. [24] and is summarized in Fig. 9. The invasion mode (i.e., YK classification) was determined automatically using machine learning according to the following methods, which are presented in order [24]: 1. extraction of color features for binarization; 2. creation of classifiers for binarization; 3. binarization of unknown color data; 4. extraction of shape features for the discrimination of YK classification; 5. creation of classifiers for YK classification and 6. discrimination of the YK classification of the binary image.

Binarization
The histopathological image of each tumor was divided into epithelial and stromal regions to extract the invasion front from the image. First, a borderline was created to divide the tumor epithelium and stromal regions. Binarization was then performed to distinguish the epithelial and stromal sides according to the color image pixels. In this process, the color pixels on the tumor side were converted to black and those on the stromal side to white. Initially, the binarization processing series was performed by a clinical expert, and the resulting human analyst-generated images were used as a training dataset for machine learning. The binarization process is summarized in the upper panel of Fig. 9.
Furthermore, a color image and the corresponding binary image obtained by a clinical expert were prepared as training data and used to create a classifier that would binarize each pixel of an unknown color image [25]. For each pixel of interest, the color (RGB) values were extracted from the 49 pixels within a 7 × 7-pixel square centered on that pixel. We created a classifier that could use machine learning to determine whether a pixel should be classified as black or white based on the color pattern of the pixel of interest and the neighboring pixels [26].
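The neighborhood-based color features described above can be sketched as follows. This is a minimal illustration, assuming that the 49 RGB triplets are simply concatenated into one 147-element vector and that pixels beyond the image edge are filled by clamping to the nearest edge pixel; neither detail is stated in the text, and the function name `patch_features` is hypothetical.

```python
def patch_features(image, row, col, size=7):
    """Flatten the RGB values of the size x size window centered on (row, col).

    `image` is a list of rows; each row is a list of (R, G, B) tuples.
    Border handling by clamping is an assumption made for this sketch.
    """
    half = size // 2
    height, width = len(image), len(image[0])
    features = []
    for d_row in range(-half, half + 1):
        for d_col in range(-half, half + 1):
            r = min(max(row + d_row, 0), height - 1)
            c = min(max(col + d_col, 0), width - 1)
            features.extend(image[r][c])
    return features


# Synthetic 10 x 10 image used only to exercise the function.
toy_image = [[(r, c, r + c) for c in range(10)] for r in range(10)]
vector = patch_features(toy_image, 5, 5)
print(len(vector))  # 49 pixels x 3 channels
```

Each such vector, paired with the expert-assigned black or white label of its center pixel, would form one training example for the binarization classifier.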

Features associated with the YK classification and binary images of OSCC

Features associated with YK Grade 1
As shown in Fig. 10, Grade 1 is characterized by a low level of borderline curvature, which is thought to correspond to the YK criterion of a "well-defined borderline." The representative image also contains many large epithelial masses, with no more than one island-like nest of tumor cells.

Features associated with YK Grade 2
Grade 2 is characterized by a high level of borderline curvature. In many images, the epithelium exhibits protrusions that may be sharp (Fig. 10). These features are thought to correspond to the YK Grade 2 criteria of "cords and a less well-demarcated borderline, indicating that the borderline is slightly disturbed." Also, as in Grade 1, the tumor is not divided into islands and often appears as a large dendritic mass.

Features associated with YK Grade 3
This grade is distinguished from Grades 1 and 2 by the presence of multiple epithelial tumor islands, as shown in Fig. 10. In addition, Grade 3 OSCC is less likely to exhibit the sharp protrusions observed in Grades 2, 4C and 4D, and contains a generally round-shaped nest of cancer cells.
These features are thought to correspond to the YK Grade 3 criterion of "groups of cells with no distinct borderline."

Features associated with YK Grade 4C
Grade 4C is characterized by a cord-like epithelial region, as shown in Fig. 10. Morphologically, the tumors tend to have elongated epithelial regions. This appearance is thought to correspond with the YK Grade 4C criterion of "cord-like diffuse invasion."

Features associated with YK Grade 4D
Grade 4D is characterized by a smaller epithelial region, as shown in Fig. 10. In these lesions, the cancer nests are often so small that tumor masses containing only a few cells are difficult to identify. This pattern is thought to correspond to the YK Grade 4D criterion of "diffuse invasion (diffuse type)," in which the borderline is unclear and small cancerous nests are formed.

Design of the feature extractor
The clinician applied the following five features to automatically classify the binarized images according to the YK classification by machine learning: 1. number of epithelial areas, 2. borderline disturbance, 3. cord-shaped epithelial area, 4. size of the epithelial area and 5. borderline length. These five features were determined by an oral surgeon who received instruction about the classification method directly from Professor Yamamoto [7], the proponent of the mode of invasion.

Features and extraction

Number of epithelial areas
The inputted binary image data were subjected to labeling on the tumor side (i.e., black-colored side). The areas surrounded by continuous black lines were counted to determine the number of epithelial islands. Figure 10 demonstrates that the number of tumor islands, which was defined as feature value 1 [24], increased as the YK grade increased. The data serial numbers were then ordered from YK Grade 1 to 4D, such that Grade 4D data sets had the highest serial numbers.
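The labeling step that yields feature value 1 can be illustrated with a standard connected-component count. The sketch below is not the authors' implementation; it assumes that tumor pixels are encoded as 1 and stromal pixels as 0, and the choice between 4- and 8-connectivity is an assumption because the text does not specify it. The function name `count_epithelial_areas` is hypothetical.

```python
from collections import deque


def count_epithelial_areas(binary, connectivity=8):
    """Count connected regions of tumor pixels (value 1) in a binary image."""
    height, width = len(binary), len(binary[0])
    if connectivity == 8:
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0)]
    else:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    seen = [[False] * width for _ in range(height)]
    count = 0
    for r in range(height):
        for c in range(width):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            count += 1  # found an unvisited tumor pixel: a new region
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of the region
                cur_r, cur_c = queue.popleft()
                for dr, dc in offsets:
                    nr, nc = cur_r + dr, cur_c + dc
                    if (0 <= nr < height and 0 <= nc < width
                            and binary[nr][nc] == 1 and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
    return count


# Toy image: one 2 x 2 mass, one vertical pair, two diagonally touching pixels.
toy = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
]
print(count_epithelial_areas(toy, connectivity=8))
print(count_epithelial_areas(toy, connectivity=4))
```

The two diagonally touching pixels form one region under 8-connectivity and two under 4-connectivity, which shows why the connectivity choice can matter for sparsely scattered Grade 4D nests.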
Consequently, the curvature of the borderline increases as the number of division points increases, even when the lengths of the borderlines are identical. This curvature is defined as feature value 2. Grade 2, which is characterized by more strongly disordered borderlines relative to Grade 1, contains some borderline areas with a high curvature and many others with a low curvature, as shown in Fig. 10.
Therefore, it is difficult to distinguish between these grades when using only the average curvature of the borderline. To correct for this non-specificity, the borderline disorder was calculated by combining feature value 2 with feature value 3. This latter value was extracted from the sharp protrusions, which were identified when the angle formed by the vectors from the middle division point to the front and rear division points was smaller than the threshold value (Fig. 11).
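The angle test behind feature value 3 can be sketched as below. This is an illustration under stated assumptions: the borderline is given as an ordered list of (x, y) division points, each interior point is compared with its immediate front and rear neighbors, and the 60-degree threshold is a placeholder because the threshold used in the study is not given here. The names `angle_at` and `count_sharp_protrusions` are hypothetical.

```python
import math


def angle_at(prev_pt, mid_pt, next_pt):
    """Angle in degrees between the vectors mid->prev and mid->next."""
    ax, ay = prev_pt[0] - mid_pt[0], prev_pt[1] - mid_pt[1]
    bx, by = next_pt[0] - mid_pt[0], next_pt[1] - mid_pt[1]
    norm = math.hypot(ax, ay) * math.hypot(bx, by)
    if norm == 0:
        return 180.0  # coincident points: treat the borderline as straight
    cosine = max(-1.0, min(1.0, (ax * bx + ay * by) / norm))
    return math.degrees(math.acos(cosine))


def count_sharp_protrusions(points, threshold_deg=60.0):
    """Count interior division points whose angle is below the threshold.

    threshold_deg is an assumed placeholder value.
    """
    return sum(
        1
        for prev_pt, mid_pt, next_pt in zip(points, points[1:], points[2:])
        if angle_at(prev_pt, mid_pt, next_pt) < threshold_deg
    )


straight = [(0, 0), (1, 0), (2, 0), (3, 0)]
spike = [(0, 0), (2, 0), (3, 5), (4, 0), (6, 0)]
print(count_sharp_protrusions(straight))
print(count_sharp_protrusions(spike))
```

A flat borderline gives angles near 180 degrees at every division point, whereas the tip of a narrow protrusion gives a small angle, so only the tip of the spike is counted.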

Cord shape of the epithelial area
For an inputted binary image, the highest numerical value yielded by dividing the square root of the labeled object size by the length of the corresponding contour line was extracted and set as feature value 4 [24].
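A short worked example may help to show how the ratio in feature value 4 responds to shape. The sketch assumes that each labeled object is summarized by its area and contour length; the function names and the idealized geometric shapes are illustrative only, and contour lengths measured on pixel grids will differ somewhat from these continuous values.

```python
import math


def compactness(area, contour_length):
    """Square root of the object size divided by its contour length."""
    return math.sqrt(area) / contour_length


def cord_shape_feature(objects):
    """Highest ratio over all labeled objects, given as (area, contour_length)."""
    return max(
        (compactness(area, length) for area, length in objects if length > 0),
        default=0.0,
    )


square = (100.0, 40.0)   # 10 x 10 square: area 100, perimeter 40
strip = (100.0, 202.0)   # 1 x 100 strip: same area, perimeter 202
print(compactness(*square))
print(compactness(*strip))
print(cord_shape_feature([square, strip]))
```

For equal areas, the elongated strip yields a ratio roughly five times smaller than the square, which is why this quantity is sensitive to cord-like epithelial regions.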

Size of the epithelial area
The number of labeled objects with a size below a certain required threshold was extracted and set as feature value 5 [24].
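Feature value 5 reduces to a simple count once the size of each labeled object is known. The sketch below assumes that sizes are pixel counts and uses a placeholder threshold of 50 pixels; the actual threshold is not stated in the text, and the function name `count_small_areas` is hypothetical.

```python
def count_small_areas(areas, max_area):
    """Count labeled objects whose size is strictly below max_area."""
    return sum(1 for area in areas if area < max_area)


# Hypothetical object sizes in pixels.
toy_areas = [4, 12, 350, 7, 1200]
small = count_small_areas(toy_areas, max_area=50)
print(small)
```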

Length of the borderline
The length of the borderline was set as feature value 6 because this parameter was expected to facilitate the distinction between YK Grades 1 and 2.

Performance evaluation test using a random forest approach
Next, we experimentally analyzed the resulting discriminant performance when we performed an evaluation based on the YK classification and the extracted features. Here, we used the leave-one-out evaluation method [27,28] and the random forest machine learning algorithm to create a classifier [29]. Table 5 summarizes the main parameters of the random forests used in the experimental analysis of the image data subjected to machine learning. The F-measure was used as an indicator of precision-recall and was calculated from a confusion matrix that summarized the discrimination analysis of each YK classification [25]. The Kaplan-Meier method was used to calculate the 5-year overall survival rates associated with each YK classification. The estimates generated by the machine learning approach and a clinician were compared, and the log-rank test was used to evaluate the statistical significance of the inter-group difference. A p-value of < 0.05 was considered to indicate a significant difference.
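The leave-one-out protocol can be sketched independently of the classifier. In the example below, each sample is predicted by a model trained on all remaining samples. To keep the sketch free of external libraries, a 1-nearest-neighbor rule is used as a stand-in; it is not the random forest used in this study, and the toy feature vectors and function names are hypothetical.

```python
def leave_one_out_predictions(features, labels, fit_predict):
    """Predict each sample from a model trained on all the other samples."""
    predictions = []
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        predictions.append(fit_predict(train_x, train_y, features[i]))
    return predictions


def nearest_neighbor(train_x, train_y, query):
    """Stand-in classifier (1-nearest neighbor), NOT a random forest."""
    def squared_distance(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    best = min(range(len(train_x)),
               key=lambda j: squared_distance(train_x[j], query))
    return train_y[best]


# Hypothetical two-feature vectors; not data from this study.
toy_x = [[1.0, 0.01], [1.2, 0.02], [0.9, 0.015],
         [16.0, 0.09], [18.0, 0.10], [15.0, 0.085]]
toy_y = ["low", "low", "low", "high", "high", "high"]

predicted = leave_one_out_predictions(toy_x, toy_y, nearest_neighbor)
accuracy = sum(p == t for p, t in zip(predicted, toy_y)) / len(toy_y)
print(predicted, accuracy)
```

Because every specimen serves as the test case exactly once, the protocol uses all available data for evaluation, which suits a data set of about one hundred specimens.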

Statistical Analysis
Data analyses were performed using the statistical software SPSS 23.0 for Windows (SPSS, Inc., Chicago, IL, USA).

Declarations
Authors' contributions: KY planned the study with the help of KU and AM. HA executed the experimental work with the help of SK and YK. All authors read and approved the final manuscript.

Figure 9
Image processing to differentiate the mode of invasion. a. The binary image processing procedure. b. The procedure used to discriminate the mode of invasion (Yamamoto-Kohama [YK] classification).

Figure 10
Representative binary image of each mode of invasion. Black-colored regions indicate the spread of tumor cells, whereas white-colored regions represent stromal tissue that is not infiltrated by tumor cells.