The establishment of a multi-source dataset of images on common oral lesions


Purpose

To establish an oral lesion image database that could accelerate the development of artificial intelligence systems for lesion recognition and referral decisions.
Materials and Methods

We describe the establishment of a multi-sourced image dataset through the development of a platform for the collection and annotation of images. Further, we developed a user-friendly tool (MeMoSA® ANNOTATE) for systematic annotation to collect a rich dataset associated with the images. We evaluated sensitivity by comparing referral decisions made through the annotation process with the clinical diagnoses of the lesions, to identify lesions that are challenging to diagnose through images alone.
Results

The image repository hosts 2474 images of oral lesions consisting of oral cancer, oral potentially malignant disorders, benign lesions, normal anatomical variants and normal mucosa that were collected through our platform, MeMoSA® UPLOAD. Over 800 images were annotated by seven oral medicine specialists on MeMoSA® ANNOTATE to mark the lesions and to collect clinical labels. The sensitivity of referral decisions for all lesions that required a referral for cancer management/surveillance was moderate to high depending on the type of lesion (64.3–100%).
Conclusion

This is the first description of a database of well-annotated oral lesions. This database has already been used for the development of an AI algorithm for classifying oral lesions. Further expansion of this database could accelerate improvements in AI algorithms that can facilitate the early detection of oral potentially malignant disorders and oral cancer.


Introduction
Oral cancer (OC) was diagnosed in more than 350,000 individuals globally, with over 177,000 deaths in 2018 (GLOBOCAN 2018). OC disproportionately affects low- and middle-income countries (LMICs), and the majority of cases are detected late, resulting in poor patient survival (Warnakulasuriya 2009). Late diagnosis is mainly due to poor oral health awareness and inaccessibility of healthcare in large rural populations, exacerbated by remote geographical locations and insufficient healthcare resources (Nagao and Warnakulasuriya 2020). OC is often preceded by oral potentially malignant disorders (OPMD), affording the opportunity to detect and manage these lesions before the development of OC. Early detection of OPMD and OC requires trained healthcare practitioners who can differentiate high-risk lesions that are malignant or likely to become malignant from the many types of oral lesions that carry no risk of malignant transformation, enabling the appropriate management of oral lesions (Güneri and Epstein 2014). Indeed, a lack of dental specialists has been associated with increased rates of delay in the detection of oral cancer (Crossman et al. 2016; Onizawa et al. 2003).
Applications of artificial intelligence (AI) techniques for disease detection, prognostication and prediction by medical image analysis are now being embraced for healthcare decision-making in medicine and dentistry (Joda et al. 2020; Lindsell et al. 2020). A study by Esteva et al. (2017) successfully used a form of AI called deep convolutional neural networks (CNNs) to classify skin lesions, identifying those that are malignant and most deadly. The trained CNN's performance was on par with board-certified dermatologists, demonstrating that AI algorithms can reach a level of competence comparable to trained experts (Esteva et al. 2017). Such automated classification systems could be applied to address the limited number of dental specialists, which is often a bottleneck for clinical diagnosis in LMICs. The development of AI to automate the identification of OPMD and OC through clinical images is still in its infancy, mainly because of the lack of the systematically collected, well-annotated images of oral lesions required to develop such systems. Large training datasets with their corresponding ground truth (GT) and annotated labels are necessary to achieve good performance through efficient learning and to prevent overfitting (Krig 2014; Mendonça et al. 2013; Yamashita et al. 2018).
We have initiated a repository for the collection of oral lesion images through selected members of the recently established Asia-Pacific Oral Cancer Network (APOCNET) (Syed Mohd Sobri et al. 2020) and other clinical collaborators. Here, we report on the development of a platform for the secure transfer of images to the repository and a customized annotation tool for the uniform collection of clinically relevant information on images of the oral cavity that has hitherto not been available. This resource has resulted in the development of AI systems for the classification of oral lesions and will continue to improve the performance of AI. When incorporated into a mobile phone application, such as MeMoSA® (Haron et al. 2020), early detection of OPMD and OC at point-of-care can be facilitated.

Materials and Methods
Images utilised in this study were collected from clinical collaborators and Open Access sources. Ethical approvals were obtained from the Institutional Review Board (IRB) of each centre for the use of the anonymised images. Open Access images were downloaded systematically using specific keywords through Google Images with pre-determined exclusion criteria (Supplementary Fig. 1). To build a repository of well-annotated images, a platform to collect, store and annotate images securely was developed.

MeMoSA® UPLOAD
MeMoSA® UPLOAD is a customised web interface, hosted on a web server, used by clinical contributors to transfer images securely to the MeMoSA® Data Vault. Collaborators access their accounts to upload images with the associated metadata, which includes patient demographics and risk habits for each corresponding image. Uploaded data is transferred via secure DICC internal network communication and stored in specific folders on the DICC server.

MeMoSA® ANNOTATE
MeMoSA® ANNOTATE is a customised annotation tool built on top of the open-source tool, ImageTagger (Fiedler et al. 2018).
MeMoSA® ANNOTATE was used for the systematic annotation of the images in the MeMoSA® Data Vault. The images were chosen from the repository by a research team member to include a variety of lesions and normal mucosa, and uploaded to MeMoSA® ANNOTATE with their metadata. Users perform the annotation on their workbenches by accessing the DICC network via an encrypted (SSL)/HTTPS connection.
MeMoSA® ANNOTATE enables the systematic annotation of images with labels describing the appearance of the lesion and captures referral recommendations made by the annotators. Guided by three board-certified oral medicine specialists, a decision tree that describes the flow of the tool was first developed. Seven main types of lesions were identified as significant clinical descriptors of an oral lesion: ulcer, white lesion, red lesion, mixed white and red lesion, swelling, pigmented lesion, and erosion (Scully 2012). Lesions that did not fit into these descriptions, such as oral submucous fibrosis (OSF), were categorized as "not applicable".
Descriptions of the appearance, including site, colour, presence of ulceration or swelling, texture, number of lesions, borders, and shape (Scully 2012), were incorporated into the tool under the seven main types of lesions. Finally, the annotation process culminated in a referral decision and a classification of the lesion into a disease type (Table 1). Data on the quality of each image was also collected. Four board-certified oral medicine specialists validated the annotation tool using a test image dataset consisting of eight OPMD, ten non-OPMD and two oral cancers. A questionnaire was administered to collect feedback on the accuracy of the clinical descriptors, the feasibility of the annotation process and the user interface experience. The tool was modified based on this feedback. The tool was further tested by seven board-certified oral medicine specialists using 400 Open Access oral lesion images, where each image was annotated by at least three specialists. These images included a variety of oral lesions and normal variants that fall within the broad categories of OC, OPMD, non-OPMD (benign lesions and developmental abnormalities (DA)) and normal mucosa/normal anatomical variants (NAV). A virtual workshop was conducted amongst all the annotators to discuss discrepancies in the clinical diagnosis and to reach a consensus on annotation labels. Annotations, including bounding boxes and labels such as size, margin, site and outline, were captured for each image in SQL format and stored in the PostgreSQL database in the MeMoSA® Data Vault.
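As a minimal sketch, one stored annotation of the kind described above might be represented as follows; the field names and types here are illustrative assumptions only, not the actual MeMoSA® database schema:

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

@dataclass
class AnnotationRecord:
    """Hypothetical shape of one stored annotation (illustrative only)."""
    image_id: str
    annotator_id: str
    lesion_present: bool
    bounding_box: Tuple[int, int, int, int]  # (x, y, width, height) in pixels
    lesion_type: str                         # one of the seven main types, or "not applicable"
    descriptors: Dict[str, str] = field(default_factory=dict)  # site, colour, texture, ...
    referral: str = "no referral"            # e.g. "refer-high risk", "refer-low risk"
    disease_type: str = ""                   # final clinical label for the lesion
    image_quality: int = 0                   # quality rating captured with the annotation
```

A record of this shape pairs each bounding box with the clinical labels collected by the decision tree, which is what makes the dataset "rich" in the sense used here.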
The web server provides an application programming interface (API) for the secure outward transfer of information from the MeMoSA® Data Vault (Fig. 1). For example, data can be downloaded as described above by members of our group to facilitate AI algorithm training using the annotated images. The server is password protected and only accessible to authorized users. The use of these images in the development of AI for the automated detection and classification of oral lesions has been published (Welikala et al. 2020a; 2020b).

Sensitivity of Annotations to Ground Truth
To identify lesions for which it is difficult to reach a diagnosis and make referral decisions from visual images alone, which could indicate similar difficulties for an AI algorithm in classifying lesions, we calculated the sensitivity of referral recommendations made by the annotators through MeMoSA® ANNOTATE compared to the clinical diagnosis (ground truth; GT). We also computed the accuracy, positive predictive value (PPV) and F1 score for each referral category compared to GT. For this analysis, 400 images were analysed, consisting of 61 images of OC, 90 of OPMD, 159 of benign lesions, 20 of DA and 70 of normal mucosa/NAV.
OC and OPMD are classified as referable lesions; benign lesions and DA could be either 'refer for other reasons' or 'no referral' depending on the clinical attention required, whilst mucosa with NAV and mucosa without any changes would not require any referral and were therefore categorized as 'no referral'. The sensitivity of the referral decision collected through the annotation tool relative to the GT was calculated for each disease category (OC, OPMD, benign, DA and normal mucosa/NAV) and for each individual disease type within the OPMD category. Statistical analysis was conducted using SAS V9.4 (SAS Institute, Cary, NC, USA).
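The per-category metrics described above can be sketched as follows. This is a minimal illustration on hypothetical labels, not the SAS analysis used in the study; it treats one referral category as the positive class at a time:

```python
def category_metrics(gt, pred, category):
    """Sensitivity (recall), PPV (precision), F1 and accuracy for one
    referral category, treating that category as the positive class."""
    tp = sum(g == category and p == category for g, p in zip(gt, pred))
    fn = sum(g == category and p != category for g, p in zip(gt, pred))
    fp = sum(g != category and p == category for g, p in zip(gt, pred))
    tn = sum(g != category and p != category for g, p in zip(gt, pred))
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * ppv * sensitivity / (ppv + sensitivity)
          if (ppv + sensitivity) else 0.0)
    accuracy = (tp + tn) / len(gt)
    return sensitivity, ppv, f1, accuracy

# Hypothetical ground-truth vs annotated referral decisions:
gt = ["refer-high risk", "refer-high risk", "refer-low risk", "no referral"]
pred = ["refer-high risk", "refer-low risk", "refer-low risk", "no referral"]
sens, ppv, f1, acc = category_metrics(gt, pred, "refer-high risk")
```

Running the same function once per referral category against the GT labels yields the per-category sensitivity, PPV, F1 and accuracy reported in the Results.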

Results

MeMoSA® UPLOAD
Two thousand seven hundred and three (2703) images were uploaded using MeMoSA® UPLOAD between June 2019 and September 2020, comprising a variety of oral lesions as detailed in Table 1. The largest number of images collected was of benign lesions, with 1041 images. This was followed by 539 images of oral lichen planus (OLP), 482 images of OSF and 298 images of OC.
Images were assigned a unique sequential ID by the clinical collaborators as they were submitted through MeMoSA® UPLOAD.
Uploaded images were manually assessed to i) determine image quality, ii) identify images with potentially identifiable content and iii) remove duplicated images. Of the 2703 images, 105 were excluded because they were out of focus, 105 had extraoral content that could potentially identify a subject, and another 19 were excluded because they were duplicates. A subset of the remaining images was randomly chosen to be annotated as described below.
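The exclusion bookkeeping above, using the counts from the text, reduces the 2703 uploads to the 2474 images the repository hosts; the exclusion categories below are labelled for illustration only:

```python
# Quality-control tally mirroring the manual assessment described above.
# Counts are taken from the text; category labels are illustrative.
uploaded = 2703
excluded = {
    "out of focus": 105,
    "potentially identifiable extraoral content": 105,
    "duplicate": 19,
}
remaining = uploaded - sum(excluded.values())  # images retained in the repository
```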

MeMoSA® ANNOTATE
Images to be annotated are preloaded by the administrator and displayed as shown in Fig. 2a. A selected image appears along with its corresponding metadata (Fig. 2b & c). Annotators first indicate whether a lesion is present in the image (Fig. 2d). If present, a bounding box is placed around the lesion using the cursor, marking the location of the lesion (Fig. 2e). The lesion type and the clinical descriptors describing the appearance of the lesion are determined (Fig. 2d) and a referral decision is made (Fig. 2f), followed by naming the lesion (Fig. 2g). Finally, the image quality is rated (Fig. 2h) before the annotations are saved (Fig. 2i). Once submitted, the annotation is filed (Fig. 2j). More than 800 images were annotated by seven oral medicine specialists.

Sensitivity of Annotations to Ground Truth
Of the 800 images, 400 images with GT, yielding 2800 annotations, were included in our analysis. Overall, the sensitivity of referral decisions for all lesions that required a referral for cancer management or surveillance was moderate to high, ranging from 64.3% for erythroplakia and OSF up to 100% for discoid lupus erythematosus (DLE). For lesions falling into the "refer-high risk" category, the overall sensitivity between the annotated decision and the GT to refer a lesion was 86.7%, and for those with cancer it was 90.6%. In this category, the lowest sensitivity was for erythroplakia and OSF, at 64.3% each. About 10.4% of the "refer-high risk" lesions were annotated as "refer-low risk", and the majority of these fell into the non-homogenous leukoplakia disease type (30.2%) (Table 2). For lesions in the "refer-low risk" category, the overall sensitivity to refer a lesion was 85.7%, with DLE having the highest sensitivity at 100% and homogenous leukoplakia the lowest at 80.2%. Lesions in the "refer-low risk" category that were annotated as "refer-high risk" were mainly homogenous leukoplakia (16.5%). Overall, 13.3% of "refer-high risk" and 14.2% of "refer-low risk" lesions were annotated as "refer for other reasons" or "no referral needed". The majority of these were OSF, erythroplakia and homogenous leukoplakia. Accuracies of 90.5% and 92.0% were achieved for the "refer-high risk" and "refer-low risk" categories respectively. The PPV was 80.7% for "refer-high risk" and 71.5% for "refer-low risk", with F1 scores of 78.5% and 74.2% respectively (Supplementary Table 1). Benign lesions and developmental abnormalities were also correctly categorized as refer for other reasons/no referral, with accuracies of 88.8% and 93.6% respectively.
Similarly, normal mucosa or mucosa with NAV was correctly identified as requiring no referral, with a sensitivity of 83.7%, and only 5.7% annotated for referrals associated with cancer management or surveillance (Table 2). Accuracies of 88.6% and 96.0% were achieved for the "refer-other reasons" and "no referral" categories respectively. The positive predictive value was 85.8% for "refer-other reasons" and 93.0% for "no referral", with F1 scores of 87.5% and 88.1% respectively (Supplementary Table 1).

Discussion
Whilst emerging evidence demonstrates that AI could be used to classify oral lesions (Uthoff et al. 2018), progress has been slow due to the lack of a clinically labelled, well-annotated training dataset of oral lesions. MeMoSA® UPLOAD provides a standardised system to collect and uniformly transfer images and data, facilitating collaborations across many countries where oral cancer incidence is high. The large number of OC images collected over a short time reflects the high burden of OC in these countries (GLOBOCAN 2018). The images collected were representative of the most prevalent oral mucosal lesions in South and South-East Asia. These included OLP, which is prevalent in Malaysia (Zain et al. 1997), and OSF, which is prevalent in Sri Lanka, Nepal and India (Amarasinghe et al. 2010; Warnakulasuriya et al. 1984). Similar efforts to establish image datasets have been seen in dermatology, in the HAM10000 (10,015 images) (Tschandl et al. 2018) and PH2 (200 images) databases (Mendonça et al. 2013). Both these databases have facilitated machine learning for the automated classification of skin lesions, accelerating development in this field (Esteva et al. 2017). The Age-Related Eye Disease Study Research Group (2001) also collected over 130,000 colour fundus images from 4,613 patients in a 12-year longitudinal study for a better understanding of disease progression and the risk factors behind macular degeneration. We believe that a large global network would accelerate the collection of a substantial number of diverse images of oral lesions covering all important diagnoses of oral lesions. MeMoSA® UPLOAD will enable easy expansion to other clinical collaborators to continue building on the database.
MeMoSA® ANNOTATE enables images to be annotated in a way that mimics the observations that a clinical specialist makes when examining a lesion. Tools similar to MeMoSA® ANNOTATE have been described in the literature. 'DerMat', used to annotate images in the PH2 database (Mendonça et al. 2013), allows users to draw and focus on a region of interest (Ferreira et al. 2012). In the work of the Age-Related Eye Disease Study Research Group (2001), colour fundus images were graded by hand by technicians and were used in several studies to train and validate deep learning algorithms (Burlina et al. 2017a; Burlina et al. 2019; Burlina et al. 2017b). The difference between MeMoSA® ANNOTATE and these tools is that it was built with a decision tree to capture or mimic a clinical oral examination by a specialist, and customised to annotate features of oral lesions. Specialists manually annotated the images and referred to the metadata to arrive at a referral recommendation based on the risk of malignant transformation of the suspected lesion. Bounding boxes and clinical descriptions were collected to build a rich set of labels for each lesion. Some of these annotations have been shown to improve the performance of AI algorithms for image detection and classification by employing attention mechanisms during training to guide the AI to regions of interest within the image (Welikala et al., manuscript in preparation). Further, the detailed annotated descriptions could improve AI through multi-headed training. In addition, this information could be used in the future to generate text-rich reports for each referral recommendation made by the AI. This could give clinicians more confidence in the type of information that has been used by the AI to arrive at a referral decision. The database containing these annotated images has been used to guide the training of a deep learning algorithm, as described in recent publications (Welikala et al. 2020a; 2020b).
We observed that sensitivity in providing the correct referral decision for cancer surveillance compared to the GT was high, at more than 85% for all disease types except erythroplakia and OSF. The high sensitivity shows that the data input in the form of images, together with information on patient demographics and risk habits, is reliable for making a referral decision and should therefore be reliable for AI training. However, further refinements can be made. Some lesions were identified as difficult to make decisions on from images alone, such as early OSF, which is particularly difficult to diagnose even during a clinical oral examination, since it may appear as blanching of the oral mucosa or as a loss of pigmentation that only appears in some South Asian populations (Warnakulasuriya 2018). Clinical diagnosis is usually reached after examination of the mucosa and evaluation of the clinical history and information on symptoms such as a burning sensation, dry mouth or limited mouth opening, along with the presence of palpable fibrous bands. Therefore, such information should be collected as metadata in anticipation that it would also be useful in the training of the AI algorithm. As for erythroplakia, 35.7% were annotated as "refer for other reasons" by identifying the lesion as benign. Clinically, erythroplakia could be mistaken for several red-appearing benign conditions, including erythematous candidiasis or erythema migrans (Warnakulasuriya 2018). However, the review of a single image hampered the ability of the specialists to distinguish erythroplakia from these benign lesions. We will need to consider displaying multiple images of the oral cavity for each patient to present a comprehensive view and obtain more accurate annotations, or collecting clinical information and medical history based on questions that enable the capture of this information.
As for lesions that did not require a referral, the high positive predictive values and F1 scores for the "refer for other reasons" and "no referral needed" categories indicate that these lesions were correctly identified and not referred, meaning that the AI, should it be implemented, could reduce the burden placed on the healthcare system by inaccurate referrals.

Limitations
Whilst patient demographics and risk habit information are available, further questions to collect more comprehensive clinical information that could help annotators reach a more accurate decision should be considered. This is particularly so for OPMD lesions such as erythroplakia and leukoplakia, which are by definition diagnosed based on the exclusion of other diagnoses. Therefore, in the next developmental stage of MeMoSA® ANNOTATE, avenues for richer data collection, possibly including patient history such as the chief complaint, the history of the chief complaint, medical history and oral hygiene products used, would be incorporated. We could also benefit from the standardisation of the images captured, and an image capturing protocol is being developed.

Conclusion
We have described the establishment of a systematic and secure platform for the development of an oral lesion image repository that has hitherto not been available. Given that a clinical oral examination can be conducted relatively easily by a primary healthcare practitioner, an automated detection and classification algorithm would go a long way in helping clinicians distinguish the multitude of types of oral lesions and NAV that occur in the oral cavity. Our work as described here could accelerate progress in developing such an AI system.