Automatic 2D material detection in optical images using deep-learning-based computer vision

Computer vision algorithms can quickly analyze numerous images and identify useful information with high accuracy. Recently, computer vision has been used to identify 2D materials in microscope images. 2D materials have important fundamental properties that allow for their use in many potential applications, including many in quantum information science and engineering. To use these materials for research and product development, single-layer 2D crystallites must be prepared through an exfoliation procedure and then identified using reflected-light optical microscopy. Performing these searches manually is a time-consuming and tedious task. Deploying deep-learning-based computer vision algorithms for 2D material search can automate the flake detection task with minimal need for human intervention. In this work, we have implemented a new deep learning pipeline to classify crystallites of 2D materials based on coarse thickness classifications in reflected-light optical micrographs. We have used DetectoRS as the object detector and trained it on 177 images containing hexagonal boron nitride (hBN) flakes of varying thickness. The trained model achieved a high detection accuracy for the rare category of thin flakes (< 50 atomic layers thick).


INTRODUCTION
Object detection is an important computer vision task that deals with detecting instances of visual objects of certain categories in digital images. The goal of object detection is to develop computational models and techniques that provide one of the most basic pieces of information needed by computer vision applications. 1 In other words, the goal of object detection is to determine whether there are any instances of objects from given categories in an image and, if present, to return the spatial location and extent of each object instance. 2 Object detection supports a wide range of applications, including robot vision, consumer markets, autonomous driving, human-computer interaction, content-based image retrieval, intelligent video surveillance, and augmented reality. 2,3
Single- and few-layer two-dimensional (2D) materials provide many opportunities to explore quantum phenomena in systems with appealing features such as strong many-body interactions, pristine interfaces, and strong confinement in a single direction. [4][5][6][7] In single- and few-layer form, many 2D materials exhibit excellent optical, electronic, and/or magnetic phenomena that provide means to prepare, interact with, and study quantum states in matter. [8][9][10][11][12] With over 1000 known van der Waals 2D materials that span all functionalities (i.e., insulators, metals, semiconductors, ferromagnets, ferroelectrics, etc.), a prolific parameter space of systems and structure-property relationships for quantum, optoelectronic, and magnetic technologies exists. [13][14][15] The task of exploring the vast world of 2D materials requires the ability to efficiently and reliably produce single- and few-layer samples of 2D materials. 16,17 However, the most widely used state-of-the-art method for preparing single- and few-layer samples of 2D materials relies on the manual exfoliation of a large population of smaller crystallites from a bulk crystal. The resulting crystallites have a range of thicknesses, and the entire population must be manually searched using reflected-light optical microscopy to identify those that have a desired thickness, which is most often the thickness of a single layer. This labor-intensive method for collecting single- and few-layer samples strongly inhibits rapid discovery and exploration of these material systems.
Deep learning (DL) computer vision methods can extract useful information from large numbers of images quickly and with high accuracy. Deploying DL algorithms for 2D material search can automate the flake detection task with minimal need for human intervention. So far, there have been various attempts at flake identification in images. [18][19][20][21][22][23][24] However, these methods failed to detect flakes across different microscope settings and image magnifications. In this work, we propose a new detection pipeline which not only improves detection accuracy, but is also able to detect the rare class of thin flakes (< 50 atomic layers) with a recall of 0.43 and a precision of 0.43.

METHODS
Our proposed deep learning method for detecting hBN flakes in optical microscope images of 2D materials is shown as a pipeline in Fig. 1. This pipeline consists of three main steps, each of which is explained in detail below.

Data Acquisition
To prepare the dataset, 208 images were collected over two months. Hexagonal boron nitride (hBN) samples were fabricated via mechanical exfoliation. Bulk hBN crystals were placed onto a piece of exfoliation tape and separated into thin layers by folding the tape together and peeling it apart five times. The exfoliation tape was then placed onto a 1 × 1 cm² silicon (Si) wafer with a 100 nm silicon oxide (SiO₂) layer. To ensure that the full 1 × 1 cm² region of the tape was in contact with the Si/SiO₂, the tape was pressed down with a cotton swab across the full area of contact. The sample (Si/SiO₂/exfoliation tape) was then heated for 1-2 minutes at 90 °C and allowed to cool to room temperature, after which the exfoliation tape was peeled off. The hBN sample images were taken with a 20X objective using a Motic BA310MET-T Incident and Transmitted Metallurgical Microscope. All hBN images in the dataset were taken with the same camera settings and light intensity.

Data Labeling
Labeling the data is the preliminary step in a supervised object detection task, and the results of the model depend heavily on the accuracy of the annotations. To annotate our data, we used Roboflow (https://roboflow.com), an online annotation tool. To improve training accuracy, we manually annotated the flakes within each individual image in three categories: Thick, Intermediate, and Thin. Table 1 shows the thickness and number of layers for each class. The table also indicates the number of images and the number of labeled instances of each flake type present in the full dataset.
The annotations were then saved in the Microsoft Common Objects in Context (COCO) JSON format. To train and evaluate the machine learning algorithm, the dataset was split into three subsets: a training set, a validation set, and a testing set. The training set consists of 177 images, which contain 81% of the annotations; the validation set consists of 21 images, which contain 14% of the annotations; and the testing set consists of 10 images, which contain 5% of the annotations.
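An image-level split of this kind can be sketched as follows. This is a minimal illustration that assumes a COCO-style dictionary with `images`, `annotations`, and `categories` keys; the function name `split_coco`, the random seed, and the toy data are hypothetical and do not reproduce the actual split used in this work.

```python
import random

def split_coco(coco, n_val, n_test, seed=0):
    """Split a COCO-style dict by image into train/val/test subsets."""
    image_ids = [img["id"] for img in coco["images"]]
    random.Random(seed).shuffle(image_ids)  # reproducible shuffle
    parts = {
        "test": set(image_ids[:n_test]),
        "val": set(image_ids[n_test:n_test + n_val]),
        "train": set(image_ids[n_test + n_val:]),
    }
    subsets = {}
    for name, ids in parts.items():
        subsets[name] = {
            "images": [im for im in coco["images"] if im["id"] in ids],
            # Each annotation follows its parent image into the same subset.
            "annotations": [a for a in coco["annotations"] if a["image_id"] in ids],
            "categories": coco["categories"],
        }
    return subsets

# Toy stand-in for the real annotation file: 208 images, one box per image.
coco = {
    "images": [{"id": i, "file_name": f"img_{i}.jpg"} for i in range(208)],
    "annotations": [
        {"id": i, "image_id": i, "category_id": i % 3 + 1, "bbox": [0, 0, 10, 10]}
        for i in range(208)
    ],
    "categories": [
        {"id": 1, "name": "Thick"},
        {"id": 2, "name": "Intermediate"},
        {"id": 3, "name": "Thin"},
    ],
}
subsets = split_coco(coco, n_val=21, n_test=10)
print({name: len(s["images"]) for name, s in subsets.items()})
```

Splitting by image rather than by annotation keeps all flakes from one micrograph in the same subset, which avoids leaking test images into training.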

Model and training
To detect the 2D flakes, DetectoRS 25 was chosen as the object detection algorithm. DetectoRS is a multi-stage supervised object detection algorithm with two major components: Recursive Feature Pyramid (RFP) and Switchable Atrous Convolution (SAC). At the macro level, RFP incorporates extra feedback connections from the Feature Pyramid Network into the bottom-up backbone layers. At the micro level, SAC convolves the features with different atrous rates and gathers the results using switch functions. Combining the two yields DetectoRS, which significantly improves object detection performance and achieves state-of-the-art performance on the COCO test-dev dataset.
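The idea behind SAC, blending the same features convolved at two atrous rates through a position-dependent switch, can be illustrated with a deliberately simplified 1D sketch. Everything here is an assumption made for illustration: the 1D setting, the function names, the sigmoid gate, and the parameters `a` and `b` are not taken from the DetectoRS implementation, which operates on 2D feature maps inside the backbone.

```python
import numpy as np

def atrous_conv1d(x, w, rate):
    """Zero-padded 'same' 1D convolution with dilation (atrous) rate `rate`."""
    k = len(w)
    pad = rate * (k // 2)
    xp = np.pad(x, pad)  # constant zero padding on both sides
    return np.array([
        sum(w[j] * xp[i + j * rate] for j in range(k))
        for i in range(len(x))
    ])

def switch(x, a=1.0, b=0.0, window=5):
    """Position-wise gate in (0, 1): local average followed by a sigmoid (assumed form)."""
    local_mean = np.convolve(x, np.ones(window) / window, mode="same")
    return 1.0 / (1.0 + np.exp(-(a * local_mean + b)))

def sac1d(x, w, dw, rate=3, a=1.0, b=0.0):
    """Blend a rate-1 convolution and a rate-`rate` convolution using the switch."""
    s = switch(x, a, b)
    return s * atrous_conv1d(x, w, 1) + (1.0 - s) * atrous_conv1d(x, w + dw, rate)

x = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, 0.0])
w = np.array([0.25, 0.5, 0.25])
dw = np.zeros(3)  # offset between the weights of the two branches
y = sac1d(x, w, dw)
print(y.shape)
```

When the switch saturates near 1 the output reduces to the ordinary rate-1 convolution, and when it saturates near 0 the output reduces to the larger-rate branch; intermediate values mix the two receptive fields.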
Supervised machine learning models need a sufficiently large dataset to avoid overfitting. However, collecting and labeling data is usually a tedious and time-consuming process. Data augmentation techniques help address this problem. The augmented data represent a more comprehensive set of possible data points, thus minimizing the distance between the training and validation sets, as well as any future testing sets. 26 To augment the data, we applied techniques such as rotation, flipping, and color-contrast adjustment at two stages: once within Roboflow before exporting the annotations, and again during the training procedure.
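For object detection, geometric augmentations must transform the bounding boxes together with the pixels. The following sketch shows this for a horizontal flip and a 90° rotation, assuming COCO-style `[x, y, width, height]` boxes in pixel coordinates. The function names are hypothetical, and this is not the augmentation code used by Roboflow or by the training library.

```python
import numpy as np

def hflip(image, boxes):
    """Flip an image (H x W or H x W x C) left-right and update the boxes."""
    width = image.shape[1]
    flipped = image[:, ::-1]
    new_boxes = [[width - x - w, y, w, h] for x, y, w, h in boxes]
    return flipped, new_boxes

def rot90_ccw(image, boxes):
    """Rotate an image 90 degrees counter-clockwise and update the boxes."""
    width = image.shape[1]
    rotated = np.rot90(image)  # rotates the first two axes counter-clockwise
    # Old rows become new columns; old columns become new rows, reversed.
    new_boxes = [[y, width - x - w, h, w] for x, y, w, h in boxes]
    return rotated, new_boxes

# Toy 4 x 6 image with one "flake" marked by ones inside its bounding box.
image = np.zeros((4, 6))
boxes = [[1, 0, 2, 3]]          # x=1, y=0, width=2, height=3
image[0:3, 1:3] = 1.0

flipped, flipped_boxes = hflip(image, boxes)
rotated, rotated_boxes = rot90_ccw(image, boxes)
print(flipped_boxes, rotated_boxes)
```

A quick sanity check is that the pixels inside each transformed box are exactly the pixels of the original flake, which is what the toy example is set up to verify.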
To improve the performance of the model, we employed transfer learning. Fine-tuning pre-trained deep networks is a practical way of benefiting from the representation learned on a large database while having relatively few examples to train a model. 27 We took a model that was previously trained on the COCO dataset 28 and used it as a starting point for retraining on our own dataset. For training on 2D material images, we chose the DetectoRS algorithm with Cascade+ResNet-50 as the backbone detector architecture, as implemented in the MMDetection library. 29 The model was trained with a learning rate of 0.0025 for 20 epochs, using CUDA.
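A fine-tuning setup of this kind might be expressed as an MMDetection-style configuration fragment like the one below. This is a sketch only: it assumes MMDetection 2.x config conventions, and the base config name, checkpoint path, and annotation paths are placeholders that should be checked against the installed library version. Only the class names, the learning rate of 0.0025, and the 20 epochs come from the text.

```python
# Hypothetical MMDetection 2.x-style config fragment (not the authors' actual file).
# Assumed name of the base DetectoRS config; verify it exists in your MMDetection version.
_base_ = "configs/detectors/detectors_cascade_rcnn_r50_1x_coco.py"

# The three thickness classes used for annotation.
classes = ("Thick", "Intermediate", "Thin")

# Placeholder dataset paths; COCO-format JSON files exported from the annotation tool.
data = dict(
    train=dict(type="CocoDataset", classes=classes,
               ann_file="data/train/annotations.json", img_prefix="data/train/"),
    val=dict(type="CocoDataset", classes=classes,
             ann_file="data/valid/annotations.json", img_prefix="data/valid/"),
    test=dict(type="CocoDataset", classes=classes,
              ann_file="data/test/annotations.json", img_prefix="data/test/"),
)

# Override only the values stated in the text; other settings are inherited from the base.
optimizer = dict(lr=0.0025)
runner = dict(max_epochs=20)

# Placeholder path to COCO-pretrained weights used as the starting point (transfer learning).
load_from = "checkpoints/detectors_coco_pretrained.pth"
```

In a real configuration, the number of classes in each detection head would also need to be set to 3 so that the classification layers match the new label set.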

RESULTS
The inference results of the trained model when applied to one of the test images are shown in Fig. 2. Each flake identification in the image consists of three components: a bounding box, a class label, and a confidence (probability) score, which indicates how certain the algorithm is that the class has been detected correctly. As can be seen, the algorithm was able to detect the Thin sample, which is the most important of the three classes because it may contain a true monolayer.
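To measure the accuracy of the object detector across all test images, we used various evaluation metrics. Before examining these metrics, we present some basic concepts in object detection: a true positive (TP) is a correct detection of a ground-truth object, a false positive (FP) is a detection that does not correspond to a ground-truth object of the predicted class, and a false negative (FN) is a ground-truth object that the algorithm fails to detect. A true negative would correspond to the algorithm correctly identifying the background. Since labeled bounding boxes are only applied to objects, true negatives are not reported in the context of object detection.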
Two common metrics for measuring object detection performance are precision and recall, which are defined in terms of TP, FP, and FN: Precision = TP/(TP+FP) and Recall = TP/(TP+FN).
In words, precision is the ratio of correctly identified objects to all identified objects, and recall is the proportion of ground-truth objects that were identified by the algorithm. Mean Average Precision (mAP) is a measure that combines recall and precision for ranked retrieval results. 31
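These definitions can be made concrete with a short sketch. The counts and the ranked list below are hypothetical and are not results from this work; the `average_precision` function implements one common non-interpolated definition for a single class, whereas COCO-style evaluation tools use an interpolated variant averaged over classes and IoU thresholds.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN); returns 0.0 when undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

def average_precision(ranked_hits, n_ground_truth):
    """Non-interpolated AP for one class.

    ranked_hits: booleans sorted by descending confidence (True = true positive).
    n_ground_truth: total number of ground-truth objects of this class.
    """
    tp = 0
    total = 0.0
    for rank, hit in enumerate(ranked_hits, start=1):
        if hit:
            tp += 1
            total += tp / rank  # precision at the rank of each true positive
    return total / n_ground_truth if n_ground_truth else 0.0

# Hypothetical counts: 8 correct detections, 2 spurious detections, 4 missed flakes.
p, r = precision_recall(tp=8, fp=2, fn=4)
ap = average_precision([True, False, True], n_ground_truth=2)
print(round(p, 3), round(r, 3), round(ap, 3))
```

mAP is then the mean of the per-class AP values, here over the Thick, Intermediate, and Thin classes.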
Another common tool for measuring the number (or percentage) of correct and incorrect detections for each individual class, based on the ground-truth data, is the confusion matrix. In our research, we have considered each flake within an image for evaluation. We therefore consider a true positive to be a case in which an individual flake was detected and classified correctly at an Intersection over Union (IoU) threshold of 0.5, where IoU is the area of overlap between the predicted and ground-truth bounding boxes divided by the area of their union. Incorrect detections occur when a flake is not detected at all (i.e., it is considered by the algorithm to be part of the background substrate), when it is assigned an incorrect class, or when the background substrate is detected as a flake. Fig. 3 shows the confusion matrix for the classification results on the 10 images in the test dataset. It can be seen that our trained model was able to achieve a high number of correct detections, most notably for the Thin class.
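The per-flake evaluation described above can be sketched as follows. This is an illustrative implementation under stated assumptions: boxes are given as `[x1, y1, x2, y2]` corners, detections are matched greedily to ground-truth boxes in order of decreasing confidence, and an extra background row and column record spurious and missed detections. The function names, matching rule, and example boxes are hypothetical and may differ from the evaluation code used to produce Fig. 3.

```python
import numpy as np

CLASSES = ["Thick", "Intermediate", "Thin"]
BG = len(CLASSES)  # index of the extra background row/column

def iou(a, b):
    """Intersection over Union of two boxes in [x1, y1, x2, y2] format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def confusion_matrix(gts, dets, iou_thr=0.5):
    """gts: list of (box, class_idx); dets: list of (box, class_idx, score).

    Rows are ground-truth classes, columns are predicted classes.
    """
    cm = np.zeros((BG + 1, BG + 1), dtype=int)
    matched = set()
    for box, cls, _ in sorted(dets, key=lambda d: -d[2]):
        best, best_iou = None, iou_thr
        for g, (gbox, _) in enumerate(gts):
            if g in matched:
                continue
            v = iou(box, gbox)
            if v >= best_iou:
                best, best_iou = g, v
        if best is None:
            cm[BG, cls] += 1            # background detected as a flake
        else:
            matched.add(best)
            cm[gts[best][1], cls] += 1  # correct if row index equals column index
    for g, (_, gcls) in enumerate(gts):
        if g not in matched:
            cm[gcls, BG] += 1           # flake missed (treated as background)
    return cm

gts = [([0, 0, 10, 10], 2), ([20, 20, 30, 30], 0), ([50, 50, 60, 60], 1)]
dets = [([1, 1, 10, 10], 2, 0.9), ([20, 20, 30, 31], 1, 0.8), ([80, 80, 90, 90], 0, 0.6)]
cm = confusion_matrix(gts, dets)
print(cm)
```

In this toy example, one Thin flake is detected correctly, one Thick flake is misclassified as Intermediate, one detection falls on the background, and one Intermediate flake is missed, covering each of the outcome types described above.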

DISCUSSION
Before training the DetectoRS model, we applied Masubuchi's pre-trained Mask-RCNN to our own dataset and found that it did not perform well. For many test images, including the image shown in Fig. 2(a), the pre-trained model was unable to identify any flakes. For other images, the pre-trained Mask-RCNN was able to identify only a small subset of the total number of flakes. We note that one cause of the poor performance of their algorithm is that we modified the labeling procedure and criteria. Future work will explore a robust, standardized method for data labeling so that algorithm performance can be compared more directly.
In addition, future work will explore integrating the proposed algorithm with an optical microscope to make the detection pipeline fully automatic. We also intend to collect and label data associated with additional 2D materials, expanding beyond hBN.