Pulmonary Tuberculosis Bacilli Detection from Sputum Smear Microscopy Images Using K-Nearest Neighbor Classifier

doi:10.21203/rs.3.rs-1967480/v1

Abstract

Tuberculosis (TB) is a deadly disease worldwide, including in Ethiopia. TB is caused by Mycobacterium tuberculosis, which can cause pulmonary tuberculosis (PTB). Sputum smear microscopy is the most commonly used diagnostic tool in developing countries. The main purpose of this study is to develop a k-nearest neighbor (KNN) classifier model for detecting PTB bacilli in sputum smear microscopy images. The study developed an algorithm based on image processing techniques to identify PTB bacilli in digital images of stained sputum smears, and a KNN classifier was used to assign each image to one of two classes: bacilli detected or bacilli not detected. A dataset of 180 stained sputum smear images was obtained from the Ethiopian Public Health Institute (EPHI). The prototype KNN model achieved an accuracy of 92.6%, a sensitivity of 93%, a specificity of 92%, and an F-measure of 94.7%.

Medical Informatics

Keywords: Pulmonary Tuberculosis, Sputum Smear Microscopy, machine learning, KNN

Introduction

Pulmonary tuberculosis (PTB) is a contagious, infectious form of tuberculosis that primarily attacks the lungs. It is a communicable disease, transmitted when individuals with active PTB cough, sneeze, or laugh and release droplets into the air (Heyd, 2020). Close collaboration between patients and healthcare professionals is essential for the effective treatment of PTB. Healthcare professionals include physicians, nurses, pharmacists, and lab technicians, as well as others who work in hospitals and clinics. The condition can be treated with antibiotics when they are used properly. However, identifying PTB bacilli is a difficult challenge for medical professionals (Kim et al., 2020).

The most common method worldwide for diagnosing patients with active PTB is sputum smear microscopy, which can determine the presence of bacteria in sputum smear images (WHO, 2017). Manual microscopy uses Ziehl-Neelsen (ZN)-stained sputum smears and is a time-consuming and error-prone procedure (Panicker et al., 2016). A lab technician performs the manual diagnosis by examining the stained smear under the microscope; the result depends on the number of viable or latent mycobacteria in the smear, which appear as red, rod-shaped objects.

In Ethiopia, the most widely used technique for diagnosing PTB is sputum smear microscopy with Ziehl-Neelsen (ZN) staining, commonly known as Acid-Fast Bacilli (AFB) staining (EHNRI, 2009). The limitations of AFB staining lie in the manual diagnostic process: it does not distinguish between living and dead organisms, has limited sensitivity, and needs high bacterial loads of 5,000–10,000 AFB/mL. It misses less than 50% and has limited specificity (Lumb et al., 2013). Sputum smear microscopy is the standard manual method for PTB bacilli detection. However, it is time-consuming, laborious, subject to poor specificity through human error, and requires highly trained personnel (Forero et al., 2006). Therefore, the accuracy of the diagnostic decision ultimately depends on the skill and experience of the technicians.

Medical imaging is a technique and process for producing visual depictions of a person's internal organs and tissues for use in clinical image analysis and medical intervention, and for visually representing the properties and functions of organs and tissues (Thirumaran & Shylaja, 2015). It has become essential in many fields of medical and laboratory research and clinical practice. Advanced image processing and analysis techniques are increasingly used in medicine. Image processing quality plays a vital role in medical imaging and is expected to provide quantitative data useful for patient treatment and care (Deserno, 2011). Image processing and analysis improve the quality of acquired images and allow quantitative information to be extracted accurately from medical imaging data.

Detecting PTB bacilli in a sputum smear manually, i.e., by eye under the microscope, is time-consuming and prone to error, because a lab technician or pathologist has to change the microscope's field of view many times until the entire stained smear on the glass slide has been viewed. This study was therefore initiated with the main aim of developing automatic PTB bacilli detection from microscopic sputum smear images using image processing techniques. The developed system is intended to address the problems of manual diagnosis, such as the burden on the clinician's workload, limited sensitivity and specificity, and the time required, which limits the number of slides that can be screened (Forero et al., 2006). Additionally, the proposed system can assist physicians and pathologists in diagnosing PTB early. As a result, it can save time, increase accuracy, and enhance sensitivity in the diagnosis of PTB.

Objectives of the study

General Objective

The general objective of this study was to develop a PTB bacilli detection model for sputum smear microscopy images using a K-Nearest Neighbor classifier.

Specific Objectives

To collect patients' stained sputum samples and acquire sputum smear images that can be used to develop a prototype system,
To experiment with KNN to determine whether it is an appropriate algorithm for detecting PTB bacilli in sputum smear images,
To develop a model for PTB bacilli detection that can help pathologists in decision-making,
To evaluate the model's performance using a sample of image data.

Methodology

Different approaches and tools were used to develop a prototype automatic diagnosis system for PTB bacilli. To achieve the objectives of this study, the following methods and techniques were employed. An experimental research design was used. Experimental research can be conducted in a lab or in other settings, and its objectives may be basic or applied. In this study, a prototype model was created and tested using the experimental method. The dataset was collected from the Ethiopian Public Health Institute (EPHI). The main data source is an image dataset of previously diagnosed PTB cases. A dataset for digital image processing was collected from this source and used to test the prototype system. The organization was selected for its seniority, the ease of access to experienced domain experts, and the availability of disease data for image acquisition.

The target population of this study is the domain experts on the staff of the EPHI National TB Reference Laboratory. Both interviews and document analysis were used to acquire knowledge from the domain experts, who diagnose and treat PTB patients at EPHI. The study used purposive sampling to select domain experts for knowledge acquisition and to collect previous PTB patient cases from the study site. Domain experts were selected on the basis of their profession or expertise, educational qualification, and years of experience in PTB bacilli diagnosis. For this study, an image dataset of 180 stained sputum smear images (100 positives and 80 negatives) was obtained from EPHI.

The sputum smear slides used in this study were collected at EPHI. Sputum smear images were acquired from stained sputum smear specimens of patients with PTB. A total of 180 images were collected by a domain expert from ZN-stained sputum smears using Leica Microsystems microscopy connected to a computer (PC). The ZN-stained smears were examined with a 10x100 objective lens under oil immersion at EPHI, and the images were captured from the stained slides at 100X magnification. The image resolution was 696x514 pixels. The images were saved in Joint Photographic Experts Group (JPG) file format, with 24 bits per pixel, in the RGB (red, green and blue) color space. The acquired images can be stored on the computer and processed in real-time or offline mode using image processing techniques. The image database thus comes from real patients' sputum smear specimens, prepared through the ZN staining process and examined microscopically by domain experts. Both primary and secondary data sources were used for this study. The primary data was collected through interviews with domain experts, and the secondary data was collected from published articles, journals, and TB program reports (WHO TB reports). MATLAB was used to implement the proposed prototype. To evaluate the performance of the prototype system, accuracy, sensitivity, specificity, and F-measure were used.

KNN is a classification algorithm that classifies data according to how closely the samples are related. A distance measure such as the Euclidean distance is used to find the closest samples in the dataset. In this study, KNN was used as the classification model for identifying PTB bacilli and was applied to the constructed feature vectors. If the PTB bacilli object class is identified, the recognition/classification is complete. If the class is not known after KNN classification, KNN is applied to reduce the number of classes to two, and the distance matrix computed during KNN is converted to a kernel matrix using the kernel trick. The experiments were conducted by dividing the dataset into training and test images, based on the features obtained for each image through preprocessing, feature extraction, feature reduction, and feature vector construction. The KNN classifier was applied to the selected views of sputum smear images in the dataset described above: 70% of the dataset was used for training and 30% for testing.
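The 70/30 holdout split described above can be sketched as follows. This is a minimal illustration in Python rather than the MATLAB prototype; the function name `holdout_split`, the fixed seed, and the use of integer image indices are assumptions made for illustration. With 180 images, a 30% test fraction gives the 54 test images used in the evaluation.

```python
import random

def holdout_split(n_samples, test_fraction=0.30, seed=0):
    """Shuffle sample indices and split them into training and test sets."""
    indices = list(range(n_samples))
    # The seed is an assumption, used only to make the split repeatable.
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    # First n_test shuffled indices form the test set, the rest the training set.
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = holdout_split(180)
print(len(train_idx), len(test_idx))  # 126 54
```

A plain random split is shown because the text does not say whether the split was stratified by class.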

KNN classifies data based on a distance metric and can be used as a multi-class classifier. The distance is calculated each time the classifier encounters new unlabelled data.
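A minimal sketch of this distance-based classification, assuming Euclidean distance and majority voting, is given below. The value k = 3, the function names, and the toy two-feature vectors are assumptions for illustration; the text does not state the value of k used in the prototype.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    """Label a query vector by majority vote among its k nearest training vectors."""
    # Distances are computed at query time, once per training sample.
    dists = sorted((euclidean(x, query), label) for x, label in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy vectors made up for illustration; 1 = bacilli positive, -1 = negative.
train_X = [[0.9, 0.8], [1.0, 0.7], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1], [0.0, 0.3]]
train_y = [1, 1, 1, -1, -1, -1]
print(knn_predict(train_X, train_y, [0.85, 0.75]))  # 1
print(knn_predict(train_X, train_y, [0.15, 0.20]))  # -1
```

An odd k is a common choice for two-class problems because it avoids tied votes.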

Experimental Results

As presented in the previous section, the experiments were conducted under a single scenario using the features extracted from the sputum smear images. The KNN classifier was evaluated using holdout validation with 30% of the data held out; the results and performance for this scenario are shown in Table 2.

The dataset comprised 180 sputum smear images. There were two output classes in this study, because the sputum smear images were predefined as PTB bacilli positive or negative. To evaluate the system's performance, its classification of the test images as positive or negative was compared with the categories assigned by domain experts (pathologists) selected from the Ethiopian Public Health Institute. As indicated in Table 4.1, the KNN classifier was evaluated on the test set of 54 sputum smear images using both the computed morphological features and the color features. Fourteen features were combined (8 morphological features and 6 color features), together with the label predefined by the radiologist for each sputum smear image: PTB bacilli positive (1) or negative (-1). Finally, the classification performance of the prototype system was computed based on Table 3.4, using the level set method together with the classifier on the features extracted from the 54 test images.
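The structure of one labelled sample, as described above, can be sketched as a 14-element vector plus a class label. The function name `build_sample` and the placeholder values are assumptions; the text does not list the individual morphological or color features.

```python
N_MORPHOLOGICAL = 8  # number of morphological features stated in the text
N_COLOR = 6          # number of color features stated in the text

def build_sample(morphological, color, label):
    """Concatenate the two feature groups into one 14-element vector with its label."""
    if len(morphological) != N_MORPHOLOGICAL or len(color) != N_COLOR:
        raise ValueError("expected 8 morphological and 6 color features")
    if label not in (1, -1):
        raise ValueError("label must be 1 (positive) or -1 (negative)")
    return list(morphological) + list(color), label

# Placeholder values only; real values would come from feature extraction.
features, label = build_sample([0.0] * 8, [0.0] * 6, 1)
print(len(features), label)  # 14 1
```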

As described before, most pathologists fail to identify some PTB bacilli and miss less than 50% because of oversight errors in manual examination (Lumb et al., 2013). Therefore, pathologists' skills play an essential role in the accuracy of bacilli detection. In this regard, the developed model could provide a high level of accuracy in support of pathologists' skills and decision-making. The researchers developed a model for PTB bacilli detection, and its accuracy was tested using a sample dataset selected from the ground truth and the sources mentioned previously. As described above, the accuracy, sensitivity, and specificity of the developed model were measured using the 30% of the dataset held out for testing. A confusion matrix was used for this purpose. The four categories in the confusion matrix are True Positive, False Positive, False Negative, and True Negative. True positives are images that the prototype model classified as containing bacilli and that the domain expert also identified as positive. False positives are images that the domain expert identified as negative but that the model classified as positive; in other words, the model returns some images as relevant when they are not. True negatives are images that both the prototype model and the domain expert identified as negative. False negatives are images that the domain expert identified as positive but that the model classified as negative, meaning that PTB bacilli were present but not found.

Table 2: Confusion matrix of prototype system of KNN classifier

As shown in Table 2, of the 54 test sputum smear images, the KNN classifier correctly predicted 36 as bacilli positive (true positives) and 14 as bacilli negative (true negatives). According to the domain experts' evaluation, 37 of the images were positive and 17 were negative. Based on Table 2, accuracy, sensitivity, specificity, and F-measure were calculated to assess PTB bacilli detection. Thirty-six true positives, 14 true negatives, three false positives, and one false negative were observed. For the KNN algorithm, the overall detection accuracy was 92.6%, with sensitivity, specificity, and F-measure of 93%, 92%, and 94.7%, respectively.
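As a worked check, the sketch below computes the four measures from the reported confusion-matrix counts using the standard textbook definitions. The function name is an assumption, and the definitions may not be the ones used in the prototype. Under these definitions the counts reproduce the reported accuracy (92.6%) and F-measure (94.7%), while sensitivity and specificity come out at about 97.3% and 82.4%. The reported 93% and 92% are closer to the negative predictive value (14/15) and the precision (36/39), so the reported figures may rest on different definitions.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Standard performance measures derived from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": 2 * precision * sensitivity / (precision + sensitivity),
    }

# Counts reported for the 54 test images.
metrics = confusion_metrics(tp=36, tn=14, fp=3, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
# accuracy: 92.6%
# sensitivity: 97.3%
# specificity: 82.4%
# precision: 92.3%
# f_measure: 94.7%
```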

Algorithms that guarantee reliable detection in unpredictable situations are data dependent.

KNN can function successfully if the data points are heterogeneously distributed. However, for most practical problems KNN is a poor choice because it scales poorly: finding the K nearest neighbors of each query takes time linear in the number of examples.

Graphical User Interfaces of PTB Bacilli Detection

A graphical user interface (GUI) is a set of techniques and mechanisms for interactive communication between programs and users. A GUI was designed to respond to user actions and display the PTB bacilli detection results. It gives users a clearer view of the operations they can perform. The GUI makes the program easier to use by giving it a consistent appearance and intuitive controls such as buttons, boxes, axes, and menus. In this study, the researchers developed a GUI, using a user guide, to browse images and analyze the displayed results (PTB bacilli detected or not). The user can browse images by clicking the corresponding button.

After an image is loaded, the PTB bacilli detection button displays the processed image and performs the classification with the k-nearest neighbor classifier. The result shows the user, at the click of a button, whether each processed image is PTB bacilli positive or negative. The same GUI can be adapted to other image processing tasks by altering the callbacks. With GUI-based programs, PTB can be detected quickly and effectively without the need to rewrite the program's code. The proposed GUI, which clearly presents the findings as PTB bacilli positive or negative, is shown in Fig. 1.

Image acquisition yields a number of images from each of the two categories, PTB positive and PTB negative. The second step is image preprocessing, which manipulates the acquired images to remove unwanted noise and enhance image quality. Image preprocessing is thus employed to make images look better to human viewers and to prepare them for segmentation of the region of interest. To reduce the workload associated with image preprocessing, the study considers various environmental parameters, such as illumination and camera resolution. In addition, the relative positions of the light sources and the camera with respect to the objects of interest, that is, the geometry of the viewing scenario, typically have a significant impact on the contrast between an object and its background.
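Noise removal and enhancement of this kind can be sketched with two common operations: a 3x3 median filter and a linear contrast stretch. These particular operations are assumptions chosen for illustration, since the text does not name the filters used in the prototype. Images are represented here as small 2D lists of gray levels.

```python
from statistics import median

def median_filter3(img):
    """3x3 median filter on a 2D list of gray levels; border pixels are left unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            window = [img[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            out[r][c] = int(median(window))
    return out

def stretch_contrast(img, lo=0, hi=255):
    """Linearly rescale gray levels so the darkest pixel maps to lo and the brightest to hi."""
    flat = [p for row in img for p in row]
    p_min, p_max = min(flat), max(flat)
    if p_max == p_min:
        return [row[:] for row in img]  # flat image: nothing to stretch
    scale = (hi - lo) / (p_max - p_min)
    return [[round(lo + (p - p_min) * scale) for p in row] for row in img]

# A single bright noise pixel (255) in an otherwise dark patch.
noisy = [
    [50, 50, 50, 50],
    [50, 255, 60, 50],
    [50, 60, 60, 50],
    [50, 50, 50, 50],
]
denoised = median_filter3(noisy)
print(denoised[1][1])                                 # 50
print(stretch_contrast([[100, 120], [140, 160]]))     # [[0, 85], [170, 255]]
```

The median filter is a typical choice for isolated impulse noise because it replaces an outlier with a neighboring value rather than blurring it into its surroundings.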

Conclusion

This study shows how PTB bacilli can be identified automatically with image processing techniques, by analysing the characteristics of bacilli images through computed morphological and color features. The developed automated PTB bacilli detection system can be used at low cost in developing countries. Because it applies image processing techniques to images captured by conventional microscopy, it could save lives in low-resource communities burdened by PTB. The dataset of 180 images (100 positives and 80 negatives) was collected from ZN-stained sputum smears using Leica Microsystems microscopy connected to a computer at EPHI, and was used for training and evaluation. Accurate detection and classification of PTB bacilli is vital and can be achieved using image processing techniques. The main aim of this study was to develop automatic PTB bacilli detection from microscopic sputum smear images; to this end, an algorithm based on image processing techniques was selected to identify PTB bacilli in digital images of stained sputum smears. The performance of the prototype KNN model was measured by accuracy, sensitivity, specificity, and F-measure, which were 92.6%, 93%, 92%, and 94.7%, respectively. A graphical user interface (GUI) was created that takes an image and shows whether PTB bacilli are present in the sputum smear image, indicating the positive or negative class.

Recommendation

The main objective of this research was to develop an automated system for detecting pulmonary tuberculosis bacilli in sputum smear microscopy images using image processing techniques. However, developing such a system requires attention to the speed of the techniques used. Several problems therefore remain to be investigated by future researchers applying image processing techniques:

Using stained sputum smear images obtained with digital microscopy as input is recommended, in order to acquire quality images for the system.
The KNN classifier algorithm can also be used to design mobile phone applications that support diagnostic decision-making.
An image processing approach for drug-resistant PTB is needed, because diagnosing tuberculosis and resistant cases requires much more rapid, effective, patient-friendly, and affordable techniques with greater precision.
The extracted features should be used to design and compare KNN and ANN classifiers in further investigation.
Future work should also address the problem of overlapping objects in PTB bacilli feature extraction in order to validate the proposed algorithm.

Declarations

We declare that this manuscript is our original work and that it has not been published in any journal. The authors declare that there is no conflict of interest. All sources used in this work are duly acknowledged.

References

Deserno, T. M. (2011). Fundamentals of Biomedical Image Processing. Springer-Verlag Berlin Heidelberg. DOI: 10.1007/978-3-642-15816-2_1.
Ethiopian Health & Nutrition Research Institute (EHNRI). (2009). Laboratory Network Guidelines for Quality Assurance of Smear Microscopy for Tuberculosis Diagnosis. Addis Ababa, Ethiopia.
Forero, M. G., Cristóbal, G., & Desco, M. (2006). Automatic identification of Mycobacterium tuberculosis by Gaussian mixture models. J. Microsc., 223(2), 120–132.
Heyd, A. T. (2020). Management of Latent Tuberculosis Infection Among an Inner-city Population with Psychosocial Barriers to Treatment Adherence.
Kim, C. J., Kim, Y., Bae, J. Y., Kim, A., Kim, J., Son, H. J., & Choi, H. J. (2020). Risk factors of delayed isolation of patients with pulmonary tuberculosis. Clinical Microbiology and Infection, 26(8), 1058–1062.
Lumb, R., Van Deun, A., Bastian, I., & Fitz-Gerald, M. (2013). The handbook: Laboratory diagnosis of tuberculosis by sputum smear microscopy. SA Pathology, Frome Road, Adelaide, South Australia. ISBN: 978-1-74243-602-9.
Mohajan, H. K. (2015). Tuberculosis is a Fatal Disease among Some Developing Countries of the World. American Journal of Infectious Diseases and Microbiology, 3(1), 18–31.
Panicker, R. O., Soman, B., Saini, G., & Rajan, J. (2016). A Review of Automatic Methods Based on Image Processing Techniques for Tuberculosis Detection from Microscopic Sputum Smear Images. J Med Syst, 40(1), 17.
Thirumaran, J., & Shylaja, S. (2015). Medical Image Processing: An Introduction. International Journal of Science and Research (IJSR), 4(11), 2319–7064.
WHO. (2017). World Health Organization Global tuberculosis report 2017, Geneva. Retrieved from http://www.who.int/tb/publications/global_report/en/

Table 2 can be found in the supplementary files.

Tables 1, 3, and 4 are not available with this version.

table2.jpg
Confusion matrix of prototype system of KNN classifier
