Automated Segmentation of Nucleus And Cytoplasm of Cervical Cells From Pap-Smear Images Using A Quadtree Decomposition Approach.

Background: Digital pathology and microscopy image analysis are widely used for comprehensive studies of cell morphology, especially for cervical cancer screening from pap-smears. Manual assessment of pap-smears is labour-intensive and prone to interobserver variation. Computer-aided methods, which can significantly improve objectivity and reproducibility, have attracted a great deal of interest in the recent literature. A critical prerequisite for automated analysis of pap-smears is nucleus and cytoplasm segmentation, which is the basis of cervical cancer screening. This paper presents an approach to segmenting cervical cells into nucleus and cytoplasm using quadtree decomposition guided by statistical measures. Results: Choosing an appropriate quadtree decomposition strategy was a major challenge and the novel element of the proposed approach. The image is pre-processed using an enhanced median filter and is decomposed based on the mean, maximum entropy and variance of the pixels in each subtree. As a result, highly efficient segmentations of acceptable performance were obtained. Comparison of the segmented nucleus and cytoplasm with the ground-truth segmentations yielded Zijdenbos similarity indices greater than 0.9034 and 0.9498 for nucleus and cytoplasm segmentation, respectively. Conclusion: Given the accuracy of the method in segmenting the nucleus, which plays an important role in cervical cancer diagnosis and classification, the method can be adapted for automated cervical cancer diagnosis and classification systems. It serves as a basis for first-level segmentation of cervical cells for the diagnosis and classification of cervical cancer from pap-smears.


Background
Cervical cancer is one of the deadliest and most common forms of cancer among women worldwide [1]. About 85% of cases occur in developing countries [2]. Cervical cancer is preventable through regular, simple pap-smear screening, and there have been numerous attempts to automate pap-smear analysis since the test was introduced more than 70 years ago [3]. However, in many low- and middle-income countries, pap-smear analysis is a manual process carried out by experienced cytotechnicians, who are scarce in these countries. Manual visual examination of pap-smears is time-consuming, labour-intensive, subjective and error-prone [4]. Given these limitations, it is beneficial to develop a computer-assisted diagnosis system that makes the pap-smear test more accurate and reliable.
Cell segmentation is one of the most important stages of such an automated system. However, segmentation of cervical cells in a digital pap-smear image is an ill-posed problem. Very few current methods achieve complete nucleus and cytoplasm segmentation, owing to the challenges of delineating individual nuclei and cytoplasm in the presence of severe overlap and poor contrast [5].
High-tech light and electron microscopes can acquire high-quality pap-smear images, but quantitatively evaluating these images often involves manually annotating structures of interest in the cell. This process is error-prone and time-consuming, and it is becoming the main bottleneck in automated pap-smear analysis [6]. Cell nucleus segmentation from pap-smear images is an important image processing task for automated cervical cancer screening because of the fundamental role of the nucleus in cervical cancer cells [7]. The success of automated cervical cancer classification often depends directly on the accuracy of nucleus and cytoplasm segmentation. Cell segmentation involves separating a cell into regions or contours corresponding to the different objects in the cell [8]. This is usually achieved by identifying properties common to a region or differences between regions (edges) in the cell.
As the biomedical engineering field grows, new imaging modalities and staining techniques are being developed, and many existing methods designed specifically for current modalities and staining techniques may not work well on them. Considerable resources therefore have to be spent modifying existing methods or developing entirely new nucleus and cytoplasm segmentation methods to suit the new applications. This study proposes a cervical cell nucleus and cytoplasm segmentation algorithm based on quadtree region segmentation and statistical measures. The goal is to provide a method that can segment the nucleus and cytoplasm of a cervical cell effectively for automated diagnosis and classification of cervical cancer from pap-smears.

Review of some of the segmentation approaches applied to cervical cells.
Image segmentation techniques can be classified as edge-based, region growing, region splitting, watershed, thresholding, deformable model fitting and morphology-based [9]. A number of surveys of the different image segmentation approaches have been carried out. Shaoo et al. [10] surveyed only thresholding-based segmentation algorithms and attempted to evaluate the performance of some of them using uniformity and shape measures. Eri et al. [11] presented a survey of cell segmentation techniques 50 years down the road and showed that threshold-based methods were the most common techniques used for cell segmentation between 1960 and 2015. William et al. [12] presented a survey of image analysis and machine learning techniques for automated cervical cancer screening from pap-smear images.
The simplest property that pixels in an image can share is intensity, so many approaches [6][7][8] segment the image through thresholding, which separates light and dark regions. Wong et al. [9] proposed an entropy-based thresholding method; however, the results obtained by this approach were found to be biased. Ortiz et al. [10] presented an edge-based deformable model for cell segmentation that used gradient information to capture nucleus and cytoplasm surfaces. However, the resulting object boundaries were very blurred and susceptible to noise.
Zhang et al. [15] presented a nucleus segmentation algorithm in which the HSV colour space was used to enhance the contrast between nuclei and cytoplasm, and nuclei were segmented using a concave-point-based algorithm. Plissiti et al. [16] presented an approach to cell nucleus segmentation in which candidate nuclei were detected using morphological operations and segmented using the watershed transform. Lin et al. [17] proposed a method for nucleus and cytoplasm segmentation from pap-smear images in which a Gaussian filter was used for noise elimination and a two-group object enhancement technique was used to enhance the gradients at the edges of the cytoplasm and nucleus.
Plissiti et al. [18] presented a fuzzy C-means algorithm for cervical cell segmentation and clustering. Yang et al. [19] presented an edge enhancement nucleus and cytoplasm contour detector to extract the nucleus and cytoplasm from a pap-smear image for automated cervical cancer diagnosis. Kale et al. [20] presented a nucleus segmentation technique that determines a segmentation threshold based on the stability of the cell perimeter and uses a clustering method to separate the cytoplasm and nucleus. Bergmeir et al. [21] presented a method for segmenting nuclei in pap-smear images that localizes cell nuclei using a voting scheme and prior shape knowledge by means of an elastic segmentation algorithm. Pai et al. [7] presented a nucleus and cytoplast contour detector for cytoplasm and nucleus segmentation in pap-smear images using a maximal grey-level-gradient-difference method. Muhimmah et al. [22] presented a method for nucleus segmentation from pap-smear images using morphology and the watershed transformation, and Li et al. [23] proposed a radiating gradient vector flow snake algorithm to extract the nucleus and cytoplasm from single cervical cell images.

Region Quadtree Segmentation
The quadtree is a data structure that represents a region as a hierarchical collection of maximal blocks that partition it. It was first introduced by Hanan et al. [24]. The item to be partitioned is the root quadtree, which is recursively partitioned according to predefined criteria [24]. Each decomposition step produces four new quadtrees of equal size that are hierarchically associated with their parent quadtree according to a predefined partitioning criterion. Decomposition finishes when there are no more quadtrees to partition or when the quadtrees have reached their minimum size. Quadtree-based approaches can produce excellent results in medical image segmentation [25][26]. They are becoming widely used because they can quickly discard large amounts of information in a simple and efficient way while preserving image details [27].
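The recursive splitting described above can be illustrated with a short Python sketch. It is not the authors' implementation: it assumes a square grayscale image whose side is a power of two, stored as a list of lists, and a caller-supplied homogeneity test. The function name `quadtree_split` and the grey-level-range criterion in the usage note are hypothetical choices made for illustration.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int]  # (top row, left column, side length)


def quadtree_split(
    image: List[List[int]],
    is_homogeneous: Callable[[List[int]], bool],
    min_size: int = 1,
) -> List[Block]:
    """Recursively split a 2^k x 2^k image into maximal homogeneous square blocks."""
    n = len(image)
    if n == 0 or n & (n - 1) or any(len(row) != n for row in image):
        raise ValueError("this sketch expects a square image whose side is a power of two")
    leaves: List[Block] = []

    def visit(r: int, c: int, size: int) -> None:
        pixels = [image[i][j] for i in range(r, r + size) for j in range(c, c + size)]
        # Stop at the minimum block size or when the block passes the homogeneity test.
        if size <= min_size or is_homogeneous(pixels):
            leaves.append((r, c, size))
            return
        half = size // 2
        # Four equal children in NW, NE, SW, SE order.
        for dr, dc in ((0, 0), (0, half), (half, 0), (half, half)):
            visit(r + dr, c + dc, half)

    visit(0, 0, n)
    return leaves
```

For example, `quadtree_split(img, lambda px: max(px) - min(px) <= 10)` keeps any block whose grey levels span at most 10 levels as a single leaf. The statistical criteria used in this paper (mean, maximum entropy and variance) would take the place of that lambda.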
A quadtree segmentation is obtained by splitting the image into regions based on a given similarity measure and then merging regions based on the same or a different similarity measure. This is achieved through recursive decomposition of the image using the following steps.
1. Determine a similarity criterion, e.g., mean, standard deviation, variance or texture.

Because of the efficiency of the quadtree approach, researchers are increasingly applying it to image segmentation. Gerardo et al. [28] presented a simplified quadtree image segmentation for image annotation. The method efficiently divided the image into homogeneous segments by merging adjacent regions using border and colour information. Spann et al. [29] presented a quadtree approach to image segmentation that combines a nonparametric classifier, based on a clustering algorithm, with a quadtree representation of the image. Reza et al. [30] presented a quadtree-based blood vessel detection algorithm using the RGB components of fundus images. The technique applied the quadtree only to the green component of the images, and the results were promising. This paper explores the applicability of the quadtree to cervical cell segmentation. It presents a quadtree-based algorithm for nucleus and cytoplasm segmentation of cervical cells from pap-smear images that uses statistical measures to guide the quadtree decomposition. The resulting nucleus and cytoplasm segmentations can be used for automated diagnosis and classification of cervical cancer from pap-smear images.

Nucleus and Cytoplasm segmentation
Promising results for nucleus and cytoplasm segmentations were obtained using the proposed quadtree algorithm.
Applying morphological operations further improved the segmentation by filling small holes in the nucleus and cytoplasm segmentations and removing pixels around the boundaries of the nucleus and cytoplasm, as shown in Figure 1. Percentage errors between the ground-truth and extracted measurements for selected cells were calculated and are shown in Table 1.

The images in the Herlev dataset belong to 7 classes and were used to test how accurately the quadtree decomposition algorithm segments the nucleus region in each class. As in [20] and [23], we use the segment with the highest overlap with the ground-truth nucleus region for comparison, using the Zijdenbos similarity index (ZSI) given by Eq. (1):

$$\mathrm{ZSI}(X, Y) = \frac{2\,|X \cap Y|}{|X| + |Y|} \qquad (1)$$

where X and Y are the two sets of segmented pixels. The ZSI was compared with those obtained by Asli et al. [20] and Kuan et al. [23], who also tested their nucleus segmentation algorithms on the Herlev dataset. For all 7 classes, the ZSI of the quadtree algorithm has a mean above 0.9034 and a standard deviation below 0.1735, as shown in Table 2. The quadtree algorithm therefore produces segmentations of acceptable performance compared with the methods in [20] and [23].

The performance of the quadtree algorithm for cytoplasm segmentation was also evaluated using the ZSI and compared with the results obtained by Shys et al. [19] and Kuan et al. [23], who also evaluated their cytoplasm segmentation algorithms on the Herlev dataset. The statistical results are shown in Table 3.

Table 3: Comparison of the cytoplasm segmentation accuracy with methods in [19] and [23]
Method              ZSI (mean ± SD)
Shys et al. [19]    0.8992 ± 0.0348
Kuan et al. [23]    0.9545 ± 0.0439
Quadtree            0.9498 ± 0.0921

The mean cytoplasm ZSI of the quadtree algorithm is higher than that of [19] and slightly lower than that of [23], with a larger standard deviation.
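Eq. (1) is straightforward to compute once both segmentations are expressed as sets of pixel coordinates. The Python sketch below is only illustrative. The helper names (`zsi`, `mask_to_pixels`, `best_matching_segment`) are hypothetical, and treating two empty masks as a perfect match is a convention chosen here, not one stated in the paper.

```python
from typing import List, Sequence, Set, Tuple

Pixel = Tuple[int, int]  # (row, col)


def mask_to_pixels(mask: Sequence[Sequence[int]]) -> Set[Pixel]:
    """Collect the (row, col) coordinates of all non-zero entries of a 2-D mask."""
    return {(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v}


def zsi(x: Set[Pixel], y: Set[Pixel]) -> float:
    """Zijdenbos similarity index: 2|X ∩ Y| / (|X| + |Y|)."""
    if not x and not y:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2 * len(x & y) / (len(x) + len(y))


def best_matching_segment(segments: List[Set[Pixel]], truth: Set[Pixel]) -> Set[Pixel]:
    """Pick the candidate segment with the largest pixel overlap with the ground truth."""
    return max(segments, key=lambda s: len(s & truth))
```

Because the index is symmetric in X and Y, the order of the two arguments does not matter. `best_matching_segment` mirrors the "highest overlap" selection described above.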

Discussion
A median filter played a significant role in normalising the cervical cells for the quadtree decomposition algorithm.
Unlike many studies [37][38][39] that use a smaller median filter, this study applied a large median filter (27×27) to remove the objects superimposed on the cell background while leaving the very slow variations in the background relatively unchanged. This retrieved the objects of interest while yielding an approximately uniform background intensity, as shown in Figure 2. As in Abramoff et al. [40], applying morphological operations to the initial segmentation helped to overcome inaccuracies by closing small holes in the nucleus and cytoplasm segmentations and closing gaps around their boundaries. This research has shown that an efficient quadtree decomposition criterion has the potential to produce excellent image segmentation, as shown in Figure 5, a finding also reported by several studies [29][41][42].

Conclusion
A critical prerequisite for automated analysis of pap-smears is nucleus and cytoplasm segmentation, which is usually considered the basis of cervical cancer screening from pap-smears. It supports various quantitative analyses, including measuring cellular morphology such as size, shape and texture. However, robust and accurate nucleus and cytoplasm segmentation is difficult to achieve. This paper presents an approach to segmenting cervical cells into nucleus and cytoplasm using a quadtree guided by statistical measures. Choosing an appropriate quadtree decomposition strategy was a major challenge and the novel element of the proposed approach. As a result, highly efficient segmentations of acceptable performance were obtained.
Comparison of the segmented nucleus and cytoplasm with the ground-truth segmentations yielded Zijdenbos similarity indices greater than 0.9034 and 0.9498 for nucleus and cytoplasm segmentation, respectively. The method serves as a basis for first-level segmentation of cervical cells for the diagnosis and classification of cervical cancer from pap-smear images using nucleus and cytoplasm features.

Methods
The proposed method was developed through a sequential approach depicted in Figure 5.

Input Images.
The dataset used in this work was obtained from the Herlev University Hospital Cervical Cells Dataset (http://labs.fme.aegean.gr/decision/downloads) prepared by Jantzen et al. [31]. The dataset contains 947 cervical cells that were acquired by skilled cytotechnicians using a microscope connected to a frame grabber, at a resolution of 0.201 µm/pixel. The images were segmented using the commercial CHAMP software developed by DIMAC Imaging Systems and then classified into 7 classes [31].

Grayscale
Since cervical cells are stained in different colours, the input images were first converted to grayscale. The quadtree algorithm presented in this paper uses pixel-level information for decomposition, so the conversion ensures that each pixel is a single sample carrying only intensity information. This makes the quadtree decomposition more efficient and reliable. The grayscale conversion was implemented using Eq. (2).
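The exact form of Eq. (2) is not reproduced here. As a hedged illustration only, the sketch below uses the ITU-R BT.601 luma weights (0.299, 0.587, 0.114), a common choice for RGB-to-grayscale conversion. Whether the authors used these weights is an assumption, and the name `to_grayscale` is hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]

# ASSUMPTION: ITU-R BT.601 luma weights; the paper's own Eq. (2) may differ.
WEIGHTS = (0.299, 0.587, 0.114)


def to_grayscale(rgb_image: List[List[RGB]]) -> List[List[int]]:
    """Convert an 8-bit RGB image (rows of (R, G, B) tuples) to 8-bit grayscale."""
    wr, wg, wb = WEIGHTS
    return [
        # Weighted sum of the channels, rounded and clipped to the 8-bit range.
        [min(255, round(wr * r + wg * g + wb * b)) for (r, g, b) in row]
        for row in rgb_image
    ]
```

Because the weights sum to 1, pure white maps to 255 and pure black maps to 0. The green channel contributes most to the result.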

Pre-processing.
A pap-smear is stained for easy identification of cell nuclei [32]. The staining usually delineates the nuclei well; however, it is not homogeneous, because condensation levels can vary across the chromosomes, and uneven lighting across the field of view can make the nuclei appear granular [33]. To remove this noise, the original images were denoised using a median filter. In general, the median filter output is given by Eq. (3):

$$\hat{f}(x, y) = \operatorname*{median}_{(s, t) \in S_{xy}} \{ g(s, t) \} \qquad (3)$$

where g is the input image and S_{xy} is the filter window centred at pixel (x, y).
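The median filtering step can be sketched directly in Python, as below. The window size `k` is a parameter: the Discussion reports a 27×27 window, which corresponds to `k=27`. Replicating edge pixels at the image border is an assumption, because the paper does not say how borders were handled. The name `median_filter` is hypothetical. This naive loop is slow for large windows on full-size images, and an optimised library routine would normally be preferred.

```python
from statistics import median_low
from typing import List


def median_filter(image: List[List[int]], k: int = 3) -> List[List[int]]:
    """Apply a k x k median filter (k odd), replicating edge pixels at the borders."""
    if k % 2 == 0:
        raise ValueError("window size k must be odd")
    h, w, r = len(image), len(image[0]), k // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Clamp neighbour coordinates so border pixels reuse the nearest edge values.
            window = [
                image[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in range(-r, r + 1)
                for dx in range(-r, r + 1)
            ]
            # The window always holds an odd number of values, so this is the true median.
            row.append(median_low(window))
        out.append(row)
    return out
```

A single bright impulse on a flat background is removed completely, which is the behaviour that makes the median filter attractive for granular staining noise.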

Quadtree Decomposition
A quadtree scan of the full image is the core step of the proposed segmentation technique. The median filter applied during pre-processing improves the decomposition by reducing noise in the image. We adopt a split-and-merge quadtree decomposition strategy in which the image is divided into four regions, and each of these regions is
b) Search from the root of the tree to find the highest node in the tree whose mean, maximum entropy and variance are in the required ranges (as for the root node), and
c) Use neighbour-finding techniques to merge adjacent nodes whose mean is within the desired range.
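The mean (µ) of the n pixels I(x, y) in a subtree at a node is given by Eq. (4):

$$\mu = \frac{1}{n} \sum_{(x, y)} I(x, y) \qquad (4)$$

From Eq. (4), the variance can be defined by Eq. (5):

$$\sigma^2 = \frac{1}{n} \sum_{(x, y)} \bigl( I(x, y) - \mu \bigr)^2 \qquad (5)$$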
Image entropy is a statistical measure of randomness that can be used to characterize the texture of an image.
$$H(s) = -\sum_{m=1}^{M} p_n(s_m)\,\log_2 p_n(s_m) \qquad (6)$$

where H(s) is the entropy of the pixel intensity, p_n(s_m) is the probability of a pixel having intensity s_m, and m ranges over the M possible grey levels. The probability density p_n is calculated from the grey-level histogram.
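The mean, variance and entropy of a node can be computed as in the Python sketch below. It rests on several assumptions. The variance uses the 1/n (population) normalisation, and the entropy is estimated from the node's own grey-level histogram. The `in_range` helper and its dictionary of (low, high) bounds are a hypothetical way to express "within the required range", because the paper does not report the actual bounds.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple


def node_statistics(pixels: Sequence[int]) -> Dict[str, float]:
    """Mean (Eq. 4), variance (Eq. 5) and Shannon entropy in bits of one node's pixels."""
    n = len(pixels)
    if n == 0:
        raise ValueError("a node must contain at least one pixel")
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n  # population (1/n) variance
    # p_n(s_m): relative frequency of each grey level s_m in the node's histogram.
    probabilities = [count / n for count in Counter(pixels).values()]
    entropy = -sum(p * math.log2(p) for p in probabilities)
    return {"mean": mean, "variance": variance, "entropy": entropy}


def in_range(stats: Dict[str, float], bounds: Dict[str, Tuple[float, float]]) -> bool:
    """True if every bounded statistic lies in its (low, high) interval (hypothetical bounds)."""
    return all(low <= stats[name] <= high for name, (low, high) in bounds.items())
```

A perfectly uniform node has zero variance and zero entropy. A node split evenly between two grey levels has an entropy of exactly 1 bit.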
After these statistical measures were computed, the cervical cell images were segmented with the quadtree decomposition method by searching for nodes whose mean, maximum entropy and variance lie within the required ranges.
To save search time, the highest uniform node in each subtree was recorded as the tree was being built, which eliminated the need for a search after the tree was built. The pixel segmentation was guided by the uniformity of the pixels in each node. The uniformity was recorded by adding a pointer β to each node of the tree; β acts as a membership function for a fuzzy set (defined by the statistical measures) for the pixels of that node. The pointer β was computed as follows:
1. If a node (N) and all nodes in its subtree (SN) were out of range of the statistical measures (mean, maximum entropy and variance), then its pointer was set to null, as shown in Eq. (7).
2. If a node was in range, then its pointer was set to point to itself as shown in Eq. (8).
3. If a node was out of range, but some nodes in its subtree were in range, then its pointer was set to point to the highest node in its subtree which was within range as shown in Eq. (9).
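Rules 1 to 3 can be evaluated in a single post-order pass, which matches the idea of recording the highest in-range node while the tree is built. The Python sketch below is a hypothetical illustration. The `Node` class and its `in_range` flag (which would come from the node statistics) are assumptions, and so is the tie-breaking rule: when two in-range descendants are equally high, the first child in NW, NE, SW, SE order wins. None of these details comes from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)
class Node:
    name: str
    in_range: bool  # do this node's statistics fall within the required ranges?
    children: List["Node"] = field(default_factory=list)
    # Pointer β; excluded from repr because an in-range node points to itself.
    beta: Optional["Node"] = field(default=None, repr=False)


def _assign(node: Node) -> Tuple[Optional[Node], int]:
    """Post-order pass: set node.beta and return (target, depth of target below node)."""
    results = [_assign(child) for child in node.children]
    if node.in_range:  # rule 2: an in-range node points to itself
        node.beta = node
        return node, 0
    candidates = [(depth + 1, i, target)
                  for i, (target, depth) in enumerate(results) if target is not None]
    if not candidates:  # rule 1: nothing in range anywhere in this subtree
        node.beta = None
        return None, 0
    depth, _, target = min(candidates)  # rule 3: shallowest in-range descendant
    node.beta = target
    return target, depth


def assign_beta(root: Node) -> Optional[Node]:
    """Compute β for every node of the tree and return the root's pointer."""
    return _assign(root)[0]
```

After this pass, the seed used for nucleus segmentation is simply `root.beta`. Nodes below an in-range node still receive their own pointers, so later passes down the tree can reuse them.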
The pointers guided the segmentation of the cervical cell into nucleus, cytoplasm and background based on pixel-level information. We took the node pointed to by the root as a seed, merged its neighbours into it and set the pointer of each merged node to null; the resulting region was used for nucleus segmentation. We then extracted further regions by moving down the tree and processing in the same way any nodes with non-null pointers; these regions were used for cytoplasm segmentation. To overcome some inaccuracies in the resulting segmentations, morphological operations were applied.
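The seed-and-merge step can be sketched as breadth-first region growing over the quadtree leaves. In this hypothetical Python illustration, each leaf is represented as a (row, col, size) block with a precomputed mean. A simple quadratic adjacency test replaces a dedicated quadtree neighbour-finding algorithm. The mean bounds `lo` and `hi` are placeholders, because the paper does not report the ranges it used.

```python
from collections import deque
from typing import Dict, Set, Tuple

Block = Tuple[int, int, int]  # (top row, left column, side length)


def adjacent(a: Block, b: Block) -> bool:
    """True if two axis-aligned square blocks share part of an edge (corners do not count)."""
    (ra, ca, sa), (rb, cb, sb) = a, b
    rows_overlap = ra < rb + sb and rb < ra + sa
    cols_overlap = ca < cb + sb and cb < ca + sa
    touch_vertically = (ra + sa == rb or rb + sb == ra) and cols_overlap
    touch_horizontally = (ca + sa == cb or cb + sb == ca) and rows_overlap
    return touch_vertically or touch_horizontally


def grow_region(seed: Block, block_means: Dict[Block, float], lo: float, hi: float) -> Set[Block]:
    """Merge, breadth-first, every block connected to `seed` whose mean lies in [lo, hi]."""
    region, queue = {seed}, deque([seed])  # the seed itself is always kept
    while queue:
        current = queue.popleft()
        for other, mean in block_means.items():
            if other not in region and lo <= mean <= hi and adjacent(current, other):
                region.add(other)
                queue.append(other)
    return region
```

The quadratic scan keeps the sketch short. For large trees, a neighbour-finding routine that walks the quadtree would avoid comparing every pair of leaves.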

Morphology
Morphology is a set-theoretic approach that treats an image as the elements of a set and processes images as geometrical shapes [36]. It is a powerful technique for solving a number of problems in image analysis and computer vision, including overcoming inaccuracies in segmentation. The two basic morphological operators, erosion and dilation, are based on Minkowski algebra [39]. A dilation followed by an erosion is called a closing. The dilation of a grayscale image g by a two-dimensional structuring element A [40] is defined by Eq. (10), where (r, c) is a pixel of the image g and (k, l) is the size of the element A. In this paper, the closing operation was used to overcome inaccuracies in the quadtree segmentation by fusing narrow breaks around the nucleus and cytoplasm. It was also used to fill small holes and gaps in the image. Closing is mathematically defined by Eq. (12).
$$g \bullet A = (g \oplus A) \ominus A \qquad (12)$$

where ⊖ denotes erosion, ⊕ denotes dilation, g is a binary image and A is the structuring element. After the morphological operation, unwanted shape features such as blurred edges, holes, corners, wedges and cracks were removed.

Table 1: Percentage errors between the ground-truth and extracted measurements for selected cells.
Table 2: Comparison of the nucleus segmentation accuracy with methods in [20] and [23]
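The binary closing of Eq. (12) can be sketched in Python as shown below. The paper does not state the shape or size of the structuring element, so the (2r+1)×(2r+1) square used here (3×3 by default) is an assumption. The border convention, which ignores neighbours that fall outside the image, is also an assumption. With this convention the closing never removes foreground pixels.

```python
from typing import List

Mask = List[List[int]]  # binary image: 1 = foreground, 0 = background


def _filter(mask: Mask, r: int, use_max: bool) -> Mask:
    """Max (dilation) or min (erosion) over a (2r+1) x (2r+1) square, ignoring out-of-image pixels."""
    h, w = len(mask), len(mask[0])
    out: Mask = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [mask[y + dy][x + dx]
                      for dy in range(-r, r + 1) for dx in range(-r, r + 1)
                      if 0 <= y + dy < h and 0 <= x + dx < w]
            row.append(max(window) if use_max else min(window))
        out.append(row)
    return out


def dilate(mask: Mask, r: int = 1) -> Mask:
    return _filter(mask, r, use_max=True)


def erode(mask: Mask, r: int = 1) -> Mask:
    return _filter(mask, r, use_max=False)


def closing(mask: Mask, r: int = 1) -> Mask:
    """Closing: dilation followed by erosion with the same square structuring element."""
    return erode(dilate(mask, r), r)
```

A one-pixel hole or a one-pixel-wide break inside a foreground region is filled, while an isolated foreground pixel is left unchanged. This matches the hole-filling and break-fusing role described for the closing step.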