An Effective Contour Detection based Image Retrieval using Multi-Fusion Method and Neural Network

Abstract: The number of images on digital platforms and in digital image databases is increasing rapidly, and retrieving images effectively from such enormous databases is a challenging task. Content-based image retrieval (CBIR) algorithms mainly consider visual image features such as color, texture, and shape; non-visual features also play a significant role, mainly in security applications, and the selection of image features is an essential issue in CBIR. According to current CBIR studies, performance remains one of the main challenges in image retrieval. To address this gap, a new CBIR method is proposed that uses histogram of oriented gradients (HOG), dominant color descriptor (DCD), and Hu moment (HM) features. This work exploits color, shape, and texture features in depth for CBIR: HOG is used to extract texture features, while DCD on the RGB and HSV color spaces is used to improve efficiency and computation. A neural network (NN) is used to classify the image features, which improves the computation on the Corel dataset. The experimental results are evaluated on the standard Corel-1k and Corel-5k benchmark datasets, and the outcomes illustrate that the proposed CBIR is more efficient than other state-of-the-art image retrieval methods. Intensive analysis shows that the proposed work achieves better precision, recall, and accuracy.


I. INTRODUCTION
The growth of internet usage and the need for high-capacity storage for multimedia data make image retrieval from large databases complex. Multimedia resources increase in number day by day because of the low cost of most digital devices and advances in technology. A digital image always provides an enormous amount of useful information that can be helpful to humans. Hence, there is always a need for an effective and efficient image retrieval method to retrieve valuable images from a database according to the user's query. Submitting a text-based query to the database is the traditional method: conventional text-based search requires manual annotation, and whenever a user gives a keyword to search for a desired image, the keyword is matched against the annotations and the corresponding image is retrieved from the database. However, this methodology is not efficient for extensive image databases, because the annotations depend on human perception and will therefore be incomplete [4,5]. A new method of image retrieval, referred to as content-based image retrieval (CBIR), was introduced to overcome the problems of text-based user queries. These methods work on the primary features of the image, such as color, shape, and texture, which form the input to the CBIR system. However, these primary features cannot provide the same information as human visual perception, because humans discriminate between images using high-level semantics while machines use low-level primary features. To fill this semantic gap, researchers have proposed machine learning techniques that can add more semantic meaning. Semantic-based image retrieval (SBIR) is a method that adds the high-level semantic meaning of the image to the low-level primary features and produces the desired results [23]. SBIR methods are enhanced with the help of vocabulary-based and deep learning-based methods.
A vocabulary-based method works on clusters of local features extracted using SURF, SIFT, and HOG. The other image retrieval method is content-based image retrieval, which uses the inherent features of the image: color, shape, and texture. Color features are the most dominant features of the image because their information remains the same even when transformations and rotations are applied to the image. The most popular color descriptor is DCD, which is used in a variety of image retrieval applications. Another feature extraction technique is the edge histogram descriptor (EHD). EHD is the MPEG-7 standard edge-based feature extraction scheme used to describe image geometry. Both global and local EHDs are utilized to extract the shape of a given image, and EHD describes the edge orientation of each sub-image. Different features are extracted from images, and CBIR uses these features to retrieve images from a database. Accordingly, this paper combines the HOG, DCD, and EHD feature extraction techniques to capture the color, shape, and texture of the image [24][25]. First, the CLAP and Sobel techniques are applied to find the edges of the images, and their combined result is the input for the second step. In the second step, features are extracted from the image using the above feature extraction techniques and fed into a neural network for classification. A detailed description is given in Section 3. The rest of the paper is organized as follows: Section 2 reviews the existing work; Section 3 gives a detailed description of the proposed work; Section 4 explains the results and comparative analysis; and finally, Section 5 presents the conclusion and future scope.

II. LITERATURE SURVEY
Guangyi Xie et al. [22], 2020 proposed a novel color descriptor and Hu moments-based CBIR method for image retrieval. The proposed work extracted features and computed a color descriptor robust to translation and rotation of the image shapes. The work was evaluated on three benchmark datasets, Corel-1k, Corel-5k, and Corel-10k, and achieved superior results compared to other state-of-the-art methods. Ahmad Raza et al. [23], 2018 described a visual-based texton concept for image retrieval. This work captures the correlation between the texture, shape, intensity, color, and orientation of an image with the help of a co-occurrence matrix. It was also evaluated on the Corel-1k, Corel-5k, and Corel-10k benchmark datasets and outperformed the compared methods. Another CBIR algorithm, which fuses texture and color features, is presented by Atif et al. [18], 2018. The matching of feature vectors is performed with the Manhattan distance on the Corel-1k dataset, and the accuracy of the CBIR algorithm is examined through the precision and recall of the experimental results against other related image retrieval methods. Yati et al. [15], 2017 proposed an edge-based image retrieval system using RLBP and DWT with dimensionality reduction; the edge features improve detection quality, and this method achieves 88.37% precision. Swati et al. [16], 2014 combined color and texture features for CBIR. The Y-matrix of the YCbCr space is prepared from edge features extracted with Canny edge detection, the distribution of colors is described by computing a global statistical description using the RGB histogram, and similarity is measured with the Manhattan distance. The results show that the proposed algorithm is robust to query image alterations. Dong et al. [19], 2013 proposed an improved algorithm for image retrieval that uses a Gaussian filter while preserving edge information; the Canny edge detection system improves the performance of image retrieval operations.

III. PROPOSED WORK
This research proposes a combination of texture, color, and shape features. The aim of this system is to achieve better precision and recall rates than existing work. The algorithm consists of the following stages: (a) pre-processing, (b) feature extraction, and (c) classification, as shown in Fig. 1.

Fig. 1. Flow Chart of Proposed Work
Step-1: Image Acquisition: This is the method of acquiring digital images from some source. In this step, the standard datasets are acquired and given as input to the next step.
Step-2: Pre-Processing: Resizing is a preprocessing technique used to change the input image size. In the proposed method, all images are resized to 192×128 pixels.
Step-3: ROI Segmentation: Contour and Sobel edge detectors are used to segment the object from the resized image, as shown in Fig. 2.

 Sobel Edge Operator
The masks of the Sobel edge operator are defined by the horizontal (Gx) and vertical (Gy) direction components shown in equations (1) and (2), respectively:

Gx = [ -1  0  +1 ;  -2  0  +2 ;  -1  0  +1 ]   (1)
Gy = [ -1 -2  -1 ;   0  0   0 ;  +1 +2  +1 ]   (2)
 CLAP-based Wireframe Model
The contour or wireframe of an object is a simple curve that joins all the points of the same intensity. The CLAP-based wireframe model [22] is used to identify the contour/wireframe of the input image. This approach traces the whole image: if neighboring boundary points have the same values, that part of the image is detected as a contour, and the process continues until all edges of the given input image are found. It can find all curves, lines, and junctions.
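As an illustration of Step-3, the Sobel part of the segmentation can be sketched as follows. This is a minimal numpy-only sketch of equations (1) and (2) applied to a synthetic step edge, not the paper's MATLAB implementation; the CLAP tracing step is omitted.

```python
import numpy as np

# Sobel masks from equations (1) and (2): horizontal (Gx) and vertical (Gy)
GX = np.array([[-1, 0, 1],
               [-2, 0, 2],
               [-1, 0, 1]], dtype=float)
GY = np.array([[-1, -2, -1],
               [ 0,  0,  0],
               [ 1,  2,  1]], dtype=float)

def sobel_magnitude(img):
    """Gradient magnitude at every interior pixel (no border padding)."""
    h, w = img.shape
    mag = np.zeros((h - 2, w - 2))
    for y in range(h - 2):
        for x in range(w - 2):
            patch = img[y:y + 3, x:x + 3]
            gx = np.sum(patch * GX)      # horizontal response
            gy = np.sum(patch * GY)      # vertical response
            mag[y, x] = np.hypot(gx, gy)
    return mag

# A vertical step edge: left half dark, right half bright
img = np.zeros((6, 6))
img[:, 3:] = 1.0
edges = sobel_magnitude(img)
print(edges.max())  # strongest response lies on the step edge
```

Thresholding the resulting magnitude map would yield the edge mask that is combined with the contour result before feature extraction.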
Step-4: Feature Extraction using CBIR: This is the process of automatically determining a compressed and meaningful representation of the image. Various feature extraction techniques give good results, although every method has its own pros and cons. Therefore, in this paper, feature extraction is performed by three different methods and the results are combined for better accuracy. A brief explanation of the feature extraction methods follows.

 Histogram of Oriented Gradients (HOG)
Object recognition and face detection techniques use the histogram of oriented gradients (HOG) as a descriptor [6]. HOG analyzes an object and its shape by examining either the direction of its edges or the distribution of gradient density. HOG separates an image into different cells and then calculates a gradient-orientation histogram for each cell. This can be accomplished in the following three steps.

First step: Derive the horizontal mask M1 and the vertical mask M2, which are used to determine the gradient at each image point:

M1 = [-1  0  1],   M2 = M1 transposed

Generally, only the absolute value of the gradient matters, because a black object on a white background and a white object on a black background produce responses of the same magnitude.
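The first step can be sketched with a small numpy-only example (assuming the standard 1-D derivative masks M1 = [-1, 0, 1] and its transpose M2); it also illustrates that an object and its color-inverted counterpart give gradients of equal magnitude:

```python
import numpy as np

def gradients(img):
    """Horizontal/vertical gradients via the 1-D masks, interior pixels only."""
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]   # correlation with M1 along rows
    gy[1:-1, :] = img[2:, :] - img[:-2, :]   # correlation with M1^T along columns
    return gx, gy

# A white square on a black background and a black square on a white
# background yield gradients of equal magnitude (opposite sign):
a = np.zeros((5, 5)); a[1:4, 1:4] = 1.0
b = 1.0 - a
gxa, _ = gradients(a)
gxb, _ = gradients(b)
print(np.array_equal(np.abs(gxa), np.abs(gxb)))  # True
```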

Second step: The orientation is calculated first, then the gradient measurement is performed over the image, and the orientation is treated as the angle used to assign each gradient to a histogram bin.

 Dominant Color Descriptor (DCD)
The DCD compactly represents the dominant colors of an image and is structured as F, as shown in equation (7):

F = {(c_i, p_i, v_i), s},  i = 1, 2, ..., N_DCD   (7)

where N_DCD gives the count of principal or dominant colors, c_i is the color vector of the i-th dominant color, p_i is the percentage of the image pixels associated with the i-th dominant color, v_i is the color variance of the pixels belonging to the i-th dominant color, and s is the spatial coherency, which is usually fixed at 0 in most applications [7].
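A minimal sketch of how a descriptor of the form F = {(c_i, p_i, v_i), s} could be computed is shown below; the k-means-style quantization and the tiny pixel set are illustrative assumptions, not the paper's exact DCD procedure:

```python
import numpy as np

def dominant_colors(pixels, n_dcd=2, iters=10):
    """Toy DCD: k-means-style quantization returning (c_i, p_i, v_i) triples
    plus a fixed spatial coherency s = 0, mirroring F = {(c_i, p_i, v_i), s}."""
    # initialize centers from evenly spaced pixels
    centers = pixels[np.linspace(0, len(pixels) - 1, n_dcd, dtype=int)].astype(float)
    for _ in range(iters):
        d = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)                # assign pixels to nearest center
        for k in range(n_dcd):
            if np.any(labels == k):
                centers[k] = pixels[labels == k].mean(axis=0)
    descriptor = []
    for k in range(n_dcd):
        members = pixels[labels == k]
        p = len(members) / len(pixels)           # percentage p_i
        v = members.var(axis=0).mean()           # variance v_i
        descriptor.append((centers[k], p, v))    # dominant color c_i
    return descriptor, 0                         # spatial coherency s fixed at 0

# 3 red-ish pixels and 1 blue pixel -> dominant colors near red and blue
pixels = np.array([[250, 0, 0], [255, 5, 0], [245, 0, 5], [0, 0, 255]], float)
desc, s = dominant_colors(pixels)
print([round(p, 2) for _, p, _ in desc])
```

In the proposed scheme this extraction would be run on both the RGB and the HSV representation of each image.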

 Edge Histogram Descriptor (EHD)
EHD is the MPEG-7 standard edge-based feature extraction scheme used to describe image geometry. Both global and local EHDs are utilized to extract the shape of a given image, and EHD describes the edge orientation of each sub-image [8]. EHD categorizes the orientation of edges into five different classes: vertical, horizontal, 45-degree, 135-degree, and no-direction [9].
i. The local EHD is calculated by splitting the image into 16 sub-images and measuring the orientation of the edges in each sub-image [10][11].
ii. The global EHD evaluates the edge distribution over the whole image. The bin values of the global EHD depend on the five types of edge-pixel orientation. Thus, the local EHD consists of 80 bins, and together with the 5 global bins the descriptor consists of 85 bins [12].
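The five-class quantization can be sketched as follows; the input angle is assumed to be the edge orientation in degrees, and the magnitude threshold for the no-direction class is an illustrative choice:

```python
def ehd_bin(angle_deg, mag, threshold=0.1):
    """Quantize one edge into the five EHD classes by its orientation."""
    if mag < threshold:
        return "no-direction"            # too weak to have a direction
    a = angle_deg % 180
    centers = {0: "horizontal", 45: "45-degree", 90: "vertical", 135: "135-degree"}
    # circular distance on [0, 180) so that 170 degrees maps back to horizontal
    nearest = min(centers, key=lambda c: min(abs(a - c), 180 - abs(a - c)))
    return centers[nearest]

# A local EHD applies this over 16 sub-images: 16 sub-images x 5 bins = 80 bins.
bins = [ehd_bin(a, 1.0) for a in (0, 44, 90, 130, 91)]
print(bins)
```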

Step-5: Neural Network
A neural network is composed of nodes, connections between nodes, and different processing elements [14]. Each connection between two nodes carries a weight, which measures the effect of one node on another. One subset of the units serves as input nodes and another subset as output nodes [15][16]. A value assigned at the input nodes propagates through the network to the output nodes, and the mapping learned by the neural network is stored in its weights. A feed-forward neural network (FFNN) [17][18] (Fig. 4) allows signals to pass along one path only, from input to output: no feedback loops are available, so the output nodes cannot affect nodes in the same layer. The FFNN thus directly associates inputs with outputs [19].

Fig. 4. Feed-Forward Network
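A forward pass through such a network can be sketched in a few lines; the 2-2-1 topology, sigmoid activation, and hand-picked weights are illustrative assumptions, not the trained classifier of the proposed work:

```python
import math

def forward(x, weights, biases):
    """One forward pass through a fully connected feed-forward network:
    signals flow input -> hidden -> output with no feedback loops."""
    a = x
    for W, b in zip(weights, biases):
        z = [sum(wij * ai for wij, ai in zip(row, a)) + bi
             for row, bi in zip(W, b)]
        a = [1.0 / (1.0 + math.exp(-v)) for v in z]  # sigmoid activation
    return a

# Tiny 2-2-1 network with hand-picked weights (illustrative only)
weights = [[[1.0, -1.0], [0.5, 0.5]],   # input -> hidden (2x2)
           [[1.0, 1.0]]]                # hidden -> output (1x2)
biases = [[0.0, 0.0], [0.0]]
out = forward([1.0, 0.0], weights, biases)
print(0.0 < out[0] < 1.0)  # sigmoid output lies in (0, 1)
```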
In an FFNN, the mapping from input to output is direct and does not depend on previous inputs. However, many real sequences are order-dependent: for instance, the system-call sequences (fstat, old_mmap, close) and (execve, uname, brk) are syntactically valid but meaningless, because a file must be opened before any attempt to close it; a sequence such as (open, fstat, old_mmap) is therefore required. Thus, each sequence is connected to the previous one, and this dependence determines the information that must be stored [20][21].
3. Convert the RGB image into the HSV color space.
4. Edge orientation is used for edge detection and shape feature extraction on the HSV images.

DCD is applied on the HSV and RGB color spaces. The gradient magnitude G(x, y) and gradient angle θ(x, y) at position (x, y) are given by

G(x, y) = sqrt(Gx(x, y)^2 + Gy(x, y)^2),   θ(x, y) = arctan(Gy(x, y) / Gx(x, y))

The Gaussian weighting of the gradient magnitudes within a block is obtained by multiplying the computed magnitudes with the values of a Gaussian filter matrix stored in a ROM. The weighted gradient magnitudes are accumulated into bins to generate the histogram of each cell. The histograms of four cells are then concatenated to derive the resulting block histogram.
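The cell-histogram accumulation described above can be sketched as follows. The Gaussian weighting is omitted for brevity (the magnitudes passed in could already be Gaussian-weighted), and the bin count and cell size are illustrative:

```python
import numpy as np

def cell_histogram(mags, angles, n_bins=9):
    """Accumulate (weighted) gradient magnitudes of one cell into
    orientation bins covering [0, 180) degrees."""
    hist = np.zeros(n_bins)
    width = 180.0 / n_bins
    for m, a in zip(mags.ravel(), angles.ravel() % 180):
        hist[int(a // width) % n_bins] += m
    return hist

# Four cells' histograms concatenated into one block histogram
cells = [cell_histogram(np.ones((2, 2)), np.full((2, 2), ang))
         for ang in (10.0, 30.0, 90.0, 170.0)]
block = np.concatenate(cells)
print(block.shape)  # four 9-bin cell histograms -> one 36-bin block histogram
```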

11. Steps 1 to 10 are repeated over the image dataset. In this way, L_T will contain 219 measurements rather than 974, which saves computation time and memory space.
13. Calculate the distance metrics between the test image and the database images through ED, MD, HD, Chebyshev, and JD. We have utilized MD, which is the most predictable metric for measuring the dissimilarity between two vectors.
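The use of MD for ranking in Step 13 can be sketched as follows; the image names and feature vectors are made up for illustration:

```python
def manhattan(u, v):
    """Manhattan distance (MD) between two feature vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))

def rank_database(query, database):
    """Rank database images by ascending MD to the query feature vector."""
    return sorted(database, key=lambda item: manhattan(query, item[1]))

db = [("img_a", [1.0, 2.0, 3.0]),
      ("img_b", [0.9, 2.1, 3.0]),
      ("img_c", [5.0, 5.0, 5.0])]
ranked = rank_database([1.0, 2.0, 3.0], db)
print([name for name, _ in ranked])  # most similar first
```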

IV. RESULT ANALYSIS
The proposed methodology is implemented in MATLAB R2014a, and the MATLAB deep learning toolbox is used for evaluation. The experiments are performed on an NVIDIA GPU-based system with 8 GB RAM. The Corel-1k and Corel-5k datasets are utilized with a 5-fold cross-validation technique, using 80% of the data for training and 20% for testing to evaluate the performance of the proposed methodology. A sample of the validation performance is shown in Fig. 5 and the confusion matrix in Fig. 6. Classification is done through a neural network, for which various hyperparameters have to be set; the values of these parameters are shown in Table 2.

Given two vectors Q and D, the Manhattan distance is computed as

MD(Q, D) = Σ_i | f_q(i) − f_d(i) |

where f_q represents the feature vector of the query image and f_d represents the feature vector of an image in the database. The images are retrieved according to the ranks assigned to them; images with higher similarity are given higher ranks.
14. Combine all the features and classify the images using the NN classifier.
15. Find the accuracy and precision of the retrieved images.
Accuracy = (TP + TN) / N

where TP denotes the true positives, TN the true negatives, and N the size of the given dataset. The results are listed in Table 3 and plotted in Fig. 9, further results are listed in Table 4 and plotted in Fig. 11, and sample results are shown in Fig. 12.
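The metrics of Step 15 follow directly from the retrieval counts; the counts below are made-up illustrative values:

```python
def retrieval_metrics(tp, tn, fp, fn):
    """Precision, recall, and accuracy from retrieval counts,
    where N = tp + tn + fp + fn is the dataset size."""
    n = tp + tn + fp + fn
    precision = tp / (tp + fp)     # fraction of retrieved images that are relevant
    recall = tp / (tp + fn)        # fraction of relevant images that are retrieved
    accuracy = (tp + tn) / n       # (TP + TN) / N
    return precision, recall, accuracy

p, r, a = retrieval_metrics(tp=8, tn=85, fp=2, fn=5)
print(round(p, 2), round(r, 3), round(a, 2))
```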

Comparison Analysis of Proposed Work:
After implementing the proposed methodology, a comparative analysis of the results has been done against various state-of-the-art methods, using accuracy, precision, and recall as parameters. The results of the comparative analysis are shown in Tables 5 and 7, and the corresponding graphs are plotted in Figs. 13-14. The proposed methodology takes less time for feature extraction and is therefore better than the existing methods, as shown in Table 6. ROC curves for all steps were also identified during evaluation on the standard databases; the results are shown in Fig. 15.

V. Conclusion
CBIR is an image-mining technique used to retrieve desired images from an extensive image database. The proposed algorithm combines texture, shape, and color characteristics, implementing an enhanced CBIR scheme that uses multiple features together with a neural network. The color features of an image are derived by DCD, which quantizes the colors in either the RGB or HSV space and summarizes them with a histogram, and the overall performance of the scheme is assessed by its accuracy. Experiments on different image databases and comparison of the proposed CBIR scheme with previous related work show that the proposed work achieves higher accuracy, i.e., 87% on the Corel-1k and 97% on the Corel-5k dataset. The results of the proposed method are better than those of existing image retrieval methods, and the method could also be helpful for other computer vision applications. In the future, this technique can be applied to face and skin detection to increase its versatility.
Funding Information: Not applicable; no funding was received.