Taking images and sharing them on sites such as Google, Yahoo, Facebook, and WhatsApp has become commonplace in recent years, yet finding a suitable image among them remains difficult [11–13]. Text-Based Image Retrieval (TBIR) has traditionally been used for this task: each image saved in the database must be retrievable by a matching keyword [3]. The problem is that individuals describe the same image differently, which leads to mismatches, and the sheer number of images makes labeling each one impractical.
This shortcoming is overcome by Content-Based Image Retrieval (CBIR), also known as Query by Image Content (QBIC) [5–7]. As the name indicates, CBIR analyzes the content of the image rather than its text annotations. Traditionally, images are retrieved based on color, texture, and shape, which remains an open problem [8–10]; the image data is stored and represented as feature vectors. This reduces human intervention and makes retrieval easier to carry out on very large databases. CBIR is widely used in crime detection, medical diagnosis, military geographical information systems, etc.
Both the query image and each image stored in the database go through feature extraction, which extracts low-level properties such as color, shape, and size before their similarity is compared, as shown in Fig. 1. If the two feature vectors match, the retrieved image is the same as the query.
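The pipeline above can be sketched as follows. This is a minimal illustration only: the histogram feature and the helper names `extract_features` and `similarity` are toy stand-ins for the low-level color/shape/size features described in the text, not the system's actual implementation.

```python
import numpy as np

def extract_features(image, bins=8):
    """Toy low-level feature: a normalized intensity histogram.
    (A real CBIR system combines color, texture, and shape features.)"""
    hist, _ = np.histogram(image, bins=bins, range=(0, 256))
    return hist / hist.sum()

def similarity(query_image, db_image):
    """Euclidean distance between feature vectors; 0 means the
    extracted features are identical."""
    fq = extract_features(query_image)
    fd = extract_features(db_image)
    return np.linalg.norm(fq - fd)

# Two images with identical content yield identical features.
query = np.full((4, 4), 128, dtype=np.uint8)
match = np.full((4, 4), 128, dtype=np.uint8)
print(similarity(query, match))  # identical features -> distance 0.0
```

In practice the database-side features are extracted once and stored, so only the query image is processed at retrieval time.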
When an image contains noise, the regions of interest must be selected, so we employ deep-learning-based pre-processing to achieve accurate results [4]. A deep Convolutional Neural Network (CNN), which aids in image analysis, is used to achieve high performance and efficiency. Previously, high-dimensional photographs were rare; only a few images were preserved, and they were accessed only on rare occasions. The MNIST dataset is employed for pre-processing. The backpropagation method employed in [14] requires extensive training and resources, both of which are difficult to obtain.
Feature extraction for content-based image retrieval is the technique of generating a compact (numerical or alphanumerical) representation of certain features of a digital image, from which information about the image contents can be deduced. Every feature is closely related to the type of data it captures, and the choice of one feature over another is determined by the retrieval task. Kernel PCA demands greater resources because all data points are saved in a kernel matrix whose size grows quadratically with the number of data points [16]. In addition, PCA feature extraction fails to produce good results on high-dimensional data [15].
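The quadratic-memory point about kernel PCA can be made concrete. The sketch below (toy data, hypothetical helper names) contrasts linear PCA, which projects n samples directly, with the n × n Gram matrix that kernel PCA must first materialize:

```python
import numpy as np

def pca_features(X, k):
    """Linear PCA via SVD: project n samples onto the top-k components."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def rbf_kernel_matrix(X, gamma=0.1):
    """Kernel PCA first builds an n x n Gram matrix, so memory
    grows quadratically with the number of stored data points."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * d2)

X = np.random.default_rng(0).normal(size=(100, 16))
Z = pca_features(X, k=4)      # 100 samples reduced to 4 features each
K = rbf_kernel_matrix(X)      # 100 x 100 entries, regardless of k
print(Z.shape, K.shape)       # (100, 4) (100, 100)
```

Doubling the number of stored images quadruples the Gram matrix, which is why the text treats kernel PCA as resource-hungry for large image databases.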
The extracted features must then be filtered by feature selection. This is critical to eliminate redundancy, improve accuracy, and minimize computing cost. The selected features are tested with a similarity measurement that addresses feature relevance and class-conditional redundancy [17]; however, these conditions do not cope with high-dimensional image feature selection [18]. If two feature vectors have a high degree of similarity, the corresponding images are identical or nearly related [19].
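One simple way to eliminate redundancy, shown here only as an illustrative sketch (a greedy correlation filter, not the method of [17]), is to drop any feature that is nearly a copy of one already kept:

```python
import numpy as np

def drop_redundant(X, threshold=0.95):
    """Greedy redundancy filter: keep a feature (column) only if its
    absolute correlation with every already-kept feature is below
    the threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return keep

rng = np.random.default_rng(1)
a = rng.normal(size=200)
b = rng.normal(size=200)
# Column 1 is a near-duplicate (scaled copy) of column 0.
X = np.column_stack([a, 2 * a + 0.01 * rng.normal(size=200), b])
print(drop_redundant(X))  # -> [0, 2]
```

Removing such near-duplicate features shrinks the vectors fed to the similarity measurement without discarding information.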
In this study, we therefore employ AlexNet, which delivers reliable results even on huge databases with high-dimensional datasets. After experimenting with several feature extraction methods, latent feature extraction proved the most promising, since it takes hidden variables into account. Although numerous mutual-information algorithms are in use, they have not shown promising results on high-dimensional images; to address this, we employ MINE (Mutual Information Neural Estimation), which achieves a high accuracy rate. Euclidean distance is applied as the similarity metric, and the result is accurate.
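The final retrieval step, ranking database images by Euclidean distance between feature vectors, can be sketched as below. The feature vectors are assumed to be precomputed (e.g., 4096-dimensional activations from an AlexNet fully connected layer); the random vectors and the helper name `retrieve` are illustrative only.

```python
import numpy as np

def retrieve(query_vec, db_vecs, top_k=3):
    """Rank database images by Euclidean distance to the query's
    feature vector and return the top-k indices and distances."""
    dists = np.linalg.norm(db_vecs - query_vec, axis=1)
    order = np.argsort(dists)[:top_k]
    return order, dists[order]

rng = np.random.default_rng(2)
db = rng.normal(size=(10, 4096))                 # 10 stored feature vectors
query = db[7] + 0.01 * rng.normal(size=4096)     # near-duplicate of image 7
idx, d = retrieve(query, db)
print(idx[0])  # image 7 is the closest match
```

Because the distance is computed on compact feature vectors rather than raw pixels, the same ranking procedure scales to large databases.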