This section summarizes recent research on big data classification and cloud security that forms the background and theoretical foundation of our work.
2.1 Review of Classification Techniques
Zardari et al. [1] proposed a data classification method based on the confidentiality of the data. The K-nearest neighbor (KNN) algorithm classifies the data according to its security needs into sensitive and non-sensitive types, which indicates how much protection each item requires. The RSA algorithm is used for encryption, and the approach is simulated in the CloudSim simulator.
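The KNN step can be illustrated with a minimal sketch. The two numeric features, the training records, and the choice of k = 3 are hypothetical and are not taken from [1]; the sketch only shows how a majority vote among nearest neighbors yields a sensitive or non-sensitive label.

```python
import math
from collections import Counter

# Hypothetical training records: (feature vector, label).
# The two features (share of personal fields, access restriction level)
# are illustrative assumptions, not the features used in [1].
TRAIN = [
    ((0.9, 0.8), "sensitive"),
    ((0.8, 0.9), "sensitive"),
    ((0.7, 0.7), "sensitive"),
    ((0.1, 0.2), "non-sensitive"),
    ((0.2, 0.1), "non-sensitive"),
    ((0.3, 0.3), "non-sensitive"),
]


def knn_classify(train, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Rank training points by Euclidean distance to the query point.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote over the labels of the k nearest points.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


label = knn_classify(TRAIN, (0.85, 0.75))
```

In a pipeline of this kind, only records labeled sensitive would then be passed to the encryption step.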
Moghaddam et al. [2] argued that a variable data classification index should be used to ensure cloud data security and privacy. The index value is determined from several parameters, the key ones being confidentiality, integrity, and availability.
Zardari et al. [3] addressed the cloud computing issues of data protection and data recovery through data classification and cloud model classification. The problem is solved with data classification using a hybrid multi-cloud model, which operates across multiple clouds with grouping and varying numbers of clusters.
Tawalbeh et al. [4] presented a classification-based model for secure cloud computing. The model reduces the overhead and processing time of the security mechanisms and provides different levels of protection using variable key sizes. It was evaluated with different security measures and showed high efficiency.
Shaikh et al. [5] proposed a classification approach based on a set of parameters that define different dimensions of the data. Security can then be applied according to the level of protection needed. The proposed method addresses privacy protection and data leakage.
Zardari et al. [6] used the K-nearest neighbor (KNN) classifier to determine the confidentiality of data stored in the cloud. The method is extended to the virtual cloud, and the data are categorized according to their security needs. The KNN classifier assigns the data to two classes, sensitive and non-sensitive. The classification identifies which data need stronger protection.
Balachandran et al. [7] noted that choosing the right institute is a challenging task for students today. Students browse social networking sites for reviews and ratings to get approximate information about a particular institution, but the statistical dimension of this feedback is hard to examine. Aspect-based sentiment analysis is therefore applied directly to the comments to obtain the positive and negative feedback about the institution in question. Several techniques are used to classify aspects, including NLP-based, machine learning (ML) based, unsupervised, dictionary-based, and corpus-based approaches. The NLP and ML classifiers give the best results in assigning each aspect to its category.
Hagge Marvin et al. [8] described how consumer sentiment can be identified from micro-blogging services and social media such as Facebook and Twitter, on which people express their opinions. Consumer opinions are the basis for evaluating how customers view an individual product. Aspect-based sentiment analysis is performed using part-of-speech tagging, and the positive, neutral, and negative aspects of the tweets are extracted using natural language processing. The proposed software toolkit first extracts the tweets, then filters them, evaluates the polarity of the sentiment, and finally presents the result. The case studied is Airbnb, a web platform on which people rent out their homes to each other. The aspects studied are Airbnb, place, time, house, day, people, night, view, apartment, and space. The results are presented in a table. Future work will consider feedback from Airbnb's website.
Pannala et al. [9] noted that existing opinion mining work is performed at the word level rather than at the sentence level and covers explicitly expressed opinions. The proposed approach uses a trained data set to analyze and report positive, favorable, and negative feedback for different products. Aspect-based sentiment analysis (ABSA) operates on the individual aspects of an entity and returns the polarity of each. ABSA is applied using machine learning (ML) and natural language processing (NLP) techniques. The data set contains 1654 aspect category annotations in the training set and 845 aspect category annotations in the test set.
Keumhee Kang et al. [10] proposed a novel way to identify users in a stressed mood by monitoring their tweets over a long period. They analyze all forms of tweet content, i.e., photos, emoticons, and text. To assess the validity of the proposed method, two types of experiments were performed: 1) the proposed multimodal approach was validated on several tweets and its output was compared with SentiStrength; 2) it was used to classify the mental states of 45 users as depressive or non-depressive. The experimental results indicated that the proposed multimodal analysis has higher precision than existing methods and predicts the moods of individuals more accurately.
Rongrong et al. [11] presented a survey of approaches to visual sentiment analysis that describes the different techniques used in the field. In this kind of research, photographs are used to assess a person's feelings. The survey concentrates largely on state-of-the-art approaches used in image analysis. It also explains the current research landscape: most research is done on text, whereas visual sentiment ontology is a newer concept. Deep learning is identified as useful for successful visual sentiment analysis.
P. D. Turney [12] proposed an unsupervised learning algorithm that classifies reviews as recommended (thumbs up) or not recommended (thumbs down). The average semantic orientation of the phrases in a review determines its classification: positive or negative associations of the phrases indicate the review's orientation. The semantic orientation is computed using PMI-IR, which is the core step of this research. The algorithm achieves an average accuracy of 74 percent across review domains, ranging from 84 percent on automobile reviews to 66 percent on movie reviews.
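The semantic orientation computation can be sketched as follows. In PMI-IR the orientation of a phrase is the difference between its pointwise mutual information with the word "excellent" and with the word "poor", estimated from search-engine hit counts. The hit counts used below are hypothetical, and the small smoothing constant that avoids division by zero is set to an assumed value; the sketch illustrates the calculation and does not reproduce the original experiments.

```python
import math


def semantic_orientation(near_excellent, near_poor,
                         hits_excellent, hits_poor, smoothing=0.01):
    """SO(phrase) = PMI(phrase, 'excellent') - PMI(phrase, 'poor').

    near_excellent / near_poor: hits for the phrase near each reference word.
    hits_excellent / hits_poor: hits for each reference word alone.
    The smoothing constant is an assumed value that avoids division by zero.
    """
    numerator = (near_excellent + smoothing) * hits_poor
    denominator = (near_poor + smoothing) * hits_excellent
    return math.log2(numerator / denominator)


def classify_review(phrase_counts, hits_excellent, hits_poor):
    """Classify a review by the mean semantic orientation of its phrases."""
    scores = [
        semantic_orientation(ne, npoor, hits_excellent, hits_poor)
        for ne, npoor in phrase_counts
    ]
    mean_so = sum(scores) / len(scores)
    label = "thumbs up" if mean_so > 0 else "thumbs down"
    return label, mean_so


# Hypothetical hit counts for two phrases extracted from one review.
label, mean_so = classify_review([(80, 10), (40, 20)],
                                 hits_excellent=1000, hits_poor=1000)
```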
R. Socher et al. [13] worked on sentence-level prediction of sentiment label distributions using a new approach based on recursive autoencoders. The approach does not rely on predefined sentiment lexica. The data set used in this study consists of personal user stories annotated with several labels which, when aggregated, form a multinomial distribution that captures emotional responses.
A.-M. Popescu et al. [14] proposed an unsupervised information extraction method for deriving opinions from reviews. The work proceeds in the following steps. First, the product's features are identified; second, the opinions about those features are identified; and third, the polarity of each opinion is determined. In the final step, opinions are ranked by their strength. Relaxation labeling is used to determine the semantic orientation. The precision and recall achieved by the approach indicate its success in recognizing sentiments.
M. Abdul-Mageed et al. [15] worked on sentiment analysis of Modern Standard Arabic data. In this work, the data are first collected and then passed to an automatic classification phase in which tokenization is performed. A two-stage classification method is applied to the data set. The results demonstrate the effectiveness of the method.
Donglin et al. [16] also surveyed approaches to visual sentiment analysis, describing the different techniques used in the field. In this kind of research, photos are used to assess a person's feelings. The survey focuses on state-of-the-art methods used in image analysis.
Gitanjali et al. [17] noted that text classification is a basic task in text mining and natural language processing. Earlier classifiers use hand-crafted features such as frequency-based and n-gram features, which cannot capture non-linearity and increase feature variance, directly degrading classifier performance. The convolution-based approach refines the traditional features layer by layer through an activation function. This improves the feature patterns that are learned by logistic regression and optimized through boosting. The results showed that the proposed CNN-logistic regression method significantly improves accuracy because of the improved feature patterns.
2.2 Review of Encryption Techniques
Sehra et al. [18] discussed role-based access control (RBAC) as an efficient way of managing information access and reducing ambiguity in large networked applications, which also helps to lower security costs. In this work, RBAC policies for the cloud are considered. The migration policy allows the user to migrate the database schema securely and to transfer data from one cloud to another using XML. The restriction policy limits the number of cloud-based transactions. A new backup and restore policy is also introduced: backup protects against data loss, and restore recovers the data even if the local system crashes.
Almorsy et al. [19] proposed a cloud security management system based on the FISMA standard that enables security certification for customers and cloud providers. It improves collaboration between cloud providers and service consumers in managing security. The method is implemented on the .NET platform and applied to SaaS security management.
Yibin et al. [20] presented an intelligent cryptographic approach that prevents the cloud provider from directly accessing partial data. The method splits a file into sub-files and stores them separately on distributed cloud servers. A second strategy determines whether data packets need to be split, in order to reduce running time.
Diwan et al. [21] compared various cryptographic algorithms for ensuring data confidentiality. The comparison covers parameters such as block size, key length, type, and characteristics. The authors gave an overview of the different cryptographic algorithms that can be used to ensure data security in the cloud.
Sood et al. [22] proposed a hybrid solution for data security in cloud computing. In this work, various techniques are combined to provide effective protection from the sender to the receiver. Data are secured according to the confidentiality, integrity, and availability they require. Secure Sockets Layer (SSL) protects the data using encryption, and integrity is provided by a message authentication code (MAC). Authenticating all users with a login ID and password further enhances protection.
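Reading the integrity check as a message authentication code, the idea can be sketched with Python's standard `hmac` module. The choice of HMAC-SHA256, the key, and the message are assumptions made for illustration; the MAC construction used in [22] is not specified here.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the message (assumed MAC construction)."""
    return hmac.new(key, message, hashlib.sha256).digest()


def is_authentic(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)


# Hypothetical shared key and payload.
shared_key = b"example-shared-key"
payload = b"record 42: balance=100"
tag = make_tag(shared_key, payload)

unchanged_ok = is_authentic(shared_key, payload, tag)
tampered_ok = is_authentic(shared_key, b"record 42: balance=900", tag)
```

The receiver accepts the payload only when the recomputed tag matches, so any modification in transit or at rest is detected as long as the key stays secret.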
Sengupta et al. [23] discussed a cloud computing security framework based on cryptography, in which encryption is performed with a hybrid Caesar cipher. It offers security at the client, the server, and the network. The method provides effective protection against hackers.
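For reference, the classical Caesar cipher that such a hybrid scheme builds on can be sketched as below. This is only the textbook shift cipher; the hybrid construction of [23] is not described here and is not reproduced, and the shift value is an arbitrary example.

```python
def caesar_shift(text: str, shift: int) -> str:
    """Shift ASCII letters by `shift` positions; leave other characters as-is."""
    shifted = []
    for ch in text:
        if "a" <= ch <= "z":
            shifted.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            shifted.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            shifted.append(ch)
    return "".join(shifted)


ciphertext = caesar_shift("Cloud Data", 3)
# Decryption is the same operation with the negated shift.
plaintext = caesar_shift(ciphertext, -3)
```

A plain Caesar cipher has only 25 usable keys and is trivially broken, which is why it would need to be combined with other mechanisms in any practical scheme.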
Somani et al. [24] used the RSA algorithm to ensure confidentiality and digital signatures to strengthen security through verification. The solution carries out encryption in five stages. The key is generated in the first stage, digital signing is carried out in the second, and encryption and decryption take place in stages 3 and 4.
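The stages can be illustrated with textbook RSA on toy parameters. The primes, the public exponent, and the use of SHA-256 reduced modulo n as a digest are assumptions chosen for readability; the final verification step is also an assumption, since the summary above names only four stages explicitly. A single key pair is used for both signing and encryption to keep the sketch short, and none of this is secure for real use.

```python
import hashlib

# Stage 1: key generation with toy primes (far too small for real use).
p, q = 61, 53
n = p * q                      # public modulus
phi = (p - 1) * (q - 1)
e = 17                         # public exponent, coprime with phi
d = pow(e, -1, phi)            # private exponent (Python 3.8+)


def digest(message: bytes) -> int:
    """Toy digest: SHA-256 reduced modulo n so it fits the tiny modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def sign(message: bytes) -> int:
    """Stage 2: sign the digest with the private exponent."""
    return pow(digest(message), d, n)


def encrypt(m: int) -> int:
    """Stage 3: encrypt an integer m < n with the public exponent."""
    return pow(m, e, n)


def decrypt(c: int) -> int:
    """Stage 4: decrypt with the private exponent."""
    return pow(c, d, n)


def verify(message: bytes, signature: int) -> bool:
    """Assumed final stage: check the signature with the public exponent."""
    return pow(signature, e, n) == digest(message)


message = b"cloud record"
signature = sign(message)
recovered = decrypt(encrypt(65))
```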
Rewagad et al. [25] discussed a design for maintaining the confidentiality of information stored in the cloud by combining digital signatures and Diffie-Hellman key exchange with the Advanced Encryption Standard (AES) encryption algorithm. Even if the key in transit is compromised, the Diffie-Hellman key exchange renders it useless, because the key in transit is of no use without the user's private key, which is held only by the legitimate user. This three-way mechanism makes it difficult for hackers to breach the security system and thereby protects information stored in the cloud.
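The key exchange can be sketched with a toy Diffie-Hellman group. The prime 23 and generator 5 are textbook values chosen for readability, and hashing the shared secret into a 32-byte key is an assumed way of obtaining an AES-256 key; neither is taken from [25].

```python
import hashlib
import secrets

# Toy public parameters (assumed); real deployments use primes of 2048 bits or more.
P = 23   # public prime modulus
G = 5    # public generator


def make_keypair():
    """Pick a private exponent in 1..P-2 and derive the public value."""
    private = secrets.randbelow(P - 2) + 1
    public = pow(G, private, P)
    return private, public


def shared_secret(their_public: int, my_private: int) -> int:
    """Both sides compute the same value: G^(a*b) mod P."""
    return pow(their_public, my_private, P)


def derive_key(secret: int) -> bytes:
    """Assumed derivation: hash the shared secret into a 32-byte key."""
    return hashlib.sha256(str(secret).encode()).digest()


user_private, user_public = make_keypair()
cloud_private, cloud_public = make_keypair()

# Only the public values cross the network; private exponents never leave each side.
user_key = derive_key(shared_secret(cloud_public, user_private))
cloud_key = derive_key(shared_secret(user_public, cloud_private))
```

An eavesdropper who captures only the public values cannot compute the shared secret without one of the private exponents, which is the property the paragraph above relies on.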
Prabhakar et al. [26] proposed a data encryption protocol based on the AES algorithm. In cloud environments, the AES approach protects the data over its entire life cycle from start to finish. The encryption process uses the AES-256 encryption algorithm and Secure Sockets Layer (SSL) to secure data files exchanged with the cloud. The method protects data against brute-force attacks and provides efficient protection for data in the cloud. It does not rely heavily on data protection and data effectiveness. The proposed approach secures the data at every stage and is divided into two stages. In the first stage, the data are encrypted with AES-256. In the second stage, the client is verified: the client sends a username and password to the cloud. When the cloud receives the request, it checks the client's details and, if the client is valid, starts the data retrieval process.