A novel Security Aware Sensitive Encrypted Storage approach to improve the encryption of big data

Data security and privacy have become critical issues that restrict many cloud applications. One major concern is that cloud operators have the opportunity to access sensitive data. This concern greatly increases user anxiety and reduces the acceptability of cloud computing in many areas, such as the financial industry and government agencies. This paper focuses on this issue and proposes an intelligent approach to cryptography that prevents cloud service operators from reaching sensitive data directly. The proposed method divides the file precisely using an intelligent classification technique. An alternative approach is designed to determine whether data packets need splitting, in order to shorten operating time and reduce storage space. Our experimental assessments of both security and efficiency show that our approach can effectively address major cloud threats and that it requires an acceptable computing time when using an intelligent machine learning classification technique. We propose a novel model named Security Aware Sensitive Encrypted Storage (SA-SES). In this model, we use our proposed algorithms: Convolution Neural Network with Logistic Regression (CNN-LR), Elliptic-curve Diffie-Hellman-Shifted Adaption Homomorphism Encryption (ECDH-SAHE), and Elliptic-curve Diffie-Hellman-Shifted Adaption Homomorphism Decryption (ECDH-SAHD).


Introduction
Cloud computing is currently a very popular and successful technology. Earlier, applications were hosted on a local server, so if the local network went down, the whole system and its applications failed immediately. Cloud computing came into use to solve this problem and to store data digitally. Many well-known companies, such as Google, Microsoft, Amazon, and Facebook, operate their own clouds [7].
Because of minimal investment, low costs, and the wide range of available services, many companies are moving to the cloud. Cloud computing offers services such as Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS).
Most cloud service providers, such as Amazon (AWS), Dropbox, Google Drive, and Microsoft OneDrive, offer a range of storage service packages and adjustable cloud storage space for customers.
However, the security problems raised by cloud operations remain an obstacle to the adoption of cloud services. Many cloud users worry that their sensitive data can be accessed by cloud operators. Such security risks hinder the progress of cloud computing.
In cloud storage, users do not know the physical location of their data because it is stored on unknown servers, where there is always a chance that private data will be leaked. This research presents a security architecture for the cloud that helps cloud service providers and consumers work together to manage security effectively. Big data is kept safe in cloud computing using the smart cryptography approach. In this method, the file is divided into pieces, or packets, which are distributed to the cloud servers and stored there. A second method determines which data packets need to be split in order to reduce operating time. This method provides good security with an acceptable computation time [9].

Types of classification algorithms
Classification is a supervised learning method used in machine learning and statistics. A classifier learns from the labeled input data provided to it and then assigns new data to classes. The task may be binary, such as classifying gender as male or female or emails as spam or not spam, or it may be multiclass. Machine learning offers various types of classification algorithms.
Naive Bayes Classifier (Generative Learning Model): The Naive Bayes classifier is based on Bayes' theorem. It is a supervised learning algorithm used for classification, and it handles both continuous and categorical attributes.
• It is used mainly in word detection and spam filtering.
• It is also used in recommendation systems.
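To make the spam-filtering use concrete, the following is a minimal sketch of a multinomial Naive Bayes text classifier in plain Python. It is an illustration only, not the classifier used in this paper: the function names, the add-one (Laplace) smoothing, and the four toy messages are assumptions introduced for the example.

```python
import math
from collections import Counter

def train_naive_bayes(docs):
    """docs: list of (text, label) pairs. Returns (label counts, per-label word counts, vocabulary)."""
    label_counts = Counter()
    word_counts = {}
    vocab = set()
    for text, label in docs:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in text.lower().split():
            counts[word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab

def predict(text, model):
    """Return the label with the highest log posterior under the naive independence assumption."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        # log prior of the class
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # log likelihood of each word, with add-one smoothing for unseen words
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training set, invented for illustration.
toy_docs = [
    ("win money now", "spam"),
    ("free prize win", "spam"),
    ("meeting at noon", "ham"),
    ("lunch meeting tomorrow", "ham"),
]
model = train_naive_bayes(toy_docs)
```

Calling `predict("win free money", model)` scores the message under each class and returns the more probable label. Working in log space avoids numerical underflow when messages are long.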

Logistic Regression (Predictive Learning Model):
Logistic regression is used to model the outcome of a data set in which one or more independent variables are present. The output takes only two possible values.
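As a hedged illustration of the two-outcome model, the sketch below fits a one-feature logistic regression by batch gradient descent. The learning rate, epoch count, toy data, and function names are assumptions chosen for the example and are not taken from the paper.

```python
import math

def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # gradient of the mean log loss with respect to w and b
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict_label(x, w, b, threshold=0.5):
    """Turn the predicted probability into one of the two possible outcomes."""
    return 1 if sigmoid(w * x + b) >= threshold else 0

# Hypothetical toy data: small values belong to class 0, large values to class 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)
```

The independent variable enters through the linear term `w*x + b`, and the sigmoid turns that score into a probability, which is why the final prediction has exactly two possible values.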
Decision Trees: In this approach, the data is split recursively into sub-nodes that form a tree-like structure, which supports both regression and classification. The tree nodes are linked to one another, and the decision is read off by following a path through the tree.
Terminal nodes, which have no children, are called leaf nodes.

Random Forest:
A random forest is an ensemble of decision trees, each trained on a random sample of the data and of the features. For classification, the forest predicts the class chosen by the majority of its trees.

Neural Network:
A neural network is modeled on biological neurons. The network consists of basic units called neurons, which are arranged in layers; each neuron transforms its input according to learned weights and passes on the output. Neural networks are used for classification and pattern detection. They can extract relevant information from complex data, a task that is very difficult both for humans and for conventional computing techniques.
Nearest Neighbor: This algorithm stores the available cases and classifies new cases based on their similarity to the stored ones. It is mostly used in statistical estimation and pattern recognition. It assigns each data point the class of its nearest neighbors.
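The nearest-neighbor idea can be sketched in a few lines. The example below is a minimal illustration, assuming Euclidean distance, k = 3, and invented two-dimensional feature vectors labeled sensitive or non-sensitive; none of these choices come from the paper.

```python
import math
from collections import Counter

def knn_classify(query, examples, k=3):
    """examples: list of (feature_vector, label). Returns the majority label of the k nearest cases."""
    # Sort the stored cases by Euclidean distance to the query point.
    by_distance = sorted(examples, key=lambda ex: math.dist(query, ex[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical stored cases, invented for illustration.
stored_cases = [
    ((1.0, 1.0), "non-sensitive"),
    ((1.2, 0.8), "non-sensitive"),
    ((0.9, 1.1), "non-sensitive"),
    ((5.0, 5.0), "sensitive"),
    ((5.2, 4.9), "sensitive"),
    ((4.8, 5.1), "sensitive"),
]
```

There is no training step: `knn_classify((5.1, 5.0), stored_cases)` simply looks up the closest stored cases and takes a vote, which is why the method is described as storing cases for later classification.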

Encryption Techniques
Encryption is a technique used to safeguard data. Encryption turns meaningful data into meaningless data that an ordinary person cannot understand. It is often used in military and other data centers where classified data is stored. The algorithms below can be used to encrypt data.

Triple DES
Triple DES, sometimes written 3DES, is the more recent, improved version of DES, and its name suggests what it does. It runs DES three times on the data, in three stages: encrypt, decrypt, and then encrypt again. This does not, however, triple the effective strength of the cipher.

RSA
RSA is an asymmetric cryptographic algorithm that encrypts a message without requiring the parties to exchange a secret key beforehand. Its security rests on the difficulty of factoring large numbers.
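The link between RSA and factoring can be seen in a textbook-sized example. The sketch below uses deliberately tiny primes (p = 61, q = 53) and no padding, so it is insecure and serves only to show the arithmetic; the function names and parameter values are assumptions for illustration.

```python
def make_toy_rsa_keys(p=61, q=53, e=17):
    """Build a toy RSA key pair from two small primes (insecure, for illustration only)."""
    n = p * q                   # public modulus; factoring n reveals p and q
    phi = (p - 1) * (q - 1)     # Euler's totient of n
    d = pow(e, -1, phi)         # private exponent: modular inverse of e (Python 3.8+)
    return (e, n), (d, n)

def rsa_encrypt(m, public_key):
    """Encrypt an integer message m < n with the public key."""
    e, n = public_key
    return pow(m, e, n)

def rsa_decrypt(c, private_key):
    """Decrypt an integer ciphertext with the private key."""
    d, n = private_key
    return pow(c, d, n)

public_key, private_key = make_toy_rsa_keys()
ciphertext = rsa_encrypt(65, public_key)
recovered = rsa_decrypt(ciphertext, private_key)
```

Anyone who can factor `n` back into `p` and `q` can recompute `phi` and therefore the private exponent `d`, which is why real deployments use moduli of 2048 bits or more together with a padding scheme.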

Blowfish
Blowfish is a 64-bit block cipher with a variable-length key of 32-448 bits. The algorithm consists of two parts: key expansion and data encryption. In the first stage, the user's variable-length key is expanded into 4168/8336-byte sub-key arrays made up of 4-byte elements.

Twofish
Twofish is a symmetric block cipher designed as a successor to Blowfish. It supports key lengths of up to 256 bits, and a single key is used for both encryption and decryption.

AES
Advanced Encryption Standard (AES) is an encryption algorithm based on a symmetric key, and it works efficiently in both hardware and software. It supports 128-bit, 192-bit, and 256-bit keys with a 128-bit block size. It is built on a substitution-permutation network.
The rest of this paper is structured as follows. Recent related work is discussed and summarized in Sect. 2. A motivating example that illustrates the method is provided in Sect. 3. The proposed model and its main principles are set out in Sect. 4. Section 5 then presents the main algorithms with pseudo-code and explanations. We evaluate the proposed model experimentally in Sect. 6. Finally, our conclusions are set out in Sect. 7.

Related Work
This section summarizes recent research in the fields of big data classification and cloud security, which forms the background and theoretical foundation of our work.

Review of Classification Techniques
Zardari et al. [1] discussed a data classification method based on the confidentiality of the data. The KNN algorithm is used to classify the data according to its security needs. It divides the data into sensitive and non-sensitive classes, which indicates how much protection the data needs. The RSA algorithm is used for encryption, and the approach is simulated with the CloudSim simulator.
Moghaddam et al. [2] argued that a variable data classification index must be used to ensure cloud data security and privacy. The index value is determined from various parameters, the key ones being confidentiality, integrity, and availability.
Zardari et al. [3] discussed data protection and data recovery in cloud computing. These problems are addressed by classifying both the data and the cloud model, using data classification within a hybrid multi-cloud model. The model works across multiple clouds, groupings, and varying numbers of clusters.
Tawalbeh et al. [4] presented a classification-based model for secure cloud computing. The model reduces the overhead and processing time of the security mechanism. It provides different levels of protection using variable key sizes. The model was evaluated with different security measures and produced positive results with high efficiency.
Shaikh et al. [5] proposed a classification approach based on different parameters, which define the different dimensions. Security is then applied according to the level of protection the data needs. The proposed method addresses privacy, security, and data leakage.
Zardari et al. [6] proposed using the K-nearest neighbor (KNN) classifier to determine the confidentiality of cloud-based data. The method is applied to a virtual cloud, and the data is categorized according to its security needs. The KNN classifier divides the data into two classes, sensitive and non-sensitive, and the classification indicates which data needs to be more secure.
Balachandran et al. [7] observed that choosing the right institute is one of the most challenging tasks for students today. Students browse social networking sites for reviews and ratings of a specific institution to get an approximate picture of it, but the statistical dimension of this feedback is hard to examine. Aspect-based sentiment analysis is therefore applied directly to the comments, yielding the negative and positive feedback on the institution in question. Several techniques are used to classify aspects, including NLP-based, machine learning (ML) based, unsupervised, dictionary-based, and corpus-based approaches. The NLP and ML classifiers give the best results when classifying each aspect into its category.
Hagge Marvin et al. [8] described how consumer sentiment can be identified on micro-blogging and social media services such as Facebook and Twitter, where people express their opinions. Consumer preferences are the basis for evaluating customers' views of an individual product. Aspect-based sentiment analysis is performed by part-of-speech tagging, and the positive, neutral, and negative aspects of the tweets are parsed directly using natural language processing. The software toolkit in the proposed approach first extracts the tweets, then filters them to evaluate the polarity of the sentiment, and finally shows the result. The case study is Airbnb, a web platform on which people rent out their homes to each other. The aspects studied are Airbnb, place, time, house, day, people, night, view, apartment, and space. The results are presented in a table. Future work will use feedback from Airbnb's website.
Pannala, Nipuna Upeka, et al. [9] noted that existing opinion mining work is performed at the word level rather than at the sentence level and covers the views that are expressed explicitly. Their paper uses a qualified data set to analyze feedback on different products and classify it as positive, favorable, or negative. Aspect-based sentiment analysis (ABSA) operates on the various aspects of an object and returns the polarity of each. ABSA is implemented using machine learning (ML) and natural language processing (NLP) techniques. The data set contains 1654 aspect-category annotations in the training set and 845 in the test set.
Keumhee Kang et al. [10] proposed a novel way to identify users in a stressed mood by monitoring their tweets over a long period. They analyze all forms of tweet content, i.e., photos, emoticons, and text. To assess the validity of the proposed method, two types of experiments were performed: 1) the proposed multimodal approach was validated on several tweets and its output was compared with SentiStrength; 2) it was used to identify 45 mental states of users as depressive or non-depressive. The experimental results indicated that the proposed multimodal analysis has higher precision than existing methods and can predict individuals' moods more accurately.
Rongrong et al. [11] surveyed approaches to visual sentiment analysis. The survey describes the different techniques used to study visual sentiment, in which photographs are used to assess a person's feelings. It concentrates largely on cutting-edge approaches used in the field of image analysis. The survey explains the current research framework: most research has been done on text, whereas the ontology of visual sentiment is a new concept. Deep learning is useful for successful visual sentiment analysis.
P. D. Turney et al. [12] proposed an unsupervised learning algorithm that classifies reviews as thumbs up or thumbs down. The classification of a review is determined by the mean semantic orientation of its phrases. Positive or negative associations in the review indicate its orientation. The semantic orientation is computed using PMI-IR, which is the core step of this research. The proposed algorithm achieves different accuracy on different review types, including 74 percent on movies, 80 percent on banks and vehicles, and 84 percent on travel reviews.
R. Socher et al. [13] worked on predicting sentence-level label distributions using a new approach based on recursive autoencoders. The work considers criteria relating to sentiment and lexica. The data set used in this study consists of personal user stories annotated with several labels which, when aggregated, form a multinomial distribution capturing the emotional responses.
A.-M. Popescu et al. [14] proposed an unsupervised information extraction method for deriving opinions from comments. The work proceeds in the following steps. First, the product's features are identified; second, the opinions relating to the product are established; and third, the polarity of each opinion is determined. The final step ranks opinions by their strength. Relaxation labeling is used to determine the semantic orientation. The accuracy and recall obtained in the tests indicate that the approach recognizes sentiments successfully.
M. Abdul-Mageed et al. [15] worked on sentiment analysis of standard Arabic data. In this work, data is collected first, and then an automatic classification phase begins in which tokenization is performed on the data. A two-stage classification method is applied to the data set. The results demonstrate the effectiveness of the method.
Donglin et al. [16] worked on approaches to visual sentiment analysis. Their survey describes the different techniques used to study visual sentiment, in which photos are used to assess a person's feelings. The survey focuses on cutting-edge methods used in image analysis.
Gitanjali et al. [17] noted that text classification is a basic task in text mining and natural language processing. Earlier classifiers use hand-crafted features, such as frequency-based and n-gram features, which cannot capture non-linearity and which increase the variance of the features, directly affecting classifier performance. The convolution-based approach refines the traditional features layer by layer through an activation function. This improves the feature patterns learned by logistic regression, which is further optimized through boosting. The results showed that the proposed CNN-logistic regression method significantly improves accuracy because of the improved feature patterns.

Review of encryption techniques
Sehra et al. [18] discussed role-based access control (RBAC) as an efficient way of managing access to information and reducing ambiguity in large networked applications. It also helps lower security costs in large applications. In this work, RBAC policies for the cloud are considered. The migration policy allows the user to migrate the database schema securely and to transfer data from one cloud to another using XML. The restriction policy helps limit the number of cloud-based transactions. A new backup and restore policy is introduced so that lost data can be recovered even if the local system crashes.
Almorsy et al. [19] proposed a cloud security management framework based on the FISMA standard that enables security certification for customers and cloud providers. It improves collaboration between cloud providers and service users in managing security. The method is implemented on the .NET platform and applied to SaaS security management.
Yibin et al. [20] presented a smart cryptographic approach that prevents the cloud provider from accessing partial data. The method splits the file into sub-files and stores them on distributed cloud servers. A further strategy determines whether the data packets need to be split, in order to reduce running time.
Diwan et al. [21] compared various cryptographic algorithms for ensuring the confidentiality of data. The algorithms are compared on parameters such as block size, key length, type, and characteristics. The authors indicate which cryptographic algorithms can be used to ensure data security in the cloud.
Sood et al. [22] suggested a hybrid solution for data security in cloud computing. In this work, various techniques are combined to provide protection from the sender to the receiver. Data security is provided on the basis of the confidentiality, integrity, and availability of the information. The Secure Sockets Layer (SSL) protects data using encryption, and integrity is provided by a Message Authentication Code (MAC). Requiring a login ID and password from all users further enhances protection.
Sengupta et al. [23] discussed a cloud computing protection framework based on cryptography. In this work, encryption is performed using a hybrid Caesar cipher. This provides security at the client, the server, and the network. The method offers effective security against hackers.
Somani et al. [24] proposed using the RSA algorithm to ensure confidentiality while using digital signatures to improve security through verification. The solution carries out encryption in five stages. The key is generated in the first stage. Digital signing is carried out in the second stage, and encryption and decryption take place in stages 3 and 4.
Rewagad et al. [25] discussed a design for maintaining the confidentiality of information placed in the cloud that combines digital signatures and Diffie-Hellman key exchange with the Advanced Encryption Standard encryption algorithm. Even if the key in transmission is intercepted, the Diffie-Hellman key exchange makes it useless, because the transmitted key is of no use without the user's private key, which is held only by the legitimate user. This three-way mechanism makes it hard for attackers to breach the security system, thereby protecting the information placed in the cloud.
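The reason an intercepted public value is useless without the private key can be shown with a toy Diffie-Hellman exchange. The sketch below uses the classic finite-field form with a tiny prime (p = 23, g = 5) purely for readability; it is not the elliptic-curve variant used in the proposed model, and the function names and parameters are assumptions for illustration.

```python
import secrets

def dh_keypair(p, g):
    """Pick a random private exponent in [2, p-2] and derive the public value g^private mod p."""
    private = secrets.randbelow(p - 3) + 2
    public = pow(g, private, p)
    return private, public

def dh_shared_secret(their_public, my_private, p):
    """Combine the other party's public value with our own private exponent."""
    return pow(their_public, my_private, p)

# Toy public parameters (far too small for real use).
P, G = 23, 5
a_priv, a_pub = dh_keypair(P, G)   # user side
b_priv, b_pub = dh_keypair(P, G)   # cloud side

# Each side computes the same secret without ever sending a private exponent.
secret_a = dh_shared_secret(b_pub, a_priv, P)
secret_b = dh_shared_secret(a_pub, b_priv, P)
```

Only `a_pub` and `b_pub` travel over the network. An eavesdropper who captures them still needs one of the private exponents to compute the shared secret, which for realistically sized parameters is believed to be computationally infeasible.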
Prabhakar et al. [26] proposed a data encryption protocol based on the AES algorithm. In cloud environments, the AES approach protects the information for its entire life cycle from start to finish. The encryption process uses AES-256 together with the Secure Sockets Layer to protect data records as they are exchanged with the cloud. This method protects data against brute-force attacks and provides efficient protection for data in the cloud. It does not rely heavily on data protection and data effectiveness. The proposed approach protects the information at all stages and is divided into two stages. In the first stage, the data is encrypted with AES-256. In the second stage, the client is verified: the client sends a username and password to the cloud. When the cloud receives the request, it checks the client's details and, if the client is valid, begins the data retrieval process.

Motivational Example
The motivating example in this section illustrates the key part of the proposed model, which is to secure data packets containing sensitive information. The method consists of splitting data packets and retrieving them. The scenario is set in the financial sector, where cloud users' sensitive information needs to be strongly secured. Data volumes have exploded, and more data has been generated in the last two years than in the entire previous history of the human race. In big data, the major challenge is resources, because static and non-adjustable resources cannot support big data. Developing a suitable classification method is therefore the key requirement.
In this research work, a unique classification strategy will be proposed to resolve the problems with the techniques currently in use. The proposed scheme will use innovative machine learning methods and cryptography to handle big data. First, data will be categorized as sensitive or non-sensitive and classified without calculating the non-linearity and dynamic information of the data. Encryption time and storage are challenging because the data volume still grows after encryption, so the size of the data to be encrypted should be optimized. Classification performance will be increased through the use of an approximating function. The scheme will be able to protect user data because the main value is generated at random and no split data contains any content information. Attackers (hackers) are unable to obtain sensitive information even if they focus on the details.
From an industry point of view, the scheme will be valuable because the data protection system helps categorize data so that critical, sensitive, and classified information can be protected. If sensitive data is not managed properly, organizations may have to pay fines for breaking laws and regulations and may suffer financial loss or reputational damage. From society's point of view, classifying big data will be cost-effective, as it will decrease the computation cost and provide a more useful and effective method for information technology security.

Phase 1
This phase of the proposed work will focus on the secure data classification model, in which data is graded according to its level of sensitivity. Precision, recall, and accuracy are evaluated in this phase.
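For reference, the three evaluation measures named above can be computed from predicted and true labels as in the following minimal sketch. The function name, the choice of "sensitive" as the positive class, and the eight toy labels are assumptions made for illustration, not results from this paper.

```python
def classification_metrics(y_true, y_pred, positive="sensitive"):
    """Compute precision, recall, and accuracy for a binary labeling task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)   # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)   # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)   # false negatives
    tn = sum(1 for t, p in pairs if t != positive and p != positive)   # true negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy}

# Hypothetical labels, invented for illustration ("s" = sensitive, "n" = non-sensitive).
names = {"s": "sensitive", "n": "non-sensitive"}
y_true = [names[c] for c in "sssnnnnn"]
y_pred = [names[c] for c in "ssnsnnnn"]
metrics = classification_metrics(y_true, y_pred)
```

In a security setting recall on the sensitive class deserves particular attention, because a sensitive record misclassified as non-sensitive would be stored without encryption.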

Phase 2
This phase will encrypt only the sensitive data before saving it in the cloud and will use the same server to store the non-sensitive data for efficient data use. In this phase, the encryption technique, computation time, and storage space will be analyzed.
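The selective-encryption idea of this phase can be sketched as a simple routing step. In the example below, `keyword_flag` stands in for the trained classifier and `toy_xor` stands in for the real cipher; both are hypothetical placeholders invented for illustration, and the XOR routine provides no real security.

```python
def store_records(records, is_sensitive, encrypt):
    """Encrypt only the records flagged as sensitive; keep the rest in plain form."""
    stored = []
    for record in records:
        if is_sensitive(record):
            stored.append(("encrypted", encrypt(record)))
        else:
            stored.append(("plain", record))
    return stored

def keyword_flag(record):
    """Hypothetical stand-in for the classifier: flag records that mention a password."""
    return b"password" in record.lower()

def toy_xor(data, key=0x5A):
    """Hypothetical stand-in for the cipher: single-byte XOR (insecure, its own inverse)."""
    return bytes(b ^ key for b in data)

records = [b"my password is 1234", b"nice weather today"]
stored = store_records(records, keyword_flag, toy_xor)
```

Because only the flagged records pass through the cipher, the encryption time and any ciphertext overhead scale with the amount of sensitive data rather than with the full data set.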

Phase 3
This phase will aim to achieve better results than current algorithms in terms of accuracy, time, and other parameters, while improving cloud data security and integrity, shortening total execution time, and reducing overall storage space.

Methods
In this section, we describe our proposed algorithms. The proposed model is supported by three main algorithms: the Convolution Neural Network with Logistic Regression (CNN-LR), Elliptic-curve Diffie-Hellman-Shifted Adaption Homomorphism Encryption (ECDH-SAHE), and Elliptic-curve Diffie-Hellman-Shifted Adaption Homomorphism Decryption (ECDH-SAHD). The sections below explain the structure of each algorithm in turn.

Convolution Neural Network with Logistic Regression (CNN-LR) algorithm
The CNN-LR algorithm combines a Convolution Neural Network with Logistic Regression; its pseudo-code is presented here. The input to the algorithm is tweet text. The text is first pre-processed, and then features are extracted. The first step reduces non-linearity through convolution, pooling, and activation, after which the learning stage begins and logistic regression is applied. Loss and accuracy are then measured over different numbers of epochs, and training iteratively improves the accuracy and reduces the loss.
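The feature-extraction stage described above can be illustrated with a one-dimensional example in plain Python. This is a sketch under stated assumptions rather than the authors' Algorithm 1: the kernel, the ReLU activation, the pooling size, and the input values are all invented for the example, and a real model would learn its kernels from data.

```python
def conv1d(signal, kernel):
    """Valid 1-D cross-correlation, the operation deep-learning libraries call convolution."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(values):
    """Activation: keep positive responses and zero out negative ones."""
    return [max(0.0, v) for v in values]

def max_pool1d(values, size=2):
    """Pooling: keep the strongest response in each non-overlapping window."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]

def extract_features(signal, kernel, pool_size=2):
    """Convolution, then activation, then pooling; the result feeds the logistic regression stage."""
    return max_pool1d(relu(conv1d(signal, kernel)), pool_size)

# Hypothetical numeric encoding of a short text and a hypothetical difference kernel.
signal = [1.0, 2.0, 0.0, -1.0, 3.0, 1.0]
kernel = [1.0, -1.0]
features = extract_features(signal, kernel)
```

With ReLU and max pooling the order of activation and pooling does not change the result, since both operations are monotone. The pooled features would then be passed to a logistic regression classifier, which is trained over several epochs while loss and accuracy are tracked.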
In the following, we define the principal steps of Algorithm 1:
proposed scheme was 4 and 5 more effective than other techniques and Fig. 7 shows that ECDH-SAHE takes less time than other encryption techniques.
6.1 Reason for Selecting CNN-LR
• Moving from KNN to a neural network in the experimental analysis, Table 1 shows that the neural network improves the outcomes, which motivates the choice of a layered convolutional network.
• In traditional machine learning approaches, features rely on linear structures and capture no non-linearity.
• The classification is optimized by the activation function.

Discussions
Fig. 3 and Fig. 4 show the output of the proposed and existing solutions for different data set sizes, ranging from 50 MB to 512 MB. The resulting graphs show that the proposed technique substantially improves encryption time, decryption time, and security. Comparing the performance on small and large data sets, we find that an increase in size brings only a limited overhead. Therefore, as the data size increases, storage and time do not increase as much. The approach thus offers great benefits for large data sets, and its output has been fully verified. The improvement stems from the following reasons:
• It removes time overhead by using a binary stream rather than a unary text stream.
• It improves storage by shifting to the left during encryption and to the right during decryption.
• The shifting analysis depends on binary streaming and indirectly contributes to improved security.
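The reversibility of a left shift on encryption and a right shift on decryption can be illustrated on a byte stream. The sketch below is only an illustration of that one property, using a circular bit rotation so that no bits are lost; it is not the ECDH-SAHE algorithm, whose details are not given in this section, and the function names and the shift amount are assumptions.

```python
def rotate_left(byte, k):
    """Rotate an 8-bit value left by k positions (bits that fall off re-enter on the right)."""
    k %= 8
    return ((byte << k) | (byte >> (8 - k))) & 0xFF

def rotate_right(byte, k):
    """Rotate an 8-bit value right by k positions; the inverse of rotate_left."""
    k %= 8
    return ((byte >> k) | (byte << (8 - k))) & 0xFF

def shift_encode(data, k):
    """Apply the left rotation to every byte of a binary stream."""
    return bytes(rotate_left(b, k) for b in data)

def shift_decode(data, k):
    """Undo shift_encode by rotating every byte to the right."""
    return bytes(rotate_right(b, k) for b in data)

message = b"sensitive record"
encoded = shift_encode(message, 3)
decoded = shift_decode(encoded, 3)
```

The encoded stream has exactly the same length as the input, which is consistent with the storage argument above, and operating on raw bytes avoids any text encoding step. On its own such a rotation offers no cryptographic protection.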

Conclusions
This paper addresses cloud data management and presents a solution that prevents cloud operators from accessing customers' private data. To meet this goal, we introduced a novel model named Security Aware Sensitive Encrypted Storage (SA-SES). In this model, we used our proposed algorithms: Convolution Neural Network with Logistic Regression (CNN-LR), Elliptic-curve Diffie-Hellman-Shifted Adaption Homomorphism Encryption (ECDH-SAHE), and Elliptic-curve Diffie-Hellman-Shifted Adaption Homomorphism Decryption (ECDH-SAHD). Our tests have shown that the proposed scheme can protect against the main cloud-side threats. The paper also focuses on encryption and reports the turnaround time for data retrieval, although the range of data sizes was limited. The time needed for decryption was close to that for encryption. Our method achieved a shorter turnaround time than the other strategies already in use. Future research must address data replication to improve data availability, since data retrieval can fail as a result of an upgrade in the data center.

Declarations
Availability of data and material: The data set was collected from twitter.com using the Twitter API.
Competing interests: The authors declare that they have no conflict of interest.
Funding: No funding was received for this work. The journal has stated that a 20% discount will be given for editorial support.
Authors' contributions: A data set of 1 lakh (100,000) tweets was collected, and a novel technique was proposed that first classifies data into sensitive and non-sensitive parts with higher accuracy and then encrypts the sensitive data using less storage space and time. The model could therefore be used by industry and by society.