Existing research has extensively explored cascading encryption algorithms to improve data security in various contexts, including cloud storage and Big Data environments. Filaly et al. (2023) proposed a hybrid encryption algorithm for information security in Hadoop based on AES, CP-ABE, and RSA. While their hybrid method aims to enhance security for Big Data applications, the paper does not address the scalability issues of encrypting and processing large volumes of data efficiently. Aswathi et al. (2022) focused on securing Big Data in Hadoop using hybrid encryption (RSA and AES), addressing the limitations of AES encryption alone. Similarly, Viswanath and Krishna (2021) proposed a hybrid encryption framework for securing big data storage in a multi-cloud environment, but their scheme prevents unauthorized access only in the private cloud. Negi et al. (2023) introduced a hybrid cryptographic approach (AES integrated with ECC) for secure cloud-based file storage; key management and performance overhead remain challenging issues in that work. Lai et al. (2022) and Kumari and Malhotra (2022) explored secure storage of files on the cloud using hybrid cryptography but did not implement their schemes in a real-world scenario. Chaudhari et al. (2023) surveyed hybrid cryptography for secure file storage, highlighting its relevance and effectiveness.
On the other hand, Jain et al. (2019) introduced the SMR (Secure MapReduce) layer, and Gupta et al. (2023) introduced the FSMR (Fortified Secure MapReduce) layer between HDFS and MapReduce to secure the data; however, overall performance, scalability, and the complexity of the structure are the main issues of this approach.
These studies address various aspects of hybrid encryption and of introducing an extra layer between HDFS and the MapReduce framework in Big Data environments. We observe that securing HDFS data while optimizing speed remains a significant gap in this literature.
Justifications
The selection of the current research topic is motivated by the rising importance of security in the Big Data field, particularly within the HDFS framework. While the existing literature has explored hybrid encryption in diverse contexts and has added new layers to the Hadoop environment, there is a notable absence of studies focusing specifically on performance when securing HDFS data. This research aims to fill this gap by proposing and evaluating a new hybrid encryption scheme tailored for safeguarding data stored in HDFS, thereby contributing to the advancement of data security and performance practices in Big Data environments.
Summary of current work
This paper presents a novel approach to boost the security of HDFS data and improve performance by implementing a hybrid encryption scheme that combines the Twofish and AES algorithms with the MapReduce framework. The proposed method utilizes MapReduce for parallel encryption, aiming to mitigate the processing overhead associated with conventional encryption techniques. Experimental results validate the effectiveness of the proposed hybrid method for safeguarding sensitive data stored in HDFS while improving performance, addressing the identified gap in the literature.
Proposed Methodology
The proposed hybrid encryption model (combining Twofish with AES) with a MapReduce framework [27] aims to boost the security of HDFS data in the Big Data field. The model leverages the parallel processing capabilities of MapReduce to achieve efficient encryption and decryption of data stored in HDFS. The following components constitute the proposed model:
3.1. Twofish and AES Integration:
To protect the large amounts of data stored in HDFS, the proposed hybrid encryption technique combines the Twofish and AES algorithms. Compared with more conventional encryption methods, this double encryption achieves a higher level of data protection: Twofish provides strong encryption capabilities, while AES ensures compatibility and efficiency.
From a security point of view, Twofish is among the most popular symmetric block cipher algorithms today. Twofish has a 128-bit block size and a variable key size of 128 to 256 bits. Its predecessor is the Blowfish algorithm [18–20], and the two algorithms share a largely similar structure. Pre-computed key-dependent S-boxes and a very complex key schedule are two of Twofish's distinguishing characteristics. The n-bit key is separated into two halves: the first half is used as the encryption key, while the other half is used to modify the algorithm (the key-dependent S-boxes). The security performance of Twofish is better than that of AES [17, 18]. Like AES, Twofish supports key sizes of 128, 192, and 256 bits; however, Twofish uses a fixed 16 rounds for any key size.
A structural difference between the two ciphers is that AES uses a substitution-permutation network, whereas Twofish uses a Feistel network to encrypt the data. The structure of the Twofish algorithm is more complex but highly secure.
AES is the most widely used symmetric block cipher algorithm. It has a 128-bit block size; key lengths of 128, 192, or 256 bits; and 10, 12, or 14 rounds, respectively, to encrypt the data [17, 18, 21]. AES uses keys of different lengths to encrypt and decrypt data; for example, AES-128 uses a 128-bit key.
The two algorithms can be compared on how well they withstand competing demands: the most important concerns are security, resistance to attacks, and performance. Evaluating encryption algorithms also involves assessing implementation, scalability, and suitability, which are equally important considerations. AES is more efficient than Twofish in terms of hardware requirements: it needs less memory and fewer cycles to encrypt data, so from a speed point of view AES is better. In this paper, we integrate both algorithms to overcome their individual limitations and thereby provide a stronger encryption scheme for Big Data environments.
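As a concrete illustration of this integration, the minimal sketch below double-encrypts a single 16-byte block with Twofish and then AES. It assumes the third-party Python packages twofish and PyCryptodome are available; the helper names (hybrid_encrypt_block, hybrid_decrypt_block) are ours for illustration, and single-block ECB is used only to keep the sketch short, not as a recommended mode of operation.

import os
from twofish import Twofish            # pip install twofish
from Crypto.Cipher import AES          # pip install pycryptodome

def hybrid_encrypt_block(block16, tf_key, aes_key):
    # First layer: Twofish on one 16-byte block
    inner = Twofish(tf_key).encrypt(block16)
    # Second layer: AES over the Twofish ciphertext
    return AES.new(aes_key, AES.MODE_ECB).encrypt(inner)

def hybrid_decrypt_block(block16, tf_key, aes_key):
    # Reverse the cascade: remove the AES layer first, then Twofish
    inner = AES.new(aes_key, AES.MODE_ECB).decrypt(block16)
    return Twofish(tf_key).decrypt(inner)

tf_key, aes_key = os.urandom(16), os.urandom(16)   # two independent 128-bit keys
ct = hybrid_encrypt_block(b"exactly16bytes!!", tf_key, aes_key)
assert hybrid_decrypt_block(ct, tf_key, aes_key) == b"exactly16bytes!!"

Using two independently generated keys means an attacker must break both ciphers (or obtain both keys) to recover the plaintext, which is the intended benefit of the cascade.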
3.2. Hadoop Distributed Environment
Hadoop originated in 2006 as part of Nutch, a subproject of Lucene that provided a distributed-system framework, before becoming an independent project. In addition to being a distributed storage system for big data files, Hadoop possesses robust data processing capability [22]. Following the distributed framework concept, a large distributed computing cluster can be formed from low-cost commodity hardware. HDFS and the MapReduce programming model are the two parts that make up the Hadoop framework [23–25]. The MapReduce framework is based on the concept of parallel processing for handling massive data sets [23]: the Map function performs the mapping, and the Reduce function aggregates the mapped results, which is the most significant characteristic of MapReduce. Developers write MapReduce programs that run on the distributed system and effectively manage large data.
First, a file is divided into smaller parts known as blocks, which are saved on different datanodes. The namenode and datanodes are the two node types in the Hadoop framework, and all operations are performed between them. The workflow of the MapReduce architecture is as follows. At the beginning, the client sends an access request to the namenode; if the request is granted, the file name is resolved to the HDFS block IDs indicating where the blocks of the file are stored, and the client receives the list of block IDs. The MapReduce framework [26] then processes the file with the help of a mapper and a reducer. The mapper converts the file into key-value pairs [27]; these pairs are forwarded to partitions and sorted by key. An optional combiner can be used to reduce the work of the reducer: it combines the sorted key-value pairs and counts the values that share the same key. Finally, the partitioner divides the key-value pairs again and hands them to the reducer. The MapReduce framework uses a shuffle operation, which delivers each mapper's results to the appropriate reducer. After that, the reducer applies the reduce function and uploads the output to HDFS again. Figure 1 displays the Hadoop MapReduce architecture (Ahmed et al., 2020).
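To make this key-value flow concrete, the pair of Hadoop Streaming scripts below implements the classic word count; this is our own illustration of the mapper/reducer contract, not part of the proposed scheme. The framework shuffles and sorts the mapper's output by key before the reducer reads it.

#!/usr/bin/env python3
# mapper.py - emits one (word, 1) pair per word on stdin; Hadoop Streaming
# shuffles and sorts these pairs by key before they reach the reducer.
import sys
for line in sys.stdin:
    for word in line.strip().split():
        print(f"{word}\t1")

#!/usr/bin/env python3
# reducer.py - input arrives grouped and sorted by key, so per-word counts
# can be accumulated in a single pass (work a combiner can also pre-do).
import sys
current, count = None, 0
for line in sys.stdin:
    word, value = line.rstrip("\n").split("\t")
    if word != current:
        if current is not None:
            print(f"{current}\t{count}")
        current, count = word, 0
    count += int(value)
if current is not None:
    print(f"{current}\t{count}")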
3.3. Data Encryption Process
Data stored in HDFS is encrypted using the proposed scheme before being written to disk. Initially, the large body of plaintext to be encoded is cut into sequential plaintext parts. These parts are then numbered in sequential order: the first plaintext part is identified as 1, the next as 2, and so on; in this example the final plaintext part is identified as 3.
The plaintext parts, together with the hybrid encryption technique and their identifiers, are then distributed across the servers, and each job performs the proposed scheme's operation shown in Fig. 2. The steps are as follows. Each plaintext part is encrypted with the proposed scheme in the Map layer: first, the plaintext part is encrypted with a Twofish key produced by its key generator; then the result is encrypted again with the AES key. In this way, every plaintext part is encrypted, and the identifier corresponding to each part is carried along simultaneously. Because the ciphertext part grouping identities equal those of the plaintext parts, the task can recognize the encoded parts by the identifiers allocated to their plaintext parts. Upon completing its operation, the task sends the encrypted parts to the fourth host for execution. After receiving the ciphertext portions from all Map jobs, the Reduce job arranges the ciphertext parts in sequence according to the ciphertext part grouping identities and obtains the merged ciphertext. Finally, the ciphertext is stored in HDFS, as shown in Fig. 2.
This constitutes the proposed distributed encryption scheme based on the MapReduce framework. In the standard encryption method, a single computer is used to encode the data, whereas in the parallel encryption method multiple servers encrypt and process the data simultaneously. This method is therefore suitable for carrying out encryption operations over huge volumes of plaintext in a manner that is both efficient and fast.
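The sketch below mimics this split-encrypt-merge flow on a single machine, using a process pool to stand in for the parallel Map tasks; hybrid_encrypt_block is the cascade sketched in Section 3.1, TF_KEY and AES_KEY are assumed global keys, and the plaintext length is assumed to be a multiple of the 16-byte block size (padding is omitted for brevity).

from multiprocessing import Pool

def split_numbered(data, part_size=16):
    # Cut the plaintext into sequential parts labeled 1, 2, 3, ...
    parts = [data[i:i + part_size] for i in range(0, len(data), part_size)]
    return list(enumerate(parts, start=1))           # (identifier, part)

def encrypt_part(numbered):                          # one "Map" task
    ident, part = numbered
    return ident, hybrid_encrypt_block(part, TF_KEY, AES_KEY)

def merge_ciphertext(numbered_ct):                   # the "Reduce" step
    # Sort by identifier, then concatenate the ciphertext parts
    return b"".join(ct for _, ct in sorted(numbered_ct))

with Pool() as pool:                                 # parallel map phase
    numbered_ct = pool.map(encrypt_part, split_numbered(plaintext))
ciphertext = merge_ciphertext(numbered_ct)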
3.4. Key Management:
The model includes a key management system for securely generating, storing, and distributing the encryption keys for both the Twofish and AES algorithms. Key rotation and revocation mechanisms are implemented to guarantee the security of the encryption keys.
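The paper does not fix a concrete key management API, so the sketch below is only an assumed minimal shape for such a component, covering generation, rotation, and revocation in memory; a production system would instead back this with an HSM or a hardened key store.

import os, time

class KeyManager:
    def __init__(self):
        self._keys = {}                    # key_id -> (key bytes, created at)

    def generate(self, key_id, size=16):
        key = os.urandom(size)             # fresh Twofish or AES key
        self._keys[key_id] = (key, time.time())
        return key

    def rotate(self, key_id):
        return self.generate(key_id)       # replace the key with a new one

    def revoke(self, key_id):
        self._keys.pop(key_id, None)       # a revoked key can no longer be fetched

    def get(self, key_id):
        return self._keys[key_id][0]

km = KeyManager()
tf_key = km.generate("twofish-hdfs")       # illustrative key identifiers
aes_key = km.generate("aes-hdfs")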
3.5. Data Decryption Process:
Encrypted data retrieved from HDFS is decrypted using the corresponding decryption keys. First, the large encoded data set is separated into sequential ciphertext chunks, which are numbered in sequential order: the initial ciphertext part is identified as 1, the next as 2, and so on; in this example the last ciphertext part is identified as 3. The ciphertext parts and the proposed decryption scheme are given to the servers to execute the tasks, and each job is decrypted by the Map-based proposed decryption scheme. In this scheme, the ciphertext part is first decrypted with the AES key and then with the Twofish key. To decode the ciphertext parts and obtain the matching plaintext parts, the task must first recognize the extracted parts according to their ciphertext identifiers; that is, the plaintext part grouping identities and the ciphertext parts are identical. The job is then forwarded to the fourth host for execution. After receiving the plaintext parts from all the Map jobs, the Reduce job arranges the plaintext parts in order according to the plaintext part grouping IDs and obtains the merged plaintext data. Finally, the plaintext data is put into HDFS, as shown in Fig. 3.
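Mirroring the encryption sketch in Section 3.3, the decryption side applies the keys in reverse order and reorders the parts by identifier before merging; hybrid_decrypt_block, TF_KEY, and AES_KEY are the same assumed names as before.

def decrypt_part(numbered):                          # one "Map" task
    ident, ct = numbered
    # The AES layer is removed first, then the Twofish layer
    return ident, hybrid_decrypt_block(ct, TF_KEY, AES_KEY)

def merge_plaintext(numbered_pt):                    # the "Reduce" step
    return b"".join(pt for _, pt in sorted(numbered_pt))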
3.6. Security Mechanisms:
This model incorporates security mechanisms to protect against possible attacks, including data interception, tampering, and unauthorized access. The proposed hybrid encryption model offers a comprehensive approach to securing HDFS data in a Big Data environment, leveraging the strengths of the Twofish and AES algorithms while harnessing the parallel processing capabilities of a MapReduce framework for efficient processing. This scheme also improves the speed of encryption and decryption.
3.7. Pseudocode of Proposed Encryption Scheme
Map1(key, value):
// Generate Twofish key
twofishKey12 = generateTwofishKey1()
// Encrypt data using Twofish
encryptedTwofishData12 = TwofishEncrypt1(value, twofishKey12)
// Generate AES key
aesKey12 = generateAESKey1()
// Encrypt Twofish-encrypted data using AES
encryptedData12 = AESEncrypt(encryptedTwofishData12, aesKey12)
// Emit encrypted data with its associated key
emit(key, encryptedData12)
Reducer1(key, values):
// Combine and sort the encrypted data
aggregate = ""
// Read each encrypted part in key order
for line in sys.stdin:
    lineN = line.strip()
    lineClass1 = EncryptedN(lineN)   // extract the ciphertext part
    aggregate += lineClass1          // append in sorted order
// Print the merged ciphertext
emit(key, aggregate)
3.8. Pseudocode of Proposed Decryption Scheme
Map2(key, values):
// For simplicity, assume only one value per key in this example
encryptedData12 = values[0]
// Retrieve aesKey12 and twofishKey12 from the key management system
// Decrypt the outer layer using AES
decryptedTwofishData12 = AESDecrypt(encryptedData12, aesKey12)
// Decrypt the inner Twofish layer
decryptedData12 = TwofishDecrypt(decryptedTwofishData12, twofishKey12)
// Output decrypted data
emit(key, decryptedData12)
Reducer2(key, values):
// Combine and sort the decrypted data
aggregate = ""
// Read each decrypted part in key order
for line in sys.stdin:
    lineN = line.strip()
    lineClass1 = decryptedN(lineN)   // extract the plaintext part
    aggregate += lineClass1          // append in sorted order
// Print the merged plaintext
emit(key, aggregate)
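For completeness, mapper/reducer pairs of this shape would typically be submitted through Hadoop Streaming roughly as follows; the jar path and script names are placeholders, not the actual deployment used in the experiments.

hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
    -input  /user/data/plaintext \
    -output /user/data/ciphertext \
    -mapper  encrypt_mapper.py \
    -reducer encrypt_reducer.py \
    -files   encrypt_mapper.py,encrypt_reducer.py

The decryption job would be submitted in the same way, with the Map2/Reducer2 scripts and the input and output paths swapped.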