Improving Hadoop Performance through scheduling and prefetching

doi:10.21203/rs.3.rs-1570462/v1

Research Article

This work is licensed under a CC BY 4.0 License

Version 1

In this paper, we continue our investigation into improving Hadoop performance. We explore further strategies that address data locality with respect to data access patterns, cluster memory, and effective scheduling. In a Hadoop cluster, users generally access data according to their business needs, which makes some data more frequently accessed than others; for this reason, we treat data access patterns as a crucial element of our approach. Performance is affected by data access latency, which is lowest when blocks are already in memory at the time they are requested for processing. However, caching too much can cause memory overhead, or even severe delays when data that is unlikely to be needed persistently occupies memory. Clusters today offer more memory per node, and several studies report that this memory is highly underutilized in most cases. Our approach uses this underutilized memory to provide better data locality for future tasks through a new prefetching/scheduling algorithm. The idea is to differentiate blocks by popularity based on previous access patterns, and then to allocate or evict memory efficiently across the whole cluster.
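
The popularity idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm, which the abstract does not specify. It assumes that popularity is a plain access count taken from a past access log, that blocks are admitted to memory greedily in popularity order, and that the cluster has a single shared memory budget. All names (`rank_blocks_by_popularity`, `plan_cache`, `plan_changes`) are hypothetical.

```python
from collections import Counter


def rank_blocks_by_popularity(access_log):
    """Return block ids ordered from most to least accessed.

    Assumption: popularity is the raw access count in the log.
    Ties are broken by block id so the result is deterministic.
    """
    counts = Counter(access_log)
    return sorted(counts, key=lambda block: (-counts[block], block))


def plan_cache(access_log, block_sizes, memory_budget):
    """Greedily pick the most popular blocks that fit in the budget."""
    target = set()
    used = 0
    for block in rank_blocks_by_popularity(access_log):
        size = block_sizes[block]
        if used + size <= memory_budget:
            target.add(block)
            used += size
    return target


def plan_changes(currently_cached, target):
    """Split the plan into blocks to prefetch and blocks to evict."""
    to_prefetch = target - currently_cached
    to_evict = currently_cached - target
    return to_prefetch, to_evict


# Hypothetical example: three 128 MB blocks, 256 MB of spare memory.
access_log = ["b1", "b2", "b1", "b3", "b1", "b2"]
block_sizes = {"b1": 128, "b2": 128, "b3": 128}
target = plan_cache(access_log, block_sizes, memory_budget=256)
to_prefetch, to_evict = plan_changes({"b3"}, target)
```

In this example the two most accessed blocks fit in the budget, so the least accessed block that is currently cached is marked for eviction. A real scheduler would also need to decide on which node each block is cached, which this sketch leaves out.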

Big Data

Hadoop Distributed File System

Distributed computing

Scheduling

Prefetching

No competing interests reported.
