Performance Models of Data Parallel DAG Workflows for Large Scale Data Analytics

Directed Acyclic Graph (DAG) workflows are widely used for large-scale data analytics in cluster-based distributed computing systems. Building an accurate performance model for a DAG on data-parallel frameworks (e.g., MapReduce) is critical for implementing autonomic, self-managing big data systems. Building such a model is challenging because the allocation of preemptable system resources among parallel jobs may vary dynamically during execution, which makes it difficult to estimate execution time accurately. In this paper, we tackle this challenge by proposing a new cost model, called Bottleneck Oriented Estimation (BOE), which estimates the allocation of preemptable resources by identifying the bottleneck and thereby accurately predicts task execution time. For a DAG workflow, we propose a state-based approach that iteratively uses the per-stage resource allocation property to estimate the overall execution plan. Extensive experiments were performed to validate these cost models with HiBench and TPC-H workloads. The BOE model outperforms state-of-the-art models by a factor of five for task execution time estimation.


I. INTRODUCTION
There is a trend towards automatically configuring and managing thousands of database nodes to enable self-management for big data systems. Big data analytics jobs are often represented as Directed Acyclic Graph (DAG) workflows [1], [2], [20], [26], [27], [41]. A DAG of computational stages is built for parallel execution. For example, (1) Hive Query Language (HQL) is translated to an execution plan of MapReduce [10] jobs to be run in parallel, (2) Spark [41] programs and machine learning workloads are transformed to a DAG workflow for execution [2], [27], and (3) the Tez [1] framework allows a complex DAG of tasks for processing data. Performance modeling and optimization for DAG workflows are therefore critical to implementing self-managing big data systems.
For data parallel computing frameworks, a precise yet useful cost model should measure job execution time (i.e., go beyond simple costs such as I/O) [16], [31]. The main challenge in building an execution-time-based cost model for a DAG workflow is the inherent complexity of system resource allocation for heterogeneous tasks in each stage. This allocation may vary among computational stages, driven by two factors: (1) the degree of parallelism (i.e., the number of simultaneously running tasks in the cluster) for parallel jobs, and (2) the system resource bottleneck. Given the cluster's computing resources, the degree of parallelism is determined by the resource requirements (i.e., CPU cores and memory) of the running jobs. The resource requirements of tasks may change from one stage to the next as the computation changes, which in turn changes the degree of parallelism of each job and may shift the bottleneck resource. This ultimately varies the allocation of preemptable system resources, and task execution time with it.
We use a DAG for web site analytics [8] in MapReduce to illustrate the above challenge. As shown in Figure 1, the DAG has four jobs that process the event log of page views to report the metrics. Job 1 pre-aggregates the duration of each visit to generate records containing the page, the visiting IP, and the duration on the page. Job 2 counts the number of views for each page (i.e., a Word Count like job). Job 3 sorts the pages by the duration of each visit (i.e., a Sort like job). Finally, job 4 generates a report of the min, median, and max duration for each page. The DAG workflow is divided into seven stages (states) based on the start and end of the Map/Reduce stages. The task execution plan shows why execution time estimation is challenging because of the parallel execution of jobs 2 and 3. In the 3rd state, the map task time for job 2 is 27 seconds, bounded by CPU. In the 4th state, the system bottleneck becomes network I/O due to the shuffle operation of job 3; the map task time for job 2 drops from 27 seconds to 24 seconds because its CPU resource allocation increases. In the 5th state, there are only two map tasks in the cluster, and the map task time for job 2 further drops from 24 seconds to 20 seconds due to the CPU resources released by job 3. In summary, the execution time of the map tasks of job 2 varies between the 3rd and 5th states due to the variation of system bottlenecks (i.e., CPU-bound, network-bound, and none). This indicates that the execution time of the same task may vary from one stage to another due to the variation of system resource allocation. Unfortunately, previous cost models such as Starfish [16] and MRTuner [31] cannot capture the variation of resource allocation among stages because they assume the degree of parallelism is unchanged for a single job.
In this paper, we study cost models for DAG workflows on data parallel frameworks (i.e., MapReduce). We use the MapReduce programming paradigm because it is a well-known framework in distributed computing, and the results are easily extended to other cluster-based distributed systems such as Spark and Tez, whose key mechanisms for the execution model, task distribution, and fault tolerance are similar. The starting point of our study is a thorough understanding of the system behavior of parallel jobs, using a set of benchmarks. Two findings underpin our cost models: (1) task execution time variation is caused by the change of the system resource bottleneck among computation stages, so a cost model for parallel jobs must handle bottleneck resource estimation; and (2) within each computation stage, the resource allocation for each running job is steady, a property that can be used to estimate the break-down of the DAG execution plan in an iterative manner.
We propose the Bottleneck Oriented Estimation (BOE) model to estimate execution time at the task level. The model estimates the bottleneck resource and its allocation among tasks by predicting the cost of each type of tuple-level operation (i.e., read, transfer, compute, and write). Pipelined and blocked operations are modeled separately. The effective time of the identified bottleneck resource is derived as the execution time of the pipelined operations. The BOE model identifies the bottleneck resource and accurately estimates the task execution time variation (e.g., 27s, 24s, and 20s for task m_2 among the stages in Figure 1).
Next, we use a state-based approach to integrate the task-level BOE model into a holistic estimation of the execution plan of a DAG workflow. For each stage of a DAG workflow, we estimate the degree of parallelism of each job using the properties of the schedulers, and estimate the task-level execution time of each job through the BOE model. Then, we iteratively estimate the task execution plans for the parallel jobs in each stage. The workflow-level execution time (e.g., the DAG execution time from stage 1 to stage 7 in Figure 1) is estimated by this state-based iterative approach.
The key contributions of this paper are as follows.
• We study a set of workloads to thoroughly understand the system behavior of typical parallel DAG jobs. Our key insights are that (1) task time variation during DAG execution stems from changes in the underlying resource bottleneck, and (2) the resource allocation for parallel tasks is steady within each computation stage. These insights guide the design of the cost model.
• We propose the BOE model to estimate task-level execution time for parallel jobs. To the best of our knowledge, this is the first general cost model that addresses preemptable resource allocation for parallel jobs.
• We use a state-based approach to integrate the BOE model into a holistic execution plan estimation for a DAG workflow. This framework exploits the property that the resource bottleneck is steady during a stage, and iteratively estimates the execution plan for parallel tasks from one stage to the next.
• We conduct extensive experiments with hybrid analytics benchmarks (HiBench) and query benchmarks (TPC-H) to validate the cost models. The results show that the BOE model correctly identifies resource bottlenecks and outperforms existing MapReduce models, including Starfish and MRTuner, by a factor of five for task-level estimation. For the state-based approach, the average prediction error is under 3% when predicting the execution time of 51 hybrid analytics and query DAG workflows.
The rest of the paper is organized as follows. We provide the background and problem description in Section II. We propose the task-level execution model in Section III. In Section IV, we present the holistic cost model for a DAG workflow. The evaluation results are presented in Section V. We present the related work in Section VI and conclude in Section VII.

II. BACKGROUND

A. MapReduce
MapReduce is a data-parallel computing framework to execute user-defined map and reduce functions in parallel. A MapReduce job is divided into three stages: map, shuffle and reduce. Each stage has parallel tasks for execution.

Map: The map stage reads input tuples, executes the user-defined map function, and writes the results (e.g., (k1, v1), (k1, v2)) to the local disk for the shuffle. When the intermediate map output cannot fit into memory, the framework uses an external merge & sort to generate the map output. Users can choose to compress the map output to trade CPU overhead for reduced disk and network I/O. Shuffle: To reserve memory for the user-defined reduce function, the reduce input is materialized on disk before each tuple is sent to the reduce function for processing. Reduce: The reduce stage processes the value list for each key (e.g., (k1, List(v1, v2))) and writes the output to HDFS. By default, three replicas are configured for the reduce output.
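To make the data flow concrete, the following minimal Python sketch mimics the three stages for a Word Count style job. The function names and the in-memory shuffle are illustrative simplifications, not the Hadoop API:

    from collections import defaultdict

    def map_fn(record):
        # User-defined map: emit (word, 1) for every word in the record.
        for word in record.split():
            yield (word, 1)

    def shuffle(map_outputs):
        # Group values by key; the real framework additionally sorts,
        # spills to local disk, and transfers partitions over the network.
        groups = defaultdict(list)
        for k, v in map_outputs:
            groups[k].append(v)
        return groups.items()

    def reduce_fn(key, values):
        # User-defined reduce: aggregate the value list of one key.
        return (key, sum(values))

    records = ["a b a", "b c"]
    map_out = [kv for r in records for kv in map_fn(r)]
    print(sorted(reduce_fn(k, vs) for k, vs in shuffle(map_out)))
    # [('a', 2), ('b', 2), ('c', 1)]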

B. Resource Management and Job Scheduling
The job scheduler is responsible for assigning tasks based on the availability of system resources on nodes. To separate the task scheduling and resource management, Apache YARN [35] uses a resource manager to monitor and allocate resources for multiple jobs. The resource manager provides multi-dimensional fairness (e.g., Dominant Resource Fairness, DRF [14]).
In this paper, we follow the scheduling model in YARN for parallel tasks. For both single job and parallel job cases, the tasks in a computation stage are scheduled based on DRF.
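To illustrate how DRF decides which job receives the next task, the sketch below repeatedly launches one task for the job with the smallest dominant share while its demand still fits; the capacities and per-task demands follow the running example of the DRF paper [14] and are illustrative only:

    # Dominant Resource Fairness (DRF) scheduling sketch.
    capacity = {"cpu": 9.0, "mem": 18.0}
    demand = {"A": {"cpu": 1.0, "mem": 4.0},   # job A: CPU-light, memory-heavy
              "B": {"cpu": 3.0, "mem": 1.0}}   # job B: CPU-heavy, memory-light
    used = {j: {r: 0.0 for r in capacity} for j in demand}

    def dominant_share(job):
        # A job's dominant share is its largest usage fraction across resources.
        return max(used[job][r] / capacity[r] for r in capacity)

    def fits(job):
        # A task fits if cluster-wide usage plus its demand stays within capacity.
        return all(sum(used[j][r] for j in used) + demand[job][r] <= capacity[r]
                   for r in capacity)

    while any(fits(j) for j in demand):
        job = min((j for j in demand if fits(j)), key=dominant_share)
        for r in capacity:
            used[job][r] += demand[job][r]

    print(used)  # A gets 3 tasks, B gets 2; both dominant shares reach 2/3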

C. DAG Workflow
Directed Acyclic Graph (DAG) based execution is popular for modern data analytics workloads. For example, (1) the physical execution plan for HiveQL is a DAG of MapReduce jobs [34]; (2) SystemML [3], [15] compiles DML (Declarative Machine learning Language) to a DAG of hybrid MapReduce jobs and control programs; (3) Spark [41] transforms the user-defined analytic program to a DAG workflow for parallel execution.
In this paper, we define the DAG workflow as follows. Definition 1: A DAG workflow is composed of a set of jobs connected through a DAG G(J, E), where J is the set of jobs that compose the workflow, and an arc (j_m, j_n) ∈ E indicates that the start of j_n depends on the completion of j_m. Figure 2 presents an example of such a DAG workflow composed of 7 MapReduce jobs: (1) a job in the workflow starts if and only if all its parent jobs have finished (e.g., j_6 has to wait for the completion of both j_3 and j_5), and (2) multiple jobs from the DAG can run simultaneously (e.g., j_2, j_3, and j_5 run in parallel).
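As a small sketch of Definition 1, the arc list below is a hypothetical topology consistent with the two properties above (j_2, j_3, and j_5 become runnable together, and j_6 waits for both j_3 and j_5); a job is ready exactly when all of its parents have finished:

    # DAG G(J, E): an arc (j_m, j_n) means j_n starts only after j_m completes.
    E = [("j1", "j2"), ("j1", "j3"), ("j1", "j5"), ("j2", "j4"),
         ("j3", "j6"), ("j5", "j6"), ("j4", "j7"), ("j6", "j7")]
    J = {j for arc in E for j in arc}
    parents = {j: {m for (m, n) in E if n == j} for j in J}

    def ready_jobs(finished):
        # Runnable iff every parent is finished and the job has not yet run.
        return {j for j in J if j not in finished and parents[j] <= finished}

    print(sorted(ready_jobs(set())))    # ['j1']
    print(sorted(ready_jobs({"j1"})))   # ['j2', 'j3', 'j5'] run in parallel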

D. Problem Definition
We formulate the problem of cost estimation for a DAG workflow as follows:

Problem 1: Given a DAG workflow G(J, E) with job profiles J and topology dependencies E, the objective is to estimate the execution time t(G, D, P, C) of G against input data D with parameter set P and cluster resources C.

III. TASK-LEVEL MODEL
In this section, we present a cost model, Bottleneck Oriented Estimation (BOE), for task-level execution time estimation.
A. BOE Model
1) Task Execution Model: We first model the fundamental behavior of task execution on data parallel computing frameworks. As shown in Figure 3, we break a task down into multiple sub-stages. Within each sub-stage, the task is executed in a pipelined fashion from one tuple to the next, over a subset of the operations reading, transferring, computing, and writing. A bulk synchronization at the end of each sub-stage blocks all tuples before they are processed by the next sub-stage. The task execution model is general for data parallel computing frameworks that follow the functional programming model (e.g., MapReduce, Spark, and Tez).
This execution model distinguishes pipelined and blocked processing at the tuple level, which formalizes task-level execution plans to predict the allocation of preemptable resources for parallel tasks.
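This model can be captured by a small data structure, sketched below with hypothetical names: a task is a sequence of sub-stages, each sub-stage is a set of pipelined tuple-level operations over a data volume, and a barrier separates consecutive sub-stages:

    from dataclasses import dataclass, field
    from typing import List, Set

    @dataclass
    class SubStage:
        ops: Set[str]   # pipelined operations, e.g., {"read", "compute"}
        nbytes: float   # data volume processed by this sub-stage

    @dataclass
    class Task:
        # Sub-stages run in order; the bulk synchronization between
        # consecutive sub-stages means their times add rather than overlap.
        substages: List[SubStage] = field(default_factory=list)

    # Hypothetical map task: a pipelined read+compute sub-stage, then an
    # external merge & sort sub-stage that re-reads and writes the output.
    map_task = Task([SubStage({"read", "compute"}, 1e9),
                     SubStage({"read", "write"}, 1e9)])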
2) Resource Usage Model: Given the above task execution model, we use the resource usage model in [13] to make a uniformity assumption about resource usage behavior. For each sub-stage, since the subset of read, transfer, compute, and write operations is executed in a pipeline from one tuple to the next, the usage of preemptable resources is uniform during a sub-stage. We assume that disk and network are preemptable. CPU is preemptable when there is no free CPU core (i.e., when the number of simultaneously running tasks exceeds the number of CPU cores). Memory is not preemptable because it is managed by the JVM. This resource usage model follows the execution model to distinguish pipelined and blocked processing at the tuple level, and provides the hint to estimate resource utilization (i.e., effective time) for a bottleneck resource.
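One simple way to encode this assumption is a per-task share function µ_X(∆). The sketch below assumes that disk and network throughput are divided evenly among the ∆ running tasks, while a task keeps a full core until tasks outnumber cores (6 per node in our testbed), treating θ_compute as per-core throughput:

    def share(resource, delta, cores=6):
        # mu_X(delta): one task's fraction of resource X's throughput theta_X
        # when X is fully utilized at degree of parallelism delta.
        if resource == "compute":
            # CPU becomes preemptable only once tasks outnumber cores.
            return min(1.0, cores / delta)
        # Preemptable disk/network bandwidth is divided evenly among tasks.
        return 1.0 / delta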
3) Bottleneck Oriented Estimation: Given the task execution model in Figure 3, we estimate the execution time t_σ for a sub-stage of a task as

    t_σ = Λ(t_read, t_transfer, t_compute, t_write),

where Λ(·) estimates the non-overlapped time among the t_X to process tuples in the pipeline, and t_X is the actual execution time of operation X. According to the resource usage model, the resources are uniform over the pipelined execution of tuples. Since the processing time of each tuple is very short, we omit the processing time of the first and last tuples. Then the pipelined operations overlap completely except for the slowest one. That is,

    t_σ = max(t_read, t_transfer, t_compute, t_write).

We assume that the resource throughput for X is θ_X, and that the resource usage for X is µ_X(∆) when X is fully utilized by tasks with degree of parallelism ∆. Thus we have

    t_σ · p_X · µ_X(∆) · θ_X = D,

where p_X · µ_X(∆) is the actual resource usage for X when it is not a bottleneck, and D is the size of the data to process. For the bottleneck resource X, we have p_X = 1 and t_X = D / (µ_X(∆) · θ_X). Otherwise, 0 ≤ p_X < 1 and t_X < D / (µ_X(∆) · θ_X). If there is at least one bottleneck resource, we have

    t_σ = max_X B / (µ_X(∆) · θ_X),

where B is the size of the input of the task.
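A minimal sketch of this estimate, reusing share() and the Task structure from the sketches above: each pipelined operation X of a sub-stage contributes a candidate time B / (µ_X(∆) · θ_X), the operation with the largest candidate time is the bottleneck, and sub-stage times add up across barriers. The throughputs θ_X are profiled inputs:

    def boe_substage_time(nbytes, ops, theta, delta, cores=6):
        # t_sigma = max_X B / (mu_X(delta) * theta_X) over pipelined ops X;
        # the arg max is the bottleneck resource of this sub-stage.
        times = {x: nbytes / (share(x, delta, cores) * theta[x]) for x in ops}
        bottleneck = max(times, key=times.get)
        return times[bottleneck], bottleneck

    def boe_task_time(task, theta, delta, cores=6):
        # Bulk synchronization blocks tuples between sub-stages, so the
        # per-sub-stage times accumulate instead of overlapping.
        return sum(boe_substage_time(s.nbytes, s.ops, theta, delta, cores)[0]
                   for s in task.substages)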
Example of the BOE Model: Figure 4 shows how the BOE model estimates task execution time. Suppose that there are 10 million records (100 bytes each) to be processed on a node. For a task with one sub-stage, there are three pipelined operations: reading, network transfer, and computing. The aggregated read throughput is 500 MB/s on the node. The network throughput on the node is 100 MB/s. For the task, the compute throughput using a CPU core is 50 MB/s. Hence, in Figure 4(a), a single task is CPU-bound, since computing is the slowest of the three pipelined operations. In Figure 4(b), the degree of parallelism is increased to 5 on the same node. We assume that there are more than 5 cores on the node. The disk read throughput for each task is µ_read(5) · θ_read = (1/5) · 500 MB/s = 100 MB/s when the disk read resource is fully utilized. The network throughput for each task is µ_transfer(5) · θ_transfer = (1/5) · 100 MB/s = 20 MB/s, while the compute throughput of each task remains 50 MB/s because each task still has its own core, so the bottleneck shifts to the network. The example shows how the BOE model estimates the execution time for each task by identifying the bottleneck resource; the resource allocation is estimated with respect to the degree of parallelism.
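Plugging the throughputs from Figure 4 into the sketch above reproduces the bottleneck shift; the 1 GB per-task data volume is our assumption for illustration:

    theta = {"read": 500e6, "transfer": 100e6, "compute": 50e6}  # bytes/s
    ops = {"read", "transfer", "compute"}
    B = 1e9  # assumed bytes processed per task

    t1, b1 = boe_substage_time(B, ops, theta, delta=1)
    t5, b5 = boe_substage_time(B, ops, theta, delta=5)
    print(b1, t1)  # compute 20.0 -- a single task is CPU-bound at 50 MB/s
    print(b5, t5)  # transfer 50.0 -- at delta=5 the network (20 MB/s) dominates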

IV. WORKFLOW LEVEL MODEL
In this section, we present the workflow level model to estimate the holistic execution plan of a DAG workflow.

A. The State-based Approach
The resource allocation for each job is steady within a stage, where stages are delimited by the map/reduce stage boundaries of the jobs. We use this property to break a DAG workflow into multiple stages and propose a state-based approach for workflow-level estimation.
1) State division for a DAG workflow: We define the state (i.e., stage) s (s = 1, 2, ..., S) of a DAG workflow based on the map or reduce stage transitions of its jobs. As shown in Figure 5, the workflow transitions from state 3 to state 4 when job j_3 moves from its map stage to its reduce stage. During the execution of a stage of a DAG workflow, the degree of parallelism ∆_i of each running job i does not change. Consequently, the allocation of shared bottleneck resources (i.e., disk, HDFS, and/or network) is fixed for each running job during a stage. This property provides the foundation for estimating the allocation of shared resources among the running jobs of a DAG workflow.

Algorithm 1 State-based Cost Estimation for a DAG Workflow
1: s ← 1; t_dag ← 0
2: Add new jobs to job queue Q from G
3: while job_end_flag = 0 do
4:   Estimate ∆_i for each job i ∈ Q
5:   for each job i ∈ Q do
6:     Estimate t_task(i, s) using the BOE model
7:     Estimate the remaining execution time of job i in stage s
8:   end for
9:   t_stage(s) ← the minimum remaining stage time among jobs in Q
10:  t_dag ← t_dag + t_stage(s)
11:  Update the progress of the other running jobs
12:  Remove finished jobs from Q; add new jobs to Q from G
13:  Set job_end_flag ← 1 when all jobs of G have finished
14:  s ← s + 1
15: end while
16: Return t_dag
2) Cost Estimation for a DAG workflow: Algorithm 1 presents the state-based approach to iteratively estimate the execution time of a DAG workflow. Given a DAG workflow, we estimate the state transition sequence 1 → 2 → ... → S by iteratively estimating the duration of each workflow stage. For each iteration, we estimate the stage duration as follows: (1) estimate the degree of parallelism ∆_i for each running job i; (2) identify the bottleneck resource and task execution time for each running job using the BOE model; (3) estimate the remaining execution time of the current stage for all running jobs; (4) find the job with the minimum stage duration; (5) update the progress of the other running jobs. Therefore, given a DAG workflow G, input data D, cluster resources C, and historical profile P, we estimate the execution time of the DAG workflow by summing the estimated stage durations: t_dag = Σ_{s=1}^{S} t_stage(s).
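The sketch below mirrors Algorithm 1 under simplified assumptions: the degree of parallelism is just the number of running jobs, and each job's remaining work is expressed in seconds at parallelism one with the shared bottleneck divided evenly; these stand in for the full BOE and scheduler models:

    def estimate_dag_time(parents, remaining):
        # parents: {job: list of parent jobs}; remaining: {job: seconds of
        # work left at parallelism 1}. Returns the estimated t_dag.
        finished, t_dag = set(), 0.0
        while len(finished) < len(parents):
            running = [j for j in parents
                       if j not in finished and set(parents[j]) <= finished]
            rate = 1.0 / len(running)          # even share of the bottleneck
            t_stage = min(remaining[j] / rate for j in running)
            t_dag += t_stage                   # stage ends at the first transition
            for j in running:                  # update progress of other jobs
                remaining[j] -= t_stage * rate
                if remaining[j] <= 1e-9:
                    finished.add(j)
        return t_dag

    # Hypothetical DAG: j1 feeds j2 and j3, which then run in parallel.
    print(estimate_dag_time({"j1": [], "j2": ["j1"], "j3": ["j1"]},
                            {"j1": 10.0, "j2": 20.0, "j3": 30.0}))  # 60.0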
As shown in the example in Figure 5, when job j_1 completes (i.e., the DAG workflow enters stage 3), we estimate the degrees of parallelism ∆_3, ∆_4, and ∆_5 for j_2, j_3, and j_4, respectively. Next, we estimate the task execution time for each job, estimate the stage duration t_stage(3), and update the state and progress of each running job. Finally, we enter state 4.

V. EVALUATION
In this section, we conduct a set of experiments to evaluate the proposed cost models using a variety of representative DAG workflows. First, we evaluate the effectiveness of the BOE model in comparison with existing models, including Starfish [16] and MRTuner [31], for both single and multiple jobs. Second, we evaluate the state-based approach for the execution plan estimation of DAG workflows. Finally, we evaluate the latency overhead of the estimation itself, which validates applications of the cost model such as DAG workflow auto-tuning.

A. Experimental Setup
The Hadoop clusters are deployed on identical hardware, with a total of eleven servers. Each node has 6 physical CPU cores at 2.4 GHz, 2 disk drives at 7.2k RPM with 500 GB each, and 32 GB of physical memory. Nodes are connected using a 1 Gbps Ethernet switch.
We define a set of representative workloads for the experimental evaluation. As shown in Table I, we use Word Count and TeraSort as micro-benchmarks. We use PageRank for graph analysis and Kmeans for machine learning, both from HiBench [19]. The query workload is selected from TPC-H. C indicates whether compression is enabled, and R denotes the number of replicas. A hybrid workload runs two jobs/queries in parallel. For WC and TS, we use 100 GB of input. We use the huge data set for Kmeans and PageRank in HiBench. For TPC-H, we generate 80 GB of input across 8 input tables.

B. BOE Model
We evaluate the effectiveness of the task-level BOE model. We use the best case of Starfish [16] and MRTuner [31] as the baseline, i.e., the ground-truth execution time when the degree of parallelism equals that of the profiling stage. We use the median task execution time as the ground truth in all evaluations.
1) Single Job: Figure 6 (a), (b), and (c) present the WC evaluation results for the map, shuffle, and reduce stages, respectively. The average accuracy of the execution time estimation is 95.2%, 82.3%, and 85.1% for the BOE model. When the degree of parallelism is 12, the BOE model outperforms the baseline by factors of 6.6x, 4.3x, and 4.1x for the map, shuffle, and reduce stages, respectively. For the map stage, there are enough idle CPU cores when the degree of parallelism is less than 6. When the degree of parallelism is higher than 6, the job becomes CPU-bound due to saturated computing resources.
2) Multiple Jobs: We evaluate the task-level BOE model for parallel jobs. The DAG workflow has two parallel jobs. Table II presents the accuracy of the task-level model for parallel jobs including WC, TS, and TS3R, running simultaneously.
When WC and TS run in parallel, the average accuracy is 99.7% and 88.7% for states 1 and 2, which contain the parallel jobs. For state 1, the BOE model identifies the CPU bottleneck. When the workflow enters state 2, the BOE model identifies the bottlenecks for the TS reduce stage: network-bound for the shuffle and disk-bound for the HDFS write (with 1 replica). For states 3 and 4, we skip the detailed evaluation since they are covered by the single-job models in Section V-B1.
When WC and TS3R run in parallel, the average accuracy is 99.9% and 96.3% for states 1 and 2, which contain the parallel jobs. For state 1, the behavior is the same as for the previous DAG (WC+TS). For state 2, the reduce stage of TS is network-bound due to the HDFS write (with 3 replicas). For the shuffle stage, the execution time of TS is reduced by a factor of 2 compared with the single-job case, because the number of parallel tasks using the bottleneck resource (i.e., the network) is halved in state 2.
Consequently, for parallel jobs, the BOE model can identify the bottleneck resource and its allocation for each job, and hence estimate the execution time of each task.

C. State-based Approach
We evaluate the effectiveness of the state-based estimation framework for DAG workflow cost estimation. To eliminate the error of the task-level models, we use task execution time profiles with the identical degree of parallelism for each stage. For the TPC-H workload, we also count the compilation time of each query in the estimation. We run both micro-benchmarks (WC or TS) and query/analytics DAGs (TPC-H or HiBench) in parallel to cover real workloads. Besides the end-to-end execution time of DAG workflows, we report the average accuracy of the estimated execution time for each stage (denoted as Stage Break-downs). This metric breaks down the estimation and evaluates the accuracy of the state-based approach for each stage.
Overall DAG Results: First, we present the accuracy for DAG execution time. The average accuracy over 51 workflows is 93.50%, 95.00%, and 96.38% for the median, mean, and normal distribution, respectively.
The last 7 columns of the third row group in Table III present the results for the state-based approach using analytics DAG workflows. Overall, the minimum accuracy of end-to-end execution time estimation is above 81.13% for all workflows using Algorithm 1.
The first 2 row groups and the first 10 columns of the third row group in Table III present the overall DAG estimation accuracy for the hybrid HiBench and TPC-H workloads. Overall, the average estimation accuracy of the 22 WC+TPC-H workflows is 94.62%, 96.58%, and 97.42% for the median, mean, and normal distribution, respectively. The average prediction accuracy of the 22 TS+TPC-H workflows is 92.94%, 93.67%, and 96.21% for the median, mean, and normal distribution, respectively. The results indicate that our state-based approach can handle various workloads, from short to long. Some queries have many jobs; for example, Q21 has 9 MapReduce jobs, which leads to 18 stages when it is run in parallel with the WC job.
Execution time: Finally, we evaluate the running time of the state-based approach for each DAG workflow used in the above evaluation. The results indicate that the overhead of computing the cost models is less than 1 second for all DAG workflows. This means the cost model is suitable for runtime optimizations such as query re-writing and self-tuning for DAG workflows.

VI. RELATED WORK
MapReduce Cost Models: Cost models for MapReduce have been studied because the bottlenecks of data-parallel computing frameworks differ from those of traditional database systems. Cost models for single MapReduce jobs are used to tune MapReduce configurations [11], [16], [21], [22], [31], [37], [38]. These works propose general cost-based estimation frameworks for MapReduce. The authors of [30] use queuing theory to predict key performance indicators (e.g., task waiting time and blocking probability) of MapReduce jobs. However, these cost models target single MapReduce jobs and do not consider the variation of resource allocation with respect to the degree of parallelism. Thus they are limited in resource estimation for parallel MapReduce jobs, which is the main focus of this paper. An analytical cost model is used as the fundamental building block to optimize resource configuration for SystemML programs [18]. In contrast to our work, this cost model is specific to SystemML resource configuration and does not consider the general problem of resource contention among MapReduce tasks. Ernest [36] is a performance prediction model that collects as few training points as possible by using a statistical technique (i.e., optimal experiment design). Like Starfish and MRTuner, Ernest focuses on single jobs rather than DAGs with parallel jobs. A machine learning based prediction model is proposed in [32] to estimate job execution time for Spark. However, the identified features do not consider the impact of parallelism on the system bottleneck, so the model does not fit the multiple-job scenario.
Query Optimizers: Prior to MapReduce, cost models were widely used in the query optimizers of relational database systems. Cost estimation for relational queries has been widely studied for parallel databases [6]. There are interesting works on resource usage models for parallel queries such as joins [13], which take the impact of resource contention into account in the cost estimation. However, the analytical model for MapReduce is different due to the different task execution framework. Resource Bricolage is proposed for parallel query optimization in a heterogeneous cluster [24]. This approach quantifies the performance differences among machines with various resources by profiling workloads. Our problem differs from theirs in that we aim to model resource usage for parallel MapReduce tasks rather than parallel queries. A MapReduce cost model is proposed in [40] to estimate I/O and CPU costs. Since the model is specifically designed for query optimizers, it does not estimate accurate running times. The cost model in [39] is designed for multi-query optimization. However, it only models disk and network I/O costs, since these are the bottleneck in its problem setting.
DAG Workflow: A DAG workflow is a natural representation of high-level queries in data parallel frameworks. Stubby is a transformation-based optimizer for MapReduce workflows [26]. It uses the What-if Engine building block of Starfish for cost estimation [16]. However, the resource statistics are assumed to be the same between the profiling and estimation stages, so it does not address the preemptable resource issue for parallel jobs. ParaTimer is a progress indicator for MapReduce DAGs [28] that estimates the critical path for the parallel jobs of a DAG workflow. However, ParaTimer does not consider resource contention among parallel tasks. The authors of [33] experimentally demonstrate the impact of the degree of parallelism on the execution time of DAGs. However, this work focuses on the DAG level and does not address task-level cost models. The work in [25] estimates the execution time of DAGs at the tuple level for distributed streams. However, it uses regression algorithms for the prediction, and the accuracy relies on the quality of the sample space.
Distributed and Parallel Computing: There are previous works that estimate execution time for distributed and parallel computing frameworks. Bandwidth-latency models such as LogP [9] and BSP [7] were proposed to estimate latency and throughput for parallel computing systems. These models are not suitable for the MapReduce framework because MapReduce does not rely on a messaging-based asynchronous communication system. The work in [29] measures job completion time for a best-case scenario without blocking on network or disk use, using finer-grained instrumentation of Spark compute threads. It finds that the upper bound on the improvement from optimizing disk and network performance is limited. This work cross-validates the idea that tasks are executed in a pipelined fashion using multiple resources, but it focuses on execution analysis rather than cost models that estimate task execution time based on data, system, and job profiles. Jockey [12] uses a simulation-based approach to predict job completion time for SCOPE [5]. While its prediction framework is similar to that in this paper, it does not take skewness into account. Task completion time estimation is proposed in [4] for scheduling. However, the prediction is coarse-grained and does not estimate accurate running times.
VII. CONCLUSION
In this paper, we proposed the BOE model to predict the allocation of preemptable system resources for task-level execution time estimation. Based on the insights into resource allocation among stages, the state-based iterative approach was proposed for workflow-level execution plan estimation.
Our experimental evaluation showed that the BOE model can automatically identify the bottleneck resource for each stage. We performed comprehensive experiments to show that our new cost model outperforms existing models by a factor of five for task execution time estimation. For the skew-aware state-based approach to estimate the execution time of a DAG workflow, the average prediction error is under 3%.
As follow-up research, we will study the impact of skewness on cost estimation and apply our cost models to automatic tuning for DAG workflows.