We propose two individualized, precise scheduling strategies to support efficient consolidation. When a new program needs to be scheduled, the scheduler finds an appropriate node for it according to future knowledge of the program. In both strategies, this future knowledge comes from the label assigned by the novel index system. The first strategy takes the resource sensitivity hyper-plane as its future knowledge; we call it Sche-index2 for short. The second strategy takes the sensitivity curves of the five resource dimensions as its future knowledge; we call it Sche-utility for short. In either case, the program is mapped to the most appropriate node for execution. Both methods can improve program performance, resource utilization, and throughput.
4.1. Scheduling Strategy Based on Resource Sensitivity Hyper-plane
Sche-index2 must guarantee not only the performance of the newly mapped program but also the performance of the programs already running on the servers. To meet this goal, Sche-index2 consists of the following three components:
1) Profiler. The profiler collects the idle resources of each server along five dimensions (CPU, disk read bandwidth, disk write bandwidth, memory, and network bandwidth) and reports them to the predictor periodically. Common profiling tools such as collectl make it easy to collect the usage of each resource; the idle amount is then calculated as the overall resource minus the resource already in use.
2) Predictor. The primary objective of the predictor is to predict the performance the new program would achieve if it were mapped onto the server. It must also predict the performance of the programs already running on that server. For the newly arrived program, the performance is calculated by Eq. 1, where Serveridle[s][C], Serveridle[s][Dr], Serveridle[s][Dw], Serveridle[s][M], and Serveridle[s][N] are the idle CPU, disk read bandwidth, disk write bandwidth, memory, and network bandwidth of server s, as collected by the profiler. resource_sensitivity is a function that calculates the program performance under these idle resources, and Prog_perf is the predicted performance. Under the novel index system, the sensitivity in the program's label is a sensitivity hyper-plane over the five resource dimensions. The resource_sensitivity function searches for the point nearest to the idle resources and takes the sensitivity value at that point as Prog_perf. For a program already running on the server, the performance is also calculated by Eq. 1, but the idle resource is the overall resource of the server minus the resource usage of the other programs (including the newly arrived program).
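$$\mathrm{Prog\_perf} = \mathrm{resource\_sensitivity}\left(\mathrm{Server}_{idle}[s][C],\ \mathrm{Server}_{idle}[s][Dr],\ \mathrm{Server}_{idle}[s][Dw],\ \mathrm{Server}_{idle}[s][M],\ \mathrm{Server}_{idle}[s][N]\right) \qquad (1)$$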
3) Scheduler. The scheduler maps each program to an appropriate server. If the predicted performance of the new program on server s is lower than a threshold, server s is dropped and the next server is checked. If the predicted performance of every program on server s is higher than the threshold, server s becomes the target server, and the scheduler maps the newly arrived program to it.
The pseudo-code of this scheduling strategy is shown in Algorithm 1. For a program to be scheduled, the server selection process iterates over the servers. For each checked server s, the available resource is collected, and the sensitivity of the program on that server is obtained by accessing the 5-dimensional array in the program label, using the available resource levels of server s as the indices. If this sensitivity value is higher than the threshold, the sensitivity of the other programs running on server s is checked as well. If the sensitivity values of all programs running on server s are higher than the threshold, server s is taken as the target server and the program is scheduled to it for execution; otherwise, the next server is checked.
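The idle-resource bookkeeping used by the profiler and the predictor can be sketched as follows. This is a minimal illustration, not the authors' implementation: the dictionary layout, the function names `idle_resources` and `idle_seen_by`, and the flooring at zero are assumptions made for the example.

```python
from typing import Dict

# The five resource dimensions: CPU, disk read, disk write, memory, network.
DIMENSIONS = ("C", "Dr", "Dw", "M", "N")
Vector = Dict[str, float]


def idle_resources(total: Vector, usages: Dict[str, Vector]) -> Vector:
    """Idle amount per dimension: overall resource minus the resource in use.

    `usages` maps a program name to its usage vector (hypothetical layout).
    The result is floored at zero, which is an assumption of this sketch.
    """
    return {
        d: max(total[d] - sum(u[d] for u in usages.values()), 0.0)
        for d in DIMENSIONS
    }


def idle_seen_by(total: Vector, usages: Dict[str, Vector], program: str) -> Vector:
    """Idle resources from the point of view of one program:
    overall resource minus the usage of all *other* programs."""
    others = {name: u for name, u in usages.items() if name != program}
    return idle_resources(total, others)
```

To evaluate a program that is already running, the usage of the newly arrived program would first be added to `usages`, so that `idle_seen_by` subtracts it along with the other co-runners.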
Algorithm 1. Mapping programs based on resource sensitivity hyper-plane
//The resource label is a two-tuple (resource usage, resource sensitivity)
//The program's resource sensitivity is a 5-dimensional array (defined as in the previous paper); each element is the sensitivity value at a specific combination of resources
//Threshold_sens is the threshold of sensitivity
//Available_resource is a 5-element vector keeping the available resource of the current server
for s=1 to max_server do
    Available_resource=the available resources on server[s];
    Prog_sens=resource_sensitivity[Serveridle[s][C], Serveridle[s][Dr], Serveridle[s][Dw], Serveridle[s][M], Serveridle[s][N]];
    if Prog_sens>Threshold_sens then
        find all the programs running on server s and calculate the resource sensitivity of these programs;
        if all the sensitivity values>Threshold_sens then
            take server s as the target server;
            update the idle resource of server s;
            break; //stop at the first server that passes both checks
        end if
    end if
end for
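The server selection loop of Algorithm 1 might be sketched in Python as below. This is an illustrative sketch under stated assumptions: the `Program` and `Server` classes, the helper `idle_without`, and the representation of a label's sensitivity as a callable are all hypothetical, and the loop is assumed to stop at the first server that passes both checks.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Vec = Tuple[float, ...]  # five values ordered as (C, Dr, Dw, M, N)


@dataclass
class Program:
    name: str
    usage: Vec                           # resource usage part of the label
    sensitivity: Callable[[Vec], float]  # sensitivity part of the label


@dataclass
class Server:
    capacity: Vec
    programs: List[Program] = field(default_factory=list)


def idle_without(server: Server, consumers: List[Program]) -> Vec:
    """Capacity minus the usage of the given consumers, floored at zero."""
    used = [sum(p.usage[d] for p in consumers) for d in range(len(server.capacity))]
    return tuple(max(c - u, 0.0) for c, u in zip(server.capacity, used))


def pick_server(new: Program, servers: List[Server], threshold: float) -> Optional[int]:
    """Return the index of the first server that passes both checks, else None."""
    for s, server in enumerate(servers):
        # Check 1: the new program, given what is currently idle on this server.
        if new.sensitivity(idle_without(server, server.programs)) <= threshold:
            continue  # drop this server and check the next one
        # Check 2: each resident program, given what all other programs
        # (including the new one) would leave available.
        everyone = server.programs + [new]
        residents_ok = all(
            p.sensitivity(idle_without(server, [q for q in everyone if q is not p])) > threshold
            for p in server.programs
        )
        if residents_ok:
            server.programs.append(new)  # placement updates the idle resources
            return s
    return None
```

Returning `None` when no server qualifies is a choice made for this sketch; the source does not say what happens in that case.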
In practice, each workload has its own instance of Eq. 1. We tried to synthesize resource_sensitivity into a closed-form formula, but eventually gave up because the inaccuracy was relatively large. Instead, we obtain Prog_perf as follows. We maintain a database of each program's resource sensitivity hyper-plane, populated with sensitivity data recorded from multiple experiments. Given the Available_resource values for C, Dr, Dw, M, and N, we find the point in the database nearest to Available_resource and use the sensitivity value at that point as Prog_perf.
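The nearest-point lookup described above can be illustrated with a short sketch. The source does not specify the distance metric or the database layout, so the following assumes a plain dictionary of recorded points and a Euclidean distance in which each dimension is normalized by a `scale` vector (for example, the server capacity); the name `nearest_sensitivity` is hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float, float, float, float]  # (C, Dr, Dw, M, N)


def nearest_sensitivity(
    records: Dict[Point, float],
    available: Sequence[float],
    scale: Sequence[float],
) -> float:
    """Return the sensitivity recorded at the point closest to `available`.

    `records` maps a measured resource combination to its sensitivity value.
    Each dimension is divided by `scale` so that resources with large units
    (e.g. network bandwidth) do not dominate the distance; this normalization
    is an assumption of the sketch, not something stated in the text.
    """
    def distance(point: Point) -> float:
        return math.sqrt(
            sum(((p - a) / s) ** 2 for p, a, s in zip(point, available, scale))
        )

    return records[min(records, key=distance)]
```

A linear scan is enough for a small experiment database; a spatial index would only matter if the number of recorded points grew large.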
4.2. Scheduling Strategy Based on Multi-dimension Utility Combination
We choose the multi-dimension utility combination to realize our second scheduling strategy because its characteristics closely match those of our problem. Each resource dimension individually affects the program's performance, and the resource dimensions taken together determine its overall performance. This fits the multi-dimension utility combination well. However, there is an inaccuracy between the combination of the sub-utilities and the true utility. Sche-utility consists of three components:
1) Profiler. The profiler of Sche-utility is the same as the profiler of Sche-index2.
2) Predictor. Sche-utility uses the multi-dimension utility combination to predict the performance the program would achieve if it were mapped onto the server. For the newly arrived program, the performance is calculated by Eq. 2, where UC, UDr, UDw, UM, and UN are the sub-utility functions of the five resources and u is the utility value of the available resources, which is also the performance value predicted by the multi-dimension utility combination.
$$u = U_{C} \times U_{Dr} \times U_{Dw} \times U_{M} \times U_{N} \qquad (2)$$
The formula is valid in theory because it directly applies the multidimensional utility merge model. The performance of the program along one resource dimension can be taken as a sub-utility, which gives five sub-utilities. To describe the program's performance as a whole, we combine the sub-utilities into a single utility, which is the performance value of the program under the five-resource restriction.
3) Scheduler. The scheduler of Sche-utility is the same as the scheduler of Sche-index2.
The pseudo-code of this scheduling strategy is shown in Algorithm 2. For each checked server s, the available resource is collected, and the sub-utility values of CPU, disk read bandwidth, disk write bandwidth, memory, and network bandwidth are calculated by the corresponding sub-utility functions. The utility value is the product of the five sub-utility values. If the utility value is higher than the utility threshold, server s is taken as the target server.
Algorithm 2. Mapping programs based on multi-dimension utility
//Threshold_utility is the threshold of utility value
//Available_resource is a 5-element vector keeping the available resource of the current server
for s=1 to max_server do
    Available_resource=the available resources on server[s];
    utility[s]=1;
    for i=1 to 5 do
        sub_utility[i]=sub_utility_fun_i[Available_resource];
        utility[s]=utility[s]*sub_utility[i];
    end for
    if utility[s]>Threshold_utility then
        take server s as the target server;
        update the idle resource of server s;
        break; //stop at the first server whose utility exceeds the threshold
    end if
end for
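The utility-based selection of Algorithm 2 might look like the following Python sketch. The source does not give the shape of the sub-utility functions, so `saturating` is a purely hypothetical example (utility grows linearly with the available amount and caps at 1); it is also assumed that the i-th sub-utility function is evaluated on the i-th component of the available-resource vector, and that the loop stops at the first qualifying server.

```python
import math
from typing import Callable, Optional, Sequence

SubUtility = Callable[[float], float]


def saturating(need: float) -> SubUtility:
    """Hypothetical sub-utility: linear in the available amount, capped at 1."""
    return lambda available: min(available / need, 1.0)


def combined_utility(sub_utilities: Sequence[SubUtility], available: Sequence[float]) -> float:
    """Utility as the product of the five sub-utility values."""
    return math.prod(f(x) for f, x in zip(sub_utilities, available))


def pick_server_by_utility(
    sub_utilities: Sequence[SubUtility],
    available_per_server: Sequence[Sequence[float]],
    threshold: float,
) -> Optional[int]:
    """Return the index of the first server whose utility exceeds the threshold."""
    for s, available in enumerate(available_per_server):
        if combined_utility(sub_utilities, available) > threshold:
            return s
    return None
```

Because the utility is a product, a single scarce resource pulls the whole value down, which reflects the idea that every dimension affects the program's performance.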
The advantage of these two scheduling strategies is that they are more intelligent than common least-load scheduling strategies. In Section 3, we examined the resource sensitivity of programs and made four observations. The two strategies proposed in this section make full use of the program's resource sensitivity; that is, they apply the conclusions drawn from those observations when allocating resources. In theory, therefore, both strategies should achieve better performance. Sche-index2 takes the program's resource sensitivity hyper-plane into consideration, whereas Sche-utility takes the impact of each individual resource into consideration. Because there is an inaccuracy between the combination of the sub-utilities and the true utility, Sche-utility is not expected, in theory, to match the performance of Sche-index2.