To conduct the performance comparison, the search, insert and delete algorithms on Red-Black trees and the new delete algorithm were implemented in Delphi. We focused our simulation mainly on the delete process. The two delete algorithms were exposed to the same scenario. We based our simulation essentially on the following deletion pattern: both random insertions and deletions are performed on an existing tree. This model scenario seems a typical case for the comparative analysis of data structures. In our simulation work we start with Red-Black trees built with N random integer values. Tree nodes hold integers in the range [0, 999 999 999].
We applied M insert/delete operations randomly generated on an existing Red-Black tree with N items. N and M are large numbers. For our purpose, we considered N close to M. There are exactly the same number of insert and delete operations. An insertion adds a new random item into the tree. A deletion removes an existing item from the tree.
On the positive side, the proposed delete algorithm reduces substantially color changes. It also significantly reduces maintenance operations. Furthermore, the number of rotations slightly decreases. So, we observed color changes, maintenance operations and rotations.
On the negative side, the proposed change to the delete algorithm adds three tests in order to reduce color changes. So, we separately showed that the added tests improve running time of the standard delete algorithm.
We also studied the impact of the new tree coloring on the insert algorithm. To prove that the new algorithm does not hurt search performance, a study is also made on the tree heights produced by the two delete algorithms.
Finally, to confirm that the new delete algorithm outperforms the standard one, the running time is computed when both insert and delete operations are performed on an existing tree.
In our simulations, we considered the following experiment (EXP 1):
1. Build a Red-Black tree with N random integer values.
2. Build a random sequence of M/2 insert and M/2 delete operations.
3. (a) Run the same sequence of operations separately (with RB and RB + ) on the same tree to insert a new item in the tree or to remove an existing item randomly chosen from the tree.
(b) - Compute the number of color changes, the number of operation maintenances and the number of rotations made by each delete algorithm.
- Compute α 1 , α2 and β as defined in the previous section.
- Compute the number of color changes, the number of operation maintenances and the number of rotations made by the insert algorithm when each delete algorithm is used.
- Compute the overall time when each delete algorithm is used.
- Compute the average number of comparisons to find an existing item in the tree.
4. Repeat 1 – 3 three times for N= 100 000 to 1000 000 by step of 100 000 nodes.
In the random sequence, the elements to insert or to delete are known in step 2.
In all the tables below:
-
Column "Tree size" denotes the exact sizes of Red-Black trees initially generated.
-
Column "Oper" denotes the exact number of insert operations as well as the one of delete operations.
-
Values denote the average values of three tests.
-
The last lines of the tables show the average values of the corresponding columns.
In the entire section, the expression "Delete/insert X items" means delete X items and insert X items in any order. We organize this section as follows:
We first give the main contributions of the new delete algorithm. Then, we perform a deep analysis of the new codes for cases 2 and 3. Finally, we make a synthesis both on our simulation works shown with tables and on our work not shown due to space reasons. In particular, coloring change consequences on the other operations and characteristics will be considered as the impact of the new delete algorithm on the insert and search algorithm and on the overall effects.
4.1 Main contributions
The main contributions of the new delete algorithm reside at several levels. As we outlined it before, the new delete algorithm substantially reduces color changes. It also reduces maintenance operations and increases the delete operations that do not require any maintenance operation. Moreover, the Red-Black tree resulting of the new coloring decreases slightly the number of rotations. All these contributions are the main cause of the improvement of the new delete algorithm compared to the standard delete algorithm.
Color changes
Table 2 shows in columns "RB+" and "RB" the numbers of color changes performed by RB+ and RB respectively.
Column "(RB - RB+) /RB" gives the gain in percentage. The gain is very important since the average percentage of gain is 28.93% (Last column in Table 2). To delete/insert 250 082 items in a tree with 499 909 nodes (the underlined row in Table 2), RB performs 339 475 color changes while RB+ performs 241 334. In other words, the new algorithm saves 98 141 color changes. This confirms well our analysis done in section 3.
Table2: Color changes – RB versus RB +
Tree Size
|
Oper
|
RB
|
RB+
|
(RB - RB+) /RB
|
99996
|
50160
|
68450
|
48643
|
28.94%
|
199986
|
100102
|
136119
|
96689
|
28.97%
|
299966
|
149845
|
202707
|
144164
|
28.88%
|
399943
|
200059
|
271256
|
192620
|
28.99%
|
499909
|
250082
|
339475
|
241334
|
28.91%
|
599858
|
300204
|
407730
|
289863
|
28.91%
|
699807
|
349884
|
474647
|
337414
|
28.91%
|
799737
|
399746
|
541993
|
385255
|
28.92%
|
899686
|
449964
|
610793
|
433936
|
28.96%
|
999613
|
499404
|
677450
|
481523
|
28.92%
|
|
|
|
|
28.93%
|
Maintenance operations
Table 3 shows in columns "RB+" and "RB" the numbers of maintenance operations completed by applying RB+ and RB respectively. Column "(RB - RB+) /RB" gives the gain in percentage. The number of maintenance operations is reduced by a factor of about 10.98% on average. To delete/insert 250 082 items in a tree with 499 909 nodes (the underlined row in Table 3), RB performs 129 923 maintenance operations while RB+ performs 115 688. In other words, the new delete algorithm saves 14 235 maintenance operations.
Table3: Maintenance operations - RB versus RB +
Tree Size
|
Oper
|
RB
|
RB+
|
(RB-RB+)/RB
|
99996
|
50160
|
26248
|
23342
|
11.07%
|
199986
|
100102
|
52111
|
46376
|
11.01%
|
299966
|
149845
|
77483
|
68987
|
10.96%
|
399943
|
200059
|
103733
|
92326
|
11.00%
|
499909
|
250082
|
129923
|
115688
|
10.96%
|
599858
|
300204
|
156124
|
139015
|
10.96%
|
699807
|
349884
|
181636
|
161655
|
11.00%
|
799737
|
399746
|
207263
|
184543
|
10.96%
|
899686
|
449964
|
233671
|
208017
|
10.98%
|
999613
|
499404
|
259188
|
230813
|
10.95%
|
|
|
|
|
10.98%
|
Table 4 shows in columns "RB+" and "RB" the numbers of delete operations performed without using any maintenance operation by RB and RB+ respectively. Column "(RB - RB+) /RB" gives the gain in percentage. The number of delete operations performed without maintenance operations is reduced by a factor of about 6.07 % on average. To delete/insert 250 082 items in a tree with 499 909 nodes (the underlined row in Table 4), RB performs 158 929 delete operations without using maintenance operations while RB+ performs 169 137. In other words, with the new delete algorithm, 10 208 additional delete operations will not require any maintenance operation.
Table4: Delete operations without maintenance operations - RB versus RB+
Tree size
|
Oper
|
RB
|
RB+
|
(RB-RB+)/RB
|
99996
|
50160
|
31784
|
33874
|
6.17%
|
199986
|
100102
|
63537
|
67659
|
6.09%
|
299966
|
149845
|
95337
|
101461
|
6.04%
|
399943
|
200059
|
127291
|
135488
|
6.05%
|
499909
|
250082
|
158929
|
169137
|
6.04%
|
599858
|
300204
|
190686
|
203011
|
6.07%
|
699807
|
349884
|
222315
|
236707
|
6.08%
|
799737
|
399746
|
254180
|
270536
|
6.05%
|
899686
|
449964
|
285837
|
304316
|
6.07%
|
999613
|
499404
|
317284
|
337685
|
6.04%
|
|
|
|
|
6.07%
|
Rotations
Having a tree differently colored will certainly affect the number of rotations to be made. Table 5 shows in columns "RB+" and "RB" the numbers of rotations performed by RB+ and RB respectively.
Table5: Rotations – RB versus RB +
Tree size
|
Oper
|
RB
|
RB+
|
(RB-RB+)/RB
|
99996
|
50160
|
19879
|
19688
|
0.96%
|
199986
|
100102
|
39563
|
39180
|
0.97%
|
299966
|
149845
|
58854
|
58260
|
1.01%
|
399943
|
200059
|
78719
|
78000
|
0.91%
|
499909
|
250082
|
98709
|
97742
|
0.98%
|
599858
|
300204
|
118612
|
117453
|
0.98%
|
699807
|
349884
|
137686
|
136309
|
1.00%
|
799737
|
399746
|
157220
|
155706
|
0.96%
|
899686
|
449964
|
177549
|
175869
|
0.95%
|
999613
|
499404
|
196938
|
195011
|
0.98%
|
|
|
|
|
0.97%
|
Column "(RB – RB+)/RB" gives this gain in percentage. Although the gain is minor, since it is about 0.97% on average, the proposed algorithm reduces the number of rotations.
4.2 Deep analysis on the new codes
To properly compare in more details the two delete algorithms, we conducted statistical work on the various cases of the deletion algorithm that are applied when items are removed from an existing tree. We first show that both cases 2 and 3 cover an important percentage of total cases. Then, we find the satisfiability rates of the added tests: α1, α2 and β as defined in the previous section. Finally, we present a simulation program showing that the new proposed codes (Figure 5, right side) run faster than the codes of the standard algorithm (Figure 5, left side).
Distribution of maintenance operations
We studied for RB and RB+ how the maintenance operations are distributed over the four cases to determine the corresponding percentages in relation to the total number of cases.
For the standard delete algorithm, when items are randomly removed from a Red-black tree, the distribution of maintenance operations is, on average, as follows: cases 1, 2, 3 and 4 occur with 40.71%, 28.00%, 16.61% and 14.69% of total cases respectively. Note that both cases 2 and 3 cover 44.61%. For the new delete algorithm, the distribution is, on average, as follows: 33.94%, 32.16%, 18.39% and 15.51%. Thus, cases 2 and 3 together cover 50.55%.
Recall that in case 2, the standard delete algorithm performs one rotation always followed by three color changes while the new delete algorithm performs one rotation possibly followed by one or three color changes. As case 2 is more frequent with the new algorithm (32.16% on average versus 28.00% in the standard algorithm), so there are far fewer color changes.
Likewise, for case 3, the standard delete algorithm performs two rotations always followed by two color changes while the new delete algorithm also performs two rotations possibly followed by one or two color changes. As case 3 is more frequent with the new algorithm (18.39% on average versus 16.61% in the standard algorithm), so there are less color changes.
Finding α1, α2 and β
To determine α1 and α2, it suffices to find the distribution of sub cases inside case 2 of RB+.
We determined by simulation:
- Nb_case2: the total number of cases 2 performed when items are inserted into or deleted from a Red-Black tree previously built.
- Nb_Case2_a: the number of times the first test of case 2 is true (see Figure 5). It is also the number of times where only one color change is performed.
-NB_Case2_b: the number of times the first test of case 2 is false but the second test of case 2 is true, i.e. the number of times where three color changes are performed.
"α1" and "α2" are then the ratios (Nb_Case 2_a / Nb_Case2) and (Nb_Case2_b /Nb_ Case2) respectively.
Our simulation results give α1 = 29.79% and α2 = 29.31%. This means that, on average, for 29.79% of case 2, only one color change is performed per operation and for 29.31% of case 2 three color changes are performed per operation. This implies that for about 40.9% (100% – (29.79% + 29.31%)) of case 2, no color change is made.
As an example from our simulation tests, to delete/insert 250 082 items in a tree with 499 909 nodes, Case 2 occurs 37 156 times. For 11 084 cases, only one color change is performed. For 10 853 cases, three color changes are performed. Thus, in total 43 643 (11 084 + 10 853 X 3) color changes are performed. For the rest, i.e. 15 219 (37 156 – (11 084 + 10 853)), no color change is made. Another way to estimate the number of color changes is to apply formula C2 (α1 + 3 α2) (see Table 1) where C2 is the number of occurrences of case 2. Indeed, 37 156 X (29.79% + 3 X 29.31%) gives 43 740 color changes, which is close to 43 643.
Similarly, to determine β it suffices to find the distribution of sub cases inside the case 3 of RB+.
We also determined by simulation:
- Nb_Case3: the total number of cases 3 achieved when items are inserted into or deleted from a Red-black tree previously built.
- Nb_Case3_a: the number of times the unique test in case 3 is true (see Figure 5). It is also the number of times where two color changes are performed.
"β" is then the ratio (Nb_Case3_a / Nb_Case3).
Our simulation results give β = 73.84%. This means that for 73.84% of case 3 on average two color changes are performed per operation. This implies that for about 26.16% (100%-73.84%) of case 3, only one color change is made.
Likewise, as an example from our simulation tests to find, to delete/insert 250 082 items in a tree with 499 909 nodes, case 3 occurs 21 341 times. For 15 770 cases, two color changes are performed. For the rest, i.e. 5 571 (21 341 – 15 770), only one color change is made. The total number of color changes is then 37 111 (2 X 15 770 + 5 571). This can also be estimated by formula C3 (1 + β) (see Table 1). Indeed, 21 341 X (1+ 0.7384) gives 37 099 which is near to 37 111. C3 being the number of occurrences of case 3.
RB codes versus RB + codes
Once values α1, α2 and β are known, we can now show that RB+ codes of cases 2 and 3 (see Figure 5) are faster than the corresponding RB codes. Indeed, this allows us to evaluate the compromise "Adding test -Reducing color changes" and the use of simple and read-dependent writes in the codes.
For case 2, we just need to apply the following experiment (EXP2) in which "RB code for case 2", "RB+ code for case 2", "RB code for case 3", "RB+ code for case 3" are described in Figure 5.
1. Create a binary search tree rooted at P with a left child X (supposed a doubly black node) and a right child S. Add also to node S a left child Ls and a right child Rs.
2. Run the code {Assign any color to P and Ls; RB code for case 2} C2 times.
3. Run the RB + code for case 2 M times but in three steps as follows:
(a) Run the code {Assign color Black to P and color Red to Ls; RB + code for case 2} α1 C2 times.
(b) Run the code {Assign color Red to both P and Ls; RB + code for case 2} α2 C2 times.
(c) Run the code {Assign color Red to P and color Black to Ls; RB + code for case 2} (1- α1- α2) C2 times.
4. Compute the time consumed by the RB and RB + codes for case 2.
5. Repeat 2 – 4 three times for C 2 = 10 000 000 to 100 000 000 by step of 10 000 000 nodes.
Likewise, for case 3, we use the same algorithm except that C3 replaces C2 and case 3 replaces case 2 and in 3 the three steps are replaced by the two following steps:
(a) Run the code {Assign color Red to P and any color to Ls; RB + code for case 3 } β C3 times
(b) Run the code {Assign color Black to P and any color to Ls; RB + code for case 3 } (1-β) C3 times
Note that color assignments before codes allow tests to be true or false.
RB code for case 2 versus RB+ code for case 2
Table 6 gives the running times in milliseconds consumed by the RB code for case 2 (Column "RB") and by the RB+ code for case 2 (Column "RB+"). Column "(RB-RB+)/RB" shows the gain in percentage. As the table shows it, this gain is very important since it is close to 30.81% on average. This confirms well our expectation, i.e. adding tests to reduce color changes is preferable to perform directly color changes.
Table6: Case 2 – RB code versus RB+ code - Running time
C2 X 102
|
RB
|
RB+
|
(RB-RB+)/RB
|
100000
|
155
|
109
|
29.74%
|
200000
|
296
|
203
|
31.34%
|
300000
|
447
|
312
|
30.22%
|
400000
|
609
|
411
|
32.49%
|
500000
|
739
|
515
|
30.32%
|
600000
|
890
|
615
|
30.91%
|
700000
|
1036
|
729
|
29.67%
|
800000
|
1174
|
812
|
30.83%
|
900000
|
1343
|
927
|
30.98%
|
1000000
|
1484
|
1016
|
31.55%
|
|
|
|
30.81%
|
RB code for case 3 versus RB+ code for case 3
Table 7 gives the running times in milliseconds consumed by the RB code for case 3 (Column "RB") and by RB+ code for case 3 (Column "RB+"). Column "(RB-RB+)/RB" shows the gain in percentage. A 5.04% gain is completed by the new code of case 3.
Table7: Case 3 – RB code versus RB+ code - Running time
C3 X 102
|
RB
|
RB+
|
(RB-RB+)/RB
|
100000
|
125
|
118
|
5.60%
|
200000
|
249
|
235
|
5.62%
|
300000
|
359
|
344
|
4.18%
|
400000
|
499
|
468
|
6.21%
|
500000
|
609
|
578
|
5.09%
|
600000
|
734
|
703
|
4.22%
|
700000
|
860
|
811
|
5.70%
|
800000
|
968
|
937
|
3.20%
|
900000
|
1108
|
1047
|
5.51%
|
1000000
|
1218
|
1156
|
5.09%
|
|
|
|
5.04%
|
4.3 Recapitulation
To sum up, Table 8 recapitulates test results with the average values. When values are prefixed by sign "+", it is in favor of the new delete algorithm. When values are prefixed by sign" –", it is in favor of the standard delete algorithm.
Table8: Result recapitulation
Main contributions
|
Color changes
|
+ 29%
|
Rotations
|
+ 0.97%
|
Maintenance operations
|
+ 10. 98%
|
Deletions without maintenance operations
|
+ 6.07%
|
RB codes versus RB+ codes
|
RB case distribution
|
40,71%
|
28,00%
|
16,61%
|
14,69%
|
RB+ case distribution
|
33.94%
|
32.16%
|
18.39%
|
15.51%
|
RB+ case 2 distribution
|
α1=29.79% α2=29.31%
|
RB+ case 3 distribution
|
β = 73.84%
|
RB case 2 versus RB+ Case 2 - Time
|
+ 30,81%
|
RB case 3 versus RB+ case 3 - Time
|
+ 5.04%
|
Impact on insert algorithm
|
Color changes (Insert algorithm)
|
-7.17%
|
Rotations(Insert algorithm)
|
-1.32%
|
Maintenance operations (Insert algorithm)
|
- 6.21%
|
Insertions without maintenance operations (Insert algorithm)
|
- 7.96%
|
Cascading in the insert algorithm
|
Case distribution – Insert algorithm with RB
|
48.81%
|
25.46%
|
25.73%
|
Case distribution – Insert algorithm with RB+
|
51.32%
|
24.23%
|
24.45%
|
Cascade length - Insert algorithm with RB
|
1.42
|
Cascade length - Insert algorithm with RB+
|
1.36
|
Overall effects
|
Color changes (Insert and delete algorithms)
|
+8.79%
|
Rotations (Insert and delete algorithms )
|
-0.32%
|
Overall time (Insert and delete algorithms)
|
+ 4.04%
|
Time of an operation ( insert or delete algorithm)
|
+ 0.29µs
|
Impact on search algorithm
|
Search algorithm – Largest difference
|
0.001677
|
Main contributions and performance of the RB + code compared to the RB code
The main contributions of the proposed delete algorithm can be summarized in what follows. A substantial gain in color changes is observed since this gain is about 29%. Results also reveal that the new coloring of the tree implies a diminution of the number of maintenance operations by a factor of 10.98%. Moreover, the new delete algorithm includes two advantages. Case 1 is less frequent (33.94% in RB+ versus 40.71% in RB) and case 2 is more frequent (32.16% in RB+ versus 28% in RB).The former reduces cascading effects since these can be triggered only by this case. The latter decreases color changes. Another advantage of the new algorithm lies in the fact that there are more delete operations (6.07%) that are performed without maintenance operations compared to the standard algorithm. The new coloration of the tree also implies a slight decrease in rotations. Indeed, a gain of about 0.97% is observed.
The price to pay for these advantages is the addition, in the standard algorithm, of only two tests in case 2 and one test in case 3. Recall that both cases 2 and 3 cover 44.61% and 50.55% for RB and RB+ respectively. Our deep analysis allowed us to determinate le percentage of cases on the total number of case 2 for which only one color change is performed per operation (α1=29.79%) and the one for which three color changes are performed per operation(α2=29.31%). This implies that for about 40.9% of case 2, no color change is made.
Our deep analysis also allowed us to determinate le percentage of cases on the total number of case 3 for which only two color changes are performed per operation (β = 73.84%.)
This implies that for about 26.16% of case 3, only one color change is made. On the basis of α1, α2 and β parameters, the simulation results revealed that the "Adding test-Reducing color changes" tradeoff is interesting since the RB+ code for case 2 saves 30.81% of running time compared to the corresponding RB code and the RB+ code for case 3 saves 5.04% of running time compared to the corresponding RB code.
Consequences on insert process and overall effects
As we pointed out before, we studied the impact of the new tree coloring - involved in the proposed delete algorithm - on the insert and search operations. The impact on overall effects was also studied. For space reasons we omited simulation tables. Below, we report some simulation results (Table 8).
In order to study the impact of the proposed delete algorithm on the insert algorithm we measured the same parameters. The insert algorithm used with RB+ generates 7.17% more color changes and 1.32% more rotations. Moreover, it increases maintenance operations by a factor of 6.21% and decreases the insert operations performed with maintenance operations by a factor of 7.96%. Certainly, these are not convenient. However, the proposed delete algorithm offers an attractive tradeoff between its advantages and the drawbacks of the insert algorithm used in conjunction with it.
To prove that the new delete algorithm does not deteriorate the insert process, we computed the average length of the cascade involved in case 1. First, we analyzed case distributions of the insert algorithm when it is used with RB and RB+. Case 1 occurs 48.81% of total cases when RB is used and 51.32% when RB+ is used. Case 1 then covers about 50% of maintenance operations. It was not expected that cascading in the insert algorithm would be shorter when the new delete algorithm is used. Indeed, the cascade length is close to 1.42 when RB is used and it is close to 1.36 when RB+ is used.
Let us consider an insert operation regardless of the delete algorithm used to remove items from a Red-Black tree. When an item is inserted into a Red-Black tree, either it requires a maintenance operation or not. If it requires such an operation, one of the three cases of the insert algorithm is performed. For case 2 one rotation and two color changes are performed and for case 3 two rotations and two color changes are performed (Red-Black tree properties). On the contrary, in case 1, the number of color changes depends on the length of the cascade. As the latter is shorter when RB+ is used, the insert algorithm should be then more efficient. The only drawback is that the insert algorithm used with RB + performs more the case 1 (51.32% against 48.81%). In contrast, the frequencies of cases 2 and 3 are slightly lower. Indeed, for case 2, we have 24.23% of total cases for RB against 25.46% for RB+ and for case 3, we have 24.45% of total cases for RB against 25.73% for RB+.
Of course, these improvements have an impact on the running time. Indeed, test results show an overall gain in running time of about 4.04% and a gain of about 0.29 microseconds per operation (insert/delete). This confirms well that the new delete algorithm works well with the insert process. Note also that, overall, there is an 8.79% gain in the total number of color changes performed by both the insert algorithm and RB+ and only a 0.32% loss in the total number of rotations performed by both the insert algorithm and RB+.
Furthermore, the new coloring of the tree resulting from the proposed delete algorithm does not deteriorate search performance. Indeed, the average lengths of paths of trees generated by the two delete algorithms are almost the same. The largest observed difference is about 0.001677.