HEVC (High Efficiency Video Coding) [1], the latest international video compression standard, has been jointly developed by the ISO/IEC Moving Picture Experts Group (MPEG) and the ITU-T Video Coding Experts Group (VCEG) [2-3]. HEVC mainly uses a new quadtree structure based on the Coding Tree Unit (CTU) to achieve a higher compression rate than previous video coding standards such as H.264/AVC, especially for high-resolution video contents such as HD (High Definition) and UHD (Ultra HD) videos. It is reported that the latest HEVC standard requires only about 50% of the bit-rate of H.264/AVC at the same perceptual quality [4-5].
HEVC improves its coding efficiency by using these new structures, but at the same time it also greatly increases the computational complexity, because the encoder needs to exhaustively execute the RDCost (Rate-Distortion Cost) process over all CU (Coding Unit) combinations to obtain the optimal partition mode for each CU [6]. This large computational complexity has become a serious problem for embedded real-time devices, especially power-constrained or storage-constrained ones. Therefore, it is very necessary to lower the computational complexity of HEVC by developing efficient CU partition algorithms with negligible coding degradation [7].
In order to solve the problem above, many efficient methods have recently been proposed to reduce the computational complexity of intra CU size decision in HEVC, for example, a fast CU size decision algorithm based on homogeneous CU termination [8], a fast CU size decision algorithm based on texture [9], a fast CU size determination algorithm based on an adaptive discretization total variation threshold [10], and a fast CU size partitioning algorithm based on data mining [11]. Although these schemes are well designed, there is still a need to develop more efficient schemes that further reduce the high computational complexity of HEVC.
Different from the methods above, in this paper a fast CU size decision algorithm based on coding bits is proposed to reduce the total computational complexity of HEVC. The contributions of this paper are as follows:
(1) After careful observation and statistical analysis of the relationships among the texture complexity, partition size, and coding bits of a CU in a video image, we find a correlation between the coding bits and the partition size of a CU;
(2) Based on this correlation, thresholds on the coding bits for CU partition under different CU sizes and QP values are exploited to quickly judge whether the current CU needs to be further divided, which removes many unnecessary prediction and partition operations for the current CU;
(3) A series of experiments are performed, and the simulation results verify the effectiveness of our proposed method.
The rest of this paper is organized as follows. Section 2 introduces the related knowledge on CU size decision schemes. The proposed method is presented in Section 3. The simulation results and analysis of the proposed scheme compared with existing methods are presented in Section 4. Section 5 concludes the paper.
2. The Related knowledge on CU Size Decision Schemes
2.1 Standard process of CU size decision scheme
During the standard process of CU size decision in HEVC, the encoder needs to traverse all possible partition modes to find the optimal partition mode for the current CU, and the partition mode with the minimum RDCost is taken as the optimal one. A 64×64 CTU can be recursively divided into four sub CUs of the same size in the form of a quadtree. Each sub CU can then be further divided into four sub CUs of the same size, and the process ends only when the smallest CU is reached. The change of CU size from 64×64 down to 8×8 corresponds to depths 0 to 3. Fig. 1 shows the partitioning process for an LCU (Largest Coding Unit) of size 64×64, and its specific division process consists of the following steps.
Step 1: Calculate the RDCost of the 64×64 LCU at depth 0 before partition, and take it as RDCost_unsplit;
Step 2: Calculate the RDCost of each of the four sub CUs at depth 1 after partition, then take their sum as RDCost_split. The search proceeds recursively layer by layer until the maximum depth is reached;
Step 3: Compare RDCost_split and RDCost_unsplit before and after the partition. If RDCost_split is less than RDCost_unsplit, the optimal partition mode for the current LCU is the set of smaller sub CUs after the partition; otherwise, if RDCost_split ≥ RDCost_unsplit, the best partition mode is the current LCU itself and it does not need to be divided;
Step 4: Traverse all CUs step by step according to the Z-scan order, executing Step 1 to Step 3 until every CU at all levels in the LCU has been calculated and judged, so that the optimal partition mode of the whole LCU is finally determined.
After the four steps above, the optimal partition mode for the 64×64 LCU is determined, and higher coding efficiency is realized by using this optimal partition mode.
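The recursive split/no-split comparison above can be sketched as follows. This is a minimal illustration, not HM encoder code: `rd_cost` is a hypothetical placeholder that stands in for the real prediction, transform, and entropy-coding pipeline.

```python
# Sketch of the recursive CU size decision (Steps 1-3), assuming a
# hypothetical rd_cost(x, y, size, depth) that returns the RDCost of
# coding the block unsplit. All names here are illustrative only.

MAX_DEPTH = 3  # 64x64 (depth 0) down to 8x8 (depth 3)

def rd_cost(x, y, size, depth):
    # Placeholder: a real encoder runs intra prediction, transform,
    # quantization, and entropy coding here to get the true RDCost.
    return size  # dummy value so the sketch is runnable

def decide_cu(x, y, size, depth):
    """Return (best_cost, split_flag) for the CU at (x, y)."""
    cost_unsplit = rd_cost(x, y, size, depth)          # Step 1
    if depth == MAX_DEPTH:                             # 8x8 cannot split
        return cost_unsplit, False
    half = size // 2                                   # Step 2: four sub CUs
    cost_split = sum(
        decide_cu(x + dx, y + dy, half, depth + 1)[0]
        for dy in (0, half) for dx in (0, half)
    )
    # Step 3: keep the cheaper of split vs. unsplit
    if cost_split < cost_unsplit:
        return cost_split, True
    return cost_unsplit, False
```

With the dummy cost function, `decide_cu(0, 0, 64, 0)` never splits; the point is the exhaustive recursion structure that the proposed method aims to prune.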
2.2 Complexity Analysis of CU size decision
As mentioned above, HEVC uses a recursive quadtree to divide a 64×64 LCU and finally finds the optimal partition mode for the LCU. Since a 64×64 LCU can be divided into four 32×32 sub CUs, each 32×32 sub CU into four 16×16 sub CUs, and each 16×16 sub CU into four 8×8 sub CUs, a 64×64 LCU can generate 4^0+4^1+4^2+4^3=85 sub CUs. Each of the 85 sub CUs needs to be predicted and encoded by dividing it into one or more PUs, whose sizes range from 64×64 down to 4×4, so a 64×64 LCU can produce 4^0+4^1+4^2+4^3+4^4=341 PUs. Table 1 shows the corresponding number of sub PU blocks for each size in an LCU.
Table 1 Number of sub CU and PU blocks with different sizes

Block size | Number of sub blocks in an LCU
  4×4      |   256
  8×8      |    64
 16×16     |    16
 32×32     |     4
 64×64     |     1
There are 35 candidate intra modes for each PU in HEVC, including 33 angular modes plus the DC and Planar modes. Since each of the 341 PUs needs to independently execute 35 prediction operations based on RDCost calculation, HEVC needs 341×35=11935 RDCost calculations for each LCU, which is very complex and time-consuming. Accordingly, it is very necessary to reduce the computational complexity of HEVC.
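The block counts quoted above follow directly from the quadtree geometry and can be checked with a few lines of arithmetic:

```python
# Quick check of the counts in Section 2.2: a 64x64 LCU yields
# 4^0+...+4^3 = 85 CUs, 4^0+...+4^4 = 341 PU blocks, and with 35
# intra modes per PU, 341 * 35 = 11935 RDCost evaluations per LCU.

num_cus = sum(4 ** d for d in range(4))    # depths 0..3 (64x64..8x8)
num_pus = sum(4 ** d for d in range(5))    # PU sizes down to 4x4
num_modes = 35                             # 33 angular + DC + Planar

print(num_cus, num_pus, num_pus * num_modes)  # 85 341 11935
```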
2.3 Related study for CU partition decision schemes
Recently, a number of CU size decision algorithms have been proposed to reduce the computational complexity for HEVC.
Zhang et al. [8] proposed a fast CU size decision algorithm based on homogeneous CUs and linear support vector machines to reduce the computational complexity of HEVC, where homogeneous CUs and two linear support vector machines based on the depth difference and the HAD cost ratio are used to make early CU split and early CU termination decisions for some of the CUs. Ha et al. [9] presented a fast CU size decision algorithm based on texture, in which only the CU sizes determined by the texture of the image are used. Zhou et al. [10] suggested a fast CU size decision algorithm based on visual saliency detection, which mainly uses saliency detection to realize adaptive and perceptual CU size decisions, thus alleviating the computational complexity of HEVC. Song et al. [11] proposed a fast CU size determination algorithm based on an adaptive discretization total variation (DTV) threshold, which is used to skip some specific depth levels. Ruiz et al. [12] presented a fast CU partitioning algorithm based on an offline-trained decision tree with three hierarchical nodes, where the texture properties of CUs as well as the inter-sub-CU statistics of the same depth level are used to determine the decision rules computed in each node. Cen et al. [13] proposed a fast CU depth decision mechanism based on adaptive CU depth range determination and comparison, where the RDCost calculations outside the adaptive CU depth range and at the current CU depth are skipped to alleviate the computational complexity of HEVC.
Zhang et al. [14] proposed a fast and efficient CU size decision algorithm based on temporal and spatial correlation, which mainly uses the temporal and spatial coding information of previously coded frames to predict the prediction mode of the current treeblock and to skip unnecessary CUs early. In [15], a fast CU size decision algorithm based on the depth information of neighboring CUs is proposed, in which that depth information is exploited to realize early CU split and pruning decisions. In [16], a fast CU size decision algorithm based on Bayesian theorem detection is presented, where CU properties and Bayesian detection are mainly used to reduce the TU processing complexity. In [17], a novel fast CU encoding scheme based on spatiotemporal encoding parameters is proposed, which combines an improved early CU SKIP detection method with a fast CU split decision method. In [18], a fast CU size decision algorithm based on statistical analysis is developed, in which three statistically derived approaches, SKIP mode decision (SMD), CU skip estimation (CUSE), and early CU termination (ECUT), are presented to reduce the computational complexity of HEVC. In [19], a fast CU size selection scheme based on the Bayesian decision rule is presented, where relevant and computation-friendly features are collected to assist decisions on CU splitting. Chen et al. [20] proposed a fast CU size decision algorithm based on online progressive Bayesian classification.
In their scheme, the SATD (Sum of Absolute Transformed Differences) of the prediction residual and the neighboring CU depths are used as key features to decide the CU size. Liu et al. [21] presented a fast CU size decision algorithm based on dual SVMs, where the complexity of video content is classified as high, low, or middle and represented by four-dimensional features: the textural variance, the neighboring mean squared error, the directional complexity from Sobel operations, and the variance difference of the four sub-CU blocks. Liu et al. [22] proposed a fast CU size decision algorithm based on a Convolutional Neural Network (CNN) accelerator for a hardwired intra encoder, in which three trained CNNs implemented with Very Large Scale Integration circuits predict the depth of 32×32, 16×16, and 8×8 CUs. Xu et al. [23] proposed a fast CU size decision algorithm based on an early-terminated hierarchical CNN, where the hierarchical CNN is mainly used to determine partition or non-partition at each prediction layer. Zhang et al. [24] suggested a fast CU size decision algorithm based on two-stage classification: the first stage uses off-line learning to classify CUs as split, non-split, or uncertain, and the second stage uses on-line learning to refine the uncertain predictions.
Different from the methods above, in this paper a fast CU size decision algorithm based on coding bits is proposed to quickly determine the optimal partition mode for the current CU, which removes many unnecessary prediction and partition operations and thus saves much computational complexity for HEVC.
3. Proposed Scheme
3.1 Motivation of our proposed method
3.1.1 Observe and analyze the relationship between texture complexity and partition size
In HEVC, the optimal partition mode of a CU generally requires the least resource consumption and data transfer. A CU with complex texture is usually divided into more sub CUs, and sub CUs with complex texture can be further split until the maximum depth is reached. On the contrary, a CU with simple texture generally does not need to be divided further. A region covered by smaller CUs contains more texture information and needs more partition computations, while a region covered by larger CUs contains less texture information and requires fewer partition computations. There is thus a close relationship between the texture complexity of a CU and its partition size. Fig. 2 shows the number of sub CU partitions for CUs with different texture complexity in the BasketballDrill test sequence.
Fig. 2 shows that the texture complexity of a CU in a video image has an important influence on its partition size. CU1, with complex texture, is split into 85 sub CUs. CU2, with less complex texture, is partitioned into 36 sub CUs. CU3 is not divided at all due to the simple texture information in its region. The other CUs in the video image show a similar phenomenon.
3.1.2 Observe and analyze the relationship between texture complexity and coding bits
In HEVC, the texture complexity of a CU is also related to its coding bits. CUs with different texture complexity generally require different numbers of coding bits. A CU with complex texture usually requires more coding bits to accurately describe its rich texture information, while a CU with simple texture generally requires fewer coding bits. Under the same conditions, a large number of coding bits usually indicates more texture information in a CU, and a small number indicates less. The number of coding bits can therefore reflect the texture complexity of a CU in a video image. Fig. 3 shows the number of coding bits for CUs with different texture complexity in the BasketballDrill test sequence.
From Fig. 3, we can clearly see that different texture information requires different numbers of coding bits to obtain the optimal partition mode for a CU. CU1 in Fig. 2 has the most complex texture, so it needs 852 coding bits to describe its rich texture information. Since CU2 has less texture information than CU1, it only needs 414 coding bits. CU3 only requires 91 coding bits due to its simple and smooth texture. The number of coding bits is thus closely related to the texture complexity of a CU, and the other CUs in the video image show a similar phenomenon.
3.2 Find the correlation between coding bits and partition size
Based on the observation and analysis above, we find an important correlation between the coding bits and the partition size of a CU. A CU with more coding bits generally has higher texture complexity and is likely to be divided into smaller sub CUs until the optimal combination is reached. On the contrary, a CU with fewer coding bits generally contains less texture information and is likely not to be divided further. Therefore, we can use the coding bits of a CU to quickly judge its partition mode, which removes many unnecessary prediction and partition operations and thus saves much computational complexity for HEVC.
Besides the CU size, the QP value also affects the total coding bits and the partition size of a CU in HEVC. For the same CU size, when the QP value is large, the total number of coding bits is small; when the QP value is small, the total number of coding bits is large. In order to obtain an accurate relationship between coding bits and partition size, we use 12 different test sequences to count, for each coding-bit value, its number of occurrences under different CU sizes and QP values. The larger the number of occurrences of a coding-bit value, the higher the probability that it corresponds to the optimal partition mode for the current CU. Fig. 4 to Fig. 9 show the related experimental results. In our experiment, the frame rate and frame number of the 12 test sequences are set to 25 and 100 respectively, and three CU sizes (64×64, 32×32, and 16×16) and four QP values (22, 27, 32, and 37) are used.
From Fig. 4 to Fig. 9, we can clearly see that, firstly, the number of coding bits is closely related to its number of occurrences: as the coding bits increase, the number of occurrences reaches a maximum value, which corresponds to the optimal partition mode for the current CU and can be used to judge whether the current CU needs to be further divided. Secondly, the CU size and the QP value have obvious effects on the coding bits and the partition mode, so different CU sizes and QP values require different coding-bit thresholds for CU partition. Therefore, we can make use of the close relationship between the number of coding bits and its number of occurrences under different CU sizes and QP values to quickly judge whether the current CU needs to be further divided, which removes many unnecessary prediction and partition operations and thus saves much computational complexity for HEVC.
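The offline statistics described above amount to building a histogram of coding-bit values and locating its peak. A minimal sketch, with a made-up sample list standing in for the per-CU bit counts collected from the test sequences:

```python
# Sketch of the statistical analysis in Section 3.2 for one CU size and
# QP value. The sample_bits list below is hypothetical illustration data,
# not measurements from the paper's 12 test sequences.
from collections import Counter

sample_bits = [120, 130, 130, 140, 130, 150, 130, 140]  # made-up data

hist = Counter(sample_bits)                    # occurrences per bit value
peak_bits, peak_count = hist.most_common(1)[0] # most frequent bit value

print(peak_bits, peak_count)  # the peak used to guide threshold selection
```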
3.3 Build the threshold of coding bits for CU partition under different size and QP value
Based on the analysis above, in this subsection we obtain a fast CU size decision method that quickly judges whether the current CU needs to be further divided by setting a threshold on the coding bits for each CU size and QP value, based on the correlation between the number of coding bits and its number of occurrences. In our scheme, when the coding bits of the current CU are greater than the threshold, the partition of the current CU continues; otherwise, the partition is terminated in advance, which removes many unnecessary prediction and partition operations and thus saves much computational complexity for HEVC. The threshold test for CU partition can be expressed by the following formula:

    Decision = { continue the partition,        if CodingBits > CodingBitsTH
               { terminate the partition early, otherwise
where CodingBits represents the number of coding bits of the current CU, and CodingBitsTH stands for the partition threshold of our proposed method, as shown in Table 2. In our scheme, the partition thresholds under different CU sizes and QP values are obtained by statistical analysis of the correlation between the number of coding bits and its number of occurrences, as illustrated in Fig. 4 to Fig. 9 and described in Section 3.2.
Table 2 Partition thresholds of CU with different size and QP value

CU size | QP=22 | QP=27 | QP=32 | QP=37
 64×64  |  850  |  560  |  220  |  130
 32×32  |  430  |  240  |  130  |   50
 16×16  |  140  |   90  |   50  |   30
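The threshold test can be expressed as a small lookup over the Table 2 values. This is a sketch; `split_further` is an illustrative name, not a function from the HM reference software.

```python
# Threshold test of Section 3.3 as a lookup table.
# (CU size, QP) -> coding-bit threshold, values copied from Table 2.
CODING_BITS_TH = {
    (64, 22): 850, (64, 27): 560, (64, 32): 220, (64, 37): 130,
    (32, 22): 430, (32, 27): 240, (32, 32): 130, (32, 37): 50,
    (16, 22): 140, (16, 27): 90,  (16, 32): 50,  (16, 37): 30,
}

def split_further(coding_bits, cu_size, qp):
    """True: keep splitting the CU. False: terminate the partition early."""
    return coding_bits > CODING_BITS_TH[(cu_size, qp)]

print(split_further(900, 64, 22))  # True: 900 bits exceed the 850 threshold
print(split_further(80, 16, 27))   # False: 80 bits below 90, terminate early
```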
In order to verify the accuracy of the proposed partition thresholds, we use the same 12 test sequences to test the accuracy rate of the partition thresholds under different CU sizes and QP values. Table 3 shows the experimental results with QP=22 and QP=32.
Table 3 The accuracy rate of the proposed partition threshold

Picture size | Sequence name       | Accuracy (QP=22)      | Accuracy (QP=32)
             |                     | 64×64 | 32×32 | 16×16 | 64×64 | 32×32 | 16×16
Class A      | PeopleOnstreet      | 0.78  | 0.73  | 0.78  | 0.68  | 0.69  | 0.73
Class A      | Traffic             | 0.76  | 0.73  | 0.72  | 0.75  | 0.69  | 0.67
Class B      | BasketballDrive     | 0.68  | 0.76  | 0.75  | 0.72  | 0.76  | 0.72
Class B      | Cactus              | 0.64  | 0.69  | 0.65  | 0.69  | 0.67  | 0.62
Class C      | RaceHorses          | 0.61  | 0.63  | 0.64  | 0.66  | 0.62  | 0.58
Class C      | BQMall              | 0.64  | 0.68  | 0.66  | 0.67  | 0.71  | 0.69
Class D      | BasketballPass      | 0.71  | 0.69  | 0.71  | 0.72  | 0.69  | 0.74
Class D      | BlowingBubbles      | 0.88  | 0.84  | 0.71  | 0.81  | 0.74  | 0.75
Class E      | Fourpeople          | 0.76  | 0.62  | 0.68  | 0.78  | 0.71  | 0.68
Class E      | Johnny              | 0.78  | 0.72  | 0.68  | 0.81  | 0.67  | 0.61
Class F      | BasketballDrillText | 0.75  | 0.76  | 0.68  | 0.67  | 0.66  | 0.72
Class F      | Slideshow           | 0.79  | 0.73  | 0.71  | 0.76  | 0.73  | 0.67
             | Average             | 0.74  | 0.72  | 0.69  | 0.73  | 0.68  | 0.68
From Table 3, we can clearly observe that the average accuracy rate of the partition thresholds in our proposed method is quite high, reaching about 70% under different CU sizes and QP values, which proves the validity of the partition threshold values in this paper. Therefore, we can make use of the partition threshold values to quickly divide a CU under different CU sizes and QP values, which removes many unnecessary prediction and partition operations and thus greatly reduces the computational complexity of HEVC.
3.4 Realizing process of our proposed method
The overall principle of our proposed scheme is to use the coding bits of the current CU to quickly judge its partition process. When the number of coding bits of the current CU is greater than the selected threshold, the partition of the current CU continues; otherwise, the partition is terminated in advance, which removes many unnecessary prediction and partition operations and thus saves much computational complexity for HEVC.
The basic realization process of our proposed scheme is as follows. Firstly, we carefully observe and statistically analyze the relationships among the texture complexity, coding bits, and partition modes of CUs in video images. Secondly, based on these relationships, we find the correlation between the coding bits and the partition mode of the current CU. Thirdly, we build the corresponding thresholds between coding bits and partition mode under different CU sizes and QP values. Finally, we use these partition thresholds to quickly determine the partition process for the current CU by removing many unnecessary partition operations. As a result, our proposed algorithm can effectively save lots of computational complexity for HEVC.
Fig. 10 shows the realization process of our proposed scheme, which includes five main parts: obtaining the coding bits of the current CU, selecting the threshold value, comparing the difference value, executing the corresponding process, and obtaining the optimal partition mode.
Obtaining the coding bits of the current CU performs the calculation of the coding bits for the current CU. Selecting the threshold value selects the partition threshold for the current CU according to its CU size and QP value from Table 2. Comparing the difference value compares the coding bits of the current CU with the selected threshold from Table 2. Executing the corresponding process chooses the partition action according to the comparison result: when the number of coding bits of the current CU is greater than the selected threshold, the partition operation for the current CU continues; otherwise, the partition process is terminated in advance. Obtaining the optimal partition mode determines the optimal partition mode for the current CU.
3.5 Realizing steps of our scheme
Fig. 11 shows the realization flow of our proposed scheme. The detailed steps can be summarized as follows:
Step 1: Encode a CU and read its depth;
Step 2: Judge whether the depth of the current CU is less than 3. If so, go to Step 3; otherwise, go to Step 5;
Step 3: Calculate the coding bits of the current CU;
Step 4: Select the corresponding threshold value by Table 2 according to its CU size and QP value of the current CU;
Step 5: Compare the coding bits of the current CU with the threshold value selected from Table 2. If the coding bits of the current CU are less than the threshold value, go to Step 6; otherwise, increase the depth of the current CU by 1 and return to Step 2;
Step 6: Obtain the optimal partition mode for the current CU;
Step 7: Complete the optimal partition for the LCU.
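The loop structure of the steps above can be sketched as follows. This is a simplified illustration under stated assumptions: `encode_and_count_bits` is a hypothetical stand-in for the encoder call, the thresholds are restricted to QP=32 (values from Table 2), and the dummy bit counts are made up so the sketch runs.

```python
# Sketch of the Section 3.5 flow for one LCU at QP = 32.
TH_QP32 = {64: 220, 32: 130, 16: 50}   # CU size -> threshold (Table 2)

def encode_and_count_bits(size, depth):
    # Placeholder for Steps 1 and 3: encode the CU and count its bits.
    return 300 // (depth + 1)           # dummy values for illustration

def partition_depth(size=64, depth=0):
    """Follow Steps 1-7: deepen while the coding bits exceed the threshold."""
    while depth < 3:                                  # Step 2
        bits = encode_and_count_bits(size, depth)     # Step 3
        if bits <= TH_QP32[size]:                     # Steps 4-5
            break                                     # early termination
        size //= 2                                    # otherwise split:
        depth += 1                                    # deepen and re-check
    return depth                                      # Steps 6-7

print(partition_depth())  # final depth reached for this dummy LCU
```

In a real encoder the recursion branches into four sub CUs at each split; the single `depth` counter here only illustrates the early-termination test.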