A Fast Optimal CU Size Decision Algorithm Based on Coding Bits for HEVC


HEVC (High Efficiency Video Coding) employs a quadtree CTU (Coding Tree Unit) structure to improve its coding efficiency, but this structure also incurs very high computational complexity because of the exhaustive search for the optimal partition mode of the current CU (Coding Unit). To address this problem, a fast optimal CU size decision algorithm based on coding bits is presented for HEVC in this paper. The contribution of this paper is that we use coding bits to quickly determine the optimal partition mode for the current CU. Specifically, we first observe and statistically analyze the relationship among texture complexity, partition size, and coding bits in the CUs of a video image; second, we identify the correlation between coding bits and partition size from this relationship; third, we build a coding-bit threshold for each CU size and QP value based on this correlation, which eliminates many unnecessary prediction and partition operations for the current CU. As a result, the proposed algorithm effectively reduces the computational complexity of HEVC. The simulation results show that the proposed fast CU size decision algorithm saves about 34.67% of the coding time, at a cost of only a 0.61% BDBR increase and a 0.043 dB BDPSNR decline compared with the standard reference HM16.1, thus improving the coding performance of HEVC.

Introduction
HEVC (High Efficiency Video Coding) [1], the latest international video compression standard, has been jointly developed by the ISO/IEC Moving Picture Experts Group (MPEG) and the ITU-T Video Coding Experts Group (VCEG) [2][3]. HEVC mainly uses a new quadtree structure based on the Coding Tree Unit (CTU) to achieve a higher video compression rate than previous standards such as H.264/AVC, especially for high-resolution content such as HD (High Definition) and UHD (Ultra HD) videos. It is reported that the latest HEVC standard requires only about 50% of the bit rate of H.264/AVC at the same perceptual quality [4][5].
HEVC improves its coding efficiency with these new structures, but at the same time it greatly increases the computational complexity, because the encoder must exhaustively execute the RDCost (Rate-Distortion Cost) process over all CU combinations to obtain the optimal partition mode for each CU (Coding Unit) [6]. This large computational complexity is a serious problem for embedded real-time devices, especially power-constrained or storage-constrained ones. It is therefore necessary to lower the computational complexity of HEVC by developing efficient CU partition algorithms with negligible quality degradation [7].
To solve the problem above, many efficient methods have recently been proposed to reduce the computational complexity of intra CU size decision in HEVC, for example, a fast CU size decision algorithm based on homogeneous CU termination [8], one based on texture [9], a fast CU size determination algorithm based on an adaptive discretization total variation threshold [10], and a fast CU size partitioning algorithm based on data mining [11]. Although these schemes are well designed, there is still a need for more efficient schemes that further reduce the high computational complexity of HEVC. Different from the methods above, this paper proposes a fast optimal CU size decision algorithm based on coding bits. The contributions of this paper are as follows: (1) after careful observation and statistical analysis of the relationships among texture complexity, partition size, and coding bits of CUs in a video image, we identify the correlation between coding bits and partition size; (2) based on this correlation, a coding-bit threshold for each CU size and QP value is exploited to quickly judge whether the current CU needs to be further divided, which removes many unnecessary prediction and partition operations; (3) a series of experiments is performed, and the simulation results verify the effectiveness of the proposed method. The rest of this paper is organized as follows. Section 2 introduces the related work on CU size decision schemes. Section 3 presents the proposed method.
Section 4 presents the simulation results and analysis of the proposed scheme compared with existing methods. Section 5 concludes the paper.

Standard process of CU size decision scheme
During the standard CU size decision process, HEVC traverses all possible partition modes to find the optimal one for the current CU; the partition mode with the minimum RDCost is taken as optimal. A 64×64 CTU can be recursively divided into four equally sized sub-CUs in quadtree form. Each sub-CU can be further divided into four equally sized sub-CUs, and the process ends only when the smallest CU is reached. CU sizes from 64×64 down to 8×8 correspond to depths 0 to 3. Fig. 1 shows the partitioning process for a 64×64 LCU, and the specific division process consists of the following steps.
Step 1: Calculate the RDCost of the 64×64 LCU at depth 0 before partitioning, and take it as RDCost_unsplit.
Step 2: Calculate the RDCost of each of the four sub-CUs at depth 1 after partitioning, and take their sum as RDCost_split. The encoder recursively searches down layer by layer until the maximum depth is reached.
Step 3: Compare RDCost_split and RDCost_unsplit. If RDCost_split is less than RDCost_unsplit, the optimal partition mode for the current LCU is the set of smaller sub-CUs after partitioning; otherwise (RDCost_split ≥ RDCost_unsplit), the best partition mode is the current LCU itself, and it is not divided.
Step 4: Traverse all CUs in Z-scan order and execute Steps 1 to 3 until every CU at every level of the LCU has been evaluated, so that the optimal partition mode of the whole LCU is finally determined.
After these four steps, the optimal partition mode for the 64×64 LCU is determined, and higher coding efficiency is achieved by using it.
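The recursive search in the steps above can be sketched as follows. This is a simplified model, not HM code: `rd_cost` and `split` are hypothetical stand-ins for the encoder's RD-cost evaluation and quadtree split of a CU.

```python
# Sketch of the standard exhaustive CU size decision (Steps 1-4 above).
# rd_cost(cu, depth) returns the RD cost of coding the CU unsplit;
# split(cu) returns its four quadtree sub-CUs. Both are assumptions.

def decide_partition(cu, depth, max_depth, rd_cost, split):
    """Return (best_cost, partition_tree) for a CU by exhaustive search."""
    cost_unsplit = rd_cost(cu, depth)        # Step 1: cost without splitting
    if depth == max_depth:                   # smallest CU reached: stop
        return cost_unsplit, cu
    # Step 2: cost of the four sub-CUs, each searched recursively
    sub_results = [decide_partition(s, depth + 1, max_depth, rd_cost, split)
                   for s in split(cu)]
    cost_split = sum(c for c, _ in sub_results)
    # Step 3: keep whichever alternative has the smaller RD cost
    if cost_split < cost_unsplit:
        return cost_split, [t for _, t in sub_results]
    return cost_unsplit, cu
```

With a toy cost model the function reproduces both outcomes of Step 3: a cost linear in CU size never splits, while a superlinear cost splits all the way down.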

Complexity Analysis of CU size decision
As mentioned above, HEVC uses a recursive quadtree to divide a 64×64 LCU and finally find its optimal partition mode. Since a 64×64 LCU can be divided into four 32×32 sub-CUs, each 32×32 sub-CU into four 16×16 sub-CUs, and each 16×16 sub-CU into four 8×8 sub-CUs, a 64×64 LCU can generate 4^0 + 4^1 + 4^2 + 4^3 = 85 sub-CUs. Each of the 85 sub-CUs must be predicted and encoded by dividing it into one or more PUs, whose sizes range from 64×64 down to 4×4, so a 64×64 LCU can produce 4^0 + 4^1 + 4^2 + 4^3 + 4^4 = 241 PUs. Table 1 shows the corresponding number of PUs for each CU size in an LCU.

Related study for CU partition decision schemes
Recently, a number of CU size decision algorithms have been proposed to reduce the computational complexity for HEVC.
Zhang et al. [8] proposed a fast CU size decision algorithm based on homogeneous CUs and linear support vector machines, where homogeneous CUs and two linear SVMs based on depth difference and HAD cost ratio complete the early-split and early-termination decisions for some CUs. Ha et al. [9] presented a fast CU size decision algorithm based on texture, in which only the CU sizes determined by the image texture are used. Zhou et al. [10] suggested a fast CU size decision algorithm based on visual saliency detection, which uses saliency detection to realize adaptive, perceptual CU size decisions. Song et al. [11] proposed a fast CU size determination algorithm based on an adaptive discretization total variation (DTV) threshold, which is used to skip some specific depth levels. Ruiz et al. [12] presented a fast CU partitioning algorithm based on an offline-trained decision tree with three hierarchical nodes, where the texture properties of CUs as well as inter-sub-CU statistics at the same depth level determine the decision rules computed in each node. Cen et al. [13] proposed a fast CU depth decision mechanism based on adaptive CU depth range determination and comparison, where the RDCost calculations outside the adaptive depth range and at the current CU depth are skipped.
Zhang et al. [14] proposed a fast and efficient CU size decision algorithm based on temporal and spatial correlation, mainly using the coding information of previously coded frames to predict the prediction mode of the current treeblock and to skip unnecessary CUs early. In [15], a fast CU size decision algorithm based on the depth information of neighboring CUs is proposed, where this depth information drives the early CU split and pruning decisions. In [16], a fast CU size decision algorithm based on Bayesian-theorem detection is presented, where CU properties and Bayesian detection reduce the TU processing complexity. In [17], a novel fast CU encoding scheme based on spatiotemporal encoding parameters is proposed, combining an improved early CU SKIP detection method with a fast CU split decision method. In [18], a fast CU size decision algorithm based on statistical analysis is developed; three statistically derived approaches, SKIP mode decision (SMD), CU skip estimation (CUSE), and early CU termination (ECUT), are presented. In [19], fast CU size selection based on the Bayesian decision rule is presented, where relevant and computationally friendly features assist the decisions on CU splitting. Chen et al. [20] proposed a fast CU size decision algorithm based on online progressive Bayesian classification.
In that work, the SATD (Sum of Absolute Transformed Differences) of the prediction residual and the neighboring CU depths serve as the key features for deciding the CU size. Liu et al. [21] presented a fast CU size decision algorithm based on a dual SVM, dividing the complexity of video content into high, low, and middle and representing it with four features: textural variance, neighboring mean squared error, directional complexity from Sobel operations, and the variance difference of the four sub-CU blocks. Liu et al. [22] proposed a fast CU size decision algorithm based on a Convolutional Neural Network (CNN) accelerator for a hardwired intra encoder, using three trained CNNs implemented in Very Large Scale Integration circuits to predict the depth of 32×32, 16×16, and 8×8 CUs. Xu et al. [23] proposed a fast CU size decision algorithm based on an early-terminated hierarchical CNN, which determines partition or non-partition at each prediction layer. Zhang et al. [24] suggested a fast CU size decision algorithm based on two-stage classification: the first stage uses offline learning to label CUs as split, non-split, or uncertain, and the second stage uses online learning to refine the uncertain predictions.
Different from the methods above, this paper proposes a fast optimal CU size decision algorithm based on coding bits to quickly determine the optimal partition mode for the current CU, which removes many unnecessary prediction and partition operations and thus saves much computational complexity for HEVC.

Proposed Scheme
3.1 Motivation of our proposed method
3.1.1 Observe and analyze the relationship between texture complexity and partition size
In HEVC, the optimal partition mode of a CU generally requires the least resource consumption and data transfer. A CU with complex texture information is usually divided into more sub-CUs, and sub-CUs with complex texture can be further split until the maximum depth is reached.
On the contrary, a CU with simple texture information generally does not need to be divided into more sub-CUs. A smaller CU carries denser texture information and needs more partition computations, while a larger CU includes less texture detail and requires fewer partition computations. There is thus a close relationship between the texture complexity of a CU and its partition size. Fig. 2 shows the number of sub-CU partitions for CUs of different texture complexity in the BasketballDrill test sequence.
Fig. 2 shows that the texture complexity of a CU has an important influence on its partition size. CU1, with complex texture, is split into 85 sub-CUs. CU2, with less complex texture, is partitioned into 36 sub-CUs. CU3 is not divided at all because of its simple texture. The other CUs in the video image exhibit similar behavior.

Observe and analyze the relationship between texture complexity and coding bits
In HEVC, the texture complexity of a CU is also related to its coding bits. CUs with different texture complexity generally require different numbers of coding bits to encode. A CU with complex texture usually requires more coding bits to describe its rich texture accurately, while a CU with simple texture requires fewer. Under the same conditions, a large number of coding bits usually represents more texture information and a small number represents less, so the number of coding bits reflects the texture complexity of a CU. Fig. 3 shows the number of coding bits for CUs of different texture complexity in the BasketballDrill test sequence.
From Fig. 3, we can see clearly that different texture content requires different numbers of coding bits to obtain the optimal partition mode. In Fig. 2, CU1 has the most complex texture and therefore needs 852 coding bits to describe it. Since CU2 has less texture information than CU1, it needs only 414 coding bits. CU3 requires only 91 coding bits owing to its simple, smooth texture. The number of coding bits is thus closely related to the texture complexity of a CU, and the other CUs in the video image show the same behavior.

Find the correlation between coding bits and partition size
Based on the observation and analysis above, we find an important correlation between coding bits and partition size. A CU with more coding bits generally has higher texture complexity and is likely to be divided into smaller sub-CUs until the optimal combination is reached. On the contrary, a CU with fewer coding bits generally contains less texture information and is likely not to be divided further. Therefore, we can use the coding bits of a CU to quickly judge its partition mode, which removes many unnecessary prediction and partition operations and saves much computational complexity for HEVC.
Besides the CU size, the QP value also affects the total coding bits and partition size of a CU. For the same CU size, a large QP value yields a small total number of coding bits, while a small QP value yields a large one. To obtain an accurate relationship between coding bits and partition size, we use 12 different test sequences to record the number of coding bits and the number of occurrences of each coding-bit value under different CU sizes and QP values. The larger the number of occurrences of a coding-bit value, the higher the probability that the optimal partition mode for the current CU lies at that value. From Fig. 4 to Fig. 9, we can see clearly that, first, the number of coding bits is closely related to its number of occurrences: as the coding bits increase, the number of occurrences reaches a maximum, which corresponds to the optimal partition mode for the current CU and can be used to judge whether the CU needs to be further divided. Second, the CU size and QP value have obvious effects on the coding bits and partition mode, so different CU sizes and QP values require different coding-bit thresholds for CU partitioning. We can therefore exploit this relationship under different CU sizes and QP values to quickly judge whether the current CU needs further division, which removes many unnecessary prediction and partition operations and saves much computational complexity for HEVC.
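The statistics described above amount to a per-(CU size, QP) histogram of coding-bit values whose peak marks the most probable optimal partition point. A minimal sketch of that bookkeeping, with purely illustrative sample data (not taken from the paper's sequences), might look like:

```python
# Hedged sketch: count how often each coding-bit value occurs for CUs of
# one size/QP combination, then read off the most frequent value.
from collections import Counter

def most_frequent_bits(samples):
    """Return the coding-bit value with the highest number of occurrences."""
    counts = Counter(samples)
    bits, _ = counts.most_common(1)[0]
    return bits

# Hypothetical per-CU coding-bit observations for, say, 32x32 CUs at QP=22.
samples = [410, 414, 414, 420, 414, 398, 414, 405]
peak = most_frequent_bits(samples)  # the histogram peak, here 414
```

The peak value is what Figs. 4-9 visualize for each CU size and QP, and it is the raw material from which the thresholds of Table 2 are derived.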

Build the threshold of coding bits for CU partition under different size and QP value
Based on the analysis above, in this subsection we derive a fast CU size decision method that quickly judges whether the current CU needs to be further divided by setting a coding-bit threshold for each CU size and QP value, based on the correlation between the number of coding bits and its number of occurrences. In our scheme, when the coding bits of the current CU are greater than the threshold, partitioning continues; otherwise, partitioning is terminated in advance, which removes many unnecessary prediction and partition operations and saves much computational complexity for HEVC. The partition decision can be expressed as follows:

    Decision = split,      if Coding_Bits > Coding_Bits_TH
               terminate,  otherwise
where Coding_Bits is the number of coding bits of the current CU, and Coding_Bits_TH is the partition threshold of our proposed method, listed in Table 2. In our scheme, the partition thresholds for the different CU sizes and QP values are obtained from the correlation between the number of coding bits and its number of occurrences, together with methods of statistical analysis. To verify the accuracy of the proposed partition thresholds, we also use the 12 test sequences above to test the accuracy rate of the thresholds under different CU sizes and QP values. Table 3 shows the experimental results for QP=22 and QP=32. From Table 3, we can clearly observe that the average accuracy rate of the partition thresholds is about 70% across the different CU sizes and QP values, which supports the validity of our threshold values. Therefore, we can use these partition thresholds to quickly divide a CU under different CU sizes and QP values, removing many unnecessary prediction and partition operations and greatly reducing the computational complexity of HEVC.
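The threshold test above can be sketched as a simple table lookup followed by a comparison. The threshold numbers below are placeholders for illustration only; the paper's actual values are those in Table 2, indexed by CU size and QP.

```python
# Illustrative threshold table: (cu_size, qp) -> Coding_Bits_TH.
# These numbers are NOT the paper's Table 2 values; they are placeholders.
THRESHOLDS = {
    (64, 22): 900, (64, 32): 500,
    (32, 22): 400, (32, 32): 220,
    (16, 22): 150, (16, 32): 80,
}

def should_split(cu_size, qp, coding_bits):
    """Continue partitioning only if the CU's coding bits exceed its threshold."""
    th = THRESHOLDS[(cu_size, qp)]
    return coding_bits > th
```

A single lookup and comparison replaces the recursive RDCost evaluation of the sub-CUs whenever the test fails, which is where the complexity saving comes from.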

Realizing process of our proposed method
The overall principle of our proposed scheme is to use the coding bits of the current CU as the main cue for quickly judging its partition process. When the number of coding bits of the current CU is greater than the selected threshold, partitioning continues; otherwise, it is terminated in advance, which removes many unnecessary prediction and partition operations and saves much computational complexity for HEVC.
The basic realization process of our scheme is as follows: first, we carefully observe and statistically analyze the relationship among texture complexity, coding bits, and partition mode of CUs in a video image; second, we identify the correlation between coding bits and partition mode for the current CU; third, we build the corresponding thresholds between coding bits and partition mode under different CU sizes and QP values; finally, we use these thresholds to quickly determine the partition process for the current CU, eliminating many unnecessary partition operations. As a result, the proposed algorithm effectively saves a great deal of computational complexity for HEVC. Fig. 10 shows the realization process of our scheme, which includes five main components: obtain the coding bits of the current CU, select the threshold value, compare the difference value, execute the corresponding process, and obtain the optimal partition mode.
Obtaining the coding bits of the current CU performs the calculation of the CU's coding bits. Selecting the threshold value chooses the partition threshold for the current CU from Table 2 according to its CU size and QP value. Comparing the difference value compares the coding bits of the current CU with the selected threshold. Executing the corresponding process selects the partition action according to this comparison: when the number of coding bits of the current CU is greater than the threshold, partitioning continues; otherwise, it is terminated in advance. Obtaining the optimal partition determines the optimal partition mode for the current CU.
3.5 Realizing steps of our scheme
Fig. 11 shows the realization flow of our proposed scheme. The detailed steps can be summarized as follows:
Step 1: Encode a CU and read its depth.
Step 2: Judge whether the depth of the CU is less than 3. If so, go to Step 3; otherwise, go to Step 5.
Step 3: Calculate the coding bits of the current CU.
Step 4: Select the corresponding threshold value from Table 2 according to the CU size and QP value of the current CU.
Step 5: Compare the coding bits of the current CU with the threshold value selected from Table 2. If the coding bits are less than the threshold value, go to Step 6; otherwise, increase the depth of the current CU by 1 and go to Step 2.
Step 6: Obtain the optimal partition mode for the current CU.
Step 7: Complete the optimal partition of the LCU.
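The steps above can be sketched as a loop over depths for one CU path. This is a simplified model: `coding_bits_at_depth` and the threshold table stand in for the encoder's internal bit counting and the paper's Table 2, respectively.

```python
# Hedged sketch of the realization flow (Steps 1-7 above) for one CU path.
# coding_bits_at_depth(d) and thresholds[(size, qp)] are assumed inputs.

def fast_cu_depth(coding_bits_at_depth, thresholds, qp, max_depth=3):
    """Return the depth at which partitioning stops for one CU path."""
    depth = 0
    while depth < max_depth:                  # Step 2: depth < 3?
        cu_size = 64 >> depth                 # 64, 32, 16 for depths 0-2
        bits = coding_bits_at_depth(depth)    # Step 3: coding bits of this CU
        th = thresholds[(cu_size, qp)]        # Step 4: look up threshold
        if bits < th:                         # Step 5: below threshold ->
            break                             #   terminate early (Step 6)
        depth += 1                            # otherwise go one level deeper
    return depth                              # Step 7: final depth for this path
```

For example, a CU whose 32×32 sub-CU already codes with fewer bits than its threshold stops at depth 1, skipping the 16×16 and 8×8 evaluations entirely.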

Experimental Results And Analysis
To verify the performance of our proposed method, some experiments are conducted in this section. They mainly include two parts: building the experimental environment and testing the experimental results under different conditions.

Build experimental environment
Our simulations were conducted on an Intel(R) Core(TM) i7-5500 CPU @ 2.4 GHz with 4 GB of RAM, running the Windows 7 64-bit operating system. The test conditions and configurations follow the JCT-VC recommendations [25]. Eighteen test sequences are used to evaluate the performance of the proposed method. They come from six classes with different resolutions: Class A (2560×1600), Class B (1920×1080), Class C (832×480), Class D (416×240), Class E (1280×720), and Class F (416×240). The frame rate and number of frames of these sequences are set to 25 and 100, respectively. Table 4 lists the parameters of the 18 test sequences, and other general configuration parameters are listed in Table 5. Ha's scheme, Huade's scheme, Chen's scheme [27], and our proposed scheme are compared with the standard HM16.1 algorithm [25] in our experiment. Ha's scheme mainly uses the texture similarity of neighboring CUs to quickly determine the optimal partition mode for the current CU; Huade's scheme mainly uses adaptive depth selection to quickly locate it; Chen's scheme mainly uses depth-range prediction and mode reduction; and our proposed method mainly uses coding bits. The experimental results are shown below.

Test experimental results
In this subsection, we test the BDBR, BDPSNR, and TS performance on the different test sequences compared with the standard reference HM16.1. The testing results are shown in Table 6.

Discussion
Although our method reduces a large amount of computational complexity with only a small BDBR increase and BDPSNR decline compared with the reference HM16.1, it may not be suitable for embedded devices with very strict real-time constraints, because such devices also demand very low power consumption. Nevertheless, as a fast optimal CU size decision algorithm, our proposal uses coding-bit thresholds to quickly determine the optimal partition mode for the current CU, removing many unnecessary partition and prediction operations; this saves much computational complexity for HEVC and makes the method suitable for general real-time embedded devices.

Conclusion And Future Work
In this paper, a fast optimal CU size decision algorithm based on coding bits is proposed for HEVC intra prediction. In our scheme, we first establish the correspondence between coding bits and partition size for the current CU based on the relationship among content complexity, partition size, and coding bits; we then use coding-bit thresholds to quickly determine the optimal partition mode for the current CU, which removes many unnecessary partition and prediction operations and thereby saves much computational complexity for HEVC. The simulation results show that the proposed algorithm saves about 34.67% of the computational complexity at a cost of only a 0.61% BDBR increase and a 0.043 dB BDPSNR decline compared with the reference HM16.1. We plan to continue our research in two directions: first, extending the proposed method to inter CU size prediction; second, incorporating Bayesian theory and machine learning into the proposed method to improve its overall performance.

Figure 1: Partitioning process for a 64×64 LCU.
Figure 2: Partition number of sub-CUs for CUs of different texture complexity in the BasketballDrill test sequence.
Coding bits and number of occurrences (CU 32×32, QP=22).

Figure 9: Coding bits and number of occurrences (CU 64×64, QP=32).
Figure 10: The realization process of our proposed scheme.
Figure 11: Realization flow of our proposed scheme.
TS in six different test sequences.