Fast ISP coding mode optimization algorithm based on CU texture complexity for VVC

In the recently published video coding standard Versatile Video Coding (VVC/H.266), the intra sub-partitions (ISP) coding mode is introduced. It is efficient for frames with rich texture, but less efficient for frames that are very flat or constant. In this paper, by comparing and analyzing the rate-distortion cost (RD-cost) of coding units (CUs) with different texture features with and without the ISP coding mode (ISP and No-ISP, respectively), it is observed that CUs with simple texture can skip the ISP coding mode. Based on this observation, a fast ISP coding mode optimization algorithm based on CU texture complexity is proposed, which determines in advance whether a CU needs to test the ISP coding mode by calculating the CU texture complexity, so as to reduce the computational complexity of ISP. The experimental results show that under the All Intra (AI) configuration, the coding time is reduced by 7%, while the BD-rate increases by only 0.09%.


Introduction
Intra prediction has always been a main research field in video coding. It takes advantage of the spatial correlation within an image to eliminate spatial redundancy and thereby compress the video data. In VVC, many new intra prediction techniques have been proposed, including mode-dependent intra smoothing (MDIS), cross-component linear model (CCLM), position-dependent intra prediction combination (PDPC), multiple reference line (MRL) intra prediction, intra sub-partitions (ISP), matrix-weighted intra prediction (MIP), and so on [1]. Intra prediction plays an important role in video coding and has a great impact on the coding performance. Optimizing the intra prediction techniques is therefore crucial for improving coding efficiency.
In the process of intra prediction, the reference samples that can be used to create intra prediction signals are located only to the left of and above the current block. As the correlation between samples in a natural image decreases with increasing distance, the quality of the predicted samples near the bottom-right corner of the block is worse than that of the samples close to the top-left boundary of the block [2]. To solve this problem, the intra sub-partitions (ISP) coding mode has been proposed in VVC, which divides a luminance intra prediction block horizontally or vertically into 4 or 2 equal-size sub-partitions, each containing at least 16 samples. The minimum and maximum block sizes that can use the ISP coding mode are 4 × 8 (or 8 × 4) and 64 × 64, respectively. If the size of a block is greater than 4 × 8 (or 8 × 4), the block is divided into 4 sub-partitions. If the size of the block is equal to 4 × 8 (or 8 × 4), the block is divided into 2 sub-partitions [3]. Given an input W × H block, the size of each sub-partition is W × (H/K) for a horizontal split and (W/K) × H for a vertical split, where K is the number of sub-partitions, and W and H represent the width and height of the block, respectively, as shown in Fig. 1. In VVC, each sub-partition is processed in a way similar to an intra prediction block. At the encoder side, the residual signal is generated based on the prediction signal, and then the residual signal is transformed, quantized, entropy encoded, and sent to the decoder. At the decoder side, the residual signal is recovered after a series of steps including entropy decoding, inverse quantization, and inverse transformation, and the reconstructed samples are obtained by adding the residual signal to the prediction signal.
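The partitioning rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the text, not code from the VTM reference software; the function name `isp_subpartitions` and the string arguments are hypothetical, and block dimensions are assumed to be powers of two of at least 4.

```python
def isp_subpartitions(width, height, split):
    """Return (K, sub_width, sub_height) for an ISP split, or None if the
    block cannot use ISP under the rules stated in the text.

    split is "horizontal" or "vertical" (hypothetical argument names).
    """
    if width > 64 or height > 64:
        return None  # larger than the 64 x 64 maximum block size
    if width * height < 32:
        return None  # smaller than 4 x 8 / 8 x 4, e.g. a 4 x 4 block

    # 4 x 8 and 8 x 4 blocks get 2 sub-partitions, larger blocks get 4.
    k = 2 if (width, height) in {(4, 8), (8, 4)} else 4

    if split == "horizontal":
        if height % k:
            return None
        sub_w, sub_h = width, height // k   # W x (H/K)
    elif split == "vertical":
        if width % k:
            return None
        sub_w, sub_h = width // k, height   # (W/K) x H
    else:
        raise ValueError("split must be 'horizontal' or 'vertical'")

    if sub_w * sub_h < 16:
        return None  # every sub-partition must hold at least 16 samples
    return k, sub_w, sub_h


print(isp_subpartitions(4, 8, "horizontal"))   # 4 x 8 block, 2 sub-partitions
print(isp_subpartitions(16, 16, "vertical"))   # 16 x 16 block, 4 sub-partitions
```

The sketch only reproduces the size arithmetic; any further restrictions applied by a real encoder are outside its scope.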
Once a sub-partition has been processed, its reconstructed samples can be used to calculate the prediction signal of the next sub-partition, and this step is repeated until all sub-partitions have been encoded [4]. As shown in Fig. 2, the block has been split horizontally into four sub-partitions. The first one is predicted using the neighboring samples of the CU. Afterwards, its reconstructed samples are used to predict the next sub-partition. The procedure continues until the four sub-partitions have been processed. The advantage of the ISP coding mode is that each sub-partition can be predicted using neighboring samples located at the shortest possible distance; therefore, the coding efficiency is improved.
Although the ISP coding mode can improve coding efficiency, it still has some deficiencies. In this paper, by analyzing the rate-distortion cost of CUs with different texture characteristics in ISP coding mode, it is observed that the coding efficiency of CUs with simple texture is not noticeably improved by the ISP coding mode, although extra computation is spent on it. Based on this observation, a fast decision algorithm for the ISP coding mode based on CU texture complexity is proposed, which determines in advance whether a CU needs to test the ISP coding mode by calculating the CU texture complexity, so as to reduce the computational complexity of ISP with almost no loss of coding efficiency. In addition, a method for calculating the CU texture complexity based on interval sampling is designed, and the decision threshold is optimized by using the distortion cost. The remainder of this paper is organized as follows. In Section 2, related work on ISP and on video coding algorithms based on texture features is presented. In Section 3, the existing problems of the ISP coding mode are analyzed and a fast ISP coding mode optimization algorithm based on CU texture complexity is proposed. Experimental results are shown in Section 4. Finally, conclusions are drawn in Section 5.

Related works
The ISP coding mode is developed from the line-based intra prediction (LIP) coding mode, which was first presented at the Kth meeting of JVET [5]. The main idea of LIP is to divide the luma intra prediction block into one-dimensional lines. Each block needs to be partitioned using the LIP coding mode. In the AI configuration, compared to the standard reference encoder, the BD-rate can be reduced by 2.34% on average, while the encoder running time increases to 293%. A 4 × 4 block can be divided into four 4 × 1 lines, which can cause throughput problems. If all blocks are 1 × 4 (or 4 × 1), this kind of division leads to a bad bit stream. If the number of resulting lines for a block is large (for example, 64 rows), the encoder needs to perform many operations and memory accesses while checking the necessary rate-distortion (RD) tests. Column sub-partitions (1 × N) can be more difficult to implement because samples are stored in raster-scan order, which makes memory access expensive. To solve the above problems, it was proposed at the Lth meeting of JVET that each block should be divided into a fixed number of partitions (each partition having at least 16 samples) and that the final partition width should be at least 4 samples [6]. In the AI configuration, the BD-rate can be reduced by 1.01% on average, while the encoder running time becomes 148%, which successfully reduces the complexity of the encoder. After this meeting, LIP was officially renamed ISP. At the Mth meeting of JVET [7], the ISP algorithm was optimized to achieve a better balance between coding gain and encoder running time. The experimental results show that in the AI configuration, the BD-rate can be reduced by 0.59% on average, while the encoder running time becomes 112%.
By analyzing the experimental results of these proposals, we have found that the ISP coding mode can significantly improve the BD-rate for coding blocks with rich texture, but it is inefficient for coding blocks with simple texture, which inspired the idea of this paper.
In recent years, many new tools have been developed that noticeably improve image processing performance [8][9][10][11]. However, considering the hardware cost and computational complexity, many of them are not applicable to video coding. Image texture characteristics are a lightweight tool and have been applied by many scholars to optimize video coding. Shen et al. [12] proposed an effective CU size decision algorithm, which utilizes the texture characteristics and coding information of adjacent CUs. Experimental results show that the algorithm can significantly reduce the intra coding time and achieve consistent acceleration for all kinds of video sequences. To solve the problem of the high computational complexity of QTBT, Peng et al. [13] proposed a multiple classifier-based fast QTBT partitioning algorithm for intra coding. The features used by the classifiers are mainly based on texture complexity and direction complexity. The algorithm can reduce the intra coding time by 64.54%, and the reduction of RD performance is negligible. Liu et al. [14] proposed an adaptive fast CU size decision algorithm based on CU complexity classification for HEVC intra prediction using machine learning, in which some image features are selected for training the classifiers. Hou et al. [15] proposed a method based on texture complexity for CU partitioning, which uses the complexity of adjacent CUs to judge the texture features of the current CU. On this basis, unnecessary CU sizes can be filtered out to reduce the encoding time. A fast mode decision algorithm based on texture partition and direction in HEVC was proposed in [16], which includes two sub-algorithms: CTU depth range prediction (CDRP) and internal prediction mode selection (IPMS). Under the AI configuration, the coding time of the proposed overall algorithm is reduced by 60% on average.
In [17], the problem of CU size decision and mode selection in HEVC intra coding is addressed by measuring the texture complexity and directional energy distribution of each CU, which accelerates the RDO process. An effective algorithm based on homogeneity for reducing the complexity of HEVC intra coding was proposed in [18]. The scheme aims to terminate the CU partitioning of homogeneous regions in video frames in advance, or to skip the CU partitioning of complex texture regions. The split/non-split decision is based on a homogeneity classification algorithm that avoids testing all depths to determine the optimal CU size. In [19], a new fast intra frame coding algorithm is presented. Based on the analysis of the mode parameters obtained in the previous frame, a new feature is proposed to measure the complexity of video content, and a model is built according to the relationship between the feature and the depth range of the CU. Based on the model, unnecessary CU partitioning operations are skipped. In [20], a complexity reduction algorithm based on hierarchical classification for HEVC inter coding is proposed. At the beginning of the algorithm, the intra and inter features describing the texture and context properties of CUs are obtained from the training set, and then the classification model is generated by selecting features.
To the best of our knowledge, no related work has so far adopted the texture characteristics of CUs to optimize the ISP coding mode in VVC, and the solution proposed in this paper is the first to do so.

Proposed algorithm
The ISP coding mode can improve the intra prediction efficiency of VVC. However, according to our observations and analysis, when the texture of a CU is very flat or constant, the coding efficiency of the ISP coding mode degrades noticeably. A fast ISP coding mode optimization scheme based on CU texture complexity is proposed in this section, which determines in advance whether a CU needs to test the ISP coding mode based on an analysis of the CU texture complexity, so as to save coding time with a negligible loss of coding gain.

Observation and analysis
The key to improving the encoder performance is to select the best coding parameters. The search for the best coding parameters is traditionally performed in the rate-distortion (RD) sense, which balances the number of bits used to encode an image block against the distortion produced by using this number of bits. In the rate-distortion optimization process of intra prediction, RD-cost is an excellent metric to measure prediction performance. The final partition mode is determined by comparing the RD-cost of the different partition modes. The RD-cost function J is calculated as follows [21]:
$$J = D + \lambda R \tag{1}$$
where D is the distortion, R is the number of bits required to transmit the residual coefficients and signaling parameter information, and λ is the Lagrange multiplier.
In this paper, the RD-cost of each CU is calculated both with the ISP coding mode and without it (No-ISP). The sequences used for statistics are BasketballDrill, RaceHorses, BasketballPass, BlowingBubbles, FourPeople, and Kimono. The RD-cost of a CU in ISP coding mode is denoted as $J_{\text{ISP}}$, and the RD-cost of a CU in No-ISP coding mode is denoted as $J_{\text{No-ISP}}$. As shown in Fig. 3, the percentage of CUs for which $J_{\text{ISP}}$ is greater than $J_{\text{No-ISP}}$ is 40%, which indicates that not all CUs are suitable for the ISP coding mode. It is found in this paper that the CUs whose RD-cost in ISP coding mode is higher than that in No-ISP coding mode share a common characteristic: most of them have smooth texture. Therefore, for CUs with simple texture, fairly good prediction performance can be obtained in No-ISP coding mode. Figure 4 shows the first frame of the FourPeople sequence. It can be seen that blocks A and B have rich texture, while the textures of blocks C and D are very smooth. The comparison of the RD-cost in ISP and No-ISP coding mode for these four blocks is shown in Fig. 5.
It is obvious that the RD-cost in ISP coding mode for block A and block B is significantly lower than that in No-ISP coding mode. However, the RD-cost in ISP coding mode for block C and block D is slightly higher than that in No-ISP coding mode.
In summary, for some CUs with simple texture, better prediction results are obtained in No-ISP coding mode than in ISP coding mode. Therefore, for this kind of CU, the ISP coding mode can be skipped in advance to save coding time. In the next subsection, a method to measure the texture complexity of a CU is presented to determine whether the CU needs to test the ISP coding mode or not.
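The comparison carried out in this subsection can be illustrated with a short Python sketch, assuming the usual Lagrangian form J = D + λR for the RD-cost. The function names, the dictionary keys, and all numeric values below are hypothetical and chosen only to show the mechanics; they are not measurements from the test sequences.

```python
def rd_cost(distortion, bits, lam):
    """Lagrangian RD-cost J = D + lambda * R (assumed form)."""
    return distortion + lam * bits


def fraction_isp_worse(cus, lam):
    """Fraction of CUs whose RD-cost with ISP exceeds the RD-cost without ISP."""
    worse = 0
    for cu in cus:
        j_isp = rd_cost(cu["d_isp"], cu["r_isp"], lam)
        j_no_isp = rd_cost(cu["d_no_isp"], cu["r_no_isp"], lam)
        if j_isp > j_no_isp:
            worse += 1
    return worse / len(cus)


# Made-up example values, not data from the paper.
example_cus = [
    # textured CU: ISP lowers the distortion enough to pay for the extra bits
    {"d_isp": 100.0, "r_isp": 40, "d_no_isp": 180.0, "r_no_isp": 30},
    # flat CU: ISP barely changes the distortion but costs more bits
    {"d_isp": 20.0, "r_isp": 12, "d_no_isp": 21.0, "r_no_isp": 8},
]
print(fraction_isp_worse(example_cus, lam=2.0))
```

In this toy input the flat CU is the one for which ISP gives the higher cost, which mirrors the behavior reported for blocks C and D.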

Metric for measuring CU texture complexity
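To determine whether a CU has simple or complex texture, a suitable metric is needed. The mean absolute deviation (MAD) is the most representative metric; it is the average of the absolute deviations of a set of data from their mean. It can be used as an indicator to measure the texture complexity of a CU. MAD is calculated as follows:
$$\mathrm{MAD} = \frac{1}{\mathit{width} \times \mathit{height}} \sum_{i=1}^{\mathit{width}} \sum_{j=1}^{\mathit{height}} \left| P(i,j) - \mathit{mean} \right| \tag{2}$$
where width and height are the width and height of the CU, respectively, P(i,j) is the luminance value of the pixel at (i,j), and mean is the average luminance value of all pixels in the CU.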
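As a minimal sketch of the MAD metric described above, the following Python function computes it for a block given as a list of rows of luminance values. The function name and the input layout are assumptions made for this illustration.

```python
def mean_absolute_deviation(block):
    """MAD of a CU given as a list of rows of luminance values."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    return sum(abs(p - mean) for p in pixels) / len(pixels)


flat_block = [[128] * 4 for _ in range(4)]   # constant block
mixed_block = [[0, 10], [10, 0]]             # small block with some variation
print(mean_absolute_deviation(flat_block))
print(mean_absolute_deviation(mixed_block))
```

A constant block gives a MAD of zero, and the value grows as the luminance values spread away from their mean.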
To further reduce the computational complexity, the calculation of MAD is simplified in this paper. Instead of using all pixel values, an interval sampling method is used to select the pixels. The sampling method is shown in Fig. 6. Odd points are sampled from odd rows of pixels, and even points are sampled from even rows of pixels. This sampling method not only halves the computation, but also measures the texture complexity accurately. The formula for calculating the texture complexity (TC) used in this paper is shown in Formula (3).
where width and height are the width and height of CU, respectively, and P(i,j) is the luminance value of pixel at (i,j), and mean is the average of all sampled pixels. According to the value of TC, CUs will be classified into two categories: simple texture and complex texture. When TC is less than the given threshold ε, the current CU is classified as simple texture. When TC is greater than ε, the CU is classified as complex texture. ε is the threshold to judge the texture complexity of CUs.
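The interval-sampled measure can be sketched in Python as follows. The sketch assumes that TC is the mean absolute deviation taken over the sampled pixels only (odd positions in odd rows and even positions in even rows, that is, positions whose row and column indices have the same parity) and that it is normalized by the number of sampled pixels. This normalization, the function names, and the input layout are assumptions of this illustration.

```python
def texture_complexity(block):
    """TC of a CU (list of rows of luminance values) with checkerboard sampling.

    A pixel is sampled when its row and column indices have the same parity,
    which keeps half of the pixels.
    """
    sampled = [
        p
        for i, row in enumerate(block)
        for j, p in enumerate(row)
        if (i + j) % 2 == 0
    ]
    mean = sum(sampled) / len(sampled)  # mean of the sampled pixels only
    return sum(abs(p - mean) for p in sampled) / len(sampled)


def classify_texture(block, eps):
    """Label a CU as 'simple' when TC is below the threshold eps, else 'complex'."""
    return "simple" if texture_complexity(block) < eps else "complex"


block = [
    [10, 0, 30, 0],
    [0, 20, 0, 40],
]
print(texture_complexity(block))       # only 10, 30, 20, 40 are sampled
print(classify_texture(block, eps=20))
```

Because only one pixel in two is visited, the cost per CU is roughly half that of the full MAD.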
Threshold ε plays a key role in the proposed algorithm. Coding efficiency can be improved by selecting an appropriate threshold. According to Section 3.1, CUs with simple texture, whose RD-cost in ISP coding mode is usually higher than that in No-ISP coding mode, are not suitable for the ISP coding mode, whereas CUs with complex texture, whose RD-cost in ISP coding mode is usually lower than that in No-ISP coding mode, are suitable for it. We perform a statistical analysis of the CU luminance samples in the test video sequences, and the TC of each CU is calculated to indicate its texture complexity. In the sequences used for statistics, the CUs are divided into two groups. The first group contains the CUs that are not suitable for the ISP coding mode. The second group contains the CUs that are suitable for the ISP coding mode. The threshold value of the CU texture complexity is determined from the probability density of TC in the two groups, which is shown in Fig. 7. As can be seen from the figure, when the TC value is less than 20, the probability density of TC in the first group is always greater than that in the second group. When TC is greater than 20, the probability density of TC in the first group is always smaller than that in the second group. Therefore, in this paper, ε is set to 20. When the TC value of a CU is less than 20, the ISP coding mode is skipped in advance during coding. In this paper, the texture complexity distribution of the CUs whose RD-cost in ISP coding mode is greater than that in No-ISP coding mode is also calculated. As shown in Fig. 8, the proportion of CUs with simple texture is 80%, which further supports the soundness of the proposed algorithm.
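The threshold selection described above can be illustrated with a small Python sketch that estimates the TC density of the two groups with histograms and returns the lower edge of the first bin in which the density of the first group falls below that of the second. The binning, the function names, and the TC samples are all assumptions; the samples are synthetic and were chosen so that the crossover falls at 20.

```python
def density(values, bin_width, n_bins):
    """Histogram-based probability density estimate over [0, n_bins * bin_width)."""
    counts = [0] * n_bins
    for v in values:
        idx = min(int(v // bin_width), n_bins - 1)  # clamp large values to last bin
        counts[idx] += 1
    return [c / (len(values) * bin_width) for c in counts]


def crossover_threshold(tc_skip, tc_keep, bin_width=5, n_bins=20):
    """Lower edge of the first bin where the 'not suitable for ISP' density
    drops below the 'suitable for ISP' density."""
    d_skip = density(tc_skip, bin_width, n_bins)
    d_keep = density(tc_keep, bin_width, n_bins)
    for b in range(n_bins):
        if d_skip[b] < d_keep[b]:
            return b * bin_width
    return n_bins * bin_width  # no crossover found in the covered range


# Synthetic TC values, not data from the paper.
tc_group1 = [2, 4, 7, 9, 12, 14, 16, 18, 22, 31]    # CUs not suitable for ISP
tc_group2 = [8, 17, 21, 23, 26, 28, 33, 37, 41, 55]  # CUs suitable for ISP
print(crossover_threshold(tc_group1, tc_group2))
```

With real statistics the two density curves are noisy, so a smoothed estimate or a visual check of the curves, as done with Fig. 7, would be needed before fixing the threshold.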

Flow of the proposed algorithm
In Section 3.1, we have drawn the conclusion that the performance of the ISP coding mode is closely related to the CU texture complexity and that a CU with simple texture can obtain fairly good coding performance in No-ISP coding mode. In Section 3.2, a metric named TC is designed to measure the CU texture complexity, and the decision threshold is set to 20. According to their TC values, CUs are divided into two groups: simple texture and complex texture. If the current CU is classified as having simple texture, the ISP coding mode is not tested. Otherwise, if the current CU is classified as having complex texture, the ISP coding mode is tested on it using the original procedure.
In short, a decision step is added in front of the original ISP algorithm. Before testing ISP, the TC value of the current CU is calculated. If TC < 20, the ISP flag is set to 0 and the ISP coding mode is not tested for this CU. Otherwise (TC ≥ 20), the algorithm continues with the original ISP coding mode.
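The decision step can be sketched in Python as below. The function names and the mode labels are hypothetical, and the sketch assumes that a CU whose TC is exactly equal to the threshold still tests ISP, since the text only states the behavior for values below and above 20.

```python
TC_THRESHOLD = 20  # threshold value reported in the text


def isp_flag(tc, threshold=TC_THRESHOLD):
    """Return 0 to skip the ISP test for this CU, 1 to run the original ISP test.

    Assumption: TC equal to the threshold is treated as complex texture.
    """
    return 0 if tc < threshold else 1


def modes_to_test(tc, threshold=TC_THRESHOLD):
    """List the intra coding paths to evaluate for a CU with the given TC."""
    modes = ["No-ISP"]
    if isp_flag(tc, threshold) == 1:
        modes.append("ISP")
    return modes


print(modes_to_test(5.0))    # simple texture: ISP is skipped
print(modes_to_test(35.0))   # complex texture: ISP is tested as usual
```

The early decision costs only one TC computation per CU, which is small compared with the RD tests that the ISP coding mode would otherwise trigger.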

Experimental results and discussion
To verify the performance of the proposed algorithm, it is implemented on the VVC reference software (VTM8.0). Since the proposed algorithm is used to accelerate intra coding, the All Intra (AI) configuration is adopted and the QP values are set to 22, 27, 32, and 37 [22]. In our work, standard HEVC test sequences are selected, which cover different scenes and different resolutions. The Bjøntegaard Delta rate (BD-rate) with the piece-wise cubic interpolation method is used to assess the coding performance of the proposed algorithm. Time saving (TS) is used to measure the reduction of computational complexity and is defined as:
$$TS = \frac{T_{\text{VTM8.0}} - T_{\text{proposed}}}{T_{\text{VTM8.0}}} \times 100\% \tag{4}$$
where $T_{\text{VTM8.0}}$ and $T_{\text{proposed}}$ are the encoding times of VTM8.0 and of the proposed algorithm, respectively.
The performance comparison between VTM8.0 and the proposed algorithm is shown in Table 1, which presents the average BD-rate of five classes of video sequences. In the AI configuration, the proposed algorithm achieves about 7% coding time saving with negligible loss of coding efficiency. It can also be observed that a consistent gain is obtained over all sequences. The largest gain comes from the sequence "BasketballDrill", with up to 11% time saving. Since "BasketballDrill" contains a large number of flat or constant blocks that are classified as simple texture, it can be encoded more efficiently by the proposed algorithm. Sequences with similar characteristics, such as "BasketballDrive" and "Johnny", all show remarkable time saving. The proposed method does not perform well for sequences containing rich texture, such as "PartyScene" and "BQTerrace".
As one of the main optimization goals of VVC is to improve the coding efficiency of ultra-high definition/high definition (UHD/HD) videos, it is worth noting that the proposed algorithm generally performs better on UHD/HD sequences than on other sequences. This is because the proposed fast ISP coding mode optimization algorithm based on texture complexity performs better on simple texture blocks, and simple texture blocks often make up a larger portion of UHD/HD sequences. Favoring high-resolution sequences is a prominent advantage for future applications. Figure 9 compares the rate-distortion (RD) curves of the proposed algorithm and VTM8.0 for the BQMall (832 × 480), BlowingBubbles (832 × 480), FourPeople (1280 × 720), and BasketballDrive (1920 × 1080) sequences, which include the best case (BlowingBubbles) and the worst case (BasketballDrill) in terms of RD performance. The results show that the proposed algorithm is superior to VTM8.0 for most sequences, at both low and high bitrates. Even in the worst case, the proposed algorithm and the original VTM reference encoder obtain very similar image quality under different QPs.
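The time-saving measure used in this section can be computed as in the following Python sketch, assuming the common definition of TS as the relative reduction in encoding time with respect to the reference encoder, averaged over the four QPs. The function names and the timing values are hypothetical and are not taken from Table 1.

```python
def time_saving(t_ref, t_prop):
    """Relative encoding-time reduction in percent (assumed definition of TS)."""
    if t_ref <= 0:
        raise ValueError("reference encoding time must be positive")
    return 100.0 * (t_ref - t_prop) / t_ref


def average_time_saving(times_by_qp):
    """Average TS over a mapping QP -> (reference time, proposed time)."""
    savings = [time_saving(t_ref, t_prop) for t_ref, t_prop in times_by_qp.values()]
    return sum(savings) / len(savings)


# Made-up encoding times in seconds for one sequence.
timings = {
    22: (1200.0, 1128.0),
    27: (1000.0, 930.0),
    32: (800.0, 736.0),
    37: (600.0, 558.0),
}
print(average_time_saving(timings))
```

Averaging the per-QP percentages, rather than summing the times first, keeps the low-QP runs from dominating the result because they take the longest to encode.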

Conclusion
A fast ISP coding mode optimization algorithm based on CU texture complexity is proposed in this paper to reduce the computational complexity of ISP. Firstly, by comparing the RD-cost of CUs with different texture complexity in ISP coding mode and No-ISP coding mode, it is observed that CUs with simple texture are not suitable for the ISP coding mode. Secondly, a metric to measure the CU texture complexity is proposed, which divides CUs into simple texture and complex texture. Finally, the ISP coding mode is skipped for CUs with simple texture. The proposed method is tested on the VVC reference software VTM8.0; the experimental results show that the proposed algorithm achieves about 7% coding time saving with negligible loss of coding efficiency.