Optical flow correction network for repairing video

Some areas in a video may be defective due to storage medium damage or data transmission loss. These defective areas can be repaired using optical flow, since optical flow represents pixel motion and exploits the information redundancy inherent in video. However, current algorithms cannot directly compute the correct optical flow in defective areas. Therefore, an optical flow correction network (OFC-Net) is proposed to solve this problem. Firstly, as its basic framework, OFC-Net employs a lightweight U-shaped structure to gradually "repair" the broken optical flow map based on the known information in the neighborhood of the defect. An activation function suited to the properties of optical flow is used to accelerate convergence during training, and a simplified loss function evaluates the similarity of the optical flow map from a global perspective to improve prediction accuracy. As a result, corrected optical flow is obtained at the masked positions. Secondly, the optical flow corrected by OFC-Net guides the location of corresponding pixels in nearby frames to repair the defective region. Optical flow correction and video repair experiments were conducted on the MPI-Sintel and DAVIS datasets and on real movie clips. The experimental results show that the corrected optical flow maps are accurate and artifact-free across a wide range of scenes and arbitrary irregular mask shapes, demonstrating the network's strong generalization capability. The average PSNR and SSIM of the repaired video frames exceed 33.67 and 0.986, respectively, outperforming the traditional PatchMatch method and deep learning-based pixel synthesis methods.


Introduction
Early videos, typically stored on physical media such as film, may have been damaged by improper storage, and pixel information in modern digital video may be lost due to packet loss. Repairing these defective areas is of great significance for preserving valuable historical films and improving video quality in daily life. A variety of strategies have been developed for video repair, including pixel diffusion [1][2][3], search and matching [4][5][6], and deep learning-based image/video methods [7][8][9]. These techniques concentrate on repairing single frames and have achieved impressive results. Recent studies [10][11][12] show that deep learning methods have excellent repair capabilities. Furthermore, video repair and related techniques can also be used for tasks such as watermark removal [13], object removal [14], and movie special effects [15].
Video is characterized by a large number of frames, and corresponding pixels in nearby frames are strongly correlated. Single-frame repair struggles to maintain spatiotemporal consistency, so optical flow-based video repair has attracted great attention because it fills the defective area with the corresponding pixel information from nearby frames and thereby guarantees spatiotemporal consistency. The key problem is then how to obtain accurate optical flow in the defective area. The classical methods for extracting optical flow are the sparse Lucas-Kanade (LK) algorithm [16] and the dense Horn-Schunck (HS) algorithm [17], which rely on the assumptions that motion is locally consistent and that the grayscale of corresponding pixels in nearby frames is constant. Since a destroyed region violates the grayscale constancy assumption, these methods cannot directly extract the correct optical flow there. With the success of neural networks, many researchers have developed networks to predict optical flow, such as FlowNet 2.0 [18], LiteFlowNet [19], and RAFT [20]. Although these methods improve accuracy and greatly accelerate computation, like the traditional algorithms they cannot directly obtain the correct optical flow in the defective areas of a video frame.
To solve this problem, some researchers [21][22][23] recovered the optical flow of defective areas using FlowNet 2.0 and bilinear interpolation. This approach is simple and efficient and achieves high optical flow accuracy in monotonous backgrounds, but may cause blur in complicated areas. Using ResNet50 as the backbone, Xu et al. [24] designed an optical flow recovery network composed of three subnetworks that synthesizes the optical flow of defective frames from a global perspective and avoids the blurred contours caused by interpolation. However, the model structure is complicated, and two nearby optical flow maps must be input simultaneously to correct the optical flow of the defective area.
The optical flow map is smooth and locally consistent, which makes it feasible to infer the accurate optical flow in the defective area from the known information in its neighborhood. Therefore, we propose an optical flow correction network (OFC-Net) that recovers the optical flow in the defective area, based on a partial convolutional U-shaped structure [8,25] originally used for image repair and effective on irregularly masked images. Compared with the existing model, a series of structural and functional modifications are made in OFC-Net to suit the goals of this paper, so that the corrected optical flow map has clear outlines without blur. We then design a video repair algorithm that, guided by the corrected optical flow map, locates pixels in nearby frames to repair the defective area, and alternately updates the grayscale and optical flow in the defective area to gradually refine the result. Experiments demonstrate that OFC-Net can correct anomalous optical flow in irregular areas and generalizes well.
In summary, the main contributions of this paper are as follows: (1) The optical flow correction network. The model's basic framework is a lightweight U-shaped structure with input and output layers specifically designed for optical flow data. The optical flow in abnormal areas is corrected based on the local consistency of the optical flow map. (2) Specific activation and loss functions. The LeakyReLU function is employed to accelerate model convergence, and the LPIPS loss is utilized to concentrate on the global similarity of the optical flow map, which enables the recovery of optical flow in complex regions. (3) An iterative repair algorithm. Using the optical flow map obtained by OFC-Net as guidance, our algorithm directly extracts pixels from nearby frames to repair the defective area. The repair result is then steadily improved by alternately updating the optical flow and pixels in the defective area until a convergence condition is reached.

OFC network structure
The overall structure of OFC-Net is depicted in Fig. 1. OFC-Net consists of an input layer for optical flow data, 5 downsampling layers, 4 upsampling layers, and an output layer. The input layer takes an independent x or y optical flow component to reduce the data dimension; that is, the x and y components are corrected separately and then combined into a fully corrected optical flow map. The downsampling path predicts the optical flow in defective areas by capturing contextual information.
To ensure accurate prediction and preserve the original information, at each upsampling step the input is combined with the feature map from the corresponding downsampling layer. The output layer recovers the final corrected optical flow map.
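To make the per-component processing concrete, a minimal sketch is given below. The wrapper name is illustrative, and the assumption that the network takes a (component, mask) pair and returns a (corrected component, updated mask) pair is ours, not a detail stated in the paper.

```python
import torch

# Hypothetical wrapper: `ofc_net` is assumed to take one flow component of
# shape (N, 1, H, W) plus a binary validity mask (1 = known, 0 = defective)
# and to return the corrected component together with an updated mask.
def correct_flow_map(ofc_net, flow, mask):
    """flow: (N, 2, H, W) optical flow map; mask: (N, 1, H, W) validity mask."""
    corrected = []
    for c in range(2):                       # x component, then y component
        comp = flow[:, c:c + 1]              # keep the channel dimension
        comp_fixed, _ = ofc_net(comp, mask)  # U-shaped partial-conv network
        corrected.append(comp_fixed)
    return torch.cat(corrected, dim=1)       # reassembled (N, 2, H, W) flow map
```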

Up and downsampling module
The up- and downsampling modules are shown in Fig. 2. The downsampling module consists of a partial convolutional (PConv) layer, a batch normalization (BN) layer, and a nonlinear activation function. The PConv layer uses a stride of 2 so that it performs downsampling; the convolution kernel sizes of the first two downsampling modules are 7×7 and 5×5, respectively, and the rest are 3×3. The optical flow map is downsampled as it passes through the PConv layer, normalized by the BN layer, and then passed through the LeakyReLU nonlinearity. In contrast, the kernel size of the PConv layers is 3×3 in all upsampling modules. In addition, each upsampling module has an extra upsample layer to enlarge the optical flow map, since the PConv layer with a stride of 1 does not change the data dimension. The upsampled data are combined with the feature map from the corresponding downsampling layer before being fed into the PConv layer. ReLU, which is zero on the negative semi-axis, is often adopted as the activation function in image repair networks; however, optical flow values lie in the range [−x, +x] (where x is the maximum pixel displacement, usually between 0 and 10), so ReLU does not respond to negative optical flow inputs and is therefore unsuitable for OFC-Net. Thus, LeakyReLU is chosen as the activation function in OFC-Net so that the model parameters can still be updated for negative inputs without neuron death.
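A minimal sketch of one downsampling block is given below, assuming the standard partial convolution formulation of Liu et al. [8,25]; the class name, channel arguments, and the LeakyReLU negative slope are illustrative choices, not values reported in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PConvDown(nn.Module):
    """Sketch of a downsampling block: partial convolution (stride 2), BN, LeakyReLU."""
    def __init__(self, in_ch, out_ch, kernel_size):
        super().__init__()
        pad = kernel_size // 2
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, stride=2,
                              padding=pad, bias=True)
        # Fixed all-ones kernel, used only to count valid pixels under each window.
        self.register_buffer(
            "ones_kernel", torch.ones(1, 1, kernel_size, kernel_size))
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.LeakyReLU(0.2)  # slope is an assumption; keeps negative flow responsive

    def forward(self, x, mask):
        # Convolve only the known pixels, then renormalize by the number of
        # valid positions under each kernel window (partial convolution).
        out = self.conv(x * mask)
        with torch.no_grad():
            valid = F.conv2d(mask, self.ones_kernel, stride=2,
                             padding=self.conv.padding[0])
        bias = self.conv.bias.view(1, -1, 1, 1)
        scale = self.ones_kernel.numel() / valid.clamp(min=1e-8)
        out = (out - bias) * scale + bias
        new_mask = (valid > 0).float()
        out = out * new_mask          # zero out positions with no valid input
        return self.act(self.bn(out)), new_mask
```

The upsampling module would follow the same pattern with a stride-1 PConv preceded by an `nn.Upsample` layer and concatenation with the corresponding downsampling feature map.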

Loss function
In image repair networks, typical loss functions such as L1, L2, and MSE are often employed to improve pixel-level accuracy. Additionally, perceptual terms such as the VGG loss [26] and TV loss are frequently used to capture global features. To balance pixel accuracy, smoothness, and perceptual quality, combined loss functions such as (L1 + VGG + Style + TV) are usually adopted, but they are more complex and therefore increase training time. In fact, compared with images containing complex textures, optical flow maps, which represent the motion vectors of pixels in video frames, are quite smooth and relatively monotonous except in areas such as object boundaries, so such complex loss functions are not appropriate for correcting them. OFC-Net therefore needs a loss function that concentrates on the overall similarity of the optical flow map and on motion boundary details. Comparative experiments with different types of loss functions show that LPIPS [27] is the most suitable for OFC-Net and yields the best optical flow correction. The expression of LPIPS is

$$\mathrm{LPIPS}(x, x_0) = \sum_{l} \frac{1}{H_l W_l} \sum_{h,w} \left\| \hat{y}^{l}_{hw} - \hat{y}^{l}_{0hw} \right\|_2^2,$$

where $x$ and $x_0$ represent the corrected optical flow map and the ground truth, respectively; $\hat{y}^{l}_{hw}$ and $\hat{y}^{l}_{0hw}$ are the feature outputs of the two maps after the $l$-th layer; and $H_l$ and $W_l$ are the height and width of the $l$-th layer feature maps. LPIPS thus computes the average L2 difference across all layers.
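As a rough illustration (not necessarily the authors' implementation), the publicly available `lpips` package could be used to apply this loss to single-channel flow components. The tiling to three channels and the normalization by an assumed maximum displacement `max_disp` are our own conventions for the sketch.

```python
import torch
import lpips  # pip install lpips (Zhang et al.'s reference implementation)

loss_fn = lpips.LPIPS(net='vgg')  # LPIPS expects 3-channel inputs in [-1, 1]

def lpips_flow_loss(pred, target, max_disp=10.0):
    """pred, target: (N, 1, H, W) corrected / ground-truth flow components."""
    pred3 = (pred / max_disp).clamp(-1, 1).repeat(1, 3, 1, 1)
    tgt3 = (target / max_disp).clamp(-1, 1).repeat(1, 3, 1, 1)
    return loss_fn(pred3, tgt3).mean()
```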

Datasets and parameters
To improve the generalization ability and robustness of OFC-Net, the experimental data consist of 58,405 frames containing intricate scenes from the DAVIS dataset, the MPI-Sintel dataset, and real movie clips. After data augmentation such as random slicing and flipping, the optical flow data extracted from the three types of video frames are used as the training and test sets. Our experiments were performed in PyTorch on an RTX A5000 24 GB graphics card, and the model parameters were optimized with the Adam optimizer. The model was trained for a total of 40 epochs with a batch size of 16 and an initial learning rate of 2 × 10⁻⁴, decayed by a factor of 0.9 every 2 epochs.
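The reported schedule could be reproduced roughly as follows; `model`, `train_loader`, and `criterion` are placeholders (the criterion standing in for the LPIPS-based loss sketched above), not names from the paper.

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import StepLR

optimizer = Adam(model.parameters(), lr=2e-4)          # initial learning rate 2e-4
scheduler = StepLR(optimizer, step_size=2, gamma=0.9)  # decay by 0.9 every 2 epochs

for epoch in range(40):                                # 40 epochs in total
    for flow_comp, mask, target in train_loader:       # batches of size 16
        optimizer.zero_grad()
        pred, _ = model(flow_comp, mask)               # corrected component, updated mask
        loss = criterion(pred, target)
        loss.backward()
        optimizer.step()
    scheduler.step()
```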

Influence of activation and loss functions
To test the effect of the activation function on the network, both ReLU and LeakyReLU were used during training, and the curves of loss value versus epoch are shown in Fig. 3. The network with LeakyReLU converges faster because its parameters can be updated for negative inputs without neuron death.
To find a loss suitable for our task, various loss functions were used for optical flow correction experiments on the test set; the results are illustrated in Fig. 4. Optical flow in the background is accurately corrected by OFC-Net regardless of the loss; however, different losses yield different results in complicated areas such as motion boundaries and contours. The total loss combines multiple terms used for image repair; although such a function performs well for image repair, its correction ability is inferior to that of a single loss for optical flow correction, and it takes longer to train. With L1 or MSE, OFC-Net corrects the optical flow in complicated areas with pixel-level accuracy but introduces blur. In summary, OFC-Net with LPIPS corrects the optical flow in various complex scene areas with a short training time and the best performance.

Optical flow correction comparison
To validate the performance of OFC-Net, it is compared with the main existing methods, namely interpolation and the Deep Flow Completion Network (DFC) [24]. Different datasets and randomly shaped masks were used to test each method's generalization capability and its ability to correct optical flow in irregular defective areas. The experimental results are shown in Fig. 5. The interpolation method (column b) inserts the optical flow of the spatial neighborhood directly into the defective area, resulting in smooth optical flow in the background but blurred contours at motion boundaries. This approach is simple and easy to use, but is suitable only for correcting optical flow in small defective areas. The deep learning-based DFC method (column c) synthesizes the defective frame's optical flow from the optical flow of nearby frames, producing clear outlines and compensating for the interpolation method's shortcomings. However, because this method relies primarily on inter-frame information and ignores spatial consistency in the defective area, the corrected optical flow contains artifacts and obvious boundaries with the surroundings. The proposed method (column d) combines the advantages of the other two approaches, providing distinct outlines while maintaining consistency with the surroundings.

Video repair algorithm
To simplify the description, we define the following variables: $f_0$, the frame to be repaired; $I$, the number of frames near $f_0$ available for repair, typically set to 2–4; $O_{0\to i}$, the optical flow from frame $f_0$ to frame $f_i$, where $i$ is in the range $[-I, I]$; and $mask_0$, the area to be repaired in frame $f_0$, where values of 0 and 1 represent normal and defective pixels, respectively. The detailed iterative repair algorithm is shown in Algorithm 1. Firstly, the initial optical flow $O_{0\to i}$ from the defective frame $f_0$ to frame $f_i$ is extracted with FlowNet 2.0, a pre-trained network with high efficiency that nevertheless cannot correctly obtain optical flow in defective areas. The erroneous optical flow inside $mask_0$ is therefore corrected with OFC-Net.
Secondly, under the guidance of the corrected optical flow, the corresponding pixels for a defective point $p_0$ can be located in frames $f_a$ and $f_b$ according to the forward and reverse optical flow $O_{0\to a}$ and $O_{0\to b}$, respectively. The grayscale of the point $p_0$ is updated as

$$G_0(p_0) = \frac{1}{2}\left[G_a\left(p_0 + O_{0\to a}(p_0)\right) + G_b\left(p_0 + O_{0\to b}(p_0)\right)\right],$$

where $p_0$ is a defective point in frame $f_0$, $G_0(p_0)$ represents the grayscale at position $p_0$ in frame $f_0$, and $G_a$ and $G_b$ represent the grayscales of frames $f_a$ and $f_b$, respectively. Finally, after the defective area is repaired, a more accurate optical flow field can be calculated, which in turn guides a more accurate repair. This iterative strategy of alternately updating the grayscale and the optical flow in the defective area drives the repair process toward convergence: $O_{0\to i}$ is re-extracted, and the second repair step is repeated until the defective area satisfies the convergence condition. The convergence measure is

$$\varepsilon = \frac{1}{M_0}\sum_{p_0 \in mask_0}\left|G_0(p_0) - G_0'(p_0)\right|,$$

where $M_0$ is the total number of defective pixels in frame $f_0$, $G_0'$ indicates the repaired result of the last iteration, and $\varepsilon$ represents the average grayscale change in the defective region during the repair process. The grayscale of the defective region is expected to stabilize as the number of iterations increases, so $\varepsilon$ should become as small as possible. Let $T_\varepsilon$ be the corresponding threshold of $\varepsilon$; the program terminates when $\varepsilon$ falls below $T_\varepsilon$ or the number of iterations reaches the maximum value $N$.
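A compact sketch of this loop is given below. The helper names `extract_flow` and `correct_flow` stand in for FlowNet 2.0 and OFC-Net, and the update here averages the candidates from all available nearby frames, a simplification of the two-frame update above; all names and details are illustrative.

```python
import numpy as np

def iterative_repair(frames, idx0, mask0, extract_flow, correct_flow,
                     I=3, T_eps=0.1, N=6):
    """Hypothetical sketch of the iterative repair loop (Algorithm 1).
    frames: list of grayscale frames; mask0: bool array, True = defective pixel;
    extract_flow(src, dst) -> (H, W, 2) flow field (e.g. FlowNet 2.0);
    correct_flow(flow, mask) -> flow with the masked region corrected (OFC-Net)."""
    f0 = frames[idx0].astype(np.float32).copy()
    h, w = f0.shape
    ys, xs = np.nonzero(mask0)
    for _ in range(N):
        prev = f0.copy()
        # Step 1: (re-)extract the flow to each nearby frame and correct it in mask0.
        flows = {i: correct_flow(extract_flow(f0, frames[idx0 + i]), mask0)
                 for i in range(-I, I + 1) if i != 0}
        # Step 2: pull pixels from nearby frames along the corrected flow and
        # average the candidates found for each defective point.
        for y, x in zip(ys, xs):
            vals = []
            for i, flow in flows.items():
                dx, dy = flow[y, x]
                yy, xx = int(round(y + dy)), int(round(x + dx))
                if 0 <= yy < h and 0 <= xx < w:
                    vals.append(frames[idx0 + i][yy, xx])
            if vals:
                f0[y, x] = np.mean(vals)
        # Convergence: average grayscale change inside the defective region.
        eps = np.abs(f0[mask0] - prev[mask0]).mean()
        if eps < T_eps:
            break
    return f0
```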

Video repair experiments
To determine the threshold $T_\varepsilon$ and the maximum number of iterations $N$, we performed a large number of iterative repair experiments on the DAVIS and MPI-Sintel datasets and on real movie clips; the variation of $\varepsilon$ with the number of iterations is shown in Fig. 6. Under normal circumstances, $\varepsilon$ approaches 0.1 once the number of iterations exceeds 3; thus, it is recommended to set $T_\varepsilon$ to 0.1 and $N$ to 6.
To evaluate the repair ability of the algorithm, we compared it with several other algorithms in repair experiments. In Fig. 7, the initial defective frames are displayed in column a; the first row is from the DAVIS dataset, the second and third rows are from the MPI-Sintel dataset, and the fourth and fifth rows are from old movie clips. The repair results obtained by PatchMatch, Liu's method [8], and the proposed method are depicted in columns b, c, and d, respectively. The PatchMatch result (column b) preserves the texture of the defective area, but there are obvious artifacts at the boundary. Liu's method (column c) uses the neighborhood information of the current frame to repair the defective area; when the defective area differs strongly from its neighborhood, the result is not ideal, for example, the defect at the hand is not correctly repaired. The proposed method (column d) acquires the pixels from nearby frames based on the corrected optical flow, so the defective region is not blurred and the spatiotemporal consistency of the inter-frame information is maintained. Damaged storage media can cause severe scratches and spots in old movies, and our algorithm repairs such defects well in smooth backgrounds, faces, limbs, and other areas.
The peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) are usually used to evaluate the local pixel details and global features of the reconstructed frame, respectively. The mean PSNR and SSIM of the repair results are calculated and shown in Table 1. From the table, it can be seen that Liu's method focuses on pixel details, while PatchMatch pays more attention to overall similarity. In contrast, the proposed method combines the advantages of both and achieves better PSNR and SSIM.
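For reference, per-frame values of both metrics can be computed with scikit-image as sketched below; this assumes 8-bit frames, and the exact evaluation protocol behind Table 1 may differ.

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def frame_metrics(repaired, ground_truth):
    """repaired, ground_truth: uint8 arrays of the same shape (H, W, 3) or (H, W)."""
    psnr = peak_signal_noise_ratio(ground_truth, repaired, data_range=255)
    ssim = structural_similarity(ground_truth, repaired, data_range=255,
                                 channel_axis=-1)  # omit channel_axis for grayscale
    return psnr, ssim
```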

Conclusion
This paper proposes a repair scheme based on corrected optical flow to address irregular random defects in video. To correct the optical flow in the defective area, we designed a novel OFC-Net structure and adopted activation and loss functions suited to the characteristics of optical flow to enhance its correction capability. During video repair, the interdependence between optical flow and inter-frame information is exploited to iteratively update the defective area, which improves the repair result. Experimental results show that OFC-Net achieves higher optical flow accuracy and lower time consumption than interpolation methods, and the iterative repair strategy maintains the spatiotemporal consistency of video frames without introducing blur and performs stably across different scenes.
Author contributions Fujie Huang and Bin Luo jointly completed the idea, experiments, and writing of the paper. Fujie Huang wrote the main manuscript text, and Bin Luo reviewed it.
Data availability All data generated or analyzed during this study are included in this article.

Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.