Ant weight-lifting algorithm for motion estimation

Every video coding standard includes and requires motion estimation and compensation. The full search algorithm, which provides the best motion estimation, has a very high computation cost. Researchers have developed several algorithms to reduce the cost of computation. However, most of these algorithms become trapped in local minima during the search. Population-based evolutionary algorithms are widely used to develop a computationally efficient and cost-effective motion estimation strategy. The most recent effort used the Jaya algorithm to develop a motion estimation process that outperformed the state-of-the-art test zone search algorithm. In this study, a motion estimation algorithm based on the ant weight-lifting approach is proposed. Previously, the ant weight-lifting algorithm was used to solve a variety of problems, such as image segmentation, signal compression, and so on. The ant weight-lifting algorithm's computation cost was reduced by adopting a fitness estimation method that uses nearest-neighbor interpolation and an early termination strategy. Compared to Jaya algorithm-based motion estimation, the proposed algorithm executes up to 3% more quickly and exhibits up to 1.2 dB less distortion.


Introduction
In the modern world, video is the most widely used multimedia format. Along with traditional high-definition video content, which has grown exponentially, recent years have seen a huge increase in non-traditional video content such as 360° videos and computer-generated video sequences. In 2020, the Joint Video Experts Team standardized versatile video coding (VVC), or H.266, for video storage and transmission [1] to accommodate such diverse video characteristics in the encoding process. VVC was created to provide a uniform video coding standard for these various video sequence types while also improving coding efficiency over its predecessor, the high-efficiency video coding standard, or H.265 [2].
Suvojit Acharjee (acharjeesuvo@gmail.com), Department of ETCE, Jadavpur University, Kolkata 700032, India
Motion estimation is an essential and computationally costly component of every video coding standard; it reduces the temporal redundancy between frames. Any video sequence can be thought of as a sequence of meaningful groups of pictures (GOP). As shown in Fig. 1, the first frame of any GOP, the I-frame, is only intra-coded to reduce spatial redundancy. The other frames in the GOP use block-matching algorithms for motion estimation, which reduces the temporal redundancy between frames. These remaining frames are divided into two categories. Predicted frames, or P-frames, are placed at regular intervals inside a GOP and use the I-frame or the previous P-frame as the reference frame for motion estimation. All other frames in a GOP are bidirectional frames, or B-frames, which use the previous I-frame or P-frame and the following P-frame as reference frames for motion estimation.
The block-matching algorithms in earlier coding standards such as H.263 [3] and H.264 [4] divide a frame into non-overlapping macroblocks and use these as the basic blocks for motion estimation. The pixels inside a macroblock are assumed to move uniformly: the macroblock occupied a different location in the previous frame, and motion brought its pixels to their current position. To find the best match for a block in the current frame, the full search algorithm places the search center at the same position in the reference frame as the macroblock in the current frame and evaluates every feasible macroblock in the reference frame that lies within d pixels of the search center. Because the full search uses brute force to examine every candidate, it delivers the best motion estimation at a huge computational cost. Several fast block motion estimation algorithms were developed to reduce the computational complexity of the full search. Fixed pattern-based motion estimation algorithms [5] were quite popular in earlier coding standards. These algorithms assume that the matching error decreases monotonically as the search converges to the global minimum. However, Chow et al. [6] showed that the error can also decrease monotonically toward a local minimum. With the introduction of the motion vector predictor, the fixed pattern-based search was replaced by prediction-based search. Adaptive rood pattern search [7], motion vector field adaptive search technique [8], predictive motion vector field adaptive search technique [9], and enhanced predictive zonal search [10] algorithms used predictors to find the initial search point and then used a fixed pattern-based search to find the optimal motion estimation. Prediction-based search worked well with basic macroblock-based frame partitioning. However, the introduction of complex tree-based frame partitioning in H.265 video coding [2] increased the computational cost of these methods.
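The full search described above can be sketched in a few lines. This is a minimal illustration, not the implementation used in the study: frames are assumed to be plain lists of rows of integer intensities, and the names `sad` and `full_search` as well as the returned candidate count are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by (dx, dy)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, size, d):
    """Test every displacement within +/- d pixels of the search center.

    Returns (best_motion_vector, best_sad, candidates_tested).
    """
    height, width = len(ref), len(ref[0])
    best_mv, best_cost, count = (0, 0), None, 0
    for dy in range(-d, d + 1):
        for dx in range(-d, d + 1):
            # Skip candidates that fall outside the reference frame.
            if not (0 <= bx + dx <= width - size and 0 <= by + dy <= height - size):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            count += 1
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost, count
```

With a search range of d pixels, up to (2d + 1)² candidates are evaluated per block, which is the cost that the fast algorithms discussed here try to avoid.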
The tree structure allowed flexibility in frame partition, where every frame was divided into several non-overlapping coding tree units, which are equivalent to macroblocks in previous standards. Coding tree units are further partitioned using a quad-tree partition into four coding units (CU). Each CU can be divided into multiple CUs or prediction units using a quad-tree or binary-tree partitioning. Similarly, a prediction unit can be divided further using a binary-tree partition. Each prediction unit in H.265 video coding can have an independent motion vector. Also, H.265 video coding introduced a median-based motion vector predictor to find the initial search point for motion estimation.
The VVC replaced the quad-tree plus binary-tree frame partitioning of H.265 video coding with a quad-tree plus multitype-tree frame partitioning. This increases the frame partitioning versatility even further. VVC also eliminated the difference between the CU and the prediction unit to reduce complications and replaced a median-based motion vector predictor with an adaptive motion vector predictor. VVC defined test zone search (TZS) [1] as the benchmarking algorithm to evaluate the performance of other motion vector estimation algorithms. The adaptive motion vector predictor is used to locate the initial search point of TZS, and then, zonal and raster searches are used to find the final motion vector. TZS is quicker than the full search [4] algorithm, but it still executes a lot of redundant searches, because TZS is a subsampled full search in its worst case.
This article tries to develop a fast motion estimation algorithm capable of skipping redundant searches and avoiding the local minima trap. The evolutionary algorithms are the best approach for avoiding local minima and reaching global minima. However, because of their massive computational burden, evolutionary algorithms are inherently slow. The addition of domain knowledge to determine the initial search point of the algorithms increases the speed of the evolutionary algorithm by many orders of magnitude [11,12]. Cuevas et al. [13] proposed such an evolutionary algorithm-based motion estimation process in which the initial search point is chosen from a predefined pattern rather than a random distribution. Furthermore, Cuevas et al. increased the pace of these algorithms by incorporating the nearest-neighbor interpolation to approximate the fitness of new solutions from previously calculated solutions. Despite these changes, the speed of evolutionary algorithm-based motion estimation processes is not optimal, because the use of random numbers in evolutionary algorithms is still very high. This includes the most advanced Jaya algorithm-based motion estimation (JABM) [14], differential evolution algorithm-based motion estimation [15], and so on. The proposed ant weight-lifting algorithm-based (AWL) motion estimation process reduces the use of random numbers. The proposed algorithm also incorporates all of the previously discussed modifications that increase the speed of evolutionary algorithms, such as the use of domain knowledge to determine the initial search point from a predefined fixed search pattern, the fitness approximation, and the algorithm's early termination criteria. The proposed approach is compared with state-of-the-art JABM using the most commonly used test video sequences obtained from the standard video repository [16], and the proposed approach reduced output distortion by up to 1.2 dB while increasing speed by up to 3%.
The remainder of the article is arranged as follows. The next section reviews previous studies that used evolutionary algorithms to estimate motion. The sections that follow describe the AWL algorithm, its early termination strategy, and the fitness approximation method in more detail. In the section "Results and discussion", the performance of the proposed strategy is examined in detail and compared with the full search algorithm and the JABM [14] process.

Previous works
Evolutionary algorithms are very effective at finding global optima in ambiguous search spaces such as motion estimation. Chow et al. [6] first introduced a genetic algorithm-based motion estimation process with a dynamic population control scheme to avoid premature convergence. Cuevas et al. [13] proposed an evolutionary algorithm-based motion estimation process that uses fixed pattern-based initial population selection as well as nearest-neighbor interpolation (NNI) for fitness approximation to accelerate the algorithms. Pandian et al. [17] devised an early termination strategy based on the position of the global best solution to reduce the redundant computation of particle swarm optimization.
Dixit et al. [15] proposed a differential evolution-based motion estimation algorithm in which the standard differential evolution algorithm was used for optimized motion estimation. The initial population was selected randomly. For mutation, the DE/best/1 strategy was used, in which the best solution from the current generation is combined with the difference between two randomly selected solutions from the current generation to generate a new solution for the next generation
$$X_{n,t+1} = X_{best,t} + F\,(X_{rand1,t} - X_{rand2,t}) \tag{1}$$
In Eq. 1, $X_{n,t+1}$ represents the nth solution at iteration (t + 1), $X_{best,t}$, $X_{rand1,t}$, and $X_{rand2,t}$ represent the best and two random solutions at iteration t, and F is the mutation probability. Also, a uniform crossover was performed between $X_{n,t+1}$ and $X_{n,t}$ to finalize the solution for the next generation.
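As a rough illustration of the DE/best/1 step and the uniform crossover described above, the following sketch operates on two-component candidate positions. The function names and the crossover rate `cr` are assumptions made for this example; they are not taken from [15].

```python
import random


def de_best_1(best, rand1, rand2, f):
    """DE/best/1 mutation: best + F * (rand1 - rand2), per component."""
    return tuple(b + f * (r1 - r2) for b, r1, r2 in zip(best, rand1, rand2))


def uniform_crossover(mutant, parent, cr, rng):
    """Take each component from the mutant with probability cr,
    otherwise keep the parent's component."""
    return tuple(m if rng.random() < cr else p for m, p in zip(mutant, parent))


rng = random.Random(0)
mutant = de_best_1(best=(2, 3), rand1=(5, 1), rand2=(1, 4), f=0.5)
child = uniform_crossover(mutant, parent=(0, 0), cr=0.9, rng=rng)
```

Each new solution consumes random numbers both for selecting the two random solutions and for the crossover, which is the overhead the later sections refer to.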
Parveena et al. [18] proposed a hardware-friendly motion estimation process based on a modified differential evolution algorithm. For the initial population, the algorithm employs a fixed pattern. However, the algorithm relies heavily on random numbers for the mutation and crossover operations. Dash et al. [14] proposed the JABM, which includes adaptive termination criteria, fitness approximation using NNI, and pattern-based initial population generation. The JABM moves the initial population away from the worst solution and toward the best solution using Eq. 2
$$X_{n,t+1} = X_{n,t} + r_a\,(X_{best,t} - |X_{n,t}|) - r_b\,(X_{worst,t} - |X_{n,t}|) \tag{2}$$
In Eq. 2, $X_{n,t+1}$ and $X_{n,t}$ represent the nth solution at iterations (t + 1) and t, $X_{worst,t}$ and $X_{best,t}$ represent the worst and best solutions at iteration t, and $r_a$ and $r_b$ are two random numbers in [0, 1]. The Jaya algorithm excludes all algorithm-specific parameters and depends only on the general parameters of evolutionary algorithms, such as the maximum number of iterations and the size of the initial population. This removes the complications caused by incorrectly selecting values for algorithm-specific parameters such as the mutation probability and crossover probability in differential evolution. The JABM outperforms the state-of-the-art TZS. Praveena et al. [19] proposed an FPGA design of JABM. This design does not affect the JABM's search speed.
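The Jaya movement rule can be illustrated as follows. The sketch assumes the standard form of the Jaya update and passes the two random numbers in explicitly so that the arithmetic is easy to follow; the function name `jaya_update` is hypothetical, and rounding the result to an integer pixel position is left out.

```python
def jaya_update(x, best, worst, r_a, r_b):
    """Move a solution toward the best and away from the worst solution.

    Per component: x + r_a * (best - |x|) - r_b * (worst - |x|).
    """
    return tuple(
        xi + r_a * (bi - abs(xi)) - r_b * (wi - abs(xi))
        for xi, bi, wi in zip(x, best, worst)
    )


# Two random numbers (r_a, r_b) are needed for every move of one solution.
new_position = jaya_update(x=(1, -2), best=(3, 0), worst=(-4, 5), r_a=0.5, r_b=0.25)
```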
Despite outperforming previous algorithms, the JABM has a few areas for improvement. The extensive usage of random numbers is one of the primary causes of evolutionary algorithms' sluggish execution speed. The JABM and differential evolution-based motion estimation algorithms are no exception. The use of pattern-based initial population generation reduced the usage of random numbers in JABM. Still, the use of the random numbers is very high, as Eq. 2 uses two random numbers during the movement of one solution. Though it is impossible to completely remove the use of random numbers in evolutionary algorithms, the proposed AWL-based motion estimation process limits the use of random numbers to increase the speed of motion estimation.

Zero-motion prejudgment
Zero-motion prejudgment is a method for distinguishing between static and moving coding units by comparing the sum of absolute differences (SAD) between two coding units with a threshold. Static coding units are assigned a zero-motion vector and skip the motion estimation calculation. This process was first proposed by Nie et al. [7] to remove redundant computation in the motion estimation process. According to a survey by Cuevas et al., even a high-motion test video sequence such as Tennis has 27% static blocks. As a result, zero-motion prejudgment reduces motion estimation time by 27% for the Tennis test sequence. Nie et al. proposed a SAD threshold of 512 to differentiate between static and moving coding units.
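A minimal sketch of this check is shown below. Blocks are assumed to be lists of rows of integer intensities taken from the same position in the current and reference frames; the function names are hypothetical, and treating a SAD strictly below the threshold as static is an assumption of this example.

```python
ZERO_MOTION_THRESHOLD = 512  # SAD threshold reported by Nie et al.


def block_sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def is_static(cur_block, ref_block, threshold=ZERO_MOTION_THRESHOLD):
    """True when the co-located blocks are similar enough to skip the search."""
    return block_sad(cur_block, ref_block) < threshold
```

A block for which `is_static` returns True would be assigned the zero-motion vector directly, so no search candidates are evaluated for it.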

Ant food collection and carrying procedure
Ants have some remarkable social characteristics. In their society, every member has a definite role. Queen ants lay eggs, and worker ants carry out the other kinds of work, which include building the colony and searching for and collecting food. While collecting food, they often leave a pheromone trail to let other ants know about the food source.
Worker ants have a surprising weight-carrying capability compared to their own weight. This capability comes from the small surface area of their bodies. Worker ants do not even possess reproductive organs, which further increases their lifting ability. Researchers have claimed that some species of ant can carry five thousand times their own weight, and they are studying this surprising weight-carrying capability in order to mimic it in future robotic arms [20].

Ant weight-lifting algorithm
AWL was previously used to solve multiple optimization problems such as the 0/1 knapsack problem [21], biomedical signal compression [22], and image segmentation [23]. This algorithm was proposed after observing worker ants' intelligent and surprising abilities. In the algorithm, three rules are idealized to simulate ant behavior.
a. The ants that have gathered the least amount of food will move to a new location, because there is not enough food in their present location.
b. The ant moves away from areas that do not have sufficient food.
c. If an ant reaches the limit of its weight-carrying capacity, it will return to the colony and be replaced by a new ant in a different position.
The AWL algorithm was developed by taking into consideration the rules mentioned above. Figure 2 describes the flowchart of the process.
Step 1. Initialization of solution: To find food, n number of ants are initially placed in randomly chosen locations throughout the search area. The random selection of positions will ensure the uniform distribution of the ants across the search space, which will result in an unbiased exploration of the search space.
$$a_i^k = \text{random position in the search space}$$
Here, k = 1 because this is the first iteration, $a_i^k$ is the position of the ith ant, i = 1 to n, and n is the size of the initial population.
Step 2. Calculation of collected weight: The fitness value of a solution is considered to be the food collected by an ant. An ant's collected weight for the current iteration is its food collection normalized by the total collection of all ants in that iteration
$$W_i^k = \frac{f_i}{\sum_{j=1}^{n} f_j} \tag{3}$$
where $W_i^k$ is the collected weight of the ith ant in the kth iteration and $f_i$ is the fitness of the ith solution.
Step 3. Rank the performance of the ants: Rank the ants based on the collected weight ($W_i^k$). The ant with the highest food supply will be at the top of the performance table.
Step 4. Updating the position of the best-performing ant: The ant with the most food must remain in the same position or close to it, because it has an unrestricted food source. This intensifies the search around the best position
$$a_{best}^{k+1} = a_{best}^{k} + r_i\,d \tag{4}$$
where $a_{best}^{k}$ is the position of the best-performing ant in the kth iteration, $a_{best}^{k+1}$ is the new position of that ant in the (k + 1) iteration, $r_i$ is a random number between -1 and 1, and d is a small distance defined based on the problem. If the new position has better fitness, the ant will move there; otherwise, the ant will remain in its previous position.
Step 5. Updating the position of other ants: Other ants in the performance table move away from the position of the worst-performing ant across all iterations by a random factor
$$a_i^{k+1} = a_i^{k} + r_i\,(a_i^{k} - a_{worst}^{all}) \tag{5}$$
where $a_i^{k}$ and $a_i^{k+1}$ are the positions of the ith ant in the k and (k + 1) iterations, i ≠ best, $r_i$ is a random number between -1 and 1, and $a_{worst}^{all}$ is the position of the globally worst-performing ant. The position of the ant will only change if the new position has better fitness.
Step 6. Cumulative weight collection: The total cumulative weight collected by an ant is the sum of all weights collected over all iterations
$$\mathrm{CumulativeWeight}_i = \sum_{j=1}^{k} W_i^j \tag{6}$$
where CumulativeWeight_i is the total cumulative weight collected by the ith ant and $W_i^j$ is the collection of the ith ant in the jth iteration. CumulativeWeight_i is reset to zero when the ith ant changes position.
Step 7. Cumulative weight-carrying capacity full: Any ant whose collected cumulative weight (CumulativeWeight_i) has reached its maximum weight-carrying capacity leaves the search space to deposit the collected weight in the repository. The repository is a compilation of all the ants' positions that have inspected a given region of the search space. The ant will be replaced by a new one in a random position. This method helps the algorithm avoid local optima and get closer to the global optimum.
Step 8. Final local search: If the position of n of the highest collecting ants in the repository is within a very small enclosed circle with radius d (defined in step 4), then limit the search area within that circle. The centroid of the n highest collecting ants in the repository will be the center of the circle. Randomly place n number of ants in that closed space and look for the best performance. That best-performing ant among the newly positioned n ant is the final solution. Terminate the algorithm.
Step 9. Termination: The algorithm will increase the iteration counter and repeat the process from step two until the iteration counter reaches the maximum number of iterations or the solution has converged. The position of the best-performing ant among all the generations is the final solution.
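The steps above can be condensed into the following simplified sketch for a two-dimensional search space. It is an illustration under several assumptions rather than the authors' implementation: the fitness is assumed to be strictly positive and maximized, the update rules follow the verbal descriptions of steps 4 and 5, the repository and the final local search of step 8 are omitted, and all names and default parameter values are hypothetical.

```python
import random


def awl_maximize(fitness, bounds, n_ants=8, d=1.0, capacity=1.0,
                 max_iter=100, seed=0):
    """Simplified ant weight-lifting search for a positive fitness function."""
    rng = random.Random(seed)
    (x_lo, x_hi), (y_lo, y_hi) = bounds

    def random_position():
        return (rng.uniform(x_lo, x_hi), rng.uniform(y_lo, y_hi))

    def clamp(p):
        return (min(max(p[0], x_lo), x_hi), min(max(p[1], y_lo), y_hi))

    # Step 1: random initial placement of the ants.
    ants = [random_position() for _ in range(n_ants)]
    fits = [fitness(a) for a in ants]
    cumulative = [0.0] * n_ants
    best_fit = max(fits)
    best_pos = ants[fits.index(best_fit)]
    worst_fit = min(fits)
    worst_pos = ants[fits.index(worst_fit)]

    for _ in range(max_iter):
        # Step 2: collected weight = fitness normalized by the iteration total.
        total = sum(fits)
        weights = [f / total for f in fits]
        # Step 3: the ant with the largest collected weight leads the ranking.
        top = max(range(n_ants), key=lambda i: weights[i])
        for i in range(n_ants):
            cumulative[i] += weights[i]      # step 6: accumulate the weight
            r = rng.uniform(-1.0, 1.0)       # a single random number per move
            if i == top:
                # Step 4: the best ant probes its immediate neighborhood.
                cand = (ants[i][0] + r * d, ants[i][1] + r * d)
            else:
                # Step 5: other ants move relative to the globally worst position.
                cand = (ants[i][0] + r * (ants[i][0] - worst_pos[0]),
                        ants[i][1] + r * (ants[i][1] - worst_pos[1]))
            cand = clamp(cand)
            cand_fit = fitness(cand)
            if cand_fit > fits[i]:
                # Accept only improvements; moving resets the cumulative weight.
                ants[i], fits[i], cumulative[i] = cand, cand_fit, 0.0
            elif cumulative[i] >= capacity:
                # Step 7: a full ant is replaced by a new ant at a random position.
                ants[i] = random_position()
                fits[i], cumulative[i] = fitness(ants[i]), 0.0
            if fits[i] > best_fit:
                best_pos, best_fit = ants[i], fits[i]
            if fits[i] < worst_fit:
                worst_pos, worst_fit = ants[i], fits[i]
    return best_pos, best_fit
```

Drawing one random number per move mirrors the description of steps 4 and 5; a variant that draws one number per coordinate would explore more directions at the cost of more random numbers.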

Early termination strategy
Traditional evolutionary algorithms terminate only after converging to the global minimum or completing the maximum number of iterations. However, some problems, such as motion estimation, require fast algorithms. An early termination strategy therefore establishes criteria that let an evolutionary algorithm terminate as soon as a specific goal is met, which reduces the number of computations. The proposed AWL-based motion search introduces one novel termination criterion and adopts another from Pandian et al.
a. According to Pandian et al. [17], the algorithm terminates when the global best solution and the current position of the coding unit are the same.
b. The second criterion is the termination threshold ($T_t$), which is one-third of the threshold for zero-motion prejudgment. If the SAD between the global best solution and the current coding unit is less than the termination threshold at any step, the algorithm terminates.
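Both criteria can be expressed as a single predicate. The sketch below assumes that positions are (x, y) tuples and that the zero-motion prejudgment threshold is the value of 512 quoted earlier; the function name and argument names are hypothetical.

```python
def should_terminate(best_pos, block_pos, best_sad, zero_motion_threshold=512):
    """Return True when either early termination criterion is met."""
    # Criterion b: termination threshold T_t is one-third of the
    # zero-motion prejudgment threshold.
    termination_threshold = zero_motion_threshold / 3
    # Criterion a: the global best solution sits at the current block position.
    if best_pos == block_pos:
        return True
    return best_sad < termination_threshold
```

With the default threshold of 512, the search stops as soon as the best candidate has a SAD below roughly 170.7.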

Fitness approximation
The most computationally expensive process in any evolutionary algorithm is the fitness evaluation of candidate solutions. Cuevas et al. [13] suggested nearest-neighbor interpolation (NNI) to approximate the fitness of a solution from the fitness of previously calculated solutions. In this way, NNI fitness approximation greatly reduces the computation time of evolutionary algorithms by replacing the mathematical operation with a small number of comparisons, as mentioned in the steps below, but increases the memory complexity of the algorithm as the process preserves the calculated fitness value of the solutions for future reference.
Step 1. The fitness function will be used to evaluate a new solution if the distance between the new solution and the global best solution is smaller than distance threshold d (Fig. 3a).
Step 2. The fitness function will also be used to evaluate a new solution if the new solution does not have a previously evaluated solution within d distance (Fig. 3b).
Step 3. The fitness value will be approximated if the new solution is closer than distance d to any previously explored solution except the global best (Fig. 3c).
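The three rules above can be sketched as follows. The dictionary of previously evaluated positions reflects the extra memory mentioned in the text; the function name, the returned flag, and the use of Euclidean distance are assumptions of this example.

```python
import math


def nni_fitness(candidate, evaluated, best_pos, d, fitness):
    """Return (fitness_value, was_exact) for a candidate position.

    `evaluated` maps already-visited positions to their stored fitness.
    """
    def evaluate():
        value = fitness(candidate)
        evaluated[candidate] = value  # remember for later approximations
        return value, True

    # Rule 1: close to the global best, so compute the exact fitness.
    if best_pos is not None and math.dist(candidate, best_pos) < d:
        return evaluate()
    # Find the nearest previously evaluated position, if any.
    nearest = min(evaluated, key=lambda p: math.dist(candidate, p), default=None)
    # Rule 2: nothing evaluated within distance d, so compute the exact fitness.
    if nearest is None or math.dist(candidate, nearest) >= d:
        return evaluate()
    # Rule 3: reuse the fitness of the nearest evaluated neighbor.
    return evaluated[nearest], False
```

Only rules 1 and 2 call the fitness function; rule 3 replaces a SAD computation with a nearest-neighbor lookup.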

AWL-based motion estimation
The steps of the AWL algorithm-based motion estimation are described below. Figure 4 shows the flowchart of the process.
Fig. 3 a New solution is closer to the best solution than distance threshold d, so the fitness function is used. b New solution does not have an earlier evaluated solution within distance d, so the fitness function is used. c New solution is within distance d of a previously evaluated solution, so the fitness is approximated using NNI
Fig. 4 Flowchart of AWL-based motion estimation algorithm
Step 1. Zero-motion prejudgment: The proposed AWL-based motion estimation algorithm initially differentiates between static and moving coding units using zero-motion prejudgment. The SAD between the coding unit in the present frame and the reference frame is compared with a zero-motion prejudgment threshold. Static coding units are assigned a zero-motion vector. The moving coding unit continues with the motion estimation process.
Step 2. Initial position of the ants: The AWL-based motion estimation process selects the initial positions of the ants from a predefined square pattern centered on the search center, as shown in Fig. 5. Cuevas et al. observed in a survey that the motion vectors of most coding units are concentrated within a very small radius of the search center; therefore, the algorithm replaces the random selection of the initial positions with a predefined square pattern. The square pattern also explores each side of the search center with equal probability and helps the algorithm converge more quickly.
Step 3. Evaluate the fitness: The SAD between the current coding unit and the coding unit at the position of the ants is the fitness of the solutions. Wherever possible, the fitness evaluation will use the NNI to approximate the fitness value.
Step 4. Rank the solutions: Eq. 3 is used to calculate the collected weight of a single ant in the current iteration. The ants are arranged based on their collected weight in the present iteration. Furthermore, the weights collected by an ant in the current iteration are added up in their cumulative collection using Eq. 6.
Step 5. Early termination: The AWL algorithm-based motion estimation process will terminate early if the SAD between the current coding unit and the coding unit at the position of the best solution in the current iteration is less than the terminating threshold. The proposed algorithm also terminates early when the position of the best solution is the same as the current block position.
Step 6. Generation of a new position for the ants: The most efficient ant will use Eq. 4 to look around for a better position. Other ants will relocate to the new position using Eq. 5.
Step 7. Acceptance of new position: The ants will only accept new locations for subsequent iterations if the new location is more fit than the previous one. Whenever an ant changes positions, the cumulative weight for the ant is reset to zero.
Step 8. Maximum weight-carrying capacity reached: Once the cumulative weight of any ant reaches its maximum capacity, the ant leaves the search space to deposit the collected weight in the repository. A new ant in a random position replaces the old ant.
Step 9. Termination: If the algorithm's iteration counter reaches the maximum number of iterations, the algorithm terminates. The best global solution at the time of termination is the final solution. Otherwise, increase the iteration counter and repeat the process from step 3.
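To make the pattern-based initialization of step 2 concrete, the sketch below generates ant positions on a square around the search center. The exact pattern of Fig. 5 is not reproduced here: the 3 × 3 layout, the `half_side` parameter, and the function name are purely illustrative assumptions.

```python
def square_pattern(center, half_side):
    """Nine candidate positions: the search center plus the corners and
    edge midpoints of a square with the given half side length."""
    cx, cy = center
    return [
        (cx + dx * half_side, cy + dy * half_side)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
    ]


initial_ants = square_pattern(center=(64, 64), half_side=2)
```

Because the pattern is symmetric about the search center, every side is sampled equally, and no random numbers are needed to create the initial population.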

Dataset and benchmarking algorithms
The performance of the suggested technique is compared to that of the full search algorithm and the state-of-the-art JABM [14] using some of the most often used test sequences obtained from the standard video library [16]. More details about the test sequences can be found in Table 1, and Fig. 6 depicts a sample frame from each video sequence. The goal of employing test sequences with such diverse resolution and motion characteristics is to demonstrate the effectiveness and robustness of the suggested method. The full search algorithm provides the best motion estimation at the highest computation cost. Therefore, as is the case with any motion estimation process, the output of the proposed algorithm is compared to the output of the full search to quantify the degradation in output quality and the improvement in algorithm execution speed. On the other hand, the JABM outperformed the TZS, VVC's benchmarking algorithm. As a consequence, JABM is selected as the benchmark among the latest motion estimation techniques.

Fig. 6 Sample frame of test sequences (Crowd Run, Old Town Cross, Park Joy, Tennis, Foreman)

Initial condition for experiments
The suggested technique, as well as the other benchmarking algorithms in this article, divides the frame into non-overlapping coding tree units and CUs. The coding tree units are 128 × 128 pixels in size and are further partitioned into CUs using a quad-tree partition. A CU can be partitioned until its height and width reach 8 pixels. A CU of 32 × 32 pixels or smaller can also be partitioned using ternary-tree and binary-tree partitions. The search window extends 15 pixels from the search center.
In the proposed algorithm, the maximum weight-carrying capacity of an ant is set to 50, while the maximum number of iterations is set to 100. Both values were determined through multiple tests with varied parameter values. The maximum number of iterations for JABM is also set to 100. Table 2 lists the initial parameters for the block-matching algorithms.

Performance measured by output image quality
The proposed algorithm was evaluated using three objective performance parameters: peak signal-to-noise ratio (PSNR) [24], structure similarity index (SSIM) [24], and average search count. SSIM and PSNR evaluate the similarity between the reconstructed and the actual frame. PSNR compares the variance in intensity between the reconstructed frame and the original frame and can be calculated using Eq. 7
$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right) \tag{7}$$
where MSE is the mean squared error between the actual and reconstructed frames and MAX is the maximum possible pixel intensity. The PSNR is higher when the distortion between the actual frame and the reconstructed frame is small. The structure similarity index measures the structural resemblance between the original and the reconstructed frame and can be calculated using Eq. 8. It is one when the two frames are identical and zero in the opposite case
$$\mathrm{SSIM} = \frac{(2\,\mathrm{mean}_1\,\mathrm{mean}_2 + A_1)(2V_{12} + A_2)}{(\mathrm{mean}_1^2 + \mathrm{mean}_2^2 + A_1)(V_1 + V_2 + A_2)} \tag{8}$$
where mean_1 and mean_2 are the mean intensities of the actual and reconstructed image, $V_1$ and $V_2$ are the variances of the actual and reconstructed image, and $V_{12}$ is their covariance. $A_1$ and $A_2$ are constants that stabilize the equation when the denominator is close to zero. The comparison of the PSNR and SSIM performance in Tables 3, 4 and Fig. 7 shows that the suggested technique reduces the distortion between the original and reconstructed frames more than the JABM does. As a consequence, the suggested method outperforms the JABM in terms of PSNR and SSIM. The performance of the suggested algorithm is unaffected by the complexity of the motion characteristics or the resolution of the test video sequences.
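For reference, both quality measures can be computed as in the sketch below. It assumes 8-bit frames flattened into equal-length lists of intensities, evaluates SSIM once over the whole frame rather than over local windows, and uses the commonly chosen stabilizing constants (0.01 · peak)² and (0.03 · peak)²; these choices and the function names are assumptions of this example, not details taken from the study.

```python
import math


def psnr(actual, recon, peak=255.0):
    """Peak signal-to-noise ratio in dB between two flattened frames."""
    mse = sum((a - b) ** 2 for a, b in zip(actual, recon)) / len(actual)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


def ssim_global(actual, recon, peak=255.0):
    """Structure similarity index computed over the whole frame at once."""
    a1, a2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2  # stabilizing constants
    n = len(actual)
    m1, m2 = sum(actual) / n, sum(recon) / n
    v1 = sum((a - m1) ** 2 for a in actual) / n
    v2 = sum((b - m2) ** 2 for b in recon) / n
    v12 = sum((a - m1) * (b - m2) for a, b in zip(actual, recon)) / n
    return ((2 * m1 * m2 + a1) * (2 * v12 + a2)) / (
        (m1 ** 2 + m2 ** 2 + a1) * (v1 + v2 + a2)
    )
```

A higher PSNR and an SSIM closer to one both indicate that the motion-compensated frame is closer to the original frame.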

Performance measured by search speed
The search speed of a block-matching algorithm is defined as the average number of searches required to estimate the motion of a coding unit. A large number of searches leads to slow execution. The full search, like any other brute-force method, requires the highest number of searches and is the slowest of all block-matching algorithms. All block motion estimation techniques aim to provide the same degree of accuracy as the full search while running faster. Therefore, the search count of the full search is used as the reference in Eq. 9 to calculate the speed improvement ratio (SIR) of the other block-matching algorithms. When the search count of another method is low, the ratio attains a higher value
$$\mathrm{SIR} = \frac{SC_{FS} - SC_{M}}{SC_{FS}} \times 100\% \tag{9}$$
where $SC_{FS}$ and $SC_{M}$ are the average search counts of the full search and of the evaluated method, respectively.
From Table 5 and Fig. 8, it is evident that the proposed algorithm achieved a 94-97% faster execution speed than the full search algorithm, as well as up to 3% higher execution speed than the JABM. The improvement in execution speed is mostly due to the adoption of certain modifications, such as pattern-based initial population selection, an early termination approach, and zero-motion prejudgment. Zero-motion prejudgment allows the computation for stationary blocks to be skipped, and fixed pattern-based initial population selection enables the algorithm to converge more quickly. Another key reason for the proposed algorithm's 3% quicker execution than the JABM is the addition of a termination threshold. The termination threshold is a trade-off between output quality and speed. A high termination threshold increases the distortion between the original frame and the reconstructed frame, whereas a low threshold reduces the speed of the algorithm. A balance between output quality and speed is maintained by setting the threshold to one-third of the zero-motion prejudgment threshold.
Despite the proposed algorithm's quicker execution than the JABM, the PSNR between the original frame and the reconstructed frame from the proposed algorithm's estimated motion obtained a 1.2 dB gain over the JABM.
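The speed comparison can be illustrated with a short calculation. The sketch assumes that the speed improvement ratio is the relative reduction in average search count with respect to the full search, expressed as a percentage, which is consistent with the 94-97% figures reported above; the function name and the example search count of 40 are hypothetical.

```python
def speed_improvement_ratio(full_search_count, method_count):
    """Relative reduction in search count versus the full search, in percent."""
    return (full_search_count - method_count) / full_search_count * 100.0


# A search range of 15 pixels gives (2 * 15 + 1) ** 2 = 961 full search candidates.
full_count = (2 * 15 + 1) ** 2
example_sir = speed_improvement_ratio(full_count, 40)  # hypothetical search count
```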

Conclusion
This article proposed a novel motion estimation process based on the AWL algorithm. The suggested approach replaced the AWL algorithm's random placement of the initial ants with a fixed pattern-based location selection for faster convergence. Furthermore, this study laid out adaptive termination criteria for the AWL algorithm-based motion estimation process. In addition, the suggested approach employed NNI to approximate the fitness of new solutions. These changes to the AWL algorithm reduced the overall computation in the motion estimation procedure. As a consequence, the suggested method ran 94% to 97% faster than the full search algorithm and up to 3% faster than JABM. The computational efficiency of the proposed algorithm was better than that of the JABM, as the proposed algorithm uses only one random number during the movement of a solution, whereas the JABM uses two. The proposed algorithm performed similarly well on test sequences with diverse motion characteristics.
The AWL algorithm-based motion estimation process outperformed the benchmark algorithm in terms of PSNR and algorithm execution speed. However, the suggested method leaves several avenues for further development. It is important to compare the efficacy of the proposed method with various geometric shapes as the initial population selection pattern. Furthermore, the AWL algorithm-based motion estimation procedure could be extended to include more complex affine motion calculations.

Author contributions
SA carried out the research work. Prof. SS Chaudhuri supervised the research work. Both reviewed the manuscript.

Funding
The authors did not receive support from any organization for the submitted work.

Data availability
The test video sequences used during the current study are available in the "YUV Dataset: Xiph.org Video Test Media [Derf's Collection]" repository, https://media.xiph.org/video/derf.

Declarations
Conflict of interest All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.
Ethical approval Not applicable, as no studies on humans or animals have been performed.