1
|
Ulutas et al. [44] 2017
|
• A mirror-invariant binary feature extraction technique is proposed to detect frame duplication and mirroring. Binary features are used to measure the similarity between frames, and the PSNR of candidate frame pairs is used to eliminate false candidates (see the sketch after this entry).
|
• 10 videos
• Duplication
• Mirroring
|
99.98
100
|
99.30
97.34
|
99.35
98.20
|
-
|
• Efficient, with low computational time.
• Dataset is very small.
• No cross-dataset validation.
|
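A minimal sketch of the PSNR-based false-candidate filter described above; the threshold value and helper names are illustrative, not taken from [44]:

```python
import numpy as np

def psnr(frame_a: np.ndarray, frame_b: np.ndarray) -> float:
    """Peak signal-to-noise ratio between two 8-bit frames."""
    mse = np.mean((frame_a.astype(np.float64) - frame_b.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(255.0 ** 2 / mse)

def filter_candidates(frames, candidate_pairs, threshold=35.0):
    """Keep only candidate (i, j) frame pairs whose PSNR exceeds the threshold."""
    return [(i, j) for i, j in candidate_pairs
            if psnr(frames[i], frames[j]) >= threshold]
```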
2
|
Kingra, Aggarwal et al. [18] 2017
|
• Prediction residual (PR) and optical flow (OF) gradients are used to identify frame-based tampering in videos encoded with the MPEG-2 and H.264 standards (see the sketch after this entry).
|
• DIC Punjab University
Group 1:
• Ins/del/dup
Group 2:
• Ins/del/dup
Group 3:
• Ins/del/dup
|
-
|
-
|
80/83/75
92/83/88
100/96/100
|
Localization Accuracy: 80% (for all)
|
• Performance decreases when the technique is applied to video sequences with high illumination.
• Localization accuracy is poor.
• No evaluation is done on an unseen dataset.
|
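A rough sketch of the prediction-residual side of this idea, assuming frames are 8-bit numpy arrays; the plain frame difference and z-score rule are stand-ins for the PR/OF gradient features of [18]:

```python
import numpy as np

def residual_energy(frames):
    """Mean absolute difference (prediction-residual energy) between consecutive frames."""
    return np.array([
        np.mean(np.abs(frames[k + 1].astype(np.float64) - frames[k].astype(np.float64)))
        for k in range(len(frames) - 1)
    ])

def anomalous_points(frames, z=3.0):
    """Flag residual spikes deviating from the mean by more than z standard deviations."""
    e = residual_energy(frames)
    return np.where(np.abs(e - e.mean()) > z * e.std())[0]
```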
3
|
Ulutas et al. [39] 2018
|
• A novel approach is adopted to detect frame duplication in a video. A BoW model is used to generate visual words and build a dictionary from the SIFT key-points of frames; hierarchical k-means is then applied to generate a large vocabulary tree for quantization. BoW features built from visual-word frequencies are used to detect frame duplication (see the sketch after this entry).
|
• 31 videos
• Duplication (stationary cam)
• Duplication (moving cam)
|
97.94
98.57
|
97.65
99.13
|
96.73
98.17
|
-
|
• Computationally efficient.
• Dataset is small.
• Detects only one type of forgery.
|
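A simplified sketch of the BoW pipeline, substituting a flat MiniBatchKMeans vocabulary for the hierarchical k-means vocabulary tree of [39]; assumes OpenCV >= 4.4 (SIFT) and scikit-learn:

```python
import cv2
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def bow_features(frames, vocab_size=200):
    """Build a visual vocabulary from SIFT descriptors and return one
    normalized visual-word histogram per frame."""
    sift = cv2.SIFT_create()
    per_frame = []
    for f in frames:
        gray = cv2.cvtColor(f, cv2.COLOR_BGR2GRAY)
        _, desc = sift.detectAndCompute(gray, None)
        per_frame.append(desc if desc is not None else np.empty((0, 128), np.float32))
    vocab = MiniBatchKMeans(n_clusters=vocab_size).fit(np.vstack(per_frame))
    hists = []
    for desc in per_frame:
        words = vocab.predict(desc) if len(desc) else np.empty(0, int)
        h = np.bincount(words, minlength=vocab_size).astype(np.float64)
        hists.append(h / (h.sum() or 1.0))  # guard against empty frames
    return np.array(hists)
```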
4
|
Zhao, Wang et al. [45] 2018
|
• The similarity between the H-S and S-V histograms of every frame is compared (see the sketch after this entry).
• SURF feature extraction with FLANN matching is used to confirm tampering.
|
• 10 test shots
• Insertion
• Deletion
• Duplication
and Localization
|
98.07
(for all)
|
100
(for all)
|
99.01
(for all)
|
-
|
• Cannot handle videos with scene changes within a shot.
• Poor localization.
• Dataset is very small.
• No cross-dataset validation.
|
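A minimal sketch of the H-S / S-V histogram comparison using OpenCV's correlation metric; the bin counts are illustrative:

```python
import cv2

def hs_sv_similarity(frame_a, frame_b):
    """Correlation between the H-S and S-V 2-D histograms of two BGR frames."""
    hsv_a = cv2.cvtColor(frame_a, cv2.COLOR_BGR2HSV)
    hsv_b = cv2.cvtColor(frame_b, cv2.COLOR_BGR2HSV)
    sims = []
    for chans, ranges in (([0, 1], [0, 180, 0, 256]),   # H-S plane
                          ([1, 2], [0, 256, 0, 256])):  # S-V plane
        h_a = cv2.calcHist([hsv_a], chans, None, [32, 32], ranges)
        h_b = cv2.calcHist([hsv_b], chans, None, [32, 32], ranges)
        cv2.normalize(h_a, h_a)
        cv2.normalize(h_b, h_b)
        sims.append(cv2.compareHist(h_a, h_b, cv2.HISTCMP_CORREL))
    return sims  # [H-S similarity, S-V similarity]
```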
5
|
Huang, Zhang et al. [46] 2018
|
• Audio forensic detection is performed by wavelet packet decomposition.
• Frame-level features are extracted with a perceptual hash (see the sketch after this entry).
• A quaternion DCT feature is used for fine detection.
|
• 115 videos from SULFA and OV
• 124 self-recorded
• Deletion
• Insertion
|
0.9876
(for all)
|
0.9847
(for all)
|
-
|
-
|
• An audio track is required alongside the video.
• Poor localization.
• No evaluation is done on an unseen dataset.
|
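A generic DCT-based perceptual-hash sketch for frame-level features; the exact hash construction in [46] may differ:

```python
import cv2
import numpy as np

def phash(frame, hash_size=8):
    """DCT-based perceptual hash: low-frequency DCT coefficients of a
    downscaled grayscale frame, thresholded against their median."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    small = cv2.resize(gray, (32, 32)).astype(np.float32)
    dct = cv2.dct(small)[:hash_size, :hash_size]  # keep low frequencies
    return (dct > np.median(dct)).flatten()

def hamming(h1, h2):
    """Bit distance between two hashes; small distance = near-duplicate frames."""
    return int(np.count_nonzero(h1 != h2))
```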
6
|
Jia, Xu et al. [14] 2018
|
• Optical flow (OF) sum consistency is used to find suspected tampering points; fine detection is done through inter-frame correlation (see the sketch after this entry).
• The location of the forgery is determined by the proposed algorithm.
|
• VTL: 55
• SULFA: 36
• DERF: 24 videos
• Duplication
• Computation time: 1.623 µs/pixel
|
0.985
|
0.985
|
-
|
-
|
• Unable to detect tampered videos with largely static scenes.
• Poor localization.
• No evaluation is done on an unseen dataset.
|
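A sketch of the optical-flow sum consistency check using Farneback flow; the neighbor-ratio test is an illustrative stand-in for the detection rule in [14]:

```python
import cv2
import numpy as np

def optical_flow_sums(frames):
    """Sum of optical-flow magnitudes between consecutive grayscale frames."""
    grays = [cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in frames]
    sums = []
    for prev, nxt in zip(grays, grays[1:]):
        flow = cv2.calcOpticalFlowFarneback(prev, nxt, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        sums.append(float(np.sum(np.linalg.norm(flow, axis=2))))
    return np.array(sums)

def suspected_points(frames, ratio=3.0):
    """Flag positions where the flow sum jumps relative to its neighbors."""
    s = optical_flow_sums(frames)
    local = (np.roll(s, 1) + np.roll(s, -1)) / 2.0  # mean of the two neighbors
    return np.where(s[1:-1] > ratio * local[1:-1])[0] + 1
```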
7
|
Fadl et al. [47] 2018
|
• The energy difference between frames is computed to find anomalous points (see the sketch after this entry).
• SNR and spatial/temporal energy are used to detect the forgeries.
|
• 120 videos from SULFA, 28 from and 3 from IVY Lab
• Duplication
• Insertion
• Deletion
|
0.97
0.99
0.97
|
0.99
0.99
0.95
|
-
|
F1:
0.98
0.99
0.96
|
• Cannot detect forgery when frames are deleted from static scenes.
• Localization of the forgery is not performed.
• No evaluation is done on an unseen dataset.
|
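A minimal sketch of the inter-frame SNR series whose abrupt dips mark anomalous points; frames are assumed to be 8-bit numpy arrays, and the decision rule in [47] is more elaborate:

```python
import numpy as np

def frame_snr(ref, test):
    """Signal-to-noise ratio (dB) of frame `test` relative to frame `ref`."""
    ref = ref.astype(np.float64)
    test = test.astype(np.float64)
    noise = np.sum((ref - test) ** 2)
    return 10.0 * np.log10(np.sum(ref ** 2) / noise) if noise else float("inf")

def snr_series(frames):
    """SNR between each pair of consecutive frames; abrupt dips suggest an
    insertion/deletion/duplication boundary."""
    return np.array([frame_snr(a, b) for a, b in zip(frames, frames[1:])])
```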
8
|
Bakas et al. [24] 2018
|
• A deep-learning-based digital forensic technique using a 3D-CNN is proposed to detect inter-frame video forgery.
• A difference layer is introduced in the CNN that targets extraction of temporal information from the videos and aids the detection of inter-frame forgery (see the sketch after this entry).
|
• 9000 videos taken from UCF101
• Duplication
• Insertion
• Deletion
|
-
|
-
|
97% (average)
|
-
|
• Accuracy is low for duplication forgery: duplications of fewer than 20 frames within a single shot are detected, but duplications of more than 20 frames are missed.
• No evaluation is done on an unseen dataset.
|
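A toy PyTorch sketch of a 3D-CNN with a leading temporal-difference layer, in the spirit of [24]; the layer sizes are illustrative, not the published architecture:

```python
import torch
import torch.nn as nn

class DifferenceLayer(nn.Module):
    """Temporal difference of a clip tensor (N, C, T, H, W) -> (N, C, T-1, H, W);
    duplicated or inserted frames leave characteristic residual patterns."""
    def forward(self, x):
        return x[:, :, 1:] - x[:, :, :-1]

class InterFrameForgeryNet(nn.Module):
    """Toy 3D-CNN classifier preceded by the difference layer."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.net = nn.Sequential(
            DifferenceLayer(),
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(16, num_classes),
        )

    def forward(self, clip):  # clip: (N, 3, T, H, W) with T >= 2
        return self.net(clip)
```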
9
|
Long, Basharat et al. [5] 2019
|
• An I3D network finds candidate duplicate sequences at the coarse level.
• A Siamese network (ResNet-152) confirms duplication at the frame level (see the sketch after this entry).
• Duplicated frames are distinguished from original frames by an I3D-based inconsistency detector.
|
• Media Forensics Challenge dataset (MFC18)
• VIRAT: 12 videos
• iPhone-4 videos: 17
• Frame duplication and localization
|
-
|
-
|
-
|
AUC:
84.05 (VIRAT)
81.46 (iPhone)
|
• Detects only one type of forgery.
• Poor localization.
• No evaluation is done on an unseen dataset.
|
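A sketch of the frame-level confirmation step, with a cosine-similarity threshold over precomputed embeddings standing in for the ResNet-152 Siamese network of [5]; `emb` (any frame encoder's output) and the threshold are assumptions:

```python
import numpy as np

def cosine_sim(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def confirm_duplicates(emb, candidates, threshold=0.98):
    """Keep candidate (i, j) frame pairs whose embeddings nearly coincide.
    `emb` is a (num_frames, dim) array produced by some frame encoder."""
    return [(i, j) for i, j in candidates
            if cosine_sim(emb[i], emb[j]) >= threshold]
```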
10
|
Fadl et al. [11] 2020
|
• Features are extracted from the temporal average (TA) of each shot.
• Edge Change Ratio (ECR): the input video sequence is divided into short clips according to the ECR.
• GLCM: statistical texture features are extracted for each TA image (see the sketch after this entry).
• The feature vectors are sorted lexicographically in a matrix; the similarity between adjacent vectors then determines duplication.
|
• 51 videos taken from SULFA, LASIESTA, and IVY Lab
• Duplication
• Duplication with shuffling
• Execution time: 0.011 s per frame
|
0.99
0.95
|
0.98
0.98
|
-
|
-
|
• Localization of forgery has not been made.
• No evaluation is done on unknown dataset
|
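A sketch of the GLCM-feature and lexicographic-sorting steps, assuming uint8 TA images and scikit-image >= 0.19 (graycomatrix/graycoprops); the tolerance value is illustrative:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(ta_image):
    """Statistical texture features of a uint8 temporal-average (TA) image."""
    g = graycomatrix(ta_image, distances=[1], angles=[0, np.pi / 2],
                     levels=256, symmetric=True, normed=True)
    props = ("contrast", "homogeneity", "energy", "correlation")
    return np.concatenate([graycoprops(g, p).ravel() for p in props])

def duplicate_pairs(features, tol=1e-3):
    """Sort feature vectors lexicographically; near-identical neighbors
    indicate duplicated shots."""
    order = np.lexsort(features.T[::-1])  # first feature is the primary key
    f = features[order]
    close = np.all(np.abs(np.diff(f, axis=0)) < tol, axis=1)
    return [(order[k], order[k + 1]) for k in np.where(close)[0]]
```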
11
|
Kharat et al. [12] 2020
|
• The proposed algorithm comprises two steps (see the sketch after this entry).
• First, suspicious frames are identified in the test video using motion vectors.
• Second, SIFT key-points are used as features for comparison, and the Random Sample Consensus (RANSAC) algorithm is finally used to locate duplicated frames.
|
• 20 videos from YouTube movies
• Duplication
|
99.9
|
99.7
|
99.8
|
-
|
• A small dataset is used to test the performance of the model.
• No cross-dataset validation has been performed
|
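A sketch of the SIFT-plus-RANSAC confirmation between two suspicious frames; the Lowe ratio and inlier thresholds are illustrative:

```python
import cv2
import numpy as np

def is_duplicate_pair(frame_a, frame_b, min_inliers=50):
    """Match SIFT key-points between two suspicious frames and verify the
    match geometrically with RANSAC; many inliers => duplicated frame."""
    sift = cv2.SIFT_create()
    kp_a, des_a = sift.detectAndCompute(cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY), None)
    kp_b, des_b = sift.detectAndCompute(cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY), None)
    if des_a is None or des_b is None or len(des_b) < 2:
        return False
    matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des_a, des_b, k=2)
    # Lowe ratio test to keep distinctive matches only
    good = [p[0] for p in matches
            if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
    if len(good) < 4:  # findHomography needs at least 4 correspondences
        return False
    src = np.float32([kp_a[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_b[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    _, mask = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    return mask is not None and int(mask.sum()) >= min_inliers
```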
12
|
Fadl et al. [25] 2021
|
• An inter-frame forgery detection system is proposed that uses pre-trained 2D-CNNs over spatiotemporal information, with fusion, for automatic deep feature extraction.
• A Gaussian RBF multi-class support vector machine (RBF-MSVM) is used for classification (see the sketch after this entry).
|
• 13135 videos from VIRAT, SULFA, LASIESTA, and IVY Lab
• Insertion
• Deletion
• Duplication
|
-
|
-
|
99.9
98.7
98.5
|
-
|
• No cross-dataset validation has been performed
|
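A minimal scikit-learn sketch of the RBF-MSVM classification stage; the `C` value and scaling step are assumptions, and `X`/`y` stand for the fused CNN features and the forgery labels:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def train_rbf_msvm(X, y):
    """Multi-class SVM with a Gaussian RBF kernel over fused deep features.
    X: (n_videos, n_features), y: labels such as
    {original, insertion, deletion, duplication}."""
    clf = make_pipeline(StandardScaler(),
                        SVC(kernel="rbf", C=10.0, gamma="scale"))
    return clf.fit(X, y)
```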
13
|
Alsakar et al. [22] 2021
|
• A video forgery detection scheme is developed that represents highly correlated video data by a third-order tensor tube-fiber mode with low computational complexity (see the sketch after this entry).
• Frame insertion and deletion are detected using an arbitrary number of core tensors. The tensor data are orthogonally transformed to achieve further data reduction and to provide good features for tracing the forgery.
|
• 18 videos taken from TRACE library
• Insertion
• Deletion
|
96
92
|
94
90
|
-
|
F1:
95
91
|
• The detection of frame duplication forgery has not been addressed
• No cross-dataset validation has been performed
|
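A loose numpy sketch of a tube-fiber representation with an orthogonal (SVD) reduction; this is one interpretation, not the exact tensor construction of [22]:

```python
import numpy as np

def tube_fiber_matrix(video):
    """Arrange a (T, H, W) grayscale video as a matrix whose columns are
    tube fibers, i.e. the temporal fiber at each pixel location."""
    t, h, w = video.shape
    return video.reshape(t, h * w).astype(np.float64)  # (T, H*W)

def orthogonal_features(video, rank=8):
    """Orthogonally transform the tube-fiber matrix via truncated SVD to get
    a compact temporal signature; discontinuities in the trajectories hint
    at frame insertion or deletion."""
    m = tube_fiber_matrix(video)
    u, s, _ = np.linalg.svd(m - m.mean(axis=1, keepdims=True),
                            full_matrices=False)
    return u[:, :rank] * s[:rank]  # (T, rank) temporal feature trajectories
```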
14
|
Panchal et al. [48] 2023
|
• First, the input videos are categorized as static or dynamic by a key-frame extraction algorithm.
• In the second step, different sets of video quality assessment attributes are chosen for static and dynamic videos using the forward selection method to improve accuracy.
• Finally, multiple linear regression is used to identify outliers among the selected attributes, which determines whether the video is an original, a single-tampered, or a multiple-tampered frame-deletion video (see the sketch after this entry).
|
• 80 original and 50 tampered videos are taken from SULFA, VTD, UCF-101, and TDTVD
• Deletion
|
-
|
-
|
96.25%
|
-
|
• The detection of frame duplication and insertion forgeries have not been addressed
• No cross-dataset validation has been performed
|
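A sketch of the multiple-linear-regression outlier test; the attribute and target names are hypothetical placeholders for the selected VQA attributes:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def regression_outliers(vqa_attrs, target, z=2.5):
    """Fit multiple linear regression over the selected video-quality
    attributes and flag samples whose residuals are outliers; such outliers
    point to (single or multiple) frame deletion."""
    model = LinearRegression().fit(vqa_attrs, target)
    residuals = target - model.predict(vqa_attrs)
    zscores = (residuals - residuals.mean()) / (residuals.std() + 1e-12)
    return np.where(np.abs(zscores) > z)[0]
```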