The proposed system uses deep learning to identify whether a video is authentic or has been altered. The input videos are separated into two groups, original videos and tampered videos, as shown in Fig 1. Each video is split into non-overlapping frames, and the video is judged authentic only if every one of its frames is real.
A model trained with transfer learning is used to extract features learned from a large amount of data. The proposed technique employs a deep CNN model with two distinct sets of layers: (1) CNN layers (convolutional, pooling, and dense layers) and (2) customized CNN layers. After frame generation, the extracted features are passed to the CNN layers, which compute a hierarchical representation of the video. Finally, the dense layer of the custom CNN, which serves as a classifier, determines whether a clip is authentic or has been tampered with.
3.1 Learning Algorithm
- Input video clips are divided into original and tampered categories.
- Input: split each video clip (VC) into non-overlapping frames.
- Feature extraction: features are extracted from the frames using transfer learning (TL) and CNN layers.
- Frames are extracted.
- Frames are resized to 128×128×3.
- VGG-16 is used for transfer learning.
- Custom CNN layers are added.
- Training is performed on a separate split of the dataset to check accuracy.
- Output: videos categorized as tampered or original.
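The steps above can be sketched end to end as follows. This is a minimal illustration, not the paper's implementation: `classify_frame` is a stub standing in for the trained VGG-16 + custom-CNN frame classifier, and the dictionary-based "frames" are placeholders for real image arrays. The final decision rule matches the text: a video is labelled original only if every frame is real.

```python
# Illustrative sketch of the learning pipeline described above.
# `classify_frame` is a stub standing in for the trained
# VGG-16 + custom-CNN classifier (an assumption for illustration).

def split_into_frames(video):
    """Split a video (here: a list of frame records) into frames."""
    return list(video)

def classify_frame(frame):
    """Stub frame classifier: True means the frame looks original."""
    return frame.get("tampered", False) is False

def classify_video(video):
    """A video is labelled 'original' only if every frame is real."""
    frames = split_into_frames(video)
    predictions = [classify_frame(f) for f in frames]
    return "original" if all(predictions) else "tampered"

clean = [{"tampered": False}, {"tampered": False}]
forged = [{"tampered": False}, {"tampered": True}]
print(classify_video(clean))   # original
print(classify_video(forged))  # tampered
```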
3.2 Preprocessing and Feature Extraction
Before training a classification framework on a video classification dataset, the dataset must first be preprocessed. Real-world data is generally noisy, incomplete, and in a format that cannot be used directly. Data preparation is therefore an essential step for cleaning data and readying it for machine learning algorithms, and it increases the model's accuracy and performance. It is an iterative process that turns raw data into understandable and usable formats; raw datasets are typically characterized by incompleteness, inconsistency, missing patterns, and errors. Preprocessing here consists of three steps: extracting frames, resizing frames, and normalizing frames.

A video is a collection of frames, each of which captures a distinct stage of an object's state. Because an object is difficult to detect and track across an entire video sequence, the first phase of preprocessing is video frame extraction; the extracted frames are then used for object identification, detection, and tracking. Identifying an efficient technique for extracting key frames from video is therefore critical. To be classified, the videos are first separated into non-overlapping frames. OpenCV offers an interface for this: using the Python OpenCV bindings, the videos are converted into frames, and the resulting frames are then resized to a fixed width and height.
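The three preprocessing steps can be sketched as below. To keep the example self-contained, a random array stands in for a decoded video and a nearest-neighbour resize stands in for OpenCV's `cv2.resize`; in practice `cv2.VideoCapture` would read the real frames. Function names here are illustrative, not from the paper.

```python
import numpy as np

# Sketch of the preprocessing stage: extract frames, resize to
# 128x128x3, and normalize to [0, 1]. A random uint8 array stands in
# for a decoded video; in practice OpenCV (cv2.VideoCapture and
# cv2.resize) would read and resize real frames.

def extract_frames(video, step=1):
    """Take every `step`-th frame from a (T, H, W, 3) video array."""
    return video[::step]

def resize_frame(frame, size=(128, 128)):
    """Nearest-neighbour resize (a stand-in for cv2.resize)."""
    h, w = frame.shape[:2]
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return frame[rows][:, cols]

def preprocess(video):
    frames = extract_frames(video)
    frames = np.stack([resize_frame(f) for f in frames])
    return frames.astype(np.float32) / 255.0   # normalize to [0, 1]

video = np.random.randint(0, 256, size=(10, 240, 320, 3), dtype=np.uint8)
batch = preprocess(video)
print(batch.shape)  # (10, 128, 128, 3)
```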
3.3 Transfer Learning
Transfer learning is a deep learning method in which an existing model, trained on a large amount of data, is reused and the features it has learned are applied to the problem at hand. Such a model is good at discovering general features because it has learned them from a large dataset, and it can be adapted and trained for specific requirements. One limitation of deep models is that over-fitting occurs when they are trained on a limited dataset. Increasing the size of the training data can prevent over-fitting, but creating a large amount of labeled data is challenging; transfer learning addresses this problem. In transfer learning, a model developed for one task is used as the foundation model for a different task. Rather than training all of the model's layers, transfer learning locks (freezes) some of them and uses the trained weights in the frozen layers to extract general attributes from the data.
The final fully connected layers, such as FC6, FC7, and FC8, can be re-trained because they are customized to the new data.
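The freezing idea can be shown on a toy model: a gradient step updates only the newly added head while the pre-trained base weights stay untouched. This is a minimal numpy illustration of the mechanism (in Keras the same effect comes from setting `layer.trainable = False` on the VGG-16 layers); the two-layer linear model here is an assumption for illustration, not VGG-16.

```python
import numpy as np

# Minimal illustration of layer freezing in transfer learning: the
# "pretrained" base weights are locked and act as a feature extractor,
# while a gradient step updates only the new task-specific head.

rng = np.random.default_rng(0)
W_base = rng.normal(size=(4, 3))   # pretrained weights, frozen
W_head = rng.normal(size=(3, 2))   # new head, trainable

x = rng.normal(size=(5, 4))        # toy inputs
y = rng.normal(size=(5, 2))        # toy targets

features = x @ W_base              # frozen layers extract features
pred = features @ W_head
grad_head = features.T @ (pred - y) / len(x)   # gradient w.r.t. head only

W_base_before = W_base.copy()
W_head -= 0.1 * grad_head          # update only the head
assert np.allclose(W_base, W_base_before)      # frozen weights untouched
print("head updated, base frozen")
```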
3.4 Custom CNN Layers
In the proposed technique, the input layer and the last layers of the VGG-16 model (the dense layers FC6, FC7, FC8, and the softmax) are altered, as shown in Fig 3. The VGG-16 model's input layer accepts a 224×224 input image shape by default; this layer was replaced by one that accepts a 128×128 input shape. The final layers were replaced by an 8-layer custom CNN head. VGG-16's last dense layers learn task-specific information and are trained with the SGD algorithm, which needs a large amount of data and time. The FC6, FC7, FC8, and softmax layers of VGG-16 have therefore been replaced with eight custom CNN layers driven by the preceding layer's activations (Fig 4).
The eight custom layers are made up of two Conv2D layers, two Batch Normalization layers, a MaxPooling2D layer, a GlobalAveragePooling layer, and two Dense layers. Images can exist in HSV, RGB, Grayscale, CMYK, and other color spaces. The convolutional layers compress the image into a more tractable form without losing the information critical to making a successful prediction. ConvNets are not restricted to a single convolutional layer: traditionally, the first conv layer captures low-level information such as edges, color, and gradient direction, while deeper layers respond to high-level features, yielding a network that understands the images in the dataset.

Batch normalization is a technique for training very deep neural networks in which the inputs to each mini-batch are normalized. It stabilizes the learning process, resulting in a considerable reduction in the number of training epochs required. This is accomplished by normalizing the activations of each input parameter for every mini-batch: normalization rescales the data so that the mean is zero and the standard deviation is one, and batch normalization keeps the mean output near zero and the standard deviation near one.

Like the convolutional layer, the pooling layer reduces the dimension of a convolved feature. It extracts dominant characteristics that are both rotationally and positionally invariant, enabling the model to be trained effectively. There are two types of pooling: max pooling and average pooling. Max pooling takes the maximum value from the region of the image covered by the kernel; it also suppresses noisy activations, performing de-noising along with dimensionality reduction. Average pooling only performs dimension reduction as a noise-suppression strategy, so max pooling generally outperforms average pooling.
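The difference between the two pooling operations can be seen on a tiny example: a 4×4 feature map pooled with a 2×2 window and stride 2. The values are chosen for illustration only.

```python
import numpy as np

# Max pooling vs. average pooling on one 4x4 feature map,
# 2x2 window, stride 2.

fmap = np.array([[1, 3, 2, 0],
                 [4, 8, 1, 1],
                 [0, 2, 9, 5],
                 [1, 1, 3, 7]], dtype=float)

# Rearrange into a 2x2 grid of 2x2 windows, then reduce each window.
blocks = fmap.reshape(2, 2, 2, 2).swapaxes(1, 2)
max_pool = blocks.max(axis=(2, 3))    # keeps the strongest activation
avg_pool = blocks.mean(axis=(2, 3))   # smooths the window

print(max_pool)  # [[8. 2.] [2. 9.]]
print(avg_pool)  # [[4. 1.] [1. 6.]]
```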
With global average pooling, the pool size equals the size of the layer input, and the average rather than the maximum is used. Global average pooling layers are frequently used to replace fully connected layers in classifiers. In a dense layer, each neuron receives input from all neurons in the previous layer, making it a densely connected neural network layer; a dense layer produces an m-dimensional vector and is mostly employed to change the vector's dimension. In this architecture, the dense layer is linked to the classification output. Softmax turns a set of values into a probability distribution and is typically used as the activation of the final layer of a classification network, since the output can then be interpreted as a probability distribution.
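The classification head described above can be sketched in a few lines of numpy: global average pooling collapses each feature map to one value, a dense layer maps the resulting vector to two class scores, and softmax turns the scores into probabilities over {original, tampered}. The weights here are random placeholders, not trained values.

```python
import numpy as np

# Sketch of the classification head: GlobalAveragePooling -> Dense ->
# Softmax. Feature maps and weights are random placeholders.

rng = np.random.default_rng(42)
feature_maps = rng.normal(size=(8, 8, 32))    # H x W x channels

gap = feature_maps.mean(axis=(0, 1))          # (32,): one value per channel
W = rng.normal(size=(32, 2))                  # dense layer, 2 classes
b = np.zeros(2)
logits = gap @ W + b

def softmax(z):
    e = np.exp(z - z.max())                   # numerically stable softmax
    return e / e.sum()

probs = softmax(logits)                       # probability distribution
print(probs)
```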
3.5 DATASET DESCRIPTION
3.5.1 VIFFD - Video Inter-Frame Forgeries Detection
VIFFD is a dataset used for detecting video inter-frame forgeries. In inter-frame forgeries, whole frames are copied from one part of a video and inserted into another part of the same video; these are also known as frame copy-move forgeries. The dataset consists of 136 training videos and 136 testing videos, for a total of 272 videos.
3.5.2 ViFoDAC - Video Forgery Detection and Classification
ViFoDAC is a collection of authentic and forged videos. The dataset contains a total of 32 videos: 16 authentic and 16 forged. The authentic videos are recorded with a camera, whereas the forged videos are edited with certain tools by inserting objects into the video sequence. Each video is about 10 to 62 seconds long, and all are captured with a moving camera. This dataset is very useful for forgeries where objects are externally added to the video and, beyond object-based forgery, can be effective for detecting any kind of forgery where the background keeps changing.
3.5.3 VDFF_3D DATASET
VDFF_3D is the third dataset used in this project. It is used for evaluating the detection of small three-dimensional region forgeries in videos and consists of 50 original and tampered videos, all of which were used in the project. Each video is about 6 to 22 seconds long and is captured with a static camera. The dataset is effective for detecting forgeries involving small 3-D objects in videos where the background remains static and does not move.
3.5.4 REWIND_3D DATASET
REWIND_3D is a dataset used for detecting small object-based forgeries in videos. It consists of 20 videos in total: 10 original and 10 tampered. Each video is about 6 to 18 seconds long and is captured with a static camera against an unchanging background. This dataset is useful for detecting forgeries where small objects are inserted into an existing video whose background does not change.
3.5.5 Tampered Video Dataset
The Tampered Video Dataset is a collection of 160 tampered videos derived from six source videos. Tampered videos were created by choosing an object in a video frame and tracking it for a specified number of frames; after various alterations, the duplicated object is cloned into another area of the same video. Transformations such as brightness change, flipping, rotation, scaling, shearing, RGB change, and copy-move of objects without transformation are performed, yielding seven types of forgery transformations from a single video. This dataset is effective for detecting forgeries where the background remains the same and objects from the video are copied to another part of the video after undergoing certain transformations.
Table 2 : Analysis of data used for forgery detection.
| DATASET   | VIFFD     | ViFoDAC | VDFF_3D | REWIND_3D | TAMPERED VIDEO |
| STATIC    | 136 / 272 | -       | 50 / 50 | 20 / 20   | 160 / 160      |
| REAL TIME | -         | 32 / 32 | -       | -         | -              |
| COMBINED  | 136 / 272 | 32 / 32 | 50 / 50 | 20 / 20   | 160 / 160      |
Table 2 above summarizes the number of videos used from each of these datasets.