A comprehensive evaluation of the models' performance reveals distinct trends in loss and accuracy, summarized in Table 1. The custom model, built with separable convolution layers, attains the highest validation and training accuracy among the models assessed. It notably surpasses both InceptionNetV3 and ResNet50V2, achieving a validation accuracy that stands out significantly, as shown in Fig. 6. These results are mirrored in its training accuracy, where the proposed customized model displays robust learning capability.
Table 1
Accuracy and Loss of Deep Learning Models
| Model | Validation Accuracy | Training Accuracy | Training Loss | Validation Loss |
|---|---|---|---|---|
| Custom model | 0.9000 | 0.9688 | 0.0137 | 0.1043 |
| InceptionNetV3 | 0.7148 | 0.9240 | 0.5184 | 0.4958 |
| ResNet50V2 | 0.6716 | 0.7196 | 0.5068 | 0.5854 |
| MobileNetV2 | 0.7727 | 0.9026 | 0.5718 | 0.6068 |
Furthermore, the custom model excels in minimizing both training and validation loss, demonstrating its effectiveness in image tampering detection. In contrast, InceptionNetV3, though renowned for its feature extraction prowess [6], achieves only 71.48% validation accuracy, as shown in Fig. 3. This is underscored by its comparatively high training and validation losses of 0.5184 and 0.4958, respectively, indicating difficulty in fitting the task precisely. The ResNet50V2 model, known for its depth and skip connections [5], reaches 67.16% validation accuracy, further emphasizing the advantages of customization, as shown in Fig. 4. Its training accuracy falls short of expectations, and its training and validation losses remain relatively high, indicating a challenge in capturing the intricate features essential for tampering detection.
MobileNetV2 [7], shown in Fig. 5, achieves competitive validation and training accuracies of 77.27% and 90.26%, respectively, but lags slightly in validation loss, which may be attributed to its inherent trade-offs in computational efficiency. Nonetheless, the findings underline the superior performance of the proposed custom model, which successfully combines efficient feature learning with faster convergence, showcasing its potential as a promising solution for image tampering detection.
In the evaluation of image tampering detection models, precision, recall, and F1 score serve as crucial performance metrics. Here, the results highlight the exceptional performance of the custom CNN, which achieves a harmonious balance between precision and recall. The custom model attained an F1 score of 0.9622, indicating its proficiency both in identifying tampered regions within images and in minimizing false positives, as reported in Table 2.
Table 2
Precision, Recall and F1 Score of Deep Learning Models

| Model | Precision | Recall | F1 Score |
|---|---|---|---|
| InceptionNetV3 | 0.8586 | 0.8541 | 0.8538 |
| ResNet50V2 | 0.7409 | 0.7263 | 0.7278 |
| MobileNetV2 | 0.8153 | 0.8158 | 0.8154 |
| Custom model | 0.9562 | 0.9683 | 0.9622 |
The proposed customized model demonstrates a strong F1 score of 0.9622, indicating well-balanced precision and recall. In comparison, InceptionNetV3 achieves an F1 score of 0.8538, ResNet50V2 records 0.7278, and MobileNetV2 attains 0.8154. These scores capture the trade-off between precision and recall for each model and offer insight into their respective strengths in identifying tampered regions within images, positioning the custom model as an asset in practical applications. The analysis also makes evident that training time significantly affects each model's practical utility. As shown in Fig. 7, the proposed CNN outshines the competition with a training time of 573.4 seconds for 20 epochs, a substantial improvement over the other models. In comparison, MobileNetV2 required 2250.5 seconds, InceptionNetV3 took 2413.5 seconds, and ResNet50V2 demanded 2220.5 seconds for the same number of epochs. Particularly striking is that the proposed model, despite having a similar number of parameters to MobileNetV2, reduced the training time by nearly 75%.
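As a quick consistency check, the reported F1 scores follow the harmonic-mean relation F1 = 2PR/(P + R). The sketch below, in plain Python, reproduces the custom model's F1 from its precision and recall in Table 2 (small deviations for the other models would be expected if the reported figures are macro-averages over classes):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall for the custom model, taken from Table 2
precision, recall = 0.9562, 0.9683
f1 = f1_score(precision, recall)
print(round(f1, 4))  # 0.9622, matching the reported F1 score
```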
This reduction in training time showcases the efficiency of the proposed customized CNN, thanks in part to the separable convolution layers, which expedite the learning process and enhance overall training efficiency. An evaluation of the model parameters further underscores the efficiency and compactness of the proposed architecture. With a lean parameter count of 5,450,988, the custom model strikes an adept balance between complexity and performance, as shown in Fig. 8. By contrast, InceptionNetV3 and ResNet50V2 carry substantially larger parameter counts of 25,878,802 and 27,908,998, respectively, while MobileNetV2 is comparable at 5,047,298. This discrepancy in parameter sizes emphasizes the streamlined architecture of the custom model, demonstrating that strong performance need not require an excessive number of parameters. The judicious management of parameters not only contributes to computational efficiency but also underscores the model's potential for applications where resource constraints are a critical consideration.
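The parameter savings that separable convolutions provide can be illustrated with a back-of-the-envelope count. The layer sizes below are illustrative only, not the actual architecture:

```python
def conv2d_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

def separable_conv2d_params(k: int, c_in: int, c_out: int) -> int:
    """Depthwise k x k convolution followed by a 1x1 pointwise convolution."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

# Illustrative layer: 3x3 kernel mapping 128 channels to 256
std = conv2d_params(3, 128, 256)            # 294912
sep = separable_conv2d_params(3, 128, 256)  # 33920
print(std, sep, round(std / sep, 1))        # 294912 33920 8.7
```

The roughly 8.7x reduction at this layer size is consistent with the custom model's low overall parameter count relative to conventional convolutional backbones.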
4.1 Hyperparameter Optimization
Hyperparameter optimization is a critical component of the proposed research methodology. To ensure optimal fine-tuning of the custom CNN and the pretrained models (ResNet50V2, InceptionNetV3, and MobileNetV2) for image tampering detection, an extensive and systematic hyperparameter search was conducted. The search involved running hundreds of training sessions, each characterized by a unique combination of hyperparameters. Key parameters subjected to this optimization included learning rates, weight decay, dropout rates, and architecture-specific settings. Each of these hyperparameters plays a crucial role in the performance and behavior of a neural network, so determining the most suitable values was paramount. Systematically exploring the hyperparameter space provided insight into the intricate interplay between these parameters and the models' ability to generalize and adapt effectively to the task. This comprehensive approach helped ensure that each model reached its full potential; it not only enhanced the models' performance but also contributed to the reliability and robustness of the comparative analysis, providing a solid foundation for the findings and conclusions.
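The kind of systematic search described above can be sketched as a simple grid enumeration. The value ranges below are hypothetical, since the exact values explored are not listed in the text:

```python
from itertools import product

# Hypothetical search space -- the actual ranges used are not specified
learning_rates = [1e-2, 1e-3, 1e-4]
weight_decays = [0.0, 1e-4, 1e-5]
dropout_rates = [0.2, 0.3, 0.5]

search_space = list(product(learning_rates, weight_decays, dropout_rates))
print(len(search_space))  # 27 candidate configurations

def run_trial(lr, wd, dropout):
    """Placeholder for one training session; would return validation accuracy."""
    ...

# Selecting the best configuration would then be:
# best = max(search_space, key=lambda cfg: run_trial(*cfg))
```

In practice each configuration corresponds to one full training session, which is why the search described above required hundreds of runs.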
4.2 Optimization Techniques
In the proposed work, the choice of optimization techniques played a crucial role in fine-tuning the CNN architectures for image tampering detection. The Adam optimizer was used, renowned for its efficacy in training deep neural networks. Adam combines the benefits of momentum-based updates with adaptive, per-parameter learning rates, dynamically adjusting each parameter's step size for accelerated convergence and more effective training. To further improve training and keep the models from converging prematurely or getting stuck in local minima, a learning rate scheduling mechanism was implemented. Learning rate scheduling dynamically adjusts the learning rate during training: the mechanism monitored the model's performance and, upon detecting a plateau in accuracy, automatically decreased the learning rate.
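The plateau-based schedule described above behaves like the reduce-on-plateau callbacks found in common deep learning frameworks. Its core logic can be sketched in plain Python; the factor and patience values here are illustrative, not the ones used in the study:

```python
class PlateauScheduler:
    """Halve the learning rate when monitored accuracy stops improving."""

    def __init__(self, lr: float, factor: float = 0.5, patience: int = 3):
        self.lr, self.factor, self.patience = lr, factor, patience
        self.best, self.wait = float("-inf"), 0

    def step(self, val_accuracy: float) -> float:
        if val_accuracy > self.best:
            self.best, self.wait = val_accuracy, 0  # improvement: reset counter
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr *= self.factor  # plateau detected: decay the rate
                self.wait = 0
        return self.lr

sched = PlateauScheduler(lr=1e-3)
for acc in [0.70, 0.75, 0.76, 0.76, 0.76, 0.76]:  # accuracy plateaus
    lr = sched.step(acc)
print(lr)  # 0.0005 after three epochs without improvement
```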
4.3 Addressing Overfitting
Overfitting, a common hurdle in deep learning, occurs when a model tailors itself too closely to the training data, absorbing noise and irrelevant patterns that hinder its ability to generalize to unseen data. To tackle this issue, several strategies were employed to improve the models' performance on data they had not encountered before.

First, L1 regularization was applied, a vital tool in the arsenal against overfitting. L1 regularization adds a penalty proportional to the magnitude of the network's weights, encouraging sparsity and a more parsimonious set of features. In doing so, it steers the model toward the most relevant and informative features while discouraging overemphasis on noise and irrelevant details in the training data. This played a significant role in enhancing the model's performance on unseen data, a critical factor in image tampering detection, where adaptability and accuracy are paramount.

Second, dropout layers were incorporated at strategic points in the network architecture. These layers help prevent overfitting by randomly deactivating a fraction of neurons during training; the resulting randomness introduces a degree of uncertainty into the learning process and discourages the model from becoming overly reliant on specific features or neurons.

Finally, batch normalization [14] was a key component of the strategy to combat overfitting. Applied to standardize the input to each layer during training, it mitigates internal covariate shift [19], the phenomenon in which the distribution of layer inputs changes as training progresses.
Batch normalization also aided the models' generalization by ensuring that the learning process was not hindered by abrupt shifts in data distributions. It contributed to the overall stability and robustness of the models, which is paramount in the context of image tampering detection, where detecting subtle variations and manipulations is essential.
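The standardization step that batch normalization applies to each layer's inputs can be sketched with the standard library (omitting the learnable scale and shift parameters, gamma and beta, for brevity; a single feature column is shown and the sample values are illustrative):

```python
from statistics import fmean, pstdev

def batch_normalize(values: list[float], eps: float = 1e-5) -> list[float]:
    """Standardize one feature across the batch to zero mean, unit variance."""
    mean = fmean(values)
    # eps guards against division by zero for constant features
    std = (pstdev(values) ** 2 + eps) ** 0.5
    return [(v - mean) / std for v in values]

feature = [4.2, 5.9, 7.1, 3.3, 6.0, 5.5]  # one feature column across a batch
normed = batch_normalize(feature)
# After normalization the mean is ~0 and the standard deviation is ~1,
# regardless of the original scale and offset of the inputs.
```

Because each mini-batch is re-centered and re-scaled this way, downstream layers see inputs with a stable distribution, which is the mechanism behind the stability benefits described above.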