Scientific Image Tampering Detection Based On Noise Inconsistencies: A Method And Datasets

Scientific image tampering is a problem that affects not only authors but also the general perception of the research community. Although previous researchers have developed methods to identify tampering in natural images, these methods may not thrive under the scientific setting as scientific images have different statistics, format, quality, and intentions. Therefore, we propose a scientific-image specific tampering detection method based on noise inconsistencies, which is capable of learning and generalizing to different fields of science. We train and test our method on a new dataset of manipulated western blot and microscopy imagery, which aims at emulating problematic images in science. The test results show that our method can detect various types of image manipulation in different scenarios robustly, and it outperforms existing general-purpose image tampering detection schemes. We discuss applications beyond these two types of images and suggest next steps for making detection of problematic images a systematic step in peer review and science in general.


I. INTRODUCTION
The use of digital images has become increasingly ubiquitous in all types of publications. With the growing importance of digital images comes the development of image tampering techniques. In the past, modifying or concealing the content of an image would require dedicated personnel and tools. Today, however, image tampering is much easier with state-of-the-art image processing software. This trend has affected many aspects of our society, as we see prominent forgery cases occur in journalism and academia [1]. Consequently, many detection techniques have been developed for these scenarios (see [2]). Only recently, however, has attention been paid to image manipulation in scientific publications [3]. Although it is possible to use existing methods on scientific images directly, we hypothesize that significant adaptations must be made because scientific images usually possess distinctive statistical patterns, formats and resolutions. In this work, we aim at developing a scientific-specific image manipulation detection technique, which we test on a novel scientific image manipulation dataset of western blots and microscopy imagery; there are no datasets openly available about scientific image manipulation yet (but see [4]). As scientific images increasingly come in digital form, the detection of possible manipulations should reach the same level of quality as in other fields that rely on digital images.
It is undeniable that an increasing number of tampered images are finding their way into scientific publications. Bik, Casadevall and Fang [5] examined 20,621 biomedical research papers from 1995 to 2014 and found that at least 1.9 percent were subject to deliberate image manipulation. The fact that these suspicious papers went through a careful reviewing process suggests how difficult it is to examine image tampering in scientific research manually. Because of the large quantity of digital images present in submitted manuscripts, it is crucial for publishers to be able to identify image manipulation in an automated fashion.
The scientific research context sets a different tolerance for image manipulation. Many operations, including resizing, contrast adjusting, sharpening, and white balancing, are generally acceptable as part of the figure preparation process. However, some other types of tampering, especially the ones that alter the image content semantically, are strictly prohibited. These manipulations include copy-move (without proper attribution), splicing, removal, and retouching (see footnote 1). Acuna, Brookes and Kording [6] developed a method to detect figure element reuse across a paper database. Intra-image copy-move can be detected rather robustly with SIFT features and pattern matching [7]. However, detection of image manipulation that does not involve reuse is significantly more challenging. A comprehensive scientific image manipulation detection pipeline should therefore also cover manipulations that do not involve reuse.
As scientific papers are reviewed by experts, we reckon that articles containing manipulations that produce contextual inconsistencies (e.g., brain activation patterns from fMRI in the middle of a microscopy image) will be easily picked out. What humans cannot see properly is the noise pattern within an image, and scientists seeking to falsify images exploit this weakness. Therefore, we propose a novel image tampering detection method for scientific images, which is based on uncovering noise inconsistencies. Specifically, our proposed method has the following features:
1) It is based on supervised learning, which is capable of learning from existing databases and new instances.
2) It works for images of different resolutions and from different devices.
3) It is not restricted to any specific image format.
4) It is capable of generating good predictions with a small training set.
5) It is flexible and can be fine-tuned for different fields of science.
In section II, we briefly summarize previous work on digital image forensics. In section III, we discuss the design of our proposed method. In section III-B2, we introduce our scientific image manipulation datasets and present the test results of our method on them. In section V, we conclude by discussing limitations and future extension of our method.
Footnote 1: https://ori.hhs.gov/education/products/RIandImages/guidelines/list.html
The second class of tampering detection methods aims at general-purpose image tampering detection. Dirik and Memon [26] try to catch the inconsistency of Color Filter Array (CFA) patterns within images, a signal generated by digital cameras. However, scientific images are not necessarily taken by digital cameras. Wang, Dong and Tan [27] leverage the characteristics of the DCT coefficients in JPEG images to achieve tampering localization, but the method is confined to a specific format. Mahdian and Saic [28] propose a method that predicts tampered regions based on wavelet transform and noise level estimation. All these methods are unable to learn from data, which limits their ability to generalize to different fields. Another group of methods combines steganalysis tools [29], [30] with Gaussian Mixture Models (GMM) to identify potentially manipulated regions [31], [32]. These unsupervised-learning-based methods are also unable to learn from existing databases effectively and therefore tend to underperform in practice.
Because of the availability of large image datasets, neural-network-based tampering detection methods are likely to yield good performance [33], especially those based on Convolutional Neural Networks (CNN) [34], [35], [36]. They usually target high-resolution natural images. It is unclear, however, whether they can be transferred to the scientific scenario. For example, it is challenging to train such a network for scientific images exclusively, as these networks usually require tens of thousands of images as training data, which to the best of our knowledge are not yet available.

III. OUR PROPOSED METHOD
Our method is based on several heterogeneous feature extractors whose outputs are later combined to produce a single prediction for each patch (Figure 1). First, an input image goes through a variable number of residual image generators. The type and number of these generators can be chosen based on the application. Each type of residual image has its own feature extractor, which is based on our proposed feature extraction scheme with (possibly) different configurations. The features are then fed into a classifier after post-processing.
The proposed method works on residual images, which are essentially images after filtering, or the difference between an image and its interpolated version. Residuals are a way to discard content and emphasize the noise pattern within an image, and they are widely used in image manipulation detection practice. However, in many previous works, only one type of residual is used [26], [32], [36]. Because each residual may have different sensitivity levels to different types of manipulation, using only one not only limits the method's ability to detect a wide variety of manipulations, but also renders the method more vulnerable to adversaries. Therefore, we combine a number of residuals in our method to increase its robustness.
Because our feature extraction method drastically reduces the dimensionality of image data, which relieves the need for a huge amount of training data, it is possible to use a light-weight classifier as the back end, such as logistic regression or a support vector machine (SVM). As there are many ways to generate residual images, and the feature extraction method comes with a number of parameters to choose, our image manipulation detection method possesses a high degree of flexibility. Unlike the parameters in neural networks, for example, which are rather obscure for human beings, the underlying meanings of the parameters in our feature extraction method are straightforward. Therefore, it is easier to manually adapt our method for different fields.

A. Residual Image Generators
There are numerous ways of generating residual images; we list the following ones because they are functional for a wide range of applications. Note that the capability of our method is significantly influenced by the choice of residuals. However, it is possible to design new residual image generators for specific scenarios.
1) Steganalytic Filters
Steganalysis (techniques used for detecting hidden messages in communications) has been used in image tampering detection practice extensively. This type of analysis aims to expose hidden information planted in images by steganography techniques. Although it is not directly linked to image tampering detection, it is suggested that the tasks of image forensics and steganalysis are very much alike when the action of data embedding in steganography is treated as image manipulation [37]. Similar to the rich model strategy proposed in [29], we can apply many different filters and see which one can spot inconsistencies. In our work, we use several filters that provide a relatively comprehensive view of potential inconsistencies (Figure 2). The filters selected are high-pass because we want to throw away information about the image content and emphasize noise patterns as much as possible. The residual image in this case is the image after convolution. An example of steganalytic filtering residual is shown in Figure 3.
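As a rough illustration of this kind of residual, the sketch below convolves a grayscale image with a single high-pass kernel. The kernel `HIGH_PASS` is a hypothetical example chosen only because its coefficients sum to zero; it is not claimed to be one of the filters in Figure 2, and the function name and boundary handling are likewise assumptions.

```python
import numpy as np
from scipy.ndimage import convolve

# Hypothetical 3x3 high-pass kernel (coefficients sum to zero).
# It stands in for the filters of Figure 2, which are not reproduced here.
HIGH_PASS = np.array([[-1.0,  2.0, -1.0],
                      [ 2.0, -4.0,  2.0],
                      [-1.0,  2.0, -1.0]]) / 4.0


def steganalytic_residual(image, kernel=HIGH_PASS):
    """Return the high-pass filtered version of a 2-D grayscale image."""
    # Reflective padding is an assumption; it avoids artificial edges at the border.
    return convolve(image.astype(np.float64), kernel, mode="reflect")
```

Flat or smoothly varying content maps to values near zero under such a kernel, so what remains is dominated by local noise, which is the property the residual is meant to expose.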

2) Error Level Analysis (ELA)
ELA is an analysis technique that targets JPEG compression. The idea behind it is that the amount of error introduced by JPEG compression is nonlinear: a 90-quality JPEG image resaved at quality 90 is equivalent to a one-time save of quality 81; a 90-quality JPEG image resaved at quality 75 is equivalent to a one-time save of quality 67.5 [38]; and so on. If some part of a JPEG-compressed image is altered with a different JPEG quality factor, when it is compressed again, the loss of information of that part will differ from other regions.

Fig. 1. Overall design of our proposed method. The input image goes through several residual generators and feature extractors in parallel. All extracted features will be merged in a postprocessing step and then fed to a classifier.
Fig. 2. High-pass filters selected in our experiment.
To uncover the inconsistency, the ELA residual is computed by intentionally resaving the image in JPEG format with a particular quality (e.g., 90) and then computing the difference between the two images. An example of ELA residual is shown in Figure 4.
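The resave-and-subtract procedure can be sketched with Pillow as follows. This is a minimal illustration, not the authors' implementation: the function name `ela_residual`, the in-memory buffer, the conversion to RGB, and the use of an absolute difference are all assumptions.

```python
import io

import numpy as np
from PIL import Image


def ela_residual(image, quality=90):
    """Resave a PIL image as JPEG at `quality` and return the absolute difference."""
    rgb = image.convert("RGB")
    buffer = io.BytesIO()
    rgb.save(buffer, format="JPEG", quality=quality)  # intentional recompression
    buffer.seek(0)
    resaved = Image.open(buffer).convert("RGB")
    # Use a signed type so the subtraction cannot wrap around.
    original = np.asarray(rgb, dtype=np.int16)
    recompressed = np.asarray(resaved, dtype=np.int16)
    return np.abs(original - recompressed).astype(np.uint8)
```

Regions that were previously compressed with a different quality factor are expected to show a different error magnitude in the returned array than the rest of the image.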

3) Median Filtering Residual
Median filtering can suppress the noise of an image. When applying median filtering to a tampered image, the tampered part may possess a different noise pattern and therefore respond differently. The median filtering residual is the difference between the original image and the median-filtered image. An example is shown in Figure 5.
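A minimal sketch of this residual, assuming a 2-D grayscale array and a 3x3 window (the window size and the function name are illustrative choices, not values taken from the paper):

```python
import numpy as np
from scipy.ndimage import median_filter


def median_residual(image, size=3):
    """Original image minus its median-filtered version."""
    image = image.astype(np.float64)
    return image - median_filter(image, size=size, mode="reflect")
```

Isolated noisy pixels survive in the residual because the median removes them from the filtered image, while flat regions cancel out.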

4) Wavelet Denoising Residual
Wavelet denoising is a type of denoising method that represents an image in the wavelet domain and cancels the noise based on that representation. Similar to the median filtering residual's case, the tampered region may react differently compared to the rest of the image and therefore give away its own identity. It is also suggested by Dirik and Memon [26] that using wavelet denoising can uncover the sensor noise inconsistency of digital cameras. The wavelet denoising residual is given by the difference between the original image and the denoised image. An example is shown in Figure 6.
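To make the idea concrete, the sketch below uses a deliberately simple stand-in: a single-level 2-D Haar transform with soft thresholding of the detail coefficients, written in plain NumPy. The paper does not state which wavelet, decomposition depth, or thresholding rule it uses, so all of these (and the name `haar_denoise_residual`) are assumptions.

```python
import numpy as np


def haar_denoise_residual(image, threshold=5.0):
    """Residual of a one-level Haar soft-threshold denoiser (illustrative only)."""
    x = image.astype(np.float64)
    h, w = x.shape
    x = x[: h - h % 2, : w - w % 2]          # crop to even dimensions
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]

    # Orthonormal one-level Haar analysis: approximation plus three detail bands.
    ll = (a + b + c + d) / 2
    lh = (a - b + c - d) / 2
    hl = (a + b - c - d) / 2
    hh = (a - b - c + d) / 2

    def soft(t):
        # Soft thresholding shrinks small (noise-like) coefficients to zero.
        return np.sign(t) * np.maximum(np.abs(t) - threshold, 0.0)

    lh, hl, hh = soft(lh), soft(hl), soft(hh)

    # Haar synthesis from the thresholded coefficients.
    denoised = np.empty_like(x)
    denoised[0::2, 0::2] = (ll + lh + hl + hh) / 2
    denoised[0::2, 1::2] = (ll - lh + hl - hh) / 2
    denoised[1::2, 0::2] = (ll + lh - hl - hh) / 2
    denoised[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x - denoised
```

With `threshold=0` the transform reconstructs the input exactly and the residual is zero; larger thresholds move more of the fine-scale noise into the residual.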
It is worth noticing that the tampered images in the demonstrations are selected so that the manipulation pattern is visible in the specific residual. However, in practice, this may not always be the case. Usually it is necessary to examine multiple residual images before drawing a conclusion.

B. Feature Extraction
Our method is patch-based, which means it generates a prediction for each patch in the image. Using patches instead of single pixels to represent an image not only shrinks the scale of computation, but also enriches the amount of statistical information within each smallest unit. At the limit, the patch size can be chosen so that the pixel-based and patch-based approaches become almost the same. After deciding on the patch size, the feature extraction step generates a corresponding feature vector for each patch in the image. In this section, we discuss how these feature vectors are computed.
1) Patch Reinterpretation: Residuals reduce the complexity of image data, but they still have the same dimensionality as the original image. To further compress data for classification, we propose a new feature extraction method for image tampering detection. Intuitively, an image region is considered to be tampered not because it is unique itself, but mainly because it is different from the rest of the image. Therefore, an ideal feature design should contain a sufficient amount of global information. We add global information by reinterpreting an image region using the rest of the image. First, an input image of size (h, w) is divided into patches of size (m, n). If the dimensions are not divisible, the image is cropped to the nearest multiple in each dimension. Therefore, an image of size (h, w) is divided into a patch matrix of size $(\lfloor h/m \rfloor, \lfloor w/n \rfloor)$.
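The cropping and division step can be sketched in a few lines of NumPy. The function name `to_patch_matrix` and the choice to crop from the bottom and right edges are assumptions made for illustration.

```python
import numpy as np


def to_patch_matrix(image, m, n):
    """Split a 2-D image into an array of shape (h // m, w // n, m, n)."""
    h, w = image.shape
    rows, cols = h // m, w // n
    # Crop so that both dimensions are exact multiples of the patch size.
    cropped = image[: rows * m, : cols * n]
    # (rows, m, cols, n) -> (rows, cols, m, n): one (m, n) patch per grid position.
    return cropped.reshape(rows, m, cols, n).swapaxes(1, 2)
```

For example, a 7 x 10 image with (m, n) = (2, 3) yields a 3 x 3 patch matrix, and the last row and last column of pixels are discarded.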
Then, the patch matrix is split into a rectangular patch grid of size (s, t), where each cell contains a certain number of patches. The number of patches in most cells is $\lceil \lfloor h/m \rfloor / s \rceil \times \lceil \lfloor w/n \rfloor / t \rceil$, except for those cells on the edges, which may have fewer patches.
For each cell in the grid, we fit an outlier detector that is capable of telling the likelihood of a new sample being an outlier. Given a patch p, it can be reinterpreted by a vector v, which is given by
$v = (l_{11}(p), l_{12}(p), l_{13}(p), \ldots, l_{1t}(p), l_{21}(p), \ldots, l_{st}(p))$,
where $l_{ij}(p)$ is the outlier likelihood of p according to the detector of grid cell (i, j). An illustration of this reinterpretation method is shown in Figure 7, where black blocks represent patches, red blocks represent grid cells and the yellow region represents the tampered region. In this case, (s, t) = (3, 4). Because the tampered region has a different residual pattern, and its contaminated patches concentrate in one of the cells, the outlier detector of that cell will learn a distinct decision boundary compared to the other ones. As a result, an authentic patch $p_a$ will have lower outlier likelihood in all components except for $l_{23}(p_a)$; a tampered patch $p_t$ will have higher outlier likelihood in all components except for $l_{23}(p_t)$. This difference in structure allows us to distinguish between authentic and tampered patches. In practice, we use the histogram of v (denoted by $v_h$), which not only encodes the structure in summary-statistics space, but also becomes position invariant.
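The reinterpretation can be sketched as below, under several assumptions: each patch has already been flattened into a feature vector, `np.array_split` is used to form the (s, t) grid (which balances cell sizes rather than leaving smaller cells only at the edges), the negated one-class SVM decision value serves as a proxy for the outlier likelihood, and the histogram uses a shared range with a hypothetical bin count `bins`.

```python
import numpy as np
from sklearn.svm import OneClassSVM


def reinterpret_patches(patch_vectors, s, t, bins=8):
    """patch_vectors: (rows, cols, d). Returns v (rows, cols, s*t) and v_h (rows, cols, bins)."""
    rows, cols, d = patch_vectors.shape
    flat = patch_vectors.reshape(-1, d)
    scores = []
    # Visit grid cells in row-major order: l_11, l_12, ..., l_1t, l_21, ..., l_st.
    for r_idx in np.array_split(np.arange(rows), s):
        for c_idx in np.array_split(np.arange(cols), t):
            cell = patch_vectors[np.ix_(r_idx, c_idx)].reshape(-1, d)
            detector = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(cell)
            # Negated decision value: larger means more outlying (assumed proxy).
            scores.append(-detector.decision_function(flat))
    v = np.stack(scores, axis=1).reshape(rows, cols, s * t)

    # Histogram each patch's s*t scores over a range shared by the whole image.
    lo, hi = float(v.min()), float(v.max())
    v_h = np.apply_along_axis(
        lambda x: np.histogram(x, bins=bins, range=(lo, hi))[0], 2, v)
    return v, v_h
```

Every patch is scored by every cell's detector, so each histogram summarizes how a patch relates to all regions of the image without recording where those regions are.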
2) Feature Design: Besides $v_h$, we include some other information in order to concentrate more global information within the feature. The final feature of a patch contains the following components:
1) $v_h$: the histogrammed patch reinterpretation. After generating all histogrammed reinterpretations of an image, we normalize them to [0, 1].
2) Proximity information: how much the patch differs from its neighborhood. We choose the Euclidean distance between the histogrammed reinterpretation of the patch and those of its surrounding neighbors.
3) Global information: how much the patch differs from the entire image. After computing the histogrammed reinterpretations for all patches within an image, we apply k-means clustering on them, which generates a set of weights and cluster centroids. The additional global information of a patch is given by the Euclidean distance between the reinterpretation and the cluster centroids, as well as the corresponding weights of the centroids.
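One possible reading of these three components is sketched below. Several details are assumptions rather than statements from the paper: the normalization is a global min-max, the proximity term is the mean distance to the eight surrounding patches (with border patches compared against replicated border values), and the cluster weights are taken to be the fraction of patches assigned to each centroid, ordered from largest to smallest.

```python
import numpy as np
from sklearn.cluster import KMeans


def patch_features(v_h, k=6, n_init=10, seed=0):
    """v_h: (rows, cols, bins) histograms. Returns (rows, cols, bins + 1 + 2 * k)."""
    rows, cols, bins = v_h.shape
    v_h = v_h.astype(np.float64)
    span = v_h.max() - v_h.min()
    v_h = (v_h - v_h.min()) / span if span > 0 else np.zeros_like(v_h)

    # Proximity: mean Euclidean distance to the 8 neighbors (edge-replicated padding).
    padded = np.pad(v_h, ((1, 1), (1, 1), (0, 0)), mode="edge")
    dists = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            neighbor = padded[1 + dr: 1 + dr + rows, 1 + dc: 1 + dc + cols]
            dists.append(np.linalg.norm(v_h - neighbor, axis=2))
    proximity = np.mean(dists, axis=0)[..., None]

    # Global: distances to the k-means centroids plus the (assumed) cluster weights.
    flat = v_h.reshape(-1, bins)
    km = KMeans(n_clusters=k, n_init=n_init, random_state=seed).fit(flat)
    weights = np.bincount(km.labels_, minlength=k) / len(flat)
    order = np.argsort(-weights)                      # largest cluster first
    centroid_dist = km.transform(flat)[:, order]      # distance to each centroid
    weight_block = np.tile(weights[order], (len(flat), 1))
    global_info = np.hstack([centroid_dist, weight_block]).reshape(rows, cols, 2 * k)

    return np.concatenate([v_h, proximity, global_info], axis=2)
```

The resulting vector stays short (bins + 1 + 2k values per patch), which is what makes a light-weight classifier practical.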

EXPERIMENTS
Due to the lack of science-specific image manipulation detection databases, we synthesize our own database for the experiments.

Datasets
Our novel scientific image manipulation datasets mainly consist of the following three types of manipulations:
1) Removal: covering an image region with a single color or with noise. We manually select a rectangular region to be removed from the image. Then we select another rectangular region from which to sample the color or noise used to fill the removal region, where we can compute the mean µ and standard deviation σ of the pixels. We generate four images for each pair of selections according to the configuration given in Table II.
2) Splicing: copying content from another image. We randomly choose a small region from the foreground image and paste it at an arbitrary location on the background image. To create noise inconsistency, the region is either recompressed with JPEG or processed with sharpening filters.
3) Retouching: modifying the content of the image. We randomly choose a small region within an image and apply Gaussian blurring to it.
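The removal manipulation can be sketched as follows for an 8-bit grayscale image. The Gaussian form of the fill noise, the rectangle convention `(top, left, height, width)`, and the parameter `sigma_scale` (a multiplier on the sampled σ, with 0 giving a pure color) are assumptions for illustration; the actual settings are those of Table II.

```python
import numpy as np


def synthesize_removal(image, target, sample, sigma_scale=1.0, seed=0):
    """Fill `target` with noise whose mean and std come from `sample`.

    Both rectangles are (top, left, height, width); `image` is 2-D uint8.
    """
    rng = np.random.default_rng(seed)
    s_top, s_left, s_h, s_w = sample
    region = image[s_top:s_top + s_h, s_left:s_left + s_w].astype(np.float64)
    mu, sigma = region.mean(), region.std()

    t_top, t_left, t_h, t_w = target
    # sigma_scale = 0 yields a flat fill at the sampled mean (pure color).
    fill = rng.normal(mu, sigma_scale * sigma, size=(t_h, t_w))

    out = image.astype(np.float64).copy()
    out[t_top:t_top + t_h, t_left:t_left + t_w] = fill
    return np.clip(out, 0, 255).round().astype(np.uint8)
```

Calling the function with several values of `sigma_scale` for the same pair of rectangles produces a family of removals that look similar but differ in noise level.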
These manipulations are selected because we believe that they are more prevalent in problematic scientific papers. We build two datasets that contain western blot images and microscopy images, respectively. We choose these two types of images because they appear very frequently in the literature and are more susceptible to manipulation. We also create a natural image dataset to compensate for the lack of microscopy images for training. It is only used in the training phase. The details of the datasets are shown in Table IV. The meanings of the tampering type abbreviations are shown in Table III.

C. Test Configurations
The sizes of images in the western blot collection are significantly smaller. Therefore, we need to train a special model for them. For the microscopy model, we added natural images into the training set to compensate for the lack of data. The patches from residual images are transformed into the frequency domain by the Discrete Cosine Transform (DCT) because it yields slightly better performance. Within each model, the parameters of each feature extractor are the same. Detailed configurations of the two models that we trained are shown in Table V.
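The per-patch frequency transform can be sketched with SciPy. The DCT type (II) and the orthonormal scaling are assumptions, as is the input layout of one (m, n) patch per grid position.

```python
import numpy as np
from scipy.fft import dctn


def patches_to_dct(patch_matrix):
    """Apply a 2-D DCT to every patch of an array shaped (rows, cols, m, n)."""
    # Transform only the last two axes, so each patch is handled independently.
    return dctn(patch_matrix.astype(np.float64), type=2, norm="ortho", axes=(2, 3))
```

With orthonormal scaling the transform preserves the energy of each patch, so the downstream outlier detectors see the same overall scale as in the pixel domain.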
We use a one-class SVM outlier detector [39], provided by scikit-learn [40], which is based on LIBSVM [41]. The kernel we use is the radial basis function, whose kernel coefficient (γ) is given by the "scale" setting, which is
$\gamma = \frac{1}{\text{number of features} \times \text{variance of all inputs}}$.
The tolerance of optimization is set to 0.01, and ν (the upper bound on the fraction of training errors and the lower bound on the fraction of support vectors) is set to 0.1. Note that the choice of parameters can significantly influence the speed of feature extraction. One of the most expensive operations is fitting the SVM, which has a computational complexity of $O(N^3)$, where N is the number of patches in each grid cell. Therefore, it is important to choose an appropriate (m, n) and (s, t) pair. With our Python implementation and the configuration given in Table V, the extraction speed for western blots is approximately 212.36 sec/megapixel (12.13 sec/image), while the extraction speed for microscopy images is approximately 86.15 sec/megapixel (49.32 sec/image). We tried to use ThunderSVM [42], which is a GPU-accelerated SVM implementation. Although it has a much higher speed, its precision is not ideal compared to LIBSVM. Therefore, our experiments are conducted with LIBSVM only.
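The detector settings described above can be written out with scikit-learn as follows. The helper name `fit_cell_detector` is hypothetical, and γ is computed explicitly here only to make the formula visible; passing `gamma="scale"` to `OneClassSVM` gives the same value.

```python
import numpy as np
from sklearn.svm import OneClassSVM


def fit_cell_detector(cell_patches):
    """Fit a one-class SVM on the patches of one grid cell (n_patches, n_features)."""
    X = np.asarray(cell_patches, dtype=np.float64)
    # gamma = 1 / (number of features * variance of all inputs)
    gamma = 1.0 / (X.shape[1] * X.var())
    return OneClassSVM(kernel="rbf", gamma=gamma, tol=0.01, nu=0.1).fit(X)
```

Because fitting cost grows quickly with the number of patches per cell, larger patches (m, n) or a finer grid (s, t) both reduce N and therefore the time spent in this step.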
The number of centroids of k-means clustering is set to k = 6, and the clustering algorithm is run 150 times with different initializations in order to obtain the best result. We select this particular value of k because when we apply k-means clustering to $v_h$, the tampered region would usually blend with other clusters unless there are more than 6 centroids. Therefore, we consider it reasonable to represent the major content of an image by its first 6 cluster centroids. Because the dimensionality of the extracted feature is not very high, the outputs of each feature extractor are simply concatenated into a single feature vector and then fed to the classifier. The classifier we use is a simple Multilayer Perceptron neural network. For the western blot model, we use a four-layer network with 200 units per layer; for the microscopy model, we use a similar network with 300 units per layer. Softmax regression is applied to the last layer to get the classification results.

IV. RESULTS
The performance evaluation metrics that we use are patch-level accuracy, AUC scores, and F1 scores. We compare the performance of our model with two baseline models, which are widely compared against in related papers:
1) CFA [26]: a method that uses nearby pixels to evaluate the Color Filter Array patterns and then produces the tampering probability based on the prediction error.
2) NOI [28]: a method that finds noise inconsistencies by using high-pass wavelet coefficients to model local noise.
For our method, the threshold for the F1 score is 0.5. For the baseline methods, their output map is normalized to [0, 1], and the F1 score is acquired by setting the threshold to 0.5.
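A minimal sketch of this evaluation protocol with scikit-learn is given below. The helper name `evaluate_patches`, the min-max form of the normalization applied to baseline outputs, and the use of `>=` at the threshold are assumptions. AUC and F1 are only reported when both classes are present, since they are undefined for images that contain no tampered patches.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score


def evaluate_patches(y_true, scores, threshold=0.5, normalize=False):
    """Patch-level accuracy, plus AUC and F1 when both classes are present."""
    y_true = np.asarray(y_true).ravel()
    scores = np.asarray(scores, dtype=np.float64).ravel()
    if normalize:
        # Min-max scaling to [0, 1], as assumed for the baseline output maps.
        span = scores.max() - scores.min()
        scores = (scores - scores.min()) / span if span > 0 else np.zeros_like(scores)
    y_pred = (scores >= threshold).astype(int)

    result = {"accuracy": accuracy_score(y_true, y_pred)}
    if np.unique(y_true).size == 2:
        result["auc"] = roc_auc_score(y_true, scores)
        result["f1"] = f1_score(y_true, y_pred)
    return result
```

For a genuine image, where every patch label is 0, the function returns only the accuracy entry.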
Table VIII shows the accuracies of the three methods on genuine images, where AUC and F1 scores do not apply. Tables VI and VII show the AUC scores and F1 scores of our method compared to the baselines. The meanings of the abbreviations can be seen in Table III. The "overall" scores are computed across the entire dataset, including genuine images. A visual comparison of the results of each method is shown in Figure 8.
It can be seen that CFA cannot handle western blot images very well, as it has low accuracy on genuine images. Its performance on the J, F and B tampering types is also mediocre. NOI is better at locating noisy regions in the image, but it fails drastically when encountering manipulations that contain less noise. It constantly treats R[0] and B manipulations as negatives, which yields a false negative region that is not always separable. Its performance on J images is not very satisfactory either. Generally speaking, the performance of our method is more consistent across different types of manipulations, which makes it more reliable in practice.

V. CONCLUSION AND DISCUSSION
We have proposed a novel image tampering detection method for scientific images, which is based on uncovering noise inconsistencies. We use residual images to exploit the noise pattern of the image, and we develop a new feature extraction technique to lower the dimensionality of the problem so that it can be handled by a light-weight classifier. The method is tested on a new scientific image dataset of western blots and microscopy imagery. Compared to two baseline methods popular in the literature, results suggest that our method is capable of detecting various types of image manipulations better and more consistently. Thus, our solution promises to solve an important part of image tampering in science effectively.
There are also some weaknesses in our study. First, our proposed method is tested on a custom database, which only contains a small number of samples. We only include several types of manipulations in our datasets, which is rather limited compared to the space of all possible image tampering techniques. Nonetheless, the choice of these specific image sources and manipulation types is inspired by existing problematic papers. If our method is capable of detecting these manipulations to some extent, we believe that it can make valuable discoveries once put into practice.
Footnote 3: format: R[noise standard deviation]
Second, we think that noise-inconsistency-based methods do possess certain limitations. For example, not all manipulations will necessarily trigger noise inconsistency; it would also be easier for someone to hide the noise inconsistency if they knew the underlying mechanism of the automatic detector. This kind of adversarial attack, however, is significantly challenging and unlikely to be carried out by the average scientist. In the future, we want to develop more advanced methods that take both image content and noise pattern into account.
However, our proposed method is one of the first methods that tackles scientific image manipulation directly. Put together in screening pipelines for scientific publications (similar to [6]), our method would significantly expand the range of manipulations that could be captured at scale. It also makes predictions based on many types of residuals, which improves its robustness. The method has a set of easily adjustable parameters, which allows it to be adapted for different fields with less effort and a smaller amount of training data.
We would like to continue extending the database with more images from various disciplines to make it standard and comprehensive, and report test results on the updated version. It is our hope that the datasets that we propose can also be useful for the nascent Computational Research Integrity research area. But we are also facing a major difficulty: there are no openly available datasets of images that actually come from science (although see the efforts in [4]). The images that we currently have are collected from the Internet, and form a small but significant portion of images with manipulation issues. Unfortunately, problematic scientific images tend to be removed from public access soon after retraction. So far, neither publishers nor authors are willing to share those images, for understandable reasons. Hopefully, once scientific image tampering detection methods prove their efficacy, publishers and funders can start to share and create datasets with proper safeguards to check for potential problems during peer review, similar to how they do it with full text through the Crossref organization (see footnote 4).

TABLE II. IMAGE GENERATION CONFIGURATION OF REMOVALS. THE MEAN OF THE REMOVAL REGION IS EQUAL TO THAT OF THE SAMPLE REGION, BUT WE VARY THE STANDARD DEVIATION FROM ZERO (PURE COLOR) TO TWO STANDARD DEVIATIONS TO CREATE DIFFERENT VISUAL EFFECTS.
Fig. 8. Visual comparison of the results.