Two-stage textured-patterns embedded QR codes for printed matter authentication

The popular use of high-quality printing and scanning equipment makes it easier to counterfeit important printed matter, such as important documents, anti-counterfeit labels on merchandise, and packaging. Previously, methods such as textured images, two-level quick response (2LQR) codes, and fractional-order spatial steganography have been used for printed matter authentication. In this paper, a simplified model is proposed to analyze the variation of the correlation of textured-patterns (hereinafter called patterns) after the print-and-scan (P&S) process, which gives accurate mathematical descriptions of the information loss during P&S processes. Based on these theoretical results, we propose two-stage textured-patterns embedded QR (2SQR) codes for printed matter authentication, which are based on 2LQR codes and are sensitive to the P&S process. In the generation process, stage two (S2) patterns are introduced, which are the P&S versions of the textures in the 2LQR codes, i.e., the stage one (S1) patterns. Then part of the S1 patterns in the 2LQR codes are replaced with the corresponding S2 patterns, resulting in the 2SQR codes. In the authentication process, the 2SQR indicator removes the need for the original image, which eliminates the storage and retrieval of original images. Experiments show that, compared with the 2LQR codes, the 2SQR codes significantly improve the performance in distinguishing authentic codes from their copied versions.


Introduction
With the popularity of the mobile Internet, quick response (QR) codes are becoming more and more common in our lives. As a kind of machine-readable representation of data, QR codes enable camera-equipped devices to easily capture information printed on the surface of a material or displayed on a screen. In recent years, QR codes have been widely used in the fields of product traceability [1, 2], document authentication [3], bus tracking [4], peeping protection [5], etc. Although QR codes were originally designed for reliable information storage, their printed versions can easily suffer illegal copying or cloning, especially in recent years with the popular use of high-quality scanning and printing devices. The counterfeiting of printed QR codes makes the data of the traceability systems in [1] and [2] unreliable, thereby reducing their credibility. Similarly, in [3] the originally printed version of a QR code on a document cannot be distinguished from an illegally copied version. The inability to distinguish between original printed QR codes and their copied versions is a problem in QR code applications.
* Correspondence: issthz@mail.sysu.edu.cn. 1 School of Electronics and Information Technology, Sun Yat-sen University, 510006 Guangzhou, China. Full list of author information is available at the end of the article.
To solve this problem, many methods have been proposed, such as steganography-based methods [6] and texture-based methods [7,8,9]. The steganography-based method in [6] hides the entire QR code image in the color channels of a carrier image, so a relatively expensive color printer is required. In addition, the QR code image needs to be restored from the carrier image before it can be read, which increases the processing time and is not compatible with ordinary QR code reading programs. In [7], a textured image built by combining two types of textured-patterns (hereinafter called patterns) is used to identify differences between a printed legitimate document and a printed fake document, exploiting the fact that a counterfeiter does not have access to the original digital document. However, this textured image is only used for identification and does not carry information. Then in [8,9], the dark modules of the QR code are replaced with patterns to form a new textured image called the two-level QR (2LQR) code. The 2LQR code is one of the most promising solutions thanks to its low-cost generation and easy integration [10]. The patterns in 2LQR codes are sensitive to the print-and-scan (P&S) process. During legitimate authentication, the printed 2LQR codes are scanned. Then the correlation of patterns between the original 2LQR codes and the scanned 2LQR codes is computed. If the correlation value is greater than a predetermined threshold, the scanned 2LQR codes are considered authentic. However, the authentication of 2LQR codes needs the original image, which requires a lot of storage space and retrieval time during authentication, especially in Internet of Things (IoT) applications with massive numbers of nodes.
Moreover, the difference of correlation values between authentic and non-authentic codes, described as sensitivity, is usually small [10], which makes misjudgment relatively likely during authentication.
In this paper, to remove the requirement of the original image and to improve the sensitivity in authentication, we propose two-stage textured-patterns embedded QR (2SQR) codes based on 2LQR codes. First, we propose a simplified model for the P&S process to analyze the variation of the correlation of patterns after the P&S process. Based on the theoretical results obtained from this model, extra steps are introduced in the generation process to replace part of the patterns in 2LQR codes with their P&S versions, which results in the 2SQR codes. In the authentication process, the 2SQR indicator is used to measure the degradation of patterns, and only the scanned image of the 2SQR code under test is needed. Therefore, using 2SQR codes instead of 2LQR codes saves the storage space and time needed to retrieve the original image during authentication.
The paper is organized as follows. Section 2 introduces work based on textured-patterns, the P&S process, and the correlation coefficients that are usually used as measurements of the information loss in printed matter authentication. Section 3 presents the main method of this paper. A simplified model for the P&S process and the analysis of the variation of the correlation of patterns after P&S processes are presented in Subsection 3.1. The proposed 2SQR codes are presented in Subsection 3.2. In Section 4, experiments verify the correctness of the simplified model and the reliability of 2SQR codes in authentication. Finally, Section 5 gives conclusions and future work.

Related work
This section contains three subsections. Subsection 2.1 gives a brief introduction to current work based on textured-patterns and QR codes. Subsection 2.2 is a short overview of the P&S process. Subsection 2.3 introduces the correlation coefficients used in this paper for measuring the degradation of patterns.

QR codes and textured-patterns
A QR code is a matrix symbology whose symbols consist of an array of nominally square modules arranged in an overall square pattern [11], as shown in Fig. 1. A dark module represents a binary one, and a light module represents a binary zero. A QR code mainly consists of two parts: function patterns and the encoding region. Function patterns are mainly used for positioning and other functions; if they are modified, the normal recognition of the QR code is affected. The encoding region mainly contains data, error correction codes, and other information. Due to the fault-tolerant design of the QR code, modifying a small number of modules in the data area does not affect its normal recognition.
As QR codes become more and more common in practice, many efforts have been made to improve them to support more application scenarios, such as QR code beautification [12,13,14,15,16,17,18,19,20], increasing the storage capacity of QR codes [21,8], and so on. However, the inability to detect physical copying limits their applications. The texture-based anti-counterfeiting methods that have emerged in recent years offer a way to detect duplicated QR codes. Artificially constructed textures can be used to hide information [22] or to perform copy detection [7]. In [7], Tkachenko discovered that it is possible to distinguish an authentic printed document from its copy by using a textured image containing a visual message. Inspired by the textured-pattern, Tkachenko presented the 2LQR codes, which guarantee the readability of QR codes and can be used for printed document anti-counterfeiting [8]. Building on this, Tkachenko proposed a printed document authentication scheme that can detect unauthorized counterfeited printed documents based on the sensitivity of the specific textured-patterns in 2LQR codes to the P&S process [9].
The methods mentioned above are based on the information loss principle, i.e., every time an image is printed or scanned, some information about the original digital image is lost [10]. Usually, correlations such as the Pearson correlation are used to measure the information lost through printing or scanning.
Unfortunately, the information loss principle, e.g., measuring information loss with correlation coefficients, is only an intuitive description rather than a rigorous mathematical model. Moreover, the difference of correlation values between authentic and non-authentic codes is tiny [10], so it is not easy to distinguish authentic codes from copies. In addition, the authentication of 2LQR codes needs the original images, which require a lot of storage space and retrieval time during authentication, especially in IoT scenarios with massive numbers of nodes. In this paper, we propose a simplified model for the P&S process to analyze the variation of the correlation of patterns after the P&S process, which gives accurate mathematical descriptions of the information loss during P&S processes. Based on the theoretical results obtained from this model, extra steps are introduced in the generation process to replace part of the patterns in 2LQR codes with their P&S versions, which results in the 2SQR codes. In the authentication process, the 2SQR indicator is used to measure the degradation of patterns, and only the scanned image of the 2SQR code under test is needed. Therefore, using 2SQR codes instead of 2LQR codes saves the storage space and time needed to retrieve the original image during authentication.
Like the 2LQR code, the 2SQR code replaces the dark modules with patterns from the pattern set $P_0$, which is designed to satisfy the following two criteria [7]:
• Each type of pattern in $P_0$ is better correlated with its own P&S version than with the P&S versions of all other types of pattern in $P_0$.
• The P&S version of each type of pattern in $P_0$ is better correlated with its own original type of pattern than with all other original types of pattern in $P_0$.
The number of pattern types in $P_0$ depends on the needs of the application. For comparison, in the experiments in Section 4 we choose the same $P_0$ as in [8], shown in Fig. 2. This original pattern set, $P_0 = \{p_0^0, p_0^1, p_0^2\}$, contains three different types of patterns, $p_0^0$, $p_0^1$, and $p_0^2$, called the first, second, and third type of pattern in $P_0$, respectively. The use of this pattern set is described in detail in Section 4.
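The two criteria can be checked mechanically once the correlations between the original patterns and their P&S versions have been measured. The following Python sketch assumes a precomputed matrix `c` in which `c[i][j]` is the correlation between original pattern i and the P&S version of pattern j; the matrix layout and the function name `satisfies_pattern_criteria` are hypothetical, not part of the original scheme.

```python
from typing import Sequence


def satisfies_pattern_criteria(c: Sequence[Sequence[float]]) -> bool:
    """Check the two design criteria for a pattern set P_0.

    c[i][j] is the correlation between original pattern i and the
    P&S version of pattern j (a hypothetical, precomputed matrix).
    """
    n = len(c)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Criterion 1: pattern i correlates best with its own P&S version.
            if c[i][i] <= c[i][j]:
                return False
            # Criterion 2: the P&S version of pattern i correlates best
            # with original pattern i.
            if c[i][i] <= c[j][i]:
                return False
    return True
```

In matrix terms, criterion 1 asks that every diagonal entry be the strict maximum of its row, and criterion 2 that it be the strict maximum of its column.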

The process of physical clone and P&S
The process of physically cloning (copying) a QR code image is shown in Fig. 3. It mainly consists of two types of operations, namely printing and scanning. First, the digital QR code image, called the original QR code and denoted by $I_0$, is printed on the surface of a medium, such as paper, to get its printed version, called the genuine QR code and denoted by $I_{P0}$, where the letter P in the subscript indicates that $I_{P0}$ is a printed version of $I_0$. This step transforms the QR code image from the digital domain to the physical world. Second, $I_{P0}$ is scanned to get a digital QR code image, called the P&S QR code and denoted by $I_1$. This step turns the physical image back into the digital domain, but there are significant differences between $I_1$ and $I_0$ due to the distortions and noise produced in the P&S process. Third, $I_1$ is printed to get its printed version, called the counterfeit QR code and denoted by $I_{P1}$. This step is usually performed by the counterfeiter. Finally, $I_{P1}$ is scanned to get its digital version, called the copied QR code (or double P&S QR code) and denoted by $I_2$. The process above consists of two P&S processes: the first P&S process gets $I_1$ from $I_0$, and the second gets $I_2$ from $I_1$.
A P&S process is considered a physically unclonable function [23]. Degradation due to printing cannot be distinguished from degradation due to scanning [7]. Therefore, it is hard to reproduce $I_0$ from $I_1$, or $I_1$ from $I_2$.
In the process of getting $I_2$ from $I_1$, which is how most counterfeits are produced, the distortions and noise produce irreversible modifications to $I_1$ [9]. Therefore, the P&S-sensitive patterns in 2SQR codes can be used to distinguish between $I_2$ and $I_1$.

Correlation coefficient
The correlation coefficient (sometimes called the correlation value) is a measure of the association between two random vectors, which takes values in $[-1, 1]$ [24].
The authentication scheme of the 2LQR codes maximizes the correlation values between P&S-degraded patterns and reference patterns [8]. Different correlation coefficients lead to different performances in printed matter authentication. Three correlation coefficients are commonly used for printed matter authentication, as shown below.

Pearson correlation
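Pearson correlation is widely used to measure the association between two variables [24]. Given two random vectors $X = (x_0, x_1, \dots, x_{M-1})$ and $Y = (y_0, y_1, \dots, y_{M-1})$, where $X$ is the original pattern and $Y$ is the pattern $X$ after the P&S process, the Pearson correlation between $X$ and $Y$ is
$$R_P(X, Y) = \frac{\sum_{i=0}^{M-1}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=0}^{M-1}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=0}^{M-1}(y_i - \bar{y})^2}}, \tag{1}$$
where $\bar{x} = \frac{1}{M}\sum_{i=0}^{M-1} x_i$ and $\bar{y} = \frac{1}{M}\sum_{i=0}^{M-1} y_i$ are the means of $X$ and $Y$, respectively.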
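As an illustration of (1), a minimal pure-Python sketch of the Pearson correlation for two vectorized patterns might look as follows. The function name `pearson` is an illustrative choice; library routines such as `numpy.corrcoef` compute the same quantity.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation R_P(X, Y) of two equal-length vectors, as in (1)."""
    m = len(x)
    if m != len(y) or m < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / m
    y_bar = sum(y) / m
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_y = sum((yi - y_bar) ** 2 for yi in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(ss_x * ss_y)
```

For example, `pearson([1, 2, 3], [2, 4, 6])` is 1.0, since the second vector is a positive linear function of the first.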

Kendall correlation
Kendall correlation is a non-parametric rank correlation, commonly used in hypothesis tests, that measures the statistical dependence between two random vectors [25]. Let $X = (x_0, x_1, \dots, x_{M-1})$ be the original pattern and $Y = (y_0, y_1, \dots, y_{M-1})$ be the result of $X$ after the P&S process. The components of $X$ and $Y$ are paired into the set $XY$ of element pairs $(x_i, y_i)$, $i = 0, 1, \dots, M-1$, and every two element pairs $(x_i, y_i)$ and $(x_j, y_j)$ with $i \ne j$ fall into one of three categories. The first category is called concordant, where the two element pairs are ranked in the same order, i.e.,
$$x_i > x_j \text{ and } y_i > y_j, \quad \text{or} \quad x_i < x_j \text{ and } y_i < y_j.$$
The second category is called discordant, where the two element pairs are ranked in opposite orders, i.e.,
$$x_i > x_j \text{ and } y_i < y_j, \quad \text{or} \quad x_i < x_j \text{ and } y_i > y_j.$$
The third category is neither concordant nor discordant, where the order is undetermined because of a tie, i.e., $x_i = x_j$ or $y_i = y_j$. Kendall correlation is
$$R_K(X, Y) = \frac{C - D}{M(M-1)/2}, \tag{6}$$
where $C$ and $D$ are the numbers of pairs in the first and second categories of the set $XY$, respectively.
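The counting definition in (6) can be sketched directly in Python. This is an O(M^2) illustration assuming the tie-ignoring form (C − D)/(M(M − 1)/2) given above; library implementations often use tie-corrected variants instead, and the function name `kendall_tau_a` is hypothetical.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau_a(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall correlation (C - D) / (M(M-1)/2) from concordant/discordant counts."""
    m = len(x)
    if m != len(y) or m < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for i, j in combinations(range(m), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1   # same order in X and Y
        elif s < 0:
            discordant += 1   # opposite order in X and Y
        # s == 0: a tie in X or Y (third category), counted in neither C nor D
    return (concordant - discordant) / (m * (m - 1) / 2)
```

For instance, `kendall_tau_a([1, 2, 3], [1, 3, 2])` has two concordant pairs and one discordant pair out of three, giving 1/3.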

Mathematical model for the P&S process
The Pearson coefficient, which is grounded in probability theory, is the most widely used and the most theoretically systematic. Therefore, the following analysis is based on the Pearson coefficient; other correlation coefficients can be modeled in a similar way to obtain similar conclusions. The impacts of the P&S process include pixel value distortion and geometric distortion [26]. In copy detection applications, since the original image is known, the geometric distortion can be corrected well, so only pixel-level distortion needs to be considered. The model and experiments in [26] show that pixel-level distortion is mainly caused by low-pass filtering and high-frequency noise.

A simplified model for the P&S process
Based on [26], a simplified model for the P&S process is shown in Fig. 4. For convenience, one-dimensional signals are used for the analysis; the results can easily be extended to two-dimensional signals. Both the input signal $x(n)$ and the output signal $y(n)$ are nonnegative discrete sequences. The output signal consists of two parts, $y_L(n)$ and $y_N(n)$, where $y_L(n)$ is the result of passing the input signal $x(n)$ through the low-pass filter $h_L(n)$, and $y_N(n)$ is the product of the white Gaussian random noise $N(n)$ and the high-frequency signal $y_H(n)$, which is obtained by passing $x(n)$ through the high-pass filter $h_H(n)$, as shown in
$$y(n) = y_L(n) + y_N(n), \quad y_L(n) = x(n) * h_L(n), \quad y_N(n) = N(n)\,y_H(n), \quad y_H(n) = x(n) * h_H(n), \tag{7}$$
where $*$ denotes convolution.
It is assumed that the mean values of the input and output signals are equal, because the output signal can be adjusted by a scale factor to make its expectation E[y(n)] equal to the expectation of the input signal E[x(n)].
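To make the model in (7) concrete, the following Python sketch runs one simulated P&S pass on a 1-D signal. Everything specific here is an assumption for illustration: circular (wrap-around) convolution with a centered kernel, the example kernels, and the helper names `circular_convolve` and `simulate_ps`. The noise $N(n)$ is drawn i.i.d. Gaussian per sample, and the output is rescaled so that its mean matches the input, as assumed above.

```python
import random
from typing import List, Sequence


def circular_convolve(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Centered circular convolution: out(n) = sum_k h(k) x((n - k + c) mod L), c = len(h) // 2."""
    length, c = len(x), len(h) // 2
    return [sum(h[k] * x[(n - k + c) % length] for k in range(len(h)))
            for n in range(length)]


def simulate_ps(x: Sequence[float], h_low: Sequence[float], h_high: Sequence[float],
                sigma: float, rng: random.Random) -> List[float]:
    """One pass of the simplified P&S model: y = x*h_L + N(n) * (x*h_H)."""
    y_low = circular_convolve(x, h_low)    # y_L(n): low-pass component
    y_high = circular_convolve(x, h_high)  # y_H(n): high-frequency component
    # y_N(n) = N(n) y_H(n), with N(n) drawn i.i.d. from a zero-mean Gaussian.
    y = [yl + rng.gauss(0.0, sigma) * yh for yl, yh in zip(y_low, y_high)]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    if mean_y == 0:
        raise ValueError("cannot rescale an output with zero mean")
    # Scale factor so that the output mean equals the input mean.
    return [v * mean_x / mean_y for v in y]
```

With `h_low = [0.25, 0.5, 0.25]` (coefficients sum to 1, so the low-pass path keeps the mean) and `h_high = [-0.5, 1.0, -0.5]` (coefficients sum to 0, so constant inputs are rejected), a call such as `simulate_ps(x, h_low, h_high, 0.2, random.Random(2))` returns one degraded version of `x`; feeding the output back in mimics the successive P&S processes analyzed below.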
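Analysis of the physical clone process using the simplified P&S model
The analysis of the physical clone process in Fig. 3 with the proposed model is shown in Fig. 5(a), where $x_0(n)$ is a pattern in the original QR code $I_0$, $x_1(n)$ is the corresponding pattern in the P&S QR code $I_1$, and $x_2(n)$ is the corresponding pattern in the copied QR code $I_2$, satisfying
$$E[x_0(n)] = E[x_1(n)] = E[x_2(n)]. \tag{8}$$
Assume that the signals $x_0(n)$, $x_1(n)$, and $x_2(n)$ are wide-sense stationary stochastic processes, whose mean values and autocorrelation functions are
$$m_k = E[x_k(n)], \quad k = 0, 1, 2, \tag{9}$$
$$r_{x_kx_k}(m) = E[x_k(n)\,x_k(n+m)], \quad k = 0, 1, 2, \tag{10}$$
respectively. Other notation used in Fig. 5 is listed in Table 1. Assume that the filters $h_{L,1}(n)$, $h_{L,2}(n)$, $h_{H,1}(n)$, and $h_{H,2}(n)$ are linear shift-invariant (LSI) systems; their outputs are
$$x_{L,k}(n) = x_{k-1}(n) * h_{L,k}(n), \quad x_{H,k}(n) = x_{k-1}(n) * h_{H,k}(n), \quad k = 1, 2. \tag{11}$$
The output signals of the first and second P&S processes are, respectively,
$$x_k(n) = x_{L,k}(n) + x_{N,k}(n), \quad k = 1, 2, \tag{12}$$
where the high-frequency noise terms $x_{N,1}(n)$ and $x_{N,2}(n)$ are defined by
$$x_{N,k}(n) = N_k(n)\,x_{H,k}(n), \quad k = 1, 2. \tag{13}$$
To get an explicit relationship between $x_{L,2}(n)$ and the input signal $x_0(n)$, we rewrite $x_{L,2}(n)$ as
$$x_{L,2}(n) = x_1(n) * h_{L,2}(n) = x_0(n) * h_{L,1}(n) * h_{L,2}(n) + x_{N,1}(n) * h_{L,2}(n). \tag{14}$$
Assuming that the cut-off frequency of the high-pass filter is greater than that of the low-pass filter, the second term in (14) is zero, so (14) reduces to
$$x_{L,2}(n) = x_0(n) * h_L(n), \quad h_L(n) = h_{L,1}(n) * h_{L,2}(n). \tag{15}$$
That is, $x_{L,2}(n)$ is the output of the low-pass filter $h_L(n) = h_{L,1}(n) * h_{L,2}(n)$ with input $x_0(n)$. Since $N_1(n)$ is an independent zero-mean white Gaussian noise, substituting (12) into (10) gives
$$r_{x_1x_1}(m) = r_{x_{L,1}x_{L,1}}(m) + \sigma_1^2\, r_{x_{H,1}x_{H,1}}(0)\,\delta(m), \tag{16}$$
where $\sigma_1^2$ is the variance of $N_1(n)$ and $\delta(m)$ is the impulse function
$$\delta(m) = \begin{cases} 1, & m = 0, \\ 0, & m \ne 0. \end{cases} \tag{17}$$
Similarly, $r_{x_2x_2}(m)$ can be written as
$$r_{x_2x_2}(m) = r_{x_{L,2}x_{L,2}}(m) + \sigma_2^2\, r_{x_{H,2}x_{H,2}}(0)\,\delta(m), \tag{18}$$
where $\sigma_2^2$ is the variance of $N_2(n)$. The cross-correlation function between $x_0(n)$ and $x_1(n)$ is
$$r_{x_0x_1}(m) = E[x_0(n)\,x_1(n+m)] = r_{x_0x_{L,1}}(m) + E[x_0(n)\,N_1(n+m)\,x_{H,1}(n+m)]. \tag{19}$$
Since $N_1(n)$ is an independent zero-mean white Gaussian noise, we have $E[x_0(n)\,N_1(n+m)\,x_{H,1}(n+m)] = E[N_1(n+m)]\,E[x_0(n)\,x_{H,1}(n+m)] = 0$, so (19) simplifies to
$$r_{x_0x_1}(m) = r_{x_0x_{L,1}}(m). \tag{20}$$
Similarly, the cross-correlation functions between $x_0(n)$ and $x_2(n)$, and between $x_1(n)$ and $x_2(n)$, are
$$r_{x_0x_2}(m) = r_{x_0x_{L,2}}(m), \tag{21}$$
$$r_{x_1x_2}(m) = r_{x_1x_{L,2}}(m), \tag{22}$$
respectively.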
To extend the physical copy process to more P&S processes, we use the triple $(\sigma_k, H_{L,k}(e^{j\omega}), H_{H,k}(e^{j\omega}))$ to represent the $k$th ($k = 1, 2, \dots, K$) P&S process, as shown in Fig. 5(b), where $\sigma_k^2$ is the variance of the noise $N_k(n)$ in the $k$th P&S process, and $H_{L,k}(e^{j\omega})$ and $H_{H,k}(e^{j\omega})$ are the frequency responses of the low-pass and high-pass filters in the $k$th P&S process, respectively. $x_k(n)$ is the output of the $k$th P&S process, and $x_{L,k}(n)$, $x_{H,k}(n)$, and $x_{N,k}(n)$ are the output of the low-pass filter, the output of the high-pass filter, and the high-frequency noise in the $k$th P&S process, respectively. The impulse responses of the low-pass and high-pass filters in the $k$th P&S process are $h_{L,k}(n)$ and $h_{H,k}(n)$, respectively. Then the generalized versions of (8), (9), (10), (11), (12), (15), (16), (20), and (21) are given in (23), (24), (25), (26), (27), (28), (29), (30), and (31), respectively.
As mentioned earlier, the P&S process is very complicated, and it is difficult to analyze the correlation functions directly over their entire domain. Fortunately, as noted in Subsection 2.3, in practice correlation coefficients (a kind of normalized covariance), rather than correlation functions, are usually used to measure the information loss during the P&S process. To calculate the correlation coefficient of two signals, only their mean values, the value of the autocorrelation function of each signal at $m = 0$, and the value of their cross-correlation function at $m = 0$ are needed. Since the signals under study are assumed to be wide-sense stationary with constant mean values, we only need to study the values of their autocorrelation and cross-correlation functions at the specific point $m = 0$.
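The point above can be checked with a small sketch: given only the means and the correlation values at $m = 0$, the correlation coefficient follows from $\rho = (r_{xy}(0) - m_x m_y)/\sqrt{(r_{xx}(0) - m_x^2)(r_{yy}(0) - m_y^2)}$. The Python below estimates these moments with sample averages (an assumption; the text treats them as ensemble quantities), and the helper names are hypothetical.

```python
import math
from typing import Sequence


def sample_mean(x: Sequence[float]) -> float:
    return sum(x) / len(x)


def corr_at_zero(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample estimate of r_xy(0) = E[x(n) y(n)] (the autocorrelation when y is x)."""
    return sum(a * b for a, b in zip(x, y)) / len(x)


def corr_coeff_from_moments(m_x: float, m_y: float,
                            r_xx0: float, r_yy0: float, r_xy0: float) -> float:
    """Correlation coefficient from the means and the correlation values at m = 0."""
    cov = r_xy0 - m_x * m_y        # cross-covariance at m = 0
    var_x = r_xx0 - m_x ** 2       # autocovariance of x at m = 0
    var_y = r_yy0 - m_y ** 2
    return cov / math.sqrt(var_x * var_y)
```

Because only these five numbers enter the formula, comparing coefficients across successive P&S outputs reduces to comparing the correlation functions at $m = 0$, which is what the following subsections do.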
The specific autocorrelation of the physical clone process
In this subsection, we explore how the signal's autocorrelation function at $m = 0$ varies as the number of P&S processes increases.
Proposition 1 shows that the value of the autocorrelation at $m = 0$ can be used to describe the degree of P&S processing, i.e., the more P&S processes we conduct, the higher the value of the autocorrelation of the P&S signal at $m = 0$.
The specific cross-correlation of the physical clone process
In this subsection, we explore how the signal's cross-correlation functions at $m = 0$ vary as the number of P&S processes increases. There are two cases.
Case 1: The original signal $x_0(n)$ is used as an absolute reference signal. We compare the following two cross-correlations at $m = 0$: that of $x_0(n)$ and its P&S version $x_1(n)$, and that of $x_0(n)$ and its double P&S version $x_2(n)$. More generally, for the $k$th and $(k+1)$th P&S processes, we compare the cross-correlation of $x_0(n)$ and the output $x_k(n)$ of the $k$th P&S process with the cross-correlation of $x_0(n)$ and the output $x_{k+1}(n)$ of the $(k+1)$th P&S process.
Case 2: A relative reference signal is used to compute the cross-correlation. For a signal that is the output of a specific P&S process, the input of that P&S process is used as the reference signal, i.e., to compute the cross-correlation of $x_{k+1}(n)$, $x_k(n)$ is used as the reference signal. Therefore, for the $(k+1)$th and $(k+2)$th P&S processes, we compare the cross-correlation of $x_k(n)$ and its P&S version $x_{k+1}(n)$ with the cross-correlation of $x_{k+1}(n)$ and its P&S version $x_{k+2}(n)$.
For Case 1, we have the following proposition.
Proposition 2 (1) If the cut-off frequency condition (44) is satisfied, then $r_{x_0x_1}(0) \ge r_{x_0x_2}(0)$. (2) More generally, let $\Omega_{H,k}$ and $\Omega_{L,l}$ be the cut-off frequencies of the high-pass filter in the $k$th P&S process and of the low-pass filter in the $l$th P&S process, respectively, and let $x_{L,k}(n)$, $x_{H,k}(n)$, $N_k(n)$, and $x_{N,k}(n)$ be the output of the low-pass filter, the output of the high-pass filter, the random noise, and the high-frequency noise in the $k$th P&S process, respectively. For the integers $k = 1, 2, \dots, K-1$, if the conditions
$$1 \ge H_{L,k}(e^{j\omega}) \ge 0, \quad \forall k = 1, 2, \dots, K,$$
$$\Omega_{H,k} > \Omega_{L,l}, \quad \forall k, l = 1, 2, \dots, K, \tag{47}$$
are satisfied, then $r_{x_0x_k}(0) \ge r_{x_0x_{k+1}}(0)$.
Proof The cut-off frequency conditions in (44) and (47) guarantee that (15) and (28) hold, respectively. The two sub-conclusions are proved as follows.
(1) Sub-conclusion 1 follows from (44) and (51). (2) Substituting (26) into (28) gives a recursive formula, (54). Combining (26) and (54) leads to (56), from which sub-conclusion 2 follows.
Proposition 2 shows that when the original signal is used as the reference, the cross-correlations of the outputs of the P&S processes decrease as the number of P&S processes increases.
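One way to see why sub-conclusion 2 holds is a frequency-domain argument. The sketch below is not necessarily the authors' proof; it assumes that the generalized forms of (15) and (20) read $x_{L,k}(n) = x_0(n) * h_{L,1}(n) * \cdots * h_{L,k}(n)$ and $r_{x_0x_k}(m) = r_{x_0x_{L,k}}(m)$, that each $H_{L,l}(e^{j\omega})$ is real-valued as in the condition $1 \ge H_{L,k}(e^{j\omega}) \ge 0$, and it writes $S_{x_0x_0}(e^{j\omega}) \ge 0$ for the power spectral density of $x_0(n)$ and $G_k$ for the cascaded low-pass response.

```latex
% Cascaded low-pass response after k P&S processes; each factor lies in [0, 1].
G_k(e^{j\omega}) = \prod_{l=1}^{k} H_{L,l}(e^{j\omega}),
\qquad
G_{k+1}(e^{j\omega}) = G_k(e^{j\omega})\, H_{L,k+1}(e^{j\omega}) \le G_k(e^{j\omega}).

% Since x_{L,k} = x_0 * g_k, the cross-correlation is r_{x_0 x_{L,k}} = g_k * r_{x_0 x_0};
% evaluating its inverse DTFT at m = 0:
r_{x_0 x_k}(0) = r_{x_0 x_{L,k}}(0)
  = \frac{1}{2\pi}\int_{-\pi}^{\pi} S_{x_0 x_0}(e^{j\omega})\, G_k(e^{j\omega})\, d\omega .

% Every factor of the integrand below is nonnegative except (H_{L,k+1} - 1) <= 0:
r_{x_0 x_{k+1}}(0) - r_{x_0 x_k}(0)
  = \frac{1}{2\pi}\int_{-\pi}^{\pi} S_{x_0 x_0}(e^{j\omega})\, G_k(e^{j\omega})
    \bigl(H_{L,k+1}(e^{j\omega}) - 1\bigr)\, d\omega \le 0 .
```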
In summary, from Propositions 1 to 3, we have the following remarks:
1. The more P&S processes we apply to a signal, the higher the value of its autocorrelation function at $m = 0$.
2. The more P&S processes we apply to a signal, the lower the value of the cross-correlation function between the original signal and the final P&S signal at $m = 0$.
3. The more P&S processes we apply to a signal, the lower the value of the cross-correlation function between the last two successive P&S signals at $m = 0$.
These results show that copy detection of printed matter is theoretically possible, which is discussed in the next subsection.
However, since the correlation function is not normalized, the thresholds calculated from different $x_0(n)$ in the same system vary too much. Therefore, we use the normalized correlation value, namely the correlation coefficient, to measure the information. The correlation coefficient (also called the Pearson correlation coefficient) of $x_0(n)$ and $x_k(n)$ is
$$\rho_{x_0x_k} = \frac{\mathrm{Cov}(x_0, x_k)}{\sqrt{\mathrm{Var}(x_0)\,\mathrm{Var}(x_k)}},$$
where $\mathrm{Cov}(\cdot,\cdot)$ is the covariance between two random variables and $\mathrm{Var}(\cdot)$ is the variance of a random variable. Note that here we only derive the results mathematically for the Pearson correlation coefficient; other correlation coefficients, such as the Spearman and Kendall correlations, can be treated in a similar way to obtain similar conclusions. Let $m_k$ be the mean value of $x_k(n)$. The cross-covariance function $c_{x_0x_k}(m)$ between $x_0(n)$ and $x_k(n)$ is
$$c_{x_0x_k}(m) = E[(x_0(n) - m_0)(x_k(n+m) - m_k)] = r_{x_0x_k}(m) - m_0 m_k.$$
The relationships between the variance, the covariance, and the cross-covariance function are
$$\mathrm{Var}(x_k) = c_{x_kx_k}(0) = r_{x_kx_k}(0) - m_k^2, \qquad \mathrm{Cov}(x_0, x_k) = c_{x_0x_k}(0) = r_{x_0x_k}(0) - m_0 m_k.$$
According to Propositions 1 and 2, and since all $m_k$ are equal by the mean assumption, it is easy to conclude the following inequality:
$$\rho_{x_0x_k} \ge \rho_{x_0x_{k+1}}, \tag{72}$$
which is the theoretical basis of 2LQR codes. However, with (72), the original digital image $x_0(n)$ is needed to perform the copy detection of 2LQR codes. In actual anti-counterfeiting applications, especially in IoT scenarios, different 2LQR codes are attached to numerous items, which makes the cost of storing and retrieving the original images very high. Fortunately, Proposition 3 provides another copy detection method. According to Propositions 1 and 3, it is easy to conclude the following inequality:
$$\rho_{x_kx_{k+1}} \ge \rho_{x_{k+1}x_{k+2}}. \tag{73}$$
In (73), only two successive P&S images of $x_0(n)$ are needed. Therefore, we can combine two successive original or P&S images, for example $x_0(n)$ and $x_1(n)$, into a combined image. After $k$ P&S processes, the combined image still contains two successive P&S images, $x_k(n)$ and $x_{k+1}(n)$. We then use the combined image to construct a new QR code, called the 2SQR code, and the two images contained in the 2SQR code are the S1 pattern and the S2 pattern, respectively.

Generation of 2SQR codes
The diagram of 2SQR code generation is shown in Fig. 6. The generation process can be divided into two stages (S1 and S2), in which different patterns are embedded into the standard QR code, which is why it is called a two-stage textured-patterns embedded QR code.
In S1, the dark modules in the standard QR code are replaced by patterns in the S1 pattern set $P_{0,S1}$, resulting in a 2LQR code. The S1 pattern set $P_{0,S1}$ is the same as the original pattern set $P_0$ introduced in Subsection 2.1. Assuming that there are $Q$ types of patterns in $P_{0,S1}$, it can be expressed as $P_{0,S1} = \{p^q_{0,S1} \mid q = 0, 1, \dots, Q-1\}$, where $p^q_{0,S1}$ is called the $(q+1)$th type of pattern in $P_{0,S1}$. The number of patterns of each type in the 2LQR code depends on the number of dark modules in the standard QR code and the number of pattern types in $P_{0,S1}$.
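The S1 step can be sketched as a simple substitution over the module matrix. In this hypothetical Python helper `build_2lqr`, modules are given as 1 (dark) or 0 (light), every pattern is a square block of pixels, light modules become blank blocks, and pattern types are assigned cyclically in raster order. The assignment order is an assumption, since the text only fixes how many patterns of each type appear.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[int]]


def build_2lqr(modules: Sequence[Sequence[int]], patterns: Sequence[Matrix],
               light: int = 255) -> Tuple[Matrix, List[List[int]]]:
    """Stage S1 sketch: replace each dark module (1) with an S1 pattern.

    Pattern types are assigned cyclically in raster order (an assumption).
    Returns the pixel image and a type map (-1 marks light modules).
    """
    size = len(patterns[0])  # each pattern is size x size pixels
    rows, cols = len(modules), len(modules[0])
    image = [[light] * (cols * size) for _ in range(rows * size)]
    type_map = [[-1] * cols for _ in range(rows)]
    dark_count = 0
    for r in range(rows):
        for c in range(cols):
            if not modules[r][c]:
                continue  # light module: keep the blank block
            q = dark_count % len(patterns)
            dark_count += 1
            type_map[r][c] = q
            for dy in range(size):
                for dx in range(size):
                    image[r * size + dy][c * size + dx] = patterns[q][dy][dx]
    return image, type_map
```

With cyclic assignment, a code with 999 dark modules and three pattern types receives 333 patterns of each type, which matches the counts used in Section 4.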
In S2, part of the patterns in the previously generated 2LQR code are replaced by patterns in the S2 pattern set $P_{0,S2}$. Every pattern in $P_{0,S2}$ is the P&S version of the corresponding pattern in $P_{0,S1}$, i.e., $P_{0,S2} = \{p^q_{0,S2} \mid q = 0, 1, \dots, Q-1\}$, where the $(q+1)$th type of pattern $p^q_{0,S2}$ is the P&S version of $p^q_{0,S1}$. How S1 patterns are replaced with S2 patterns depends on application requirements. The simplest way is to directly replace half of the S1 patterns with S2 patterns. For better performance, the S2 patterns can be distributed evenly among the S1 patterns. For better security, the locations of the S2 patterns can be generated using a key; however, this is not the focus of this article. More details about the generation of 2SQR codes are given in Subsection 4.2.
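The placement rules mentioned above (replacing about half of the S1 patterns, interleaving them evenly, or choosing positions with a key) could be sketched as follows. The helpers `choose_s2_positions` and `count_patterns` are hypothetical. "Evenly distributed" is interpreted here as every second occurrence of each type in raster order, and the keyed variant simply seeds Python's `random.Random` with the key, which is an illustration rather than a secure construction. `count_patterns` returns the per-type counts $M_q$ and $N_q$ used in the text.

```python
import random
from typing import Dict, List, Optional, Sequence, Set, Tuple

Pos = Tuple[int, int]


def choose_s2_positions(type_map: Sequence[Sequence[int]],
                        key: Optional[str] = None) -> Set[Pos]:
    """Pick about half of the modules of each pattern type to carry S2 patterns.

    type_map[r][c] is the pattern type q of a dark module, or -1 for a light module.
    """
    by_type: Dict[int, List[Pos]] = {}
    for r, row in enumerate(type_map):
        for c, q in enumerate(row):
            if q >= 0:
                by_type.setdefault(q, []).append((r, c))
    s2: Set[Pos] = set()
    for q, positions in by_type.items():
        if key is None:
            # Even interleaving: every second occurrence (raster order) becomes S2.
            s2.update(positions[1::2])
        else:
            # Keyed placement: an RNG seeded from the key picks half of the positions.
            rng = random.Random(f"{key}:{q}")
            s2.update(rng.sample(positions, len(positions) // 2))
    return s2


def count_patterns(type_map: Sequence[Sequence[int]],
                   s2: Set[Pos]) -> Dict[int, Tuple[int, int]]:
    """Return {q: (M_q, N_q)}, the numbers of S1 and S2 patterns of each type."""
    counts: Dict[int, Tuple[int, int]] = {}
    for r, row in enumerate(type_map):
        for c, q in enumerate(row):
            if q >= 0:
                m_q, n_q = counts.get(q, (0, 0))
                counts[q] = (m_q, n_q + 1) if (r, c) in s2 else (m_q + 1, n_q)
    return counts
```

The totals $M_{S1}$ and $N_{S2}$ are then `sum(m for m, _ in counts.values())` and `sum(n for _, n in counts.values())`.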
Assuming that the numbers of the $(q+1)$th type of S1 and S2 patterns in the 2SQR code are $M_q$ and $N_q$, respectively, the total numbers of S1 and S2 patterns in the 2SQR code are
$$M_{S1} = \sum_{q=0}^{Q-1} M_q, \qquad N_{S2} = \sum_{q=0}^{Q-1} N_q.$$
That is, $M_q$ (or $N_q$) dark modules in the standard QR code are replaced by the $(q+1)$th type of S1 (or S2) pattern, i.e., the $(q+1)$th type of S1 (or S2) pattern repeats $M_q$ (or $N_q$) times in the generated 2SQR code.

Here, $P_{1,S1}$ and $P_{1,S2}$ are the sets of P&S versions of the patterns $p^q_{0,S1}$ and $p^q_{0,S2}$ in $I_0$, respectively. Note that $P_{1,S1}$ and $P_{1,S2}$ both contain $Q$ types of patterns ($M_{S1} + N_{S2}$ patterns in total), and the numbers of patterns of the $(q+1)$th type in $P_{1,S1}$ and $P_{1,S2}$ are $M_q$ and $N_q$, respectively, whereas $P_{0,S1}$ and $P_{0,S2}$ each contain exactly one pattern of the $(q+1)$th type.

The process of printed matter authentication
A flow-chart of the printed matter authentication scheme is shown in Fig. 7.
Finally, the correlation coefficients for the pairs of patterns in the two pair sets are calculated. The mean value of the correlation coefficients of all pairs in a pair set is called an indicator of $I_k$. The indicator calculated from the first pair set uses only S1 patterns, in the same way as for the 2LQR code, so we call it the 2LQR code indicator (or 2LQR indicator), denoted by $R_{2L}(I_k)$. The indicator calculated from the second pair set uses both S1 and S2 patterns, so we call it the 2SQR code indicator (or 2SQR indicator), denoted by $R_{2S}(I_k)$. Either of the two indicators, $R_{2L}(I_k)$ or $R_{2S}(I_k)$, can be used as $R$ in the authentication process in Fig. 7. The indicator $R$ is compared with the corresponding predetermined authentication threshold $TH$, which is $TH_{2L}$ or $TH_{2S}$, respectively. The calculation of the two thresholds is detailed in Subsection 4.3. If $R > TH$, then $k = 1$, i.e., $I_k$ is judged to be a scanned version of a genuine 2SQR code. Otherwise, $k = 2$, i.e., $I_k$ is judged to be a scanned version of a forged 2SQR code.
The 2LQR and 2SQR indicators are defined as follows.

2LQR indicator: The correlation coefficient of each pattern pair in the first pair set is calculated, giving $M_{S1}$ correlation values. Three types of correlation were introduced in Subsection 2.3, i.e., Pearson, Spearman, and Kendall correlation, and each of them yields a different indicator. Because the patterns are two-dimensional matrices while these correlations take one-dimensional vectors as input, the patterns must be vectorized before their correlations can be calculated. Assuming the size of each pattern $p(x, y)$ is $N_x \times N_y$, where $x = 0, 1, \ldots, N_x - 1$ and $y = 0, 1, \ldots, N_y - 1$, the vectorized version of $p$ is

$$v(n) = \mathrm{vec}(p) = p\left(n \bmod N_x, \left\lfloor \frac{n}{N_x} \right\rfloor\right), \quad n = 0, 1, \ldots, N_x N_y - 1.$$

Therefore, the correlation coefficient of the $(i+1)$th pattern pair of the $(q+1)$th type in the first pair set of $I_k$ is

$$R_c\left(\mathrm{vec}(p^{q}_{0,S1}), \mathrm{vec}(p^{q,i}_{k,S1})\right), \quad c = P, S, K, \tag{82}$$

where $c = P, S, K$ denotes the Pearson correlation $R_P(\cdot,\cdot)$ defined in (1), the Spearman correlation $R_S(\cdot,\cdot)$ defined in (2), and the Kendall correlation $R_K(\cdot,\cdot)$ defined in (6), respectively. The 2LQR indicator $R_{2L}(I_k)$, written $R_{c,2L}(I_k)$ to make the correlation type explicit, is the mean of these values:

$$R_{c,2L}(I_k) = \frac{1}{M_{S1}} \sum_{q=0}^{Q-1} \sum_{i=0}^{M_q-1} R_c\left(\mathrm{vec}(p^{q}_{0,S1}), \mathrm{vec}(p^{q,i}_{k,S1})\right), \tag{83}$$

where the additional subscript $c$ is defined as in (82).

2SQR indicator: The correlation coefficient of each pattern pair in the second pair set is calculated, giving $\sum_{q=0}^{Q-1} \min(M_q, N_q)$ correlation values. The correlation coefficient of the $(i+1)$th pattern pair of the $(q+1)$th type in the second pair set of $I_k$ is

$$R_c\left(\mathrm{vec}(p^{q,i}_{k,S1}), \mathrm{vec}(p^{q,i}_{k,S2})\right),$$

where $c = P, S, K$ has the same meaning as in (82). The 2SQR indicator is the mean of these values:

$$R_{c,2S}(I_k) = \frac{1}{\sum_{q=0}^{Q-1} \min(M_q, N_q)} \sum_{q=0}^{Q-1} \sum_{i=0}^{\min(M_q, N_q)-1} R_c\left(\mathrm{vec}(p^{q,i}_{k,S1}), \mathrm{vec}(p^{q,i}_{k,S2})\right), \tag{85}$$

where the additional subscript $c$ is the same as in (82).
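The vectorization and the two indicators can be sketched in plain Python as follows. This is a sketch under stated assumptions, not the authors' code: patterns are nested lists indexed `p[x][y]`; Spearman correlation is computed as the Pearson correlation of average ranks; and the Kendall coefficient is the tau-b variant, which may differ from the definition in (6). All function names are hypothetical. Constant (zero-variance) patterns make the correlations undefined and raise `ZeroDivisionError` here.

```python
import math
from typing import Callable, Dict, List, Sequence

Pattern = Sequence[Sequence[float]]  # indexed p[x][y], size N_x x N_y


def vec(p: Pattern) -> List[float]:
    """Column-wise vectorization: v(n) = p(n mod N_x, floor(n / N_x))."""
    nx, ny = len(p), len(p[0])
    return [float(p[n % nx][n // nx]) for n in range(nx * ny)]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def average_ranks(a: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(a)), key=lambda i: a[i])
    ranks = [0.0] * len(a)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and a[order[end + 1]] == a[order[start]]:
            end += 1
        for t in range(start, end + 1):
            ranks[order[t]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    return pearson(average_ranks(a), average_ranks(b))


def kendall_tau_b(a: Sequence[float], b: Sequence[float]) -> float:
    """Kendall tau-b (assumed variant); O(n^2), fine for 12 x 12 patterns."""
    concordant = discordant = tied_a_only = tied_b_only = 0
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            da, db = a[i] - a[j], b[i] - b[j]
            if da == 0 and db == 0:
                continue  # tied in both vectors: ignored by tau-b
            if da == 0:
                tied_a_only += 1
            elif db == 0:
                tied_b_only += 1
            elif da * db > 0:
                concordant += 1
            else:
                discordant += 1
    n_cd = concordant + discordant
    return (concordant - discordant) / math.sqrt((n_cd + tied_a_only) * (n_cd + tied_b_only))


R: Dict[str, Callable] = {"P": pearson, "S": spearman, "K": kendall_tau_b}


def indicator_2l(templates, s1, c="P"):
    """R_{c,2L}(I_k) as in (83). templates[q] = p^q_{0,S1}; s1[q] = [p^{q,i}_{k,S1}, ...]."""
    vals = [R[c](vec(templates[q]), vec(p)) for q in range(len(templates)) for p in s1[q]]
    return sum(vals) / len(vals)


def indicator_2s(s1, s2, c="P"):
    """R_{c,2S}(I_k) as in (85): pairs p^{q,i}_{k,S1} with p^{q,i}_{k,S2}.

    zip() stops at the shorter list, giving min(M_q, N_q) pairs for type q.
    """
    vals = [R[c](vec(a), vec(b)) for q in range(len(s1)) for a, b in zip(s1[q], s2[q])]
    return sum(vals) / len(vals)
```

Pairing the S1 and S2 patterns by the shared index $i$ follows the notation of the text; how the extracted patterns are ordered within a type is left to the caller.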

Results and discussion
In this section, two experiments are performed. The first experiment explores how the information in the patterns varies over multiple successive P&S processes, with the information measured by different correlation coefficients. The second experiment verifies the performance of 2SQR codes in distinguishing authentic codes from copies using different types of indicators. The equipment used is an HP Color LaserJet Pro MFP M180n multifunction printer/scanner, which is used to print the genuine 2SQR codes. In the authentication process, all the 2SQR codes to be tested are scanned with the same equipment. Unless otherwise specified, printing and scanning are performed at a resolution of 600 dpi.

Correlation coefficient of patterns in multiple successive P&S processes
In this experiment, we verify the normalized version of the information loss principle, i.e., (72) and (73).
The patterns used are those in the pattern set $P_0 = \{p^0_0, p^1_0, p^2_0\}$ shown in Fig. 2. For convenience, we use these patterns to replace the dark modules of a standard QR code to form a 2LQR code, so that the patterns in the scanned image can be located with an existing QR code reader program. The generated 2LQR code is shown in Fig. 14(a), where 999 dark modules of the standard QR code are replaced by patterns in $P_0$, 333 of each type.
We put 20 such 2LQR codes into one image $I_0$, arranged in a $4 \times 5$ array. The image is printed on A4 paper, as shown in Fig. 8. We then scan the printed image using the HP Scan program, as shown in Fig. 9, to obtain $I_1$, the first P&S version of $I_0$. Repeating these steps, we obtain $I_2$ through $I_8$. Some of the scanned images are shown in Fig. 10: as the number of P&S processes increases, the geometric distortion and uneven illumination become worse, which has a non-negligible impact on the quality of the scanned image. However, as long as no more than four P&S processes are conducted, the influence of these factors is relatively small and can be easily corrected.
After geometric correction, all patterns in $I_1$ through $I_8$ are extracted. Each scanned image contains $333 \times 20 = 6660$ patterns of each type. The three types of patterns in $I_k$ are denoted by $p^{0,i}_k$, $p^{1,i}_k$, and $p^{2,i}_k$, respectively, where $i = 1, 2, \ldots, 6660$ and $k = 1, 2, \ldots, 8$.
To verify (72), for the $(q+1)$th type of patterns ($q = 0, 1, 2$) in every scanned image $I_k$, the Pearson, Spearman, and Kendall correlations between $p^q_0$ and $p^{q,i}_k$, $i = 1, 2, \ldots, 6660$, are calculated, and for each correlation coefficient the mean of the 6660 correlation values is taken as the correlation value of that pattern type in $I_k$. The curves of the correlation value against $k$, the number of P&S processes conducted, are shown in Fig. 11. For all pattern types and all three correlation coefficients, the correlation values decrease monotonically with $k$, which confirms (72) experimentally.
Similarly, to verify (73), for the $(q+1)$th type of patterns ($q = 0, 1, 2$) in every scanned image $I_k$, the Pearson, Spearman, and Kendall correlations between $p^{q,i}_{k+1}$ and $p^{q,i}_k$, $i = 1, 2, \ldots, 6660$, are calculated, and the mean of the 6660 correlation values is taken as the correlation value of that pattern type in $I_k$. The curves of the correlation value against $k$ are shown in Fig. 12, where trends similar to those in Fig. 11 are observed, confirming (73) experimentally.
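The averaging behind Figs. 11 and 12 can be sketched as follows, assuming the extracted patterns of one type have already been vectorized and that the $i$th pattern in $I_{k+1}$ corresponds to the $i$th pattern in $I_k$. Only Pearson correlation is included to keep the block short; Spearman or Kendall correlation can be substituted. All names are hypothetical.

```python
import math
from typing import Iterable, List, Sequence, Tuple

Vector = Sequence[float]


def pearson(a: Vector, b: Vector) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def mean_corr(pairs: Iterable[Tuple[Vector, Vector]]) -> float:
    values = [pearson(a, b) for a, b in pairs]
    return sum(values) / len(values)


def curve_72(template: Vector, scans: List[List[Vector]]) -> List[float]:
    """Mean R(p^q_0, p^{q,i}_k) for k = 1..K; scans[k-1] holds the patterns of one type in I_k."""
    return [mean_corr((template, p) for p in scan) for scan in scans]


def curve_73(scans: List[List[Vector]]) -> List[float]:
    """Mean R(p^{q,i}_{k+1}, p^{q,i}_k) for k = 1..K-1, pairing patterns by index i."""
    return [mean_corr(zip(scans[k], scans[k - 1])) for k in range(1, len(scans))]


def decreases_monotonically(values: Sequence[float]) -> bool:
    """True if every value is strictly smaller than the one before it."""
    return all(later < earlier for earlier, later in zip(values, values[1:]))
```

In the experiment above, `scans` would hold the 6660 vectorized patterns of one type for each of $I_1, \ldots, I_8$, and the check is repeated for every pattern type and correlation coefficient.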

A detailed example of 2SQR code generation
In this subsection, the generation of a specific 2SQR code is described. The detailed settings are as follows. The module size is the same as in [8], i.e., 12 × 12 pixels. To provide a sufficient number of modules, a version 8 QR code with error correction level M is selected, encoding the string 'SYSU Guangzhou Idart'. To facilitate comparison with the original literature, we choose the same patterns as in [8] as the S1 pattern set $P_{0,S1}$, shown in Fig. 2.
The generated 2LQR code $I_0$ is shown in Fig. 14(a), in which 999 dark modules of the standard QR code are replaced by S1 patterns. The generated 2SQR code $I_0$ is shown in Fig. 14(b), in which 666 S1 patterns of the 2LQR code remain and the other 333 S1 patterns are replaced by S2 patterns. Comparing Fig. 14(a) and Fig. 14(b), the slight difference between the S1 and S2 patterns is hard for the human eye to notice; it becomes visible only when the details are enlarged, as in Fig. 14(c).
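The pattern counts above can be illustrated with the sketch below. The text does not state which dark modules receive S2 patterns, so the selection rule used here (types assigned round-robin, and every third occurrence of each type switched to its S2 version) is purely hypothetical, as are the function names. In practice the S2 patterns themselves would be obtained by printing and scanning the S1 patterns.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def assign_patterns(num_dark: int, num_types: int = 3, s2_period: int = 3) -> List[Tuple[int, str]]:
    """Return (type q, stage) for each dark module under a HYPOTHETICAL selection rule."""
    layout = []
    for j in range(num_dark):
        q = j % num_types            # hypothetical round-robin type assignment
        occurrence = j // num_types  # number of earlier modules already using type q
        stage = "S2" if occurrence % s2_period == s2_period - 1 else "S1"
        layout.append((q, stage))
    return layout


def pick_patterns(layout: Sequence[Tuple[int, str]], s1_set: Sequence, s2_set: Sequence) -> list:
    """Look up the pattern for each module: s1_set[q] = p^q_{0,S1}, s2_set[q] = p^q_{0,S2}."""
    return [s1_set[q] if stage == "S1" else s2_set[q] for q, stage in layout]


layout = assign_patterns(999)
stage_counts = Counter(stage for _, stage in layout)
per_type = Counter(layout)  # (q, stage) -> count, i.e. M_q and N_q under this rule
print(stage_counts["S1"], stage_counts["S2"])    # 666 333
print(per_type[(0, "S1")], per_type[(0, "S2")])  # 222 111
```

Under this particular rule every type has $M_q = 222$ S1 and $N_q = 111$ S2 patterns; a different selection rule would change the per-type split while keeping the 666/333 totals.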
From the description above, the number of pattern types in $I_0$ is $Q = 3$, and the numbers of S1 and S2 patterns of the $(q+1)$th type are $M_q$ and $N_q$, respectively, with $\sum_{q=0}^{2} M_q = 666$ and $\sum_{q=0}^{2} N_q = 333$.
When the 2SQR code $I_0$ goes through the physical clone process in Fig. 3, its pattern sets vary as shown in Fig. 15. Column (a) is the S1 pattern set $P_{0,S1}$ in $I_0$. Column (b) is the S2 pattern set $P_{0,S2}$ in $I_0$, which is also the S1 pattern set $P_{1,S1}$ in $I_1$. Column (c) is the S2 pattern set $P_{1,S2}$ in $I_1$, which is also the S1 pattern set $P_{2,S1}$ in $I_2$. Column (d) is the S2 pattern set $P_{2,S2}$ in $I_2$.
The 2SQR code $I_0$ contains two sets of patterns, namely the S1 pattern set $P_{0,S1}$ (Fig. 15(a)) and the S2 pattern set $P_{0,S2}$ (Fig. 15(b)). After a P&S process, $I_0$ turns into $I_1$, and the two pattern sets in $I_0$ turn into $P_{1,S1}$ (Fig. 15(b)) and $P_{1,S2}$ (Fig. 15(c)), respectively. Since both $P_{0,S2}$ in $I_0$ and $P_{1,S1}$ in $I_1$ are generated from $P_{0,S1}$ through one P&S process, they are approximately the same. Similarly, after the second P&S process, $P_{1,S1}$ and $P_{1,S2}$ in $I_1$ turn into $P_{2,S1}$ (Fig. 15(c)) and $P_{2,S2}$ (Fig. 15(d)) in $I_2$, respectively, where $P_{1,S2}$ in $I_1$ and $P_{2,S1}$ in $I_2$ are approximately the same because both are generated from $P_{0,S1}$ through two successive P&S processes. Overall, the patterns become gradually more blurred with each P&S process.

The authentication thresholds
In this part, a small batch of 2SQR codes, e.g., 20, is generated to determine the authentication thresholds; we call them the 20 samples of $I_0$. The samples are printed and scanned twice with the HP Color LaserJet Pro MFP M180n multifunction printer/scanner to obtain their P&S versions $I_1$ (the first P&S version of $I_0$) and their double P&S versions $I_2$ (the second P&S version of $I_0$).
For every sample of the first and second P&S versions of $I_0$, i.e., $I_1$ and $I_2$, the 2LQR indicators are calculated using (83). The mean of the indicator $R_{c,2L}(I_k)$ over the samples is called the indicator level of $I_k$; the three types of 2LQR indicator levels for $I_k$ are thus $IL_{P,2L}(I_k)$, $IL_{S,2L}(I_k)$, and $IL_{K,2L}(I_k)$, where the subscripts $P, S, K$ have the same meaning as in (82). The threshold for the 2LQR indicator is then computed from these indicator levels, where the subscript $c$ has the same meaning as in (82). Using the 2SQR indicator instead of the 2LQR indicator, the 2SQR indicator level of $I_k$ is $IL_{c,2S}(I_k)$, and the threshold for the 2SQR indicator is obtained in the same way, with $c$ again as in (82).
To evaluate these indicators, we calculate the difference between the indicator levels of $I_1$ and $I_2$:

$$\Delta IL_{c,2L} = IL_{c,2L}(I_1) - IL_{c,2L}(I_2), \qquad \Delta IL_{c,2S} = IL_{c,2S}(I_1) - IL_{c,2S}(I_2), \tag{88}$$

where the subscript $c$ has the same meaning as in (82). The larger $\Delta IL_{c,2L}$ (or $\Delta IL_{c,2S}$), the better the indicator distinguishes $I_1$ from $I_2$. The indicator levels, thresholds, and indicator-level differences for the 2LQR and 2SQR indicators are listed in Tables 2 and 3, respectively.
Tables 2 and 3 show that both the 2LQR and 2SQR indicators perform best when Pearson correlation is used. Moreover, the $\Delta IL$ of the 2SQR indicator is 42% larger than that of the 2LQR indicator. Therefore, the 2SQR indicator is more effective in detecting illegal cloning of printed documents.
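The indicator levels and the differences in (88) reduce to averaging and subtraction, as in the sketch below. The threshold formula is not reproduced in this section, so the sketch stops at $\Delta IL$; `relative_gain` shows how a relative figure such as the 42% above would follow from two $\Delta IL$ values. Function names are hypothetical.

```python
from typing import Sequence


def indicator_level(indicators: Sequence[float]) -> float:
    """IL_c(I_k): mean of the indicator over all samples of I_k."""
    return sum(indicators) / len(indicators)


def delta_il(indicators_i1: Sequence[float], indicators_i2: Sequence[float]) -> float:
    """Delta IL_c = IL_c(I_1) - IL_c(I_2), as in (88); larger means better separation."""
    return indicator_level(indicators_i1) - indicator_level(indicators_i2)


def relative_gain(new: float, old: float) -> float:
    """Percentage by which `new` exceeds `old`, e.g. Delta IL of 2SQR vs. 2LQR."""
    return 100.0 * (new - old) / old
```

The same `relative_gain` calculation applies to the ROC AUC comparison reported later in this section.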
For every sample of $I_k$ in the dataset, the indicators are calculated with (83) and (85) using the correlations defined in (1), (2), and (6). The 2LQR and 2SQR indicators of all test samples using Pearson, Spearman, and Kendall correlation are shown in Figs. 16, 17, and 18, respectively. The gap between the indicators of the positive and negative classes is larger when the 2SQR indicator is used, which shows that the 2SQR indicator separates the two classes better.
The indicators are compared with the corresponding thresholds in Tables 2 and 3 to determine the class of $I_k$, as described in Fig. 7. The results are shown in Tables 4 and 5, respectively. The precision of authentication is 100% in both cases, which means that no counterfeit code is judged as genuine. However, the recall is 77% with the 2LQR indicator, meaning that 23% of the genuine codes are judged as counterfeit, whereas with the 2SQR indicator the recall reaches 96%.
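Precision and recall here treat genuine codes ($k = 1$) as the positive class. A minimal sketch of the computation follows; the labels in the usage line are hypothetical, not the paper's data.

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN). Genuine codes (k = 1) are positive."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall


# Hypothetical labels: 4 genuine codes (one rejected) and 2 forged codes (both rejected).
print(precision_recall([1, 1, 1, 1, 2, 2], [1, 1, 1, 2, 2, 2]))  # (1.0, 0.75)
```

A precision of 1.0 with a recall below 1.0 corresponds to the situation described above: no forged code is accepted, but some genuine codes are rejected.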
The receiver operating characteristic (ROC) curve shows the overall cost and benefit of printed document authentication in a straightforward way [27]. The horizontal axis of the ROC curve is the false positive rate (FPR), and the vertical axis is the true positive rate (TPR). The closer the ROC curve is to the upper-left corner (0, 1), i.e., the more steeply it rises, the better the performance. The ROC curves for the different indicators are shown in Fig. 19.
All three ROC curves of the 2SQR indicator are steeper than the corresponding curves of the 2LQR indicator. Therefore, the 2SQR indicator performs better than the 2LQR indicator.
The area under the curve (AUC) is a quantitative summary of the ROC curve. A perfect classifier has a ROC AUC of 1, whereas a purely random classifier has a ROC AUC of 0.5. The ROC AUCs of the 2LQR and 2SQR indicators are shown in Table 6. Both indicators perform best when Pearson correlation is used. Moreover, the ROC AUC of the 2SQR indicator is 8.53% greater than that of the 2LQR indicator.
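An ROC curve and its AUC can be computed from the indicator values of genuine (positive) and forged (negative) samples by sweeping the threshold. The sketch below is a generic implementation, not the authors' code: it uses every observed indicator value as a threshold and integrates with the trapezoidal rule. Accepting scores `>=` each observed value traces the same points as the strict "greater than" rule with thresholds placed just below those values.

```python
from typing import List, Sequence, Tuple


def roc_points(pos: Sequence[float], neg: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs; a sample is predicted genuine when its indicator >= threshold."""
    points = [(0.0, 0.0)]
    for th in sorted(set(pos) | set(neg), reverse=True):
        tpr = sum(s >= th for s in pos) / len(pos)
        fpr = sum(s >= th for s in neg) / len(neg)
        points.append((fpr, tpr))
    return points  # the lowest threshold accepts everything, so the last point is (1, 1)


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Trapezoidal area under the piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


print(auc(roc_points([0.9, 0.4], [0.6, 0.1])))  # 0.75
```

For the example, three of the four genuine/forged pairs are ranked correctly, which matches the AUC of 0.75.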
Since part of the patterns in a 2SQR code are exactly the same as those in a 2LQR code, the 2LQR indicator performs similarly whether it is used to authenticate 2SQR codes or 2LQR codes. Therefore, the experiments above show that 2SQR codes with the 2SQR indicator outperform 2LQR codes with the 2LQR indicator in detecting illegal cloning.

Conclusion
In this paper, we propose a simplified P&S model and give a strict mathematical description of the information loss principle based on this model. We then propose 2SQR codes to remove the need for the original image in authentication and to overcome the low sensitivity of authentication. Two experiments are performed: the first verifies the correctness of the proposed model, and the second verifies that 2SQR codes distinguish physical cloning better than 2LQR codes.

Figure 15
The physical clone of pattern sets.