In this paper, distribution field (DF) theory is applied to the heterogeneous-source image matching problem, focusing on the matching difficulties caused by rotational transformation. First, the gradient direction DF is constructed and blurred by filtering. Second, the feature space of the template image is reordered according to the difference between the main-direction DF locations of the template image and the real-time subgraph. Finally, the similarity is calculated and stored in the correlation matrix. The flow of the proposed algorithm is shown in Fig. 1.
2.1 Calculation of the distribution field map
A distribution field assigns each pixel to a corresponding "field" obtained by dividing the gray levels into intervals; the distribution encodes the probability that a pixel appears on each feature layer. Taking a grayscale image as an example, the 256 gray levels (0 to 255) can be divided into N intervals, and the pixels falling in each interval carry not only grayscale information but also location information.
The distribution field map of an image can be represented as a \((2+N)\)-dimensional matrix, with the 2 dimensions representing the length and width of the image and the other N dimensions representing the chosen number of feature-space levels. In other words, if the size of an image is \(m \times n\), its distribution field map is an \(m \times n \times N\) three-dimensional matrix.
A schematic of the distribution field is shown in Fig. 2.
Calculating the distribution field map of an image amounts to evaluating a Kronecker delta at the geometric location of each pixel:
$$d(i,j,k)=\begin{cases} 1 & I(i,j)=k \\ 0 & \text{otherwise} \end{cases}$$
1
where \(I(i,j)\) is the gray value of the pixel with coordinate \((i,j)\) in the image, and \(d(i,j,k)\) is the value of that pixel on the \(k\)-th feature layer. It follows that \(d(i,j,k)\) takes the value 0 or 1, and the values at each position \((i,j)\) across the \(N\) layers sum to 1:
$$\sum\limits_{{k=1}}^{N} {d(i,j,k)=1}$$
2
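As a concrete illustration, the following minimal Python sketch builds the \(m \times n \times N\) distribution field map of Eqs. (1)-(2) for a grayscale image. The paper provides no code, so the function name and the numpy dependency are our own assumptions.

```python
import numpy as np

def distribution_field(image, n_levels=8):
    """Build the m x n x N distribution field map of Eq. (1):
    one binary (Kronecker delta) feature layer per gray-level interval."""
    m, n = image.shape
    # Quantize the 256 gray levels into n_levels equal intervals.
    levels = np.minimum(image.astype(np.int64) * n_levels // 256, n_levels - 1)
    df = np.zeros((m, n, n_levels))
    for k in range(n_levels):
        df[:, :, k] = (levels == k)   # 1 where the pixel falls in bin k
    return df                         # df.sum(axis=2) == 1 everywhere (Eq. 2)
```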
The target shown in Fig. 3 is used as an example to analyze its distribution field. The target size in the figure is 14×14; because each layer of the distribution field must be blurred, and for convenience of computation, the distribution field map of the target is computed over a square region in this paper.
Figure 4 shows the individual feature layers of the target of Fig. 3. To understand the feature layers more intuitively, the 256 gray levels of the image are compressed to 8, so there are 8 feature layers, with layers 1 to 4 in the first row from left to right, and layers 5 to 8 in the second row from left to right.
As can be seen from Fig. 4, an image can be represented as an \(N\)-layer distribution field map with little loss of information. This is the first step in constructing the distribution field map and is equivalent to re-describing the original image. Next, so that the location information does not lose generality, the image needs to be blurred: Gaussian convolution filtering is applied to the distribution field map in both the horizontal and vertical directions.
The spatial filtering is performed first, and \({d_s}\left( k \right)\) is obtained by convolving each feature layer:
$${d_s}\left( k \right)=d\left( k \right) * {h_{{\sigma _s}}}$$
3
where \({d_s}\left( k \right)\) denotes the new feature layer after the original feature layer is convolved with the Gaussian filter; \(d\left( k \right)\) is the feature layer before convolution; \({h_{{\sigma _s}}}\) is a two-dimensional Gaussian filter with standard deviation \({\sigma _s}\); and \(*\) is the convolution operator.
Figure 5 shows the effect of convolving each of the eight feature layers with a Gaussian filter with a standard deviation of 9 pixels.
Comparing with Fig. 4: before convolution, a value of 1 at a position on the \(k\)-th feature layer indicates that the gray value at that position in the original image falls in the \(k\)-th of the \(N\) intervals; after convolution, a nonzero value at a position indicates that the gray value at some nearby position in the original image falls in that interval. Gaussian filtering of the feature layers thus introduces positional uncertainty into the distribution field map. This method loses only the exact position information and none of the grayscale information in the original image; it has some effect on the matching error during matching, but enhances the robustness of the algorithm, making successful matching possible even in the presence of small rotational transformations.
In Eq. (3), if the Gaussian function \({h_{{\sigma _s}}}\) is regarded as a probability distribution, then after convolution \({d_s}\left( k \right)\) satisfies \(\sum\limits_{{k=1}}^{N} {{d_s}\left( {i,j,k} \right)=1}\) and thus still meets the requirements of a distribution field.
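A minimal sketch of this spatial blurring step, using scipy's Gaussian filter as a stand-in for the paper's 2-D convolution (our assumption). Because the same normalized kernel is applied to every layer, the per-pixel sum over \(k\) remains 1, as noted above.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def blur_spatial(df, sigma_s):
    """Eq. (3): convolve every feature layer with a 2-D Gaussian of std sigma_s.
    A normalized kernel preserves the per-pixel sum over k (Eq. 2)."""
    df_s = np.empty_like(df)
    for k in range(df.shape[2]):
        df_s[:, :, k] = gaussian_filter(df[:, :, k], sigma=sigma_s)
    return df_s
```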
Gaussian filtering of each distribution field feature layer along the x and y coordinate directions increases positional uncertainty, as discussed above. By the same reasoning, Gaussian filtering of the distribution field feature space can be understood as filtering along the z coordinate direction to increase feature uncertainty. Blurring the distribution of grayscale information across the layers of the distribution field in this way allows the image description to adapt to subpixel motion and partial brightness variations, which enhances the robustness of the algorithm to some extent. Therefore, the feature layers are next filtered with a one-dimensional Gaussian filter:
$${d_{ss}}\left( {i,j} \right)={d_s}\left( {i,j} \right) * {h_{{\sigma _f}}}$$
4
where \({h_{{\sigma _f}}}\) in the above equation is a one-dimensional Gaussian filter with standard deviation \({\sigma _f}\). The final distribution field obtained from the example image of Fig. 3 is shown in Fig. 6.
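The feature-space blurring of Eq. (4) can be sketched the same way, filtering along the third (z) axis of the DF; the use of gaussian_filter1d with its default boundary handling is our choice, not the paper's.

```python
from scipy.ndimage import gaussian_filter1d

def blur_feature(df_s, sigma_f):
    """Eq. (4): 1-D Gaussian filtering along the feature (z) axis,
    blurring each pixel's distribution over the N gray-level bins."""
    return gaussian_filter1d(df_s, sigma=sigma_f, axis=2)
```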
At this point, the distribution field map of the image has been calculated; the process is shown in Fig. 7. In summary, calculating the distribution field map is a process of introducing uncertainty: convolution along the two coordinate axes of the image introduces positional uncertainty, and convolution in the feature space introduces uncertainty in the grayscale information. In other words, an image represented by a distribution field map is insensitive to small changes in position and gray level, and adapts well to position translations, rotations and occlusions within a certain range.
2.2 Construction of gradient direction DF
For any 2D image \({I_{x,y}}\), \(\nabla {I_x}=\partial I/\partial x\) and \(\nabla {I_y}=\partial I/\partial y\) are its horizontal and vertical gradients, which can be obtained with common first-order or second-order differential operators such as the Roberts, Sobel and Prewitt operators. In this paper, flat regions with small gradients are regarded as background susceptible to noise interference, and their gradient direction is defined as 0; the true 0-gradient direction is instead defined as \(\pi\). The gradient direction is then quantized to \([0,180]\), as expressed by the following equations:
$$\theta _{{x,y}}^{I}=\begin{cases} angle({V_{x,y}}) & {\text{if}}\;\;\nabla {I_y} \ne 0 \cap {G_{x,y}}>\tau \\ \pi & {\text{if}}\;\;\nabla {I_y}=0 \cap {G_{x,y}}>\tau \\ 0 & {\text{if}}\;\;{G_{x,y}}<\tau \end{cases}$$
5
$${V_{x,y}}=sign(\nabla {I_y})\cdot (\nabla {I_x}+i\nabla {I_y}),\;\;\;\;\;{G_{x,y}}=\left| {{V_{x,y}}} \right|$$
6
where \(angle( \cdot )\) returns the phase angle of the vector \({V_{x,y}}\); \(sign(\nabla {I_y})\) is the sign of the vertical gradient; \(i\) is the imaginary unit; \({G_{x,y}}\) is the gradient amplitude; and \(\tau\) is the gradient amplitude threshold used to distinguish low-amplitude flat regions from effective gradient regions. This algorithm can take \(\tau \in [0.1,0.4]\).
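The quantized gradient direction of Eqs. (5)-(6) can be sketched as follows. The image is assumed normalized to [0, 1] so that \(\tau \in [0.1,0.4]\) is meaningful, and np.gradient stands in for the differential operator; both are our assumptions.

```python
import numpy as np

def gradient_direction(image, tau=0.2):
    """Eqs. (5)-(6): gradient direction quantized to [0, 180] degrees."""
    gy, gx = np.gradient(image)            # vertical and horizontal gradients
    v = np.sign(gy) * (gx + 1j * gy)       # Eq. (6); imag(v) = |gy| >= 0
    g = np.abs(v)                          # gradient amplitude G
    theta = np.zeros_like(g)               # flat regions stay at 0
    active = g > tau                       # effective gradient region
    mask = active & (gy != 0)
    theta[mask] = np.degrees(np.angle(v[mask]))   # angle(V) lies in [0, 180]
    theta[active & (gy == 0)] = 180.0             # the "true 0" direction -> pi
    return theta, g
```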
Next, a DF is constructed for the gradient direction. Because image rotation generates zero-valued regions at the edges, and to prevent these regions from affecting matching accuracy and robustness, the DF is constructed only for points whose gradient direction is greater than zero. Taking N = 18, the range [1,180] is divided equally into 18 levels, each corresponding to one DF layer; that is, any image \({I_{x,y}}\) is represented as an 18-layer DF, and the value at each point of the first layer indicates the probability that the gradient direction of \({I_{x,y}}\) falls in the range \([1,10]\). Proceeding in the same way, an 18-layer gradient direction DF is constructed:
$$d(i,j,k)=\begin{cases} 1 & \theta _{{i,j}}^{I} \in [10(k - 1)+1,\;10k] \\ 0 & {\text{otherwise}} \end{cases}$$
7
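Following Eq. (7), the 18-layer gradient-direction DF can be sketched as below; points with \(\theta = 0\) (flat background and rotation padding) fall in no bin and are excluded from every layer, as the text requires.

```python
import numpy as np

def gradient_direction_df(theta, n_layers=18):
    """Eq. (7): one DF layer per 10-degree bin [10(k-1)+1, 10k];
    theta == 0 falls in no bin and is excluded from every layer."""
    m, n = theta.shape
    df = np.zeros((m, n, n_layers))
    for k in range(1, n_layers + 1):
        df[:, :, k - 1] = (theta >= 10 * (k - 1) + 1) & (theta <= 10 * k)
    return df
```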
Finally, the DF feature space is filtered to introduce positional ambiguity and gray-intensity ambiguity into the distribution field map. This loses only the exact information, does not introduce erroneous position information into the DF, and still allows correct matching under small deformations, enhancing robustness.
For a traditional rectangular region, rotation by some angle causes the four corner areas of the rectangular frame to lose information from the original image, to an extent depending on the rotation angle; for a real-time subgraph, irrelevant image information is also introduced as the original information is lost. Applying a traditional rectangular region to the rotation problem therefore inevitably degrades matching accuracy and robustness. To describe the image features more accurately, a rotation-invariant region must be selected as the effective region for computing the feature description.
Among all shapes, only a circle satisfies the rotation-invariance property, so the domain of the feature description is set to a circle in this paper. As shown in Fig. 8, the center of the image is taken as the rotation axis, and the largest inscribed circle centered on the axis is taken as the valid region.
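A sketch of this circular valid region (the helper name is ours): the resulting mask can be multiplied into every DF layer so that only the rotation-invariant region contributes to the feature description.

```python
import numpy as np

def circular_mask(size):
    """Largest inscribed circle centered at the image center: the
    rotation-invariant valid region for the feature description."""
    r = size / 2.0
    yy, xx = np.mgrid[:size, :size] + 0.5   # pixel-center coordinates
    return (yy - r) ** 2 + (xx - r) ** 2 <= r ** 2
```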
2.3 Main Direction DF
The image principal direction characterizes the orientation of the image content and is a subjective concept in image processing. It can be defined as the texture direction of the image, the direction of a backbone, or the direction of a family of gradient vectors; any such artificially defined direction feature is sufficient as long as it has stable rotational invariance. The difference between the principal directions of two images characterizes the rotation angle between them, according to which the images can be rotationally corrected and then searched and matched.
The classical principal direction estimation method based on the gradient direction histogram is the most widely used. It counts the distribution (histogram) of gradient directions within a rectangular region and defines the most numerous class of directions (the main peak) as the principal direction of the region.
Similarly to the histogram statistics, the main direction of the DF is defined in this paper as the DF feature layer with the largest sum of gradient-direction occurrence probabilities, denoted by \(n\). The calculation is as follows:
$$dsu{m_k}=\sum\limits_{{i=1,j=1}}^{{i=m,j=n}} {{d_s}(i,j,k)} \;\;\;\;\;\;\;k=1,2, \cdots ,18$$
8
$$[mlaysum,\;n]=\max (dsum)$$
9
where \(dsu{m_k}\) is the probability sum of the DF at the \(k\)-th layer; \(dsum\) is the vector storing the probability sum of each DF layer; \(mlaysum\) is the maximum value in \(dsum\); and \(n\) is the value of \(k\) corresponding to the maximum of \(dsum\).
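A minimal sketch of Eqs. (8)-(9); variable names follow the paper, and the 1-based layer index is our reading of Eq. (9).

```python
import numpy as np

def main_direction(df):
    """Eqs. (8)-(9): the main direction is the DF layer whose
    probabilities sum to the largest value."""
    dsum = df.sum(axis=(0, 1))     # dsum_k for k = 1, ..., 18
    n = int(np.argmax(dsum)) + 1   # 1-based index of the maximal layer
    mlaysum = dsum[n - 1]          # the maximum value itself
    return mlaysum, n
```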
2.4 Similarity Metric
The previous section describes how to determine the principal direction \(R\) of the template graph and the principal direction \(R'\) of the real-time subgraph; the approximate rotation angle of the real-time subgraph with respect to the template graph is obtained from their difference \(\Delta R=\left| {R - R'} \right|\). The template graph is rotated by \(\Delta R\) and \(\Delta R+180\) respectively to construct the DFs, each described by a one-dimensional column vector; the real-time subgraph is likewise described by a feature vector.
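A hedged sketch of the rotation correction: since each DF layer spans 10 degrees of gradient direction, the main-direction layer difference is scaled by 10 here to obtain an angle, and scipy's rotate stands in for the paper's rotation step (both are our assumptions).

```python
from scipy.ndimage import rotate

def rotation_candidates(template, n_template, n_subgraph):
    """Rotate the template by Delta_R and Delta_R + 180 degrees, with
    Delta_R estimated from the main-direction layer difference."""
    delta_r = 10 * abs(n_template - n_subgraph)   # each layer covers 10 degrees
    return [rotate(template, angle, reshape=False)
            for angle in (delta_r, delta_r + 180)]
```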
Many methods can measure the correlation of two feature vectors, such as the Euclidean distance and the Mahalanobis distance; each has its own advantages and disadvantages, and none is fully suited to the method in this paper. Reference [35] introduces the Chi-square distance, which achieves good results in measuring the similarity of two feature vectors:
$${\chi ^2}(x,y)=\sum\limits_{i} {\frac{{{{({x_i} - {y_i})}^2}}}{{{x_i}+{y_i}}}}$$
10
where \({\chi ^2}(x,y)\) denotes the Chi-square distance of two vectors \(x,y\), and \({x_i},{y_i}\) are the corresponding elements of the two vectors. The Chi-square distance sums the ratio of the squared difference of corresponding elements to their sum; the smaller the value, the closer the distance and the higher the similarity, whereas the Euclidean distance considers only the difference of corresponding elements. During the experiments it was found that the robustness of the Euclidean distance as a similarity discriminator could not meet the matching requirements and mismatches occurred, while the Chi-square distance satisfies the requirements of the method in this paper.
The Chi-square distance is further processed so that it is inverted, i.e., its minimum becomes a maximum (the coefficient \(k\) below is taken negative), which allows the best match to be visualized more clearly when drawing the correlation surface plot:
$$dist=\exp (k{\chi ^2}),\;\;\;\;\;\theta \in (0,360)$$
11
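A minimal sketch of Eqs. (10)-(11); the small eps guard against zero denominators and the default \(k=-1\) are our additions.

```python
import numpy as np

def chi_square(x, y, eps=1e-12):
    """Eq. (10): Chi-square distance between two feature vectors."""
    return np.sum((x - y) ** 2 / (x + y + eps))

def inverted_similarity(x, y, k=-1.0):
    """Eq. (11): exp(k * chi^2) with k < 0 turns the distance minimum
    into a similarity peak on the correlation surface."""
    return np.exp(k * chi_square(x, y))
```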
To improve speed and enhance the practicality of the proposed matching algorithm, the hill-climbing method is used for fast search. Since the algorithm adopts the Chi-square distance as the similarity measure, and a larger value indicates lower similarity between two images, Eq. (11) is used to invert the correlation surface so that the best matching point found by the hill-climbing search can be seen more intuitively. Figure 9 gives a schematic diagram of the correlation surface for the hill-climbing method.
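As a sketch of this search, a greedy 8-neighbor hill climb over the inverted correlation surface; the paper does not specify its exact hill-climbing variant, so this simple form is an assumption.

```python
import numpy as np

def hill_climb(surface, start):
    """Greedy hill-climbing: move to the best 8-neighbor until no
    neighbor has a higher (inverted) similarity value."""
    i, j = start
    m, n = surface.shape
    while True:
        neighbors = [(a, b)
                     for a in range(max(i - 1, 0), min(i + 2, m))
                     for b in range(max(j - 1, 0), min(j + 2, n))]
        best = max(neighbors, key=lambda p: surface[p])
        if surface[best] <= surface[i, j]:
            return (i, j), surface[i, j]   # local peak = candidate match point
        i, j = best
```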