An infrared dim target detection algorithm based on density peak search and region consistency

To suppress background clutter and improve detection accuracy, we propose a dim target detection algorithm based on density peak search and region consistency. A density peak search algorithm is first applied to extract candidate targets, which are then classified and marked according to the local mosaic probability factor; this step is important for suppressing background clutter and accurately separating the candidate target region from the background. Based on the regional stability of dim targets, local mosaic gradient factors are used to screen real targets from the candidates, and a facet kernel filter is used to extract and enhance the irregular contours of dim targets. Our experimental results show that, compared with existing algorithms, the proposed method has better detection accuracy and robustness in various complex scenarios.


Introduction
An infrared search and tracking system (IRST) has the advantages of good concealment and robust anti-interference ability, and is widely used in many applications such as military early warning systems, precision guidance and remote sensing (Zeng et al. 2006; Wang et al. 2019). Infrared target detection is a key aspect of this technology (Chen et al. 2014). However, there are many challenges associated with dim target detection; for example, the target may only contain dozens of pixels in each frame, or may lack texture and shape information, and the signal-to-clutter ratio in each frame may be very low (Ye et al. 2017; Huang et al. 2019b, a). Dim targets are often submerged in complex backgrounds (such as cloud edges, ocean waves, and high-brightness noise).
Infrared dim target detection methods can be divided into two types: single-frame (Dai and Wu 2017) and multi-frame methods (Lv et al. 2018; Wan et al. 2016; Zhang et al. 2019). To detect such targets, Lv used a multi-frame iteration process to estimate the background. However, optimisation of the coefficients of the filter required a high number of iterations, so the efficiency of this algorithm was poor (Lv et al. 2018). Wan proposed a detection method based on the inter-frame difference, in which the differences between adjacent frames were used to locate the target, but the contours extracted for the target were incomplete, and the interval between the frames severely affected the detection accuracy (Wan et al. 2016). Zhang used a quaternion discrete cosine transform to extract the features of dim targets from multi-frame images, and this approach made full use of the spatiotemporal features of four channels. However, target misalignment between multiple frames reduced the detection accuracy, and the algorithm had a high level of complexity and poor real-time performance (Zhang et al. 2019). Single-frame methods have many advantages, such as low complexity, high execution efficiency and strong real-time performance, and have therefore attracted more attention.
Most traditional single-frame methods, such as those involving morphological filtering (Zeng et al. 2006) or probability statistical filtering (Deshpande et al. 1999; Soni et al. 1993; Tomasi and Manduchi 1998; Barnett 1989), detect targets by suppressing the background. Although these methods are easy to implement in real time, their performance is greatly reduced by clutter and interference when a complex background is present. By exploiting the characteristics of the human visual system, Chen proposed a local contrast measure (LCM) that detected a target based on the difference in brightness between the target and the adjacent areas (Chen et al. 2014). This method was not suitable for scenes with a high level of background brightness, and a large number of improved methods were subsequently developed, such as those based on multiscale patch-based contrast measures (Wei et al. 2016; Nie et al. 2018). These methods were unable to extract the features of the target, and yielded high false positive rates caused by high-intensity clutter and high-brightness noise.
Huang proposed a dim target detection method based on maximum grey region growing, which segmented the target and background by selecting seed growing points (Huang et al. 2019b, a). However, this method was only suitable for scenarios in which the number of targets was much smaller than the number of candidates. Xia combined a random walk algorithm with local contrast features to achieve efficient target detection, but the selected seed points in the algorithm needed to be labelled and classified, which was inefficient (Xia et al. 2018). Qin introduced the concept of facet kernel filtering to suppress clutter and enhance the targets (Qin et al. 2019); the selection of candidate points in this method was simple, but because it depended on a single characteristic of the target, the false alarm rate was high for complex backgrounds such as those containing strong clutter and bright edges.
In order to achieve robust detection of dim targets in images with complex backgrounds, we propose an infrared dim target detection algorithm based on a density peak search and region consistency. The main contributions of our algorithm are as follows.
We approach the problem of dim target detection as an abnormal point detection problem, and candidate target points are initially screened out using an unsupervised density peak search. An improved random walker (RW) algorithm is then used to construct local mosaic template regions with each candidate target point as the centre pixel, and candidate targets are classified and labelled locally to suppress clutter. Based on the gradient difference and the directional consistency of dim infrared targets, a local mosaic probability factor (LMPF) is used to separate the target from noise and clutter, and targets are enhanced using a local mosaic gradient factor (LMGF). Our experimental results show that compared with existing algorithms, the proposed detection algorithm yields better performance for images containing complex scenes.

Proposed method
The details of the proposed method are as follows. Candidate targets are first identified based on their abnormal attributes, using a density peak search algorithm. Each candidate is then taken as a central point, and an improved RW algorithm is applied to construct the local mosaic template region, thus reducing the feature search range. To suppress clutter, the LMPF is used to classify and mark candidate targets, which is a critical step in separating targets from clutter. Based on the directional consistency of the dim targets, the LMGF is then used to enhance the edges of the targets, which allows them to be extracted from the background. Figure 1 shows a flow chart for the proposed method.

Density peak search algorithm
The density peak search algorithm (DPeak) is a granular computing model whose theoretical basis rests on two assumptions: (1) the central point is surrounded by neighbouring data points with a lower local density; and (2) two central points are not adjacent to each other (Wang et al. 2017).
For any pixel i, the DPeak model first computes the local density ρ_i from the grey values f_i of the pixels in its neighbourhood, and the nearest correlation distance δ_i, defined as the minimum Manhattan distance d_ij from pixel i to any pixel j with a higher density (for the highest-density pixel, δ_i is set to the maximum distance to any other pixel). The joint feature factor Υ_i = ρ_i δ_i is then calculated pixel by pixel, the pixels are placed in a queue Q in descending order of Υ_i, and the first m pixels are extracted as candidate target points.
The grey value of the target is normally higher than that of the local neighbouring pixels, meaning that the two assumptions of the DPeak model are applicable to the characteristics of dim targets. The DPeak model can therefore reduce the search range of the target and reduce the calculation time.
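The candidate selection described above can be sketched as follows. This is a simplified, brute-force illustration, not the authors' exact implementation: the local density is assumed here to be a 3 × 3 box sum of grey values, and ties in density are broken by pixel order.

```python
import numpy as np

def dpeak_candidates(img, m=12):
    """Select m candidate target pixels via a density-peak search.

    Illustrative sketch: rho is a local-density proxy (3x3 box sum of
    grey values), delta is the Manhattan distance to the nearest pixel
    of higher density, and gamma = rho * delta ranks the candidates.
    Brute force, intended for small images.
    """
    h, w = img.shape
    f = img.astype(np.float64)
    # Local density: sum of grey values over a 3x3 neighbourhood.
    p = np.pad(f, 1, mode="edge")
    rho = sum(p[r:r + h, c:c + w] for r in range(3) for c in range(3))

    coords = [(i, j) for i in range(h) for j in range(w)]
    order = sorted(coords, key=lambda ij: rho[ij], reverse=True)
    delta = np.empty((h, w))
    # Highest-density pixel: delta is its largest distance to any pixel.
    i0, j0 = order[0]
    delta[i0, j0] = max(abs(i0 - i) + abs(j0 - j) for i, j in coords)
    seen = [(i0, j0)]
    for (i, j) in order[1:]:
        # Manhattan distance to the nearest denser (earlier-ranked) pixel.
        delta[i, j] = min(abs(i - a) + abs(j - b) for a, b in seen)
        seen.append((i, j))

    gamma = rho * delta
    top = np.argsort(gamma.ravel())[::-1][:m]
    return [divmod(k, w) for k in top]
```

A bright blob on a dark background yields one pixel with both high density and high isolation distance, so it dominates the Υ ranking, matching assumption (1) of the DPeak model.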

Local mosaic probability factor (LMPF) and local mosaic gradient factor (LMGF)
The textural structure of the dim target is essentially different from that of the clutter. Figure 2 shows a comparison of the gradient distributions: Fig. 2a shows an infrared image, and Fig. 2b shows an enlarged view of the target area in Fig. 2a. The upper image shows the gradient distribution of the background clutter, and the lower image shows the gradient distribution of the target and its neighbourhood. It can be seen that the gradient direction of the target neighbourhood is consistent and the amplitude is uniform; in comparison, the gradient direction for the background clutter is messy, and the amplitude difference is large. Figures 2c and d show three-dimensional gradient distribution maps for the background clutter and the target, respectively. It can be seen that the dim target has distinctive characteristics, such as higher grey values, a higher gradient and a higher density than the adjacent background area. The aim of candidate point screening is to judge whether each point represents a target, based on these local attributes.
The roaming window of the RW algorithm is a rectangular frame, and the outermost pixels are usually classified as background. In a multi-target scene, if the targets are adjacent or touching, these neighbouring targets will be classified as background (Qin et al. 2019), meaning that parts of them will be missed. To extract the target from the background accurately, we improve the RW algorithm by introducing a more efficient marking model. Unlike the global traversal method, our improved model traverses only the candidate target area. The roaming window is also changed to an adaptive geometric window, which more closely reflects the contours of the target. On this basis, the outer area of the geometric window (background) is marked as 0-type, and the central area (target) is marked as 1-type, which helps to segment the target accurately. In Fig. 3a, the red frame indicates the central region, the area between the yellow and red frames represents the transition region, and the outer area lies between the blue and yellow frames. The candidate target is used as the central pixel of the local mosaic model and is marked as a 1-class pixel, while pixels in the outer area of the mosaic model are marked as 0-class. Since most dim targets have dimensions of less than 9 × 9 pixels, the size of the template is set to 11 × 11 pixels. The shape of the marked pixels in the outer region is close to the target contour, and can accurately approximate the true distribution of the target. In the definitions of the transition and outer regions, CR, MR and OR refer to the central region, the transition region and the outer region, respectively; ⊕ denotes the dilation (expansion) operation, and shape refers to the diamond-shaped structural element.
In the local mosaic model, only the central pixel is marked as 1-class, while pixels close to the outer region are marked as 0-class. The pixels in the transition region are classified based on their similarity to the grey level of the central pixel.
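The three-region mosaic layout can be sketched as follows. This is an illustrative implementation, not the paper's exact construction: the central-region radius `cr_radius` and the number of dilation steps `mr_steps` are assumed parameters, and SciPy's binary dilation with a diamond structuring element stands in for the ⊕ operation.

```python
import numpy as np
from scipy.ndimage import binary_dilation

# Diamond-shaped structural element, as named in the text.
DIAMOND = np.array([[0, 1, 0],
                    [1, 1, 1],
                    [0, 1, 0]], dtype=bool)

def mosaic_regions(template_size=11, cr_radius=1, mr_steps=2):
    """Build boolean masks for the central region (CR), transition
    region (MR) and outer region (OR) of the local mosaic template.

    Sketch: CR is a small block around the template centre, MR is the
    diamond dilation of CR minus CR itself, and OR is everything else.
    """
    c = template_size // 2
    cr = np.zeros((template_size, template_size), dtype=bool)
    cr[c - cr_radius:c + cr_radius + 1, c - cr_radius:c + cr_radius + 1] = True
    grown = binary_dilation(cr, structure=DIAMOND, iterations=mr_steps)
    mr = grown & ~cr          # transition region: dilated ring around CR
    outer = ~grown            # outer region: background (0-class labels)
    return cr, mr, outer
```

In use, CR pixels would carry the 1-class label, OR pixels the 0-class label, and the unlabelled MR pixels would be resolved by the random-walk probabilities.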
Once the candidate target points have been obtained, the transition region and the outer region are updated using Eqs. (5) and (6) to create a new local mosaic region. In our approach, the outer area is selected as the background, and the targets are screened using the LMPF, an indicator that compares the probability value for the candidate target region with the probability value for the background.
In Eq. (7), MVP (mean value of probability) is the average of the pixel probabilities over a region, where the pixels of the central area are obtained by expanding the candidate target pixel; the LMPF is then evaluated over the central area.
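Since Eq. (7) is not reproduced here, the sketch below is only a hedged stand-in: it realises the stated comparison between the MVP of the candidate (central) region and that of the background (outer) region as a ratio. The function name `lmpf` and the ratio form are assumptions, not the paper's exact formula.

```python
import numpy as np

def lmpf(prob_map, cr_mask, outer_mask, eps=1e-6):
    """Hedged sketch of the local mosaic probability factor: the mean
    random-walk probability (MVP) over the central candidate region
    divided by the MVP over the outer background region.  A value well
    above 1 suggests a real target; near 1 suggests background/clutter.
    """
    mvp_target = prob_map[cr_mask].mean()
    mvp_background = prob_map[outer_mask].mean()
    return mvp_target / (mvp_background + eps)
```

Here `prob_map` would be the per-pixel probabilities produced by the improved RW step, and the masks would come from the local mosaic template.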
It is difficult to use the LMPF to achieve robust detection of dim targets in highlight scenes, especially in low-contrast images. We therefore introduce the LMGF to widen the difference between the target and the background and improve the accuracy of our method. As shown in Fig. 3b, the template is divided into eight parts based on a three-level nested model: these are labelled S1, S2, S3 and S4 (where S represents the central area) and T1, T2, T3 and T4 (where T represents the outer area). The LMGF is defined in terms of the following quantities: grad_ti is a gradient element in region T_i (i = 1, 2, 3, 4), N_ti is the number of gradient elements in T_i, g_ti is the average gradient magnitude in T_i, and G_tmax and G_tmin are the maximum and minimum of the average gradients g_ti (i = 1, 2, 3, 4).
The LMGF is evaluated over the central area. The direction of the gradient in the neighbourhood of a dim target is consistent and the amplitudes are similar, whereas the gradient direction and amplitude of clutter vary significantly. The target can therefore be distinguished from clutter using this information.
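A hedged sketch of this gradient-consistency idea follows. The paper's exact LMGF equations are not reproduced above, so the split of the outer area into four bands and the ratio G_tmin / G_tmax used below are illustrative assumptions: a uniform ring gradient around a true dim target scores near 1, while uneven clutter gradients score near 0.

```python
import numpy as np

def lmgf(patch):
    """Hypothetical LMGF sketch for an 11x11 patch.

    The outer area is split into four sub-regions T1..T4 (top, bottom,
    left, right bands); g_ti is the mean gradient magnitude in T_i, and
    the factor returned is G_tmin / G_tmax, rewarding gradients that are
    uniform across all four directions.
    """
    gy, gx = np.gradient(patch.astype(np.float64))
    mag = np.hypot(gx, gy)
    t1, t2 = mag[:3, :], mag[-3:, :]            # top and bottom bands
    t3, t4 = mag[3:-3, :3], mag[3:-3, -3:]      # left and right bands
    g = np.array([t.mean() for t in (t1, t2, t3, t4)])
    gmax, gmin = g.max(), g.min()
    return gmin / gmax if gmax > 0 else 0.0
```

A symmetric blob (target-like) produces comparable gradient energy in all four bands, whereas a one-sided edge (clutter-like) leaves some bands empty and is scored low.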

Target extraction and enhancement
The input image P_i is filtered using the 5 × 5 kernel matrix K (Qi et al. 2016) to obtain the enhanced image P_e, as shown in Eq. (15). The target map P_m is obtained according to Eq. (16), and the Hadamard product of P_m and P_e gives the weighted image P_w. In the enhanced image P_e, both the targets and the clutter are enhanced, whereas P_m maps only the target area in the proposed method; the weighted image P_w therefore approximates the target contour more closely, indicating that the proposed algorithm can effectively suppress the clutter and enhance the targets. T_h is a simple threshold whose value ranges from 0.6 to 0.9.
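The extraction stage can be sketched as follows. Note the hedges: the actual facet kernel K of Qi et al. (2016) is not reproduced here, so a generic 5 × 5 high-pass (Laplacian-like) kernel stands in for it, and the normalisation step is an assumption needed to make the 0.6–0.9 threshold range meaningful.

```python
import numpy as np
from scipy.signal import convolve2d

# Placeholder 5x5 high-pass kernel standing in for the facet kernel K.
K = -np.ones((5, 5))
K[2, 2] = 24.0

def extract_targets(p_i, p_m, th=0.7):
    """Sketch of the extraction stage: filter P_i with the kernel to get
    the enhanced image P_e, weight it by the target map P_m via the
    Hadamard (elementwise) product to get P_w, then threshold with T_h.
    """
    p_e = convolve2d(p_i.astype(np.float64), K, mode="same", boundary="symm")
    p_e = np.clip(p_e, 0, None)
    p_e /= p_e.max() if p_e.max() > 0 else 1.0   # normalise to [0, 1]
    p_w = p_m * p_e                              # Hadamard product
    return p_w >= th
```

Because P_m is zero outside the mapped target area, clutter that survives the high-pass filtering in P_e is suppressed in P_w, mirroring the weighting argument above.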

Proposed detection method
As shown in Algorithm 1, the proposed method selects candidate targets by constructing a feature space for the local density-closest correlation distance, and then constructs the local mosaic template area with the candidate target point as the central pixel. In order to suppress clutter, the candidate target regions are classified and labelled based on the LMPF, and the real targets are finally selected from the candidate targets based on the LMGF.

Experiments and analysis
In this section, experiments are performed to verify the effectiveness of the proposed method. All experiments were run on a 3.40 GHz Intel i7-3770 CPU with 8 GB of memory, using MATLAB 2018b.

Experimental setup
(1) Datasets: To verify the robustness and accuracy of the proposed method, six sets of infrared image sequences with different complex backgrounds were selected. Table 1 gives detailed information on these six sequences. Each sequence has a different background: Frame 1 contains the edge of the sky and irregular clouds; Frame 2 has a glare background (Huang et al. 2019b, a); Frame 3 contains a large amount of ring-shaped point clutter; in Frame 4, the target is surrounded by clouds, which form a great deal of clutter (Qin et al. 2019); Frame 5 shows a dual-target image sequence with high-intensity sky and sea wave clutter as the background; and Frame 6 is a four-target image sequence with low intensity (Xia et al. 2018) (Table 2).
(2) Baseline methods: Nine existing methods are selected for comparison: top-hat (Zeng et al. 2006), max-median, fast-saliency, LCM, MPCM, VARD, AAGD, NRAM and LIG. (3) Evaluation metrics: σ_in and σ_out represent the intensity standard deviations of the source image and the processed image, respectively, and C_in and C_out are the contrasts of the source image and the processed image. The contrast is defined in Eq. (21), where f_t is the maximum intensity value of the target area and f_b is the average intensity of the background area. In this paper, we use the BSF and CG as evaluation indicators. The BSF can be used to evaluate the ability of the algorithm to suppress clutter globally, and its value is proportional to the suppression effect, while the CG can be used to evaluate the effect of target enhancement, and its value is proportional to this effect.
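The two metrics can be sketched as follows, under stated assumptions: the BSF is taken to be the standard ratio σ_in/σ_out, the CG the ratio C_out/C_in, and the contrast combination |f_t − f_b| stands in for Eq. (21), which is not reproduced above.

```python
import numpy as np

def bsf(src, out):
    """Background suppression factor, assumed to be sigma_in / sigma_out:
    the ratio of intensity standard deviations of the source and the
    processed image (larger = stronger clutter suppression)."""
    return src.std() / (out.std() + 1e-12)

def contrast(img, target_mask):
    """Local contrast from f_t (maximum intensity of the target area) and
    f_b (mean intensity of the background); |f_t - f_b| is an assumed
    stand-in for Eq. (21)."""
    f_t = img[target_mask].max()
    f_b = img[~target_mask].mean()
    return abs(f_t - f_b)

def cg(src, out, target_mask):
    """Contrast gain, assumed to be C_out / C_in (larger = stronger
    target enhancement)."""
    return contrast(out, target_mask) / (contrast(src, target_mask) + 1e-12)
```

For a processed image in which background clutter has been flattened while the target intensity is preserved, both ratios exceed 1, matching the "proportional to the effect" reading above.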
TPR and FPR represent the true positive and false positive rates, respectively, and are used to draw the ROC curves used to evaluate the algorithms: the closer a curve lies to the upper left corner of the graph, the better the algorithm. Figure 4 compares the detection results obtained from the source images using the LMPF, the LMGF and the weighted method (P_w); in each image, a red rectangle marks the real target area, with an enlarged view of the target area in the lower left corner. It can be seen that for Frames 1, 2 and 4, the detection results based on the LMPF still contain point shadow clutter, indicating that the clutter suppression ability of the LMGF is better than that of the LMPF. However, in terms of target edge retention and target enhancement, the LMPF performs better than the LMGF, except for Frame 5; due to the influence of strong light and sea clutter, the effectiveness of the LMPF is slightly lower than that of the LMGF in this case.

Qualitative comparisons
In order to verify the performance of this algorithm, we used six datasets in our detection experiments, including both single-target and multi-target image sequences. To qualitatively analyse the effect of each step of the proposed method, each source image was processed using LMPF, LMGF and the weighted model, respectively, and the BSF and CG were calculated for the outputs. Table 3 shows the results. It can be seen that the results of the weighted model were better than those of the LMPF or the LMGF models, meaning that the weighted model can enhance the target and suppress background clutter more effectively.
Nine alternative methods were selected for comparison: the top-hat, max-median, fast-saliency, LCM, MPCM, VARD, AAGD, NRAM and LIG methods. Figures 5 and 6 show the detection results for the six datasets. The four datasets in Fig. 5 all contain single-target sequences, with backgrounds including high-intensity light, cloud clutter and complex irregular clutter. The two datasets shown in Fig. 6 are two-target and four-target image sequences, respectively, and were used to verify the multi-target detection ability of our algorithm.
The source images in Fig. 5 are four sets of image sequences showing areas of sky. In Frame 1, the background is the sky and irregular cloud edges, which create a great deal of clutter; Frame 2 has a strongly lit background whose brightness is similar to that of the target; the background of Frame 3 contains a great deal of ring-shaped point clutter; and the target in Frame 4 is surrounded by clouds, which create clutter interference. The detection results from the top-hat and max-median methods contain a great deal of clutter, indicating poor background suppression, and the top-hat method loses the target in Frame 2. The detection results from the LCM and MPCM methods not only contain background clutter, but also have a high false alarm rate. In contrast, there is little background clutter in the results of the fast-saliency method. The other comparison methods also show some problems: the AAGD method produces a small amount of linear clutter in Frames 1 and 4; the NRAM method loses the target in Frame 1; and the LIG method produces a small amount of background clutter in Frame 4. In general, our proposed method effectively suppresses the background clutter, preserves the contours of the target, and gives the best detection performance.
The experimental datasets shown in Fig. 6 contain two-target and four-target image sequences with the sea and sky as background, and these are used to verify the multi-target detection performance of our approach. Frame 5 is a dual-target image sequence with a strongly lit sky and clutter from sea waves. The background suppression effects of the top-hat, fast-saliency, VARD, AAGD, NRAM and LIG methods are good. There is a great deal of clutter in the detection results from the LCM, MPCM and max-median methods, meaning that their background suppression is poor. Frame 6 is a four-target image sequence with low brightness. The fast-saliency and VARD methods lose the target, and the LCM, max-median and MPCM methods have poor background suppression effects. The targets detected by the NRAM and LIG methods are blurry, while the contours extracted with the top-hat and AAGD methods are clear. Compared with the alternative methods, our approach has obvious advantages in terms of background clutter suppression and target enhancement, and yields better performance for all of the datasets.

Quantitative comparisons
The BSF, CG, and ROC curves and the average running time of the algorithm were used to further evaluate the detection performance of our method. The BSF was used to evaluate the background suppression capability, while the CG was applied to evaluate the target enhancement capability of the algorithm. Table 4 shows the results for the BSF, CG and average running time for our algorithm.
In Table 4, the BSF of the proposed method is the best for Frames 1, 2, 4 and 5, while the NRAM method performs best for Frames 3 and 4. This shows that the background suppression ability of our method is generally stronger than that of the alternative methods. The CG is the local contrast gain of the target: the larger its value, the better the background suppression and target enhancement. The CG values of the proposed method were the highest for all image sequences.
In terms of running time, the fast-saliency method was the fastest for one image sequence, and the AAGD method took the shortest time for the remaining five. The NRAM method required the longest time for all image sequences. The running time of our method lay between these two extremes, balancing detection accuracy against computational efficiency. Figure 7 compares the ROC curves of the algorithms; the curve for our method is clearly the highest.

Parameter Sensitivity Analysis
In our method, the target detection problem is treated as a clustering problem with unsupervised learning, meaning that the number of candidate points is the main parameter determining performance. The weights between adjacent pixels strongly affect the location of the target contour, and a weighting parameter is used to determine these weights. In this section, we report the results of experiments performed to analyse the influence of these parameters on the detection results.
Since dim targets are often submerged in clutter, we assume that most of the seed points are false targets and only a few are true target points, meaning that the target points form a set of abnormal points. Figure 8 shows TPR curves for varying numbers of candidate points m over the six groups of image sequences. Based on these experimental results, we set m to 12. Figure 9 shows the ROC curves for different values of the weighting parameter. It can be seen that when this parameter is set to 100, the ROC results are the best for all image sequences; when it is set to 200, the curve is slightly lower; and when it is set to zero or 300, targets may be missed or false targets counted, giving poor detection results. We therefore set this parameter to 100.

Conclusion
In this paper, a density peak algorithm is used to traverse each pixel in an image to establish a density-distance spatial decision map. Candidate points are then selected based on the proportional relationship between noise points and dim targets. A local mosaic model is built with the candidate pixels as the central points, and real targets are screened from the candidates based on the mapping relationship between the LMPF and the LMGF, which narrows the target search range and reduces the amount of calculation required. The proposed method can also accurately extract targets from multi-target image sequences. Experimental results show that our method performs better than alternative approaches, as it not only suppresses clutter and enhances the target but is also applicable to a wide range of scenarios. In future work, we intend to develop a new positioning method for the target region, further reduce the running time, and build a more robust feature space for screening real targets.