Constant False Alarm Rate (CFAR) detection is a fundamental technique applied to the received signal. Here, the CFAR detector operates on two-dimensional image data: when the value of an image cell exceeds a threshold, a target is declared present. In real-world contexts, however, the power of noise and clutter is a non-stationary random process that changes with time. Applying a fixed threshold to actual data therefore produces many false alarms, and the desired probability of false alarm is not achieved. CFAR is meant to keep the probability of a false alarm caused by background noise or clutter at a constant level. To set a detection threshold that adapts to the power level of noise or clutter, the CFAR detector estimates the background power level from the surrounding samples.
The image of the breast tissue has three color channels (red, green and blue), so the color of each pixel can be treated as a vector of three elements. This vector has a joint probability density function (PDF) under each of the hypotheses \({H}_{0}\) and \({H}_{1}\). The likelihood ratio can therefore be formed as the ratio of the PDFs under the two hypotheses:
$${L}\left({Y}\right)=\frac{{f}\left({Y}|{{H}}_{1}\right)}{{f}\left({Y}|{{H}}_{0}\right)}$$
3
where \(Y ={\left[{y}_{R} {y}_{G} {y}_{B}\right]}^{T}\) ranges from 0 to 250. All conventional detectors compare this likelihood function with a fixed threshold.
$${L}\left({Y}\right)\begin{array}{c}>\\ <\end{array}{\eta }$$
4
However, we should consider two additional topics.
First, it is well known that the PDF of normal tissue is not the same everywhere. It is wise to consider different PDFs for the tissue near the chest, the tissue near the nipple, and the parts in between. A two-dimensional CFAR can be used for this purpose: the PDF of the normal tissue is estimated in the neighborhood of the suspicious region. This local model is generally more accurate, so the actual detection probability increases. The concept of two-dimensional CFAR is shown in Fig. 1.
Here we want to test the cell under test (CUT). To do so, we first disregard the cells adjacent to the CUT (the guard cells), because, like the CUT, they may be contaminated by cancerous tissue. The training samples next to the guard cells are then used to estimate \(f\left(Y\right|{H}_{0})\). Conventional CFAR methods are modified versions of the Neyman-Pearson test, in which only the properties of \(f\left(Y\right|{H}_{0})\) are considered. Hence, the mainstream CFAR detector is, to some extent, different from what our problem needs. Here, abnormality means that the PDF of the CUT differs from that of the training cells. However, without a predefined model, the PDF of the CUT cannot be estimated from a single observation, and even with a predefined model, a PDF estimate based on one sample is inaccurate. On the other hand, cancerous tissue is usually not limited to one sample; it occupies a region. Therefore, we should develop a different version of 2D-CFAR, which is stated as follows:
There are some samples in the region under test (RUT) and other samples in the training cells, and we want to decide whether the PDF of the RUT is the same as that of the training cells. Two further problems arise. The first is that the test should be non-parametric, because it is neither straightforward nor acceptable to assume a predefined model for the PDF of all tissue regions. The second is that the majority of non-parametric tests (e.g. the Kolmogorov-Smirnov, Kendall \(\tau\) and Spearman \(\rho\) tests) are constructed for one-dimensional data, whereas in our problem the data (the color image) have three dimensions. We therefore need a multivariate non-parametric test. Baringhaus and Franz [12] discussed a good choice for such a situation.
Assume that we have two sample sets \(X\) and \(Y\). All samples of these sets are n-dimensional. It is known that all samples in \(X\) have the same distribution \(\left(F\right)\) and all samples in \(Y\) also have the same distribution \(\left(G\right)\). Both \(F\) and \(G\) are unknown, and we want to decide whether \(F\) equals \(G\) or not. Defining \(\left|\left|{Z}_{1}-{Z}_{2}\right|\right|\) as the Euclidean distance between points \({Z}_{1}\) and \({Z}_{2}\) in the n-dimensional space, Baringhaus and Franz [12] have proven that:
$${E}\left|\left|{X}_{i}-{Y}_{j}\right|\right|-\frac{1}{2}{E}\left|\left|{X}_{i}-{X}_{j}\right|\right|-\frac{1}{2}{E}\left|\left|{Y}_{i}-{Y}_{j}\right|\right|\ge 0$$
5
Here \({X}_{i}\) and \({X}_{j}\) are two randomly selected samples from \(X\) set, and \({Y}_{i}\) and \({Y}_{j}\) are two randomly selected samples from \(Y\) set. The equality holds if and only if \(F = G\).
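The "if" direction of this equality condition can be verified in one line. As a sketch, assume the randomly selected samples are mutually independent and write \(d\) (a symbol introduced here only for illustration) for the common expected distance. When \(F=G\), every pair of samples has the same joint distribution, so:

$$F=G\;\Rightarrow\;{E}\left|\left|{X}_{i}-{Y}_{j}\right|\right|={E}\left|\left|{X}_{i}-{X}_{j}\right|\right|={E}\left|\left|{Y}_{i}-{Y}_{j}\right|\right|=d,\qquad d-\frac{1}{2}d-\frac{1}{2}d=0$$

The converse, that the expression is strictly positive whenever \(F\ne G\), is the non-trivial part established in [12].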
If there were infinitely many samples in the ordinary and suspicious regions, it would be possible to calculate the expectations in the above equation. However, as the samples are limited, we calculate the sample average and use it as an approximation to the expectation.
Assume that there are \(N\) samples (\(X\) samples) in the training region and \(M\) samples (\(Y\) samples) in the RUT region. Then the above expectation is approximated as:
$${z}=\frac{1}{MN}\sum _{i=1}^{N}\sum _{j=1}^{M}\left|\left|{X}_{i}-{Y}_{j}\right|\right|-\frac{1}{2{N}^{2}}\sum _{i=1}^{N}\sum _{j=1}^{N}\left|\left|{X}_{i}-{X}_{j}\right|\right|-\frac{1}{2{M}^{2}}\sum _{i=1}^{M}\sum _{j=1}^{M}\left|\left|{Y}_{i}-{Y}_{j}\right|\right|$$
6
As the number of samples is finite, \(z\) is a random variable. The expectation of \(z\) under \({H}_{0}\) (same distribution) is zero, while it is a positive value under \({H}_{1}\).
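The sample statistic \(z\) can be computed directly from its three double sums. The following is a minimal sketch in plain Python, not the authors' implementation; the function names and the RGB values are hypothetical and chosen only for illustration.

```python
import math

def mean_pairwise_distance(a, b):
    """Average Euclidean distance over all pairs (one point from a, one from b)."""
    total = sum(math.dist(p, q) for p in a for q in b)
    return total / (len(a) * len(b))

def energy_statistic(x, y):
    """Sample statistic z: cross term minus half of each within-set term."""
    return (mean_pairwise_distance(x, y)
            - 0.5 * mean_pairwise_distance(x, x)
            - 0.5 * mean_pairwise_distance(y, y))

# Hypothetical RGB samples (illustrative values, not taken from any data set).
training = [(120, 80, 90), (118, 83, 88), (125, 79, 93), (121, 82, 91)]
similar = [(119, 81, 90), (123, 80, 92)]   # colors close to the training set
shifted = [(180, 40, 60), (176, 45, 58)]   # colors far from the training set

z_similar = energy_statistic(training, similar)
z_shifted = energy_statistic(training, shifted)
print(f"z (similar region) = {z_similar:.3f}")
print(f"z (shifted region) = {z_shifted:.3f}")
```

In this toy example the region whose colors differ from the training set gives the larger value of \(z\), which is the behavior the detector relies on.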
Generally, it is not simple to calculate the PDF of \(z\). However, by the central limit theorem, \(z\) is approximately Gaussian distributed. Under this approximation, we have:
$$\left\{\begin{array}{cc}{z}\sim N\left(0,{\sigma }_{0}^{2}\right)& {H}_{0}\\ {z}\sim N\left({\mu },{\sigma }_{1}^{2}\right)& {H}_{1}\end{array}\right.$$
7
It is known that \(\mu >0\). It is well known that the Neyman-Pearson test for this problem has the following form:
$${z}\begin{array}{c}>\\ <\end{array}{\eta }$$
8
Again, if \({\sigma }_{0}\) is known, the threshold value can be set based on the acceptable \({P}_{fa}\) as:
$${\eta }={\sigma }_{0}{G}^{-1}\left(1-{P}_{fa}\right)$$
9
Here \({G}^{-1}\) is the inverse CDF of the standard Gaussian distribution. However, the actual value of \({\sigma }_{0}\) is not known, so the parameter is estimated as:
$$\widehat{{\sigma }_{0}^{2}}=\frac{1}{{N}^{2}}\sum _{i=1}^{N}\sum _{j=1}^{N}\left|\left|{X}_{i}-{X}_{j}\right|\right|$$
10
Therefore, it appears that the following statistic is independent of the value of \({\sigma }_{0}\):
$${w}=\frac{\frac{1}{MN}\sum _{i=1}^{N}\sum _{j=1}^{M}\left|\left|{X}_{i}-{Y}_{j}\right|\right|-\frac{1}{2{N}^{2}}\sum _{i=1}^{N}\sum _{j=1}^{N}\left|\left|{X}_{i}-{X}_{j}\right|\right|-\frac{1}{2{M}^{2}}\sum _{i=1}^{M}\sum _{j=1}^{M}\left|\left|{Y}_{i}-{Y}_{j}\right|\right|}{\frac{1}{{N}^{2}}\sum _{i=1}^{N}\sum _{j=1}^{N}\left|\left|{X}_{i}-{X}_{j}\right|\right|}$$
11
Therefore, our proposed detector has the following form:
$${w}\begin{array}{c}>\\ <\end{array}{{\eta }}_{1}$$
12
It is straightforward to show that a change in the mean or standard deviation of the \({X}_{i}\) does not change the statistics of \(w\); consequently, the abovementioned detector has a constant false alarm rate.
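The invariance behind this claim can be checked numerically. The sketch below assumes that the change in mean and standard deviation acts on the training region and the RUT alike, modeled as a common offset and a common positive gain. Every pairwise distance is then multiplied by the same gain, which cancels in the ratio \(w\). The function names and sample values are hypothetical.

```python
import math

def mean_dist(a, b):
    """Average Euclidean distance over all pairs drawn from a and b."""
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))

def normalized_statistic(x, y):
    """Statistic w: z divided by the mean pairwise distance of the training set x."""
    spread = mean_dist(x, x)  # must be non-zero, i.e. training samples not all equal
    z = mean_dist(x, y) - 0.5 * spread - 0.5 * mean_dist(y, y)
    return z / spread

def affine(samples, gain, offset):
    """Apply the same gain and offset to every component of every sample."""
    return [tuple(gain * v + offset for v in s) for s in samples]

# Hypothetical RGB samples (illustrative values only).
training = [(120, 80, 90), (118, 83, 88), (125, 79, 93), (121, 82, 91)]
rut = [(150, 60, 70), (147, 64, 72)]

w_original = normalized_statistic(training, rut)
w_rescaled = normalized_statistic(affine(training, 2.5, 10.0),
                                  affine(rut, 2.5, 10.0))
print(w_original, w_rescaled)
```

The two printed values agree up to floating-point rounding, so a threshold on \(w\) does not need to be retuned when the overall brightness or contrast of the image changes in this way.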
Simulation Results
\({P}_{D}\) is the probability that a target is detected given that the target is genuinely present. \({P}_{fa}\) is the probability that a target is declared by the measurement when no target is present.
The detector must determine whether a tumor is present at each pixel in a picture. Hypothesis \({H}_{0}\) describes the situation where the measurement \(y\) corresponds to a section of the image with no tumor. \({H}_{1}\) indicates the presence of a tumor. Estimates of the probability density functions can be used to calculate the detection and false alarm probabilities as a function of the decision threshold. When a noise peak exceeds the threshold, it appears as if there is a signal when there is only noise; this is a false alarm.
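When labeled examples are available, both probabilities can also be estimated empirically as the fraction of statistic values that exceed the threshold under each hypothesis. The sketch below illustrates this counting step only; the statistic values are invented for illustration and are not results from this work.

```python
def exceed_rate(values, threshold):
    """Fraction of statistic values that lie strictly above the threshold."""
    return sum(v > threshold for v in values) / len(values)

# Hypothetical detector outputs (made-up numbers, for illustration only).
h0_values = [0.02, 0.05, 0.01, 0.08, 0.03, 0.12, 0.04, 0.06, 0.02, 0.07]  # no tumor
h1_values = [0.35, 0.50, 0.09, 0.42, 0.61, 0.28, 0.11, 0.47, 0.39, 0.55]  # tumor

for threshold in (0.05, 0.10, 0.20):
    p_fa = exceed_rate(h0_values, threshold)  # empirical false-alarm probability
    p_d = exceed_rate(h1_values, threshold)   # empirical detection probability
    print(f"threshold={threshold:.2f}  P_fa={p_fa:.2f}  P_D={p_d:.2f}")
```

Raising the threshold can only lower both rates, which is the trade-off between \({P}_{D}\) and \({P}_{fa}\) that the threshold controls.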
The proposed algorithm is as follows:
The region under test (RUT) is defined as a rectangular region of \({m}_{0}\times {n}_{0}\) pixels. The samples in this region are denoted \({Y}_{j}\).
The guard area is defined as an \({m}_{1}\times {n}_{1}\) rectangle surrounding the RUT.
Training samples are taken from an \({m}_{2}\times {n}_{2}\) rectangle surrounding the guard area. These samples are denoted \({X}_{i}\).
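The three steps above can be sketched as a window-extraction routine. This is an illustrative reading, not the authors' code. It assumes that the three rectangles are nested around a common center with even size differences, that the training samples are the pixels of the outer rectangle lying outside the guard rectangle, and that pixels falling outside the image are skipped. The function name `split_regions` and the toy image are hypothetical.

```python
def split_regions(image, top, left, rut_size, guard_size, train_size):
    """Return (rut_samples, training_samples) around a RUT whose top-left
    corner is at (top, left). image is a list of rows of (R, G, B) tuples;
    sizes are (rows, cols) pairs."""
    (m0, n0), (m1, n1), (m2, n2) = rut_size, guard_size, train_size
    guard_top, guard_left = top - (m1 - m0) // 2, left - (n1 - n0) // 2
    train_top, train_left = top - (m2 - m0) // 2, left - (n2 - n0) // 2
    rows, cols = len(image), len(image[0])
    rut, training = [], []
    # Scan the training rectangle, clipped to the image borders.
    for r in range(max(train_top, 0), min(train_top + m2, rows)):
        for c in range(max(train_left, 0), min(train_left + n2, cols)):
            in_rut = top <= r < top + m0 and left <= c < left + n0
            in_guard = (guard_top <= r < guard_top + m1
                        and guard_left <= c < guard_left + n1)
            if in_rut:
                rut.append(image[r][c])
            elif not in_guard:  # outside the guard rectangle: training pixel
                training.append(image[r][c])
    return rut, training

# Toy 12x12 image whose pixel values encode their own position.
image = [[(r, c, 0) for c in range(12)] for r in range(12)]
rut, training = split_regions(image, 5, 5, (2, 2), (4, 4), (10, 10))
print(len(rut), len(training))
```

With a 2*2 RUT, a 4*4 guard rectangle and a 10*10 training rectangle, the guard ring itself contributes to neither sample set.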
The simulation scenario is as follows. Images are first extracted from the data set (UM-BMID) and separated into tumor-containing and tumor-free images, and the PDFs of the normal and cancerous tissue images are estimated. In total, 104 images were extracted and simulated, 52 of which were diagnosed with a tumor and 52 of which showed normal tissue, and \({P}_{D}\) and \({P}_{fa}\) were determined using the same number of samples.
Preliminary results indicate good performance. For the detection threshold \(T\), the lower the \({P}_{fa}\), the higher the threshold, and vice versa. For example, a \({P}_{fa}\) of 10% gives a threshold \(T\) of 2, and reducing the probability of false alarm yields a higher threshold, as shown in Fig. 2.
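The inverse relation between \({P}_{fa}\) and the threshold also follows from the Gaussian approximation \(\eta ={\sigma }_{0}{G}^{-1}\left(1-{P}_{fa}\right)\) given earlier. The sketch below evaluates this expression with the Python standard library, assuming \({\sigma }_{0}=1\) purely for illustration. It is not intended to reproduce the threshold values of Fig. 2, which depend on the normalized statistic and on the data.

```python
from statistics import NormalDist

def gaussian_threshold(p_fa, sigma0=1.0):
    """Threshold eta = sigma0 * G^{-1}(1 - P_fa) for a zero-mean Gaussian statistic."""
    return sigma0 * NormalDist().inv_cdf(1.0 - p_fa)

# sigma0 = 1 is an assumption made only for this illustration.
thresholds = {p_fa: gaussian_threshold(p_fa) for p_fa in (0.3, 0.1, 0.01)}
for p_fa, eta in thresholds.items():
    print(f"P_fa={p_fa:<5} eta={eta:.4f}")
```

As the allowed \({P}_{fa}\) shrinks, the required threshold grows, in line with the trend described above.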
\({P}_{D}\) is frequently the most important factor in determining the performance of our work. In the first case, the probability of false alarm is 10%, and four RUT sizes were chosen to monitor the resulting detection probability. For all four sizes, the training region is held constant at TR = 10*10. With the region under test set to RUT = 1*1, 2*2, 3*3 and 4*4 pixels, the CFAR algorithm produced detection probabilities of 97.4%, 97.2%, 96.5% and 95.9%, respectively. It is concluded that a smaller array of damaged pixels yields a higher detection probability, because the accuracy of the pixels is greater, as shown in Fig. 3.
In the next case, the probability of false alarm is again 10%, and four sizes of the training region TR were chosen to monitor the resulting detection probability. For all four sizes, the RUT is held constant at 1*1. With the training region set to TR = 10*10, 11*11, 12*12 and 13*13 pixels, the CFAR algorithm produced detection probabilities of 97.9%, 96.9%, 96.4% and 95.9%, respectively. It is concluded that enlarging the array of healthy pixels decreases the probability of detection, as shown in Fig. 4.
In the following case, we set the region under test to \(RUT=2\text{*}2\) and the training region to \(TR=10\text{*}10\), and we mix pixels of damage (PoD) and pixels of health (PoH) in the region under test. We conclude that the greater the number of damaged pixels in the region under test, the better the performance, with the highest probability of detection (97.2%) reached for a probability of false alarm of 10%, as shown in Fig. 5.
Table 2 compares the results of the proposed algorithm (last row) with those of previously described methods in terms of detection probability. In [13], an artificial neural network (ANN) is constructed to determine whether a patient has breast cancer. The method of Saritas performs well in terms of sensitivity (probability of detection) but poorly in terms of specificity (probability of false alarm). In [14], a hybrid computational intelligence model built on unsupervised learning methodologies, namely complex-valued neural networks (CVNN) and self-organizing maps (SOM), was developed for accurate breast cancer detection. The method of Shirazi produces relatively high levels of sensitivity and specificity, i.e. a high detection probability even when the probability of false alarm is low. Finally, in [15], which focuses primarily on transfer learning for breast cancer detection and proposes a modified VGG (MVGG), Khamparia achieves high sensitivity and, therefore, a good detection probability. However, the sensitivity is somewhat diminished at a low false-alarm probability. Table 2 shows that the proposed method compares favorably with the other diagnostic methods, obtaining a very high detection probability even when the probability of false alarm is at its lowest (97.4% at \({P}_{fa}=0.1\)).
Table 2 Results of the proposed method compared with other works in terms of detection probability
| Method | \({P}_{fa}\) | 0.1 | 0.3 | 0.5 | 0.7 | 0.9 |
|---|---|---|---|---|---|---|
| Artificial Neural Networks (ANN) [13] | \({P}_{D}\) | 75% | 92% | 96% | 98% | 100% |
| SOM-CVNN [14] | \({P}_{D}\) | 90% | 92% | 95% | 97% | 99% |
| Hybrid Transfer Learning (Modified VGG 16) [15] | \({P}_{D}\) | 82% | 93% | 96% | 98.5% | 99.5% |
| Proposed method (MVMD-CFAR) | \({P}_{D}\) | 97.4% | 98.3% | 98.8% | 99% | 99.4% |