3.1 Basic principle
Dilation and erosion algorithm: Dilation and erosion act on an image, or on a region A of the target image, by a morphological convolution of A with a predefined kernel B (the structuring element). The kernel B can be of any size or shape and has a specially defined anchor point; in most cases B is a small square or a small solid disk.
For dilation, A and B are treated as the sets of all pixels with value 1. The reflection of B, written B̂, is defined as:
$$\widehat{B}=\{x \mid x=-b,\ b\in B\}$$
1
where $(\widehat{B})_{x}$ denotes the translation of the reflected set $\widehat{B}$ by $x$; the translation of a set $M$ by $x$ is defined as:
$$(M)_{x}=\{y \mid y=a+x,\ a\in M\}$$
2
The erosion of A by B is written $A\ominus B$ and is defined as:
$$A\ominus B=\left\{x \mid (B)_{x}\subseteq A\right\}$$
3
The above formula shows that the erosion of A by B yields the set of all displacements x for which the set B, shifted by x, remains entirely within the domain A.
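For concreteness, the following is a minimal sketch of Eqs. (1)–(3) using SciPy's binary morphology (an assumed tool choice, not one stated here); the toy image A and the 3×3 square kernel B are illustrative values.

```python
# A minimal sketch of Eqs. (1)-(3) using SciPy's binary morphology (an assumed
# tool choice); the toy image A and the 3x3 square kernel B are illustrative.
import numpy as np
from scipy import ndimage

# Binary image A: the pixels with value 1 (True) form the target region.
A = np.zeros((7, 7), dtype=bool)
A[2:5, 2:6] = True

# Structuring element B: a small solid square with its anchor at the centre.
B = np.ones((3, 3), dtype=bool)

# Erosion A ⊖ B (Eq. 3): keep only the displacements x at which B, translated
# by x, fits entirely inside A.
eroded = ndimage.binary_erosion(A, structure=B)

# Dilation of A by B: positions where the reflection of B (Eq. 1), translated
# by x (Eq. 2), overlaps A; for this symmetric kernel the reflection equals B.
dilated = ndimage.binary_dilation(A, structure=B)

print(eroded.astype(int))
print(dilated.astype(int))
```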
There are many methods for selecting the threshold. Whichever method is used, the original target image f(x, y) is segmented by a threshold T, and the result is an equivalent alternative representation of the original image f(x, y), which can be realized by the occupancy matrix method.
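As an illustration, the sketch below applies a single global threshold T to a stand-in grayscale image; since no particular selection method is fixed here, Otsu's method (via scikit-image) is used only as one common option.

```python
# A minimal global-threshold sketch, assuming a grayscale image f(x, y) as a
# NumPy array. The text does not fix a method for choosing T, so Otsu's method
# (scikit-image) is shown here only as one common option.
import numpy as np
from skimage.filters import threshold_otsu

rng = np.random.default_rng(0)
f = rng.integers(0, 256, size=(64, 64)).astype(np.uint8)  # stand-in image

T = threshold_otsu(f)                 # one of many ways to select T
binary = (f > T).astype(np.uint8)     # 1 = target region, 0 = background
print(T, int(binary.sum()))
```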
Geodesic active contour model:
$$E(C(p))=\alpha\int_{0}^{1}\left|C_{p}(p)\right|\,dp+\beta\int_{0}^{1}\left|C_{pp}(p)\right|^{2}\,dp-\lambda\int_{0}^{1}\left|\nabla I[C(p)]\right|\,dp$$
4
The first and second terms in Eq. (4) are the integrals of arc length and curve curvature, respectively, and form the internal energy; the third term is the external energy, which makes the curve stop at the boundary. The energy can finally be rewritten as follows:
$$E(C(p))=\alpha\int_{0}^{1}\left|C_{p}(p)\right|\,dp+\lambda\int_{0}^{1} g\left(\left|\nabla I[C(p)]\right|\right)\,dp$$
5
The function g in the formula is usually simply defined as:
$$g(r)=\frac{1}{1+r^{2}}$$
6
The geodesic length of the curve in the Riemannian space induced by the image is:
$$L_{R}(C)=\int_{0}^{1} g\left(\left|\nabla I[C(p)]\right|\right)\left|C_{p}(p)\right|\,dp$$
7
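The sketch below evaluates the edge-stopping function g of Eq. (6) on the gradient magnitude |∇I|, as it appears in Eqs. (5) and (7); the synthetic image and the Gaussian smoothing scale are illustrative assumptions.

```python
# A minimal sketch of the edge-stopping function g of Eq. (6) evaluated on the
# gradient magnitude |∇I|, as it appears in Eqs. (5) and (7). The synthetic
# image and the Gaussian smoothing scale sigma are illustrative assumptions.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
I = rng.random((64, 64))                    # stand-in for a grayscale image

# |∇I| of a lightly smoothed image.
grad_mag = ndimage.gaussian_gradient_magnitude(I, sigma=1.0)

# g(r) = 1 / (1 + r^2): near 0 on strong edges, near 1 in flat regions, so the
# evolving contour slows down and stops on object boundaries.
g = 1.0 / (1.0 + grad_mag ** 2)
print(g.min(), g.max())
```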
3.2 Segmentation algorithm
The image pixels are regarded as vertices, weights are assigned to the edges between vertices to construct an objective function, and the objective function is optimized according to a graph-theoretic segmentation criterion to achieve image segmentation. Suppose the image is divided into two disjoint sets G1 and G2 satisfying G1 ∪ G2 = V and G1 ∩ G2 = ∅. The set of edges connecting G1 and G2 is called the cut set of the graph, and the sum of their weights is called the cut. In graph theory, partitioning the graph G(V, E) into the two parts G1 and G2 is evaluated by the following objective function:
$$\mathrm{Cut}\left({G}_{1},{G}_{2}\right)=\sum_{u\in {G}_{1},\,v\in {G}_{2}} w(u,v)$$
8
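A minimal numerical illustration of Eq. (8) follows; the 4×4 symmetric weight matrix and the chosen vertex partition are toy values, not data from this work.

```python
# A minimal numerical illustration of Eq. (8): the 4x4 symmetric weight matrix
# W and the vertex partition are toy values, not data from the paper.
import numpy as np

W = np.array([[0, 5, 1, 0],
              [5, 0, 1, 0],
              [1, 1, 0, 6],
              [0, 0, 6, 0]], dtype=float)

G1 = [0, 1]   # one side of the partition
G2 = [2, 3]   # the other side

# cut(G1, G2): sum of the weights of all edges crossing the partition.
cut = W[np.ix_(G1, G2)].sum()
print(cut)    # 1 + 0 + 1 + 0 = 2
```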
The optimal partition of the graph is the one that minimizes this objective function.
However, minimizing the cut considers only the relationship between the two subsets G1 and G2 and ignores the relationships among the vertices within each subgraph, as shown in Fig. 1:
It can be seen from Fig. 1 that the ideal partition is the dotted line in the middle of the figure, whereas the minimum-cut criterion instead selects minimum cut 1 and minimum cut 2, which split off only small sets of outlying vertices; such a segmentation is meaningless.
To overcome this defect of the minimum cut, the algorithm was improved and the normalized cut criterion was defined as follows:
$$\mathrm{Ncut}\left({G}_{1},{G}_{2}\right)=\frac{\mathrm{cut}\left({G}_{1},{G}_{2}\right)}{\mathrm{assoc}\left({G}_{1},V\right)}+\frac{\mathrm{cut}\left({G}_{1},{G}_{2}\right)}{\mathrm{assoc}\left({G}_{2},V\right)}$$
9
The value of Ncut (G1, G2) represents the similarity between classes. The smaller the value, the better the segmentation result.
Using the relation cut(G1, G2) = assoc(G1, V) − assoc(G1, G1) (and similarly for G2), the inter-class criterion can be rewritten as the following equation.
$$\begin{array}{rl}\mathrm{Ncut}\left({G}_{1},{G}_{2}\right)&=\frac{\mathrm{cut}\left({G}_{1},{G}_{2}\right)}{\mathrm{assoc}\left({G}_{1},V\right)}+\frac{\mathrm{cut}\left({G}_{1},{G}_{2}\right)}{\mathrm{assoc}\left({G}_{2},V\right)}\\ &=\frac{\mathrm{assoc}\left({G}_{1},V\right)-\mathrm{assoc}\left({G}_{1},{G}_{1}\right)}{\mathrm{assoc}\left({G}_{1},V\right)}+\frac{\mathrm{assoc}\left({G}_{2},V\right)-\mathrm{assoc}\left({G}_{2},{G}_{2}\right)}{\mathrm{assoc}\left({G}_{2},V\right)}\\ &=2-\left(\frac{\mathrm{assoc}\left({G}_{1},{G}_{1}\right)}{\mathrm{assoc}\left({G}_{1},V\right)}+\frac{\mathrm{assoc}\left({G}_{2},{G}_{2}\right)}{\mathrm{assoc}\left({G}_{2},V\right)}\right)\\ &=2-\mathrm{Nassoc}\left({G}_{1},{G}_{2}\right)\end{array}$$
10
It can be seen that the normalized cut criterion simultaneously minimizes the inter-class similarity and maximizes the intra-class similarity. It is a globally optimized criterion: the segmentation result is not biased toward outlying points or very small regions, and the resulting regions are more compact. However, its edge adherence is poor, and as the number of vertices in the graph increases, the solution time grows.
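The following sketch evaluates Eqs. (9)–(10) on the same kind of toy weight matrix and confirms the identity Ncut = 2 − Nassoc; the matrix and the partition are illustrative values.

```python
# A minimal check of Eqs. (9)-(10) on the same kind of toy weight matrix,
# confirming Ncut = 2 - Nassoc; matrix and partition are illustrative values.
import numpy as np

W = np.array([[0, 5, 1, 0],
              [5, 0, 1, 0],
              [1, 1, 0, 6],
              [0, 0, 6, 0]], dtype=float)
V = list(range(W.shape[0]))
G1, G2 = [0, 1], [2, 3]

def assoc(A, B):
    """Total weight of edges from vertices in A to vertices in B."""
    return W[np.ix_(A, B)].sum()

cut = assoc(G1, G2)
ncut = cut / assoc(G1, V) + cut / assoc(G2, V)
nassoc = assoc(G1, G1) / assoc(G1, V) + assoc(G2, G2) / assoc(G2, V)

print(ncut, 2 - nassoc)   # both print 0.3095..., as Eq. (10) predicts
```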
The graph cutting theory is shown in Fig. 2.
The problem of image segmentation can thus be viewed as a binary pixel-labeling problem: pixels belonging to the foreground are labeled 1, and pixels belonging to the background are labeled 0.
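As a rough illustration of this s–t labeling view, the sketch below builds a tiny graph with terminal links and neighbour links and labels pixels by a max-flow/min-cut (via networkx); the one-dimensional "image", the link weights, and the smoothness parameter are illustrative assumptions, not the energy model used here.

```python
# A rough sketch of the s-t graph-cut labelling illustrated in Fig. 2, using
# networkx max-flow/min-cut. The 1-D "image", the terminal-link weights and
# the smoothness weight lam are illustrative assumptions, not the paper's model.
import networkx as nx

pixels = [0.9, 0.8, 0.2, 0.1]   # toy intensities (bright = likely foreground)
lam = 0.5                       # smoothness weight between neighbouring pixels

G = nx.DiGraph()
for i, p in enumerate(pixels):
    G.add_edge('s', i, capacity=p)        # cut if pixel i is labelled background
    G.add_edge(i, 't', capacity=1.0 - p)  # cut if pixel i is labelled foreground
for i in range(len(pixels) - 1):          # n-links: penalty for differing labels
    G.add_edge(i, i + 1, capacity=lam)
    G.add_edge(i + 1, i, capacity=lam)

cut_value, (S, T) = nx.minimum_cut(G, 's', 't')
labels = [1 if i in S else 0 for i in range(len(pixels))]
print(cut_value, labels)   # the two brighter pixels end up labelled 1 (foreground)
```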
3.3 Image segmentation evaluation results
In semantic image segmentation, the classification accuracy measures the segmentation performance for each category. It is calculated as follows:
$$\mathrm{CA}_{i}=\frac{\left|P_{i}\cap \mathrm{GT}_{i}\right|}{\left|\mathrm{GT}_{i}\right|}$$
11
where Pi is the set of pixels assigned to the i-th category in the segmentation of the test set, and GTi is the set of pixels labeled as the i-th category in the test-set ground truth.
The average classification accuracy is the mean of the classification accuracies over all categories in the dataset, and is calculated as follows:
$$\mathrm{CAA}=\frac{1}{n}\sum_{i=1}^{n}\mathrm{CA}_{i}$$
12
where n is the number of categories in the dataset, i denotes the i-th category, and CAi is the classification accuracy of the i-th category.
The global accuracy is the standard for evaluating pixel-level image segmentation accuracy. It is defined as the ratio between the summed intersections of the predicted pixel sets Pi with the ground-truth sets GTi and the total number of labeled pixels in the test set, as follows:
$$\mathrm{GA}=\frac{\sum_{i=1}^{n}\left|P_{i}\cap \mathrm{GT}_{i}\right|}{\sum_{i=1}^{n}\left|\mathrm{GT}_{i}\right|}$$
13
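A minimal sketch of Eqs. (11)–(13) follows, computing CAi, CAA, and GA from a toy predicted label map and ground truth; the arrays are illustrative values only.

```python
# A minimal sketch of Eqs. (11)-(13); the tiny predicted / ground-truth
# label maps are illustrative values.
import numpy as np

pred = np.array([[0, 0, 1, 1],
                 [0, 1, 1, 2],
                 [2, 2, 2, 2]])
gt = np.array([[0, 0, 1, 1],
               [0, 0, 1, 2],
               [2, 2, 1, 2]])

classes = np.unique(gt)

# Per-class accuracy CA_i = |P_i ∩ GT_i| / |GT_i|  (Eq. 11)
ca = [np.logical_and(pred == i, gt == i).sum() / (gt == i).sum()
      for i in classes]

# Average classification accuracy CAA (Eq. 12)
caa = np.mean(ca)

# Global accuracy GA (Eq. 13)
inter_total = sum(np.logical_and(pred == i, gt == i).sum() for i in classes)
gt_total = sum((gt == i).sum() for i in classes)
ga = inter_total / gt_total

print(ca, caa, ga)
```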
Figure 3 Changes in the accuracy and loss of the training model (left: model accuracy; right: model loss)