Internal Defects Detection Method of the Railway Track Based on Generalization Features Cluster

Many internal defects may arise in a railway track during service, and they usually differ in shape and distribution. To address this problem, this paper proposes an intelligent detection method for internal defects of railway track based on generalization features clusters. Firstly, defects are classified and counted according to their shape and location features. Then, generalized features of defects are extracted and formulated based on the maximum difference between different types of defects and the maximum tolerance among defects of the same type. Finally, the extracted generalized features are expressed as function constraints and combined into generalization features clusters to classify and identify internal defects of the railway track. Furthermore, a dimension reduction method for the generalization features clusters is presented. Based on the reduced-dimension, strongly constrained generalized features, the K-means clustering algorithm is applied to cluster defects, and good clustering results are achieved. For defects in the rail head region, the clustering accuracy is over 95% and the Davies-Bouldin Index (DBI) is small, which supports the validity of the proposed strongly constrained generalization features. Experimental results show that the accuracy of the proposed method is up to 97.55% and the average detection time is 0.12 s/frame, which indicates good adaptability, accuracy and detection speed in complex working environments.


Introduction
The railway transit system is an essential infrastructure for cargo and passenger transport [1]. Track quality directly affects the operational safety of railway traffic [2]. With the growing demand for higher traffic density and running speed, rail wear is also increasing, which raises the probability of internal rail defects and makes the defects more varied and complex [3][4]. Therefore, both the speed and the accuracy of internal defect detection need to be improved.
Presently, most rail defect detection work focuses on surface defects. For example, Yu et al. proposed a coarse-to-fine model (CTFM) to identify defects at different scales: sub-image level, region level, and pixel level [5]. Dubey [6] used a maximally stable extremal region technique to identify and visualize the geometrical features of defect regions on the rail head surface in railway track images. Li presented an intelligent vision detection system (VDS) for discrete surface defects, which focuses on two critical issues of VDS: image enhancement and automatic thresholding [7]. Furthermore, Zhang et al. proposed an automatic railway visual detection system (RVDS) for surface defects and presented an algorithm for detecting and extracting the region of interest, which enables identification and segmentation of defects from the rail surface [8]. Compared with internal defects, rail surface defect images are easier to acquire with a high-speed camera, so their imaging standard is consistent. Internal rail defects cannot be observed directly, but they can be detected using ultrasonic testing, eddy current testing and magnetic flux leakage testing [9]. Ultrasonic testing is widely used in rail internal defect detection because of its high penetration, excellent directivity and high sensitivity [10]. However, because ultrasonic acquisition equipment differs, it is difficult to establish a unified imaging standard for rail ultrasonic B-scan images. The feature expression of a defect is nevertheless universal and can be described by generalized features. A B-scan image is formed from the echoes received by the ultrasonic probe and displays a cross-section of the rail. Few previous works address internal defect detection of railway track based on ultrasonic images. Cygan [11] analyzed the advantages of B-scan image processing compared with A-scan signal analysis in rail internal defect detection.
A-scan analysis evaluates the size and position of defects from the amplitude and position of defect waves, but it cannot determine the defect geometry directly. Huang [18] and Sun [29] used a neural network model and deep learning, respectively, to analyze rail B-scan images and detect internal defects. Liang [12] put forward an improved imaging algorithm for rail defect identification, which helps to acquire high-quality B-scan images and to design high-accuracy detection algorithms for rail internal defect images. However, high-precision detection of different types of defects remains a challenge. Therefore, it is essential to develop an accurate and high-speed detection algorithm for rail internal defects expressed in ultrasonic B-scan images.
The main types of rail internal defects include: (1) fatigue crack of the screw hole, (2) rail head defect, (3) crushing flaw of rail bottom, (4) transverse crack of rail bottom, and (5) material degradation of special parts [13]. A fatigue crack of the screw hole is a crack located at any of several positions around the screw hole. The internal defect of the rail head is mainly the flaw, which is called a black flaw or a white flaw according to the color it shows in the real environment. The crushing flaw of rail bottom includes the transverse crack and the longitudinal crack of rail bottom. To prevent such internal defects from affecting traffic safety, nondestructive testing is carried out regularly to monitor track health.
X-ray imaging can also be used to detect internal defects. For example, Cai et al. [14] proposed an X-ray image defect detection algorithm for castings based on Mask R-CNN, which provides a solution for intelligent industrial defect detection. Guo et al. [15] proposed a welding defect detection method based on the Fast R-CNN model with X-ray images, and it achieved the expected experimental results. However, the mainstream method for detecting internal defects in railway track is still the analysis of ultrasonic images [16]. The ultrasonic rail detection systems developed by SPERRY and TOKIMEC can recognize and classify defects in real time. RTI has developed an automatic defect identification system based on a neural network model, which has a learning function [17]. The Chinese Academy of Railway Sciences has developed a B-scan image rail defect classification system based on pattern recognition, whose recognition rate is about 95% [18]. However, the accuracy of existing ultrasonic detection systems is still not enough to meet actual detection requirements.
To solve the problem of low detection accuracy, this paper proposes an internal defect detection method for railway track based on generalization features clusters. Firstly, defects are classified and counted by analyzing their location and geometric features. Then, the generalization features of defects are extracted according to the maximum difference between different types of defects and the maximum tolerance among defects of the same type. Finally, generalization features clusters for the various types of defects are established to classify the internal defects of railway track. On this basis, strongly constrained generalization features clusters are formulated after dimension reduction, and the K-means clustering algorithm is applied to cluster defects. Experimental results show that the proposed method detects internal defects with high accuracy and speed.
The rest of this paper is organized as follows. Firstly, the principle of ultrasonic image detection and defects' classification is analyzed in Section 2. Secondly, Section 3 presents generalization features clusters of different types of defects and K-means clustering algorithm based on strong constrained generalization features. Thirdly, the experimental results and analysis of the proposed method are shown in Section 4. Finally, some conclusions are given in Section 5.

The Principle of Internal Ultrasound Image Detection of Railway Track
Ultrasound has the advantages of fast propagation and broad applicability [19] in nondestructive testing. Based on the propagation features of ultrasonic waves in different media, the probe emits a sound wave of a certain frequency into the rail. If a rail defect is present, a defect wave appears in front of the bottom wave, and the peak value of the bottom wave decreases or disappears. The size and position of the defect can then be evaluated from the reflected signal. The ultrasonic receiver amplifies, filters and level-converts the echo signal so that the electrical signal can be drawn and displayed as a digital image, which is called a B-scan image. Therefore, acquiring a rail B-scan image amounts to collecting rail defect data.
The acquisition mechanism of the B-scan image of the rail track is shown in Figure 1. A coupling liquid, usually water, is sprayed on the contact surface between the probe and the rail to prevent attenuation of the ultrasonic signal energy. The shape, position and depth of defects are detected from the propagation, reflection and refraction of ultrasound in the railway track [20]. Usually, an ultrasonic pulse propagates inside the track to detect internal defects. If it encounters cracks or flaws with a different acoustic impedance, it produces a primary reflection and a secondary reflection [21]. The defect shape in the railway track can be imaged and displayed [22] by analyzing the magnitude, quantity and waveform of the reflected wave. Ultrasonic probes with different angles detect defects in different locations of the railway track, and these defects are distinguished by different colors. As shown in Figure 2, the 0° probe generates ultrasonic longitudinal wave beams, which are used to detect horizontal cracks of screw holes, shown in red in the B-scan image. The 70° probe generates ultrasonic shear wave beams and detects rail head flaws by primary or secondary waves; rail head flaws are shown in red, green and blue in the B-scan image. The 37° probe generates ultrasonic shear wave beams and detects other types of defects. The rail B-scan image is formed from the reflection echoes of the 0°, 37° and 70° probes [23]. Figure 3 shows a color B-scan image of a normal railway track and an actual B-scan image containing defects.

Classification of Railway Track Internal Defects
A color B-scan image provides the projected sectional features of defects with respect to the normally incident wave. It also shows the horizontal location and depth of the internal defects in the railway track. As shown in Table 1, 12 types of railway track internal defects can be detected by ultrasound. For example, a type 1 defect is the inside flaw in the rail head. It is detected by the 70° probe and shown in red in the ultrasonic B-scan image. To make these types of defects clear, some defects are illustrated in Figure 4. In Figure 4, the inside of the rail head and the inside of the screw hole are both near the center of the railway. Therefore, defects of types 1, 2, 3 and 4 are shown clearly in Figure 4, and the other types of defects can be shown in a similar way. However, inverted cracks can only be found on one side of the two screw holes nearest the rail end face. Furthermore, defects are classified according to the ultrasonic imaging mode, defect location, defect cause, defect features, and the traditional classification method for railway track internal defects [24].
The acquired ultrasonic image may differ from Figure 3(b), because the acquisition process may cause random clutter, image coverage, image fracture and similar artifacts. In Figure 5, all types of defects listed in Table 1 are labeled. Various types of clutter appear in the black circle region, and the joint is in the yellow triangle region. Rail head flaws differ in length and appear in pairs or separately. The lower crack of the screw hole is connected to its screw hole. The image contours of rail bottom defects differ in thickness and length. Furthermore, defects appear as paired or single image contours that vary in shape, which makes it difficult to describe defect features with precise models. Therefore, this paper proposes a detection method based on the generalization features cluster to solve this problem.

Defects Feature Analysis and Detection
Many types of defects with different patterns arise in a railway track during service. Building a precise defect feature model from existing samples to detect such defect images often leads to over-constraint or under-constraint [25]. This paper addresses the problem as follows. Firstly, the existing samples are analyzed to extract features for each type of defect. Secondly, the extracted features are generalized to increase their fault tolerance, so that the generalized features capture the common characteristics of defects of the same type while suppressing, as far as possible, differences such as position and shape. Then, generalization features clusters are built according to the correlation and non-correlation among defects. Lastly, the generalization features clusters are used to detect defects in the B-scan image of the railway track. In a conventional defect feature model, even within the same type of defect, the specific characteristics of one defect may fail to detect another because their locations and shapes differ. Specifically, there may be too few feature constraints to identify defects, or so many feature constraints that defects are misjudged. In the proposed method, by contrast, generalization features clusters express the common features of defects of the same type and the non-correlated characteristics between different types of defects, which avoids the problem above, shortens the detection time and improves the detection accuracy. In practice, because of differences in acquisition equipment, working environment and other factors, B-scan images usually differ in shape, contour, position and so on. It is difficult to describe defects accurately with traditional features or a single feature, because such features generalize poorly and have limited applicability. Therefore, multiple generalization features are combined to build a generalization features cluster in this paper, which achieves better defect detection and widens the applicable range of the defect detection model.

Ultrasonic Image Preprocessing
Image noise affects detection accuracy and stability, so it must be eliminated before defect features are extracted. The image preprocessing flow chart is shown in Figure 6. Firstly, the image is separated into color channels according to the color differences between defects. Secondly, a morphological method is used to remove image noise introduced during image acquisition by human or environmental factors. Specifically, binary B-scan images are dilated and then eroded, and the morphological operator is set according to the contour direction. Figure 7 shows two morphological operators used for processing the lower crack of the screw hole. Then, the ultrasonic image is standardized by the skeleton extraction algorithm [26] to eliminate the differences in contour thickness caused by the sensitivity of the ultrasonic probe. As shown in Figure 8(a), the image acquired by the train ultrasonic detector is shown in red, green and blue, and each color represents the defects detected by a probe with a different angle. Channel separation splits the image into red, green and blue images. Denoising each single-channel image eliminates the influence of small clutter on the subsequent defect evaluation. To improve the detection accuracy, each single-channel image is then thinned to obtain a standardized monochrome ultrasonic image. Finally, the defect is located and detected by the proposed algorithm.
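The first two preprocessing steps described above can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the implementation used in the paper: the function names, the intensity threshold of 128 and the horizontal three-pixel structuring element `HORIZONTAL` are all hypothetical choices made for the example, and skeleton extraction [26] is omitted.

```python
def split_channels(image, threshold=128):
    """Split an RGB image (rows of (r, g, b) tuples) into three binary masks.

    The threshold is an assumed value, not one taken from the paper.
    """
    names = ("red", "green", "blue")
    return {
        name: [[1 if pixel[i] >= threshold else 0 for pixel in row] for row in image]
        for i, name in enumerate(names)
    }


def dilate(mask, offsets):
    """Set a pixel if any pixel under the structuring element is set."""
    h, w = len(mask), len(mask[0])
    return [[int(any(0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                     for dy, dx in offsets))
             for x in range(w)] for y in range(h)]


def erode(mask, offsets):
    """Keep a pixel only if every in-image pixel under the element is set."""
    h, w = len(mask), len(mask[0])
    return [[int(all(not (0 <= y + dy < h and 0 <= x + dx < w) or mask[y + dy][x + dx]
                     for dy, dx in offsets))
             for x in range(w)] for y in range(h)]


def close_binary(mask, offsets):
    """Dilation followed by erosion, the order given in the text."""
    return erode(dilate(mask, offsets), offsets)


# Assumed structuring element: three pixels along the row direction.
HORIZONTAL = [(0, -1), (0, 0), (0, 1)]

broken_contour = [[0, 0, 1, 1, 0, 1, 1, 0, 0]]
repaired = close_binary(broken_contour, HORIZONTAL)
```

In this toy row the one-pixel break in the contour is bridged while the background on both sides is left unchanged. A structuring element oriented along a different contour direction would be passed in the same way.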

Model analysis of generalization features cluster
A generalization features cluster is defined as the joint evaluation of feature comparison, color recognition and Euclidean distance measurement. Its model can be expressed as follows.

$$
\begin{cases}
T_i^{\min} \le f(x_i) \le T_i^{\max} \\
C_i = \{R_i, G_i, B_i\} \\
D_i^{\min} \le \sqrt{(u_i - u_i^{0})^2 + (v_i - v_i^{0})^2} \le D_i^{\max}
\end{cases}
\qquad (1)
$$

Here, $x_i$ is a feature of a type-$i$ defect, such as its position region, area, length or slope, and $f(x_i)$ is the corresponding feature function. $T_i^{\min}$ and $T_i^{\max}$ are the lower and upper thresholds of $f(x_i)$, respectively, and their values differ between defect types.
$C_i$ is the color composition of a type-$i$ defect, and $R_i$, $G_i$ and $B_i$ represent its red, green and blue color components, respectively. Not all kinds of defects have all three channel colors. $(u_i, v_i)$ and $(u_i^{0}, v_i^{0})$ are the centroid coordinates of the type-$i$ defect and of its corresponding reference, respectively. $D_i^{\min}$ and $D_i^{\max}$ are the lower and upper limits of the distance threshold, whose values also differ between defect types.
Therefore, a defect is classified as a type-$i$ defect if it satisfies Eq. (1). In this way, every defect is evaluated by the same method.
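The three kinds of test combined in Eq. (1), namely thresholded feature functions, color composition and a bounded centroid distance, can be sketched as a single predicate. The class `ClusterRule`, the function `matches` and every numeric value below are hypothetical and serve only to illustrate the structure of the evaluation.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusterRule:
    """One generalization features cluster (all fields are illustrative)."""
    feature_bounds: dict    # feature name -> (lower threshold, upper threshold)
    colors: frozenset       # color channels allowed for this defect type
    distance_bounds: tuple  # (lower, upper) limits on the centroid distance


def matches(rule, features, color, centroid, reference):
    """Return True if a candidate contour satisfies every constraint of the rule."""
    # Feature comparison: each feature value must lie inside its interval.
    for name, (low, high) in rule.feature_bounds.items():
        if not low <= features[name] <= high:
            return False
    # Color recognition: the contour color must belong to the composition.
    if color not in rule.colors:
        return False
    # Euclidean distance between the contour centroid and its reference.
    distance = math.dist(centroid, reference)
    low, high = rule.distance_bounds
    return low <= distance <= high


# Placeholder thresholds, not values from the paper.
example_rule = ClusterRule(
    feature_bounds={"area": (0, 50)},
    colors=frozenset({"red"}),
    distance_bounds=(10, 100),
)
is_match = matches(example_rule, {"area": 30}, "red", (0, 0), (30, 40))
```

Because every defect type is described by the same kind of rule, adding a new type under this sketch only requires a new set of thresholds rather than new detection logic.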

Generalization Features Clusters Analysis of Defects
The defects shown in Table 1 and Figure 5 are considered. Firstly, the auxiliary lines in the image are analyzed and located to divide the image into three parts: the rail head, the rail waist and the rail bottom. On this basis, the position region is used as the original feature for rough classification. Secondly, to identify the defect more accurately, features of the defect such as color, area, aspect ratio and location relationship are analyzed and extracted. Then, to further increase the tolerance of the extracted features to defects of the same kind, the extracted features are transformed into generalization features. Finally, a suitable generalization features cluster composed of different generalization features is used to detect the defect. The process of building generalization features clusters for the rail head, rail waist and rail bottom is explained as follows.

Rail head
As shown in Figure 9, defects in rail head region include joints, inner flaw, outer flaw, middle flaw and clutter.

Figure 9 Ultrasound image in rail head
Five generalization features are selected and formulated to build the generalization features cluster: position region, defect color, defect area, defect height ratio, and the distance between defect and joint. The gap between two rail joints is much larger than a rail head flaw. Joints are detected by the 0°, 37° and 70° probes, so their image contours are red, green and blue, and the ratio of their length to the depth of the rail head is large. Rail head flaws are divided into inner, middle and outer flaws according to their positions, which are distinguished by red, green and blue, respectively. In addition, the area of a rail head flaw is smaller than the joint area, and the ratio of its length to the rail head height is small. After image preprocessing, a small number of larger clutters are still recognized as defects by the generalization features clusters. Such larger clutter is usually caused by the separation of the ultrasonic probe from the track surface, which leads to a longer oblique line in the rail head region. These generalization features are expressed by Eq. (2) for the joint, Eq. (3) for the inner flaw of rail head, Eq. (4) for the middle flaw of rail head and Eq. (5) for the outer flaw of rail head. The quantities used in Eq. (2)-(5) are an area constant related to the image resolution, the height of the circumscribed rectangle of the joint contour, the height of the rail head region, the ratio of the height of the circumscribed rectangle of the joint contour to the height of the rail head region, the depth ℎ of the rail head, a length proportional coefficient set from sample statistics and calculation, the width of the detected image, and a constant for the horizontal distance between the defect and the joint. The generalization features clusters expressed by Eq. (2)-(5) can be used to detect joints and the three kinds of rail head flaws.
In general, the generalized feature constraints of rail head flaws include the longitudinal position of the flaw in the rail head, its color composition, its area, the ratio of its length to the height of the rail head region, and the horizontal distance between the flaw and the joint. The longitudinal position constraint limits the vertical distribution of the flaw within the rail head region. As shown in Figure 10, the position of a rail head flaw fluctuates up and down, but it always stays in the rail head region. Therefore, the longitudinal position feature is a weak constraint but generalizes well. The color composition of a rail head flaw is an essential condition for distinguishing its type and gives a clear, directed constraint. The area of a flaw reflects the size of the defect. Although the area differs from defect to defect, it generally falls within a specific range. Based on sample statistics, its upper limit is defined by the area constant, whose value can be set flexibly. The ratio between the length of the rail head flaw and the height of the rail head region is an important criterion for distinguishing the joint from the flaw. It is a strong constraint with good generalization ability, but its threshold setting is critical. The horizontal distance constraint between the rail head flaw and the joint prevents the joint from being evaluated as a rail head flaw of the same color channel after channel separation. The generalized feature constraints of joints are similar to those of rail head flaws.
Figure 10 Longitudinal distribution of rail head flaws: (a) rail head flaws at approximately the same height; (b) rail head flaws at different heights
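The rail head constraints discussed in this subsection can be illustrated by a small decision function. It is only a sketch of how the height ratio, the joint distance, the area limit and the color might be combined. The function name `classify_rail_head`, the dictionary layout of a contour and all default thresholds are assumptions for illustration and are not values from the paper.

```python
def classify_rail_head(contour, head_height, joint_columns,
                       max_flaw_area=200, max_flaw_ratio=0.5, min_joint_gap=40):
    """Label one single-channel contour found in the rail head region.

    contour: dict with 'color', 'area', 'height' and 'x' (centroid column).
    All thresholds are placeholders chosen for the example.
    """
    # Strong constraint: a joint spans a large share of the rail head height.
    if contour["height"] / head_height > max_flaw_ratio:
        return "joint"
    # A contour close to a known joint is treated as part of that joint,
    # so the joint is not reported as a flaw of the same color channel.
    if any(abs(contour["x"] - jx) < min_joint_gap for jx in joint_columns):
        return "joint"
    # Flaws are bounded in area; anything larger is left as clutter here.
    if contour["area"] > max_flaw_area:
        return "clutter"
    # The color channel distinguishes the three kinds of rail head flaw.
    return {"red": "inner flaw", "green": "middle flaw", "blue": "outer flaw"}[contour["color"]]


label = classify_rail_head(
    {"color": "red", "area": 50, "height": 10, "x": 300},
    head_height=100,
    joint_columns=[100],
)
```

The order of the checks reflects the discussion above: the height ratio is tested first because it is described as the strongest criterion for separating joints from flaws.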

Rail waist
As shown in Figure 11, the ultrasound imaging of rail waist includes the screw hole, inner upper cracks of the screw hole, inner lower cracks of the screw hole, outer upper cracks of the screw hole, outer lower cracks of the screw hole, horizontal cracks of the screw hole, inner inverted cracks of the screw hole, outer inverted cracks of the screw hole and clutter.

Figure 11 Ultrasound image of the railway track waist
Seven generalization features are selected and formulated to build the generalization features cluster: position region, defect color, centroid coordinate, defect slope, defect length, ratio of defect area to defect length, and defect distance. Normal screw hole image contours are semi-circular images composed of red, green and blue line segments, and one of them is marked with an ellipse in Figure 11. The image contour of an upper crack of the screw hole is located on the left or right side of the normal screw hole contour, depending on the defect position. The image contours of the lower crack and the horizontal crack are both located below the normal screw hole contour. The image contour of an inverted crack is also located below the normal screw hole contour, with the opposite direction. Usually, the image contour of clutter is not a line segment, so it can be filtered out by a constraint on the ratio of contour area to contour length. For example, the image contours on the inside of the screw hole comprise four kinds: the inner upper crack, the inner lower crack and the inner inverted crack of the screw hole, and the clutter. The generalization features clusters of the image contours on the inside of the screw hole are expressed by Eq. (6) for the inner upper crack, Eq. (7) for the inner lower crack and Eq. (8) for the inner inverted crack of the screw hole. Eq. (6)-(8) use the centroid coordinates, the contour slope and the length of the screw hole, together with the centroid coordinates, color composition, contour slope, area and length of the inner upper crack, inner lower crack and inner inverted crack, respectively. In addition, they use two constants related to the ratios of defect length to normal screw hole length, a constant related to the ratio of defect area to defect length, the height ℎ of the rail waist region expressed as a number of pixels, and two constants related to the distance between image contour centroids, which correspond to the distance between the screw holes.
The combination of the generalization features clusters expressed by Eq. (6)-(8) can detect three kinds of screw hole cracks, i.e. the inner upper crack, the inner lower crack and the inner inverted crack of the screw hole. The outer upper crack, the outer lower crack, the outer inverted crack and the horizontal crack of the screw hole can be detected in the same way.
The generalized feature constraints of screw hole cracks can be summarized as follows: the centroid height, the color composition, the position relationship between the crack and its corresponding screw hole, the inclination relationship between the crack and its screw hole, the length ratio between the crack and its screw hole, the ratio of crack area to crack length, and the distance between the crack and its screw hole. Specifically, the centroid height of a screw hole crack reflects its longitudinal position relative to the screw hole. The color composition of the crack reflects the angle of the acquisition probe. The position relationship indicates whether the crack appears above or below the screw hole. The inclination relationship indicates whether the crack and its screw hole are imaged by the same color channel: if their inclination directions are the same, they are imaged by the same color channel, otherwise by different color channels. The ratio of the crack length to the corresponding screw hole length reflects the relative size of the crack. The ratio of crack area to crack length approximately reflects the crack width, which is affected by the sensitivity of the probe. The distance between the crack and its corresponding screw hole is a significant constraint for evaluating the crack, and its threshold can be obtained by statistical methods.
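The area-to-length ratio constraint mentioned above can be sketched as a simple clutter filter. After skeleton extraction a crack contour is close to a thin line, so its area divided by its length stays near the line width, while a blob-like clutter contour yields a much larger value. The function names and the default limit `max_width=2.0` are assumptions for illustration only.

```python
def mean_width(area, length):
    """Approximate contour width as area divided by length (both in pixels)."""
    return area / length if length > 0 else float("inf")


def drop_clutter(contours, max_width=2.0):
    """Keep only line-like contours; max_width is an assumed threshold."""
    return [c for c in contours if mean_width(c["area"], c["length"]) <= max_width]


candidates = [
    {"name": "crack", "area": 30, "length": 28},  # thin, line-like contour
    {"name": "blob", "area": 90, "length": 12},   # compact clutter
]
kept = drop_clutter(candidates)
```

In practice the limit would be set from sample statistics, in the same way as the other thresholds discussed in this section.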

Rail bottom
As shown in Figure 12, the defect of rail bottom contains longitudinal crack, transverse crack and clutter.

Figure 12 Ultrasound image of the rail bottom in railway track
Five generalization features are selected and formulated to build the generalization features cluster: position region, defect color, defect slope, defect length and defect distance. There are three types of image contours for transverse cracks: a single blue oblique segment, a single green oblique segment, and a blue-green pair of oblique segments. The longitudinal crack appears as a long, red, horizontal line segment. Transverse and longitudinal cracks of the rail bottom are represented by the generalization features clusters of Eq. (9) and Eq. (10), respectively. Eq. (9) and (10) use the centroid coordinates, the color composition and the length of the transverse crack and of the longitudinal crack, together with the centroid coordinates and the slopes of two adjacent rail bottom defects. If the distance between the centroids of the two adjacent defects is less than the set distance, the transverse crack of the rail bottom is composed of a pair of blue and green defect contours whose slope directions are opposite.
The slope of the longitudinal crack is also used, since the image contour of a longitudinal crack in the rail bottom is approximately horizontal. The remaining quantities are the number of rows of the ultrasonic image and the length threshold. The combination of the generalization features clusters expressed by Eq. (9)-(10) can be used to detect transverse and longitudinal cracks in the rail bottom.
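The pairing rule for transverse cracks and the horizontality rule for longitudinal cracks can be sketched as two predicates. The function names, the contour dictionary layout and the slope tolerance `max_abs_slope` are hypothetical, and the distance and length thresholds would have to come from sample statistics.

```python
import math


def is_paired_transverse(a, b, max_distance):
    """True if two adjacent contours form a blue-green transverse crack pair."""
    one_blue_one_green = {a["color"], b["color"]} == {"blue", "green"}
    opposite_slopes = a["slope"] * b["slope"] < 0
    close_enough = math.dist(a["centroid"], b["centroid"]) < max_distance
    return one_blue_one_green and opposite_slopes and close_enough


def is_longitudinal(contour, min_length, max_abs_slope=0.05):
    """True for a long, red, approximately horizontal contour.

    max_abs_slope is an assumed tolerance for "approximately horizontal".
    """
    return (contour["color"] == "red"
            and abs(contour["slope"]) <= max_abs_slope
            and contour["length"] >= min_length)


blue = {"color": "blue", "slope": 0.8, "centroid": (100, 400)}
green = {"color": "green", "slope": -0.7, "centroid": (112, 405)}
pair_found = is_paired_transverse(blue, green, max_distance=20)
```

A single blue or single green oblique segment fails the pairing test and would be handled by the single-contour constraints on color, slope and length.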

Feature Clustering and Dimension Reduction
In a generalization features cluster, the constraint strength of each feature is different: some features constrain the result strongly, while others constrain it weakly. In this paper, the constraint effect of each feature is analyzed and the features with a strong constraint effect are selected. On this basis, new constraint features are constructed to reduce the dimension and simplify defect identification. In the rail head region, the area of the contour's circumscribed rectangle and the ratio of the height of that rectangle to the height of the rail head region are both strong constraint generalization features for evaluating the rail head flaw and the joint. In the rail waist there are usually various kinds of screw hole cracks, which are evaluated by the position relationship between the screw hole and the defect. Therefore, it is difficult to constrain all of these defects uniformly with the reduced-dimension generalized features. However, a specific type of defect can still be identified by its reduced-dimension generalization features cluster. For example, the contour centroid height and the defect area are strong constraint generalization features for evaluating the horizontal crack of the screw hole. The centroid height and the length of the contour's circumscribed rectangle are strong constraint generalization features for evaluating the lower crack of the screw hole. The angle formed by the green and blue contours and the horizontal distance between the two centroids are strong constraint generalization features for evaluating the rail bottom defects.
In this paper, the K-means clustering method is also used to classify defects in different regions, as follows.
⑴ The number of defect types k is determined for each region of the rail, and the strong constraint generalization features of each kind of defect are extracted. The data set is then built from the parameter data of these features. The data must be standardized and normalized as

$$
z_{ij} = \frac{x_{ij} - \mu_i}{\sigma_i}, \qquad x'_{ij} = \frac{x_{ij} - x_i^{\min}}{x_i^{\max} - x_i^{\min}} \qquad (11)
$$

where $x_{ij}$ is the jth sample point of class i defect data, $\mu_i$ and $\sigma_i$ are the mean and standard deviation of class i defect data, and $x_i^{\max}$ and $x_i^{\min}$ are the maximum and minimum of class i defect data, respectively.
⑵ k clustering centers are selected randomly, and the initial thresholds of the related constraint features can serve as a reference. The Euclidean distance from each remaining data point to every clustering center is calculated, and each point is assigned to the type of its nearest center.
⑶ The centroid of all data points belonging to the same cluster is computed and taken as the new cluster center.
⑷ Steps ⑵ and ⑶ are repeated until the cluster centers no longer change.
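Steps ⑴ to ⑷ can be sketched in a few lines of pure Python. This is a minimal illustration, not the authors' implementation. It makes two simplifying assumptions that are not in the text: min-max normalization is applied per feature column, and the initial centers are passed in by the caller (for example, derived from the initial feature thresholds) so that the result is reproducible. The names `min_max_normalize` and `k_means` and the sample data are hypothetical.

```python
import math


def min_max_normalize(points):
    """Scale every feature column to [0, 1] (per-feature scaling is assumed)."""
    columns = list(zip(*points))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(p, lows, highs))
        for p in points
    ]


def k_means(points, initial_centers, max_iter=100):
    """Plain K-means; returns (labels, centers)."""
    centers = [tuple(c) for c in initial_centers]
    labels = []
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest center (Euclidean distance).
        labels = [min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
                  for p in points]
        # Step 3: move each center to the centroid of its members.
        new_centers = []
        for i, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(old)  # keep an empty cluster's center in place
        # Step 4: stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Hypothetical two-feature samples, e.g. (rectangle area, height ratio).
samples = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
scaled = min_max_normalize(samples)
labels, centers = k_means(scaled, [scaled[0], scaled[3]])
```

With two well-separated groups, as in this toy data, the assignment stabilizes after the first update. Random initial centers, as described in step ⑵, could be drawn with `random.sample(scaled, k)` at the cost of run-to-run variation.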
The clustering is evaluated with the Davies-Bouldin Index (DBI) [27,28]. The DBI is the mean, over all clusters, of each cluster's maximum similarity to any other cluster, so it reflects both the intra-cluster distance and the spacing between clusters. The smaller the value, the better the clustering effect. Assume that m data sequences are clustered into n clusters; the m sequences form the input matrix X, and n is passed to the algorithm as the number of clusters. The DBI can be expressed as
$$\mathrm{DBI} = \frac{1}{n}\sum_{i=1}^{n} \max_{j \neq i} \frac{S_i + S_j}{M_{ij}}$$
where $S_i$ is the average distance from the data of class i defect to the cluster center of its class. It can be expressed as
$$S_i = \left(\frac{1}{T_i}\sum_{j=1}^{T_i} \lVert X_j - A_i \rVert_p^{\,p}\right)^{1/p}$$
where $T_i$ is the number of data points in class i, $X_j$ is the jth data point of class i, $A_i$ is the cluster center of class i, and p = 2 gives the Euclidean distance for the two-dimensional data. The distance between the cluster i defect and the cluster j defect is defined as
$$M_{ij} = \lVert A_i - A_j \rVert_p$$
and the similarity between the two clusters is defined from these quantities as
$$R_{ij} = \frac{S_i + S_j}{M_{ij}}$$
Next, the maximum value of $R_{ij}$ is calculated for each cluster i,
$$D_i = \max_{j \neq i} R_{ij}$$
so that $D_i$ is the maximum similarity between cluster i and the other clusters. The DBI is obtained as the mean of the maximum similarity over all classes,
$$\mathrm{DBI} = \frac{1}{n}\sum_{i=1}^{n} D_i$$

The algorithm proposed in this paper is summarized in Figure 13. Firstly, the defects in the ultrasound images are classified according to the classification information in Table 1. Secondly, the defect images are preprocessed for image enhancement, including color channel separation, image denoising, image standardization and image contour location. On this basis, the generalization features of the various types of defects are analyzed. For example, in the rail head region, the influence of the various defect features on the detection effect is analyzed, and the five generalization features with the best detection effect are selected. These features are then formulated and expressed through theoretical analysis. Finally, the generalization features clusters of all kinds of defects are constructed from the selected generalization features and used for defect identification and judgement. In addition, based on the proposed generalization features, the K-means clustering algorithm is used to verify the validity of the strong-constraint generalization features obtained by dimension reduction and to identify and classify the defects.
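The DBI evaluation described above can be sketched in Python as follows. This is an illustrative sketch: the function name `davies_bouldin` is hypothetical, the within-cluster scatter is computed in its root-mean-power form with p = 2 (an assumption; some library implementations use the plain mean distance instead), and every cluster is assumed to contain at least one point.

```python
import numpy as np

def davies_bouldin(x, labels, centers, p=2):
    """Mean over clusters of the largest (S_i + S_j) / M_ij ratio."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    centers = np.asarray(centers, dtype=float)
    n = len(centers)

    # Within-cluster scatter of each cluster around its own center.
    scatter = np.empty(n)
    for i in range(n):
        members = x[labels == i]
        dist = np.linalg.norm(members - centers[i], ord=p, axis=1)
        scatter[i] = np.mean(dist ** p) ** (1.0 / p)

    # For each cluster, the worst-case similarity to any other cluster.
    worst = np.empty(n)
    for i in range(n):
        ratios = []
        for j in range(n):
            if j != i:
                separation = np.linalg.norm(centers[i] - centers[j], ord=p)
                ratios.append((scatter[i] + scatter[j]) / separation)
        worst[i] = max(ratios)

    return float(worst.mean())
```

Compact, well-separated clusters give small scatter and large separation, so a smaller returned value indicates a better clustering.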

Figure 13
The flow chart of the proposed algorithm

Proposed Method
To make the proposed algorithm clear, the detection of rail head flaws is taken as an example; the other types of defects are detected in the same way. Firstly, the image is preprocessed to enhance it and remove noise. The detailed steps are color channel separation, image denoising, image standardization and image contour location. Secondly, the generalized features of rail head flaws are formulated and expressed, including the position region, defect color, defect area, defect height ratio, and the distance between the defect and the joint. Finally, these generalization features are used to construct a generalization features cluster for identifying and judging rail head flaws. In addition, after all the generalization features of the flaws have been extracted, the strong-constraint generalization features are obtained through dimension reduction, and the K-means clustering algorithm is applied to identify and classify the rail head flaws.
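To illustrate how a generalization features cluster can act as a set of function constraints on the five rail head features, a minimal Python sketch is given below. The field names, the color set and every threshold are hypothetical placeholders chosen for the example; they are not the values used in this paper.

```python
from typing import NamedTuple

class Contour(NamedTuple):
    """Features measured on one located contour (names are illustrative)."""
    region: str            # "head", "waist" or "bottom"
    color: str             # color channel in which the echo appears
    area: float            # area of the circumscribed rectangle, in pixels
    height_ratio: float    # rectangle height / rail head region height
    joint_distance: float  # horizontal distance to the nearest joint, in pixels

# One constraint per generalization feature; all limits are hypothetical.
HEAD_FLAW_CONSTRAINTS = (
    lambda c: c.region == "head",
    lambda c: c.color in {"red", "green", "blue"},
    lambda c: 30.0 <= c.area <= 600.0,
    lambda c: 0.1 <= c.height_ratio <= 0.6,
    lambda c: c.joint_distance >= 50.0,
)

def is_head_flaw(contour):
    """A contour is judged a rail head flaw only if every constraint holds."""
    return all(check(contour) for check in HEAD_FLAW_CONSTRAINTS)

candidate = Contour("head", "red", 120.0, 0.3, 200.0)
near_joint = Contour("head", "red", 120.0, 0.3, 10.0)
results = (is_head_flaw(candidate), is_head_flaw(near_joint))
```

Keeping each feature as a separate constraint makes it straightforward to tighten or relax a single limit, or to drop weak features when reducing the dimension.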
Next, to evaluate the effectiveness of the proposed method, sample images and testing images are processed to verify its detection accuracy and speed. The cooperating company, Guangdong Goworld Co., Ltd, provided 43 sample images with a resolution of 2000 × 400. The defects in the sample images were labelled by three experienced track inspection workers before testing. The proposed method is then used to extract the generalization feature clusters from the 43 sample images and to detect defects. In this experiment, the misjudgment rate and the missed detection rate are also analyzed.
Accuracy rate: $P_a = \dfrac{N_c}{N}$ (18)
Misjudgment rate: $P_m = \dfrac{N_m}{N}$ (19)
Missed detection rate: $P_l = \dfrac{N_l}{N}$ (20)
In Eq. (18)-(20), $P_a$ is the accuracy rate, $P_m$ is the misjudgment rate, and $P_l$ is the missed detection rate. $N$ is the number of defects counted by manual inspection, and $N_c$ is the number of detected defects consistent with the results of the manual inspection. $N_m$ is the number of defects that are assigned one type by manual inspection but a different type by the proposed method. $N_l$ is the number of defects that are found by manual inspection but judged normal by the proposed method. If there is no missed detection and no misjudgment, then $P_m = P_l = 0$ and $P_a = 1$. If there is no missed detection and only misjudgment, then $P_l = 0$ and $P_a = 1 - P_m$.
The 43 sample images and the 953 testing images are then detected, and the experimental results are shown in Tables 2 to 4. Table 2 gives the results for the 43 sample images. The detection accuracy of the proposed method is 98.98%, and the average detection time is 0.17 s per sample image. The misjudgment rate is 0.64%, and the missed detection rate is 0.38%.
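The three rates of Eq. (18)-(20) are simple ratios against the manual count, as the short Python sketch below shows. The function name and the example counts are hypothetical. The check that the three counts add up to the manual total is an assumption of the sketch, consistent with the reported rates summing to 100%.

```python
def detection_rates(n_manual, n_correct, n_misjudged, n_missed):
    """Return (accuracy, misjudgment, missed-detection) rates as fractions."""
    if n_manual <= 0:
        raise ValueError("n_manual must be positive")
    if n_correct + n_misjudged + n_missed != n_manual:
        raise ValueError("counts must add up to the manual total")
    return (
        n_correct / n_manual,    # accuracy rate
        n_misjudged / n_manual,  # misjudgment rate
        n_missed / n_manual,     # missed detection rate
    )

# Hypothetical counts for illustration only.
accuracy, misjudgment, missed = detection_rates(1000, 975, 12, 13)
```

Because every manually counted defect falls into exactly one of the three outcomes, the returned rates always sum to one.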
On the other hand, 953 railway track images are detected with the proposed method, using the generalization features cluster extracted from the sample images; the results are shown in Table 3. The detection accuracy is 97.55%, the misjudgment rate is 1.14%, and the missed detection rate is 1.31%. In addition, the average detection time is 0.12 s per image, which is faster than the 0.17 s per image measured on the sample images and supports the effectiveness of the proposed method.
The type I and type II track ultrasonic images are shown in Figure 14. In the type II track ultrasonic image, the three channels of the rail head region are displayed separately, and the image includes 7 boundary lines. The channels of the rail head region are also displayed in the type I track ultrasonic image, which includes 5 boundary lines. 300 type II track testing images are detected after adjusting the constraint parameters related to position and region. The testing results are shown in Table 4. Comparing Tables 2 to 4, the detection accuracy is higher than 97% for the different types of track defect images when the generalization features cluster extracted from the same sample images is used. This means that the proposed method has good applicability and practicability for railway track defect detection. However, the missed detection and misjudgment rates for the testing images are slightly higher than those for the sample images. For example, the misjudgment rate for the inner and outer inverted screw hole laceration was 15.38%, 10.00% and 25.00%, respectively. The main reason is that the inverted cracks of screw holes are located near the joints. Because of the gap at the joints, the ultrasonic probe briefly leaves the rail surface, and the ultrasonic wave attenuates rapidly in air, which produces clutter with a variety of patterns and sizes in this region. Moreover, some clutter contours near the joint are complex, and a few are mistaken for inverted cracks of the screw hole because their contours satisfy the constraints of the generalization features clusters.
For these reasons, the rail waist region should be divided into a near-joint region and a far-joint region, and a generalization features cluster should be extracted for each region separately. This will reduce the detection speed but improve the detection accuracy.

Comparison with Other Methods
To verify the detection performance of the proposed method, it is compared with two other methods. One is the BP neural network method [18]. The other is an intelligent rail defect identification method based on deep learning [29]. In Ref. [18], different neural network structures are used to identify different types of defects, and the initial weights are set to empirical values. The momentum BP method is then used for training, and the sigmoid function is used to output the results. In Ref. [29], the method is built on the AlexNet convolutional neural network framework; the softmax function is used to output the results, and training is carried out in the TensorFlow framework. The results are shown in Figure 15. The testing data and images of the proposed method all come from the EGT-60 double rail defect detector.
As shown in Figure 15, X-coordinates 1-12 are the defect type numbers corresponding to Table 1, and X-coordinate 13 is the average accuracy of each method. Ref. [18] only detects screw holes. Its detection accuracy for defect types 4-10 is 95%, so its average detection accuracy is also 95%. In addition, it requires the correlation thresholds of the hidden layer and output layer of the neural network to be adjusted. Ref. [29] also classifies the screw holes as screw hole defects, and its detection accuracy for defect types 4-10 is 96%; its average detection rate is 94.2%. By comparison, the average defect detection rate of the proposed method is up to 97.60%. Therefore, by constraining the main generalization features of the defects, the proposed method clearly performs better than these two methods. The experimental results further validate the effectiveness of the proposed method for railway track internal defect detection.

Generalization Features after Reducing Dimension
To validate the effectiveness of the proposed clustering method, which uses the reduced-dimension strong-constraint generalization features, the K-means clustering method is used to classify rail head flaws and joints. The DBI and the numbers of correct and wrong classifications are used to evaluate the clustering results. The results are shown in Figure 16 for three data sets of 120, 300 and 480 data points.

Figure 15
The comparison result of three methods

As shown in Figure 16, the blue and red dots are the clustering results for the rail head flaw and the joint, respectively, and the blue and red squares are the corresponding clustering centers. The dots marked with rectangles are wrongly clustered data. A few rail head flaws are classified as joints by the K-means algorithm because of their locations in the feature space. One possible reason is that they are noise with a large image area that was not removed during preprocessing. Another possible reason is that the contour of some flaws changes after morphological processing, which changes the area and height as well and results in misclassification.
Furthermore, Figure 16 shows that the distribution of the rail head flaws is relatively concentrated and stable, while the distribution of the joints is relatively scattered. The reason is that a joint is imaged by the 70° end face wave with large amplitude and long displacement, which scatters the joint imaging. Therefore, the joint data are spread over a broad range.
Figure 16
Clustering results of main defect in rail head region: (a) cluster results of 120 groups of data, (b) cluster results of 300 groups of data, (c) cluster results of 480 groups of data

Table 5 shows the detailed clustering results. The cluster center of the rail head flaw is relatively stable, while the cluster center of the joint fluctuates. As the number of samples increases, the clustering accuracy gradually increases. However, the number of wrongly clustered points also increases. The reason is that the K-means clustering algorithm only measures the similarity among data points and does not consider the correlation of the features contained in the same set of data, such as the relative weights of the area parameter and the height parameter. This can be mitigated by keeping B-scan image acquisition stable so that extreme data do not appear. In addition, the DBI values of the clustering results are small, which indicates a good clustering effect and shows that defect identification and classification can be achieved with the strong-constraint generalization features.

Conclusions
To achieve rail internal flaw detection, an internal defect detection method based on generalization features cluster is proposed in this paper. The main conclusions are as follows:
(1) The proposed method can avoid the instability, over-constraint, or under-constraint of feature expression during detection. Experimental results show that the proposed method performs better than the compared methods, with an accuracy of 97.55% and an average detection time of 0.12 s/frame.
(2) The K-means method is used to cluster the strong-constraint generalized features after dimension reduction. The DBI and the clustering accuracy are used to evaluate the results, and good clustering results are achieved with an accuracy of over 95%.
(3) Comprehensive experimental results show that the proposed method has higher detection accuracy and better application prospects than the existing methods it was compared with, and it can be used to improve the operation safety of high-speed rail and rail transit systems.
Future work will study internal defect detection methods based on deep learning.