Fetal heart ultrasound image-oriented adaptive classification deep model based on differentiable architecture search


Prenatal ultrasound examination is used to screen for congenital heart defects and fetal genetic diseases. Unfavorable factors such as low signal-to-noise ratio, artifacts and poor fetal posture make it very difficult to identify and interpret the standard scan plane of the fetal heart in prenatal ultrasound examinations. Deep learning methods are widely used to process and analyze medical images. However, designing an effective network structure for a specific task is time-consuming and relies on expert knowledge. To obtain an effective fetal ultrasound image classification model in a short time, this paper collects and organizes the Fetal Heart Standard Plane (FHSP) level III screening dataset and applies the Differentiable Architecture Search (DARTS) method to the FHSP classification task. The search automatically yields an efficient adaptive classification deep model, called the Ultrasound Image Adaptive Classification (UIAC) model, for assisting the diagnosis of fetal congenital heart disease. The new model is a deep neural network built from two automatically searched optimal blocks. The UIAC model has fewer parameters than mainstream manually designed classification networks and achieves the best recognition results on the FHSP classification task: top-1 accuracy 89.84%, macro-F1 89.72%, kappa score 88.82%.


Introduction
Congenital Heart Disease (CHD) is one of the most common congenital malformations in newborns. There are many types of CHD, which can easily lead to miscarriage, stillbirth, and neonatal death. It is an important cause of infant death and seriously affects the quality of the newborn population [2]. Ultrasound is a routine screening tool offered to all pregnant women because it is safe, relatively low-cost and real-time [3].
Prenatal ultrasound examination is used to screen for congenital heart defects and fetal genetic diseases. It can reduce, to a certain extent, the birth rate of newborns with congenital defects, and it also guides the postnatal treatment of affected fetuses, which helps improve the quality of the population.
Prenatal ultrasound examination generally includes image scanning, standard plane search, structure observation, parameter measurement and diagnosis. Determining the standard plane is the prerequisite for structure observation, parameter measurement and final diagnosis, and it is an important part of prenatal diagnosis. Factors such as the low signal-to-noise ratio of ultrasound images, image artifacts and poor fetal posture make it very difficult to identify and interpret the standard scan plane of the fetus in prenatal ultrasound examinations. Ultrasound imaging is also affected by noise and shadows, which degrade image quality and reduce recognition accuracy. Recognizing the relevant structures in ultrasound images is difficult for inexperienced doctors and non-professionals. Therefore, constructing an FHSP classification model for auxiliary diagnosis can both improve the efficiency of prenatal ultrasound examinations and reduce the burden on doctors. It is a significant and very challenging task [3].
In recent years, deep learning research has made remarkable achievements in many fields (such as image classification, image segmentation, image restoration, object detection and natural language processing). Deep learning methods are also widely used to process and analyze medical images for auxiliary diagnosis and related tasks, and have been applied successfully to the diagnosis of diseases of the skin, fundus, lung, breast, thyroid, liver and other organs. For example, one study proposed a multi-task convolutional neural network that learns from gaze-tracking data recorded on the ultrasound machine for the input ultrasound video frames and generates clinically relevant visual attention maps to assist standardized abdominal circumference (AC) plane detection.
With the rapid development of deep learning technology, performance on a large number of computer vision problems has improved significantly. Because deep learning methods can learn more abstract and complex representations directly from the original data, they are more widely applicable and perform better than traditional machine learning methods in complex image recognition tasks.
However, the existing image classification models all use manually designed network structures. The hyperparameters of a network architecture are complex, discrete and unordered, and for a specific task they must be adjusted along several dimensions such as depth, width and skip connections. Furthermore, designing an effective network structure for a specific task requires a lot of experimentation, which is time-consuming and relies on expert knowledge [16]. Neural Architecture Search (NAS) is a very competitive method for solving these problems [16,17]. It is an automated method that obtains a neural network model for a specific dataset, and the models it produces have achieved good performance in image classification, object detection and video understanding. To obtain an effective fetal heart ultrasound image classification model in a short time, this paper collects and organizes the FHSP level III screening dataset and automatically obtains a fetal heart ultrasound image-oriented adaptive classification model, called the Ultrasound Image Adaptive Classification (UIAC) model. This model is used to aid the diagnosis of fetal congenital heart disease.
Our main contributions in this work are as follows: 1) Following the specification for fetal heart ultrasound classification, we construct the FHSP level III screening dataset, which covers eleven standard planes. 2) For the task of fetal heart ultrasound image classification, we use the DARTS method to automatically obtain two optimal blocks oriented to fetal heart ultrasound images. 3) We obtain a new classification network model (UIAC), consisting of the two automatically searched optimal blocks, to classify the standard planes of fetal heart ultrasound. 4) Experiments on the FHSP dataset show that the proposed UIAC model has the fewest parameters and achieves better recognition results than mainstream manually designed classification networks.

Method
The FHSP adaptive classification model framework is shown in Figure 1. It mainly includes three steps: the optimal block search driven by fetal heart data, adaptive classification model acquisition and model application. In this section, we introduce in detail the FHSP adaptive classification model framework.

The optimal block search
First, we use the DARTS method to automatically generate adaptive optimal blocks for FHSP classification. As part 1 of Figure 1 shows, the network is composed of two different blocks. The standard block keeps the size of the feature map unchanged, and the reduction block halves it. There is no other difference in the overall structural design of the two optimal blocks. Each block is a directed acyclic graph composed of $N$ ordered nodes and the edges between them. A node represents a feature map $T$, and each node with a lower number is connected to every node with a higher number. For example, $N_0$, $N_1$ and $N_2$ are all connected to $N_3$. The edge $E_{ij}$ between two nodes represents the operation that maps the feature map $T_i$ to $T_j$. We write $o^m(T_i)$ for the $m$-th operation in the search space applied to the feature map of node $i$. The feature map of node $j$ can be expressed as:
$$T_j = \sum_{i<j} \sum_{m=1}^{M} \alpha^m_{ij} \, o^m_{ij}(T_i)$$
where $\alpha^m_{ij}$ represents the weight of the $m$-th operation between node $i$ and node $j$, and $o^m_{ij}(T_i)$ means the $m$-th operation between node $i$ and node $j$. To make the discrete search space continuous, all operation weights $\alpha^m_{ij}$ are relaxed using the softmax function, so that the network structure parameters $\alpha$ can be optimized. For the FHSP classification task, the adaptive classification model is built from the optimal blocks that correspond to the optimal network structure parameters, so we need to find the best $\alpha$. It can be expressed as a matrix:
$$\alpha = \begin{bmatrix} \alpha^1_1 & \alpha^2_1 & \cdots & \alpha^M_1 \\ \alpha^1_2 & \alpha^2_2 & \cdots & \alpha^M_2 \\ \vdots & \vdots & \ddots & \vdots \\ \alpha^1_N & \alpha^2_N & \cdots & \alpha^M_N \end{bmatrix}$$
where $M$ is the number of operations in the search space and $N$ is the number of edges between nodes.
The network structure parameters $\alpha$ are optimized iteratively. The data in the training set is divided into a training set $D_{train}$ and a validation set $D_{val}$, and we define the training loss $L_{train}$ and the validation loss $L_{val}$. First, $D_{train}$ is used to compute the training loss $L_{train}$, and the network weights $\omega$ are updated by gradient descent. Then, after the network weights are updated, $D_{val}$ is used to compute the validation loss $L_{val}$, and the network structure parameters $\alpha$ are updated by gradient descent. This two-step optimization can be expressed as:
$$\min_{\alpha} \; L_{val}(\omega^*(\alpha), \alpha)$$
$$\text{s.t.} \quad \omega^*(\alpha) = \arg\min_{\omega} L_{train}(\omega, \alpha)$$
where $\omega^*(\alpha)$ is the optimal network weight when the structure parameter is $\alpha$. Adjusting the network weights to the optimum $\omega^*(\alpha)$ for every value of $\alpha$ would consume a lot of computing resources and time. Therefore, $\alpha$ is optimized against an approximation of $\omega^*(\alpha)$ that is obtained by updating $\omega$ for a single step. The derivation can be found in DARTS [18]. After the two-step optimization, the optimal network architecture parameters $\alpha^*$ are obtained. Each mixed operation is then replaced by the operation with the maximum weight in $\alpha^*$, which gives the optimal block $B$ for FHSP classification.
The gray dotted box in part 1 of Figure 1 shows the search process of the block. At the top is the initial block with mixed operations, in which the operation between every two nodes is the mixture of all operations in the search space. Each such operation has two kinds of parameters: the network structure weights $\alpha$, which are updated on the validation set, and the network weights $\omega$, which are updated on the training set. In the middle are the candidate blocks obtained from the optimized network structure parameters during the search. In each candidate block, only the operation with the maximum network structure weight is retained between two nodes. At the bottom is the optimal block $B$ for the FHSP classification task. Algorithm 1 gives the process of the optimal block search driven by fetal heart data.
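The alternating update described above can be illustrated with a deliberately small example. The sketch below is not the authors' implementation: it assumes a single edge acting on a scalar "feature map", three hypothetical candidate operations (`zero`, `identity`, `scale`), a squared-error loss, finite-difference gradients, and the first-order variant in which the structure weights are updated using the current network weight. All names, learning rates and data are illustrative assumptions.

```python
import math

def softmax(alpha):
    """Relax the raw operation weights into positive weights that sum to one."""
    peak = max(alpha)
    exps = [math.exp(a - peak) for a in alpha]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical candidate operations; only "scale" uses the network weight w.
OPS = {
    "zero": lambda x, w: 0.0,
    "identity": lambda x, w: x,
    "scale": lambda x, w: w * x,
}
NAMES = list(OPS)

def mixed_op(x, w, alpha):
    """Softmax-weighted sum of all candidate operations on one edge."""
    probs = softmax(alpha)
    return sum(p * OPS[name](x, w) for p, name in zip(probs, NAMES))

def loss(data, w, alpha):
    """Mean squared error of the mixed operation over (input, target) pairs."""
    return sum((mixed_op(x, w, alpha) - t) ** 2 for x, t in data) / len(data)

def numeric_grad(f, params, eps=1e-5):
    """Central finite-difference gradient of f with respect to a list of floats."""
    grads = []
    for i in range(len(params)):
        up = params[:i] + [params[i] + eps] + params[i + 1:]
        down = params[:i] + [params[i] - eps] + params[i + 1:]
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads

def search(train, val, steps=200, lr_w=0.05, lr_alpha=0.05):
    w, alpha = 1.0, [0.0] * len(NAMES)
    for _ in range(steps):
        # Step 1: update the network weight on the training loss.
        grad_w = numeric_grad(lambda p: loss(train, p[0], alpha), [w])[0]
        w -= lr_w * grad_w
        # Step 2: update the structure weights on the validation loss,
        # using the freshly updated w in place of the exact optimum.
        grad_alpha = numeric_grad(lambda p: loss(val, w, p), alpha)
        alpha = [a - lr_alpha * g for a, g in zip(alpha, grad_alpha)]
    return w, alpha

# Toy data: the target is twice the input, on both splits.
train = [(x, 2.0 * x) for x in (0.5, 1.0, 1.5)]
val = [(x, 2.0 * x) for x in (0.75, 1.25)]

w, alpha = search(train, val)
best = NAMES[max(range(len(alpha)), key=alpha.__getitem__)]
print("structure weights:", dict(zip(NAMES, softmax(alpha))))
print("retained operation:", best)
```

The last two lines mirror the discretization step: once the search ends, only the operation with the largest structure weight is kept on the edge.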
Algorithm 1. The process of the optimal block search driven by fetal heart data.
13: end for
14: Obtain the optimal block architecture parameter matrix $\alpha^*$.
15: Replace each mixed operation with the maximum-weight operation in $\alpha^*$ to obtain the optimal block $B$.
16: Output the optimal block $B$.
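The final steps of the search, replacing each mixed operation with its maximum-weight operation, can be sketched as follows. The operation names, edge labels and weights here are hypothetical placeholders, since the search space is not listed in this section. The DARTS paper additionally keeps only the strongest incoming edges of each node, which this minimal sketch omits.

```python
def discretize(alpha_star, op_names):
    """Keep, for every edge, the operation with the largest structure weight.

    alpha_star maps an edge (i, j) to a list of weights, one per operation,
    in the same order as op_names.
    """
    block = {}
    for edge, weights in alpha_star.items():
        best_index = max(range(len(weights)), key=weights.__getitem__)
        block[edge] = op_names[best_index]
    return block

# Hypothetical operation names and searched weights (illustration only).
op_names = ["skip_connect", "max_pool_3x3", "sep_conv_3x3"]
alpha_star = {
    (0, 2): [0.1, 0.3, 0.9],
    (1, 2): [0.7, 0.2, 0.4],
    (0, 3): [0.2, 0.8, 0.5],
}

block = discretize(alpha_star, op_names)
for edge, op in sorted(block.items()):
    print(edge, "->", op)
```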
The initial block is shown in Figure 2. It includes two input [...] Figure 4 is processed in the same way as the standard block in Figure 3, except that the operations between the nodes are different.

Model application
Finally, part 3 in Figure 1 shows [...] were used to construct the fetal heart ultrasound level III screening dataset. For each standard plane, 1,000 ultrasound images were selected, giving a total of 11,000 ultrasound images. Samples from the fetal heart ultrasound dataset are shown in Figure 6. In the experiments, the data is divided randomly, with 80% used as training data and 20% as testing data. We evaluated the UIAC model and the mainstream manually designed classification models on the FHSP dataset. Model evaluation uses common indicators for multi-class classification: top-1 accuracy, macro-F1 score and kappa score. To evaluate the performance of different classic classification models on the FHSP image classification task, we trained the VGG, ResNet and DenseNet series of models from scratch. For each model, we report the highest top-1 accuracy on the test set together with the macro-F1 score and kappa score from the same epoch. All experiments were completed on an Inspur server platform using a Tesla V100-SXM2-32GB graphics card.
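The three evaluation indicators can all be computed from a confusion matrix. The sketch below is a plain-Python illustration rather than the evaluation code used in the experiments. It assumes that the kappa score is Cohen's kappa and that macro-F1 is the unweighted mean of the per-class F1 scores. The labels in the example are made up.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def top1_accuracy(cm):
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total

def macro_f1(cm):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    k = len(cm)
    scores = []
    for i in range(k):
        tp = cm[i][i]
        fp = sum(cm[r][i] for r in range(k)) - tp
        fn = sum(cm[i]) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / k

def cohen_kappa(cm):
    """Agreement between prediction and truth, corrected for chance agreement."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    p_observed = sum(cm[i][i] for i in range(k)) / total
    p_chance = sum(
        sum(cm[i]) * sum(cm[r][i] for r in range(k)) for i in range(k)
    ) / total ** 2
    return (p_observed - p_chance) / (1 - p_chance)

# Made-up labels for three classes (illustration only).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
print("top-1 accuracy:", top1_accuracy(cm))
print("macro-F1:", macro_f1(cm))
print("kappa:", cohen_kappa(cm))
```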
In the model training phase, the training set contains 8,800 FHSP images and the test set contains 2,200 FHSP images. The image size is 64 × 64 and the batch size is 96. The auxiliary classifier weight is 0.25. The initial learning rate is 0.02, and it is adjusted with the CosineAnnealingLR method as the number of training epochs increases. The momentum is 0.9, and weight decay is applied. The cutout method, with a cutout length of 6, is used to augment the training data. The initial number of channels is 32, so that our model size (around 6.8M) is comparable with other mainstream manually designed classification models. An early stopping strategy is added in the training phase: after the network has been trained for 500 epochs, the training loss is checked in each subsequent epoch, and if it does not decrease for 50 consecutive epochs, the model is considered to have converged and training is stopped. The number of training epochs therefore differs from model to model. Table 1 compares the evaluation indicators of the FHSP image classification results for the VGG [19] series of models. As the table shows, Vgg13-bn obtains the highest accuracy, macro-F1 and kappa score, so it achieves the best FHSP classification results among the VGG series.
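The learning-rate schedule and the early stopping rule above can be sketched in a few lines. This is an illustrative sketch, not the training script. It assumes the closed-form cosine annealing curve with a minimum learning rate of 0 and a hypothetical schedule length `total_epochs`, neither of which is stated in the text. It also assumes that "does not decrease" means "does not fall below the best training loss seen so far".

```python
import math

def cosine_annealing_lr(epoch, total_epochs, base_lr=0.02, min_lr=0.0):
    """Cosine curve from base_lr at epoch 0 down to min_lr at total_epochs."""
    cos_term = 1.0 + math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * cos_term

class EarlyStopper:
    """Stop once the training loss has not improved for `patience` epochs,
    counting only epochs after the first `warmup_epochs`."""

    def __init__(self, warmup_epochs=500, patience=50):
        self.warmup_epochs = warmup_epochs
        self.patience = patience
        self.best_loss = float("inf")
        self.stale_epochs = 0

    def step(self, epoch, train_loss):
        """Return True when training should stop; epoch counts from 1."""
        improved = train_loss < self.best_loss
        if improved:
            self.best_loss = train_loss
        if epoch <= self.warmup_epochs:
            return False
        self.stale_epochs = 0 if improved else self.stale_epochs + 1
        return self.stale_epochs >= self.patience

# Learning rate at a few points of a hypothetical 600-epoch schedule.
for epoch in (0, 150, 300, 450, 600):
    print(epoch, cosine_annealing_lr(epoch, total_epochs=600))
```

In a training loop, `EarlyStopper.step` would be called once per epoch with the epoch's training loss, and the loop would break when it returns `True`.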

The best models from Table 1, Table 2 and Table 3 are compared in Table 4. We observed that the UIAC model achieves the best results on all three evaluation indicators with the fewest parameters. Its accuracy is 0.82% to 10.75% higher than that of the classic classification networks, its macro-F1 is 0.81% to 10.73% higher, and its kappa score is 0.9% to 11.82% higher. [...] the UIAC model has better overall recognition performance for these planes. In addition, the confusion matrix of the UIAC model on the test set is shown in Figure 8. The confusion matrix shows that misclassification is mainly concentrated in four planes, RVOT, TV, TVT and ASA, which lowers the classification performance of the UIAC model. One important reason is that these planes are not very recognizable: according to the doctors, most of these planes are recognized from dynamic video. Another important reason is that some standard planes are highly similar to each other.
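Reading a confusion matrix in this way, to find the classes with the lowest recall and the most frequent confusions, can be sketched as below. The labels `P1` to `P3` and the counts are placeholders chosen for illustration. They are not the FHSP results or the contents of Figure 8.

```python
def per_class_recall(cm):
    """Fraction of each true class (row) that was predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(cm)]

def top_confusions(cm, labels, k=3):
    """Return the k largest off-diagonal cells as (true, predicted, count)."""
    cells = [
        (labels[i], labels[j], cm[i][j])
        for i in range(len(cm))
        for j in range(len(cm))
        if i != j and cm[i][j] > 0
    ]
    cells.sort(key=lambda cell: -cell[2])
    return cells[:k]

# Placeholder labels and counts (rows: true class, columns: predicted class).
labels = ["P1", "P2", "P3"]
cm = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 3, 7],
]

for label, recall in zip(labels, per_class_recall(cm)):
    print(label, "recall:", recall)
for true_label, pred_label, count in top_confusions(cm, labels, k=2):
    print(true_label, "misread as", pred_label, ":", count)
```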

Discussion
For example, the probe angle changes only slightly between the RVOT and TVT planes during scanning, which makes these two planes look similar.

Conclusions
Prenatal ultrasound examination is of great significance for screening fetuses with defects. Determining the standard plane of fetal ultrasound is an important part of prenatal diagnosis.
In this paper, a new classification model, UIAC, is obtained for fetal ultrasound standard plane classification using the differentiable architecture search method. Experiments show that the UIAC model can effectively identify each fetal heart ultrasound standard plane with the fewest parameters while improving the classification accuracy. The UIAC model can help inexperienced doctors and non-professionals to judge the FHSP. Classifying the FHSP is the first step in the identification of fetal CHD. Due to the lack of abnormal fetal data, we cannot yet identify fetal CHD. Future work may therefore proceed in the following directions: 1) Collect and organize more abnormal fetal data to identify fetal CHD. 2) Optimize the search phase of the micro-structure search method to reduce memory consumption.