Dataset and labeling
This was a single-center, retrospective study. All OCT pullback data were acquired with a frequency-domain OCT system (Cornaris P80; Vivolight Medical Device & Technology, Shenzhen, China). Pullbacks acquired after stent implantation and pullbacks with uninterpretable OCT images due to residual blood, artifacts, or an unclear lumen were excluded. Because of the retrospective design, written informed consent for using anonymized patient data was obtained from the patients.
The region of interest (ROI) was defined as a region containing calcified plaque or calcified nodule identified on OCT. The calcified plaque and calcified nodule datasets were labeled separately. Two expert analysts from the 2nd Affiliated Hospital of Harbin Medical University manually labeled the calcified plaques and nodules according to the consensus documents [7, 8]. When the two experts' labels differed substantially for the same region of the same frame, consensus was reached together with an additional experienced interventional cardiologist. Calcification appears in an OCT cross-sectional view as a region of attenuated signal or a heterogeneous area with a sharply delineated border. When a calcified lesion was extremely thick and its border was vague because of signal attenuation, the maximum visible outline was delineated. A calcified nodule, a particular type of calcified plaque, was defined as protruding calcium with rupture of the fibrous cap and newly formed thrombosis.
Data from 100 clinical OCT pullbacks were collected, and expert readers identified 201 ROIs. Within the selected ROIs, the ground truth for calcified plaque was labeled in each image, giving 4,254 labeled OCT images containing calcified plaque. Of these, 3,900 frames were assigned to the training dataset and 354 frames to the testing dataset. For calcified nodules, a total of 50 clinical OCT pullbacks were collected. Following the same selection and labeling process used for calcified plaque, 470 calcified nodule images were labeled.
Data preprocessing
For deep learning models, the data are usually augmented to improve the performance and robustness of the network. Cartesian OCT images are cross-sectional views of blood vessels, in which the vessel always has an approximately circular structure. Rotating the image is therefore a suitable augmentation method, because it does not change the structural information of the blood vessel itself. We randomly selected 10 angles between 0 and 360 degrees for rotation-based augmentation. The resolutions of the polar and Cartesian images are 504x960 pixels and 704x704 pixels, respectively.
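The rotation-based augmentation described above can be sketched as follows. This is a minimal pure-Python illustration and not the MATLAB preprocessing used in the study: the function names `rotate_nearest` and `augment_by_rotation`, the nearest-neighbour interpolation, the zero fill outside the original field of view, and the fixed random seed are all assumptions made for the example. The same angle is applied to an image and its label mask so that the two stay aligned.

```python
import math
import random


def rotate_nearest(img, angle_deg, fill=0):
    """Rotate a 2D list-of-lists image about its centre (nearest neighbour)."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: find the source pixel that lands on (x, y).
            dx, dy = x - cx, y - cy
            sx = cos_t * dx + sin_t * dy + cx
            sy = -sin_t * dx + cos_t * dy + cy
            ix, iy = int(round(sx)), int(round(sy))
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = img[iy][ix]
    return out


def augment_by_rotation(image, mask, n_angles=10, seed=0):
    """Return (angle, rotated_image, rotated_mask) for n_angles random angles."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    angles = [rng.uniform(0.0, 360.0) for _ in range(n_angles)]
    return [(a, rotate_nearest(image, a), rotate_nearest(mask, a))
            for a in angles]
```

For a label mask, nearest-neighbour sampling keeps the values binary; a smoother interpolation could be used for the grey-scale image if desired.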
Deep learning model
For semantic segmentation tasks in the medical field, the U-Net model has been widely used and achieves good accuracy. Based on U-Net, this paper proposes a multi-scale, multi-task U-Net network (MS-MT U-Net), as shown in Figure 1. Calcified plaques vary in size and volume. To capture accurate information about calcium at various scales and to improve the precision of calcified plaque measurement, a multi-scale module is proposed; Figure 2 shows its schematic diagram. Four filter kernels of different scales are applied to the input feature map to extract features under different receptive fields. The kernel sizes are 1x1, 3x3, 5x5, and 7x7. Finally, the feature maps obtained with the different kernel sizes are added together and passed to the next layer of the network.
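One way to write the multi-scale module compactly is given below. This is a sketch under stated assumptions: the symbols X (input feature map), W_k (the k x k convolution kernel), and Y (module output) are notation introduced here, and each branch is assumed to be zero-padded so that all four outputs have the same spatial size, which is required for element-wise addition. Whether bias terms or normalization layers are included is not specified in the text.

```latex
Y \;=\; \sum_{k \in \{1,\,3,\,5,\,7\}} W_k * X ,
\qquad
\text{padding}_k \;=\; \left\lfloor \tfrac{k}{2} \right\rfloor \;=\; \tfrac{k-1}{2}
```

With this padding and a stride of 1, the four branches preserve the height and width of X, so their outputs can be summed without cropping.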
Furthermore, because the thickness and angle of calcification are calculated from the identified calcified area, accurate recognition of the edge of the calcification contour is vital. First, a calcification mask is generated from the manually labeled calcification area; this is a binary (0-1) image in which pixels inside the calcification area have the value 1 and all other pixels have the value 0. The Canny operator is then used to extract the edge of the calcification. Finally, a morphological dilation (expansion) with a disk filter of radius 12 is applied to widen the edge.
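The edge-mask construction can be illustrated with a small pure-Python sketch. It is not the authors' implementation: for a clean binary mask the Canny operator is replaced here by a simpler boundary test (a foreground pixel that touches background in its 4-neighbourhood), which is a swapped-in technique chosen to keep the example dependency-free. The function names `boundary_pixels`, `dilate_disk`, and `edge_mask` are hypothetical, and the radius is interpreted in pixels as an assumption.

```python
def boundary_pixels(mask):
    """Mark foreground pixels that touch background (4-neighbourhood)."""
    h, w = len(mask), len(mask[0])
    edge = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x] != 1:
                continue
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                # Positions outside the image are treated as background.
                outside = not (0 <= ny < h and 0 <= nx < w)
                if outside or mask[ny][nx] == 0:
                    edge[y][x] = 1
                    break
    return edge


def dilate_disk(binary, radius):
    """Binary dilation with a disk-shaped structuring element."""
    h, w = len(binary), len(binary[0])
    offsets = [(dy, dx)
               for dy in range(-radius, radius + 1)
               for dx in range(-radius, radius + 1)
               if dy * dy + dx * dx <= radius * radius]
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not binary[y][x]:
                continue
            for dy, dx in offsets:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    out[ny][nx] = 1
    return out


def edge_mask(mask, radius=12):
    """Edge band derived from the calcification mask alone (no extra labels)."""
    return dilate_disk(boundary_pixels(mask), radius)
```

Because the band is computed from the existing calcification mask, it can be regenerated with a different radius without any additional manual annotation.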
Obtaining the edge mask does not increase the labeling cost, because it is derived from the gold-standard calcification labels. As shown in Figure 1, MS-MT U-Net is a 5-layer encoding-decoding network. In each layer, a multi-scale module is first used to extract features under different receptive fields; these features are then compared, added, and passed to a nonlinear activation layer (ReLU). Down-sampling is performed by 2x2 max pooling. In the decoding network, the convolutional features are combined during up-sampling to produce the final segmentation result.
Loss function
The loss function in this paper consists of two parts: a negative log-likelihood (NLL) loss for the calcification mask and an NLL loss for the edge mask. The total loss combines these two terms.
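A plausible form of such a two-term loss is sketched below. It is an illustration under stated assumptions and not the authors' formula: the weighting factor λ is hypothetical (the text does not say whether the two terms are weighted, and λ = 1 gives a plain sum), Ω denotes the set of pixels in an image, y_i is the label of pixel i, and p_i(y_i) is the predicted probability of that label.

```latex
\mathcal{L}_{\text{total}}
  \;=\; \mathcal{L}_{\text{NLL}}^{\text{mask}}
  \;+\; \lambda \, \mathcal{L}_{\text{NLL}}^{\text{edge}},
\qquad
\mathcal{L}_{\text{NLL}}
  \;=\; -\frac{1}{|\Omega|} \sum_{i \in \Omega} \log p_i(y_i)
```

Each NLL term is evaluated on its own output head, with the calcification mask and the edge mask as the respective targets.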
Training of deep learning model
Data preprocessing was implemented in MATLAB 2015b. Model training and testing were carried out with the PyTorch framework on a Linux platform, using a Titan Xp GPU with 12 GB of memory.
Separate deep learning models were trained for the identification of calcified plaque and of calcified nodule. During training, the batch size is set to 2. We use the Adam algorithm with an initial learning rate of lr = 10^-3, and the learning rate is adjusted dynamically as training proceeds: when the loss on the validation set stabilizes, the learning rate is reduced to 1/5 of its previous value. The maximum number of epochs is set to 200.
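The learning-rate rule can be sketched in plain Python as below. This is a minimal illustration and not the training code: the function name `plateau_schedule` is hypothetical, and the `patience` and `min_delta` parameters, which define when the loss counts as "stable", are assumptions because the text does not give a criterion. Only the initial rate (10^-3), the reduction factor (1/5), and the epoch cap (200) come from the description above.

```python
def plateau_schedule(val_losses, lr0=1e-3, factor=0.2,
                     patience=5, min_delta=1e-4, max_epochs=200):
    """Return the learning rate used in each epoch.

    The rate is multiplied by `factor` whenever the validation loss has
    not improved by more than `min_delta` for `patience` epochs in a row.
    """
    lr = lr0
    best = float("inf")
    stale = 0
    history = []
    for loss in val_losses[:max_epochs]:
        history.append(lr)  # rate in effect during this epoch
        if loss < best - min_delta:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                lr *= factor  # reduce to 1/5 of the previous value
                stale = 0
    return history
```

In a PyTorch training loop, a built-in plateau-based scheduler would normally play this role; the sketch only makes the rule explicit.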
Performance evaluation
Pixel-wise evaluation of calcified plaque segmentation by the deep learning model used the following standard metrics: (1) recall = TP / (TP + FN), (2) precision = TP / (TP + FP), and (3) F1 score = 2 x precision x recall / (precision + recall), where TP is the number of true positives, FP is the number of false positives, TN is the number of true negatives, and FN is the number of false negatives. Quantitative evaluation assessed the agreement or correlation of the angle, thickness, and area of calcified plaque in each frame. In addition, the Fujino calcium score and the calcium volume were computed for each calcified lesion segment. The calcium score was defined as 2 points for maximum angle >180°, 1 point for maximum thickness >0.5 mm, and 1 point for length >5 mm [5]. The calcium volume was defined as the sum of the calcium areas multiplied by the cross-section interval (0.2 mm). Calcified nodules were evaluated at the frame level; a frame was counted as correctly identified when the overlap between the deep learning model prediction and the ground truth was >80%.
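The metrics in this subsection can be expressed as a short Python sketch. The function names `pixel_metrics`, `calcium_score`, and `calcium_volume` are hypothetical, masks are assumed to be equally sized lists of 0/1 rows, and returning 0.0 when a denominator is zero is a convention chosen for the example rather than something stated in the text.

```python
def pixel_metrics(pred, truth):
    """Pixel-wise recall, precision and F1 for two binary masks."""
    tp = fp = fn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return recall, precision, f1


def calcium_score(max_angle_deg, max_thickness_mm, length_mm):
    """Score per the thresholds in the text: 2 + 1 + 1 points, maximum 4."""
    score = 0
    if max_angle_deg > 180:
        score += 2
    if max_thickness_mm > 0.5:
        score += 1
    if length_mm > 5:
        score += 1
    return score


def calcium_volume(areas_mm2, frame_interval_mm=0.2):
    """Sum of per-frame calcium areas times the cross-section interval."""
    return sum(areas_mm2) * frame_interval_mm
```

All three thresholds are strict inequalities, so a lesion with exactly 180°, 0.5 mm, and 5 mm scores 0 points under this reading.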
Statistical analysis
Statistical analysis was performed using SPSS 22.0 (IBM, Armonk, NY). Continuous data are expressed as mean ± standard deviation, or as median (interquartile range) when appropriate. Categorical variables are presented as counts and percentages. Agreement between groups was assessed by Bland-Altman analysis for continuous variables and by the kappa coefficient for categorical variables. Correlation between the ground truth and the deep learning model predictions was evaluated using Pearson or Spearman correlation tests. A Pearson's correlation coefficient of 0.70 or higher indicates a strong positive relationship, an r value of 0.50 to 0.69 a moderate positive relationship, and an r value of 0.30 to 0.49 a weak positive relationship. For all analyses, a two-sided p < 0.05 was considered statistically significant.
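For readers who want to reproduce the agreement and correlation analyses outside SPSS, the core calculations can be sketched in plain Python. This is an illustration only: the function names are hypothetical, the limits of agreement use the conventional bias ± 1.96 standard deviations of the differences, and the label returned below r = 0.30 is an assumption because the text does not name that range.

```python
import math


def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) of agreement for paired data."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    bias = sum(diffs) / n
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def pearson_r(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_strength(r):
    """Map a positive r to the categories used in this section."""
    if r >= 0.70:
        return "strong"
    if r >= 0.50:
        return "moderate"
    if r >= 0.30:
        return "weak"
    return "below the weak threshold"  # label not defined in the text
```

Here `a` would hold the model measurements (for example, calcium angle per frame) and `b` the corresponding ground-truth values.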