2.1 THEORETICAL BASIS
2.1.1 CYCLEGAN
CycleGAN [12] consists of two mirrored GANs that form a ring network. Its goal is to convert an image A into another domain to generate an image A1, and then convert A1 back to the original domain; requiring the reconstruction to remain similar to the original input A establishes a meaningful mapping between the two domains even though no paired examples exist in the data set. The key advantage of CycleGAN is therefore its ability to train on two image sets without pairing.
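The cycle constraint described above can be illustrated with a minimal sketch. The functions `G` and `F` below are toy, hypothetical stand-ins for the two generators (in the real model they are deep CNNs); the point is only the round-trip reconstruction penalty:

```python
import numpy as np

# Hypothetical stand-ins for the two CycleGAN generators:
# G maps domain A -> B, F maps domain B -> A.
def G(a):
    return a * 2.0            # toy forward conversion

def F(b):
    return b / 2.0            # toy backward conversion

def cycle_consistency_loss(a_batch):
    """L1 distance between an image and its round-trip F(G(a))."""
    reconstructed = F(G(a_batch))
    return np.mean(np.abs(a_batch - reconstructed))

a = np.random.rand(4, 64, 64)         # a toy batch of "domain A" images
loss = cycle_consistency_loss(a)      # 0.0 here, since F exactly inverts G
```

During training this loss is driven toward zero, which is what forces the unpaired conversion A → A1 → A to be self-consistent.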
2.1.2 DENSENET
DenseNet [13] is a convolutional neural network architecture with dense connectivity, proposed by Gao Huang et al. in 2017. In this architecture there is a direct connection between any two layers of the network: the input of each layer is the combination of the outputs of all preceding layers, which strengthens feature propagation, alleviates the vanishing-gradient problem, reduces the number of network parameters, and encourages feature reuse. DenseNet has been widely used in the medical imaging field.
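The dense-connectivity pattern can be sketched in a few lines. Here `layer` is a hypothetical placeholder for a convolutional layer producing `growth` new channels; the essential point is that each layer's input is the concatenation of every earlier feature map:

```python
import numpy as np

# Toy stand-in for one conv layer: produces `growth` new channels.
def layer(x, growth=4):
    return np.repeat(x.mean(axis=0, keepdims=True), growth, axis=0)

def dense_block(x, num_layers=3, growth=4):
    features = [x]                                    # all feature maps so far
    for _ in range(num_layers):
        concat = np.concatenate(features, axis=0)     # input = all prior outputs
        features.append(layer(concat, growth))
    return np.concatenate(features, axis=0)

x = np.ones((8, 16, 16))        # (channels, H, W) toy feature map
out = dense_block(x)            # channels grow by `growth` per layer: 8 + 3*4
```

Because every layer sees all earlier outputs directly, gradients also reach early layers directly, which is the source of the properties listed above.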
2.1.3 RESNET
ResNet [14] is a convolutional neural network architecture proposed by He et al. in 2015. It adds shortcut connections on top of a plain architecture so that a layer's input can be passed directly to its output, which addresses the degradation problem. ResNet thereby alleviates the vanishing- and exploding-gradient problems caused by increased network depth and preserves the integrity of the information flowing through the network. It has been widely used in the medical imaging field.
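The shortcut idea is compact enough to show directly. In this sketch `f` is a hypothetical stand-in for the block's convolutional layers; the block outputs the identity path plus the learned residual:

```python
import numpy as np

def f(x):
    return 0.1 * x            # toy residual mapping (stands in for conv layers)

def residual_block(x):
    # shortcut connection: output = identity + residual
    return x + f(x)

x = np.full((4, 4), 2.0)
y = residual_block(x)         # each entry: 2.0 + 0.1 * 2.0 = 2.2
```

If `f` learns nothing useful, the block degenerates to the identity rather than degrading the signal, which is why very deep stacks remain trainable.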
2.1.4 CRNN
CRNN is a model proposed by Shi et al. [15] for recognizing sequence-like objects in images; it consists of a DCNN and an RNN. The DCNN extracts a feature sequence from the input image, and the RNN, which is well suited to processing sequence data, achieves better recognition accuracy from the extracted features. CRNN's ability to predict over sequence data motivates its use for recognizing medical image sequences.
2.2 PITUITARY TUMOR SEQUENCE DATA AMPLIFICATION USING CYCLEGAN
A problem often encountered in MR images of pituitary tumors is under-sampling in a single domain (e.g., T1 or T2), caused for example by missing data or simply by too few acquisitions. To resolve this issue, our main idea is to use images from other domains (which may come from different imaging modalities) to generate a set of new images through domain conversion. The new and original images together form an augmented image set that provides a better sample of the domain.
In particular, we use CycleGAN for data augmentation. First, two domain converters are designed and trained on the CycleGAN architecture to allow inter-domain conversion from T1 to T2 and from T2 to T1. Then, the MR images generated by domain conversion are added to the original image sets to form the augmented T1 and T2 sequences.
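The augmentation step amounts to appending the converted images to the opposite domain's set. A minimal sketch, where `g_t1_to_t2` and `g_t2_to_t1` are hypothetical placeholders for the two trained CycleGAN generators:

```python
import numpy as np

# Placeholder conversions standing in for the trained generators.
def g_t1_to_t2(imgs):
    return imgs + 0.5

def g_t2_to_t1(imgs):
    return imgs - 0.5

def augment(t1_set, t2_set):
    # each domain gains the images converted from the other domain
    t1_aug = np.concatenate([t1_set, g_t2_to_t1(t2_set)])
    t2_aug = np.concatenate([t2_set, g_t1_to_t2(t1_set)])
    return t1_aug, t2_aug

t1 = np.zeros((10, 32, 32))            # 10 original T1 slices
t2 = np.zeros((6, 32, 32))             # 6 original T2 slices
t1_aug, t2_aug = augment(t1, t2)       # both domains now hold 16 slices
```

The under-sampled domain thus ends up with as many slices as the union of both original sets.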
2.2.1 MULTIPLE SEQUENCE OF PITUITARY TUMOR MR IMAGES
As mentioned above, the MR images of one patient usually include spatial sequences from different modalities, such as T1WI, T2WI, T1C and T2FLAIR, etc. In this paper, we mainly use T1 and T2 spatial sequence images.
For each patient i, we denote its T1 spatial sequence as S_i^T1 = (x_{i,1}^T1, ..., x_{i,N}^T1), where x_{i,n}^T1 represents the n-th slice/frame in the T1 spatial sequence, and its T2 spatial sequence as S_i^T2 = (x_{i,1}^T2, ..., x_{i,N}^T2), where x_{i,n}^T2 represents the n-th slice/frame in the T2 spatial sequence. The number of slices per sequence is N (12 in this paper). To classify the pituitary tumors, we combine the T1 and T2 spatial sequences of each patient i to obtain a multi-sequence spatial sequence, denoted as S_i = (x_{i,1}^T1, ..., x_{i,N}^T1, x_{i,1}^T2, ..., x_{i,N}^T2) (see Equation 1 in the Supplementary Files).
The total number of slices in a multi-sequence spatial sequence is 2N (24 in this paper).
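Concretely, the multi-sequence is a slice-wise concatenation of the two per-patient sequences. A small sketch with toy data, using the paper's setting N = 12:

```python
import numpy as np

N = 12                                   # slices per sequence (paper's setting)
t1_seq = np.random.rand(N, 64, 64)       # toy T1 spatial sequence of patient i
t2_seq = np.random.rand(N, 64, 64)       # toy T2 spatial sequence of patient i

# Multi-sequence S_i: the T1 slices followed by the T2 slices, 2N in total.
multi_seq = np.concatenate([t1_seq, t2_seq], axis=0)
```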
2.2.2 TRAINING DOMAIN CONVERTER BASED ON CYCLEGAN
In this paper, we use the CycleGAN framework to design and train the domain converters. CycleGAN is essentially a cyclic network consisting of two mutually symmetric GANs. On top of the original GAN, an additional cycle constraint forces a converted image to be translated back into its original image format so as to reconstruct itself. This allows images to be converted from one domain to another without needing to pair them. The architecture of our domain converter is illustrated in Figure 1. In our design, we train the T1-to-T2 generator G(·; θ_G) and the T2-to-T1 generator F(·; θ_F), as well as the T1 domain discriminator D_T1(·; θ_D1) and the T2 domain discriminator D_T2(·; θ_D2), where θ_G, θ_F, θ_D1 and θ_D2 are the to-be-determined parameters of the corresponding deep neural networks. Training of the CycleGAN model is complete when the discriminators' losses reach their minimum and become stable.
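The generators' training objective combines an adversarial term with the cycle constraint. The sketch below assumes the standard least-squares adversarial loss and L1 cycle loss (the paper does not state its exact loss variants); `G`, `F`, `d_t1`, `d_t2` are hypothetical stand-ins for the four networks:

```python
import numpy as np

def lsgan_g_loss(fake_scores):
    # generator wants the discriminator to output 1 on its fakes
    return np.mean((fake_scores - 1.0) ** 2)

def cycle_loss(real, reconstructed):
    return np.mean(np.abs(real - reconstructed))   # L1 round-trip penalty

def generator_objective(t1, t2, G, F, d_t1, d_t2, lam=10.0):
    fake_t2, fake_t1 = G(t1), F(t2)
    adv = lsgan_g_loss(d_t2(fake_t2)) + lsgan_g_loss(d_t1(fake_t1))
    cyc = cycle_loss(t1, F(fake_t2)) + cycle_loss(t2, G(fake_t1))
    return adv + lam * cyc

# Toy check: perfect generators plus fully fooled discriminators give zero loss.
identity = lambda x: x
fooled = lambda x: np.ones(len(x))
t1 = np.random.rand(3, 8, 8)
t2 = np.random.rand(3, 8, 8)
loss = generator_objective(t1, t2, identity, identity, fooled, fooled)
```

The weight `lam` trading off the two terms is an assumed value (10 is a common choice), not taken from the paper.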
2.3 SEMI-SUPERVISED CLASSIFICATION METHOD FOR THE IMAGE TEXTURE OF PITUITARY TUMORS BASED ON ADAPTIVELY OPTIMIZED FEATURE EXTRACTION
To improve the efficiency of feature extraction for determining the softness level of pituitary tumors, we propose an Auto-Encoder-based deep neural network model for feature extraction that builds on DenseNet and ResNet. Since the weight of features common to all input data is reduced during training, the proposed model enhances the weight of the features unique to each MRI spatial sequence (i.e., the features of the pituitary tumor) while reducing the dimensionality of the features of each slice. This greatly accelerates the subsequent classifier. The proposed Auto-Encoder-based framework for feature extraction is therefore essential to our classification method.
2.3.1 ENCODER AND DECODER BASED ON DENSE BLOCK AND RESIDUAL BLOCK
For the encoder, we use Dense Blocks to enhance the feature-propagation ability of MRI spatial sequences and rely on convolutional and pooling layers to reduce dimensionality; combined, these form an encoder that extracts the common features of MRI spatial sequences. As shown in Figure 4, the encoder uses two dense blocks during training (only one is shown in the figure). Because the feature maps are stacked together during training, the propagation of pituitary tumor features is strengthened, which improves the accuracy and reliability of feature extraction.
For the decoder, we use Residual Blocks to compress the dimensionality of the feature map and rely on upsampling and convolution layers to increase it. These components together form a decoder that generates MRI spatial sequences with the same dimensionality as the original input data. The network architecture is shown in Figure 5; the decoder likewise uses two residual blocks (only one is shown in the figure). The decoder's shortcut connections lower the weight of some features during training and counteract the degradation that the added network depth (after adding the decoder) would otherwise introduce. Together, these improve the effectiveness of MRI spatial sequence reconstruction, which is why Residual Blocks are well suited to image reconstruction in the overall model.
First, the input image sequence is encoded into a feature sequence by the dense-block encoder. Second, the feature sequence is decoded by the residual-block decoder to restore the image sequence. Finally, the input image is compared pixel-wise with the corresponding generated image. The lower the loss, the more similar the generated image is to the input image, and the more representative the extracted feature sequence.
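The three steps above reduce to an encode-decode round trip scored by a pixel-wise loss. In this sketch, `encode` and `decode` are toy, hypothetical stand-ins for the dense-block encoder and residual-block decoder, and mean squared error stands in for the pixel-wise comparison:

```python
import numpy as np

def encode(x):
    # toy "encoder": downsample each slice to a smaller feature map
    return x[:, ::2, ::2]

def decode(z):
    # toy "decoder": upsample back to the original slice size
    return np.repeat(np.repeat(z, 2, axis=1), 2, axis=2)

def reconstruction_loss(x):
    # pixel-wise comparison of input sequence and reconstructed sequence
    return np.mean((x - decode(encode(x))) ** 2)

x = np.ones((12, 16, 16))              # toy MRI spatial sequence: 12 slices
loss = reconstruction_loss(x)          # 0.0 for this constant input
```

A low reconstruction loss indicates that the intermediate feature sequence retains the information of the original slices.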
2.3.3 SEMI-SUPERVISED CLASSIFICATION OF SPATIAL SEQUENCE IMAGES BASED ON CRNN
The extracted feature map of a pituitary tumor MRI spatial sequence is a three-dimensional matrix matching the input format of the CRNN. One image sequence represents one patient, so only a single sequence-level label is needed. We first use the CNN to extract a spatial feature sequence from the feature map, and then use the RNN to train on the extracted feature sequence. When the training loss reaches its minimum and becomes stable, training of the CRNN model is complete, and the model can then be used to measure test accuracy. The neural network architecture is shown in Figure 7.
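The CRNN stage can be sketched as a per-slice convolutional feature extractor followed by a recurrence that collapses the whole sequence into one sequence-level prediction. Both stages below are toy stand-ins (global pooling for the CNN, a bare tanh recurrence for the RNN), not the paper's trained networks:

```python
import numpy as np

def conv_features(slice_):
    # toy per-slice "CNN": summarize a slice as a small feature vector
    return np.array([slice_.mean(), slice_.std()])

def rnn_aggregate(feat_seq, w=0.5):
    h = np.zeros(2)                        # hidden state
    for f in feat_seq:                     # recurrence over the 2N slices
        h = np.tanh(w * h + f)
    return h

def classify(sequence):
    feats = [conv_features(s) for s in sequence]
    h = rnn_aggregate(feats)
    return int(h.sum() > 0)                # toy binary label: soft vs. hard

seq = np.random.rand(24, 32, 32)           # one patient: 2N = 24 slices
label = classify(seq)
```

Because the label is attached to the whole sequence rather than to individual slices, one annotation per patient suffices, which is what makes the sequence-level training semi-supervised-friendly.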
2.3.4 MULTI-SEQUENCE PITUITARY TUMOR CLASSIFICATION MODEL
Combining all of the above neural network components yields a model for classifying multi-sequence pituitary tumor images; its architecture is shown in Figure 8. The model can augment under-sampled T1 and T2 datasets, fuse sequences from multiple modalities, extract features, and finally obtain an accurate estimate of the softness level of the pituitary tumor using a CRNN-based classifier.
2.4 EXPERIMENT PLATFORM AND DATASET
Our experiments are conducted in the following environment: the operating system is Windows 10, the processor is a 2.10 GHz Intel Xeon (dual core), the memory capacity is 64 GB, the development environment is PyCharm, the deep learning framework is Keras, the programming language is Python, and the graphics cards are three GeForce RTX 2080Ti.
The dataset used in the experiments consists of pituitary tumor MR images collected at a local affiliated hospital. Each patient has MRI data in the OAX, OSAG and OCOR orientations (this paper uses the OCOR data), and the OCOR MRI data include both T1 and T2 modalities. There are 374 patients in total, 152 of whom are labeled, each with one of two texture grades: soft texture or hard texture.