Brain tumor segmentation from multimodal magnetic resonance imaging data based on gray-level co-occurrence matrix (GLCM) and an ensemble Support Vector Machine (SVM) classifier

Background: Brain tumors, abnormal cells growing in the human brain, are common neurological diseases that are extremely harmful to human health. Malignant brain tumors can lead to high mortality. Magnetic resonance imaging (MRI), a typical noninvasive imaging technology, can produce high-quality brain images without damage and skull artifacts, as well as provide comprehensive information to facilitate the diagnosis and treatment of brain tumors. Additionally, the segmentation of MRI brain tumors utilizes computer technology to segment and label tumors and normal tissues automatically on multimodal brain images, which plays an important role in disease diagnosis, treatment planning, and surgical navigation.
Methods: We propose a solution using gray-level co-occurrence matrix (GLCM) texture and an ensemble Support Vector Machine (SVM) structure. We focus on the effects of GLCM texture on brain tumor segmentation. First, 112 GLCM features for each voxel were extracted. Next, these features were ranked using the SVM-recursive feature elimination (SVM-RFE) method. Based on the sorting results, we found that when the number of features was 60, the value of the Dice similarity coefficient (DSC) tended to be flat. The GLCM texture features maximal correlation coefficient, information measure of correlation, Angular Second Moment, sum of squares, difference variance, contrast, and inverse difference moment were important for segmentation. Finally, we selected the top 60 grayscale features and constructed an ensemble SVM classifier to separate the abnormal mass of tissue from normal brain tissues.
Results: The experimental material was a dataset called BraTs2015. The proposed model was verified with the Dice coefficient. For low-grade tumors, we obtained a 91.2% average Dice coefficient for segmenting the complete tumor region. For high-grade tumors, the average was slightly higher at 92.4%.
Conclusion: Our results demonstrated that this method has a better capacity and higher segmentation accuracy with a low computation cost.


Methods
The workflow of our segmentation method includes the following steps: image preprocessing, feature calculation and ensemble SVM classification (Fig. 1).

Preprocessing
The data come from different devices and thus have different gray levels. It is therefore necessary to correct grayscale inconsistencies and reduce image noise. To this end, the gray levels of each MRI image were normalized to the range 0 to 255, and a Gaussian filter was used to reduce Gaussian noise in the image. Both steps are required during the training and testing phases. A pre-segmentation process was added during the testing phase; it reduces the amount of data and greatly improves the segmentation accuracy.
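The two preprocessing steps can be illustrated with a minimal sketch. The function names, the value of sigma, and the edge-replication padding are assumptions made for illustration; the text does not state the filter parameters that were actually used.

```python
import numpy as np

def normalize_gray(img):
    """Linearly rescale intensities to the range [0, 255]."""
    img = np.asarray(img, dtype=float)
    lo, hi = img.min(), img.max()
    if hi == lo:                      # flat image: nothing to rescale
        return np.zeros_like(img)
    return (img - lo) / (hi - lo) * 255.0

def gaussian_kernel(sigma, radius):
    """1D Gaussian kernel normalized to sum to 1."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_smooth(img, sigma=1.0):
    """Separable Gaussian filter on a 2D slice (sigma is an assumed value)."""
    img = np.asarray(img, dtype=float)
    radius = int(3 * sigma + 0.5)
    k = gaussian_kernel(sigma, radius)
    padded = np.pad(img, radius, mode="edge")   # replicate border pixels
    # Filter along rows first, then along columns.
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def preprocess(img, sigma=1.0):
    """Normalize gray levels, then reduce noise."""
    return gaussian_smooth(normalize_gray(img), sigma)
```

Normalizing before filtering keeps the smoothing strength comparable across scanners, because sigma then acts on images that share the same intensity range.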
Generally, the left and right hemispheres of a normal human brain are approximately symmetrical [18]. Brain tumors destroy this symmetry, a phenomenon that is reflected in the image data. The left hemisphere of the tumor image is $f_L$ and the right hemisphere is $f_R$; $f_{LM}$ and $f_{RM}$ denote their mirror images. The difference images are

$f_1 = f_L - f_{RM}$ (1)

$f_2 = f_R - f_{LM}$ (2)

where $f_1$ is the difference of the left hemisphere image and $f_2$ is the difference of the right hemisphere image. Fig. 2 presents an example of symmetry analysis of a brain image. Fig. 2a is the original input image, taken from the fluid-attenuated inversion recovery (FLAIR) modality of the MRI data. Fig. 2b is the left hemisphere and Fig. 2c is the right hemisphere. Fig. 2d presents the result calculated according to formula (1), and Fig. 2e the result calculated according to formula (2). The tumor area is in the right hemisphere, and thus the right hemisphere image minus the mirror image of the left hemisphere image is shown in Fig. 2. Tumor regions are preserved by using symmetrical information.
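The symmetry analysis can be sketched as follows, assuming that formula (1) subtracts the mirrored right hemisphere from the left hemisphere and formula (2) does the reverse. The sketch further assumes a 2D slice whose midline is the central image column and whose left hemisphere occupies the left half of the array; both are simplifications, since real data may require midline detection.

```python
import numpy as np

def symmetry_differences(img):
    """Return (f1, f2): each hemisphere minus the mirror of the other one."""
    img = np.asarray(img, dtype=float)
    width = img.shape[1]
    half = width // 2
    f_l = img[:, :half]              # left hemisphere (assumed: left half of the array)
    f_r = img[:, width - half:]      # right hemisphere (middle column skipped if width is odd)
    f_lm = f_l[:, ::-1]              # mirror image of the left hemisphere
    f_rm = f_r[:, ::-1]              # mirror image of the right hemisphere
    f1 = f_l - f_rm                  # formula (1)
    f2 = f_r - f_lm                  # formula (2)
    return f1, f2
```

Under these assumptions f1 is the negated mirror image of f2, so a bright lesion produces positive values in the difference image of the hemisphere that contains it and negative values in the other one.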
In MRI data, a brain tumor area has higher gray values in the FLAIR modality. [...] were added to obtain a new image. Then, some morphological processing methods were applied to the image to complete the pre-segmentation result (Fig. 2g). Through many experiments, we found that pre-segmentation processing reduced the amount of data. Consequently, the training time was two-thirds shorter than the original segmentation time. The segmentation accuracy also improved to varying degrees, especially for slices with less brain tumor tissue, for which the accuracy increased severalfold.

GLCM feature extraction
GLCM is one of the most commonly used methods for texture feature extraction. The GLCM describes the textural relationship between pixels using second-order statistics of the image [10]. Specifically, it records the probability that two pixels separated by a specific distance in a certain direction have a given pair of gray levels; each entry therefore represents the frequency of occurrence of that pixel pair. Haralick et al. [10] suggested 14 measures that can be extracted from each of the gray-tone spatial dependence matrices. They are as follows: Angular Second Moment, contrast, correlation, sum of squares, inverse difference moment, sum average, sum variance, sum entropy, entropy, difference variance, difference entropy, two information measures of correlation, and maximal correlation coefficient. For the selected distance d, there are four angular gray-tone spatial dependence matrices (0°, 45°, 90°, and 135°). In our experiment, we set the value of d as 5. Thus, we obtained a set of four values for each of the preceding 14 measures. The mean and range of these 14 measures comprised the set of 28 features. There may be a strong correlation among these 28 features. Moreover, it should be noted that MR imaging of brain tumor patients is a three-dimensional, multi-band imaging technique that usually includes four modalities. Thus, there are a total of 28 × 4 = 112 features. Feature selection should be applied to select a subset of the 112 features (Table 1).
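The construction of the mean and range features can be sketched for a single modality as follows. The sketch computes only three of the 14 measures (Angular Second Moment, contrast, and inverse difference moment). It assumes that the patch has already been quantized to integer gray levels below `levels` and that the matrix is made symmetric. The function names, the 16-level quantization, and the patch size are illustrative assumptions, not values taken from the text.

```python
import numpy as np

def offsets(d):
    """(row, col) displacements for the 0, 45, 90, and 135 degree directions."""
    return [(0, d), (-d, d), (-d, 0), (-d, -d)]

def glcm(patch, dr, dc, levels):
    """Symmetric, normalized co-occurrence matrix for one displacement."""
    rows, cols = patch.shape
    # Crop so that a[i, j] and b[i, j] are exactly (dr, dc) apart.
    r0, r1 = max(0, -dr), min(rows, rows - dr)
    c0, c1 = max(0, -dc), min(cols, cols - dc)
    a = patch[r0:r1, c0:c1]
    b = patch[r0 + dr:r1 + dr, c0 + dc:c1 + dc]
    m = np.zeros((levels, levels), dtype=float)
    np.add.at(m, (a.ravel(), b.ravel()), 1.0)   # count gray-level pairs
    m = m + m.T                                 # count each pair in both orders
    total = m.sum()
    return m / total if total > 0 else m

def haralick_subset(p):
    """Angular Second Moment, contrast, and inverse difference moment."""
    i, j = np.indices(p.shape)
    asm = np.sum(p ** 2)
    contrast = np.sum((i - j) ** 2 * p)
    idm = np.sum(p / (1.0 + (i - j) ** 2))
    return np.array([asm, contrast, idm])

def mean_and_range_features(patch, d=5, levels=16):
    """Mean and range of each measure over the four directions."""
    patch = np.asarray(patch, dtype=int)
    per_angle = np.array([haralick_subset(glcm(patch, dr, dc, levels))
                          for dr, dc in offsets(d)])
    mean = per_angle.mean(axis=0)
    rng = per_angle.max(axis=0) - per_angle.min(axis=0)
    return np.concatenate([mean, rng])
```

With all 14 measures the same function would return 28 values per modality, and concatenating the four modalities would give the 112 features described above.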

Ensemble SVM
The SVM method, which is based on statistical learning theory, has several advantages. It generalizes well and achieves relatively high precision even when there are relatively few samples [19]. SVM can also deal effectively with nonlinear data by introducing a kernel function; the radial basis function (RBF) kernel can work well on some multimodal MRI images [20,21]. However, a single classifier trained with limited samples cannot meet the requirements of high precision, so an ensemble SVM classifier is constructed instead. According to ensemble learning theory, combining multiple independent sub-classifiers improves the generalization performance of the final classifier.
The implementation of ensemble SVM depends on two factors: how to construct each member classifier and how to fuse the member classifiers into a strong classifier. In this study, we first selected 30 images as training data. For each image, we extracted only the features of the gold standard image region and its morphological extension region. Three-quarters of the data were used as training data; the rest were used as test data to evaluate the classifiers. The optimal number of members of the ensemble classifier was not studied. There are four MRI modalities (Table 1), so we built the ensemble SVM classifier with eight members. For each classifier, we used a bagging-based random sampling method [22] to obtain random samples. To form the ensemble SVM classifier, we employed the AdaBoost algorithm. The algorithm flow is as follows:
Step 1. Initialize the sample weights.
Step 2. For t = 1 to 8, define the member classifier and then update the weights.
Step 3. Output the final weight values.
The details of constructing the ensemble SVM classifier are summarized in Fig. 3. After the ensemble classifier is obtained, it can be used for classification tasks, as illustrated on the right side of Fig. 3.
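How bootstrap sampling and AdaBoost-style weighting can be combined is sketched below. This is the textbook two-class AdaBoost with labels in {-1, +1}, not necessarily the exact variant used here, which has five labels. To keep the sketch self-contained, a nearest-centroid rule stands in for each member; in the method described above each member would be an SVM. All function names are hypothetical.

```python
import numpy as np

def train_centroid(Xs, ys):
    """Stand-in member classifier (nearest class centroid), not an SVM."""
    if np.all(ys == ys[0]):                    # sample contains a single class
        label = ys[0]
        return lambda X: np.full(len(X), label)
    mu_pos = Xs[ys == 1].mean(axis=0)
    mu_neg = Xs[ys == -1].mean(axis=0)
    def predict(X):
        d_pos = np.linalg.norm(X - mu_pos, axis=1)
        d_neg = np.linalg.norm(X - mu_neg, axis=1)
        return np.where(d_pos <= d_neg, 1, -1)
    return predict

def fit_adaboost(X, y, train_member, n_members=8, seed=0):
    """Train members on weighted bootstrap samples and weight them by error."""
    rng = np.random.default_rng(seed)
    n = len(y)
    w = np.full(n, 1.0 / n)                    # Step 1: uniform sample weights
    members, alphas = [], []
    for _ in range(n_members):                 # Step 2: one member per round
        idx = rng.choice(n, size=n, replace=True, p=w)   # weighted bootstrap
        predict = train_member(X[idx], y[idx])
        pred = predict(X)
        err = np.clip(np.sum(w[pred != y]), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)  # member weight
        w = w * np.exp(-alpha * y * pred)      # raise weights of misclassified samples
        w = w / w.sum()
        members.append(predict)
        alphas.append(alpha)
    return members, np.array(alphas)           # Step 3: members and their weights

def predict_ensemble(members, alphas, X):
    """Weighted vote of all members."""
    votes = sum(a * m(X) for m, a in zip(members, alphas))
    return np.where(votes >= 0, 1, -1)
```

Replacing `train_member` with a function that fits an RBF-kernel SVM on the resampled data would bring the sketch closer to the ensemble described in this section.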

Feature ranking and selection
An important part of our study was to evaluate the influence of GLCM texture on brain tumor image segmentation. To this end, the effect of each GLCM texture component on image segmentation should be observed. Thus, we ranked the GLCM texture features that participated in the construction of the classifier.
Furthermore, the uncorrelated variables in the extracted features will slow down the calculation speed in the training and testing process. They may even cause some disturbing effects. We proposed an effective feature ranking and selection method to eliminate the irrelevant variables from the 112 extracted features presented in Table 1. Wang et al. [23] successfully applied SVM-RFE for screening medical image features. The main idea of the RFE method is to repeatedly establish an SVM model and then select the best features based on the coefficients. The specific process is as follows.
Step 1. Suppose there are two sets: FS, which contains all 112 features, and RS, which holds the ranked features. At the beginning, RS is an empty set.
Step 2. One feature in FS is deleted, and the remaining 111 features are used to train the SVM classifier. The classifier is initialized with empirical parameters, and the DSC is calculated. Repeating this procedure for each of the 112 features yields a set of DSC values. The feature whose removal gives the maximum DSC is the feature that contributes the least to the classifier, and it is moved from FS to RS. After the first feature is selected, the second feature is chosen from the remaining 111 in the same way and placed in RS after the first feature.
Step 3. Repeat the above process until FS is empty.
The sorting index of the features selected by each member classifier is shown in Table 2.
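The ranking loop described in the steps above can be sketched independently of the classifier. Here `score_fn` is a hypothetical callback that trains a classifier on the given feature indices and returns its DSC; it is not part of the original method description, and the toy scoring function in the usage lines only illustrates the control flow.

```python
def rank_features_by_elimination(n_features, score_fn):
    """Return feature indices ordered from least to most important."""
    fs = list(range(n_features))    # FS: features not yet ranked
    rs = []                         # RS: ranked features, least important first
    while len(fs) > 1:
        best_feature, best_score = None, float("-inf")
        for f in fs:
            remaining = [g for g in fs if g != f]
            score = score_fn(remaining)         # DSC without feature f
            if score > best_score:
                best_feature, best_score = f, score
        # Removing this feature hurts the score least, so it contributes least.
        fs.remove(best_feature)
        rs.append(best_feature)
    rs.extend(fs)                   # the last remaining feature is the most important
    return rs

# Toy usage: the score is the summed (made-up) importance of the kept features.
importance = [0.1, 0.5, 0.3]
ranking = rank_features_by_elimination(3, lambda kept: sum(importance[k] for k in kept))
```

Because every round retrains one classifier per remaining feature, ranking 112 features this way takes several thousand training runs, which is why the ranking is performed once before the final classifiers are built.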

Training and validation data
The online MR brain tumor data library Brain Tumor Image Segmentation Benchmark 2015 (BraTs2015) was used in the experiments. In the database, T1, T2, T1ce, and FLAIR images are available for each patient. All images have been registered: each modality was linearly aligned to a standard brain, so the voxels of the different modalities correspond to each other. The three-dimensional size of each modal MRI image was 240 × 240 × 155, and the ground-truth label is the result of manual annotation by multiple experts. In this paper, DSC scores were used to evaluate the segmentation results of brain tumors. The DSC indicates the degree of similarity between the experimental segmentation result and the label.
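For reference, the DSC between a predicted mask and a label mask is twice the size of their overlap divided by the sum of their sizes. A minimal sketch for binary masks follows; the function name and the convention of returning 1.0 when both masks are empty are assumptions.

```python
import numpy as np

def dice_coefficient(pred, truth):
    """Dice similarity coefficient of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:                  # both masks empty: treated as perfect agreement
        return 1.0
    overlap = np.logical_and(pred, truth).sum()
    return float(2.0 * overlap / denom)
```

For the complete tumor region, the mask would be formed by merging all tumor labels into a single foreground class before calling the function.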

Segmentation result
MR images from 100 patients were randomly selected as training sets. We evaluated the final model on 30 patients. In the training phase, the image area around the gold standard was selected. In the test phase, we performed pre-segmentation. In image segmentation, we segmented five different labels, one normal and four tumor types: normal brain, necrosis, edema, non-enhancing tumor, and enhancing tumor (Table 3). Overall, the results of the training data were better than those of the test data. Fig. 5 presents an example of low-grade tumor segmentation, whereas Fig. 6 presents an example of high-grade tumor segmentation.

Discussion
In this study, a GLCM texture-based brain tumor segmentation method was evaluated. SVM-RFE was used to determine which components of GLCM texture were most useful for segmentation. The 112 GLCM texture features were ranked using SVM-RFE, and according to the ranking, 60 important features were selected. Among these 60 features are maximal correlation coefficient, information measure of correlation, Angular Second Moment, sum of squares, difference variance, and inverse difference moment. Entropy is often used in applications of GLCM texture, but it was not important in brain tumor segmentation. The same is true for contrast. In the feature ranking, these two measures seldom appeared near the top. In future research, we will focus on the above six components of GLCM texture. These components can be combined with other texture expressions (such as Tamura texture) to represent brain tumor image information. The method of fusing multiple textures will be studied in the future.
In this paper, we built an ensemble SVM classifier that comprised eight trained single classifiers. Based on the DSC value of complete tumor segmentation, we set a weight value for each classifier. Our segmentation results were better than those of previous studies [23,24], which used a single classifier. This improvement was due to the pre-segmentation process and the ensemble SVM classifier. However, there was still a gap between our method and algorithms based on convolutional neural networks (CNNs) [25,26]. When these methods are trained, many samples and extensive expertise are required to ensure proper convergence. A previously proposed method only used T1 MR images [27], and its average DSC value of complete segmentation was only 85.7% for gliomas. When little data is available, the performance of CNN-based methods is also mediocre. In the process of clinical diagnosis, we cannot get sufficient MR data. In our method, GLCM texture and gray features are extracted as classifier features, and the amount of data required is not particularly large. The whole training process is not very complicated. Extracting more features could further improve the segmentation accuracy. In our method, we set the value of d as 5. The spatial context of a voxel is 5 × 5. A larger d might generate better results. However, this factor was not considered due to the increased computational complexity. Additionally, we trained only one model for both low-grade and high-grade tumors, whereas previous studies usually design two models. In clinical practice, it is not always known a priori which tumor type to analyze.

Conclusion
The precise segmentation of brain tumors is the most important and crucial step in their diagnosis and treatment. In future research, we will focus on six components of GLCM texture: maximal correlation coefficient, information measure of correlation, Angular Second Moment, sum of squares, difference variance, and inverse difference moment. These components can be combined with other texture expressions (such as Tamura texture) to represent brain tumor image information. The method of fusing multiple textures will be studied in the future.
SVM: Support vector machine.

Ethics approval and consent to participate
This article does not contain any studies with human participants or animals performed by any of the authors.

Consent for publication
I would like to declare on behalf of my co-authors that the work described was original research that has not been published previously, and not under consideration for publication elsewhere, in whole or in part.