BCM-VEMT: classification of brain cancer from MRI images using deep learning and ensemble of machine learning techniques

Brain cancer has been one of the leading causes of death in recent years. An accurate diagnosis of the cancer type enables specialists to choose the right treatment and save the patient's life. A computer-aided diagnosis system based on image processing that classifies tumor types correctly is therefore of great importance. In this paper, an enhanced approach is proposed that classifies brain tumor types from magnetic resonance imaging (MRI) images using deep learning and an ensemble of machine learning (ML) algorithms. The system, named BCM-VEMT, distinguishes four classes: three categories of brain cancer (Glioma, Meningioma, and Pituitary) and a non-cancerous Normal class. A convolutional neural network is developed to extract deep features from the MRI images. These deep features are fed into ML classifiers to classify the cancer types. Finally, a weighted average ensemble combines the results of the individual ML classifiers to achieve better performance. The dataset contains a total of 3787 MRI images across the four classes. BCM-VEMT achieves 97.90% accuracy for the Glioma class, 98.94% for Meningioma, 98.92% for Pituitary, and 98.00% for the Normal class. BCM-VEMT can be of great significance to the medical sector in classifying brain cancer types.


Introduction
"Tumor" means swelling of a part of the body, normally without inflammation, due to abnormal growth of tissue, either benign or malignant [22]. A brain tumor is an accumulation of abnormal cells in certain tissues of the brain. Benign tumors are not considered cancerous and do not spread to other parts of the brain or body. Malignant tumors are considered cancerous, grow uncontrollably, and can spread to other parts of the body [19]. Every year, 12.7 million people worldwide are diagnosed with cancer, and 7.6 million people die of it [23]. The National Brain Tumor Society says that in the United States, 700,000 individuals are living with a primary brain tumor, and in 2021, around 85,000 more will be diagnosed [20]. According to the National Brain Tumor Society, there are more than 120 types of brain tumors [5]. Among these types, Glioma, Meningioma, and Pituitary tumors are the most common. These three types account for almost 75% of all brain tumors: 45% for Glioma, 15% for Meningioma, and 15% for Pituitary [30].
Magnetic Resonance Imaging (MRI) is one of the most refined and widely used methods for obtaining high-resolution images of any part of the body. It is regarded as one of the most accurate techniques for cancer detection and classification because of its high-quality images of brain tissue [17]. Tumors may have different shapes, and an image may not contain enough clearly visible cues. For these reasons, human diagnosis can be inconsistent, and a misdiagnosis can cost the patient's life. New technologies, particularly artificial intelligence and machine learning, are having a significant effect on the clinical field by helping to reduce diagnostic errors. Different machine learning methods are applied to MRI images for segmentation and classification. Deep learning is a kind of machine learning in which a layered algorithmic architecture is used to analyze data. In a deep learning model, data is filtered through several layers. Each successive layer uses the output of the previous layer to find correlations and connections, which increases the accuracy of the model. By introducing more advanced data analysis, deep learning can make medical research easier and more reliable. Deep learning is becoming popular in medical research because it can process huge amounts of data, including research data and patient outcomes, in a very short time and with excellent accuracy. Proper treatment of brain cancer patients depends on the proper identification of brain tumors. Manual diagnosis can be time-consuming, and an incorrect diagnosis can put a patient's life in critical condition. With deep learning or machine learning, tumor types can be classified quickly, experts can make rapid treatment decisions, and the best treatment can be delivered to patients.
In this paper, a new system, BCM-VEMT, is proposed. BCM-VEMT is an automated brain cancer classification system trained on a large dataset to ensure good performance and accuracy. The majority of brain cancer detection systems use a two-class (Cancer vs. Non-Cancer) or three-class classifier. In contrast, the proposed system is a four-class classifier with greater accuracy than existing brain cancer detection systems. A convolutional neural network (CNN) with a straightforward structure is built into the system. The CNN extracts unique and distinct features from MRI images. These extracted features are then fed into machine learning (ML) models to classify the brain cancer types. Finally, an ensemble technique combines the ML classifiers' results using a weighted average, producing a better result than any single ML classifier.
The main contributions of this paper are as follows: i) A CNN model with a straightforward structure is trained to extract deep features from MRI images to classify three brain cancer types and a normal type. ii) A weighted ensemble technique is introduced for classification using five different machine learning classifiers: Random Forest (RF), Support Vector Machine (SVM), AdaBoost (AB), K-Nearest Neighbors (KNN), and Logistic Regression (LR). iii) The system is trained on a larger dataset than existing systems to ensure better performance and prevent overfitting.
The rest of the paper is organized as follows: Section 2 reviews recent research on the detection and classification of brain cancer types from MRI images using deep neural networks and ML algorithms. Section 3 describes the dataset, data preprocessing, data augmentation, and the proposed models of BCM-VEMT. Section 4 presents the performance analysis of BCM-VEMT with evaluation metrics. Section 5 presents a comparative analysis of BCM-VEMT with existing systems. Finally, Section 6 concludes the paper.

Related work
Over the last few years, many studies have proposed automated systems to detect and classify brain cancer. This section describes current research on deep learning and ensembles of ML classifiers for detecting and classifying brain cancers. Table 1 shows a comparative analysis of this recent research.
Ketan Machhale et al. [18] suggested a system that distinguishes abnormal from normal brain MRI images. In their research, a hybrid classifier (SVM-KNN) combining the Support Vector Machine (SVM) and K-Nearest Neighbor (KNN) was used to classify 50 images. They reported an accuracy of 96% using SVM with a quadratic kernel and 98% using the hybrid classifier (SVM-KNN). Ali Pashaei et al. [25] proposed a neural network and Extreme Learning Machine (ELM) model for classifying brain MRI images. The study used 3064 brain MRI images from 233 patients with Meningiomas (708 images), Gliomas (1426 images), and Pituitary tumors (930 images). The dataset was split into 70% for training and 30% for testing. The ensemble model of CNN and kernel ELM (KE-CNN) achieved an accuracy of 93.68%. Jaeyong Kang et al. [15] used several pre-trained deep CNN models based on transfer learning for feature extraction and several ML models for brain tumor classification. The top three deep features were concatenated to form an ensemble of deep features, which was fed into several ML classifiers to produce the final outcome. The researchers performed a series of experiments on three different brain MRI datasets. For a small dataset with two classes (normal and tumor), DenseNet-169 was a good choice. For a large dataset with four classes, the ensemble of DenseNet-169, ShuffleNet, and MnasNet with SVM achieved an accuracy of 93.72%. Kimia et al. [27] proposed a scheme for the segmentation and classification of three types of brain tumors. Firstly, a support vector machine (SVM) was used to extract significant features. Then the system classified brain tumors with k-nearest neighbor (KNN), weighted kernel width SVM (WSVM), and histogram ...
Using MRI images of the brain, Raheleh Hashemzehi et al. [11] proposed a hybrid approach with convolutional neural networks (CNNs) and neural autoregressive distribution estimation (NADE) to classify brain cancer. They extracted the features and automatically estimated the distribution of the data. According to their study, the proposed system achieved a classification accuracy of 95%, a sensitivity of 94.64%, a specificity of 97.42%, and a precision of 94.49%. Hassan Ali Khan et al. [16] demonstrated a new method to categorize brain MRI scans as cancerous or non-cancerous. The approach is based on a convolutional neural network and augmented data. In this research, the Kaggle dataset was used to train and test the model. The dataset consists of three classes, namely malignant, benign, and non-cancerous, with corresponding proportions of 73%, 19%, and 8% for training. Although the CNN reached an accuracy of 100%, the dataset consisted of only 253 images. Zar Nawab Khan et al. [29] classified brain tumors from MRI images by using deep CNNs and fine-tuning blocks derived from transfer learning. The proposed model was trained and tested with the Figshare dataset, which consists of 3064 T1-weighted contrast-enhanced images of 233 patients, categorized into three types of tumors, i.e., Meningioma (708 images), Glioma (1426 images), and Pituitary tumor (930 images). Three pre-trained CNN models (AlexNet, VGG16, and VGG19) were compared to determine which performed better. The VGG-16 model achieved an accuracy of 94.65%, a sensitivity of 93.51%, a specificity of 94.56%, and a precision of 89.17%. P. Varalakshm et al. [9] proposed AlexNet to classify different types of tumors. The Region Proposal Network (RPN) of the Faster R-CNN algorithm was also used to train their model. The research dataset was obtained from www.sciencesource.com and www.radiologyassistant.nl and consists of 50 MRI images. They obtained an accuracy above 99%.
Yakub Bhanothu et al. [4] proposed the Faster R-CNN and Region Proposal Network (RPN) to recognize tumors and mark their location. The dataset contains 2406 MR images of three different types of tumors: Gliomas, Meningiomas, and Pituitary tumors. The dataset was divided into two parts: 80% for training and 20% for testing. VGG-16 was selected as the base layer for Faster R-CNN and RPN. The system achieved an overall average precision of 77.60%. Sarah Ali et al. [1] proposed residual networks to classify brain tumor images. They evaluated their proposed model with a dataset that contains 3064 MRI images classified into three classes (Glioma, Meningioma, and Pituitary tumors). Their approach reached an accuracy of 99%. Aaswad Sawant et al. [28] used a Convolutional Neural Network (CNN) with 5 layers to detect brain cancer from MRI images. They used a dataset containing 1800 MRI images, of which 900 were non-cancerous and 900 were cancerous. Their system achieved 99% training accuracy and 98.6% validation accuracy. Swati et al. [30] proposed a novel feature extraction framework based on VGG-19. They performed their experiment on a publicly available CE-MRI dataset that contains 3 types of brain tumors. The dataset consists of 3064 images of 233 patients across the coronal, axial, and sagittal views. This system obtained 96.13% accuracy.

Methodology
To develop BCM-VEMT, the dataset was collected from various sources. Reliable and trustworthy patient-based datasets are scarce, and a limited patient-based dataset may not be enough to develop a system with high performance and accuracy. The system therefore classifies brain cancer using an image-based dataset. The full dataset is divided into a training set, a validation set, and a test set. Different preprocessing techniques are applied to the dataset to make it suitable for use. The training and validation sets are used to train the proposed CNN model. The model ran for 43 epochs, controlled by the EarlyStopping function to avoid overfitting and underfitting. After training the CNN model, the first fully connected layer (FCL) of the CNN is used to extract a feature vector for each training image. These feature vectors are used to train the ML models. Finally, the ML models are combined using an advanced ensemble technique that weights the predictions of each ML model according to the skill and capability of the classifier. The performance is analyzed using the confusion matrix, accuracy, precision, recall, specificity, and F1-score. The entire architecture of BCM-VEMT is given in Fig. 1.

Description of dataset
The dataset of BCM-VEMT consists of four classes: Glioma, Meningioma, Normal, and Pituitary. The MRI images were collected from three publicly available brain tumor datasets [10,13,14]. MRI images are used because magnetic resonance technology captures a more detailed and clearer visualization of brain tissue than other imaging technologies currently available. This allows neural networks to extract deep and accurate features from the images more easily, resulting in higher accuracy. 2356 MRI images, including 1426 images of the Glioma class and 930 images of the Pituitary class, were collected from General Hospital, Tianjin Medical University, China, and Nanfang Hospital, during the period from 2005 to 2010. This dataset contains MRI scan images collected from 233 cancer patients. Slices from the same patient may appear in both the training set and the test set. Cheng et al. [6] first processed the dataset to develop a classification system for brain cancer types. These MRI images of the Glioma and Pituitary classes are available at figshare [10]. A further 1431 MRI images were acquired from two public Kaggle datasets: 396 images of the normal class and 937 images of the Meningioma class from [14], and another 98 images of the normal class from [13]. These 98 normal images were merged with the 396 normal images mentioned above. Samples of the four classes are shown in Fig. 2. In total, the dataset of BCM-VEMT consists of 3787 MRI images: 1426 of the Glioma class, 937 of the Meningioma class, 494 of the normal class, and 930 of the Pituitary class.

VGG-16
The VGG-16 is a convolutional neural network architecture trained on the ImageNet [8] dataset. The dataset consists of more than 14 million images belonging to 1000 classes. The distinguishing feature of the VGG-16 architecture is that, instead of having more parameters, it focuses on convolution layers with a 3 × 3 kernel size. The model expects input images of 224 × 224 pixels with 3 channels. The benefit of small kernels is that they help to avoid overfitting. Activation functions are used in the neural network to decide whether a neuron should be activated, based on the weighted sum of its inputs. The network contains neurons that work in coordination with weights, biases, and their respective activation functions. The weights and biases of the neurons are updated according to the error at the output. The activation function brings non-linearity to the neural network, allowing it to learn and perform complex tasks.
Fig. 3 Finding extreme points in contours and clipping them (input and output)

Transfer learning
Transfer learning refers to the procedure of using a model that was previously trained for a different classification problem. Transfer learning is commonly used to overcome the problem of insufficient data and to reduce training time. In deep learning, there are many pre-trained models that can be reused for a similar classification problem, with or without adding extra layers to fit the new task better.
To train BCM-VEMT, VGG-16 is used as the Convolutional Neural Network (CNN). A pre-trained VGG-16 model with some additional layers is used. First, the pre-trained VGG-16 model is loaded as the base model. Then a sequential model is created and the base model is added to it. A flatten layer converts the pooled feature map into a single column, which is passed to the fully connected layer. Two dropout layers with a rate of 0.5 each are used to prevent the model from overfitting. A fully connected layer with 128 neurons is added to the model to extract the features from MRI images. Another fully connected layer, with one neuron per class and a "softmax" activation function, is added as the output layer. The architecture of the proposed CNN model is shown in Fig. 5.
In BCM-VEMT, the minimum training loss is monitored with a patience value of 10. The "RMSprop" optimizer is used with a learning rate of 0.0001. A summary of the proposed CNN model is shown in Table 3.
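As a rough check on the layer stack described above, the following Python sketch traces tensor shapes through a VGG-16 convolutional base and the added head. It assumes the standard VGG-16 layout (five blocks of same-padded 3 × 3 convolutions, each ending in a 2 × 2 max-pool) and a 224 × 224 × 3 input. The helper names are hypothetical, and the resulting numbers are computed here rather than taken from Table 3.

```python
# Output channels of the five convolutional blocks in the standard VGG-16 layout.
VGG16_BLOCK_CHANNELS = [64, 128, 256, 512, 512]


def vgg16_base_output_shape(height=224, width=224):
    """Shape of the feature map leaving the VGG-16 convolutional base.

    Same-padded 3x3 convolutions keep the spatial size, so only the
    2x2 max-pool at the end of each block halves height and width.
    """
    for _ in VGG16_BLOCK_CHANNELS:
        height //= 2
        width //= 2
    return height, width, VGG16_BLOCK_CHANNELS[-1]


def dense_params(n_inputs, n_units):
    """Trainable parameters of a fully connected layer (weights + biases)."""
    return n_inputs * n_units + n_units


h, w, c = vgg16_base_output_shape()
flattened = h * w * c                       # length of the vector after Flatten
feature_layer = dense_params(flattened, 128)  # first FCL, used for feature extraction
output_layer = dense_params(128, 4)           # softmax layer, one unit per class

print("base output:", (h, w, c))
print("flattened length:", flattened)
print("Dense(128) parameters:", feature_layer)
print("Dense(4) parameters:", output_layer)
```

Dropout layers are left out of the trace because they add no parameters and do not change the shape.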

Feature extraction
In BCM-VEMT, the feature vectors are extracted from the first Fully Connected Layer (FCL) of the CNN. Table 3 shows that the first FCL is "dense", which has 128 neurons, so each feature vector has a dimension of 1 × 128. BCM-VEMT has a training set of 2651 images and a validation set of 756 images, and deep features are extracted from each of these images. Each image is therefore represented by 128 features, and the input set used to train the ML classifiers has a dimension of 3407 × 128. This input data is passed to the five ML classifiers.
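The bookkeeping behind these sizes can be checked with a few lines of Python. The class counts and split sizes are the ones reported in the paper. The size of the held-out test set is not stated in this section, so the last line is an assumption: it treats every image outside the training and validation sets as a test image.

```python
# Class counts reported in the dataset description.
class_counts = {"Glioma": 1426, "Meningioma": 937, "Normal": 494, "Pituitary": 930}
total_images = sum(class_counts.values())

# Split sizes reported for feature extraction.
n_train, n_val = 2651, 756
n_features = 128  # neurons in the first fully connected layer

# One 128-dimensional feature vector per training or validation image.
feature_matrix_shape = (n_train + n_val, n_features)

# Assumption: the remaining images form the test set (not stated explicitly here).
n_test_assumed = total_images - (n_train + n_val)

print("total images:", total_images)
print("ML training input shape:", feature_matrix_shape)
print("assumed test images:", n_test_assumed)
```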
Fig. 5 The Proposed CNN Architecture

Weighted average ensemble
Weighted Average Ensemble is an advanced ensemble technique that combines the predictions of multiple models according to the capability and skill of each model. While combining the models, it assigns each model a weight that is proportional to its ability to classify the problem correctly. A model that performs better is given a bigger weight and thus plays a larger role in the decision.
To train the ML classifiers, different hyperparameters are tuned to fit the models to the dataset and to achieve better performance with higher accuracy. The Grid Search Technique is applied to find the best parameters for each ML classifier, with 10-fold cross-validation. Each of these ML classifiers individually performs better than the CNN model. An ensemble classifier is then developed that combines all five ML classifiers. All the developed ML classifiers are used as the 'estimators' of the ensemble technique, and a corresponding 'weight' value for each classifier is provided as input to the ensemble model. In BCM-VEMT, the 'accuracy' value of each individual ML classifier is chosen as its 'weight' value. Through the weight values, the ensemble technique assigns an effectiveness priority to each classifier rather than assuming that every classifier is equally effective. BCM-VEMT combines the predictions of the ML models using these weights and makes a final decision. This ensemble technique performed better than any individual ML model and achieved a higher accuracy. The process of training the ML classifiers and the ensemble technique is shown in Fig. 6.
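The weighting idea can be illustrated with a minimal Python sketch. It assumes that the ensemble averages class probabilities (soft voting), which the text does not state explicitly. The weights are the overall accuracies reported for the five classifiers in Table 7, while the per-classifier probabilities and the function name are hypothetical.

```python
CLASSES = ["Glioma", "Meningioma", "Normal", "Pituitary"]


def weighted_average_ensemble(probabilities, weights):
    """Weighted average of per-classifier class probabilities.

    probabilities: {classifier name: [p(class 0), p(class 1), ...]}
    weights:       {classifier name: weight}, here each classifier's accuracy
    """
    total_weight = sum(weights[name] for name in probabilities)
    n_classes = len(next(iter(probabilities.values())))
    combined = [0.0] * n_classes
    for name, probs in probabilities.items():
        for k, p in enumerate(probs):
            combined[k] += weights[name] * p / total_weight
    return combined


# Weights: overall accuracies of the individual classifiers (Table 7).
weights = {"RF": 0.9763, "SVM": 0.9815, "AB": 0.9815, "KNN": 0.9711, "LR": 0.9711}

# Hypothetical predicted probabilities for a single test image.
probabilities = {
    "RF":  [0.60, 0.30, 0.05, 0.05],
    "SVM": [0.40, 0.50, 0.05, 0.05],
    "AB":  [0.45, 0.45, 0.05, 0.05],
    "KNN": [0.70, 0.20, 0.05, 0.05],
    "LR":  [0.35, 0.55, 0.05, 0.05],
}

combined = weighted_average_ensemble(probabilities, weights)
predicted = CLASSES[max(range(len(CLASSES)), key=combined.__getitem__)]
print(predicted, [round(p, 4) for p in combined])
```

Because the accuracies are close to one another, the weights differ only slightly, so their effect is largest when the individual classifiers are nearly evenly split.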

Metrics for performance evaluation
To evaluate the performance of BCM-VEMT, five metrics are used: accuracy, precision, recall, specificity, and F1-score. These metrics are computed from four terms: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The formulas for deriving the metrics from these four terms are shown in (1), (2), (3), (4), and (5).
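The equations themselves are not reproduced in this text. For reference, the standard definitions of the five metrics are given below, in the order in which the metrics are listed. They are written in terms of the confusion-matrix counts TP, TN, FP, and FN (true/false positives/negatives), and it is an assumption that they coincide with the paper's equations (1) to (5).

```latex
\begin{align}
\text{Accuracy}    &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision}   &= \frac{TP}{TP + FP} \\
\text{Recall}      &= \frac{TP}{TP + FN} \\
\text{Specificity} &= \frac{TN}{TN + FP} \\
\text{F1-score}    &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\end{align}
```

In a four-class problem these quantities are computed per class in a one-vs-rest fashion: the class of interest is treated as positive and the other three classes together as negative.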

Experimental analysis
In BCM-VEMT, the CNN model was trained with the training and validation sets. The results were obtained using tenfold cross-validation. The CNN model ran for 43 epochs. EarlyStopping was used to control the number of epochs. EarlyStopping is a regularization method that monitors a defined quantity with a pre-defined patience value and automatically halts training when the model stops improving.
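The patience mechanism can be sketched in plain Python as follows. This is an illustration of the general rule, not the library implementation used in the experiments: the function name is hypothetical, and the loss curve is made up so that the loss improves for 33 epochs and then stalls, which makes a patience of 10 halt training at epoch 43.

```python
def early_stopping_epoch(losses, patience=10, min_delta=0.0):
    """Return the epoch (1-based) at which training would halt.

    Training stops once the monitored loss has failed to improve on its
    best value for `patience` consecutive epochs. If that never happens,
    the last epoch is returned.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best - min_delta:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(losses)


# Hypothetical loss curve: steadily improving for 33 epochs, then flat.
improving = [1.0 / epoch for epoch in range(1, 34)]
stalled = [0.05] * 20
stopped_at = early_stopping_epoch(improving + stalled, patience=10)
print("training halted at epoch", stopped_at)
```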

Experimental setup
The experiment was run on Google Colaboratory, a Google cloud service based on Jupyter Notebook. Python was used for its extensive libraries and simplicity. Google Drive was used to import the dataset. Google Colaboratory provided 68.35 GB of disk space and 12.69 GB of RAM, with access to a free-of-charge GPU.

Result analysis
The training and validation accuracy of the CNN model is shown in Fig. 7. Training accuracy is 95.00% and loss is 1.61. Validation accuracy is 97.34% and loss is 0.79. To train the individual ML classifiers, different hyperparameters were tuned using the grid search technique. Table 4 lists the best hyperparameters for each of the ML classifiers.
Ensemble approaches add to the computing cost and complexity of the problem. This increase is due to the skill and time necessary to train and maintain many models as opposed to just one. However, an ensemble is recommended in the proposed system because it can reduce the variance component of the prediction error by introducing bias. It can generate better predictions and perform better than any single contributing model. Table 5 describes the computation time of each classifier and of the ensemble of classifiers. Table 5 shows that, among the five ML classifiers, the RF and AB classifiers took comparatively long to train, because these two classifiers are themselves ensemble techniques. No grid search was applied to the proposed ensemble of classifiers. The weighted ensemble technique was simply applied to the five individual ML classifiers that had already been trained, and the training time of those individual models was not counted in the training time of the ensemble. That is why the training time of the ensemble model was shorter than that of the other ML models. In testing, the KNN classifier takes much more time than the others because, at run time, it calculates the distance from every training sample to each test sample. From these distances, the model selects the K smallest and then makes a decision. The LR classifier took the least testing time. The evaluation metrics of BCM-VEMT are shown in Table 6. The accuracy value measures how accurately a classifier can classify a sample based on its training data. The CNN and KNN achieved their best per-class accuracy for the Glioma class. The SVM, AdaBoost, and the ensemble of classifiers achieved their highest accuracy for the Meningioma class. Random forest and logistic regression achieved their highest accuracy for the normal class. None of the classifiers showed its individual best accuracy for the Pituitary class. RF, SVM, AB, LR, and the ensemble of classifiers all achieved a precision of 100% for the normal class. This means that these classifiers did not misclassify any sample of another class as normal. CNN and KNN achieved their best recall for the Glioma class. The SVM, AdaBoost, and the ensemble of classifiers achieved their highest recall for the Meningioma class. Random forest and logistic regression achieved their best recall for the normal class. None of the classifiers showed its individual best recall for the Pituitary class. Specificity measures how well a classifier recognizes samples that do not belong to the target class. In this analysis, all the models performed best for the Normal class. In terms of F1-score, CNN and AdaBoost showed their highest values for the Glioma class, and KNN achieved its best performance for the Meningioma class. The remaining classifiers showed their highest F1-score for the normal class. Finally, the ensemble of classifiers achieved excellent performance on every metric for all classes.
The system's overall accuracy is shown in Table 7. Initially, the CNN achieved 96.32% accuracy. Each ML model then achieved a higher accuracy than the CNN: 97.63% for RF, 98.15% for SVM, 98.15% for AB, 97.11% for KNN, and 97.11% for LR. Among the five ML models, SVM and AB achieved the highest accuracy, which means that these two models received the largest weights in the ensemble. Finally, the ensemble classifier achieved an overall accuracy of 98.42%. The ROC curves of the CNN, the ML models, and the ensemble of classifiers are shown in Fig. 9. In these curves, the False Positive Rate (FPR) and True Positive Rate (TPR) are plotted on the x-axis and y-axis, respectively. A higher area under the ROC curve (AUC) reflects a better ability of the model to identify the true classes of the classification problem. Figure 9 shows that each individual classifier achieves a good AUC. The ensemble of classifiers also shows a very high AUC for all four classes: 0.9867 for the Glioma class, 0.9877 for the Meningioma class, 0.9985 for the Normal class, and 0.9876 for the Pituitary class. These values indicate that BCM-VEMT is highly effective at assigning each class its true label.
Figure 10 shows the per-class ROC curve for each classifier used in BCM-VEMT. For the Glioma class, the AdaBoost classifier obtained the highest AUC, 0.9888, and thus showed the best result for that class. For the Meningioma and Normal classes, the highest AUC values are 0.9929 and 1.0, respectively, both obtained by the Random Forest classifier. This indicates that the RF classifier separates the Meningioma and Normal classes best. AdaBoost and the proposed ensemble classifier jointly achieved the highest AUC for the Pituitary class, 0.9876. On average, however, the proposed ensemble of classifiers shows a consistently high AUC for all the classes. For this reason, it outperforms the other classifiers in overall performance.
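For intuition about what an AUC value measures, the sketch below computes it for one class in a one-vs-rest setting, using the equivalent ranking formulation: the AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The labels and scores are hypothetical, and the function name is illustrative.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    labels: 1 for the class of interest, 0 for all other classes
    scores: predicted probability of the class of interest
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical one-vs-rest example: 1 = Glioma, 0 = any other class.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print("AUC:", round(auc, 4))
```

An AUC of 1.0, as reported for Random Forest on the Normal class, means that every positive sample is scored above every negative sample.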

Discussion
Various ML and DL techniques are combined in the proposed system to create a better classification model. According to Table 7, the overall accuracy of the proposed CNN model is 96.32%. The accuracy is increased by passing the features extracted from the first Fully Connected Layer (FCL) to ML algorithms and training them. Among the ML models, SVM and AdaBoost achieve the highest accuracy, 98.15%. Finally, the weighted average ensemble of ML algorithms attains the maximum accuracy, 98.42%.

Model selection for CNN and ML
To develop BCM-VEMT, VGG-16 is chosen as the proposed CNN model because this pre-trained model achieves 92.7% top-5 accuracy on the ImageNet dataset, which consists of more than 14 million images belonging to 1000 classes. It was among the best-performing deep neural models in ILSVRC-2014 [12]. The system was also tested with different individual DL algorithms for classifying brain cancer from MRI scans, but the accuracy of those models was lower than that of the proposed CNN. Table 8 shows a performance comparison between the proposed CNN and one state-of-the-art deep learning model, Faster R-CNN, for classifying brain tumors. Both models were trained with the same dataset used in this system. All evaluation metrics (accuracy, precision, recall, specificity, F1-score) depend on the true positive and false positive counts, which is why both models show the same values for the Pituitary class. Although the accuracy for the Pituitary class is equal for both models, the proposed CNN performed better than Faster R-CNN for the Glioma and Meningioma classes. Faster R-CNN performed better for the normal class, but this matters less in practice. A patient who has cancer should never be labeled "unaffected": an extra medical checkup can show that a flagged person does not have cancer, but the disease cannot be found in a person who has already been dismissed as healthy.
In ML- and DL-based classification studies, accuracy tends to fall as the number of classes increases. Although Sarah Ali et al. [1] obtained 99% accuracy using ResNet50 for brain cancer classification, their system is a three-class classifier. BCM-VEMT, in contrast, is a four-class classification system and achieves an accuracy of 98.42%. Considering all these cases, the VGG-16 model is selected first for feature extraction, and the ML models are then integrated. To control the number of epochs, EarlyStopping is used, with the minimum loss monitored and a patience value of 10. EarlyStopping helps to overcome overfitting and underfitting. With this method, the CNN model ran for 43 epochs. The system was also tested with 100, 150, and 200 epochs, and a comparative performance analysis is shown in Table 9. The table shows that the accuracy of the proposed VGG-16 model is highest with 43 and 100 epochs. Although the validation accuracy increased for 100 epochs, the testing accuracy remained the same. For 150 and 200 epochs, the validation accuracy increased, but the testing accuracy went down, so the model overfits when trained for extra epochs. This result indicates that the number of epochs selected by EarlyStopping was optimal.
After VGG-16, five different ML models are selected for classification. RF is chosen because it handles large, high-dimensional datasets and helps prevent overfitting. AdaBoost is selected because it performs well on multiclass problems and is less prone to overfitting. Although AdaBoost is itself an ensemble method, the goal of the proposed system is to create a model with superior performance and accuracy, and the system's performance improves after the weighted average ensemble is applied to AdaBoost and the other ML techniques: the accuracy increases from 98.15% to 98.42%, as indicated in Table 7. For this reason, AdaBoost is included as a member of the proposed ensemble. LR is chosen because it is a straightforward yet highly successful classification algorithm and achieved an accuracy of 97.11% in the system. The term "regression" refers to the process of determining a relationship among variables. For example, linear regression seeks to establish a linear relationship (equation) between the input and output variables. Similarly, logistic regression models the probability of class membership by applying the logistic function to a linear combination of the features. The classification is then made from the probabilities: if the probability is greater than a specific threshold, the sample is assigned to class 1, otherwise to class 0. The words "regression" and "classification" seem to contradict each other. However, the emphasis in logistic regression is on the word "logistic," which refers to the logistic function that performs the classification work in the method. Finally, KNN and SVM are selected because both can handle high-dimensional data and are less prone to overfitting.
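The description of logistic regression above can be made concrete with a small binary example in Python. The weights, bias, and feature values are hypothetical and the function names are illustrative; a real model would learn the parameters from the 128-dimensional feature vectors.

```python
import math


def logistic(z):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(weights, bias, features):
    """Probability of class 1: logistic function of a linear combination."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return logistic(z)


def classify(weights, bias, features, threshold=0.5):
    """Assign class 1 if the probability exceeds the threshold, else class 0."""
    return 1 if predict_probability(weights, bias, features) > threshold else 0


# Hypothetical parameters and samples, for illustration only.
weights = [0.8, -0.4, 1.2]
bias = -0.5
sample_a = [1.0, 0.5, 0.5]   # linear score: -0.5 + 0.8 - 0.2 + 0.6 = 0.7
sample_b = [0.0, 1.0, 0.0]   # linear score: -0.5 - 0.4 = -0.9

for sample in (sample_a, sample_b):
    probability = predict_probability(weights, bias, sample)
    print(sample, round(probability, 4), classify(weights, bias, sample))
```

For a four-class problem such as this one, the same idea is usually extended either by fitting one such model per class or by replacing the logistic function with a softmax over four linear scores. Which variant the system uses is not stated here.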

Ensemble technique selection
To choose the optimal ensemble method, majority voting and stacking ensemble techniques were also applied. The per-class performance analysis of the three ensemble techniques is shown in Table 10, and the overall accuracy comparison is shown in Table 11. Tables 10 and 11 show that the proposed weighted average ensemble outperformed both the majority voting and the stacking ensemble. For this reason, the weighted average ensemble was chosen as the ensemble technique of BCM-VEMT. Table 6 shows that the performances of the different ML classifiers and the ensemble of classifiers are very close. The system focuses on its overall performance rather than on the ability to perform better in any specific class. As the system is developed for multiclass classification, the performance on any single class cannot reflect the efficiency of the system; the system needs to perform well across all the classes on average. Table 7 shows that the ensemble of classifiers has the best overall accuracy, so it is considered the better method for classifying these four classes. A statistical boxplot analysis of the average performances of the individual ML classifiers and the ensemble of classifiers is shown in Fig. 10. The boxplot is drawn using the accuracy values of the ML classifiers and the ensemble of classifiers obtained by running them 30 times on the dataset. The red lines indicate the median values, and the green triangles indicate the means of the performances. Both the mean and the median of the ensemble are better than those of any individual ML classifier. For this reason, the ensemble was chosen as the final output model instead of any single ML classifier (Fig. 11).
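The difference between a weighted average ensemble and majority voting can be illustrated with a minimal sketch. The class probabilities and the classifier weights below are hypothetical; how BCM-VEMT derives its weights is not stated in this section, so the weights are an assumption made only for illustration.

```python
from collections import Counter

CLASSES = ["Glioma", "Meningioma", "Normal", "Pituitary"]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


def weighted_average(probabilities, weights):
    """Weighted mean of per-classifier class probabilities."""
    total = sum(weights)
    n_classes = len(probabilities[0])
    return [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]


def majority_vote(probabilities):
    """Class index predicted by the largest number of classifiers."""
    votes = [argmax(p) for p in probabilities]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical outputs of three classifiers for a single MRI image.
probs = [
    [0.40, 0.35, 0.05, 0.20],  # classifier A: weakly prefers Glioma
    [0.45, 0.40, 0.05, 0.10],  # classifier B: weakly prefers Glioma
    [0.05, 0.90, 0.02, 0.03],  # classifier C: strongly prefers Meningioma
]
weights = [1.0, 1.0, 3.0]  # assumed weights, e.g. reflecting trust in C

averaged = weighted_average(probs, weights)
print(CLASSES[argmax(averaged)], CLASSES[majority_vote(probs)])
```

In this invented case the two schemes disagree: majority voting counts two weak votes for Glioma, whereas the weighted average takes both the confidence of each classifier and its weight into account and selects Meningioma.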

Comparison with existing studies
Table 12 shows the comparative performance evaluation of BCM-VEMT against other recent studies. The binary classifier in [18] applied an ensemble technique to SVM and KNN and achieved an accuracy of 98%. The system in [27] achieved an accuracy of 92.46% using the dataset [10] with three classes. The systems proposed in [24,25] achieved accuracies of 93.68% and 93.72%, respectively, with three classes. Despite being a four-class model, BCM-VEMT outperforms all these systems with an accuracy of 98.42%. The system in [15], with four classes, achieved an accuracy of 93.72% using the datasets from Kaggle [13,14]. The proposed system outperformed it with the same four classes, using the datasets from [10,13,14]. Compared with these systems, BCM-VEMT shows better performance in classifying brain cancers, which can help experts diagnose more reliably and accurately.

Conclusion
The number of people diagnosed with brain cancer is growing day by day all over the world. In this paper, an enhanced approach has been proposed that can classify brain cancer types using deep learning and an ensemble of machine learning techniques on MRI images. Various data preprocessing and data augmentation techniques were applied to enhance the quality of the images and to increase the overall accuracy. The system was trained on a dataset of 3787 images in total across 4 classes. The performance of BCM-VEMT was measured using several metrics, namely Accuracy, Precision, Recall, Specificity, and F1-Score. The system has achieved 97.90% accuracy for the Glioma class, 98.94%

3.2 Data preprocessing

3.3 Data augmentation
Deep learning techniques generally work better on bigger datasets. One way of overcoming the lack of a sufficient amount of training data is data augmentation. The term data augmentation refers to the process of expanding the training set by applying different types of manipulation techniques to the existing training samples while preserving their labels. By applying these kinds of

Fig. 2 Sample images of the brain cancer dataset with their class labels: (a) Glioma, (b) Meningioma, (c) Normal, (d) Pituitary

Fig. 4 Visualization of data augmentation: (a) Original samples, (b) Rotated samples, (c) Horizontally shifted samples, (d) Vertically shifted samples, (e) Horizontally flipped samples, (f) Vertically flipped samples, (g) Sheared samples, (h) Brightness-manipulated samples

Fig. 6 Training process of the ML classifiers and ensemble of classifiers

$$F1\text{-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

Fig. 7 Performance analysis of the CNN model used in BCM-VEMT: (a) Training and validation accuracy, (b) Training and validation loss

Fig. 9 ROC curve representation of all classifiers used in BCM-VEMT

Fig. 10 ROC curve representation of class vs. classifiers

Table 1
A comparative study of current research on brain cancer detection and classification

Table 3
Model summary of the proposed CNN model of BCM-VEMT

Table 4
Tuned hyperparameters of ML classifiers used in BCM-VEMT

Figure 8 shows the confusion matrices of the CNN model, the ML classifiers, and the ensemble of classifiers. The CNN model correctly classifies 140 Glioma cases, 91 Meningioma cases, 47 Normal cases, and 88 Pituitary cases, and misclassifies 3 cases each of the Glioma, Meningioma, and Normal classes, along with 5 Pituitary cases. The RF classifier correctly classifies 139 Glioma cases, 93 Meningioma cases, all 50 Normal cases, and 89 Pituitary cases, and misclassifies 4 Glioma cases, a single Meningioma case, and 4 Pituitary cases. The SVM classifier correctly classifies 140 Glioma cases, 93 Meningioma cases, 49 Normal cases, and 91 Pituitary cases. The AB classifier correctly classifies 141 Glioma cases, 93 Meningioma cases, 47 Normal cases, and 92 Pituitary cases. The KNN classifier correctly classifies 141 Glioma cases, 92 Meningioma cases, 48 Normal cases, and 88 Pituitary cases. The LR classifier correctly classifies 140 Glioma cases, 92 Meningioma cases, 49 Normal cases, and 88 Pituitary cases. Finally, the ensemble of classifiers achieved the best performance by correctly classifying 140 Glioma cases, 93 Meningioma cases, 49 Normal cases, and 92 Pituitary cases.
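As a worked check, the overall accuracy of each model can be recomputed from the correctly classified counts listed above. The per-class test-set sizes are not stated directly; the sketch below infers them from the CNN counts (correct plus misclassified cases), which is an assumption consistent with the RF counts.

```python
# Correctly classified test images per class, as listed in the text
# (order: Glioma, Meningioma, Normal, Pituitary).
correct = {
    "CNN": [140, 91, 47, 88],
    "RF": [139, 93, 50, 89],
    "SVM": [140, 93, 49, 91],
    "AB": [141, 93, 47, 92],
    "KNN": [141, 92, 48, 88],
    "LR": [140, 92, 49, 88],
    "Ensemble": [140, 93, 49, 92],
}

# Assumed test-set size per class, inferred from the CNN counts
# (correct + misclassified): 143, 94, 50, 93.
class_totals = [140 + 3, 91 + 3, 47 + 3, 88 + 5]
n_test = sum(class_totals)

# Overall accuracy = all correctly classified images / all test images.
accuracy = {name: sum(counts) / n_test for name, counts in correct.items()}
for name, value in accuracy.items():
    print(f"{name}: {value:.2%}")
```

Under this assumption the test set contains 380 images, and the computed values agree with the figures reported in the paper: 96.32% for the CNN, 97.11% for LR, and 98.42% for the ensemble.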

Table 5
Computation time of ML classifiers used in BCM-VEMT

Fig. 8 Confusion matrix of the classifiers used in BCM-VEMT

Table 6
Performance evaluation of the classifiers based on each class label

Table 7
Performance evaluation of the classifiers used in BCM-VEMT

Table 8
Performance comparison between the proposed VGG-16 and Faster R-CNN model

The overall accuracy of VGG-16 is 96.32%, and the overall accuracy of the Faster R-CNN model is 96.05%. The true positive, false positive, true negative, and false negative values for the Pituitary class are the same for both models: 88, 3, 284, and 5, respectively.
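The per-class metrics for the Pituitary class can be derived from the four counts given above. The sketch below uses the standard definitions of Precision, Recall, Specificity, Accuracy, and F1-score, which are assumed to be the ones used in the paper; the function name `binary_metrics` is illustrative.

```python
def binary_metrics(tp, fp, tn, fn):
    """One-vs-rest metrics for a single class from its confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


# Pituitary class counts shared by the VGG-16 and Faster R-CNN models.
pituitary = binary_metrics(tp=88, fp=3, tn=284, fn=5)
for name, value in pituitary.items():
    print(f"{name}: {value:.4f}")
```

Note that the four counts sum to 380 test images, and that the per-class accuracy, (88 + 284) / 380, is a one-vs-rest quantity that differs from the overall multiclass accuracy of the model.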

Table 9
Performance comparison of the proposed CNN for different number of epochs

Table 12
Performance comparison of BCM-VEMT with recent studies

accuracy for the Meningioma class, 98.00% accuracy for the Normal class, and 98.92% accuracy for the Pituitary class, with an overall accuracy of 98.42%. It can be a very important tool for doctors to find and classify brain cancer more efficiently at an early stage.