Combining deep-wavelet neural networks and support-vector machines to classify breast lesions in thermography images

Breast cancer is among the leading causes of cancer death among women. Its incidence is similar in developed nations and in underdeveloped and developing ones, although mortality is higher in underdeveloped countries due to late detection. Although mammography is the most widely used technique for the differential diagnosis of breast cancer, breast thermography can serve as a complementary technique: more accurate than self-examination and accurate enough to guide the use of a mammogram. However, thermal images are still difficult for radiologists to interpret, and machine learning can help improve this scenario. Deep-Wavelet Neural Networks are convolutional neural networks that do not necessarily learn, since their neurons may consist of predefined filter banks. In this work, we propose a deep hybrid architecture to support diagnosis from breast thermography images, based on a five-layer Deep-Wavelet Neural Network to extract attributes from regions of interest in thermograms, and a linear-kernel support vector machine for the final classification. Classical classifiers were also tested: Bayesian classifiers, single-hidden-layer multilayer perceptrons, decision trees, Random Forests, and support vector machines. The results showed that it is possible to detect and classify lesions with an average accuracy of 99% and an average kappa of 0.99, employing a five-layer Deep-Wavelet network and a linear-kernel support vector machine as the final classifier. Using a deep network with weights fixed from the Wavelet transform filter bank, it was possible to extract attributes and thus map the problem to a space where it can be solved with relatively simple decision boundaries, such as those of linear-kernel support vector machines. This shows that these new deep networks can be important building blocks of complete solutions to improve breast thermography results and support clinical diagnosis.


Motivation and justification
Cancer, in all its forms, has become one of the greatest public health problems of the 20th century worldwide, regardless of the social and economic development levels of different nations [1]. Of all the forms of cancer, breast cancer is the most dangerous for older and middle-aged women, as well as the most common form of cancer among women [1]. Breast cancer is among the five most common types of cancer in the world [2]. In Brazil alone, it corresponds to about 28% of new cancer cases per year. Although the prognosis is generally good, this disease is still responsible for the highest cancer mortality rate in the female population [3]. According to the Ministry of Health of Brazil (MS-BR), the early detection of tumors, which consists of identifying cancer at an early stage, is essential to reduce mortality from the disease [3]. Breast cancer has been spreading both in so-called developed countries and in underdeveloped and developing countries, following the increase in the average life expectancy of the population, the swelling of cities, the gradual emptying of the countryside, and the adoption of new, more aggressive patterns of consumption [1]. Even though the risk of breast cancer can be reduced through preventive strategies, such as educational campaigns that encourage visual inspection and palpation of the breasts, even a good and well-accepted prevention campaign cannot eliminate most types of breast cancer, as they end up being diagnosed too late [1]. Therefore, the existence and availability of technologies for the early detection of breast cancer in public health systems can contribute to increasing the chances of cure and the range of treatment options [1].
Currently, the main method used to identify breast lesions is mammography, which consists of a breast scan using X-rays [4]. However, despite technological advances that have improved both technique and image quality, there are still situations in which mammography is insufficient to identify lesions, especially in their early stages, whether due to limitations of the method itself or to inconsistencies in the specialists' diagnoses arising from the great variability of clinical cases [5]. For this reason, investigations using methods such as ultrasonography, magnetic resonance imaging, and clinical examinations in general have been combined with the results obtained through mammography, in order to make the diagnosis more robust [3]. Even with the combination of these techniques, the Ministry of Health of Brazil states that the majority of correctly identified cases are currently advanced-stage lesions, which makes treatment difficult, when it is possible at all, and increases the need for invasive procedures such as biopsies and mastectomies (total or partial breast removal) [3].
Although mammography is still the most reliable non-invasive method in clinical practice, breast thermography has emerged as an interesting complement to mammographic analysis. The acquisition of breast thermography images is a non-invasive, painless process that involves no exposure to ionizing radiation and no compression of the patient's breast, as occurs in the acquisition of mammographic images. For these reasons, breast thermography has been explored as an auxiliary tool for the early diagnosis of breast cancer.
Thermography is based on the acquisition of images, recorded through an infrared camera, which show the temperature distribution in a region. The camera's general operation consists of capturing the infrared radiation emitted by the surface of interest. The technique allows the investigation of physiological effects caused by diseases through the analysis of temperature variation in the region: where cancer cells exist, metabolic growth interferes with blood flow, resulting in an increase in the surface temperature of the injured region. Studies claim that the use of thermography can anticipate the diagnosis of breast lesions by up to 10 (ten) years, since it captures information related to physiological changes, which tend to appear before anatomical ones [6]. Furthermore, the use of breast thermography as a screening method can greatly reduce patients' unnecessary exposure to ionizing radiation and other tests.
Overall, the accuracy of diagnosis using conventional techniques is around 70-90%, a percentage that falls below 60% for women under 40 years old [37]. As for breast thermography, although it is used and studied in several countries around the world [6], the technique is still scarcely disseminated in Brazil and, for this reason, there are few specialists trained to extract relevant information from the analysis of thermographic images. These factors, combined with the vast variability of clinical cases, make the identification and differentiation of breast lesions from images a difficult task for human eyes, especially for small lesions or lesions that are difficult to access. Faced with these challenges, much research has been dedicated to the study and development of intelligent classification systems to be used as assistants to specialists, in order to improve diagnostic accuracy.
Fernández-Ovies et al. [38] presented the use of several deep convolutional neural network architectures to support the diagnosis of breast lesions by breast thermography. Although these classifier architectures are capable of modeling very complex decision boundaries, training deep networks is usually quite expensive from the point of view of computational complexity, demanding a lot of processing time and memory. Computing architectures with a high degree of parallelism are often required, such as servers equipped with graphics processing units (GPUs) [39,40]. This tends to greatly increase the costs of acquiring and maintaining solutions for training deep neural network architectures. One way to overcome this problem is to use hybrid deep architectures based on pre-trained deep networks and shallow machine learning models. This approach has been called deep transfer learning: a deep neural network trained for another problem is used to extract implicit features from images; the feature vector thus obtained is then presented to a shallow machine learning model [41,42].
However, the features extracted from images by models based on convolutional neural networks are not invariant to translation and rotation, which makes the shallow models of the output layer not very robust to data variability. To mitigate this problem, it is necessary to train the model on a very large database, commonly in the big data domain. This again raises the need for architectures with a high degree of parallelism to handle the computational complexity involved. Furthermore, CNN-based models require input images with a standardized resolution, so images of higher resolution than a given network expects must have their dimensions reduced and their aspect ratios changed. This can significantly affect the recognition of images in which texture features are preponderant, as is the case of samples from regions of interest in mammary thermography.
In this work, we investigate the use of convolutional neural networks based on the Wavelet transform: the Deep-Wavelet Neural Networks (DWNN). These deep networks, in the form adopted in this work, were proposed by [43]. These networks do not learn: once the number of layers and the type of pixel neighborhood are set, the filter bank is determined. The filter bank is based on the approximation of the Discrete Wavelet Transform by the Mallat algorithm: a set of high-pass filters oriented according to the pixel neighborhood and a low-pass approximation filter. Features are extracted using statistics (synthesis blocks) computed from the sub-images of the output layer. Thus, once the number of layers and the type of pixel neighborhood are fixed, the dimension of the feature vector is fixed, and it is not necessary to change the dimensions or the aspect ratio of the input images. Given the Wavelet transform's record of success in texture recognition applications, we adopted the DWNN in the model proposed in this work, hoping to obtain good results for the problem of supporting the diagnosis of breast lesions across different breast tissue textures.
Therefore, taking into account the relative success of approaches related to artificial intelligence and the need for solutions that enable the early diagnosis of breast cancer, this paper proposes a study of computational approaches for the automatic classification of lesions in thermography images of breast. The study also used a new computational tool for image attribute extraction, the Deep-Wavelet Neural Network (DWNN) [43], which consists of a deep and untrained architecture, inspired by Wavelet decomposition on multiple levels.
This article is organized as follows: initially, we present the most relevant state-of-the-art works in the field of breast cancer diagnosis using computational tools and breast thermography images. Next, we present the proposal of this study. After that, we present and discuss the results obtained, performing a quantitative and qualitative evaluation of the techniques explored. Finally, we conclude the article highlighting the main findings and limitations, in addition to proposing some possibilities for future work.

Related works
In the work of [44], the authors sought to train a backpropagation neural network to identify benign or malignant lesions from a database of 200 breast thermography images. Four different ways of representing the images were tried, through the following attribute sets:
• Set 1: mean, median, mode, standard deviation and skewness.
• Set 2: mean, median and mode.
• Set 3: age, family history, hormone replacement therapy, age at menarche, presence of a palpable nodule, previous surgery or biopsy, presence of nipple discharge, breast pain, menopause after age 50, first child after age 30.
• Set 4: combination of the attributes from sets 2 and 3.
The neural network used in this study was configured to have a learning rate of 0.5, with a momentum of 0.4 and a sigmoidal-type activation function. In this study, only one classifier configuration was used, since the aim of the authors was to analyze the different sets of attributes, comparing the effectiveness of each one of them in the representation of thermal images, according to the proposed method.
It was observed that the accuracies of sets 1, 2 and 4 were similar to each other, at approximately 61%. However, while the mean square error associated with classification using set 4 was 0.05, sets 1 and 2 both had an error of 0.12. Set 3, in turn, resulted in an intermediate error of 0.09, but the accuracy obtained with this representation was around 53%, almost 10 percentage points lower than with the other sets.
As for sensitivity, set 2 had the highest value, 70%, followed by sets 1 and 4, with just over 65%, and finally set 3, with the lowest sensitivity, of almost 50%. Regarding specificity, set 3 had the best result, close to 80%, while the other sets had worse, mutually similar results, around 40%. Thus, even with less satisfactory overall results, set 3 yielded lower false-positive rates, although this was not enough to improve the specificity of set 4, which also contained the information from set 3. In general, the errors were low, but the results, especially in terms of sensitivity and accuracy, were unsatisfactory for an application involving human beings.
Arora et al. [45] also sought to perform a binary classification of malignant and benign lesions from breast thermography images. In their work, the authors used 94 images (320×240), acquired by themselves and whose diagnoses were previously confirmed by biopsy, of which 60 had a malignant lesion and 34 had a benign lesion.
During acquisition, the cold-stress technique was used, in which cold air is directed at the breasts during image capture. In this study, 3 different image analysis techniques were compared: blinded screening mode (SBS), clinical assessment, and an artificial neural network (ANN). The first technique results in a risk score ranging from 0 (zero), minimal risk, to 7, very high risk. The other two methods give a binary result: malignant or benign. From the experiments, the authors verified that the ANN approach stood out positively relative to the others, obtaining an accuracy of 81.8%, against 66.7% for the SBS and 71.4% for the clinical analysis.
A slightly different approach was proposed by [46], who chose to combine an unsupervised with a supervised learning method. The authors used a Self-Organizing Map (SOM) to perform both the grouping process, as the images did not have a pre-established output class, and the feature extraction process, related to the morphological texture. For the classification step, it was decided to use an MLP network and cross validation with 5 folds. Two different databases were used, the first with 50 images and whose acquisition process was better standardized, and the second with 200 images without much rigor in terms of standardization.
The databases were analyzed separately and, once again, a binary classification was performed, now of the cancer versus non-cancer type. Using the first database, the proposed method reached up to 100% in both sensitivity and specificity. These results declined somewhat for the second database, with a sensitivity of 88% and a specificity of 99% in detecting breast cancer. Despite the use of cross-validation during training, the results, especially for the first database, may indicate overspecialization of the system, given the small size of the database.
In the study by [47], 50 images (1280×1024) were used, equally divided into two classes: healthy and malignant lesion. The main objective was to evaluate the performance of several intelligent classifiers in the task of assigning images to their respective classes. To extract attributes from the images, they used the Histogram of Oriented Gradients (HOG) method; these attributes then went through two processes, the first of dimensionality reduction, using the KLPP technique (Kernel Locality Preserving Projection), and the second of selection, using techniques based on Student's t-test, since the set of attributes extracted with HOG was large and could contain redundancy. Finally, the following classification methods were tested: decision tree, Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), k-Nearest Neighbour (kNN), Fuzzy Sugeno, Naive Bayes, SVM, AdaBoost, Probabilistic Neural Network (PNN), and Breast Cancer Risk Index (BCRI). The tests were performed using 10-fold cross-validation. Among the evaluated methods, the decision tree achieved the best results, with 98% accuracy, 96% sensitivity and 100% specificity. Again, this system may have been overspecialized, as there were only 25 instances per group and no expansion of this data was reported. The fact that the tree stood out relative to the other classifiers may also indicate low generalizability of the results.
Fernández-Ovies et al. [38], in turn, also evaluated the detection of breast cancer as a binary problem with healthy and cancer classes. In this work, however, the classification was performed with six different configurations of convolutional neural networks (CNNs), using the 5-fold cross-validation method. The CNNs Resnet18, Resnet34, Resnet50, Resnet152, Vgg16 and Vgg19 were used for classification. As for the image database, the authors chose VisionLab, which contains a total of 5604 images (480×640), some of which were inserted synthetically to expand the base. Of these, 2411 images are of healthy breasts and 534 show cancer; the remaining images were not used because they did not meet the inclusion criteria established by the team. Since CNNs are extremely sensitive to unbalanced classes, 500 images from each group were randomly selected for the actual experiments.
With the experiments, the authors verified that, in general, the Resnet-type CNNs stood out relative to the Vgg networks. Resnet50 and Resnet34 had the best performances, with average accuracies of 98.75% and 98.13%, respectively. However, Resnet50 proved to be less stable, presenting a standard deviation of 1.09%, against 0.63% for Resnet34. Resnet18 also presented interesting results, but not enough to surpass the other two.

Materials and methods
In this study, we propose an approach for identifying and classifying breast lesions from breast thermography images. These images were obtained from extension actions carried out in partnership with the Hospital das Clínicas of the Federal University of Pernambuco, Brazil, with women from the Zona da Mata Norte and the Metropolitan Region of Recife, in Northeastern Brazil. The database contains images of both people without breast lesions and patients with cystic, benign and malignant lesions.
Initially, we convert the images from their RGB-JET pattern to grayscale, respecting the temperature information associated with the pseudocolors used in the formation of the original image. Then, we use the DWNN method to extract attributes from the images. Finally, we evaluate the performance of different algorithms in assigning images to their respective classes. The proposed approach is illustrated in Fig. 1.

Database
The thermographic breast image database used is from the Thermal Imaging Laboratory of the Mechanical Engineering Department at UFPE, coordinated by Profa. Rita de Cássia. The base consists of 1052 thermographic breast images (480×640 pixels), acquired with a FLIR infrared camera, model ThermaCAM S45, with a thermal sensitivity of 0.06°C. These images are grouped into 4 distinct classes, according to the associated diagnosis: cyst, malignant lesion, benign lesion and no lesion, as illustrated in Fig. 2. All diagnoses were established after prior investigation using tests specific to each situation: cysts were confirmed by fine-needle aspiration biopsy (FNAB) or ultrasonography; malignant and benign lesions were confirmed through biopsies; and cases in the no-lesion class were verified by mammography and ultrasonography and classified as BI-RADS 1, that is, without findings [48].
The database also contains images acquired in 8 different positions: two frontal images of both breasts (T1 and T2), a frontal image of each breast in isolation (MD and ME), and four lateral images (LEMD, LEME, LIMD and LIME). The difference between the T1 and T2 images lies in the position of the arms: in T1 the arms are at the waist, while in T2 they are raised. As for the lateral images, LEMD and LEME record the external sides of the right and left breasts, respectively; LIMD and LIME are images of the inner side of each breast. Examples of each of these positions are shown in Fig. 3.
Images were acquired following the protocol described by [49]. Before the acquisitions, it was necessary to carry out a series of preparations of both the patients and the acquisition room, since contact between parts of the body and the environment, or between parts of the body, changes the surface temperature through heat conduction. These procedures are described in the diagram in Fig. 4.
To standardize the acquisitions and avoid changes in the positioning of patients during the process, a mechanical apparatus [49] was used. In this apparatus, illustrated in Fig. 5, the patient remains seated in a swivel chair (3) during the exam, which is rotated to change the imaging angle. The camera is placed on a tripod on top of a support (2) that moves closer to or farther from the patient along rails (1). The horizontal bars (4) are used for the correct positioning of the patients' arms. At least eight images were acquired per patient, one in each position; in some specific cases it was necessary to acquire extra images.
In this work, we used only the T1 and T2 images, as this positioning favors the visualization of both breasts and the identification of possible anomalies and asymmetries. Thus, 336 images were used, distributed in 4 classes as presented in Table 1. Since the number of images per class differs, a class balancing step was added to avoid classification biases: with unbalanced bases, algorithms tend to favor classes with more representatives over less populated ones. The balancing is done by inserting synthetic instances formed from linear combinations of real instances, as suggested in [33]. This process resulted in a total of 968 instances, divided evenly among the classes.
The number of benign lesion images is practically double the number of no-lesion images. This causes the benign lesion class to dominate the training process of the learning machine, reducing specificity, sensitivity and efficiency while exhibiting a high but misleading accuracy. Moreover, since each instance has 1024 attributes, the number of instances could not be much smaller than the dimension of the attribute vector, at the risk of the classification problem becoming sparse and, therefore, difficult to solve. These were the main reasons for balancing and expanding the database with synthetic instances, while seeking to preserve the statistical characteristics of the original database.
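The balancing step can be sketched as follows. The convex-interpolation scheme and the helper names below are our assumptions: [33] specifies only that synthetic instances are linear combinations of real instances of the same class.

```python
import numpy as np

rng = np.random.default_rng(42)

def balance_class(X, n_target, rng):
    """Grow a class to n_target instances by adding synthetic samples.

    Each synthetic sample is a convex (linear) combination of two
    randomly chosen real instances of the same class, which keeps the
    synthetic points inside the class's convex hull and so roughly
    preserves its statistics.
    """
    X = np.asarray(X, dtype=float)
    synth = []
    while len(X) + len(synth) < n_target:
        i, j = rng.choice(len(X), size=2, replace=False)
        alpha = rng.random()                       # mixing weight in (0, 1)
        synth.append(alpha * X[i] + (1 - alpha) * X[j])
    return np.vstack([X] + synth) if synth else X

# Toy example: a minority class with 3 instances grown to 8.
minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
balanced = balance_class(minority, 8, rng)
print(balanced.shape)   # (8, 2)
```

Applied per class with `n_target` set to the largest class size, such a procedure would take the 336 original images to the balanced, expanded set of 968 instances.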
Thermographic images use the pseudocolor technique, in which different temperature values within a given range are mapped into RGB color ranges, so that the approximate temperature matrix of a thermographic image can be recovered by performing the inverse transformation from pseudocolors to temperatures. The most commonly used pseudocolor standard in thermographic images is RGB-JET [36,50]. For image processing and standardization of the pseudocolor information, avoiding errors due to the different color palettes used in image acquisition, it was necessary to convert from RGB-JET to gray levels, mapping the temperature matrix to 256 gray levels (8 bits) [36,50]. In this process, lighter shades of gray indicate higher temperatures and vice versa. In addition, the images were analyzed in their entirety, without any type of segmentation, therefore using context-based analysis. Although the images contain elements such as bars, numbers and labels, it was considered that, since this information is always present and in the same position, the algorithms would interpret it as redundant or not relevant for differentiating the classes.
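The inverse pseudocolor mapping can be sketched as a nearest-neighbor lookup against the JET palette. The piecewise-linear palette below is a common approximation of JET and may differ from the camera's exact color table; the function names are ours.

```python
import numpy as np

def jet_palette(n=256):
    """Approximate RGB-JET palette: blue -> cyan -> yellow -> red.

    A common piecewise-linear approximation of the JET colormap,
    returned as an (n, 3) array of RGB triples in [0, 1].
    """
    x = np.linspace(0.0, 1.0, n)
    r = np.clip(1.5 - np.abs(4 * x - 3), 0, 1)
    g = np.clip(1.5 - np.abs(4 * x - 2), 0, 1)
    b = np.clip(1.5 - np.abs(4 * x - 1), 0, 1)
    return np.stack([r, g, b], axis=1)

def rgb_jet_to_gray(img_rgb):
    """Map each pseudocolored pixel back to an 8-bit gray level.

    img_rgb: (H, W, 3) float array in [0, 1]. For every pixel we find
    the nearest palette entry; its index is proportional to temperature,
    so lighter gray means higher temperature.
    """
    palette = jet_palette()                      # (256, 3)
    flat = img_rgb.reshape(-1, 3)                # (H*W, 3)
    # Squared distance from each pixel to each palette color.
    d = ((flat[:, None, :] - palette[None, :, :]) ** 2).sum(axis=2)
    idx = d.argmin(axis=1)                       # palette index per pixel
    return idx.reshape(img_rgb.shape[:2]).astype(np.uint8)

# Tiny demo image built from the coldest, middle and hottest palette colors.
pal = jet_palette()
demo = pal[[0, 128, 255]].reshape(1, 3, 3)
print(rgb_jet_to_gray(demo))   # gray levels increase with temperature
```

A real implementation would likely use the camera vendor's exact palette or the radiometric data embedded in the file, when available, instead of a generic JET approximation.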

Feature extraction
For attribute extraction we use the Deep-Wavelet Neural Network (DWNN) method [43]. The DWNN is a deep, untrained network for attribute extraction, inspired by Mallat's algorithm [51] for multilevel Wavelet decomposition. This algorithm, which emerged as a strategy for implementing the discrete Wavelet transform, consists of obtaining approximations and details of an image. Approximations are the low-frequency representation of the image, conserving its general trend while smoothing out its abrupt transitions. The details capture the image's high-frequency components, highlighting regions of discontinuity such as edges and blemishes.
In Wavelet decomposition, low-pass and high-pass filters are applied to an image to form a set of other images that are smaller in size than the original. From low-pass filtering, approximations are obtained and details are acquired through high-pass filters. This approach allows the analysis of images in the spatial and frequency domains and, therefore, it has been widely used in pattern recognition. DWNN uses a process similar to this one, in which a neuron is formed by combining a given filter with an image size reduction process called downsampling.
The "Deep" in the name arises from the possibility of using multiple layers, the method becoming deeper as new layers are added. Furthermore, as with conventional deep learning methods such as Convolutional Neural Networks (CNNs), the process consists of two basic steps: in the first, the images are submitted to filters, and in the second, pooling is performed. However, while the filters of conventional methods are learned rather than fixed, the filters used in the Deep-Wavelet network are fixed and come from the family of Haar wavelets.
Thus, considering a bank with n filters, an input image will be submitted to n neurons that form the first layer of the network. In the second layer, each of the images resulting from the first will be individually submitted to the same bank of n filters and to downsampling, in the same way as was done for the input image. The process is repeated for all network layers, according to the number established by the user, which determines the depth of the network. Finally, in the DWNN output layer, we have the synthesis blocks (SB), which are responsible for extracting information from the images resulting from the entire process. In these blocks, each of the n^m reduced images (for n filters and m layers) is submitted to a maximum, average, minimum, median or mode function. Thus, each image is replaced by a single value.
In this work, we used a filter bank with 4 filters and 5 layers to extract attributes, resulting in 4^5 = 1024 attributes for each input image. In the synthesis block we used the averaging function to compute the output of the network, as it presented the best results.
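A minimal sketch of this feature extractor follows, under our reading of [43]: a 2×2 Haar filter bank (one low-pass approximation plus three oriented high-pass details), filtering over non-overlapping blocks as the downsampling step, and the mean as the synthesis-block function. The normalization of the filters and the block-wise filtering are assumptions of this sketch.

```python
import numpy as np

# 2x2 Haar filter bank: one low-pass approximation filter and three
# oriented high-pass detail filters (horizontal, vertical, diagonal).
HAAR_BANK = [
    0.5 * np.array([[1, 1], [1, 1]]),    # approximation (low-pass)
    0.5 * np.array([[1, 1], [-1, -1]]),  # horizontal details
    0.5 * np.array([[1, -1], [1, -1]]),  # vertical details
    0.5 * np.array([[1, -1], [-1, 1]]),  # diagonal details
]

def neuron(img, filt):
    """One DWNN 'neuron': filter + downsample over non-overlapping 2x2 blocks."""
    h, w = img.shape[0] // 2, img.shape[1] // 2
    blocks = img[: 2 * h, : 2 * w].reshape(h, 2, w, 2).transpose(0, 2, 1, 3)
    return (blocks * filt).sum(axis=(2, 3))       # half-size output image

def dwnn_features(img, n_layers=5, synth=np.mean):
    """Propagate img through n_layers of the fixed filter bank, then
    reduce each output sub-image to one value (synthesis block).

    With 4 filters and 5 layers this yields 4**5 = 1024 attributes,
    matching the configuration used in the paper.
    """
    imgs = [np.asarray(img, dtype=float)]
    for _ in range(n_layers):
        imgs = [neuron(im, f) for im in imgs for f in HAAR_BANK]
    return np.array([synth(im) for im in imgs])

# A 480x640 thermogram works too; here a small random image for speed.
# The input only needs to survive 5 halvings (each side >= 32 pixels).
feats = dwnn_features(np.random.default_rng(0).random((32, 32)))
print(feats.shape)   # (1024,)
```

Note that no training occurs anywhere: the filter bank is fixed, so the feature dimension depends only on the number of filters and layers, never on the input resolution or aspect ratio.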

Classification
After attribute extraction, the set was submitted to a classification step. We evaluated the classification performance of different methods: the Bayesian network (Bayes Net), the naive Bayes classifier (Naive Bayes), the Multilayer Perceptron (MLP), the Support Vector Machine (SVM), the Extreme Learning Machine (ELM), and the tree-based classifiers J48, Random Tree and Random Forest. The settings for each classifier are presented in Table 2.
All tests were performed using the 10-fold cross-validation method [52]. The use of this method reduces the variability of the results, providing more robustness and reliability to training and reducing the chance of overfitting. In addition, each classifier configuration was tested 30 times, in order to obtain statistical information for comparing the methods.
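The evaluation protocol can be illustrated with a hand-rolled 10-fold cross-validation loop repeated over 30 shuffles. The nearest-centroid classifier and the toy data below are only stand-ins for the classifiers and features actually used; they keep the sketch self-contained.

```python
import numpy as np

def kfold_indices(n, k, rng):
    """Shuffle the sample indices and split them into k roughly equal folds."""
    idx = rng.permutation(n)
    return np.array_split(idx, k)

def nearest_centroid_accuracy(X, y, k=10, seed=0):
    """One run of k-fold cross-validation with a nearest-centroid
    classifier (a simple stand-in for Bayes Net, MLP, SVM, etc.)."""
    rng = np.random.default_rng(seed)
    folds = kfold_indices(len(X), k, rng)
    correct = 0
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        classes = np.unique(y[train])
        # Class centroids estimated from the training folds only.
        cents = np.array([X[train][y[train] == c].mean(axis=0) for c in classes])
        d = ((X[test][:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
        pred = classes[d.argmin(axis=1)]
        correct += (pred == y[test]).sum()
    return correct / len(X)

# As in the paper, each configuration is evaluated 30 times (here,
# 30 different shuffles) to gather a mean and standard deviation.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
accs = [nearest_centroid_accuracy(X, y, seed=s) for s in range(30)]
print(np.mean(accs), np.std(accs))
```

In practice the 30 repetitions give exactly the statistics reported later (mean accuracy, mean kappa and their standard deviations) for each classifier configuration.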
To assess the performance of the classifiers, we used accuracy, the Kappa index, and the confusion matrix, from which sensitivity and specificity were calculated. Accuracy (Ac) corresponds to the percentage of correctly classified instances and can range from 0 to 100% [53]. It is calculated by Eq. 1, where n_correct is the number of correctly classified instances and n_total is the total number of instances:

(1) Ac = (n_correct / n_total) × 100%.
The Kappa index, or coefficient (κ), is a statistical metric that assesses the agreement between the obtained and expected results [54,55]. It can vary in the range [−1, 1], where values less than or equal to 0 (zero) indicate no agreement, values above 0.8 indicate high agreement, and intermediate values represent low to moderate agreement. Kappa provides information about the degree of reproducibility of the method and is calculated as indicated in Eq. 2, where P_calculated represents the observed agreement and P_expected the agreement expected by chance:

(2) κ = (P_calculated − P_expected) / (1 − P_expected).
The confusion matrix shows the number of instances classified into each of the possible classes. The main diagonal of the matrix presents the correctly classified instances: the true positives (TP) and true negatives (TN). The other values represent instances whose classes were confused, that is, misclassified. From this matrix it is possible to observe which classes are confused most often, and to calculate the sensitivity and specificity associated with the classification, essential metrics for assessing the efficiency of diagnostic systems since they are related, respectively, to the rates of false negatives (FN) and false positives (FP). Given the matrix presented in Table 3, sensitivity is calculated by Eq. 3 and specificity by Eq. 4:

(3) Sensitivity = TP / (TP + FN).

(4) Specificity = TN / (TN + FP).
Finally, the overall efficiency of the system can be calculated from the relationship between sensitivity and specificity indicated in Eq. 5.
(5) Efficiency = (Sensitivity + Specificity) / 2.

Although the metrics described are defined for binary classification, they can also be extended to multiclass classification by constructing a binary confusion matrix for each one-versus-all comparison, one per class. Each binary confusion matrix is obtained from the confusion matrix of the multiclass problem.
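The one-versus-all reduction and the metrics of Eqs. 1-5 can be sketched directly from a multiclass confusion matrix. The 4-class matrix below is purely hypothetical, not taken from the paper's results.

```python
import numpy as np

def one_vs_all_metrics(cm):
    """Per-class sensitivity, specificity and efficiency (Eqs. 3-5)
    from a multiclass confusion matrix (rows = true, cols = predicted),
    collapsing it into one binary matrix per class."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp          # true class predicted as another
    fp = cm.sum(axis=0) - tp          # other classes predicted as this one
    tn = cm.sum() - tp - fn - fp
    sens = tp / (tp + fn)             # Eq. 3
    spec = tn / (tn + fp)             # Eq. 4
    return sens, spec, (sens + spec) / 2   # Eq. 5

def accuracy_and_kappa(cm):
    """Accuracy (Eq. 1, as a fraction) and Cohen's kappa (Eq. 2)."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    p_obs = np.trace(cm) / n                                  # observed agreement
    p_exp = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / n ** 2  # chance agreement
    return p_obs, (p_obs - p_exp) / (1 - p_exp)

# Hypothetical 4-class matrix (cyst, benign, malignant, no lesion).
cm = [[24, 1, 0, 0],
      [2, 20, 2, 1],
      [0, 1, 24, 0],
      [0, 1, 0, 24]]
acc, kappa = accuracy_and_kappa(cm)
sens, spec, eff = one_vs_all_metrics(cm)
print(round(acc, 3), round(kappa, 3))
```

Note that sensitivity and specificity come out per class; reporting a single number, as in the tables, requires an aggregation choice (for example, the macro-average over classes).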

Results
The thermography database used in this work is divided into four classes, defined based on the diagnosis in each case: Cyst, Benign Lesion, Malignant Lesion and No Lesion.
The Deep-Wavelet Neural Network (DWNN) was used to represent the images, with 5 levels and the median as the synthesis block function. The results obtained in this setting are shown in Fig. 6, where graph (a) presents the accuracies for the different classification methods and graph (b) shows the kappa indices.
The best MLP performance was obtained with 100 neurons in the hidden layer, and the best Random Forest performance with 100 trees. For the SVM and ELM methods, the best-performing configurations were, respectively, the linear kernel and the morphological erosion kernel.

Discussion
From the experiments, we observed that the SVM and MLP classifiers stood out positively relative to the others. For the SVM, both the accuracy and the kappa index were close to their maximum values, with means of 99.13% and 0.99, respectively. This method also presented the lowest standard deviations, i.e., the lowest dispersion of results. Random Forest was also able to achieve high classification performance, both in accuracy and in kappa index. The ELM presented intermediate results, below those of Random Forest. The Bayesian classifiers and the J48 and Random Tree decision trees presented the least satisfactory results, with accuracies below 50% and kappa indices of at most 0.33.
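The kappa index reported throughout this comparison is Cohen's kappa, which measures agreement beyond chance. A minimal sketch of its computation from a confusion matrix (the function name is illustrative, not from the paper):

```python
import numpy as np

def cohens_kappa(cm):
    """Cohen's kappa from a confusion matrix
    (rows = true class, columns = predicted class)."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    p_o = np.trace(cm) / n                                  # observed agreement
    p_e = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2  # chance agreement
    return (p_o - p_e) / (1 - p_e)

# Perfect agreement gives kappa = 1; chance-level agreement gives kappa = 0
print(cohens_kappa([[10, 0], [0, 10]]))  # 1.0
print(cohens_kappa([[5, 5], [5, 5]]))    # 0.0
```

Unlike raw accuracy, kappa discounts agreement expected by chance, which is why it complements accuracy when classes are imbalanced.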
Overall, the best results for identifying and differentiating breast lesions in the images were obtained with the SVM classifier; the least satisfactory results, in turn, were obtained with Random Tree. The confusion matrices for these situations (Table 4) reiterate the observations above, as there is much more confusion between classes in the worst result than in the best one. From the matrices, it can be observed that, in both cases, the benign lesion class is the one most often confused with the others, mainly with the cyst and malignant lesion classes, although there is also some confusion between the benign lesion and no-lesion classes. However, as the degree of confusion is significantly reduced in the best-performing situation, expressive results were obtained for the sensitivity, specificity, and efficiency of the system, as shown in Table 6, from which the quality of the diagnosis can be analyzed. Comparing the same classifiers (best and worst configurations) applied to the original, unbalanced database yields the results expressed in the confusion matrices of Table 5 and the quality metrics of Table 7. The results for the balanced and augmented database are clearly superior, since balancing reduces the influence of the majority class on the others and, consequently, improves the specificity and sensitivity results.

Table 4 Confusion matrices for the best (linear kernel SVM) and worst (Random Tree) results. In green, the best result; in pink, the worst

Table 5 Confusion matrices for the best (linear kernel SVM) and worst (Random Tree) results considering the unbalanced database. In green, the best result; in pink, the worst

Conclusion
The present study presented a hybrid architecture, combining a deep-wavelet network with a support vector machine, to address the challenges of interpreting breast thermography images. Wavelet-based image descriptors and different classification methods were explored.
From the results, it is observed that the DWNN-based deep approach was able to represent the images well. When associated with support vector machines (SVM), the identification of lesions in breast thermograms reached an accuracy of around 99% and a kappa index above 0.95. Good results were also obtained with MLP, Random Forest, and ELM, indicating that the problem can be generalized, although often only with non-linear decision boundaries. This is consistent with the low results obtained with the decision trees (J48 and Random Tree), since these methods commonly achieve good results only when the database is very specific. The relatively low performance of the Bayesian methods points to a considerable dependency between the attributes.
Regarding the metrics that assess the quality of the diagnosis, excellent values were obtained for sensitivity, specificity, and efficiency in the diagnosis of images of the cyst and no-lesion classes, reaching 100% sensitivity in both cases and values above 99% for the other two metrics. Values above 99.5% were also observed for all three measures in the diagnosis of malignant lesions. For benign lesions, the approach provided a sensitivity of 97.11%, a specificity of 74.92%, and an efficiency of 86.02%.
The greatest difficulty, therefore, was in differentiating benign lesions from the other conditions. However, even for this class, high sensitivities were obtained; that is, the system is associated with low false negative rates, an essential characteristic for a system applied to the diagnostic process.
The results highlight the power of hybrid approaches based on deep learning for solving complex, non-linear problems, which are recurrent in biomedical applications. Deep-learning techniques such as the DWNN explored here have proven effective for such applications, as they increase the expressive power of the decision boundary available for solving the classification problem.