A machine learning method based on the combination of nonlinear and texture features to diagnose malignant melanoma from dermoscopic images

Skin cancer affects people of all skin tones, including those with darker complexions. Melanoma, the malignant form of skin cancer, has an adverse prognosis and is responsible for most skin cancer deaths. Early diagnosis and treatment of skin cancer from dermoscopic images can significantly reduce mortality and save lives. Although several Computer-Aided Diagnosis (CAD) systems with satisfactory performance have been introduced in the literature for skin cancer detection, their high false detection rate still makes further examination by an expert physician necessary. In this paper, a CAD system based on machine learning algorithms is presented to classify skin cancer types. The proposed method uses the Online Region-based Active Contour Model (ORACM) to extract the Region Of Interest (ROI) of skin lesions. This model uses a new binary level set equation and regularization operations such as morphological opening and closing. Additionally, various combinations of texture and nonlinear features are extracted from the ROI to capture multiple aspects of skin lesions. Several metaheuristic optimization algorithms are applied to the combined features to remove redundant or irrelevant ones and reduce the dimension of the feature space; among them, the Non-dominated Sorting Genetic Algorithm (NSGA II), a multi-objective optimization algorithm, performs best. Furthermore, various machine learning algorithms, including K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Fitting neural network (Fit net), Feed-Forward neural network (FF net), and Pattern recognition network (Pat net), are employed for classification. The best precision, 99.24% based on five-fold cross-validation, is attained with the texture and nonlinear features selected by NSGA II and the Pat net classifier. Also, a comparison between the experimental results of this paper and those of similar works on the same dataset demonstrates the efficiency of the proposed method.


Background and motivations
Melanoma is the most dangerous form of skin cancer and accounts for the largest share of skin cancer deaths. If these malignant cells are diagnosed at an early stage, the chance of saving lives increases considerably [1,2]. However, precisely differentiating melanoma from other pigmented skin lesions (non-melanoma) in images remains a serious challenge for dermatologists [3]. The most common diagnostic method that physicians employ to identify skin lesion types is the "ABCD" technique, in which four morphological characteristics are measured to decide whether a lesion is benign or malignant: asymmetry, border irregularity, color distribution, and diameter [4,5].
However, the "ABCD" technique has drawbacks in detecting small or early-stage melanomas whose borders have not yet become irregular, so its accuracy is limited. Because skin lesions vary widely in appearance, diagnosing the type of skin cancer with this noninvasive technique is a challenging task even for dermatologists.
To overcome this challenge, researchers have developed various noninvasive CAD systems. A CAD system can be considered a 'second opinion' that helps radiologists and dermatologists make decisions [6,7]. The main advantages of CAD systems are decreasing the workload, reducing false-negative diagnoses caused by possible physician errors, and avoiding oversights caused by overload [8,9]. These methods usually involve three main steps:
i. Skin boundary detection
ii. Feature extraction
iii. Classification
The boundary detection process segments the skin lesion images and extracts their ROI, which is critical for the precise classification of skin lesions. The feature extraction process uses visual properties such as color, lesion shape, and texture information. Finally, classifiers are employed to determine the category to which new samples belong.

Literature review
Table 1 surveys the existing studies on skin cancer classification methods.

Contributions
The main contributions of the proposed approach are as follows:
• Using ORACM as a new, fast, and accurate approach for skin lesion segmentation. To the best of our knowledge, the application of this method to dermoscopic images has not been reported in the literature.
• Using texture features in the form of the Gray-Level Co-occurrence Matrix (GLCM) to capture the information inside the image and to compute the skin lesion characteristics corresponding to color changes and diameter. Furthermore, a combination of nonlinear and texture features is introduced in this paper to represent the varied aspects of skin lesions.
• Applying a metaheuristic multi-objective optimization approach, NSGA II, to reduce the objective function and the number of selected features simultaneously.
• Using k-fold cross-validation to reduce the sensitivity of classification accuracy to the choice of training and testing datasets, and employing diverse machine learning methods for dermoscopic image classification.

Paper organization
The rest of this paper is organized as follows. Section 2 describes the input dataset of dermoscopic images. Section 3 provides details of the other steps of the proposed method, including ROI extraction, feature extraction and selection, and the classification methods. The experimental results and classifier performance are presented in Section 4. The proposed method is compared with similar works in Section 5, and the achieved results are discussed in Section 6. Lastly, the conclusion is presented in Section 7.

Material
The proposed approach is tested on the PH2 dermoscopy image database, which consists of 200 dermoscopic images (40 melanomas and 160 non-melanomas) acquired at the Pedro Hispano hospital [27]. Some images from this dataset are shown in Figure 1. The images are RGB, saved in BMP and JPEG formats, and most of them have a resolution of 766 × 576 pixels. For effective feature extraction and higher classification accuracy, image enhancement is performed first. This step is needed because hair, bubbles, and the gel applied to the skin surface during image capture appear in the images. Similar to [28], when artifacts are identified, an inpainting operation is applied. Fig. 2 shows the conversion of a dermoscopic color image (a) into a gray-scale one (b) together with the removal of its artifacts.

II. Extraction of the skin lesion ROI
Because the proposed method is based on detecting irregularities of the lesion boundary, only dermoscopic images with a single lesion are considered here. In this stage, all images are rescaled to 766 × 576 pixels [5]. Afterward, the images are converted to binary, and their ROI is extracted through the ORACM.
ORACM is a region-based Active Contour Method (ACM). It solves the segmentation problem without any parameter tuning and in less time than conventional ACMs. At each iteration, it produces a kind of block thresholding with rigid boundaries, along with several components that do not belong to the object. The structure of ORACM is obtained by applying an updated level set equation [29].

Feature extraction
Feature extraction from the lesion ROI is the next phase of processing; it improves diagnostic efficiency and increases the chance of successful treatment. As previously mentioned, two types of indices are extracted, complexity measures (nonlinear indices) and texture features, which are described in detail below.

Nonlinear indices
Asymmetry related to the lesion's borderlines and border irregularity are two crucial elements for classifying the skin lesions based on their shapes as well as the critical warning signs for melanoma diagnosis. Therefore, the contour borderlines' asymmetry and border irregularity measurement lead to identifying the cancerous lesion [5].
On the other hand, cancerous tumors have a chaotic nature that makes them grow asymmetrically and irregularly. The irregularities in the tumor borderlines therefore originate from the chaotic behavior of the underlying processes. Cancerous tissues can be identified through nonlinear analysis and complexity measurement of medical images [30,31].

a) Generating Time Series
A chaotic Time Series (TS) obtained from the lesion border can be used to reconstruct the states of an unknown complex system. Similar to [5,32], the image center is obtained by calculating the means of the pixel coordinates along the length and width.
Afterward, calculating the radial distance between every borderline point and the image center yields a sequence that is treated as the time series.
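The construction of the radial time series can be illustrated with a minimal Python sketch. It assumes that the lesion border is already available as an ordered list of (x, y) pixel coordinates and that the center is the mean of those border coordinates; the function name `radial_time_series` is hypothetical and is not taken from the original implementation.

```python
import math

def radial_time_series(boundary):
    """Return the distance from the border centroid to every border point.

    `boundary` is an ordered list of (x, y) tuples along the lesion contour.
    Assumption: the center is the mean of the border coordinates.
    """
    n = len(boundary)
    cx = sum(x for x, _ in boundary) / n
    cy = sum(y for _, y in boundary) / n
    return [math.hypot(x - cx, y - cy) for x, y in boundary]

# Example: the four corners of a square are all equally far from the center.
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
ts = radial_time_series(square)
```

A perfectly symmetric contour yields a constant series, whereas an irregular border produces fluctuations that the complexity measures below are meant to quantify.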

b) Fractal dimension
A fractal can be regarded as a geometric image in which the structure of the original image can be found at any scale. The self-similarity property of fractal images means that any part is similar to the whole image [33]. Several methods have been used in the literature to calculate the Fractal Dimension (FD); the four most widely applicable ones are used here.

i. Box-Counting Method (BCM):
The image is divided into equal-size boxes, and the number of boxes that contain the image boundary is counted [33]. This process is repeated several times with decreasing box sizes. The image FD is the slope of the best-fitting line for the plot of the logarithm of the box count, $\log(a)$, versus the logarithm of the inverse box size, $\log(1/s)$ [33-35]. The FD ($D$) can be estimated as (2).
$$D = \lim_{s \to 0} \frac{\log(a)}{\log(1/s)} \quad (2)$$
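The box-counting procedure can be sketched in a few lines of Python. This is only an illustrative sketch: it assumes the boundary is given as a list of integer pixel coordinates, and the helper names `fit_slope` and `box_counting_dimension` as well as the chosen box sizes are assumptions.

```python
import math

def fit_slope(xs, ys):
    """Least-squares slope of ys against xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def box_counting_dimension(points, sizes):
    """Estimate the FD as the slope of log(box count) versus log(1 / box size)."""
    log_inv_s, log_count = [], []
    for s in sizes:
        # Every point falls into exactly one box of side s; count distinct boxes.
        boxes = {(x // s, y // s) for x, y in points}
        log_inv_s.append(math.log(1.0 / s))
        log_count.append(math.log(len(boxes)))
    return fit_slope(log_inv_s, log_count)

# Sanity checks: a straight line has dimension 1, a filled square dimension 2.
line = [(i, 0) for i in range(256)]
filled = [(i, j) for i in range(64) for j in range(64)]
d_line = box_counting_dimension(line, [1, 2, 4, 8, 16])
d_filled = box_counting_dimension(filled, [1, 2, 4, 8])
```

A real lesion border would be expected to fall between these two reference values, with more irregular borders giving larger estimates.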

ii. Higuchi Fractal Dimension (HFD)
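The Higuchi method is one of the practical techniques used to compute the FD of a time series. Assuming a TS $x(1), x(2), \ldots, x(N)$, $k$ new series are constructed as
$$x_m^k = \left\{ x(m),\, x(m+k),\, x(m+2k),\, \ldots,\, x\!\left(m + \left\lfloor \tfrac{N-m}{k} \right\rfloor k\right) \right\}, \quad m = 1, 2, \ldots, k,$$
in which the initial sample and the sampling interval are denoted by $m$ and $k$. The length of each series is
$$L_m(k) = \frac{1}{k} \left[ \sum_{i=1}^{\lfloor (N-m)/k \rfloor} \left| x(m+ik) - x(m+(i-1)k) \right| \right] \frac{N-1}{\lfloor (N-m)/k \rfloor\, k},$$
where $(N-1)/(\lfloor (N-m)/k \rfloor\, k)$ is the normalization factor. The TS length $L(k)$ is obtained using Eq. 5:
$$L(k) = \frac{1}{k} \sum_{m=1}^{k} L_m(k) \quad (5)$$
Similar to BCM, the FD is the slope of the best-fitting line for the plot of $\log(L(k))$ versus $\log(1/k)$.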
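As an illustration, the Higuchi procedure can be written as the following Python sketch. It follows the standard Higuchi formulation (curve lengths L(k) averaged over the k starting samples, then a line fit of log L(k) against log(1/k)); the function names and the choice of `k_max` are assumptions, and the series must not be constant, otherwise the logarithm is undefined.

```python
import math

def fit_slope(xs, ys):
    """Least-squares slope of ys against xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def higuchi_fd(x, k_max):
    """Higuchi fractal dimension of the series x for intervals 1..k_max."""
    n = len(x)
    log_inv_k, log_len = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(k):                    # 0-based initial sample
            steps = (n - 1 - m) // k          # floor((N - m) / k) for 1-based m
            if steps < 1:
                continue
            dist = sum(abs(x[m + i * k] - x[m + (i - 1) * k])
                       for i in range(1, steps + 1))
            # Normalization factor (N - 1) / (steps * k), then division by k.
            lengths.append(dist * (n - 1) / (steps * k) / k)
        log_inv_k.append(math.log(1.0 / k))
        log_len.append(math.log(sum(lengths) / len(lengths)))
    return fit_slope(log_inv_k, log_len)

# Sanity check: a straight ramp is a smooth curve with dimension 1.
ramp = [float(i) for i in range(100)]
fd_ramp = higuchi_fd(ramp, 8)
```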

iii. Katz Fractal Dimension (KFD)
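KFD is computed from the TS based on the Euclidean distances between successive sample points, as in Eq. 6:
$$D = \frac{\log_{10}(n)}{\log_{10}(n) + \log_{10}(d/L)} \quad (6)$$
where $n = L/a$ is the TS length expressed as a number of steps, with $a$ the average distance between successive points, $L$ is the waveform length, and $d$ is the largest distance between the first point of the TS and any other sample [37].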
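A minimal Python sketch of the Katz estimate is given below. It assumes the common one-dimensional variant, D = log10(n) / (log10(n) + log10(d / L)), in which distances are absolute differences between sample values; the function name `katz_fd` is hypothetical. Note that the expression is undefined when d / L equals 1 / n, as happens for a perfectly regular zigzag.

```python
import math

def katz_fd(x):
    """Katz fractal dimension of the series x (1-D distance variant)."""
    n = len(x) - 1                                        # number of steps
    length = sum(abs(x[i + 1] - x[i]) for i in range(n))  # waveform length L
    d = max(abs(v - x[0]) for v in x[1:])                 # farthest sample from the first
    return math.log10(n) / (math.log10(n) + math.log10(d / length))

# A monotone ramp gives exactly 1; an irregular series gives a larger value.
fd_ramp = katz_fd(list(range(11)))
fd_irregular = katz_fd([0, 1, 0, 2, 0])
```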

iv. Petrosian Fractal Dimension (PFD)
The Petrosian FD is based on converting the TS into a binary sequence. The binary sequence is formed as follows: if consecutive samples of the TS differ, '1' is assigned, and otherwise '0'. Afterward, the PFD is obtained as (7):
$$D = \frac{\log_{10}(n)}{\log_{10}(n) + \log_{10}\!\left(\dfrac{n}{n + 0.4 N_\Delta}\right)} \quad (7)$$
in which $n$ and $N_\Delta$ are the sequence length and the number of sign changes in the binary sequence, respectively [38].
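The Petrosian estimate can be sketched as follows. The binarization rule used here (1 when the series rises between consecutive samples, 0 otherwise) is one common variant and is an assumption; it may differ from the exact rule used in this study. The formula assumed is D = log10(n) / (log10(n) + log10(n / (n + 0.4 N_delta))), and the function name `petrosian_fd` is hypothetical.

```python
import math

def petrosian_fd(x):
    """Petrosian fractal dimension based on changes in the binarized slope."""
    n = len(x)
    # Assumed binarization: 1 if the series increases, 0 otherwise.
    binary = [1 if x[i + 1] - x[i] > 0 else 0 for i in range(n - 1)]
    # N_delta: number of positions where adjacent binary symbols differ.
    n_delta = sum(1 for a, b in zip(binary, binary[1:]) if a != b)
    return math.log10(n) / (math.log10(n) + math.log10(n / (n + 0.4 * n_delta)))

# A monotone ramp has no symbol changes (D = 1); a zigzag has many (D > 1).
fd_ramp = petrosian_fd(list(range(10)))
fd_zigzag = petrosian_fd([0, 1] * 5)
```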

c) Lyapunov Exponent
Reconstructing the phase space of the system helps in TS analysis, especially when the system dynamics are not well known. In this situation, a pseudo phase space named the Reconstructed Phase Space (RPS) is formed and used to calculate the Lyapunov Exponent (LE). The LE estimates how chaotic the system is, in the form of its sensitivity to changes in initial conditions. As in [5,35], the RPS is built with the time-delay embedding method from the TS of the extracted skin lesion ROI, and the LE is computed with the Jacobian approach.

d) Entropy
Entropy can reveal irregularities in the spatial distribution and complexity in the system behavior [35,39]. In this study, three main types of entropy, namely approximate, permutation, and sample entropy, are calculated.

i) Approximate Entropy (ApEn)
ApEn is a suitable tool for quantifying irregularity and complexity when the TS has few samples and is affected by noise [40]. ApEn can be defined as (8).
Here, the length of the compared data segments and the number of samples are denoted by $m$ and $N$, and the filter level and the distance function are denoted by $r_f$ and $d$, respectively.
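For illustration, a direct Python sketch of approximate entropy is shown below. It assumes the standard definition ApEn = phi(m) - phi(m + 1), with the maximum absolute difference as the distance function; the function name and the parameter values in the example are assumptions.

```python
import math

def approximate_entropy(x, m, r):
    """Approximate entropy of x for segment length m and filter level r."""
    n = len(x)

    def phi(length):
        count = n - length + 1
        vectors = [x[i:i + length] for i in range(count)]
        total = 0.0
        for vi in vectors:
            # Number of vectors within distance r (self-match included).
            similar = sum(
                1 for vj in vectors
                if max(abs(a - b) for a, b in zip(vi, vj)) <= r
            )
            total += math.log(similar / count)
        return total / count

    return phi(m) - phi(m + 1)

# A constant series is perfectly regular, and an alternating one nearly so.
apen_const = approximate_entropy([1.0] * 20, 2, 0.2)
apen_alt = approximate_entropy([0.0, 1.0] * 15, 2, 0.1)
```

Because self-matches are counted, the logarithm is always defined, which is the main practical difference from sample entropy.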

ii) Permutation Entropy (PermEn)
PermEn is a robust statistical tool for determining the complexity of a TS or signal.
Employing the Takens-Mañé theorem, the RPS of a TS can be defined as follows:
$$X(i) = [x(i),\, x(i+\tau),\, \ldots,\, x(i+(m-1)\tau)] \quad (13)$$
in which the embedding dimension and the time delay are denoted by $m$ and $\tau$ [34,41].

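Rearranging $X(i)$ in ascending order leads to:
$$x(i+(j_1-1)\tau) \le x(i+(j_2-1)\tau) \le \cdots \le x(i+(j_m-1)\tau) \quad (14)$$
If at least two elements of $X(i)$ have the same value, they are ordered by their positions, that is, $x(i+(j_1-1)\tau) = x(i+(j_2-1)\tau)$ is sorted with $j_1 < j_2$. Consequently, any vector $X(i)$ is mapped to a symbol group as:
$$S(l) = (j_1, j_2, \ldots, j_m) \quad (15)$$
in which $1 \le l \le m!$.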
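The mapping from embedded vectors to ordinal symbols, and the resulting entropy, can be sketched in Python as follows. The sketch assumes the usual Shannon form, the negative sum of p ln p over the observed symbol frequencies, and resolves ties by position through a stable sort; the function name is hypothetical.

```python
import math
from collections import Counter

def permutation_entropy(x, m, tau):
    """Permutation entropy (natural log) for embedding dimension m and delay tau."""
    n_vec = len(x) - (m - 1) * tau
    patterns = Counter()
    for i in range(n_vec):
        vec = [x[i + j * tau] for j in range(m)]
        # Indices that sort the vector; sorted() is stable, so equal
        # values keep their original order (smaller index first).
        symbol = tuple(sorted(range(m), key=lambda j: vec[j]))
        patterns[symbol] += 1
    return -sum((c / n_vec) * math.log(c / n_vec) for c in patterns.values())

# A short series with three distinct ordinal patterns among five vectors.
pe = permutation_entropy([4, 7, 9, 10, 6, 11, 3], m=3, tau=1)
# A monotone series contains a single pattern, so its entropy is zero.
pe_ramp = permutation_entropy(list(range(20)), m=3, tau=1)
```

In the first example the patterns occur with frequencies 2/5, 2/5, and 1/5, which gives an entropy of about 1.055 nats.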

iii) Sample Entropy (SampEn)
SampEn is a practical tool for measuring data complexity and can be obtained by modifying the ApEn equations as follows:
where $A_i$ indicates the number of $X(i)$ within the maximum tolerance $r$ for the pattern vectors
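A compact Python sketch of sample entropy is given below for illustration. It assumes the standard definition, SampEn = -ln(A / B), where B and A count the template pairs of length m and m + 1 that match within the tolerance r, self-matches excluded; the function name is hypothetical, and an error is raised when no matches exist.

```python
import math

def sample_entropy(x, m, r):
    """Sample entropy of x for template length m and tolerance r."""
    n = len(x)

    def count_matches(length):
        # The same number of templates (n - m) is used for both lengths.
        templates = [x[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):   # self-matches excluded
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        raise ValueError("no matching templates; increase r or use a longer series")
    return -math.log(a / b)

# Perfectly regular series: every match of length m also matches at m + 1.
se_const = sample_entropy([1.0] * 20, 2, 0.2)
se_alt = sample_entropy([0.0, 1.0] * 10, 2, 0.1)
```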

Texture features
Statistical analysis, or texture features, can be used to measure the characteristics of skin lesions and to produce relevant information from images that helps solve other computational tasks in specific applications. The GLCM is a statistical approach used to describe some of the textural features. It defines the spatial relationship between pixels with various gray levels [30,43]. In this paper, the Discrete Wavelet Transform (DWT) is applied to extract the wavelet coefficients and the GLCM. Furthermore, a set of six GLCM descriptors, namely energy, correlation, homogeneity, contrast, entropy, and Inverse Difference Moment (IDM), is measured to extract structural, statistical, and textural features. The approach for extracting the texture features is illustrated in Fig. 3. The following GLCM features are considered in this study.
• Energy: The angular second moment, or energy, is given by (20).
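To make the GLCM descriptors concrete, the following Python sketch builds a normalized co-occurrence matrix for one pixel offset and computes several descriptors from it. It is a simplified illustration: the DWT step is omitted, only a single horizontal offset is used, and the descriptor formulas are the commonly used ones, which may differ in detail from those of this study. All function names are hypothetical.

```python
import math

def glcm(image, levels, dx=1, dy=0):
    """Normalized (non-symmetric) co-occurrence matrix for offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    return [[v / total for v in row] for row in counts]

def glcm_features(p):
    """Energy, contrast, homogeneity, IDM, and entropy of a normalized GLCM."""
    cells = [(i, j, v) for i, row in enumerate(p) for j, v in enumerate(row)]
    return {
        "energy": sum(v * v for _, _, v in cells),
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
        "idm": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
    }

# A flat patch is maximally uniform; a striped patch has high contrast.
flat = glcm_features(glcm([[1, 1, 1], [1, 1, 1]], levels=2))
striped = glcm_features(glcm([[0, 1, 0, 1], [0, 1, 0, 1]], levels=2))
```

For the flat patch the energy is 1 and the contrast 0, while the striped patch has a contrast of 1 because every horizontal neighbor pair differs by one gray level.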
After the mentioned textural features are extracted based on the GLCM, the indices below are also computed for a meaningful analysis of the dermoscopy images.

• Mean
The mean shows the central tendency of the pixel probability distribution in an image. The square root of the variance is generally expressed as the Standard Deviation (SD). Assuming $\sigma$ as the SD of an image, the relative smoothness can be defined as:
$$R = 1 - \frac{1}{1 + \sigma^2}$$

• Skewness
Skewness refers to the asymmetry of the pixel distribution in a given frame around its mean. The Root Mean Square (RMS) is the square root of the arithmetic mean of the squared values and is defined as (29).

• Diameters of lesion
The diameter of a skin lesion is defined as the largest distance between pixels on the borderline contour and represents the "D" in the clinicians' ABCD method.
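The first-order indices listed above can be illustrated with a short Python sketch. The formulas are the commonly used population definitions (smoothness as 1 - 1 / (1 + variance), skewness as the third central moment divided by the cubed SD) and are assumptions rather than the exact expressions of this study; the function names are hypothetical.

```python
import math

def intensity_statistics(pixels):
    """Mean, SD, relative smoothness, skewness, and RMS of a flat pixel list."""
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    sd = math.sqrt(variance)
    third = sum((p - mean) ** 3 for p in pixels) / n
    return {
        "mean": mean,
        "sd": sd,
        "smoothness": 1 - 1 / (1 + variance),
        "skewness": third / sd ** 3 if sd > 0 else 0.0,
        "rms": math.sqrt(sum(p * p for p in pixels) / n),
    }

def max_diameter(boundary):
    """Largest distance between any two border points (brute force)."""
    return max(
        math.hypot(ax - bx, ay - by)
        for ax, ay in boundary
        for bx, by in boundary
    )

stats = intensity_statistics([1, 2, 3, 4, 5])
diameter = max_diameter([(0, 0), (3, 0), (3, 4), (0, 4)])
```

The brute-force diameter is quadratic in the number of border points, which is acceptable for a single contour; converting the pixel value to millimeters requires the image scale, which is not assumed here.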

Feature selection with metaheuristic algorithms
Metaheuristic approaches are among the more effective methods for solving optimization problems. Their ability to find near-optimal solutions to very complicated problems in a short time makes them widely applicable.

Genetic Algorithm (GA)
Both constrained and unconstrained optimization problems can be solved with the GA. Mutation, crossover, and selection are its three main operators, through which improved solutions evolve from the original population over iterative steps [36].

Particle Swarm Optimization (PSO)
The central concept of PSO is mimicking the behavior of social animals. In this method, the position and velocity of each particle are updated iteratively to reach the optimal solution of the optimization problem [44].

Water Waves Optimization (WWO)
The theory of shallow water waves inspires WWO. In this approach, any candidate is assumed to be a wave, and solution searching is supposed to be wave motions [45].

NSGA-II algorithm
Unlike single-objective optimization, a multi-objective problem yields multiple solutions, known as the Pareto optimal set. These solutions represent trade-offs between objectives that may conflict [36]. The second version of the NSGA, named NSGA II, is one of the most powerful approaches for solving multi-objective optimization problems and resolves the weaknesses of the original NSGA.
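The idea of a Pareto optimal set can be illustrated with a small Python sketch of non-dominated sorting, the ranking step at the core of NSGA II. This is a simple quadratic version for illustration only; it omits the crowding distance and the genetic operators, and the candidate values (classification error, number of selected features) are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in one
    (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated_fronts(objectives):
    """Split solution indices into successive Pareto fronts."""
    remaining = set(range(len(objectives)))
    fronts = []
    while remaining:
        front = sorted(
            i for i in remaining
            if not any(dominates(objectives[j], objectives[i])
                       for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts

# Hypothetical candidates: (classification error, number of selected features).
candidates = [(0.10, 12), (0.05, 20), (0.08, 8), (0.12, 15), (0.05, 25)]
fronts = non_dominated_fronts(candidates)
```

Here the first front contains the candidates (0.05, 20) and (0.08, 8): neither is better than the other in both objectives, which is exactly the trade-off between error and feature count described above.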

K-fold stratified cross-validation
Cross-validation addresses the dependency of classification accuracy on the choice of training dataset. With this approach, the effect of over-fitting can be reduced and the classification results become more reliable [8]. The basic steps of k-fold cross-validation are partitioning all data into K folds, using K-1 folds to train the classifier, and using the remaining fold for validation. This procedure is repeated K times, changing the training and validation data each time. In this study, K is set to 5.
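The stratified partitioning can be sketched in Python as follows. The sketch assumes that each class is shuffled and dealt round-robin into the folds so that every fold keeps the class proportions; the function name and the fixed random seed are assumptions. The example uses the class sizes of the dataset (40 melanomas and 160 non-melanomas) with K = 5.

```python
import random
from collections import defaultdict

def stratified_k_fold(labels, k, seed=0):
    """Return k (train_indices, test_indices) pairs with balanced classes."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)    # deal each class round-robin

    splits = []
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        splits.append((train, test))
    return splits

labels = [1] * 40 + [0] * 160     # 1 = melanoma, 0 = non-melanoma
splits = stratified_k_fold(labels, k=5)
```

Each test fold then contains 8 melanoma and 32 non-melanoma images, and every image is used for validation exactly once.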

Classification
The ability of the extracted features to differentiate mole and melanoma lesions is assessed with various machine learning algorithms, including KNN, SVM, Fit net, FF net, and Pat net.

SVM
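SVM can be used to classify various sets of images. It separates the data with a hyperplane chosen to maximize the margin between the two categories [43]. The Radial Basis Function (RBF) is one of the most widely used SVM kernels. Eq. 30 describes its decision function, which in its standard form reads:
$$f(x) = \operatorname{sign}\!\left( \sum_{i=1}^{n} \alpha_i y_i \exp\!\left(-\gamma \lVert x_i - x \rVert^2\right) + b \right) \quad (30)$$
where $x_i$ and $y_i$ are the support vectors and their labels, $\alpha_i$ and $b$ are the learned coefficients and bias, and $\gamma$ is the kernel parameter.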

KNN
The KNN classifier is a non-parametric supervised technique that delivers good classification precision [30]. The KNN approach has two steps, learning and testing. In the learning phase, the training samples are stored with their predefined class labels; in the testing phase, an unlabeled sample is assigned the class of its k nearest data points.
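The two KNN phases can be illustrated with a minimal Python sketch. It assumes Euclidean distance and a simple majority vote; the function name, the value of k, and the toy two-dimensional feature vectors are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Label the query by majority vote among its k nearest training samples."""
    def distance(a, b):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

    # "Learning" only stores the labeled samples; the work happens at query time.
    neighbours = sorted(zip(train_x, train_y),
                        key=lambda pair: distance(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Toy feature vectors forming two well-separated groups.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["non-melanoma"] * 3 + ["melanoma"] * 3
label_a = knn_predict(train_x, train_y, (0.2, 0.1))
label_b = knn_predict(train_x, train_y, (5.5, 5.2))
```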

Pat net
In Pat net, data are classified according to the target classes. In this structure, a deep ANN is trained to map the input patterns into a compact Euclidean hyperspace. The Pat net embedding is then used as a feature vector with machine learning methods [36].

FF net
The FF net is an effective ANN method in which information flows from the input nodes through the hidden nodes to the output nodes without any loops.

Fit net
The training process of Fit net is performed using the inputs and their related target sets.
Fit net is built by training on the input data and defines the relation between the system input and output through the selected hidden layers.

Proposed Method
The proposed method for classifying skin lesions and diagnosing melanoma from dermoscopic images, based on the stages described above, is shown in detail in Fig. 4.
Fig. 4. The flowchart of the proposed tumor classification algorithm.

Results
The proposed approach for detecting the type of skin lesions is implemented on the dermoscopic images. All methods are simulated on a 3.6 GHz Core i7-4720 CPU system using MATLAB R2018a (MathWorks Inc.). In the first phase of the simulation, the ability of the ORACM to segment and extract the ROI of the skin lesion in dermoscopic images is illustrated in Fig. 5. As can be seen, this approach detects the ROI of lesions precisely. In the second phase, all of the introduced nonlinear indices are measured for the skin lesion images of the dataset, and their averages and standard deviations (std.) are given in Table 2. In the next step, the texture features are calculated, and their averages and std. for melanoma and non-melanoma cases are given in Table 3. Furthermore, the diameters of the ROI of the skin lesions are computed. For example, the diameters of four skin lesion ROIs are measured in pixels and millimeters and shown in Fig. 8.
Table 3. The mean ± std. of texture features for all melanoma and non-melanoma cases.

Fig. 8. Skin lesions' Feret's diameter (in pixels and millimeters).
After gathering the various features from the skin lesions, the next phase is applying the introduced metaheuristic algorithms to select the most relevant features. The feature selection approaches, namely GA, PSO, WWO, and NSGA II, are fed with all of the extracted complexity and texture indices in order to choose the most appropriate features. The design parameter values of the feature selection methods are presented in Table 4.
Table 4. The design parameter values for the feature selection methods.
For a fair comparison between the metaheuristic methods, the optimum value of the objective function, the number of selected features, and the run time are given in Table 5. The multi-objective configuration of NSGA II leads to the best combination of minimum objective function value and number of selected features. Although this algorithm takes longer than the others, this stage is performed offline, so the shortcoming can be ignored. The convergence diagram shows how the objective function changes as the number of iterations increases; when the decrease stops, the cost function has reached its minimum value. Because of the multi-objective configuration of NSGA II, its convergence diagram is plotted in Fig. 9. As can be seen, the value of the objective function does not change significantly beyond eight features. Hence, eight indices are selected for classification, as a trade-off between reducing the cost function and increasing the computational complexity. The results are reported in Table 6 and Table 7 and are also shown in Fig. 10 for clarity. The training performance of the proposed method is presented in Figure 11. Based on the results in Table 7, the ROC curves of the five classifiers are obtained and given in Fig. 12.
Based on this figure, the ANN classifiers, namely Pat net, FF net, and Fit net, perform better than the others and have a greater discriminative ability. The corresponding results are given in Table 8 and Table 9 for the nonlinear and texture features, respectively. Moreover, the accuracies of the five classifiers using three feature sets, namely nonlinear, texture, and their combination, are compared in Fig. 13. Comparing the results of this study with other studies that used the same dataset gives a fair view of the ability of the proposed method. Table 10 provides a more detailed comparison based on sensitivity, specificity, and accuracy. Furthermore, the accuracy comparison of these methods is shown in Fig. 14.
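For reference, the three comparison measures can be computed from predicted and true labels as in the following Python sketch. It uses the standard confusion-matrix definitions with melanoma as the positive class; the function name and the example labels are hypothetical and are not results from this study.

```python
def diagnostic_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity, and accuracy from two label sequences."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return {
        "sensitivity": tp / (tp + fn),    # share of melanomas detected
        "specificity": tn / (tn + fp),    # share of non-melanomas cleared
        "accuracy": (tp + tn) / len(pairs),
    }

# Hypothetical labels: 4 melanomas (1) and 6 non-melanomas (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = diagnostic_metrics(y_true, y_pred)
```

Under cross-validation, these measures would typically be computed per fold and then averaged.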

Discussion
The first part of the experiments is dedicated to segmenting the skin lesion ROI. As observed, the ORACM method segments the images with suitable precision. ORACM eliminates two significant shortcomings of ACMs, namely the slow convergence of the algorithm and the sensitivity of the results to parameter tuning, which makes it a fast and computationally inexpensive method for image segmentation.
Besides, Table 2 demonstrates that, owing to the chaotic nature of cancerous cells and tumors, higher nonlinear indices are obtained for malignant cases than for benign ones.
Irregularities in the ROI of melanoma lesions lead to a higher FD for all types, including BFD, PFD, KFD, and HFD. Also, the other complexity measures, i.e., the LE and the various types of entropy, confirm the greater complexity of the extracted ROI of malignant melanoma. Therefore, chaotic indices are introduced and applied as a discriminating tool for quantifying the asymmetry and border irregularity of dermoscopic images. Figures 6 and 7 show the nonlinear feature sets for the melanoma and non-melanoma lesions. The other clinical features, namely color changes and the diameter of the skin lesion, can be determined by calculating texture features. Table 3 shows that the entropy and IDM of melanoma lesions are higher than those of moles, whereas the opposite holds for homogeneity, energy, smoothness, and correlation. The variety of nonlinear and texture measures provides a comprehensive view of the various aspects of skin lesions.
The metaheuristic feature selection algorithms are employed to select the most relevant features, boost diagnostic sensitivity, remove misleading data, and decrease the computational cost. Table 5 shows that the NSGA II approach achieves the best combination of the lowest objective function value and number of selected features. As previously mentioned, selecting more than eight features does not reduce the cost function considerably.
The high accuracy obtained in the classification stage shows that this multi-objective optimization approach chose the features appropriately. It should be noted that the long run time of the feature selection stage does not degrade the performance of the online diagnosis system, because feature selection is completed offline before classification.
Experimental results show that the proposed approach has the potential to be used as an automatic and practical tool for differentiating melanoma from normal lesions. The results in Table 7 and Fig. 10, based on five-fold cross-validation, confirm that the best classification is achieved by Pat net (ACC = 99.24%, SPE = 100%, and SEN = 100%).
Furthermore, Tables 8 and 9 and Fig. 13 show that although both nonlinear and texture indices can detect the skin lesion types, combining them with the NSGA II approach leads to a more accurate diagnosis system. Asymmetries and border irregularities in skin lesions originate from the chaotic nature of cancerous tumors. By applying the most effective complexity measures, including LE, FD, and entropy, the various aspects of chaotic behavior that result in image irregularity are determined. Since the complexity measures focus on border irregularity and carry no information about image content, texture features in GLCM form are combined with these indices to cover all of the criteria used in medicine for skin cancer diagnosis. The performance of the proposed method is confirmed by its high precision compared with other studies that used the same dataset, presented in Table 10.

Conclusion
In the present research, a combination of nonlinear indices and texture features is selected by a multi-objective optimization algorithm to distinguish skin lesion types. The disordered growth of cancerous cells originates from the chaotic nature of the underlying process, so complexity measures can reflect the asymmetry and border irregularities of skin lesions. On the other hand, the texture features obtained from the LH and HL sub-bands represent information about the image content. The main characteristics of this study are the selection of appropriate indices with a multi-objective optimization method and the use of various machine learning approaches to detect the type of dermoscopic skin lesion images. Furthermore, experiments on a dermoscopic image dataset consisting of melanoma and non-melanoma cases confirmed the ability of the proposed method to distinguish skin lesions. Hence, improving the classification precision of dermoscopic images through the presented CAD system can enhance the diagnostic ability of dermatologists during medical inspection.