Skin diseases are the most common form of disease in humans. Their causes include fungal infections, bacterial infections, and viruses. Dermatologists play a crucial role in the traditional method of skin disease identification. A dermatologist first examines the patient and assesses the skin condition based on knowledge and experience. This is followed by a skin imaging process known as dermoscopy, which is used to observe the skin structure.
The appearance of skin diseases in an image plays a crucial role in diagnosing the type of the disease. Considering the difficulties in diagnosing skin diseases with traditional methods, computer-aided methods are being used in medicine [1].
Machine learning methods have been introduced to overcome the limitations of traditional approaches to skin disease diagnosis. These methods require manual feature extractors to extract the features of skin diseases, followed by machine learning algorithms for classification. Because feature extraction is performed manually, it requires professional medical knowledge as well as the ability to conduct deep exploratory data analysis to reduce dimensions, which limits the capability of these methods in recognizing skin disease images [2]. Deep learning methods outperform traditional machine learning techniques for image recognition and classification because they automate the feature engineering process.
Advances in deep learning technology have led many researchers to apply image recognition-based deep learning techniques to the diagnosis of skin diseases. The authors of [3] used a hybrid approach that involves deep feature fusion and several SVM classifiers. It combines the probabilities from multiple classifiers to obtain the final classification. Three pre-trained deep learning models, VGG16, AlexNet, and ResNet-18, were used as deep feature generators. The last fully connected layers of the pre-trained AlexNet and VGG16 were used for extracting the features. Since ResNet-18 has only one fully connected layer, the features were extracted from the last convolutional layer of this model. The ROC curve was used as the evaluation metric. The combination of all networks resulted in 97.55% and 83.83% for classifying melanoma and seborrheic keratosis, respectively.
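The probability-combination step described for [3] can be illustrated with a minimal sketch. It assumes simple averaging of the per-classifier class probabilities, which is only one possible fusion rule and is not confirmed as the rule used in [3]. The function names and the example probabilities are hypothetical.

```python
def fuse_probabilities(per_classifier_probs):
    """Average the class-probability vectors produced by several classifiers."""
    n_classifiers = len(per_classifier_probs)
    n_classes = len(per_classifier_probs[0])
    return [
        sum(probs[c] for probs in per_classifier_probs) / n_classifiers
        for c in range(n_classes)
    ]


def predict_class(per_classifier_probs):
    """Return the index of the class with the highest fused probability."""
    fused = fuse_probabilities(per_classifier_probs)
    return max(range(len(fused)), key=lambda c: fused[c])


# Hypothetical outputs of three classifiers for one image,
# each given as [P(class 0), P(class 1)].
example_probs = [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]
fused = fuse_probabilities(example_probs)
predicted = predict_class(example_probs)
```

Here the first classifier favors class 0, but the averaged probabilities favor class 1, so the fused decision follows the overall evidence rather than any single classifier.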
Reference [4] focuses on computer vision and image processing algorithms for feature extraction, with ANNs (Artificial Neural Networks) used for training and testing. It first preprocesses the images of the skin to extract specific features, then determines the disease type. Eight image processing algorithms were used: median filter, binary mask, smooth filter, Sobel operator, gray image, histogram, sharpening filter, and YCbCr. Ten different features were used for modeling the data. Using them, test accuracies of 90%, 85%, and 88% were achieved for the supervised, unsupervised, and semi-supervised systems, respectively.
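Two of the preprocessing operations listed for [4], the median filter and the Sobel operator, can be sketched as follows. This is an illustrative pure-Python version on a grayscale image stored as a list of rows. The 3x3 window size and the border handling are assumptions, not details taken from [4].

```python
import math
from statistics import median

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def median_filter3(img):
    """3x3 median filter; border pixels are copied unchanged (assumption)."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = median(window)
    return out


def sobel_magnitude(img):
    """Gradient magnitude from the Sobel operator; border pixels are set to 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = math.hypot(gx, gy)
    return out


# A single bright noise pixel is removed by the median filter.
noisy = [[10, 10, 10], [10, 255, 10], [10, 10, 10]]
denoised = median_filter3(noisy)

# A vertical edge between dark and bright columns gives a strong response.
edge = [[0, 0, 10, 10] for _ in range(4)]
edges = sobel_magnitude(edge)
```

The median filter suppresses isolated noise while keeping edges, and the Sobel response highlights intensity boundaries such as lesion borders.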
For identifying histopathological characteristics of clinically evaluated samples, the authors of [5] used computer vision and machine learning methods. They explored six diseases, namely pityriasis rosea, chronic dermatitis, pityriasis rubra pilaris, seborrheic dermatitis, psoriasis, and lichen planus. They used decision trees, KNN, and an ANN model for classification, with the ANN model giving an accuracy of 95%.
Reference [6] makes use of AlexNet for feature extraction and SVM classifiers for classification. ANNs and CNNs (Convolutional Neural Networks) are both widely used for diagnosing skin diseases. The CNN approach to skin disease diagnosis has shown promising results [7].
Reference [8] provides a comparative analysis of different CNN architectures for detecting bacterial skin infections. The ResNet50, Xception, InceptionV3, VGG16, and VGG19 architectures are studied. The authors also initially used the K-fold cross-validation technique. Their network was trained with the Adam optimizer and the binary cross-entropy loss function. The dropout rate was set to 25%. Accuracies of 91.303% and 84.545% were achieved using the VGG16 and VGG19 architectures, respectively.
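For reference, the binary cross-entropy loss mentioned above averages -[t log(p) + (1 - t) log(1 - p)] over the samples, where t is the true label (0 or 1) and p is the predicted probability of class 1. The sketch below is a plain-Python illustration. The clipping constant `eps` is an assumption added to avoid taking the logarithm of zero.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


labels = [1, 0]
confident_correct = binary_cross_entropy(labels, [0.9, 0.1])
confident_wrong = binary_cross_entropy(labels, [0.1, 0.9])
```

The loss is small when confident predictions agree with the labels and grows quickly when they do not, which is what drives the optimizer toward calibrated outputs.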
Reference [9] presents a web application system for classifying skin diseases using CNN architecture and the TensorFlow framework. Acne vulgaris and atopic dermatitis were some of the diseases considered in their study.
Reference [10] provides a workflow involving computer vision techniques that could be utilized for both mobile and web application systems.
Reference [11] proposed a method of clustering images that can be applied to the classification task. The SIFT method was used for detecting the key features in the image. An SVM classifier and a segmentation technique were then adopted, resulting in a precision of 82% and an accuracy of 84%.
Reference [12] describes the formulas for image segmentation and feature extraction. For feature extraction, parameters such as mean, variance, energy, and entropy are calculated from the image.
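The four statistical features named above can be computed as in the following sketch. The exact formulas used in [12] are not reproduced here. This version assumes the common histogram-based definitions, with energy as the sum of squared gray-level probabilities and entropy as the Shannon entropy in bits.

```python
import math
from collections import Counter


def texture_features(img):
    """Mean, variance, energy, and entropy of a grayscale image (list of rows)."""
    pixels = [p for row in img for p in row]
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    # Normalized histogram: probability of each gray level that occurs.
    probs = [count / n for count in Counter(pixels).values()]
    energy = sum(q * q for q in probs)
    entropy = -sum(q * math.log2(q) for q in probs)
    return {"mean": mean, "variance": variance, "energy": energy, "entropy": entropy}


# Hypothetical 2x2 image with two equally frequent gray levels.
features = texture_features([[0, 0], [255, 255]])
```

A perfectly uniform region has energy 1 and entropy 0, while a region with many different gray levels has lower energy and higher entropy.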
Reference [13] combines InceptionResnetV2, MobileNet, and InceptionV3. The layers of these architectures are modified to suit their dataset for skin disease classification. The maximum voting rule was used for attaining the classification result. With an accuracy level of 88%, the authors were able to predict 20 diseases.
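The maximum voting rule can be sketched as below: each model predicts a label, and the label predicted most often wins. The tie-breaking rule (the earliest model in the list wins) and the example labels are assumptions for illustration, not details from [13].

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label; ties go to the earliest model in the list."""
    counts = Counter(labels)
    best = max(counts.values())
    for label in labels:
        if counts[label] == best:
            return label


# Hypothetical predictions from three models for one image.
votes = ["psoriasis", "eczema", "psoriasis"]
final_label = majority_vote(votes)
```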
Reference [14] adopts a multi-model approach for skin lesion classification. The authors used the transfer learning technique to utilize both ResNet-101 and ResNet-50 for feature extraction. To reduce the dimensionality of the features, an algorithm called KcPCA (Kurtosis-controlled Principal Component Analysis) was developed. This process ensured the selection of optimal features. For classification, an SVM with a radial basis function kernel is utilized.
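The radial basis function kernel used by such an SVM measures the similarity of two feature vectors as exp(-gamma * ||x - y||^2). The sketch below shows only the kernel itself. The value of `gamma` is an arbitrary assumption, and neither the SVM training nor the KcPCA step from [14] is reproduced.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel value exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


same = rbf_kernel([1.0, 2.0], [1.0, 2.0])      # identical vectors
apart = rbf_kernel([0.0, 0.0], [1.0, 1.0])     # squared distance of 2
```

Identical vectors give a kernel value of 1, and the value decays toward 0 as the vectors move apart, with `gamma` controlling how quickly.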
Reference [15] describes the Generative Adversarial Network (GAN), which can automatically discover and learn the patterns in input data. In this way, the model can generate new samples from the original samples. A GAN architecture built from CNNs can generate simulated data that closely matches the real data distribution. Features are extracted using MobileNet, GoogLeNet, ResNet, and DenseNet, with DenseNet-121 and DenseNet-169 producing the best accuracies of 94.25% and 93.67%, respectively.
Reference [16] describes a two-stage approach involving computer vision and machine learning techniques. This approach was applied to histopathological characteristics that were clinically validated. In the first stage, the skin disease image is preprocessed and the features are extracted. In the second stage, the collected histopathological attributes are examined using machine learning algorithms. The experimental results were produced with Python scripts developed in PyCharm. The project also provides a web application interface through which an image is uploaded and the results are displayed.
Reference [17] utilizes Deep Convolutional Neural Networks (DCNN) for image classification and feature extraction processes. The layers consisted of three convolutional and pooling layers, followed by a fully connected layer for feature extraction. Their work can be viewed as two sub-tasks. The first task involves the extraction of features from DCNN, and the extracted information is fed into a neural network of two layers for classification. The second task involves classification by SVM. The results of their experiment show that the SVM classification on DCNN features is better than the performance of the neural network alone for the classification task. From their experimentation, it is understood that having several feature maps helps in discovering different patterns from the input image at several locations. The error on testing data starts to increase slowly once the number of feature maps and convolutional layers exceeds a certain limit. The training error increases after several iterations with a slightly high learning rate.
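The convolution and pooling operations that make up such a network can be illustrated with a minimal sketch of one convolution step followed by ReLU and 2x2 max pooling. The kernel values, the ReLU activation, and the image are hypothetical and are not taken from [17]. As in most CNN libraries, the "convolution" here is implemented as cross-correlation without kernel flipping.

```python
def conv2d_valid(img, kernel):
    """Slide the kernel over the image without padding and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [
        [
            sum(kernel[j][i] * img[y + j][x + i] for j in range(kh) for i in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def relu(fmap):
    """Replace negative activations with zero."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool2x2(fmap):
    """2x2 max pooling with stride 2; a trailing odd row or column is dropped."""
    return [
        [
            max(fmap[y][x], fmap[y][x + 1], fmap[y + 1][x], fmap[y + 1][x + 1])
            for x in range(0, len(fmap[0]) - 1, 2)
        ]
        for y in range(0, len(fmap) - 1, 2)
    ]


# Hypothetical 5x5 image and 2x2 diagonal-difference kernel.
image = [[r * 5 + c for c in range(5)] for r in range(5)]
kernel = [[-1, 0], [0, 1]]
feature_map = relu(conv2d_valid(image, kernel))   # 4x4 feature map
pooled = max_pool2x2(feature_map)                 # 2x2 after pooling
```

Each kernel produces one feature map, so using several kernels per layer is what allows the network to detect different patterns at every image location.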
For image classification, [18] proposes a method for automatically designing CNN architectures using genetic algorithms (CNN-GA). With their algorithm, users with little to no experience in tuning CNN architectures can still find an appropriate CNN architecture for the provided images. To produce deeper CNNs, the authors developed an encoding technique for the genetic algorithm that can encode CNNs of any depth. Two components of CNN-GA are designed to speed up the analysis and save computing resources. Considering the computing resources consumed, the number of parameters required, and the classification accuracy, they found that the proposed algorithm outperforms the existing automatic CNN architecture design methods.
Reference [19] developed a new deep learning model named Convolutional eXtreme Gradient Boosting (ConvXGB) for classification problems. This work was carried out on several image datasets and general datasets available in the UCI repository. The authors started from the observation that a single model did not provide the accuracy required across the various problems. As the name suggests, their model combines Convolutional Neural Networks and the XGBoost classifier. ConvXGB also includes a module for preprocessing data. The ConvXGB model learns the input features using several convolutional layers, and XGBoost is used for class label prediction in the final layer. Unlike a traditional CNN, it has no pooling layer or fully connected layer. The number of parameters is therefore reduced in this model, because the weight values do not have to be readjusted in the backpropagation cycle. In the experiments, the ConvXGB model generally outperformed CNN and XGBoost alone on all datasets and, in some cases, by a significant margin.
Reference [20] utilizes the transfer learning technique with the VGG-16 architecture, which is capable of extracting 1000 features from the input image. In addition to SVM and decision tree algorithms, K-Nearest Neighbor and linear discriminant analysis were used for classification. Public ICIS datasets consisting of malignant and benign types of disease were used for this study. The use of transfer learning was found to be advantageous, as it increased the accuracy compared with other pure deep learning models. Using the VGG-16 CNN model in conjunction with the K-Nearest Neighbor algorithm was found to produce the best results in the experiment. On the other hand, complex models such as ensemble learning using boosted trees performed at or below 50%. From these results, the authors concluded that the dataset used is best suited only for linear binary classifiers.
Reference [21] uses a triplet loss function for skin disease classification using CNNs. The authors fine-tuned the layers of InceptionResNet-V2 and ResNet152 to suit the classification needs. In the first step, 128-D embedded features are extracted from the training samples into Euclidean space. In the next step, the learned embeddings are used to compute the L2 distances between the images, and these distances are used for classifying skin diseases. This experiment analyzed four types of skin conditions: dark circles, acne, spots, and blackheads. The method was evaluated using 12000 input images for training and 2000 for testing. In addition, 10% of the images from the training set were used for validation.
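The triplet loss and the distance-based classification step can be sketched as follows. This is a generic formulation using squared L2 distances and a margin, where the margin value, the nearest-reference classification rule, and the 2-D example embeddings are assumptions for illustration (the embeddings in [21] are 128-D).

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def squared_l2(a, b):
    """Squared Euclidean distance, computed without a square root."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero once the negative is farther from the anchor than the positive by the margin."""
    return max(0.0, squared_l2(anchor, positive) - squared_l2(anchor, negative) + margin)


def classify(embedding, references):
    """Assign the label of the reference embedding with the smallest L2 distance."""
    return min(references, key=lambda label: l2_distance(embedding, references[label]))


# Hypothetical 2-D reference embeddings for two of the studied conditions.
references = {"acne": [0.0, 1.0], "spots": [3.0, 4.0]}
label = classify([0.2, 0.9], references)
loss = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
```

Training with this loss pulls images of the same condition together and pushes different conditions apart, so that a simple distance comparison is enough at prediction time.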
From the literature survey, it is inferred that deep learning techniques have a greater impact on the diagnosis of skin diseases. In addition, CNN architectures are popular for image processing tasks. Furthermore, transfer learning allows existing CNN architectures to be either fine-tuned or applied directly with their previously learned weights for the classification task.