Generation and Classification of Land Use and Land Cover Datasets in the Indian States: A Comparative Study of Machine Learning and Deep Learning Models

doi:10.21203/rs.3.rs-3237461/v1


Research Article



This work is licensed under a CC BY 4.0 License


Land use and land cover (LULC) analysis is highly significant for various environmental and social applications. As remote sensing (RS) data becomes more accessible, LULC benchmark datasets have emerged as powerful tools for complex image classification tasks. These datasets are used to test state-of-the-art artificial intelligence models, particularly convolutional neural networks (CNNs), which have demonstrated remarkable effectiveness in such tasks. Nonetheless, there are existing limitations, one of which is the scarcity of benchmark datasets from diverse settings, including those specifically pertaining to the Indian scenario. This study addresses these challenges by generating medium-sized benchmark LULC datasets from two Indian states and evaluating state-of-the-art CNN models alongside traditional ML models. The evaluation focuses on achieving high accuracy in LULC classification, specifically on the generated patches of LULC classes. The dataset comprises 4000 labelled images derived from Sentinel-2 satellite imagery, encompassing three visible spectral bands and four distinct LULC classes. Through quantitative experimental comparison, the study demonstrates that ML models outperform CNN models, exhibiting superior performance across various LULC classes with unique characteristics. Notably, using a traditional ML model, the proposed novel dataset achieves an impressive overall classification accuracy of 96.57%. This study contributes by introducing a standardized benchmark dataset and highlighting the comparative performance of deep CNNs and traditional ML models in the field of LULC classification.

Land use and land cover (LULC)

Image

Machine Learning

Deep Learning

Classification

Land is an invaluable natural resource that serves as the foundation for human civilization and sustains various ecological systems. The proper utilization of land resources is crucial to mitigating the irreversible loss caused by unplanned use. In the context of rapid urbanization and its adverse impacts on society and the environment, understanding the dynamics of land use and land cover (LULC) is essential for applications such as resource management and sustainable development (Ferreira et al., 2019; Rawat et al., 2020; Boulila et al., 2021; Chen et al., 2021; Zhang et al., 2021). To comprehend the complex relationships between land and human activities, researchers and decision-makers employ LULC classification systems. These systems provide a systematic framework for classifying and mapping various types of man-made and natural features on the Earth’s surface within a specific time frame using statistical and scientific analysis methods (Alshari et al., 2021). The classification of LULC provides valuable insights into the spatial distribution and temporal changes of land resources at various scales. It helps identify areas of intensive human intervention and thereby informs land management strategies, which in turn aids in assessing the effects of land use changes on ecosystems and habitat fragmentation. Therefore, LULC classification generally facilitates decision-making processes related to effective monitoring and analysis of land resources (Cheng et al., 2017; Ferreira et al., 2021; Tesfay et al., 2022).

Remote sensing (RS), which entails gathering data about objects or phenomena from a distance, has been instrumental in obtaining valuable information for LULC classification. Over time, RS technology has continuously evolved, resulting in enhanced data quality and detail, primarily driven by high-resolution imagery. The advancements in sensor capabilities and image processing techniques, coupled with the development of robust classification methods, have created opportunities to obtain highly accurate LULC information efficiently. However, classifying remotely sensed images remains a complex task (Broni-Bediako et al., 2022). One significant challenge that has hindered progress in RS classification is the limited availability of reliable labelled ground truth datasets (Jozdani et al., 2022). Although there has been a significant increase in the accessibility of freely available satellite and aerial imagery, fully harnessing the potential of this data requires processing and transforming satellite images into structured semantic information. Consequently, there is a pressing need to create readily usable datasets that facilitate the efficient utilization of available imagery. These datasets should be carefully labelled and validated to ensure their reliability and accuracy. By providing researchers with such datasets, it becomes possible to develop and evaluate classification algorithms more effectively, enabling the extraction of meaningful information from RS data and the application of LULC information with high precision.

The introduction of the pioneering and popular remotely sensed image classification dataset known as the UC Merced land use dataset (Yang & Newsam, 2010) has served as a basis for extensive research on classification models, including machine learning (ML) and deep learning (DL) models, particularly convolutional neural networks (CNNs). As a result, numerous similar datasets have been generated, as summarized in Table 1. The benefits of generating such benchmark datasets are manifold, such as the evaluation of newly developed CNN structures and experimental studies on the fusion of multiple CNN models, specialized feature extraction techniques, and more. Computational methods rely heavily on experimental data for development and testing (Adegun et al., 2023). Only by comparing against existing knowledge can the performance of these methods be assessed. Therefore, benchmark datasets with known and verified outcomes are essential (Sarkar et al., 2020). These benchmark datasets provide researchers with standardized and diverse collections of remotely sensed images, facilitating the development of accurate and robust classification algorithms. Furthermore, the availability of such datasets helps researchers develop and train DL models on large-scale and diverse data that can produce reliable results, leading to improved classification performance. Moreover, these benchmark datasets promote reproducibility and comparability across different studies. By utilizing the same dataset, researchers can assess and compare the performances of their models consistently. This ensures that results are reliable and comparable, fostering a more cohesive research community in the field of remotely sensed image classification.

However, the existing benchmark datasets for LULC classification lack representation of the Indian landscape, with only a single study focusing on tiled Sentinel image patches for LULC classes in Bangalore, India (Pallavi et al., 2022). Furthermore, other available datasets often include non-land cover classes or exhibit excessive complexity that hampers effective classification. Additionally, these datasets have predominantly been used to train CNN models, which are DL neural networks requiring large datasets for training. This disregards the foundational data science principle of starting with less complex models and gradually increasing complexity, and it leads to the underutilization of ML models. Some of the datasets are also quite large, posing significant challenges for training ML models. To overcome these limitations, this study aims to generate a tailored, medium-sized benchmark dataset specifically for the Indian context. Medium-sized datasets strike a balance, providing sufficient data for effective training of DL models without risking overfitting, while also allowing traditional ML models to leverage their efficiency and simplicity for competitive performance. In contrast, large datasets can overwhelm ML models and incur additional training time and resources, while very small datasets may not offer enough data for DL models to generalize effectively. As a result, the study presents an opportunity to evaluate both CNN and ML models, recognizing the significance of incorporating traditional ML models alongside advanced CNN models. The study offers an overview of relevant research in LULC classification, underscoring the importance of generating synthetic datasets tailored to specific contexts. The methodology to generate the Indian LULC patch dataset is described, employing state-of-the-art CNN models and traditional ML models for classification. The study presents experimental results and performance evaluation, followed by a comprehensive discussion of the findings and recommendations for future research.

1.2 Related work

In this section, we review earlier studies on the generation and classification of LULC scenes or datasets. In this context, we present datasets of remotely sensed aerial and satellite imagery of LULC and similar scenes. Additionally, we review the state-of-the-art CNN image classification models for LULC classification.

The generation and availability of satellite imagery datasets, such as UC Merced and EuroSAT, have significantly contributed to LULC classification and to the improvement and development of DL models. EuroSAT, a dataset consisting of high-resolution satellite images covering ten different land cover classes in Europe, has provided researchers with a valuable resource for training and evaluating LULC classification algorithms. Petrovska et al. (2020) employed a two-stream concatenation method, using CNNs to extract features and an SVM to classify them, for the classification of the UC Merced and WHU-RS RS image datasets. Rajagopal et al. (2020) proposed a model based on residual network feature extraction, which extracts features from the diverse convolution layers of a deep residual network; the model was tested on the UC Merced land use and WHU-RS datasets. Studies conducted on the well-known benchmark datasets have demonstrated that RS scene classification based on heterogeneous feature extraction and fusion of CNN models is superior to many state-of-the-art scene classification algorithms (Chaib et al., 2017; Iftenea et al., 2017; Muhammad et al., 2018; Wang et al., 2020). Laban et al. (2018), on the other hand, used the WHU-RS, UC Merced, and Brazilian Coffee Scenes (BCS) datasets to study scale selection methods for remotely sensed images fed to CNN architectures.

Table 1

List of widely used existing RS and aerial LULC benchmark datasets
Dataset	Description	Total images	Classes	Size (pixels)	Resolution (m)	Reference
UC-Merced	The aerial ortho imageries were obtained from the United States Geological Survey National Map of specific regions within the U.S. The land-use images consist of red, green, and blue bands. However, classifying the dataset is challenging due to the presence of highly overlapped classes, e.g., dense residential, medium residential, and sparse residential classes, which primarily vary in the density of structures they contain.	2100	21	256 × 256	0.3	Yang and Newsam, 2010
WHU-RS	The aerial scenes are collected from Google Earth imagery. Later, Sheng et al., 2012 expanded the data set with 7 new classes. The datasets have a wide range of scale, orientation, illuminations, as well as spatial resolutions with a maximum of 0.5 m.	950	12	600 × 600	≥ 0.5	Xia et al., 2010
WHU-RS19		1005	19	600 × 600	≥ 0.5	Sheng et al., 2012
BCS	The BCS dataset contains only two scene classes (coffee and noncoffee) acquired from SPOT satellite imageries across four counties in the Brazilian state of Minas Gerais. The green, red, and near-infrared bands were used in this dataset because they are the most suitable and demonstrative bands for differentiating vegetation areas.	1,438 (coffee) & 36,577 (non-coffee)	2	64 × 64		Penatti et al., 2015
SAT-6	Images were extracted from the National Agriculture Imagery Program. The region and the uncompressed digital Ortho quarter quad tiles (DOQQs), which are GeoTIFF images, conform to the topographic quadrangles of the United States Geological Survey.	405,000	6	28 × 28		Basu et al., 2015
SAT-4		500,000	4	28 × 28		Basu et al., 2015
RSSCN7	The images in this dataset were gathered from Google Earth and were sampled at four different scales, with 100 images per scale. The primary difficulty of this dataset arises from the variations in scale among the images. Furthermore, the dataset poses a significant challenge due to the extensive diversity of images captured under various seasonal and weather conditions.	2800	7	400 × 400	-	Zou et al., 2015
RSC11	The dataset was obtained from Google Earth and consists of high-resolution RS images depicting several U.S. cities. Within this dataset, certain scene classes exhibit visual similarities, thereby amplifying the challenge of accurately distinguishing between the scene images.	1232	11	512 × 512	0.2	Zhao et al., 2016
NWPU-RESISC45	The dataset was developed by Northwestern Polytechnical University (NWPU) for Remote Sensing Image Scene Classification (RESISC). It exhibits a high degree of diversity within each class and similarity between different classes.	31500	45	256 × 256	0.2–30	Cheng et al., 2017
PatternNet	The image scene was collected from Google Earth imagery and, in some cases, through the Google Map API for selected cities in the United States. It was specifically gathered for RS image retrieval approaches.	30,400	38	256 × 256	0.062–4.693	Zhou et al., 2018
EuroSAT	A comprehensive dataset consisting of geo-referenced images captured by the Sentinel-2 satellite has been compiled, encompassing various European cities spread across more than 34 countries. This benchmark dataset comprises 13 spectral bands, providing a rich resource for analysis and research purposes.	27,000	10	64 × 64	10	Helber et al., 2019
AID	As Google Earth photos originate from various RS sensors, the images are multi-source. This poses greater difficulties than using photographs from a single source.	10000	30	600 × 600	0.5-8	Xia et al., 2017

The existing benchmark datasets have been extensively used for analyzing the performance of classification models with different feature extraction and classification methods. DL models such as GoogleNet, DenseNet, Visual Geometry Group 19 (VGG19), Residual Network 50 (ResNet50), and InceptionV3 have been evaluated on the EuroSAT dataset (Dewangkoro & Arymurthy, 2021; Helber et al., 2019). Carranza-García et al. (2019) used a CNN model for LULC classification over remotely sensed imagery and compared the proposed DL architecture with ML models such as SVM, RF, and kNN; they reported that DL was the fastest in both training and testing and concluded that CNN is a very powerful technique for LULC classification. Basu et al. (2015) comparatively analyzed DL models including deep belief networks, CNN, and stacked denoising autoencoders using their own SAT-4 and SAT-6 datasets, and additionally developed a customized CNN architecture, DeepSat, which was found to outperform the other models. Xia et al. (2017) classified their dataset, AID, using GoogLeNet, VGG-VD-16, and CaffeNet and concluded that VGG-VD-16 performed best, with 89.64% accuracy.

In recent studies, Naushad et al. (2021) introduced a wide residual networks-based method that surpassed the performance of ResNet, achieving an accuracy of 99.17%. On the other hand, Temenos et al. (2023) evaluated various existing CNN models and their newly developed model. The existing models, such as shallow CNN, GoogleNet, DenseNet121, Inception V3, ResNet50, ResNet101, VGG16, and GeoSystemNet, outperformed the new model they proposed, Deep SHAP. Furthermore, the utilization of spectral indices in classification tasks was found to improve accuracy compared to using only RGB channels, as observed by Yaloveha et al. (2021), with the classification accuracy increasing from 64.72% to 84.19%. Broni-Bediako et al. (2022) also reported variations in their model's performance across different datasets, achieving accuracy rates of 96.56% and 96.10% on the NWPU-RESISC45 single-label and AID single-label RGB aerial image datasets, respectively, as well as 99.76% and 93.89% on the EuroSAT single-label and BigEarthNet multilabel multispectral satellite image datasets, respectively. Helber et al. (2019) applied ResNet-50 and GoogleNet on various datasets and found that GoogleNet performed best on UCM with an accuracy of 97.32%, while ResNet-50 excelled on SAT-6 with a 99.56% accuracy rate. Thiagarajan et al. (2021) achieved remarkable results using the HFEL–CCGSA method, reporting a classification accuracy of 99.99% for SAT-4 and SAT-6, surpassing AlexNet, LeNet-5, and ResNet. However, for the EuroSAT dataset, the accuracy was 99.49%, which was lower than that of the GeoSystemNet model. Chen and Tsou (2021) proposed DRSNet, a novel deep CNN architecture specifically designed for small patch size Landsat 8 RS image recognition, and demonstrated impressive performance on the EuroSAT, BCS, and UC-Merced datasets as well.

In this section, we first present the dataset utilized in this study, followed by a description of the methods performed on the satellite imageries to obtain the dataset that would be used to train the ML and DL models (an overview of the methodology is illustrated in Fig. 1). After that, we define the models used, explaining their parametrization in detail. We conclude by describing the accuracy metrics used in this study to assess the performance of the classification models.

2.1 Datasets

Sentinel-2 imagery of the study sites from November 2022 was selected for pre-processing and analysis for the LULC classification because it met the required criteria: low cloud coverage, the flourishing season of vegetation (including crops and forest), and the retreating post-monsoon period in the Indian scenario. The red, green, and blue bands of the Sentinel-2 imagery, covering the 400–700 nanometre range that corresponds to the visible spectrum, were utilized. Two Indian states, Assam and Chandigarh, were chosen because they contain homogeneous LULC classes: waterbodies and forests were extracted from the former, and agricultural land and built-up areas from the latter. These locations were selected because they contain a range of ecosystems as well as homogeneous occurrences of the LULC classes.

The satellite imagery from the selected locations was then processed to generate multiple image patches of 64 × 64 pixels, following the recent patch-based approach to LULC classification (Helber et al., 2019). This approach works because pixels within a small spatial neighbourhood typically belong to the same class, consist of similar material, and therefore have similar reflectance. The process involves extracting small, fixed-size three-dimensional patches that are centred at each pixel. Each class has multiple subclasses: waterbody comprises flowing and stagnant water bodies as well as barren coastal areas, agricultural land comprises seasonal and annual croplands, forest comprises dense as well as sparse trees, and built-up comprises human settlements and industries. The LULC image patches were then categorized into the four LULC classes and stored separately, ensuring that each patch contained a single LULC class (Table 2). To enhance the spatial features of the diverse LULC classes, contrast enhancement and rotation-based augmentation at 60 degrees were applied, which improves visual variability and reduces overfitting.
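As an illustration of the patch generation step, the following is a minimal sketch, assuming non-overlapping tiling of an RGB scene held in a NumPy array. The function name `tile_patches` and the random stand-in scene are hypothetical and are not taken from the study; contrast enhancement and the 60-degree rotation are not shown.

```python
import numpy as np

PATCH = 64  # patch edge length in pixels, as stated in the text


def tile_patches(scene: np.ndarray, patch: int = PATCH) -> np.ndarray:
    """Cut an (H, W, 3) RGB scene into non-overlapping (patch, patch, 3) tiles.

    Non-overlapping tiling is an assumption of this sketch; rows and columns
    that do not fill a whole tile are dropped.
    """
    h, w, c = scene.shape
    rows, cols = h // patch, w // patch
    cropped = scene[: rows * patch, : cols * patch]
    # Split both spatial axes into (tile index, offset within tile).
    tiles = cropped.reshape(rows, patch, cols, patch, c).swapaxes(1, 2)
    return tiles.reshape(rows * cols, patch, patch, c)


# Hypothetical stand-in for a Sentinel-2 RGB scene (random values, not real data).
rng = np.random.default_rng(0)
scene = rng.integers(0, 256, size=(200, 330, 3), dtype=np.uint8)
patches = tile_patches(scene)
print(patches.shape)  # (15, 64, 64, 3)
```

Each resulting tile would still need to be inspected and assigned to a single LULC class before being stored, as described above.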

2.2 Implementation of the ML and DL models

In data science, it is common practice to begin with a simpler model and gradually build complexity over time. Therefore, we employ three ML models, k-nearest neighbours (kNN), random forest (RF), and support vector machine (SVM), alongside two DL models, VGG16 and ResNet50. ML models focus on learning patterns and making predictions from data, while DL models use multi-layer neural networks to learn hierarchical representations, automatically extract complex features, and recognize patterns (schematic diagrams in Figs. 2, 3, and 4). In the realm of LULC classification, AI models serve as a powerful tool, enabling the automatic categorization of satellite or aerial imagery into distinct LULC classes. This process relies on harnessing the spectral and spatial characteristics inherent in the images to achieve accurate and efficient classification.

kNN is a non-parametric algorithm that classifies new data points based on their distance to the nearest neighbours in the training dataset. It calculates the distance between the new data point and all the points in the training dataset and then assigns the new data point the majority class label among its k nearest neighbours. It is a straightforward, easy-to-understand algorithm that may be applied to both classification and regression tasks. However, it can be sensitive to noise and requires tuning of the hyperparameter k (Fix & Hodges, 1952). RF, on the other hand, is an ensemble learning algorithm that combines multiple decision trees to improve accuracy and reduce overfitting. It randomly selects subsets of features and samples to build multiple decision trees and then combines them to make a final prediction (Breiman, 2001). SVM is a supervised learning method that separates classes using a hyperplane in a high-dimensional space. It aims to maximize the margin between the classes while minimizing the classification error. SVM works well for both linearly separable and non-linearly separable datasets and can handle high-dimensional feature spaces (Vapnik 1999; Vapnik & Chervonenkis 1974). There are limited applications of ML models for LULC classification on patch-based datasets (Mahamunkar et al., 2021; Das et al., 2022). The primary reasons behind this limitation are the challenges posed by the size and complexity of the available datasets.
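The kNN decision rule can be made concrete with a small sketch. It assumes Euclidean distance and integer class labels; the function `knn_predict` and the toy 2-D points are illustrative only and do not reproduce the study's implementation or data.

```python
import numpy as np


def knn_predict(train_x, train_y, query, k=3):
    """Label each query row by majority vote among its k nearest training rows."""
    # Pairwise Euclidean distances, shape (n_query, n_train).
    dists = np.linalg.norm(query[:, None, :] - train_x[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]  # indices of the k closest rows
    votes = train_y[nearest]                    # labels of those neighbours
    # Majority vote per query row (ties resolve to the smallest label).
    return np.array([np.bincount(row).argmax() for row in votes])


# Toy 2-D features with integer labels; values are illustrative only.
train_x = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
train_y = np.array([0, 0, 0, 1, 1, 1])
query = np.array([[0.05, 0.05], [0.95, 1.0]])
predicted = knn_predict(train_x, train_y, query)
print(predicted)  # [0 1]
```

For image patches, each 64 × 64 × 3 patch would first have to be flattened or summarized into a feature vector, which is one reason large patch datasets are costly for distance-based models.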

An artificial neural network (ANN) comprises a collection of interconnected nodes that process and transmit information, and multiple layers allow for hierarchical feature learning. CNNs are a specific type of ANN designed for processing spatial data such as images or videos (O'Shea & Nash, 2015). CNNs use convolutional layers to extract features from input data and pooling layers to reduce dimensionality. VGGNet and ResNet are two popular CNN architectures used in computer vision tasks such as image classification (Kattenborn et al., 2021). VGGNet is a deep CNN architecture that uses small 3 × 3 filters with a large number of convolutional layers, resulting in a very deep network with a high level of accuracy (Simonyan & Zisserman, 2014). ResNet, on the other hand, is a deeper network that uses skip connections to allow the network to learn residual functions, making it easier to train deeper models without the problem of vanishing gradients (Fig. 5). This allows ResNet to achieve higher accuracy with fewer layers than previous models (He et al., 2016). Both VGGNet and ResNet have been used in various image classification tasks, and their architectures have been adapted and modified for other tasks such as object detection and semantic segmentation. ResNet has many variants that follow the same concept with different numbers of layers. ResNet50 is a variant of the ResNet architecture with 50 neural network layers. Several studies have shown satisfactory LULC classification results using ResNet50 (Naushad et al., 2021; Dewangkoro & Arymurthy, 2021). VGG16, in turn, is a variant of the VGG architecture with 16 neural network layers. VGG is particularly useful for solving image recognition problems.
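The idea of a skip connection can be shown in a few lines. This is a simplified sketch in which plain matrix multiplications stand in for the convolutional layers of an actual ResNet; the function `residual_block` and the weights `w1` and `w2` are hypothetical.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F is two linear maps with a ReLU between."""
    fx = relu(x @ w1) @ w2  # residual branch F(x)
    return relu(fx + x)     # skip connection adds the input back


x = np.array([[1.0, -2.0, 3.0]])
zeros = np.zeros((3, 3))
# With zero weights the residual branch vanishes and the block passes relu(x) through.
out = residual_block(x, zeros, zeros)
print(out)
```

Because the input is added back unchanged, the block only has to learn the difference F(x) from the identity, which is what "learning residual functions" refers to.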

Hyperparameters play a vital role in optimizing ML and DL models. Their values cannot be learned automatically, and finding the right configuration requires a combination of domain expertise, an understanding of the model architecture, and often iterative experimentation to enhance the model's performance. Setting the right values for hyperparameters can affect the model's ability to generalize, its control of overfitting, and its convergence speed during training. An exhaustive grid search, which tries every possible combination of the candidate hyperparameter values, is performed on the significant hyperparameters to optimize the models. The fine-tuned ML and DL models are then trained on the generated image patches.
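An exhaustive grid search of this kind might look as follows with scikit-learn's `GridSearchCV`. This is a sketch under stated assumptions: the data are synthetic features from `make_classification` rather than image patches, and the candidate grid, the 3-fold cross-validation, and the accuracy scoring are illustrative choices, not the settings used in the study.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic 4-class data standing in for flattened patch features (assumption).
X, y = make_classification(n_samples=200, n_features=12, n_informative=6,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

# Illustrative candidate values; every combination is evaluated.
param_grid = {
    "kernel": ["linear", "rbf"],
    "gamma": ["scale", "auto"],
    "C": [1, 10],
}
search = GridSearchCV(SVC(), param_grid, cv=3, scoring="accuracy")
search.fit(X, y)

n_candidates = len(search.cv_results_["params"])
print(n_candidates)         # 2 * 2 * 2 = 8 combinations
print(search.best_params_)  # best combination found on this synthetic data
```

The number of fits grows multiplicatively with each added hyperparameter, which is why the search is usually restricted to the most significant ones.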

2.4 Evaluation Metrics

After the models are fine-tuned and trained, their performances are evaluated and compared using the testing dataset, which was set aside to provide an unbiased estimate of model effectiveness. Confusion matrices were extracted from these results to better understand the image classification results. In the context of multiclass classification of LULC classes, the terms true positive (TP), true negative (TN), false positive (FP), and false negative (FN) are used to determine classification performance and are briefly described here. A TP is a case where the model correctly predicts a LULC class, e.g., the predicted class and the true class are both the vegetation class. A TN occurs when the model correctly predicts that an image does not belong to the vegetation class, and the ground truth label also indicates that the image is not of the vegetation class. An FP is a case where the model predicts the vegetation class, but the actual label indicates that the image belongs to another LULC class. An FN is an instance where the model predicts that the image is not vegetation, but it actually is vegetation. It is important to note that in multiclass classification, these terms are defined for each class. Therefore, we can have TP, TN, FP, and FN for each class separately. The parameters of the confusion matrices are then used to calculate different evaluation metrics, namely, accuracy, precision, recall, and F1-score (Equations 1–4).

$$\mathrm{Accuracy}=\frac{TP+TN}{TP+FP+FN+TN} \tag{1}$$

$$\mathrm{Precision}=\frac{TP}{TP+FP} \tag{2}$$

$$\mathrm{Recall}=\frac{TP}{TP+FN} \tag{3}$$

$$\mathrm{F1\text{-}score}=2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}} \tag{4}$$

Accuracy is the proportion of all predictions that the model classifies correctly. Precision is the proportion of the model's positive predictions that are correct. Recall is the proportion of actual positives that are detected correctly. The F1-score is the harmonic mean of precision and recall and thus combines both.
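The per-class metrics can be derived directly from a confusion matrix, as in the following sketch. The 4 × 4 matrix is made up for illustration and is not a result from this study; overall accuracy is computed here as the fraction of correctly classified samples (the trace divided by the total), which is an assumption about how the single accuracy value per model is obtained.

```python
import numpy as np


def per_class_metrics(cm):
    """Return overall accuracy and per-class precision, recall, and F1-score.

    cm[i, j] counts samples whose true class is i and predicted class is j.
    Assumes every class is predicted at least once (no zero denominators).
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as the class, but belonging to another
    fn = cm.sum(axis=1) - tp  # belonging to the class, but predicted as another
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision, recall, f1


# Illustrative confusion matrix for four classes (not data from the study).
cm = [[8, 2, 0, 0],
      [1, 9, 0, 0],
      [0, 0, 10, 0],
      [0, 1, 0, 9]]
accuracy, precision, recall, f1 = per_class_metrics(cm)
print(accuracy)   # 0.9
print(precision)
print(recall)
```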

3.1. Hyperparameter tuning of the ML models

We performed a grid search to identify the optimal parameters for different ML models (refer to Table 3). For the RF model, the best parameter combinations were determined as follows: maximum depth = 7, n_estimators = 1001, min_samples_leaf = 1, and min_samples_split = 2. In the case of the kNN model, the ideal hyperparameters were found to be: n_neighbors = 7, weights = 'uniform', algorithm = 'ball_tree', leaf_size = 100, and metric = 'Manhattan'. For the SVM model, the radial basis function (RBF) kernel with gamma scaled was identified as the most suitable. Regarding the DL models, we employed the same set of hyperparameters for all models. This included the use of the Adamax optimizer, a batch size of 64, and an initial learning rate of 0.0001. In the case of multi-class classification, a categorical cross-entropy loss function was applied. This loss function encourages the model to optimize its predictions toward the correct class labels by measuring the difference between the predicted and true probability distributions. This approach leads to improved prediction accuracy. To fine-tune the DL models for our specific task, we individually adjusted the last layers of each pre-trained model over 50 epochs. Subsequently, the entire model was trained for an additional 100 epochs. The DL architecture employed a sequence of convolutional layers with rectified linear unit (ReLU) activation functions, and a softmax activation function was applied at the final layer for multi-class classification.
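For reference, the softmax output and the categorical cross-entropy loss mentioned above are conventionally written as follows for a single image patch. The symbols $C$, $z_c$, $y_c$, and $\hat{y}_c$ are notation introduced here for illustration and do not appear in the original study.

$$\hat{y}_c=\frac{e^{z_c}}{\sum_{k=1}^{C} e^{z_k}}, \qquad L=-\sum_{c=1}^{C} y_c \log \hat{y}_c$$

Here $C=4$ is the number of LULC classes, $z_c$ is the final-layer score for class $c$, $\hat{y}_c$ is the predicted probability, and $y_c$ is the one-hot true label. Because $y$ is one-hot, the loss reduces to $-\log \hat{y}_c$ for the true class, so it approaches zero only as the predicted probability of the correct class approaches 1.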

Table 3

Optimized parameters for various classifiers used for the LULC classification.
ML/DL model	Parameter	Selected value
RF	Maximum depth	7
RF	n-estimators	1001
RF	Minimum samples split	2
RF	Minimum samples leaf	1
SVM	Kernel function	Radial basis function
SVM	Gamma	Scale
kNN	n-neighbours	3
kNN	Weights	Distance
kNN	Algorithm	Ball tree
kNN	Leaf size	30
kNN	Metric	Minkowski
DL (ResNet50 and VGGNet16)	Activation function	ReLU, Softmax
DL (ResNet50 and VGGNet16)	Batch size	64
DL (ResNet50 and VGGNet16)	Epoch	100
DL (ResNet50 and VGGNet16)	Learning rate	0.0001
DL (ResNet50 and VGGNet16)	Loss function	Categorical cross-entropy
DL (ResNet50 and VGGNet16)	Optimizer	AdaMax
DL (ResNet50 and VGGNet16)	Pooling	Max pooling

3.2. Performance comparison of the fine-tuned ML and DL Models

The algorithms had to predict the various LULC classes, as described in Table 2. The results of the best-performing models for the LULC classification task are detailed with confusion matrices (Fig. 6). The comparison of the classification performances of different models on the proposed dataset is shown in Table 4. We utilized various evaluation metrics derived from the confusion matrices, including accuracy, recall, precision, and F1-score, to evaluate the performance of the ML and DL models on a multi-class classification problem with four classes (Agriculture, Built-up, Forest, Waterbody). The kNN model performs well for the Forest class, with a high F1-score, precision, and recall. However, it struggles with the built-up class: although precision is 1, recall is only 0.06, giving an F1-score of 0.11. The overall accuracy is moderate (0.65). The RF model performs well overall, with high F1-scores, precision, and recall values for all classes. It achieves a high accuracy (0.94), indicating good generalization capability. The SVM model performs consistently well across all classes, with high F1-scores, precision, and recall values. It achieves a high accuracy (0.96), indicating good classification performance. The ResNet50 model performs well for the Agriculture and Waterbody classes with high F1-scores, precision, and recall values. However, it struggles with the built-up class, as indicated by the relatively low F1-score and recall. The overall accuracy is decent (0.90). The VGG16 model's performance is similar to ResNet50, with high F1-scores, precision, and recall for the Agriculture and Waterbody classes. It also struggles with the built-up class, resulting in a relatively low F1-score and recall. The overall accuracy is decent (0.92). In summary, the RF and SVM models consistently perform well across all classes, with high F1-scores, precision, recall, and accuracy. The kNN model shows mixed performance, excelling in the Forest class but struggling with the built-up class. The DL models (ResNet50 and VGG16) perform well for some classes but struggle with others, particularly the built-up class. Overall, the RF and SVM models seem to be the most reliable choices for this multi-class classification problem.

Table 4

Model performance assessment on the proposed dataset
Models	Classes	F1-score	Precision	Recall	Accuracy
kNN	Agriculture	0.55	0.60	0.51	0.65
	Builtup	0.11	1	0.06
	Forest	0.99	0.98	1
	Waterbody	0.65	0.48	0.99
RF	Agriculture	0.89	0.87	0.90	0.94
	Builtup	0.91	0.89	0.94
	Forest	1	1	1
	Waterbody	0.95	0.99	0.91
SVM	Agriculture	0.93	0.93	0.93	0.96
	Builtup	0.93	0.91	0.96
	Forest	1	1	1
	Waterbody	0.98	1	0.95
ResNet50	Agriculture	0.84	0.72	0.99	0.90
	Builtup	0.77	1	0.62
	Forest	1	1	0.99
	Waterbody	0.97	0.95	0.99
VGG16	Agriculture	0.84	0.72	0.99	0.92
	Builtup	0.77	1	0.62
	Forest	1	1	0.99
	Waterbody	0.97	0.95	0.99
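As a worked check of Equation 4 against Table 4, the F1-scores of the built-up class can be recomputed from the tabulated precision and recall. The inputs are the rounded table values, so the results are approximate.

$$F1_{\text{kNN, Builtup}}=2\times\frac{1\times 0.06}{1+0.06}=\frac{0.12}{1.06}\approx 0.11$$

$$F1_{\text{ResNet50, Builtup}}=2\times\frac{1\times 0.62}{1+0.62}=\frac{1.24}{1.62}\approx 0.77$$

Both values agree with the table and show how a perfect precision cannot compensate for low recall, because the harmonic mean is dominated by the smaller of the two terms.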

The classification of LULC classes can be influenced by different classifiers, with varying performances observed. In this study, the difference between ML and DL models was less pronounced, likely due to the dataset size being optimal for both types of models. DL models have a reputation for performing exceptionally well with larger datasets, while AI models, in general, are significantly affected by the complexity of the dataset. When dealing with smaller datasets containing only two classes, the distinctions between traditional ML and DL become less pronounced (Alzubaidi et al., 2021; Günen et al., 2022). In our experiment, we evaluated the EuroSAT and UC Merced datasets using various ML models as shown in Table 5. We observed that the UC Merced dataset, despite having only 2100 scenes, presented higher complexity due to the presence of a large number of classes. Conversely, the EuroSAT dataset contains 27000 LULC class patches, making it challenging for ML models to handle effectively. However, when we reduced the EuroSAT dataset to only 4 classes, each consisting of 3000 image patches, the models achieved notably high accuracy. This finding indicates that the medium-sized dataset generated in our study is a simple and optimal size for both ML and DL models.

Table 5

Classification accuracy (%) of ML models on benchmark datasets.
Dataset	UC Merced	EuroSAT	EuroSAT- 4 classes
kNN	18.10	36.24	58.25
RF	44.76	68.48	92.00
SVM	42.62	65.65	88.88
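
The reduction of EuroSAT to a 4-class subset of 3000 patches per class can be sketched as a simple filtering step. The following is a minimal sketch under stated assumptions: the source does not list which four classes were retained, so the class names, the file listing, and the per-class counts below are hypothetical placeholders, and `subset_by_class` is an illustrative helper rather than the authors' procedure.

```python
import random
from collections import Counter, defaultdict


def subset_by_class(samples, keep_classes, per_class, seed=0):
    """Keep only `keep_classes` and draw at most `per_class` samples from each."""
    by_class = defaultdict(list)
    for path, label in samples:
        if label in keep_classes:
            by_class[label].append((path, label))

    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    subset = []
    for label in sorted(by_class):
        items = by_class[label]
        rng.shuffle(items)
        subset.extend(items[:per_class])
    return subset


# Hypothetical file listing standing in for a multi-class benchmark.
# Class names and counts are placeholders, not values from the paper.
placeholder_counts = {
    "AnnualCrop": 3000, "Forest": 3000, "Residential": 3000,
    "SeaLake": 3000, "Highway": 2500, "Pasture": 2000,
}
samples = [
    (f"{name}/{name}_{i}.jpg", name)
    for name, n in placeholder_counts.items()
    for i in range(1, n + 1)
]

# Assumed choice of four classes, picked to mirror the four classes of this study.
keep = {"AnnualCrop", "Forest", "Residential", "SeaLake"}
subset = subset_by_class(samples, keep, per_class=3000)
print(Counter(label for _, label in subset))
```

Balancing the subset at a fixed count per class keeps overall accuracy comparable across classes, which matters when it is the only figure reported, as in Table 5.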

The classification performance showed the highest variation for agricultural land and built-up areas, while forest and waterbody classes exhibited the least variation. This difference can be attributed to the presence of other land uses within the patches of agricultural land and built-up areas. Such heterogeneity is often observed in regions like India, where human settlements are interspersed with agricultural lands across a small area. In contrast, forest cover predominantly includes forested regions, and the waterbody class encompasses coastal areas and certain vegetation types as well. The performance evaluation indicated that the SVM had the highest accuracy among all classifiers tested. Although previous studies have reported higher accuracies for kNN, RF, ResNet50, and VGG16 than those obtained here, the results indicate that ML classifiers can work well with small and medium-sized datasets for LULC classification. The study found a small variation in the accuracy of DL classifiers but a greater variation in ML classifiers, consistent with other studies that reported minor to moderate fluctuations in LULC classification accuracy across different classifiers. These findings align with Das et al. (2022), who reported SVM as the best performer with 96.58% accuracy, followed by RF and kNN.
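
The variation described above can be quantified directly from Table 4. The sketch below uses the range (maximum minus minimum) as the measure of spread, which is an assumption made here for illustration; the source does not state which measure of variation was used.

```python
def spread(values):
    """Range of a list of scores (max minus min)."""
    return max(values) - min(values)


# F1-scores copied from Table 4, in the order kNN, RF, SVM, ResNet50, VGG16.
f1_by_class = {
    "Agriculture": [0.55, 0.89, 0.93, 0.84, 0.84],
    "Builtup": [0.11, 0.91, 0.93, 0.77, 0.77],
    "Forest": [0.99, 1.00, 1.00, 1.00, 1.00],
    "Waterbody": [0.65, 0.95, 0.98, 0.97, 0.97],
}
f1_spread = {cls: round(spread(v), 2) for cls, v in f1_by_class.items()}
ranked = sorted(f1_spread, key=f1_spread.get, reverse=True)

# Overall accuracy per model, copied from Table 4.
accuracy = {"kNN": 0.65, "RF": 0.94, "SVM": 0.96, "ResNet50": 0.90, "VGG16": 0.92}
ml_spread = round(spread([accuracy[m] for m in ("kNN", "RF", "SVM")]), 2)
dl_spread = round(spread([accuracy[m] for m in ("ResNet50", "VGG16")]), 2)

print("F1 spread per class:", f1_spread)
print("Classes ranked by spread:", ranked)
print("Accuracy spread, ML vs DL:", ml_spread, dl_spread)
```

Under this measure, built-up has the widest F1 spread and forest the narrowest. The waterbody spread comes almost entirely from the kNN row, and the ML accuracy spread is likewise dominated by kNN.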

Rohith and Kumar (2020) found that DenseNet-121 performed better than other CNN networks, achieving an accuracy of 99.67% for EuroSAT and 97.05% for UC Merced datasets, indicating that deeper CNN structures work better with larger and more complex datasets. VGG16 and ResNet50 achieved high accuracies on UC Merced and SIRI-WHU datasets, attributed to the higher number of classes and imagery resolution of the UC Merced dataset. However, a comparative study by AlAfandy et al. (2020) found that ResNet50 had a higher overall accuracy than DenseNets, suggesting that data and model complexity affect performance differently. kNN exhibited the lowest accuracy in this study, with a small difference compared to earlier studies (Das et al., 2022). Generally, ResNet50 is considered superior to VGG16 in terms of accuracy, but VGG16's simpler architecture and easier implementation make it a better choice for a small or medium-sized dataset (Pallavi et al., 2022; Yifter et al., 2022). While DL models require larger datasets to avoid overfitting, simpler ML models can perform well with smaller datasets because they generalize with fewer parameters. However, the performance of ML and DL models depends on the task, the dataset size, and the complexity of the problem at hand.

Understanding spatial patterns and changes in the landscape depends heavily on the precise classification of LULC classes. Therefore, through this study, we generated a medium-sized LULC dataset from two Indian states and subsequently classified the patches of LULC classes using AI models. We employed three traditional ML models and two state-of-the-art DL models on the generated dataset from Sentinel-2 satellite imagery of Indian states. By comparing the performance of ML and DL models, we gained valuable insights into their strengths and weaknesses in LULC classification. This information can guide researchers and practitioners in selecting the most suitable model for their specific LULC classification tasks. The results of our study highlight the potential of ML and DL models in achieving accurate and efficient LULC classification, which can have significant implications for various real-world applications. It is worth noting that the accuracy of LULC mapping can vary not only with the classifier used but also with the spatial and temporal context. In addition to the practical Earth observation applications, the proposed medium-sized dataset could be utilized for several other tasks, such as model testing, model comparison, and model improvements. To further enhance our understanding, future research could focus on analyzing the accuracy of classifiers for different states in the country, considering the diverse conditions of LULC within the region.

Author contribution: Nyenshu Seb Rengma and Manohar Yadav designed the conceptual framework of the study. Nyenshu Seb Rengma designed the methodology, conducted the experiment, and wrote the initial manuscript with inputs from Manohar Yadav. Nyenshu Seb Rengma and Manohar Yadav reviewed and edited the final manuscript.

Competing Interests: The authors have no competing interests to declare that are relevant to the content of this article.

Funding: No funding was received for conducting this study.

Data Availability Statement: The datasets generated during and/or analysed during the current study are available from the corresponding author upon reasonable request.

Adegun, A. A., Viriri, S., & Tapamo, J. R. (2023). Review of deep learning methods for remote sensing satellite images classification: experimental survey and comparative analysis. Journal of Big Data, 10(1), 93. https://doi.org/10.1186/s40537-023-00772-x
AlAfandy, K. A., Omara, H., Lazaar, M., & Al Achhab, M. (2020). Using classic networks for classifying remote sensing images: Comparative study. Advances in Science, Technology and Engineering Systems Journal, 5(5), 770-780. https://doi.org/10.25046/aj050594
Alshari, E. A., & Gawali, B. W. (2021). Development of a classification system for LULC using remote sensing and GIS. Global Transitions Proceedings, 2(1), 8-17. https://doi.org/10.1016/j.gltp.2021.01.002
Alzubaidi, L., Zhang, J., Humaidi, A. J., Al-Dujaili, A., Duan, Y., Al-Shamma, O., Santamaría, J., Fadhel, M. A., Al-Amidie, M., & Farhan, L. (2021). Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. Journal of Big Data, 8, 1-74. https://doi.org/10.1186/s40537-021-00444-8
Basu, S., Ganguly, S., Mukhopadhyay, S., DiBiano, R., Karki, M., & Nemani, R. (2015, November). Deepsat: a learning framework for satellite imagery. In Proceedings of the 23rd SIGSPATIAL international conference on advances in geographic information systems (pp. 1-10). https://doi.org/10.1145/2820783.2820816
Boulila, W., Ghandorh, H., Khan, M. A., Ahmed, F., & Ahmad, J. (2021). A novel CNN-LSTM-based approach to predict urban expansion. Ecological Informatics, 64, 101325. https://doi.org/10.1016/j.ecoinf.2021.101325
Breiman, L. (2001). Random forests. Machine Learning, 45, 5-32. https://doi.org/10.1023/A:1010933404324
Broni-Bediako, C., Murata, Y., Mormille, L. H., & Atsumi, M. (2021). Searching for CNN architectures for remote sensing scene classification. IEEE Transactions on Geoscience and Remote Sensing, 60, 1-13. https://doi.org/10.1109/TGRS.2021.3097938
Carranza-García, M., García-Gutiérrez, J., & Riquelme, J. C. (2019). A framework for evaluating land use and land cover classification using convolutional neural networks. Remote Sensing, 11(3), 274. https://doi.org/10.3390/rs11030274
Chaib, S., Liu, H., Gu, Y., & Yao, H. (2017). Deep feature fusion for VHR remote sensing scene classification. IEEE Transactions on Geoscience and Remote Sensing, 55(8), 4775-4784. https://doi.org/10.1109/TGRS.2017.2700322
Chen, F., & Tsou, J. Y. (2022). Assessing the effects of convolutional neural network architectural factors on model performance for remote sensing image classification: An in-depth investigation. International Journal of Applied Earth Observation and Geoinformation, 112, 102865. https://doi.org/10.1016/j.jag.2022.102865
Chen, W., Xu, Y., Zhang, Z., Yang, L., Pan, X., & Jia, Z. (2021). Mapping agricultural plastic greenhouses using Google Earth images and deep learning. Computers and Electronics in Agriculture, 191, 106552. https://doi.org/10.1016/j.compag.2021.106552
Cheng, G., Han, J., & Lu, X. (2017). Remote sensing image scene classification: Benchmark and state of the art. Proceedings of the IEEE, 105(10), 1865-1883. https://doi.org/10.1109/JPROC.2017.2675998
Das, T. K., Barik, D. K., & Kumar, K. R. (2022). Land-Use Land-Cover Prediction from Satellite Images using Machine Learning Techniques. In 2022 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COM-IT-CON) (pp. 338-343). IEEE. https://doi.org/10.1109/COM-IT-CON54601.2022.9850602
Dewangkoro, H. I., & Arymurthy, A. M. (2021). Land use and land cover classification using CNN, SVM, and channel squeeze & spatial excitation block. In IOP Conference Series: Earth and Environmental Science (Vol. 704, No. 1, p. 012048). IOP Publishing. https://doi.org/10.1088/1755-1315/704/1/012048
Ferreira, F. L. V., Rodrigues, L. N., & da Silva, D. D. (2021). Influence of changes in land use and land cover and rainfall on the streamflow regime of a watershed located in the transitioning region of the Brazilian Biomes Atlantic Forest and Cerrado. Environmental Monitoring and Assessment, 193, 1-17. https://doi.org/10.1007/s10661-020-08782-5
Ferreira, L. M. R., Esteves, L. S., de Souza, E. P., & dos Santos, C. A. C. (2019). Impact of the urbanisation process in the availability of ecosystem services in a tropical ecotone area. Ecosystems, 22(2), 266-282. https://doi.org/10.1007/s10021-018-0270-0
Fix, E., & Hodges, J. L. (1952). Discriminatory analysis: Nonparametric discrimination: Small sample performance.
Günen, M. A. (2022). Performance comparison of deep learning and machine learning methods in determining wetland water areas using EuroSAT dataset. Environmental Science and Pollution Research, 29(14), 21092-21106. https://doi.org/10.1007/s11356-021-17177-z
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778). https://doi.org/10.1109/CVPR.2016.90
Helber, P., Bischke, B., Dengel, A., & Borth, D. (2019). Eurosat: A novel dataset and deep learning benchmark for land use and land cover classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 12(7), 2217-2226. https://doi.org/10.1109/JSTARS.2019.2918242
Iftene, M., Liu, Q., & Wang, Y. (2017). Very high resolution images classification by fusing deep convolutional neural networks. In The 5th International Conference on Advanced Computer Science Applications and Technologies (ACSAT 2017) (pp. 172-176). https://doi.org/10.23977/acsat.2017.1022
Jozdani, S., Chen, D., Pouliot, D., & Johnson, B. A. (2022). A review and meta-analysis of generative adversarial networks and their applications in remote sensing. International Journal of Applied Earth Observation and Geoinformation, 108, 102734. https://doi.org/10.1016/j.jag.2022.102734
Kattenborn, T., Leitloff, J., Schiefer, F., & Hinz, S. (2021). Review on Convolutional Neural Networks (CNN) in vegetation remote sensing. ISPRS Journal of Photogrammetry and Remote Sensing, 173, 24-49. https://doi.org/10.1016/j.isprsjprs.2020.12.010
Laban, N., Abdellatif, B., Ebied, H. M., Shedeed, H. A., & Tolba, M. F. (2018). Performance enhancement of satellite image classification using a convolutional neural network. In Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2017 (pp. 673-682). Springer International Publishing. https://doi.org/10.1007/978-3-319-64861-3_63
Mahamunkar, G. S., & Netak, L. D. (2021). Comparison of Various Deep CNN Models for Land Use and Land Cover Classification. In International Conference on Intelligent Human Computer Interaction (pp. 499-510). Cham: Springer International Publishing. https://doi.org/10.1007/978-3-030-98404-5_46
Muhammad, U., Wang, W., Chattha, S. P., & Ali, S. (2018). Pre-trained VGGNet architecture for remote-sensing image scene classification. In 2018 24th International Conference on Pattern Recognition (ICPR) (pp. 1622-1627). IEEE. https://doi.org/10.1109/ICPR.2018.8545591
Naushad, R., Kaur, T., & Ghaderpour, E. (2021). Deep transfer learning for land use and land cover classification: A comparative study. Sensors, 21(23), 8083. https://doi.org/10.3390/s21238083
O'Shea, K., & Nash, R. (2015). An introduction to convolutional neural networks. arXiv preprint arXiv:1511.08458.
Pallavi, M., Thivakaran, T. K., & Ganapathi, C. (2022). A Tile-Based Approach for the LULC Classification of Sentinel Image Using Deep Learning Techniques. In 2022 International Conference for Advancement in Technology (ICONAT) (pp. 1-5). IEEE. https://doi.org/10.1109/ICONAT53423.2022.9726030
Petrovska, B., Zdravevski, E., Lameski, P., Corizzo, R., Štajduhar, I., & Lerga, J. (2020). Deep learning for feature extraction in remote sensing: A case-study of aerial scene classification. Sensors, 20(14), 3906. https://doi.org/10.3390/s20143906
Rajagopal, A., Ramachandran, A., Shankar, K., Khari, M., Jha, S., Lee, Y., & Joshi, G. P. (2020). Fine-tuned residual network-based features with latent variable support vector machine-based optimal scene classification model for unmanned aerial vehicles. IEEE Access, 8, 118396-118404. https://doi.org/10.1109/ACCESS.2020.3004233
Rawat, A. K., Banerjee, S., & Roy, A. K. (2020). Assessment of Land Use/Land Cover Changes of potential growing fringe areas of Lucknow Using Remote Sensing and GIS. In 2020 International Conference on Contemporary Computing and Applications (IC3A) (pp. 254-259). IEEE. https://doi.org/10.1109/IC3A48958.2020.233308
Rohith, G., & Kumar, L. S. (2020). Remote sensing signature classification of agriculture detection using deep convolution network models. In International Conference on Machine Learning, Image Processing, Network Security and Data Sciences (pp. 343-355). Singapore: Springer Singapore. https://doi.org/10.1007/978-981-15-6315-7_28
Sarkar, A., Yang, Y., & Vihinen, M. (2020). Variation benchmark datasets: update, criteria, quality and applications. Database, 2020, baz117. https://doi.org/10.1093/database/baz117
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
Temenos, A., Temenos, N., Kaselimi, M., Doulamis, A., & Doulamis, N. (2023). Interpretable deep learning framework for land use and land cover classification in remote sensing using SHAP. IEEE Geoscience and Remote Sensing Letters, 20, 1-5. https://doi.org/10.1109/LGRS.2023.3251652
Tesfay, F., Kibret, K., Gebrekirstos, A., & Hadgu, K. M. (2022). Land use and land cover dynamics and ecosystem services values in Kewet district in the central dry lowlands of Ethiopia. Environmental Monitoring and Assessment, 194(11), 801. https://doi.org/10.1007/s10661-022-10486-x
Thiagarajan, K., Manapakkam Anandan, M., Stateczny, A., Bidare Divakarachari, P., & Kivudujogappa Lingappa, H. (2021). Satellite image classification using a hierarchical ensemble learning and correlation coefficient-based gravitational search algorithm. Remote Sensing, 13(21), 4351. https://doi.org/10.3390/rs13214351
Vapnik, V. (1999). The nature of statistical learning theory. Springer Science & Business Media. https://doi.org/10.1007/978-1-4757-3264-1
Vapnik, V., & Chervonenkis, A. (1974). Theory of pattern recognition. Nauka, Moscow.
Wang, X., Xu, M., Xiong, X., & Ning, C. (2020). Remote sensing scene classification using heterogeneous feature extraction and multi-level fusion. IEEE Access, 8, 217628-217641. https://doi.org/10.1109/ACCESS.2020.3042501
Xia, G. S., Hu, J., Hu, F., Shi, B., Bai, X., Zhong, Y., Zhang, L., & Lu, X. (2017). AID: A benchmark data set for performance evaluation of aerial scene classification. IEEE Transactions on Geoscience and Remote Sensing, 55(7), 3965-3981. https://doi.org/10.1109/TGRS.2017.2685945
Xia, G. S., Yang, W., Delon, J., Gousseau, Y., Sun, H., & Maître, H. (2010). Structural high-resolution satellite image indexing.
Yaloveha, V., Hlavcheva, D., & Podorozhniak, A. (2021). Spectral Indexes Evaluation for Satellite Images Classification using CNN. Journal of Information and Organizational Sciences, 45(2), 435-449. https://doi.org/10.31341/jios.45.2.5
Yang, Y., & Newsam, S. (2010). Bag-of-visual-words and spatial extensions for land-use classification. In Proceedings of the 18th SIGSPATIAL international conference on advances in geographic information systems (pp. 270-279). https://doi.org/10.1145/1869790.1869829
Yifter, T., Razoumny, Y. N., & Lobanov, V. K. (2022). Deep transfer learning of satellite imagery for land use and land cover classification. Informatics and Automation, 21(5), 963-982. https://doi.org/10.15622/ia.21.5.5
Zhang, C., Li, Z., Jiang, H., Luo, Y., & Xu, S. (2021). Deep learning method for evaluating the photovoltaic potential of urban land-use: A case study of Wuhan, China. Applied Energy, 283, 116329. https://doi.org/10.1016/j.apenergy.2020.116329

Table 2 is available in Supplementary Files section.

Editorial decision: Major revision
24 Sep, 2023
Reviews received at journal
14 Aug, 2023
Reviewers agreed at journal
11 Aug, 2023
Reviewers invited by journal
10 Aug, 2023
Editor assigned by journal
08 Aug, 2023
Submission checks completed at journal
08 Aug, 2023
First submitted to journal
05 Aug, 2023
