Convolutional neural network based hurricane damage detection using satellite images

Hurricanes are tropical storms that cause immense damage to human life and property. Rapid assessment of hurricane damage is extremely important for first responders, but the process is usually slow, expensive, labor-intensive and prone to errors. Advances in remote sensing and computer vision make it possible to observe the Earth at a different scale. In this paper, a new Convolutional Neural Network (CNN) model has been designed using satellite images captured over areas affected by hurricanes. The model assesses damage by detecting damaged and undamaged buildings, on the basis of which relief aid can be provided to the affected people immediately. The model is composed of five convolutional layers, five pooling layers, one flattening layer, one dropout layer and two dense layers. The Hurricane Harvey dataset, consisting of 23,000 images of size 128 × 128 pixels, has been used in this paper. The proposed model is evaluated on 5750 test images at a learning rate of 0.00001 over 30 epochs with the Adam optimizer, obtaining an accuracy of 0.95 and a precision of 0.97. The proposed model will help emergency responders determine whether buildings have been damaged by a hurricane and deliver relief aid to the affected people.


Introduction
An increase in the occurrence of natural disasters has been evident since 1980, and the number of people residing in disaster-prone areas is also increasing. This leads to growing losses and damage caused by natural disasters (Pi et al. 2020).
A low-pressure system that develops over tropical or subtropical waters is known as a tropical cyclone. In the Atlantic basin, tropical cyclones are called hurricanes. Hurricanes draw their energy from warm surface waters. As warm, moist air spirals counter-clockwise inwards towards the center, the wind speed increases, reaching its maximum in the area surrounding the calm center of the hurricane (Dawood and Asif 2019). Hurricanes can be very fatal, causing excessive damage to human life and property.
Hurricanes are cataclysmic storms. Table 1 presents the classification of storms based on wind speed. A tropical depression occurs when the wind speed is below 64 km/hour; the storm is classified as a tropical storm when the wind speed is between 64 and 118 km/hour; and when the wind speed exceeds 118 km/hour, it is known as a hurricane. Once the wind speed exceeds these thresholds, it becomes impossible to prevent an extreme weather event.
Hurricane Harvey, a Category 4 hurricane, made landfall in the Houston region in 2017 with wind speeds of 210 km/hour, killing more than 100 people and causing damage of $125 billion. In order to provide immediate help and aid to the people, assessing the damage is extremely important. In this paper, it is determined whether buildings in the Houston area were damaged by the hurricane. This would help to rescue people on time and also to deliver food and other resources on time.
Satellite images are gaining popularity for monitoring hurricanes (Dotel et al. 2020a), as they provide an aerial view for assessing the situation. However, this process still depends on inspection by humans and is therefore slow and unreliable. Hence, computer vision comes into the picture.
Previously, several machine learning (ML) techniques have been used for hurricane damage detection, tracking and estimation. For example, different ML algorithms were used to detect damage from trees that had fallen on roads during hurricanes; a good accuracy of 86% was achieved for Hurricane Michael in the Florida region, but the method was not applicable to other hurricanes in other regions (Gazzea et al. 2021). Discrete skeleton evolution was used to determine the shape of the hurricane eye, which helped in detecting the occurrence of hurricanes; however, the experimental data and the reference data agreed with each other by only 4%, so better results could have been achieved (Lee et al. 2016). Multimodal machine learning techniques were used to forecast hurricanes and their tracks, but this method was restricted to a particular hurricane and was not generalizable (Boussioux et al. 2020).
Recently, owing to revolutionary increases in computing capacity, there has been tremendous development in deep learning (DL) (Kaur et al. 2021). DL with Convolutional Neural Networks (CNNs) performs extremely well for image classification. A CNN model extracts features from the images itself, so no separate feature extraction method is required (Pritt and Chern 2017).
CNN is a kind of feedforward network in which the convolution operation is applied instead of the general matrix multiplication. It consists of mainly three layers: the convolutional layer, the pooling layer and the fully connected layer (Cao and Choe 2020). The convolutional layer extracts features from the image: convolutional filters along with mathematical operations are applied to the input image, producing feature maps. ReLU (Rectified Linear Unit) is applied after the convolution operation to introduce nonlinearities into the network. ReLU, f(x) = max(0, x), speeds up the training of the network without affecting network performance. Pooling or subsampling layers reduce the dimensions of the feature maps in order to decrease the processing time (Phung and Rhee 2019). After feature extraction and resolution reduction by the convolutional and pooling layers, the feature maps are flattened into a feature vector and passed through the fully connected layers. The outputs of the convolution and pooling layers exhibit high-level features, which are used for classification by the fully connected layers (Phung and Rhee 2019).
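The two element-wise operations described above, ReLU activation and 2 × 2 max pooling, can be illustrated on a small toy feature map (a minimal NumPy sketch, not the paper's code; the 4 × 4 values are made up for illustration):

```python
import numpy as np

def relu(x):
    # f(x) = max(0, x): negative activations are zeroed out
    return np.maximum(0, x)

def max_pool_2x2(fmap):
    # Reduce each non-overlapping 2x2 block to its maximum value,
    # halving both spatial dimensions of the feature map
    h, w = fmap.shape
    return fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[ 1., -2.,  3.,  0.],
                 [-1.,  5., -3.,  2.],
                 [ 0., -4.,  6., -1.],
                 [ 2.,  1., -2.,  4.]])

activated = relu(fmap)            # negatives become 0
pooled = max_pool_2x2(activated)  # 4x4 -> 2x2
```

Applied to an image, the same pooling step is what shrinks a 126 × 126 feature map to 63 × 63, as happens in the model described later in the paper.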
The major contributions of this research paper are as follows:
1. A CNN-based model has been proposed that comprises five convolutional layers, five pooling layers, one flattening layer, one dropout layer and two dense layers.
2. The impact of hyperparameters such as the optimizer and the learning rate has been studied. The optimizers studied are Adagrad, SGD, RMSProp and Adam, compared at learning rates of 0.000001 and 0.00001. The best results were obtained with the Adam optimizer at a learning rate of 0.00001.
Finally, the best model is identified, which will be beneficial for automatically determining the damage caused by hurricanes.
The remainder of the paper is organized as follows: Sect. 2 reviews related work, including the contributions of other researchers; Sect. 3 describes dataset preparation, consisting of the dataset explanation, pre-processing techniques and the proposed model; Sect. 4 presents the results and discussion; Sect. 5 compares the proposed model with state-of-the-art models; and Sect. 6 presents the conclusion and future work.

Related work
Recently, DL has proved to be beneficial for the automatic detection of damage caused by hurricanes.
One study introduced and evaluated CNN models for detecting damage caused by natural disasters from aerial imagery. The CNN models were trained on aerial videos named Volan 2018; eight CNN models trained by transfer learning achieved 80.69% and 74.48% mAP for high altitude and low altitude respectively (Pi et al. 2020). A benchmark dataset was created from publicly available data for the Greater Houston area after Hurricane Harvey in 2017; the dataset can be used by researchers for training and testing object detection models for automatic detection of damaged buildings (Chen et al. 2018). Satellite images along with deep learning are being used for various applications such as disaster response and environmental monitoring: CNNs with post-processing neural networks combined CNN predictions with satellite metadata, obtaining an accuracy of 0.83 and an F1-score of 0.797 on the IARPA fMoW dataset (Pritt and Chern 2017). A CNN-based change detection model was designed for finding the areas affected most severely by Hurricane Harvey, obtaining an F1-score of 81.2%; the satellite images were thresholded and clustered into grids from which the disaster impact was determined (Doshi et al. 2018; Li et al. 2019). An accuracy of 88.3% was achieved when a semi-supervised classification method was applied to the Hurricane Sandy dataset. This method consisted of three steps: segmentation, a convolutional auto-encoder and fine-tuning using a CNN; such techniques are used when there are many unlabelled samples and few labelled samples (Li et al. 2018). Various combinations of three neural networks were considered, and it was found that they performed better when different colour masks of the relevant objects were used. The first neural network was used for pre-processing and the second and third neural networks were used for extracting features.
These networks were used for detecting damage caused to buildings (Nia and Mori 2017).
The studies conducted earlier mostly focused on determining the intensity of hurricanes. Earlier research provided a benchmark dataset for hurricane damage detection and applied transfer learning techniques for determining hurricane damage to roads. However, few researchers have studied the damage caused to buildings by hurricanes using satellite images. In this paper, the proposed CNN model classifies satellite images into damaged and undamaged classes with improved accuracy over previously used methods. The paper also focuses on the choice of optimal hyperparameters to reduce losses and improve the performance of the proposed model.

Dataset preparation
The Hurricane Harvey dataset, comprising 23,000 satellite images, has been used for automatic damage detection. The satellite images were captured by optical sensors at sub-meter resolution and pre-processed with atmospheric compensation, orthorectification and pan-sharpening. The images were captured by the GeoEye-1 satellite, which has a panchromatic resolution of 46 cm. The data were taken from Kaggle and were originally obtained from IEEE DataPort. The number of images is further increased by data augmentation techniques.

Dataset analysis
The dataset used in this paper comprises satellite images obtained after Hurricane Harvey, which struck the Greater Houston region. The satellite images have been labeled as "damaged" or "undamaged" for the assessment of building damage caused by the hurricane: images labeled "damaged" show buildings affected by the hurricane, while the "undamaged" label indicates buildings that were unaffected by the disaster. There are 15,000 "damaged" class images and 8000 "undamaged" class images. The dataset is divided into 15,525 training images, 1725 validation images and 5750 testing images. Figure 1 shows damaged and undamaged sample images.
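As a sanity check, the split sizes quoted above correspond to exact fractions of the 23,000 images, roughly 67.5% training, 7.5% validation and 25% testing (a trivial sketch of the arithmetic):

```python
total = 23_000  # Hurricane Harvey dataset size

train, val, test = 15_525, 1_725, 5_750
assert train + val + test == total  # the three splits cover the dataset

# Fraction of the whole dataset used for each split
fractions = {name: n / total
             for name, n in {"train": train, "val": val, "test": test}.items()}
```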

Dataset pre-processing
Pre-processing is the most important stage in image processing because it improves the features of the satellite images and also suppresses unnecessary data present in the image (Scannell et al. 2020; Zheng et al. 2018). Normalization and data augmentation are the two pre-processing steps used.

Normalization
Data normalization is a very important step as it maintains numerical stability in CNN models: it enables a CNN model to learn quickly and keeps its gradient descent stable. Hence, the pixel values of the images have been normalized to the range 0-1, which also keeps the model unbiased towards higher pixel or feature values. The rescaling was done by multiplying the pixel values by 1/255 (Ba et al. 2016).
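The rescaling step can be sketched as follows (a minimal NumPy illustration of the 1/255 factor described above; the dummy one-pixel image is made up for the example):

```python
import numpy as np

# A dummy 8-bit RGB pixel with channel values spanning [0, 255]
image = np.array([[[0, 128, 255]]], dtype=np.uint8)

# Rescale to [0, 1] by multiplying with 1/255, matching the
# normalization described in the text
normalized = image * (1.0 / 255.0)
```

In practice the same factor is usually passed once to the data pipeline (e.g. as a `rescale` argument) rather than applied image by image.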

Data augmentation
Data augmentation is used to expand the size of the dataset (Perez and Wang 2017). Augmentation not only increases the dataset size but also incorporates diversity into the dataset, which improves the generalizability of the model. The model also becomes more robust when trained on slightly different new images (Naqa et al. 2015; Shin et al. 2016).
The Keras ImageDataGenerator has been used for image augmentation. It augments the images in real time, meaning that images are augmented on the fly during the training stage, which saves a lot of memory: it returns only the transformed images without adding to the original set of images.
In this paper, data augmentation techniques such as zooming, rotation, horizontal flipping and height/width shifting have been applied. In rotation, the image is rotated by an angle; in flipping, the pixels are rearranged while the features are kept intact.
The specifications of these data augmentation transformations are given in Table 2. The image is rotated by up to 40 degrees in the clockwise direction. A width shift range of 0.2 is the upper bound on the fraction of the total width by which the image will be shifted left or right; similarly, a height shift range of 0.2 specifies the fraction by which the image will be shifted along the y-axis. A zoom value of 0.2 means the image will be zoomed within the range [0.8, 1.2]. The image is flipped horizontally through the horizontal flip transformation. Figure 2 shows the results of data augmentation performed on the input image: Fig. 2a is the original image, Fig. 2b the rotated image, Fig. 2c the width-shifted image, Fig. 2d the height-shifted image, Fig. 2e the horizontally flipped image and Fig. 2f the zoomed image.
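The transformations in Table 2 map directly onto arguments of Keras' `ImageDataGenerator`; a sketch of the configuration described above (parameter names follow the `tf.keras` API, not the authors' exact code):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation settings from Table 2, applied on the fly during training
datagen = ImageDataGenerator(
    rescale=1.0 / 255,       # pixel normalization to [0, 1]
    rotation_range=40,       # rotate by up to 40 degrees
    width_shift_range=0.2,   # shift horizontally by up to 20% of the width
    height_shift_range=0.2,  # shift vertically by up to 20% of the height
    zoom_range=0.2,          # zoom within the range [0.8, 1.2]
    horizontal_flip=True,    # random horizontal flipping
)
```

During training, a call such as `datagen.flow_from_directory(...)` would then yield transformed batches without storing augmented copies on disk.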

Design of the proposed CNN model
A CNN model consisting of 15 layers has been proposed for hurricane damage detection. The model, consisting of five convolutional layers, five pooling layers, one flattening layer, one dropout layer and two dense layers, has 1,061,826 trainable parameters. The model depth depends on how complex the features to be extracted from the images are; as our dataset consists of only two classes, a shallow network works well in terms of both generalization and training time.
In the proposed CNN model, an input image of size 128 × 128 × 3 is applied to a convolutional layer of 32 filters, generating an output of size 126 × 126 × 32. A max pooling layer with a 2 × 2 pool size is then applied, generating an output of size 63 × 63 × 32. Further convolutional layers comprising 64, 128, 128 and 256 filters are applied, and each convolutional layer is followed by a max pooling layer.
After the last max pooling layer, a flattening layer is applied, followed by a dropout layer of 0.5. The dropout layer is used to regularize the distribution of the weights; a value of 0.5 means that 50% of the neurons are dropped. The design of the CNN model is shown in Fig. 3, and Table 3 lists the parameters in terms of filter size, image size and total number of parameters. An input image of size 128 × 128 × 3 is applied to a convolutional layer of 32 filters of size 3 × 3, producing an output of size 126 × 126 × 32 with 896 parameters. This output is applied to a max pooling layer of pool size 2 × 2, which returns an output of size 63 × 63 × 32. This output is given as input to the second convolutional layer comprising 64 filters, which has 18,496 parameters; its output of size 61 × 61 × 64 is given to the second max pooling layer, which returns an output of size 30 × 30 × 64. The third convolutional layer returns an output of size 28 × 28 × 128 with 73,856 parameters; the third and fourth convolutional layers comprise 128 filters each, and each convolutional layer is followed by a max pooling layer. The fifth convolutional layer comprises 256 filters of size 3 × 3 and has 295,168 parameters. The fifth max pooling layer is followed by a flattening layer, a dropout layer and two dense layers. The first dense layer has 524,800 parameters and the second dense layer has 1026 parameters. The activation function of all the convolutional layers and the first dense layer is ReLU, whereas the activation function of the last dense layer is sigmoid: ReLU is fast, simple and works well, whereas the sigmoid function helps to predict the probability of the output since its values lie between 0 and 1.
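The shapes and parameter counts above follow from the standard formulas: a 3 × 3 valid convolution shrinks each spatial dimension by 2 and has 3·3·in_channels·filters weights plus one bias per filter, and 2 × 2 max pooling halves each dimension. A short sketch reproducing the arithmetic of Table 3:

```python
def conv_params(in_ch, filters, k=3):
    # Weights (k*k*in_ch per filter) plus one bias per filter
    return k * k * in_ch * filters + filters

def dense_params(in_units, out_units):
    return in_units * out_units + out_units

size, ch = 128, 3   # input image: 128 x 128 x 3
total = 0
for filters in (32, 64, 128, 128, 256):
    total += conv_params(ch, filters)
    size = (size - 2) // 2   # 3x3 valid conv, then 2x2 max pooling
    ch = filters

flat = size * size * ch              # 2 x 2 x 256 = 1024 after flattening
total += dense_params(flat, 512)     # first dense layer: 524,800 params
total += dense_params(512, 2)        # second dense layer: 1,026 params
```

Running this reproduces the per-layer counts quoted in the text (896, 18,496, 73,856, 295,168, 524,800 and 1026) and a total of 1,061,826 trainable parameters.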
The proposed model consists of five convolutional layers, five max pooling layers, one flattening layer, one dropout layer and two dense layers. The filter (kernel) size for each convolutional layer is 3 × 3, and the numbers of filters in the five convolutional layers are 32, 64, 128, 128 and 256 respectively. Hence, the proposed model comprises 1,061,826 trainable parameters. A shallow network was chosen since the images have to be classified into only two categories, damaged or undamaged; it takes less time while still providing good results.
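Under those specifications, the architecture can be sketched in Keras as below. This is an illustrative sketch, not the authors' exact code, but it reproduces the layer sequence, output shapes and the 1,061,826 trainable parameters described in the text:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Conv2D(32, 3, activation="relu",
                  input_shape=(128, 128, 3)),     # -> 126 x 126 x 32
    layers.MaxPooling2D(2),                       # -> 63 x 63 x 32
    layers.Conv2D(64, 3, activation="relu"),      # -> 61 x 61 x 64
    layers.MaxPooling2D(2),                       # -> 30 x 30 x 64
    layers.Conv2D(128, 3, activation="relu"),     # -> 28 x 28 x 128
    layers.MaxPooling2D(2),                       # -> 14 x 14 x 128
    layers.Conv2D(128, 3, activation="relu"),     # -> 12 x 12 x 128
    layers.MaxPooling2D(2),                       # -> 6 x 6 x 128
    layers.Conv2D(256, 3, activation="relu"),     # -> 4 x 4 x 256
    layers.MaxPooling2D(2),                       # -> 2 x 2 x 256
    layers.Flatten(),                             # -> 1024
    layers.Dropout(0.5),                          # drop 50% of units
    layers.Dense(512, activation="relu"),
    layers.Dense(2, activation="sigmoid"),        # damaged / undamaged
])

# Best configuration reported in the paper: Adam at learning rate 0.00001
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-5),
              loss="binary_crossentropy", metrics=["accuracy"])
```

Calling `model.summary()` prints the same per-layer output shapes and parameter counts as Table 3.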
Earlier machine learning methods involved manual feature extraction, and a lot of work has already been done in that field. Deep learning is now used as it extracts features from the data automatically, and CNN-based models in particular extract features from images with very good accuracy. Hence, the model proposed in this paper is a CNN designed from scratch, which achieves a very good accuracy of 95%.

Experimental setup
The proposed CNN model has been trained using the Python programming language with the TensorFlow and Keras packages, and simulated on Kaggle using its GPU.

Results and discussion
This section discusses the results obtained from the proposed CNN model when varying hyperparameters such as the optimizer and the learning rate. The learning rate is the hyperparameter that controls how much the model changes in response to the estimated error each time the model weights are updated. Choosing an appropriate learning rate is very important: a very small value can make the training process longer, while a very large value can make it unstable. Optimizers are methods that minimize the loss function (or maximize efficiency) by deciding how to change the weights so as to reduce the losses; they operate on the weights and biases of the network.
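The effect of the learning rate can be illustrated with plain gradient descent on the simple quadratic loss f(w) = w², whose gradient is 2w: a tiny step size converges very slowly, a moderate one converges quickly, and too large a step size diverges. This is a toy illustration, not the paper's experiment:

```python
def gradient_descent(lr, steps=50, w=1.0):
    # Minimize f(w) = w**2 by stepping against its gradient 2*w
    for _ in range(steps):
        w = w - lr * 2 * w
    return w

slow = gradient_descent(lr=0.001)    # tiny steps: still far from 0
good = gradient_descent(lr=0.1)      # converges close to the minimum
diverged = gradient_descent(lr=1.5)  # overshoots: |w| grows every step
```

The same trade-off motivates the comparison of learning rates 0.000001 and 0.00001 in the experiments below.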

Performance metrics
The various performance metrics (Denil et al. 2013; Betz et al. 2011; Ng et al. 2020) used to evaluate the model are described below in terms of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN).
Accuracy - It is obtained by dividing the correct predictions by the total predictions: Accuracy = (TP + TN)/(TP + TN + FP + FN) (Eq. 1).
Precision - This metric gives the correctly predicted positive labels out of all the predicted positive labels: Precision = TP/(TP + FP) (Eq. 2).
Recall - It is obtained by dividing the correctly predicted positive labels by the total actual positive labels: Recall = TP/(TP + FN) (Eq. 3). It is also known as sensitivity.
F1-score - It is the harmonic mean of precision and recall: F1-score = 2 × Precision × Recall/(Precision + Recall) (Eq. 4).
Specificity - It is found by dividing the true negatives by the actual negatives: Specificity = TN/(TN + FP) (Eq. 5).
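From the confusion matrix entries (TP, FP, FN, TN), these five metrics can be computed as follows (standard definitions in a short sketch; the example counts are hypothetical, not the paper's results):

```python
def classification_metrics(tp, fp, fn, tn):
    accuracy    = (tp + tn) / (tp + tn + fp + fn)   # Eq. 1
    precision   = tp / (tp + fp)                    # Eq. 2
    recall      = tp / (tp + fn)                    # Eq. 3 (sensitivity)
    f1          = 2 * precision * recall / (precision + recall)  # Eq. 4
    specificity = tn / (tn + fp)                    # Eq. 5
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "specificity": specificity}

# Hypothetical confusion matrix: 90 true positives, 10 false positives,
# 5 false negatives and 95 true negatives
m = classification_metrics(tp=90, fp=10, fn=5, tn=95)
```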

Specifications of different optimizers used for simulation
For the classification of the hurricane images into damaged and undamaged classes, the proposed model is simulated using various deep learning optimizers: Adagrad (Adaptive Gradient Algorithm), SGD (Stochastic Gradient Descent), RMSProp (Root Mean Squared Propagation) and Adam (Adaptive Moment Estimation). The learning specifications of these optimizers are given in Table 4.
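As an illustration of how these optimizers differ, a single Adam update step can be sketched from its published update rule (Kingma and Ba's formulation with the usual defaults; this is an illustrative implementation, not the Keras one):

```python
import math

def adam_step(w, grad, m, v, t, lr=1e-5,
              beta1=0.9, beta2=0.999, eps=1e-8):
    # Exponential moving averages of the gradient and squared gradient
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction for the zero-initialized moment estimates
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Parameter update scaled by the learning rate
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# First step (t=1): the update magnitude is ~lr regardless of gradient scale
w1, m1, v1 = adam_step(w=0.0, grad=10.0, m=0.0, v=0.0, t=1)
```

This per-parameter adaptive scaling is why Adam is often described as combining the strengths of Adagrad and RMSProp, an observation the results below bear out.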

Results analysis for different optimizers at learning rate 0.000001
This section describes the results of the experiments performed on the Hurricane Harvey dataset using the proposed CNN model with four different optimizers. The proposed model was analysed at a learning rate of 0.000001 and 30 epochs.

Analysis of accuracy and loss
The performance in terms of training accuracy, training loss, training recall, validation accuracy, validation loss and validation recall for the various optimizers at a learning rate of 0.000001 and 30 epochs is shown in Table 5. From Table 5, it can be seen that RMSProp performed best in terms of training accuracy, training loss, training recall, validation accuracy and validation loss, whereas the Adam optimizer performed best in terms of validation recall.

Figure 4 shows the model accuracy for the different optimizers at each epoch with a learning rate of 0.000001; the simulation is done for 30 epochs with the four optimizers (Choi et al. 2019). Figure 4a shows the model accuracy for the Adagrad optimizer (Lydia and Francis 2019), Fig. 4b for the SGD optimizer (Duda 2019), Fig. 4c for the RMSProp optimizer (Wichrowska et al. 2017) and Fig. 4d for the Adam optimizer (Kumar et al. 2020). From Fig. 4, it can be seen that RMSProp performed best whereas Adagrad performed worst in terms of model accuracy.

Figure 5 shows the model loss for the different optimizers at each epoch with a learning rate of 0.000001; the simulation is done for 30 epochs with the four optimizers. Figure 5a shows the model loss for the Adagrad optimizer, Fig. 5b for the SGD optimizer, Fig. 5c for the RMSProp optimizer and Fig. 5d for the Adam optimizer. From Fig. 5, it can be seen that RMSProp performed best whereas Adagrad performed worst in terms of model loss.

Figure 6 shows the confusion matrix of the proposed model for the different optimizers at a learning rate of 0.000001. Figure 6a, b, c and d show the confusion matrices for the Adagrad, SGD, RMSProp and Adam optimizers respectively. The confusion matrix shows the two classes, damaged and undamaged. Table 6 shows the different confusion matrix parameters of the optimizers at a learning rate of 0.000001.
From Table 6, it can be seen that RMSProp performed best in terms of accuracy and F1-score, Adam performed best in terms of precision, and Adagrad performed best in terms of recall.

Analysis of different confusion matrix parameters
The comparison of the four optimizers in terms of confusion matrix parameters is also shown graphically in Fig. 7. From Fig. 7, it can be concluded that RMSProp and Adam performed well across most of the confusion matrix parameters.

Results analysis for different optimizers at learning rate 0.00001

The performance in terms of training accuracy, training loss, training recall, validation accuracy, validation loss and validation recall for the various optimizers at a learning rate of 0.00001 and 30 epochs is presented in this section.

Analysis of accuracy and loss
The analysis at a learning rate of 0.00001 and 30 epochs is shown in Table 7. From Table 7, it can be seen that the Adam optimizer outperformed the other optimizers, achieving the highest training and validation accuracy and recall and the lowest loss. The Adam optimizer performs best as it combines the best properties of the RMSProp and Adagrad optimizers.

Figure 8 shows the model accuracy for the different optimizers (Adagrad, SGD, RMSProp and Adam) at a learning rate of 0.00001; the simulation is done for 30 epochs. Figure 8a shows the model accuracy for the Adagrad optimizer, Fig. 8b for the SGD optimizer, Fig. 8c for the RMSProp optimizer and Fig. 8d for the Adam optimizer. From Fig. 8, it can be seen that the Adam optimizer performed best whereas Adagrad performed worst in terms of model accuracy.

Figure 9 shows the model loss for the different optimizers at each epoch with a learning rate of 0.00001; the simulation is done for 30 epochs with the four optimizers. Figure 9a shows the model loss for the Adagrad optimizer, Fig. 9b for the SGD optimizer, Fig. 9c for the RMSProp optimizer and Fig. 9d for the Adam optimizer. From Fig. 9, it can be seen that Adam performed best whereas Adagrad and SGD performed worst in terms of model loss.

Figure 10 shows the confusion matrix of the proposed model for the different optimizers at a learning rate of 0.00001. Figure 10a, b, c and d show the confusion matrices for the Adagrad, SGD, RMSProp and Adam optimizers respectively.

Analysis of different confusion matrix parameters
The confusion matrix shows the two classes, damaged and undamaged. Performance based on the different confusion matrix parameters, such as accuracy, recall, precision and F1-score (Fürnkranz and Flach 2003; Zhou et al. 2021), is discussed here. Table 8 shows the classification performance of the four optimizers at a learning rate of 0.00001 and 30 epochs. Adam outperformed the other optimizers, achieving the best accuracy of 0.95 and precision of 0.97. An equal F1-score of 0.96 was obtained by both the Adam and RMSProp optimizers, and RMSProp achieved the best recall of 0.97.
The comparison of the four optimizers in terms of confusion matrix parameters is also shown graphically in Fig. 11. From Fig. 11, it can be concluded that RMSProp and Adam are good in terms of accuracy, precision, recall, specificity and F1-score. Overall, the Adam optimizer performed best, obtaining an accuracy of 95%, precision of 97%, recall of 96% and F1-score of 96%; RMSProp achieved the best specificity of 91% and recall of 97%.
Comparison of LR 0.000001 and LR 0.00001

Figure 12 shows the comparison of the confusion matrix parameters at learning rates of 0.000001 and 0.00001: Fig. 12a compares accuracy, Fig. 12b precision, Fig. 12c recall, Fig. 12d F1-score and Fig. 12e specificity. From Fig. 12a, the Adam optimizer achieves the best accuracy of 0.95 at a learning rate of 0.00001. From Fig. 12b, the Adam optimizer obtains the best precision of 0.97 at a learning rate of 0.00001. From Fig. 12c, Adagrad obtained the best recall of 0.98 at a learning rate of 0.000001, followed by a recall of 0.97 obtained by RMSProp and 0.96 obtained by Adam at a learning rate of 0.00001. An equal F1-score of 0.96 is obtained by both RMSProp and Adam at a learning rate of 0.00001, as seen in Fig. 12d. From Fig. 12e, RMSProp achieved the highest specificity of 0.91, followed by a specificity of 0.86 by Adam at a learning rate of 0.00001. Thus, it can be inferred that the Adam optimizer performs best at a learning rate of 0.00001.

Comparison of the proposed model with state-of-the-art models
The proposed model has been compared with state-of-the-art models as shown in Table 9. The proposed model consists of 15 layers and gave the best results at a learning rate of 0.00001 and 30 epochs with the Adam optimizer on the 23,000 images of the damaged and undamaged classes.

Conclusion and future work

In this paper, a convolutional neural network has been proposed for the automatic detection of building damage from satellite images of Hurricane Harvey, which struck the Greater Houston region in 2017. Damage detection after natural disasters is of prime importance for first responders so that the people affected by a hurricane can be provided aid at the earliest. The proposed model is made up of 15 layers: five convolutional layers, five max pooling layers, one flattening layer, one dropout layer and two dense layers. Four optimizers, Adagrad, SGD, RMSProp and Adam, were compared at different learning rates over 30 epochs. It was found that the proposed model worked best with the Adam optimizer at a learning rate of 0.00001 and 30 epochs, obtaining an accuracy of 0.95, precision of 0.97, recall of 0.96, specificity of 0.86 and F1-score of 0.96. The proposed model also achieved the best results in terms of training and validation accuracy, recall and loss: a training accuracy of 0.9599, training loss of 0.1076, training recall of 0.9579, validation accuracy and recall of 0.9519 and validation loss of 0.1153 were obtained. The accuracy of 95% could be further improved in the future by using alternate CNN models; various transfer learning models could also be employed to improve the damage assessment results.
A limitation of this study is that the image data used are specific to the buildings and geography of the Greater Houston region during Hurricane Harvey. The model could be improved and made more generalizable by including samples from other hurricanes and regions.