Energy efficient IoT-based cloud framework for early flood prediction

Floods are a recurrent and severe natural phenomenon affecting almost the entire planet. They destroy crops, harm populations, damage infrastructure, and demolish public utilities. An effective way to deal with this is to alert communities to incoming inundation, providing ample time to evacuate and protect property. In this article, we propose an IoT-based, energy-efficient flood prediction and forecasting system. Because IoT sensor nodes are constrained in battery and memory, the fog layer applies an energy-saving approach based on data heterogeneity to reduce the system's power consumption, while the cloud provides efficient storage. Environmental conditions such as temperature, humidity, and rainfall, together with water-body parameters, i.e., water flow and water level, are investigated for India's Kerala region to calibrate the flood phases. Principal component analysis (PCA) is applied at the fog layer for attribute dimensionality reduction. An artificial neural network (ANN) is used to predict floods, and the Holt-Winters technique is used to forecast future floods. Data are obtained from Indian government meteorological databases, and an experimental assessment is carried out. The findings show the feasibility of the proposed architecture.


Introduction
Natural disasters are universal incidents that require significant assistance to tackle. Calamities such as floods, earthquakes, and hurricanes intensely impact wide zones, distressing populations, damaging property, and shaking people both economically and psychologically (Avvenuti et al. 2017). Among these disasters, flood is one of the most common catastrophic events across the world. Over the past two decades, floods have accounted for 44% of all disaster incidents and have affected 1.6 billion people worldwide (UNDRR 2020). China is the nation most impacted by floods from 2000 to 2019, with an average of 20 floods every year and 900 million people affected (UNDRR 2020). In 2017, floods in the US caused 3019 fatalities and an overall loss of US$95,000 million (https://natcatservice.munichre.com). In India, recent floods in Gujarat devastated 6.44 lakh farmers, with crop destruction estimated at Rs. 867 crore; Assam saw the displacement of 61,923 people and approximately 160 deaths; and in West Bengal, 1.67 lakh citizens were forced to seek refuge in emergency shelters (www.newsclick.in), partly because warning systems and community-level information about imminent floods were lacking. Special procedures are therefore needed to predict floods and manage the situation before it occurs. Flood prediction and detection can be performed over a geographical area using IoT sensors and cloud computing, which provide proficient acquisition, processing, and efficient storage of flood-related information. In combination, IoT and cloud computing become a powerful platform for flood management, monitoring water bodies from remote sites and supplying continuous flood information to decision-making agencies and rescue units.
Flood forecasts and warnings are intended to decrease the risk of loss of life and economic impact. A flood-alert framework provides data accumulation, data analysis, monitoring, and warning (Sakib et al. 2016). The presented model forecasts and detects floods in advance based on environmental factors such as temperature, humidity, and rainfall, and hydrological parameters such as water flow and water level. The nonlinear data are dimensionally reduced using the principal component analysis (PCA) method. In this study, PCA is proposed as a pre-processing technique for flood detection systems to reduce the dimensionality of flood-related attributes, and the resulting input representation is trained with an artificial neural network (ANN) to classify the data. An ANN is a nonlinear computational structure comprising many interconnected processing units (Tur et al. 2010). It is used for its high parallelism, fault tolerance, and learning and generalization capabilities. In the flood prediction domain, the IoT framework uses a comprehensive historical dataset of continuous observations over a span of time to predict floods and related activities.

Contributions
The paper contributes to this aspect by:
• Using IoT sensors to monitor the study area for various flood-causing attributes.
• Utilizing hardware efficiently by employing an energy-conserving mechanism at the fog layer.
• Optimizing network bandwidth utilization by applying a dimensionality-reduction mechanism at the fog layer.
• Performing cloud-assisted flood prediction and forecasting using precise approaches.
• Generating alerts to water resource and disaster response organizations when a region is particularly vulnerable to floods.
• Sharing information and transmitting evaluated results to the concerned agencies (Water Resource Agencies and CDEOC).
The remainder of the paper describes the general framework of the proposed IoT-based flood prediction and forecasting system. Section 2 surveys various IoT-focused flood monitoring and management systems together with different data mining methodologies. Section 3 describes key terms relevant to the proposed flood data analysis model and the computing framework with its alert generation process. Section 4 presents a complete experimental comparison of the proposed approach with different techniques. Comparative analysis and the innovation of the research are presented in Sect. 5. Section 6 discusses the proposed system's limitations. Finally, Sect. 7 concludes the study with some important discussions.

Related work
This section analyzes various flood management systems and data mining techniques used for flood data. Firstly, different frameworks have been discussed related to flood management systems, followed by data mining techniques.

Flood management system
Ray and Mukherjee (2017) presented an IoT-based model addressing the significant issues of disasters, for example, remote monitoring, real-time analysis, warning, data analysis, and information accumulation. A complete discussion is presented on state-of-the-art approaches to dealing with devastating incidents. Moreover, IoT-based guidelines and market-ready deployable products were reviewed for tackling disaster problems. The survey concludes that Internet of Things-based technologies are suitable for disaster management and handling disastrous situations, and it reveals key challenges and research patterns in IoT-enabled emergency response systems. Afzaal and Zafar (2016) proposed a model in which sensors observe water levels in different water bodies. Gateways interface with the cloud and enable actuators based on information handled by the cloud. The Vienna Development Method-Specification Language (VDM-SL) is used for the configuration and execution of the system and to create potential test scenarios that minimize device failures and omissions; the results show no errors in the specification. Hernández-Nolasco et al. (2016) offer a sensor for calculating water levels in rivers, reservoirs, lagoons, and streams. A prototype is designed using a micro-model installed on a basic open circuit based on a water-level measurement sensor, with a Netduino Plus 2 running the micro-model. Kumar et al. (2015) presented an innovative approach to India's recurrent flood problem.
Their system integrates flood prediction and alert generation, developed using Internet of Things techniques, and calculates and monitors several flood-related parameters at different locations to forecast river flooding reliably in real time. Yusoff et al. (2015) suggested that flood control and early-warning monitoring can be tackled with cloud computing; the study confirms that GreenCloud supports crucial functions for the growth of smart cities. Lo et al. (2015) presented a model based on image processing techniques to determine flood conditions, with experimental results indicating the reliability of the visual sensing approach. Hadid et al. (2020) presented a method for predicting river stream levels using a hybrid model and the Dempster-Shafer algorithm for a PWARX (Piecewise Auto-Regressive eXogenous) model.

Meteorological data analytics
Several techniques in hydrology and water resource studies have been described in recent decades, such as genetic algorithms and ANFIS (Adaptive Neuro-Fuzzy Inference System). Hong et al. (2018) proposed a principal component analysis-based system to monitor spatial and temporal variations in water quality. Widmann and Schar (1997) presented a method for analyzing daily precipitation in Switzerland using the principal component analysis technique, developing a mathematical model to observe precipitation patterns arising from shifts in the frequency or precipitation behavior of Alpine weather groups.
Rehman et al. (2019) presented various flood vulnerability assessment methods, including ANN. Aziz et al. (2013) described an artificial neural network implementation for regional flood inundation mapping in an Australian case study. Wu et al. (2010) applied a modular ANN technique to forecast rainfall time series. Chu et al. (2018) proposed a modified principal component analysis (MPCA) method for assessing environmental variables to track ecological changes in coastal recovery areas. Ghadim et al. (2018) discuss using the additive and multiplicative forms of the Holt-Winters time series model to forecast environmental variables one year in advance.

Figure 1 shows the proposed model for flood prediction and forecasting. It consists of three layers: the data acquisition layer, the fog layer, and the cloud layer. The data acquisition layer gathers environmental data from several sensors, such as rainfall, temperature, water flow, humidity, and water level sensors, at different locations and water bodies. The collected data are analyzed at the fog layer to examine data variation and adapt the sampling frequency of the sensor nodes. The data dimensionality is then reduced by performing principal component analysis (PCA) at the fog layer before forwarding the data to the cloud layer. Data are maintained in a cloud-based repository from which valuable information is extracted for efficient processing and effective decision-making.

Data acquisition layer
The data acquisition layer gathers a large amount of data. IoT sensor nodes are responsible for collecting data on flood events and related parameters in the local area. Successful flood prediction and forecasting depend on information about the various meteorological and hydrological attributes that cause flooding. An overview of the flood attributes and accompanying sensors is given in Table 1.
1. Meteorological attributes: temperature, humidity, and precipitation; the monsoon season significantly escalates the occurrence of maximum rainfall that causes flood conditions.
2. Hydrological attributes: the water level in the water bodies of the study area and the water flow over time are considered essential factors for flood conditions.

Fog layer
The fog layer is the transitional layer between the cloud and the data acquisition layer. The raw data obtained from the data acquisition layer are preprocessed here in two steps: ANOVA with Tukey's post hoc test to save energy, and PCA-based dimensionality reduction. Table 2 describes the various notations used in the fog layer.

ANOVA model and Tukey's post hoc test
The fog layer receives the datasets from the data acquisition layer. The sensor nodes are battery operated, and such power-restricted sensors require an energy-efficient data collection strategy. The proposed model therefore adjusts the active and sleep durations based on the sensor data. To avoid redundant information, the active and sleep durations of the sensors are analyzed using the ANOVA (Analysis of Variance) method with Tukey's test. The one-way ANOVA model measures the total heterogeneity ($\mathbb{H}_t$) of the data generated by the sensor nodes in $N$ time durations. Total heterogeneity is the sum of the heterogeneity within durations ($\mathbb{H}_{within}$) and the heterogeneity between durations ($\mathbb{H}_{btw}$):

$$\mathbb{H}_t = \mathbb{H}_{within} + \mathbb{H}_{btw} = \sum_{c=1}^{N}\sum_{a=1}^{n_c}\left(f_{ac} - Mean_c\right)^2 + \sum_{c=1}^{N} n_c\left(Mean_c - Mean\right)^2 \quad (1)$$

Here, $f_{ac}$ is the $a$th observation taken by the sensor node in the $c$th duration; $n_c$ is the number of observations in the $c$th duration; $N$ is the total number of durations; $Mean_c$ is the mean of the data values captured in the $c$th duration; and $Mean$ is the mean of the data values captured across all $N$ durations. The one-way ANOVA findings assess whether the means of the datasets obtained over subsequent time durations differ significantly. The Tukey post hoc test is then applied to assess whether the difference between data values from distinct durations surpasses a particular threshold $th$, and the active and sleep durations of the sensor node are updated accordingly to make more efficient use of its energy.
The F-statistic is computed from the sums of squares:

$$F = \frac{\Im_{bps}}{\Im_{wps}}, \qquad \Im_{bps} = \frac{bps}{N-1}, \qquad \Im_{wps} = \frac{wps}{\left(\sum_{c=1}^{N} n_c\right)-N} \quad (2)$$

where $bps$ denotes the sum of squares between durations; $wps$ the sum of squares within durations; $\Im_{bps}$ the mean square between durations; and $\Im_{wps}$ the mean square within durations. The threshold value is calculated from the two most recent durations, and the future active and passive time durations of the nodes are modified based on the calculated value. If the difference exceeds $th$, the time durations are reset to the constant values $T_a$ and $T_p$.
If the difference is smaller than $th$, the passive time is increased and the active time decreased.
where $T^{Active}_c$ is the $c$th active time duration of the node and $T^{Passive}_c$ is the $c$th passive time duration. Algorithm 1 depicts the operation of the proposed platform's energy-saving mechanism.
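The duty-cycle adaptation described above can be sketched as follows. This is a minimal illustration, not the paper's Algorithm 1: the function name, the `step` parameter, and the use of the mean difference of the two most recent durations as the Tukey-style comparison are assumptions.

```python
import numpy as np
from scipy import stats

def adapt_duty_cycle(durations, th, t_active, t_passive, step=1.0):
    """Illustrative fog-layer energy saver: one-way ANOVA over the data
    captured in consecutive time durations, then a Tukey-style comparison
    of the two most recent duration means against threshold `th`.
    Function and parameter names are assumptions, not the paper's."""
    # One-way ANOVA across all durations: do the means differ?
    f_stat, p_value = stats.f_oneway(*durations)

    # Compare the two most recent duration means against the threshold
    diff = abs(np.mean(durations[-1]) - np.mean(durations[-2]))
    if diff > th:
        # Heterogeneous data: restore the default duty cycle (T_a, T_p)
        return t_active, t_passive, p_value
    # Redundant data: shorten active time, lengthen passive (sleep) time
    return max(t_active - step, step), t_passive + step, p_value
```

When consecutive readings are nearly identical the node sleeps longer, which is the source of the energy saving; a sudden change in the sensed values restores the default sampling rate so that flood onset is not missed.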

Dimension reduction
The dimension reduction module extracts features of the flood-related geographical attributes and the sensor dataset that are substantially smaller in size while closely preserving the accuracy of the original data. Mining the smaller dataset is more efficient and delivers the same (or nearly identical) outcomes. In the proposed system, the principal component analysis (PCA) method is applied to the sensed data and geographical attributes for dimension reduction according to Algorithm 2.
The selected eigenvectors generate a new representation of the dataset with reduced dimensions. PCA restricts all new values to the lower dimensionality and updates the database. Once the data are gathered and pre-processed, they are evaluated to determine the flood-level severity on the basis of the data received.
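A compact sketch of this PCA step, under the assumption that Algorithm 2 follows the standard correlation-matrix formulation (the function name and eigenvalue-greater-than-unity retention rule mirror the criterion used later in the evaluation):

```python
import numpy as np

def pca_reduce(X, eig_threshold=1.0):
    """Sketch of PCA-based dimension reduction: standardize the
    attributes, eigendecompose the sample correlation matrix, and keep
    the components whose eigenvalue exceeds `eig_threshold`."""
    # Standardize each flood attribute to zero mean and unit variance
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Sample correlation matrix of the standardized data
    R = np.corrcoef(Z, rowvar=False)
    # Eigenvalue-eigenvector pairs, sorted by decreasing eigenvalue
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Retain principal components with eigenvalue > threshold
    k = int(np.sum(eigvals > eig_threshold))
    explained = eigvals[:k].sum() / eigvals.sum()
    # Project onto the retained eigenvectors (reduced representation)
    return Z @ eigvecs[:, :k], explained
```

Only the projected scores travel from the fog layer to the cloud, which is what conserves network bandwidth.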

Cloud layer
The flood-related environmental sensory IoT data from different locations are stored in the cloud. The data are pervasively sensed and periodically collected at different time intervals, so they are stored in cloud servers for further analysis. Flood-related activities are categorized based on the reduced attributes using an artificial neural network. Table 3 describes the notations used in the cloud layer.

Flood prediction sub-layer and alert generation
The artificial neural network (ANN) method classifies the data dimensionally reduced by the principal component analysis algorithm. The ANN model is composed of three layers of nodes: the input layer, the hidden layer, and the output layer. Every node has several inputs (the dimensionally reduced flood-related attributes) and several outputs (according to the flood-sensitivity factor). The structure of the multilayered feed-forward neural network is shown in Fig. 2, where circles represent nodes and lines represent connections. Each input $(c_1, c_2, c_3, \ldots, c_n)$ is multiplied by a connection parameter known as a weight $(W_i)$, and the products are combined into a single value that is then regulated by a transfer function. The cumulative output of a node can be expressed as:

$$y = f\!\left(\sum_{i=1}^{n} W_i\, c_i\right)$$

The alert generation component is responsible for transmitting alert notifications to decision-making agencies and to citizens of a detected flood area (Algorithm 3).
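The node computation above can be illustrated with a minimal forward pass. The sigmoid transfer function is an assumption (the paper does not name one), and the function names are illustrative:

```python
import numpy as np

def node_output(c, W):
    """Cumulative output of one node: weighted sum of its inputs passed
    through a sigmoid transfer function (the choice of sigmoid is an
    assumption; the paper does not specify the transfer function)."""
    s = np.dot(W, c)                 # sum of W_i * c_i
    return 1.0 / (1.0 + np.exp(-s))  # transfer function

def feed_forward(c, W_hidden, W_output):
    """Minimal input -> hidden -> output pass of the feed-forward ANN."""
    h = node_output(c, W_hidden)     # hidden-layer activations
    return node_output(h, W_output)  # flood-sensitivity score
```

In the proposed pipeline the inputs `c` would be the two principal-component scores delivered by the fog layer, and training would fit the weight matrices; only the forward computation is sketched here.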

Flood forecasting sub-layer
This sub-layer anticipates potential flood events by evaluating the current and historical values generated by the flood prediction layer. For this task, the Holt-Winters forecasting approach is used, one of the most frequently used exponential smoothing approaches. Holt-Winters' method considers three components, level, trend, and seasonality, to determine the future flood (additive form):

$$L_t = \alpha\,(S_t - Q_{t-x}) + (1-\alpha)(L_{t-1} + T_{t-1})$$
$$T_t = \beta\,(L_t - L_{t-1}) + (1-\beta)\,T_{t-1}$$
$$Q_t = \gamma\,(S_t - L_t) + (1-\gamma)\,Q_{t-x}$$

where $L_t$, $T_t$, and $Q_t$ are the level, trend, and seasonality components at time $t$; $\alpha$, $\beta$, and $\gamma$ are the model parameters; $S_t$ is the flood stage at time $t$; and $x$ is the season length. The flood stage forecast for $m$ periods ahead is $L_t + m\,T_t + Q_{t-x+m}$.
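A direct translation of the three smoothing recurrences into code might look as follows. The simple level/trend/season initialization is a common convention and an assumption, not taken from the paper:

```python
def holt_winters_additive(S, x, alpha, beta, gamma, horizon):
    """Additive Holt-Winters over flood stages S with season length x.
    Initialization of level, trend, and seasonal terms below is a
    common convention, assumed rather than stated by the paper."""
    # Initialize level, trend, and one season of seasonal terms
    L = sum(S[:x]) / x
    T = (sum(S[x:2 * x]) - sum(S[:x])) / (x * x)
    Q = [S[i] - L for i in range(x)]
    for t in range(x, len(S)):
        L_prev = L
        L = alpha * (S[t] - Q[t - x]) + (1 - alpha) * (L + T)   # level
        T = beta * (L - L_prev) + (1 - beta) * T                # trend
        Q.append(gamma * (S[t] - L) + (1 - gamma) * Q[t - x])   # season
    n = len(S)
    # Forecast m steps ahead: level + trend extrapolation + seasonal term
    return [L + m * T + Q[n - x + ((m - 1) % x)] for m in range(1, horizon + 1)]
```

Each forecast extrapolates the trend linearly and adds back the seasonal term from the matching position in the last observed season.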

Performance evaluation
This section discusses the implementation findings and addresses the reliability assessment of the proposed approach. The phases are described below.

Data accumulation and integration
Data cannot be derived directly from the environment for applying the proposed model, but they can be gathered from multiple official data sources. We generated the flood attributes systematically by collecting data from different Indian government sites (http://www.imd.gov.in, http://www.indiawaterportal.org/articles/district-wise-monthly-rainfall-datalist-rain-stations-india-metrologicaldepartment) and a dataset repository (https://www.kaggle.com/biphili/india-rainfall-kerala-flood) for the Kerala region. The environmental dataset is created so that all possible flood-related attributes are considered. The dataset contains 180 cases (covering the 14 districts of Kerala in 2018) with information on five attributes: season, temperature, relative humidity, rainfall, and water level. The relationships between the flood attributes are presented using a correlation matrix. Figure 3 depicts the correlation matrix of the five independent variables, and the correlation coefficients are shown in Table 4.
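The correlation matrix in Fig. 3 can be reproduced with a short helper; the attribute names here are illustrative stand-ins for the dataset's columns:

```python
import numpy as np

def attribute_correlations(rows, names):
    """Pairwise correlation coefficients between flood attributes,
    as summarized in the correlation matrix of the generated dataset.
    `rows` holds one observation per row, one attribute per column."""
    R = np.corrcoef(np.asarray(rows, dtype=float), rowvar=False)
    # Report each unordered attribute pair once, rounded for display
    return {(a, b): round(float(R[i, j]), 3)
            for i, a in enumerate(names)
            for j, b in enumerate(names) if i < j}
```

Applied to the 180-case Kerala dataset, this yields one coefficient per attribute pair, matching the layout of Table 4.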

Energy conservation using ANOVA and Tukey's post hoc test
Water level and temperature data are retrieved from the sensor dataset (Lyons et al. 2011) for several lakes in Alaska to evaluate the performance of the proposed energy-conserving mechanism. The dataset contains hourly data for the water level and temperature attributes. Twenty-four hours of water-level sensor data are used for the implementation of the ANOVA and Tukey's test, divided into 6 intervals of 4 h each. The results of the ANOVA and Tukey post hoc test are depicted in Fig. 4 and show the maximum overlap of the mean intervals.

Dimension reduction
The dimensionality of the final dataset is reduced by the principal component analysis (PCA) approach. Modern statistical analysis packages provide routines to compute the eigenvalue-eigenvector pairs of the sample correlation matrix; here, the PCA algorithm available in Minitab is applied directly to the generated dataset. Figure 5 shows the scree plot of the flood attributes against eigenvalues and principal components. The first two of the five principal components are selected, since their corresponding eigenvalues are greater than unity. Table 5 lists the loading values of the two chosen PCs; these loadings illustrate how each variable contributes to a principal component. Almost all variables contribute significantly to PC1, while WL is the major contributor to PC2. The selected principal components explain 78.4% of the overall variable heterogeneity (Table 6). The plot of the first principal component against the second (Fig. 6) shows that the samples are well separated. These two principal components are forwarded to the cloud layer for flood prediction and forecasting.

Flood prediction analysis
The two PCs, which explain 78.4% of the total variance, were used as inputs to the ANN technique for flood prediction. The generated dataset is split into two subgroups. Table 7 displays the predicted and observed accuracy of the analyzed data. The ROC curve of the predictive model (ANN) is shown in Fig. 7; the area under the curve is 0.9807. Tenfold cross-validation of the ANN yields a mean absolute error (MAE) of 0.21321, a root-mean-square error (RMSE) of 0.427, a relative absolute error (RAE) of 18.75%, a root relative squared error (RRSE) of 28.75%, and a Durbin-Watson statistic of 1.55234. Classification efficiency concerns assigning data instances to the various groups using the artificial neural network technique. Different statistical measures, namely accuracy, sensitivity, specificity, and F-measure, are used to evaluate the classification efficiency of the proposed model. Three baseline classifier models are used for the comparative study: KNN, decision tree, and BBN. The results obtained for the various classifier models are shown in Fig. 8.
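The four evaluation measures named above are standard functions of the binary confusion matrix and can be computed directly:

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, and F-measure from a binary
    confusion matrix -- the statistics used to compare the ANN against
    the baseline classifiers (KNN, decision tree, BBN)."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)      # true-positive rate (recall)
    specificity = tn / (tn + fp)      # true-negative rate
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, sensitivity, specificity, f_measure
```

Sensitivity is the most safety-critical of the four here, since a missed flood (false negative) is far costlier than a false alarm.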

Flood forecasting analysis
The Holt-Winters forecasting model is used to estimate future trends in flood stages and is implemented in Minitab, with the ANN results used as feedback for the model. The flood forecast for a one-month span is shown in Fig. 9, and the seasonal forecast is depicted in Fig. 10; the results show the variation between observed and forecasted values. The accuracy assessment parameters, i.e., mean squared deviation, mean absolute deviation, and mean absolute percentage error, are shown in Table 8.

Comparative analysis and innovation of research
The conceptual model is compared with several relevant studies in the flood forecasting domain on six characteristics: (i) contribution, (ii) application domain, (iii) innovative technology used, (iv) energy saving, (v) real-time aspect, and (vi) decision-making model, as shown in Table 9. The proposed system covers multiple aspects of flood management, including detection and alert generation, prediction, and forecasting, whereas the studies under consideration address only one aspect of flood monitoring with limited smart technologies. In addition, the study provides a real-time conceptual framework with a novel energy-saving mechanism within the sensor nodes. The results of the proposed prediction and forecasting models compared with similar studies are shown in Tables 10 and 11.

Limitations
The proposed approach considers only specific attributes for which IoT sensors can collect data for flood prediction and forecasting. Several additional flood-related parameters, such as river attributes (river channel, riverbed gradient, and channel shape), soil, and vegetation, are not captured by the current sensor set.

Conclusion
This paper proposes an energy-efficient, IoT-based cloud system for flood prediction and forecasting. The proposed model effectively determines prediction and forecasting measures for the particular study area (Kerala, India). The framework contributes an energy-aware data generation scheme that efficiently enhances sensor lifetime, and a dimension reduction algorithm applied at the fog layer to optimize network bandwidth. Moreover, the ANN predictive algorithm produced efficient results, with 97.3% sensitivity and 92.4% specificity, and future flood stages are forecast using the Holt-Winters model at the cloud layer. The experimental results are stored in cloud storage for water management agencies and disaster management groups so that effective measures can be taken on time, reducing destruction during and after a disaster.

Conflict of interest
The authors declare that they have no conflict of interest.