Geostatistical strategy to build spatial coastal-flooding models

ABSTRACT The intensification of flooding events and the management of ecosystems have become global issues. The combination of statistical modelling techniques and geospatial analysis represents a promising strategy to provide a holistic representation of these systems. This paper aimed to develop a strategy to build Spatial Coastal-Flooding Models based on evidence of flooding points and on the environmental and artificial characteristics of the area. The procedure combines statistical techniques, such as PCA, cluster analysis, ANOVA, and OLS regression, with geospatial data obtained from open databases. The geostatistical strategy was applied to the city of Florianópolis, Brazil, where 108 photographic records of flooding were inventoried. The OLS regression analysis produced three Spatial Coastal-Flooding Models considering eight factors. The models explain 76, 68, and 40% of the flooding in sub-regions C1, C2, and C3, respectively. The results show that the relationships between environmental and artificial variables and flooding events are not homogeneous over space.


Introduction
Floods are the most significant and most expensive natural disaster to which modern society is exposed. Between 1980 and 2018, the global economic losses exceeded $1 trillion (2018 values), and hundreds of thousands of lives were lost due to floods (Rosenzweig et al. 2021). The combined action of high population density, artificial changes in the environment, and climate change further aggravates flooding worldwide (Wahab and Tiong 2016). Recent climate change-related events, such as higher intensity precipitation events, extreme temperatures, and rising sea levels, have been exposing the vulnerability of the urban environment, including the increased flooding risks on a regional scale, especially in coastal areas (Akbarpour and Niksokhan 2018; Durand et al. 2018; Ulysse et al. 2018).
The impact of natural hazards on the coastal environment goes far beyond the political-administrative territorial units. Coastal areas are densely populated and frequently have urban centres located at low terrain elevation, being susceptible to different disasters, such as coastal, river or urban flooding, rising tides, and saltwater intrusion into groundwater (Azevedo de Almeida and Mostafavi 2016;Durand et al. 2018;Malik and Abdalla 2016;Ulysse et al. 2018). Local governments are responsible for managing these aspects within their jurisdiction. However, the task of analysing and proposing prevention and control measures of flooding in coastal areas is complex. Thus, the development of coastal areas management policies requires an understanding of the involved multiple factors and their interactions in the socioeconomic-environmental context (Perrone et al. 2020).
In coastal cities in developing countries, these problems become even more critical. The vulnerability to extreme climate events is high and the capacity to manage flooding is low (Ogie, Adam, and Perez 2019). The scarcity of fiscal and technological resources and inadequate urban planning are prevalent in these countries. This reverberates in the management and control of disaster risks, which are essentially reactive (Fadel et al. 2018; Kovacs, Doussin, and Gaussens 2017; Wang et al. 2016). With financial restrictions, funds for public projects are generally used for basic needs, affecting flooding management and leading to significant economic and social consequences. In addition, there is mismanagement of public funds, which negatively affects spending on measures to mitigate flooding and to reduce the scarcity of monitored data to support decision-making (Ogie, Adam, and Perez 2019). The scenario is no different in Brazil, the fifth largest country in the world in terms of territory and population, which is considered an emerging power and has the sixteenth longest coastline.
With about 9000 km of coastline and 17 states bordering the Atlantic Ocean, Brazil has diverse climatic and environmental conditions, which give it great potential for coastal studies (Gomes Da Silva et al. 2016). The country constantly suffers from extreme events of precipitation, storms, and rising sea levels, and as a result, flooding is a recurring annual problem that affects millions of people (Pezzoli and Cartacho 2013). The lack of historical records and monitored data makes it difficult to verify flooding levels in most Brazilian coastal cities. Therefore, flooding management suffers from uncertain planning and inadequate preparation (Gomes Da Silva et al. 2016; Ogie, Adam, and Perez 2019). According to Myronidis, Stathis, and Sapountzis (2016), it is crucial to determine and quantify how flooding hazards increase over time due to coastal zone urbanization, to avoid possible flaws in future urban design. In light of the constraints in allocating government resources, the restricted availability of monitoring data, and the need for low-complexity computational methodologies in most districts, new strategies that simplify the process and provide a reliable and holistic representation of the system are required.
The development of geospatial technologies such as Remote Sensing (RS), Geographic Information Systems (GIS), and the Global Positioning System (GPS) opens up new vistas to explore both physical and social dynamic geographic phenomena by allowing their relationships, e.g. cause-and-effect relationships, to be modelled and examined in a place-based context (Roy 2014). Knowing where people and things are, and their relationship to each other, is essential to support strategic priorities and decision-making. However, the integration of spatial technologies with analytical approaches is often desirable to produce improved information, since together they play a significant role in mapping, monitoring, and managing dynamic geographic processes, such as emergencies and natural disasters (Lü et al. 2018).
Geospatial models are critical tools to statistically investigate the geographic relationship between several explanatory variables and a phenomenon. Modelling involves a mathematical and statistical procedure that sorts through complex data and the relationships, patterns, and trends within them, integrating the existing information into a logical context of equations, functions, and relations that reflect the behaviour of the system (Jumaah et al. 2019; Mollalo, Vahedi, and Rivera 2020). The use of spatial modelling provides a promising approach to estimating flooding occurrence parameters for large-scale regions. The models can offer detailed information and accurate maps of susceptibility to flooding simply by overlaying the spatial information of the parameters describing the model (Nandi et al. 2016). In addition, spatial models can predict future flooding areas and provide information on the suitability of a site for the development of resilient cities.
A vast literature shows that the combination of statistical modelling techniques and geospatial analysis provides an effective tool for reducing the complexity of large-scale data sets and enabling the identification of relationships between their components (Zhu, Wang, and Rioual 2017). Due to their relative simplicity in data requirements, low cost of operation, and quick execution, these methods have worldwide applicability in a wide range of subject areas (Nandi et al. 2016; Ogie, Adam, and Perez 2019). Examples range from recharge flows and aquifer vulnerability (e.g. Arora and Reddy 2013; Muhammad et al. 2016; Zhu, Wang, and Rioual 2017) to pluvial and river flooding management (e.g. Nandi et al. 2016; Wang et al. 2016), but none has applied these techniques to coastal flooding planning and control.
Physical processes in coastal environments present many interactions that are not essentially simple to quantify and model (Perrone et al. 2020). Coastal projects should consider the lessons learned from past events (Pezzoli and Cartacho 2013). The relationship between possible environmental and artificial factors with historical events can simplify understanding of flooding occurrences and provide relevant information to urban planning (Wang et al. 2016). The spatial distribution of flooding events depends upon local geological, geographical, topographical, climatological, hydrological, and artificial factors (Nandi et al. 2016). Hence, variables selection that best expresses the behaviour of these environments is a key factor for coastal flooding management.
Based on the vast number of variables involved in flooding studies and the ease of addressing them by geospatial analysis and statistical techniques, a new strategy called MEIC (Modelo Espacial de Inundação Costeira, Spatial Coastal-Flooding Model) was developed and is presented in this paper. MEIC is based on the association between the spatial distribution of flooding points and the environmental and artificial characteristics of the area. The innovation is that, instead of using rainfall projections obtained by statistical analysis, the strategy applies statistical analysis to documented occurrences of flooding events taken from alternative databases (e.g. electronic newspapers) and transformed into geospatial information. Such records can provide useful knowledge as a primary source. Their benefit is that they show how people perceived an event, enable multiple perspectives on an issue, and permit researchers to trace the historical development of subjects (Rosenzweig et al. 2021). Levy (2014) discusses how the growing amount of information coming from media, social media, and the internet is starting to be studied as a scientific phenomenon in the social sciences. The knowledge of flooding consequences obtained from this alternative source is combined with the robustness of geospatial analysis. In addition, the models seek to fill the information gap on the hydrological, topographical, and geological interactions that occur in the area, and thereby relate artificial activities to flooding occurrence. The strategy was applied in a coastal city located in the southern region of Brazil, where there are high environmental vulnerabilities and constant urban and coastal flooding, representing the reality of most coastal cities.
The results provide information that can support decision-making by government agencies, seeking socially and economically viable alternatives, considering limited resources, concerning the choice of protection measures and adaptation of existing infrastructure.
MEIC establishes the following basic principles and requirements:
• Robustness, ensuring statistically significant relationships based on past events.
• Suitable for urbanized coastal areas, requiring flood influencing factors such as sea-level data, micro-drainage network, hydrogeology and groundwater level maps.
• Basic knowledge of Geographic Information System (GIS) and statistical methods requirements.
• Data from free and easily accessible online databases.
• Low-cost and simplified application.

Materials and methods
The bases for application of the MEIC strategy consist of five steps: (1) data collection of the influencing factors, to create the database for statistical analysis; (2) Principal Component Analysis (PCA), aiming to reduce the number of influencing factors; (3) Cluster Analysis (CA), to group the flooding sites in sub-regions with similar characteristics; (4) Spatial Coastal-Flooding Model, to model, examine, and explore spatial relationships between dependent variables and possible explanatory variables; and (5) Model performance evaluation, aiming to evaluate the statistical quality of the model accuracy and complexity. The conceptual framework of the MEIC strategy is shown in Figure 1.

Data collection of the influencing factors
The focus of the strategy is to explain the flooding occurrence associated mainly with coastal areas. In this way, information regarding flooding events in a previously established area was gathered through documentary analysis in electronic newspapers, television news, and news websites. Social media were not a source of information collection.
The collected information relates to the event date and geographical location. The origin of the flooding events was not distinguished, so pluvial, river, and coastal flooding were all considered.
The unconventional information collection method adopted aims to overcome the lack of monitored data by taking advantage of the information mediated by the social system, which is commonly overlooked in this type of study. In the conventional strategies defined by hydrology (Figure 2a), we start from the monitored environmental and hydrological database, go through multi-criteria analysis and modelling, and reach possible flooding areas. In the MEIC strategy, we perform inverse modelling (Figure 2b), starting from the real problem, with information filtered by the social system through the publication of catastrophic events, passing through statistical and geospatial modelling techniques, and arriving at the actual source of the problem.
Thus, the data collected concern the environmental and artificial influencing factors that operate at each inventoried flooding location. For this purpose, we adopted 10 influencing factors: rainfall, hydrography, elevation, slope, soil type, land use, and stormwater drainage, reviewed from the literature and systematized in an earlier urban flooding study, together with sea level, groundwater level, and hydrogeology maps, which were added in the present study. A total of 60 distinct flooding points were inventoried, distributed among the principal urban occupation regions of the municipality. All the information was compiled in a spreadsheet, as presented in supplementary materials (Table S1). The influencing factors described by site characteristics (e.g. land use, soil type, and hydrogeology) were converted into numerical values so that they could be spatialized and expressed on a common scale. The values ranged from 0 to 3 according to the degree of influence on the flooding occurrence and the contribution to runoff, with 0 representing minimum or no influence and 3 representing maximum influence or naturally flooded areas (an example of converting local characteristics to numerical values is given in supplementary materials, Table S2). The hydrography and stormwater drainage factors were represented by a binary system, with 0 attributed to the absence and 1 to the presence of the factor at the event location. The values for sea level and rainfall correspond to the maximum level and the accumulated precipitation on the day when flooding occurred, respectively. The values of elevation, slope, and aquifer level come from local spatial data.
Land use - The converted numerical values of this factor follow the condition that the lower the human interference, the lower the chance of flooding. Areas covered by forest or even undergrowth may have a considerable infiltration capacity, reducing the chances of flooding. The density of vegetation cover is also a differentiating factor between the values: runoff is more likely in bare fields than in those with good vegetation cover. Urbanized areas, in contrast, received a high value because their high rates of impervious surface make flooding much more likely. Where urbanization is accompanied by sustainable urban drainage systems, the values should be different; this was not the case in our application. The maximum value goes to areas covered by water bodies, since they are naturally flooded.
Soil type - Soil texture has an impact on flooding occurrence. Sandy soils tend to have greater infiltration rates and better internal drainage, the opposite of clay soils. The classification of the soil-type factor comprises four categories based on water absorption conditions, depth, and texture. The Acrisols class was considered more susceptible to flooding than the Arenosols, since its composition has a larger amount of clay, which hinders infiltration. The Cambisols class received the lowest value due to its sandy character and its location, mainly associated with massif areas. It is important to note that soil classified as mangrove received the maximum value, since it is also naturally flooded.
Hydrogeology - The converted numerical values follow the hydraulic conditions and soil permeability. Areas with unconsolidated aquifers of sedimentary bedrock were considered less susceptible to flooding than consolidated aquifers of crystalline bedrock and received a lower value, given their coarse texture and high permeability, which favour aquifer recharge. Areas classified as Aquiclude received the lowest value because they are directly associated with the highest elevation areas of the land (massif).
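The conversion of site characteristics to a common numerical scale can be illustrated with a minimal sketch. The scores below are hypothetical: they only respect the ordering described above (for example, Cambisols lowest and mangrove soil highest), whereas the values actually used are those of Table S2. The category labels, dictionary names, and the `encode_site` helper are likewise assumptions made for illustration.

```python
# Hypothetical 0-3 scores that follow the ordering described in the text.
# The values actually used in the study are listed in Table S2.
SOIL_SCORE = {"Cambisols": 0, "Arenosols": 1, "Acrisols": 2, "Mangrove soil": 3}
HYDROGEOLOGY_SCORE = {
    "Aquiclude": 0,
    "Unconsolidated sedimentary aquifer": 1,
    "Consolidated crystalline aquifer": 2,
}
LAND_USE_SCORE = {"Forest": 0, "Bare field": 1, "Urbanized": 2, "Water body": 3}


def encode_site(site):
    """Convert the descriptive attributes of one flooding site to numbers."""
    return {
        "soil_type": SOIL_SCORE[site["soil_type"]],
        "hydrogeology": HYDROGEOLOGY_SCORE[site["hydrogeology"]],
        "land_use": LAND_USE_SCORE[site["land_use"]],
        # Binary factors: 1 = present at the event location, 0 = absent.
        "hydrography": 1 if site["has_hydrography"] else 0,
        "stormwater_drainage": 1 if site["has_drainage"] else 0,
    }


# Hypothetical site used only to show the call.
example_site = {
    "soil_type": "Acrisols",
    "hydrogeology": "Consolidated crystalline aquifer",
    "land_use": "Urbanized",
    "has_hydrography": True,
    "has_drainage": False,
}
encoded = encode_site(example_site)
```

Keeping the scores in lookup tables makes it easy to replace them with the values of a different study area without touching the encoding logic.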

Principal components analysis
The PCA technique was applied to quantify the main relationships between the environmental and artificial variables and to obtain the principal components (PCs) that are most representative of the events. PCA aimed to reduce the number of possibly correlated factors to a smaller number of vectors (Luo et al. 2019). PCA was used as a precursor to cluster analysis because it separates the factors that are likely to display collinearity and leads to a more stable clustering result (Nandi et al. 2016). Initially, z-score normalization standardizes the numerical values of each factor to reduce the impacts of magnitude and variability. According to Myronidis and Ivanova (2020), the standardization of the independent variables ensures that these variables have equal weights during the PCA. Subsequently, the PCs were derived using Eq. (1):
$$PC = E^{T} X \qquad (1)$$
where $E = [e_1, e_2, e_3, \ldots, e_m]$ is the matrix of eigenvectors and $X = [X_1, X_2, X_3, \ldots, X_m]$ is the variable vector. PCs are selected successively to account for the maximum variability in the data (Wahab and Tiong 2016). The number of components to keep is based on the Kaiser criterion, for which only the components with eigenvalues greater than 1 are retained (Zhu, Wang, and Rioual 2017).
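The standardization, eigen-decomposition, and Kaiser criterion described above can be sketched as follows. This is a minimal illustration with NumPy, not the software workflow used in the study; the function name `pca_kaiser` and the synthetic data are assumptions, and no Varimax rotation is applied.

```python
import numpy as np


def pca_kaiser(data):
    """PCA on the correlation matrix, keeping components with eigenvalue > 1.

    data: array of shape (n_sites, m_factors).
    Returns all eigenvalues (descending), the retained eigenvectors
    (one column per PC), and the scores of each site on the retained PCs.
    """
    X = np.asarray(data, dtype=float)
    # z-score standardization so every factor has equal weight
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(Z, rowvar=False)           # m x m correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)       # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                       # Kaiser criterion
    scores = Z @ eigvecs[:, keep]              # row-wise form of E^T X
    return eigvals, eigvecs[:, keep], scores


# Synthetic data (assumption): 60 sites, 4 factors, the first two correlated.
rng = np.random.default_rng(0)
base = rng.normal(size=60)
demo = np.column_stack([
    base,
    base + 0.1 * rng.normal(size=60),
    rng.normal(size=60),
    rng.normal(size=60),
])
eigvals, loadings, scores = pca_kaiser(demo)
explained = eigvals / eigvals.sum()            # share of variance per component
```

Because the decomposition is performed on a correlation matrix, the eigenvalues sum to the number of factors, which is why an eigenvalue of 1 is the natural retention threshold.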

Cluster analysis
Several types of hydrological, topographical, geological, and artificial variables can have different influences on coastal environments and local urban infrastructures, and thus the frequency and severity of flooding can vary significantly. The interactions among multiple categories of variables should not be neglected or underestimated (Wang et al. 2015). To determine the grouping variable, the Pearson (r) parametric correlation method and the Kendall's tau (τ) and Spearman's rho (ρ) non-parametric rank correlations were applied. These statistical methods are commonly used in hydrology to measure the dependence between variables (Tosunoğlu and Onof 2017). The Pearson correlation method requires normally distributed data, whereas Kendall's tau (more accurate with smaller sample sizes) and Spearman's rho (sensitive to errors and discrepancies in the data) have no such restriction. Thus, the three correlation methods were used together to ensure that, if the assumptions of the parametric test are violated, at least one non-parametric alternative remains available as a backup analysis.
The Spatially Constrained Multivariate Clustering tool was used to sequentially cluster flooding sites. By adopting multiobjective optimization, this tool finds spatially contiguous clusters of features based on attribute similarity and location similarity. Spatial clustering considering geographic coordinate information is an analytical method that divides the whole dataset into groups, looking for a solution maximizing clusters (Kim and Cho 2019). The different sub-regions were clustered based on the similarity of the grouping variable and the number of flooding occurrences in each location. The flooding sites were partitioned by clustering algorithms based on the K-means method, which is widely used in geoscience studies, for example, to group the level of damage from natural disasters and in risk management (Xie et al. 2018). These algorithms, whose advantage is their simple implementation, minimize the sum of squared errors within each group and are therefore implicitly based on pairwise Euclidean distance (Haaf and Barthel 2018). Convergence of the algorithm indicates that the solution is locally optimal, which happens when the partitioning error has been reduced to the maximum extent or when no further relocation between clusters occurs (Sehgal et al. 2018). Cluster centroids are calculated as the average of all member observations, and a feature is included in a cluster only if at least one other cluster member is its nearest neighbour (Xie et al. 2018).
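The K-means partitioning described above can be sketched with the standard library alone. This is plain K-means (Lloyd's iterations) and deliberately omits the spatial contiguity constraint of the GIS tool; the function name `kmeans` and the six demonstration points are assumptions for illustration.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means: assign each point to its nearest centroid, then
    recompute centroids as cluster means, until assignments stop changing."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)      # initial centroids drawn from data
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:           # no relocation: locally optimal
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                    # keep old centroid if cluster empty
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels, centroids


# Hypothetical standardized attributes for six sites forming two groups.
demo_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
               (10.0, 10.0), (11.0, 10.0), (10.0, 11.0)]
labels, centroids = kmeans(demo_points, k=2)
```

Each iteration can only lower the within-cluster sum of squared Euclidean distances, which is why the stopping rule based on unchanged assignments yields a local rather than a global optimum.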
The Analysis of Variance (ANOVA) technique and Tukey's test were applied, with a 0.05 significance level, to check whether there is a significant difference between the clusters. The null hypothesis assumes that the mean value of those variables is equal for every cluster, while the alternative hypothesis states that at least one is different. All statistical analyses were performed using the STATISTIC® 8.0 software, and the spatial constraints and models were developed in the ArcGIS® 10.1 software. More detailed information about correlation methods, cluster analysis (CA), and ANOVA can be obtained from Haaf and Barthel (2018), Sehgal et al. (2018), Tosunoğlu and Onof (2017), Wang et al. (2015), and Xie et al. (2018).
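As a worked illustration of the ANOVA step, the sketch below computes the one-way F statistic for a variable measured in several clusters. The function name and the three small groups are hypothetical; obtaining the p-value or running Tukey's test would require a statistics package and is left out.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the F statistic and its degrees of freedom for one-way ANOVA."""
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    # Variation of the cluster means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own cluster mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical values of one factor in three clusters.
clusters = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(clusters)
```

For these illustrative numbers the statistic is 13 with (2, 6) degrees of freedom, above the tabulated 5% critical value of about 5.14, so the null hypothesis of equal cluster means would be rejected.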

Spatial coastal-flooding model
Ordinary Least Squares (OLS) is the best known of all regression techniques, and it has been the starting point for all spatial regression analysis (Jumaah et al. 2019). This analysis allows modelling, examining, and exploring spatial relationships between dependent variables and possible explanatory variables.
The OLS regression model assumes that the spatial relationships between dependent and independent variables are static, i.e. they do not vary over space (Wang et al. 2016). Thus, the method provides a global model of the factors under study by creating the Spatial Coastal-Flooding Model (MEIC), which represents each cluster by a single regression equation. OLS regression for k independent factors is specified as
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \ldots + \beta_k X_k + \varepsilon$$
where $Y$ is the dependent variable, which is explained by the spatial variables $X_1, X_2, X_3, \ldots, X_k$, that is, the input factors causing the flooding; $\beta_0$ is the regression intercept, which represents the expected value of the dependent variable if all the independent variables are zero; $\beta_1, \beta_2, \beta_3, \ldots, \beta_k$ are the respective regression coefficients of the explanatory variables; and $\varepsilon$ is the random error term.
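A minimal sketch of fitting such an equation by least squares is given below, using NumPy. It is not the GIS tool used in the study; the function name `fit_ols` and the noise-free demonstration data, generated from known coefficients so the result can be checked, are assumptions.

```python
import numpy as np


def fit_ols(X, y):
    """Estimate [beta_0, beta_1, ..., beta_k] by ordinary least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # A leading column of ones lets the first coefficient act as the intercept
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


# Hypothetical data: two explanatory factors and a response generated
# exactly from Y = 1 + 2*X1 - 0.5*X2 (no error term).
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
y = 1.0 + 2.0 * x1 - 0.5 * x2
beta = fit_ols(np.column_stack([x1, x2]), y)
```

In the MEIC setting, one such fit would be run per cluster, with the flooding frequency as the response and the retained influencing factors as columns of `X`.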

Model performances evaluation
One of the most crucial steps of the MEIC strategy is the performance evaluation of the obtained Spatial Coastal-Flooding Model. The goal of this step is to evaluate the statistical quality of the model in terms of accuracy and complexity, in addition to its ability to deal with spatial autocorrelation. The values of R² (Multiple R-Squared) and AICc (corrected Akaike Information Criterion) are used to evaluate the MEIC performance. R² indicates the ability of the model to explain the variance in the dependent variable, so a higher R² implies a better model performance. The AICc balances model accuracy against complexity, so a lower AICc value indicates a closer approximation to reality (Mollalo, Vahedi, and Rivera 2020). The variance inflation factor (VIF) detects multicollinearity among the explanatory factors of the model, and the Probability or Robust Probability values indicate whether each factor is statistically significant. For instance, the probability values for the factors used in the equation must be less than 0.05, indicating that each explanatory variable has a statistically significant effect on the dependent variable (Mollalo, Vahedi, and Rivera 2020). In addition, a t-test assesses whether each explanatory variable is statistically significant.
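The three diagnostics can be made concrete with a short sketch. The AICc below uses the common least-squares form (n·ln(RSS/n) + 2k plus the small-sample correction), which is an assumption: GIS packages may compute AICc with a different but related expression, so absolute values are not comparable across tools. The function names and the five observations are hypothetical.

```python
import math


def r_squared(observed, predicted):
    """Share of the variance of the observations explained by the model."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def aicc(observed, predicted, n_params):
    """Corrected AIC for a least-squares fit; n_params counts every
    estimated coefficient, including the intercept (assumed convention)."""
    n = len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    aic = n * math.log(rss / n) + 2 * n_params
    return aic + 2 * n_params * (n_params + 1) / (n - n_params - 1)


def vif(r2_of_factor_on_others):
    """Variance inflation factor from the R² obtained when one explanatory
    factor is regressed on all the other explanatory factors."""
    return 1.0 / (1.0 - r2_of_factor_on_others)


# Hypothetical observed and modelled flooding frequencies.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
r2 = r_squared(observed, predicted)
```

For the same residuals, `aicc` grows with `n_params`, which is how the criterion penalizes a model that reaches the same fit with more factors.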
The Koenker (BP) statistic (Koenker's studentized Breusch-Pagan statistic) evaluated the stationarity of the model, and the Jarque-Bera statistic determined the normality of the computed residuals at a 95% confidence level. Finally, Spatial Autocorrelation (Moran's Index) checked whether the regression residuals were spatially random. The value of Moran's Index ranges from −1 to 1, where a value close to 0 indicates spatial randomness. If there is significant spatial autocorrelation in a regression model, it violates the assumption of randomly distributed and independent residuals (Mollalo, Vahedi, and Rivera 2020). In that case, the model efficiency would be considered suspect, and the results would not be reliable.
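Moran's Index for a set of regression residuals can be computed as in the sketch below, which follows the usual definition I = (n / W) · Σᵢ Σⱼ wᵢⱼ (xᵢ − x̄)(xⱼ − x̄) / Σᵢ (xᵢ − x̄)², with W the sum of all weights. The binary neighbour weights and the four residuals are hypothetical; a GIS tool would normally build the weights from distances or contiguity and also report a significance test.

```python
def morans_i(values, weights):
    """Global Moran's I for values at n sites and an n x n weight matrix."""
    n = len(values)
    mean_value = sum(values) / n
    dev = [v - mean_value for v in values]
    total_weight = sum(sum(row) for row in weights)
    # Cross-products of deviations between every pair of neighbouring sites
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / total_weight) * cross / sum(d * d for d in dev)


# Four hypothetical sites along a line; each is a neighbour of the next one.
line_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
alternating_residuals = [1.0, -1.0, 1.0, -1.0]   # neighbours always differ
clustered_residuals = [1.0, 1.0, -1.0, -1.0]     # similar values side by side
i_alternating = morans_i(alternating_residuals, line_weights)
i_clustered = morans_i(clustered_residuals, line_weights)
```

The alternating pattern gives a negative index (dispersion) and the clustered pattern a positive one; residuals of a well-specified model should give a value near zero.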

A case study demonstrating the MEIC strategy application and performance evaluation
The study area is Florianópolis city, capital of Santa Catarina State, in the southern region of Brazil. Florianópolis is one of the most developed cities in the country, with the third-highest human development index score (0.847) among all municipalities (Guerra et al. 2017; Yigitcanlar et al. 2018), and ranks as the second most populous municipality in the State (approximately 500,000 inhabitants). The city extends over an area of 432 km² surrounded by the Atlantic Ocean and is composed of the main island (97% of the territory), a small continental part (approximately 3%), and the uninhabited islands surrounding it. Florianópolis has a diversity of ecosystems, rich in beaches, lagoons, lakes, dunes, sandbanks, marshes, and mangroves, with 42% of the territory established as a permanent preservation area. As a result, almost the entire portion suitable for urbanization, that is, without physical limitations for its implementation, has already been occupied (IDOM-COBRAPE 2015b).
The municipality receives precipitation driven by orographic factors, with highly variable totals (from 1100 to 2700 mm/year) that are more intense during the summer. The density of water resources associated with the vast sedimentary plains and favourable rainfall conditions make the city prone to constant flooding, causing social and economic disruption. The authors did not find any scientific publications or technical reports that could provide valuable information on the spatial dimension of past flooding in the region. On the other hand, several field photographs captured during the flooding events provide a spatial notion of the flooded areas. Figure 3 shows some areas flooded during past events in Florianópolis city.
The municipality has three well-defined hydrogeological systems based on bedrock lithology and its potential to store and transport water. These are (1) a consolidated aquifer of crystalline basement made of intensely fractured granulite-gneiss (water capture flow ranging between 2.0 and 9.0 m³/h); (2) an unconsolidated sedimentary aquifer of marine and coastal deposits (water capture flow ranging between 20.0 and 90.0 m³/h); and (3) a granulite-gneiss Aquiclude (unfavourable zone for wells) (CPRM 2013; IDOM-COBRAPE 2015a; Rama and Miotliński 2020). Thus, the region stands out due to its excellent underground water potential, which contributes part of the municipality's public water supply. However, the shallow unconfined aquifers close to mangrove swamps and a complex estuarine system subject to tides give high environmental vulnerability to the coastal system. In addition to its high environmental vulnerability, the municipality has poor coverage of the stormwater drainage system (characterized as absolute separator), whose networks regularly exceed their volumetric capacity and overflow to backyards, streets, sidewalks, or other urban infrastructure.
For the MEIC strategy application, we inventoried the locations of flooding events in Florianópolis from 2013 to 2019, using documentary records published in local media. The raw data consisted of 108 pieces of photographic evidence, each describing an inundated site during a specific flooding event, totalling 60 distinct points distributed among the central regions of urban occupation (Figure 4). Data on the duration and depth of flooding for each event were not available in most records. Therefore, only the inundation frequency was used to describe the flooding occurrence in Florianópolis city.
The spatial database for Florianópolis city contains information about 6 of the 10 flooding influencing factors for all inventoried points. In addition, the missing data, namely maximum daily sea level, daily accumulated rainfall, land use, and the presence of a stormwater system, were obtained in loco or from a nearby gauge station. The information that composes the database for the ten flooding influencing factors was collected from free online databases, as shown in Table 1. Figure 5 shows the spatial data used for 6 of the 10 flooding influencing factors considered in the strategy.
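Turning the individual media records into an inundation frequency per site amounts to a simple count, sketched below. The record layout (event identifier, site identifier) and the identifiers themselves are hypothetical placeholders, not data from the inventory.

```python
from collections import Counter


def flooding_frequency(records):
    """Count how many recorded flooding events mention each site."""
    return Counter(site for _event, site in records)


# Hypothetical records: one tuple per piece of photographic evidence.
records = [
    ("event_1", "site_A"),
    ("event_1", "site_B"),
    ("event_2", "site_A"),
    ("event_3", "site_C"),
    ("event_3", "site_A"),
]
frequency = flooding_frequency(records)
```

In the study, the same aggregation reduces the 108 records to 60 distinct points, each carrying its number of recorded occurrences.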

Factors affecting the flooding occurrence
From the standardized environmental and artificial variables in this study, the principal components were derived from the correlation matrix computed for the 10 flooding influencing factors. Only the first four components extracted conformed with the Kaiser criterion, presenting eigenvalues greater than 1. These components explain 70% of the total variance in the spatial and temporal distribution of the flooding. The Varimax normalized rotation maximized the variance of these four principal axes (per Zhu, Wang, and Rioual 2017). Table 2 presents the interactions between factors for these components and their respective variance. Each variable loaded strongly onto only one of the four principal components, showing a clearly identifiable relationship with flooding occurrence.
The first component (PC1) explains the largest share of the variance (27.5%) and is characterized by the highest loadings for hydrography, soil type, and land use (loading >0.6). It also has the strongest negative correlation with the elevation factor. So, sites with hydrography, clay soils (Acrisols), dense urbanization, and low terrain elevation tend to have a higher occurrence of flooding events. These findings are supported by Nandi et al. (2016) and Wang et al. (2016), who identified land elevation, slope angle, and distance from the hydrography as determining variables for flooding occurrence. According to Nandi et al. (2016), 83% of flooding events occur within 500 m of the hydrography in the floodplains, and lowlands with an elevation between 100 and 200 m above sea level were found to be the most susceptible areas to flooding. PC1 reflects all types of environmental and artificial variables established in the study. Because of the association between the land use, elevation, and hydrography factors, component 1 was defined as the 'urbanization' component, referring to the process of development and urban occupation.
PC2 contains only hydrological variables, explaining 17.1% of the variance. This component was defined as the 'storm surge' component because of its highly positive loading with the sea-level factor. This positive loading is possibly consistent with the increase in storm surge events, rising tide, and sea-level rise reported for the entire south coast of Brazil (Gomes Da Silva et al. 2016). However, this component also presented a highly negative loading with the rainfall factor. The negative loading for rainfall, opposed to the positive loading for sea level, may come from the low levels of accumulated daily rainfall (less than 2 mm) that characterize approximately 53% of the inventoried events. These low rainfall levels are directly associated with information on the rise in the maximum daily sea level, which in these events ranges between 50 and 100 cm above the reference. Component PC3 was defined as the 'underground' component because of its positive loadings with the hydrogeology factor and groundwater level, the only two underground factors adopted in the study. This component contains 15.5% of the variance, suggesting that the studied flooding events are influenced by the presence of fractured crystalline aquifers and by the shallow groundwater level in the study area. It also indicates that all of the studied flooding sites are to some extent reliant on or related to the underground water flow of the coastal area. These findings are supported by Nandi et al. (2016), who also identified that flooding occurs more frequently over aquifer rocks such as basal, coastal, and alluvial aquifers, where excess runoff could not drain easily into the subsurface due to impermeable bedrock. The porosity and permeability (hydraulic properties) of bedrock may influence flooding potential. Less porous, compact, and impermeable aquifer rocks make the infiltration of rainwater difficult, thus increasing flooding potential.
Figure 6. Flooding sites locations and classification according to the spatial constraint multivariate clustering (K-means).
Finally, component PC4 contains 10.5% of the variance, and its definition as the 'stormwater' component comes from its positive loading on the stormwater drainage factor and slope, the two factors most often related to the urban stormwater drainage system. Therefore, sites with a predominance of large areas of coastal plains (slope of 0-3%) and that do not have a stormwater drainage system tend to have more cases of flooding. These findings are supported by Wang et al. (2016) and Gaitan, Ten Veldhuis, and van de Giesen (2015), who found that local blockage of stormwater inlets, or the nonexistence of these, is the major reported cause of urban flooding incidents, occurring even during small rainfall events. Wang et al. (2016) add that improper urban planning leads to the development of flood-prone areas and poor drainage system design in which insufficient drainage capacity occurs in flat areas.
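As a quick arithmetic check on the component shares quoted above, the sketch below accumulates the variance explained by PC1 to PC4. Only the four percentages come from the text; the function and variable names are illustrative.

```python
# Variance shares (%) reported in the text for the four retained components.
explained = {"PC1": 27.5, "PC2": 17.1, "PC3": 15.5, "PC4": 10.5}

def cumulative_share(shares):
    """Return the running total of variance explained, rounded to one decimal."""
    total = 0.0
    running = {}
    for name, share in shares.items():
        total += share
        running[name] = round(total, 1)
    return running

cumulative = cumulative_share(explained)
print(cumulative)
```

Under these figures, the four components together account for roughly 70.6% of the total variance.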

Spatial clusters of flooding sites
After applying PCA and identifying the minimum number of factors for constructing the spatial coastal-flooding model, we determine the grouping variable, that is, the factor that has the highest statistically significant correlation. This grouping variable is used together with the number of flooding occurrences, thus defining the conditions of spatial constraint and avoiding multicollinearity between the influencing factors. Parametric and non-parametric correlation coefficients observed in the study area for all factors influencing the flooding occurrence are in the supplementary materials (Table S3). The influencing factors had both positive and negative correlations. The highest significant correlation of 0.760 was obtained between hydrography and soil type, while the lowest significant correlation of −0.168 was between hydrography and groundwater level. Correlations between the influencing factors were considered statistically significant at the 5% significance level using the Pearson test, Kendall's tau test, and Spearman's rho test. The results of the parametric and non-parametric tests confirm the strong correlation between the hydrography and soil type factors. Since both factors vary only in space, not in time, the hydrography factor was chosen as the grouping variable. The general results of the PCA substantiate the choice of this factor over its correlate, since hydrography presented the highest loading among all the factors analysed.
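The parametric and rank-based coefficients mentioned above can be sketched with the standard library alone. This is a minimal illustration, not the software used in the study: the two input lists are hypothetical class codes, and Kendall's tau and the significance tests are omitted for brevity.

```python
from statistics import mean

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def ranks(values):
    """Average ranks (1-based), so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical class codes for two factors at six sites (not the study data).
hydrography = [1, 1, 2, 3, 3, 4]
soil_type = [1, 2, 2, 3, 4, 4]
print(round(pearson(hydrography, soil_type), 3),
      round(spearman(hydrography, soil_type), 3))
```

Comparing both coefficients, as the study does, guards against a strong Pearson value that is driven by a few extreme observations rather than a monotonic relationship.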
The frequency of flooding at different spatial scales changes depending on geographical locations, environmental characteristics, artificial alterations, and local hydrological factors. Three clusters came from the K-means, based on the number of events and the grouping variable. Figure 6 shows the geographical allocation of the three clusters (C1, C2, and C3). The cluster C1 corresponds to the area between the Massifs under the influence of the largest mangrove in the region (Rio Tavares). The cluster C2 corresponds to the North and West of Florianópolis under the influence of the Ratones, Saco Grande and Itacorubi mangroves. The cluster C3 corresponds to the East and South areas of Florianópolis, influenced by lakes, lagoons, and outcrops of the aquifer.
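The clustering step can be illustrated with a plain K-means (Lloyd's algorithm) sketch on pairs of event count and grouping variable. It deliberately omits the spatial contiguity constraint used in the study, and the site values and starting centroids below are hypothetical.

```python
def kmeans(points, centroids, iterations=20):
    """Plain Lloyd's algorithm: assign points to the nearest centroid, then update."""
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])))
            for point in points
        ]
        new_centroids = []
        for c in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return labels, centroids

# Hypothetical (number of events, hydrography class) pairs; not the study data.
sites = [(1, 1), (2, 1), (1, 2), (6, 3), (7, 3), (6, 4), (12, 5), (13, 5), (12, 6)]
labels, centres = kmeans(sites, centroids=[(1, 1), (6, 3), (12, 5)])
print(labels)
```

A spatially constrained variant would additionally require that the members of each cluster be geographic neighbours, which is what yields contiguous sub-regions such as C1, C2, and C3.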
The results of ANOVA and the Tukey test (please refer to supplementary materials Table S4 and Table S5, respectively) show that these three clusters are possibly heterogeneous regions (F 0 > F 0.05,2,105 = 3.09), with the clusters being statistically different from each other at the 0.05 significance level.
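The F-test above can be sketched as a one-way ANOVA. With 108 sites and three clusters, the degrees of freedom are 3 − 1 = 2 and 108 − 3 = 105, which matches the critical value F 0.05,2,105 quoted in the text. The counts in the example are hypothetical and much smaller than the study sample.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples, one per group."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical flooding counts per site in three clusters (not the study data).
c1 = [5, 6, 7, 6]
c2 = [2, 3, 2, 3]
c3 = [9, 8, 10, 9]
f_stat, df_b, df_w = one_way_anova([c1, c2, c3])
print(f_stat, df_b, df_w)
```

The null hypothesis of equal cluster means is rejected when the computed F exceeds the critical value for the corresponding degrees of freedom.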

Spatial coastal-flooding model and performance evaluation
Ordinary Least Squares regression explores all possible combinations of the 10 influencing factors to find the combination that best explains the observed flooding occurrence in each sub-region. A summary of the best regression results in the C1, C2, and C3 sub-regions is given in Table 3. We develop three analytical models of the spatial distributions of coastal flooding after the regression analysis. The developed modelling equations for the C1, C2, and C3 sub-regions are in Equations (3), where MEIC represents the Spatial Coastal-Flooding Model, S (cm) represents the sea level, R (mm) represents the rainfall, H represents the hydrography, G represents the hydrogeology, L represents the land use, T represents the soil type, E (m) represents the elevation, and D represents the stormwater drainage. These equations do not produce the absolute values of the occurrence of coastal flooding but give a close approximation to the real value. The significance of coefficients represents the best optimization of value that can be estimated (Jumaah et al. 2019). Among the 10 factors considered, 8 were able to explain the flooding that occurred in the sub-regions. The groundwater level and slope factors were excluded by the regression models. The factors retained by the regression models appear in different combinations, with hydrological-type variables as the only constant across all sub-regions.
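To give a sense of the size of the search described above, the sketch below enumerates every non-empty combination of the 10 influencing factors. It only counts candidate models; fitting and ranking each one is left out. The factor names follow the text, while the helper function is illustrative.

```python
from itertools import combinations

factors = ["sea level", "rainfall", "hydrography", "hydrogeology", "groundwater level",
           "land use", "soil type", "elevation", "slope", "stormwater drainage"]

def candidate_subsets(names, max_size=None):
    """Yield every non-empty combination of explanatory factors up to max_size."""
    max_size = len(names) if max_size is None else max_size
    for size in range(1, max_size + 1):
        yield from combinations(names, size)

n_models = sum(1 for _ in candidate_subsets(factors))
print(n_models)  # 2**10 - 1 = 1023 candidate models per sub-region
```

In practice, each candidate would be fitted by OLS and screened on criteria such as those reported in the text (R 2, AICc, VIF, coefficient significance, and the residual diagnostics).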
The MEIC C1 performed well in the spatial modelling of coastal flooding events for the sub-region C1, explaining 76% of the occurrences (R 2 = 0.763) with the best approximation to reality (AICc = 41.858) among all models built. The VIF of these factors was relatively low, ranging from 3.967 to 6.119, which indicates no significant multicollinearity. Additionally, the explanatory variables were all statistically significant (p < 0.05). The model also ensured stationarity (Koenker = 5.806), normality (Jarque-Bera = 4.460), and spatial randomness of residuals (Moran's I = 0.086), demonstrating that it is adequate to explain the relationships between the number of floods and the influencing factors.
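The VIF range reported above can be read through its definition, VIF = 1 / (1 − R²), where R² comes from regressing one explanatory variable on the others. The sketch below inverts that relation for the two bounds quoted in the text; the function names are illustrative.

```python
def vif(r_squared):
    """Variance inflation factor for a predictor whose regression on the
    remaining predictors has coefficient of determination r_squared."""
    return 1.0 / (1.0 - r_squared)

def implied_r_squared(vif_value):
    """Invert the VIF definition to recover the auxiliary R-squared."""
    return 1.0 - 1.0 / vif_value

for value in (3.967, 6.119):
    print(value, round(implied_r_squared(value), 3))
```

Read this way, the C1 bounds correspond to auxiliary R² values of roughly 0.75 and 0.84. Thresholds such as 7.5 or 10 are commonly used rules of thumb for flagging problematic multicollinearity, although the appropriate cut-off depends on the application.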
The results suggest that the sub-region C1 is mainly affected by variables of the hydrological and geological type. The flooding events come from sea-level rise, high rainfall levels, local hydrography, and porous bedrock (unconsolidated sedimentary aquifer). Despite the high permeability of the rocky basement, the sub-region has low soil depth (0-4 m) and is considered an aquifer recharge area with several outcrop points. It should be noted that this sub-region lies between the massifs under the influence of the largest mangrove in the region (Rio Tavares) and faces increasing pressure from urbanization, with disorderly expansion and irregular occupation of environmental preservation areas. It is noteworthy that this area is home to a lower-income population, being inland, with no scenic attractions, and close to the Florianópolis International Airport and the access highway to the southern sector of the Island (IDOM-COBRAPE 2015b).
A coastal flooding study conducted in the municipality indicates that between 1.4 and 1.6 km² of urbanized coastal areas lie within the flooding level of 1 m by the end of the century, most of which is occupied by residential development (IDOM-COBRAPE 2015a). The study also estimates that the flooded zones in this scenario include some areas recognized for their high ecological and scenic value, such as mangroves, lagoons, and sandbanks. A similar coastal flooding study conducted in Santos Bay (SP) identified a loss of 26.3% of the riparian zone of existing mangroves by 2100, corresponding to 27.4 km² (Alfredini and Arasaki 2018). According to the authors, without this retention area, a larger amount of sediment will be carried to the inner nautical areas of Santos Port, causing silting and increasing the maintenance dredging volumes. Thus, the relevance of mangroves is highlighted as an ecosystem-based solution for the climate resilience of coastal cities, constituting the first barrier to rising sea levels. They help to mitigate coastal erosion and the impact of extreme events.
The model constructed for the sub-region C1 also indicates that investments in the retention and temporary storage of runoff should be prioritized. Implementation of detention basins, linear parks, and multifunctional civic spaces is indicated, providing amenity benefits while delivering runoff flow attenuation and biodiversity benefits. Coastal engineering constructions that protect the region from advancing sea levels, such as storm surge barriers or tide gates, are also indicated. Such coastal defence measures are often chosen as an alternative to closing off estuaries and have been established at various locations around the world, for example, The Netherlands, New Orleans, Singapore, St. Petersburg, and Venice (Jonkman et al. 2013;Mooyaart and Jonkman 2017).
The MEIC C2 performed well in the spatial modelling of coastal flooding events for the sub-region C2, explaining 68% of the occurrences (R 2 = 0.680) with good approximation to reality (AICc = 98.105). The VIF of these factors was low, ranging from 1.277 to 2.805, which indicates no significant multicollinearity. Additionally, the explanatory variables were all statistically significant (p < 0.05). The model also ensured stationarity (Koenker = 4.242), normality (Jarque-Bera = 4.970), and spatial randomness of residuals (Moran's I = −0.041), demonstrating that it is adequate to explain the relationships between the number of floods and the influencing factors.
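The spatial randomness check on the regression residuals relies on the global Moran's I, which the following sketch computes from a list of values and a spatial weights matrix. The residuals and the binary adjacency weights are hypothetical; the study's own weighting scheme is not specified here.

```python
def morans_i(values, weights):
    """Global Moran's I for values and a square spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)

# Hypothetical residuals at four sites along a line; neighbours are adjacent sites.
residuals = [1.0, -1.0, 1.0, -1.0]
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(morans_i(residuals, weights))
```

Values near zero, such as those reported for the models above, suggest residuals without a clear spatial pattern, whereas strongly positive values indicate clustering and strongly negative values indicate a dispersed, alternating pattern like the one in this toy example.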
Hydrological, geological, and artificial variables affect the sub-region C2. The absence of local hydrography impacts the flooding in this sub-region. The geological constitution of the soil presents a predominance of sandy texture (Arenosol). Flooding in this region relates to some areas with a predominance of fractured crystalline bedrock and to the increase in impervious surfaces caused by urbanization. The porosity and permeability of bedrock may influence flooding potential. Highly porous rocks aid rain infiltration, reducing flooding potential, whereas less porous, compact, and impermeable bedrock has the opposite effect (Nandi et al. 2016). Notably, this sub-region presents lowlands with an elevation between 0 and 4 m above sea level and a flat slope (0-3%), and it is predominantly urbanized, with some buildings on mangrove areas. Approximately 76% of the municipal population estimated for the year 2020 resides in sub-region C2. In the central sector, 275,288 inhabitants are concentrated in a densely urbanized agglomeration, while in the northern sector, 102,721 inhabitants are distributed in small agglomerations (IDOM-COBRAPE 2015b). Although most of the population resides in this sub-region, the flooding severity is still associated with the occupation of risk areas close to mangroves. According to IDOM-COBRAPE (2015a), the main highways in the region are susceptible to flooding resulting from return periods of 10 to 25 years due to their proximity to water resources and mangroves. Thus, the model constructed for the sub-region C2 indicates the need to prioritize investments aimed at the storage and infiltration of runoff (e.g. compensatory techniques) and at controlling the local increase in impervious rates. Infiltration swales and wells, rain gardens, green roofs, and permeable paving are indicated for overflow management.
However, urban compensatory techniques always take a form that responds to the location, character, drivers, and opportunities associated with the site.
Finally, MEIC C3 showed poor performance in the spatial modelling of coastal flooding events for the sub-region C3, explaining only 40% of the occurrences (R 2 = 0.402) with good approximation to reality (AICc = 68.637). The VIF of these factors was very low, ranging from 1.025 to 1.044, which indicates no significant multicollinearity. Additionally, the explanatory variables were all statistically significant (p < 0.05). The model ensured normality (Jarque-Bera = 4.970) and spatial randomness of residuals (Moran's I = −0.041); however, it failed in stationarity (Koenker = 28.269) and therefore does not ensure the consistency of relationships in space.
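The normality check reported for the three models can be illustrated with the Jarque-Bera statistic, JB = n/6 · (S² + (K − 3)²/4), which is compared against a chi-square distribution with 2 degrees of freedom (5% critical value of about 5.991). The sketch below defines the statistic and applies that threshold to the JB values quoted in the text; the function name and the decision labels are illustrative, and the test is asymptotic, so small samples should be read with caution.

```python
def jarque_bera(residuals):
    """Jarque-Bera statistic from sample skewness and kurtosis."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

CHI2_CRIT_2DF_05 = 5.991  # 5% critical value of chi-square with 2 degrees of freedom

# JB values as reported in the text for the three sub-region models.
reported = (("C1", 4.460), ("C2", 4.970), ("C3", 4.970))
decisions = {
    label: "normality not rejected" if jb < CHI2_CRIT_2DF_05 else "normality rejected"
    for label, jb in reported
}
print(decisions)
```

All three reported values fall below the critical value, which is consistent with the statement that the residuals of each model pass the normality check.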
In sub-region C3, topographical variables replace geological variables. Flooding events in this sub-region are positively impacted by low local precipitation and negatively by the low elevation of the terrain and poor coverage of the stormwater drainage system. Approximately 16% of the municipal population (77,969 inhabitants) estimated for the year 2020 resides in sub-region C3. Despite the low population density and lower level of consolidation, the region has a permanent resident population with a predominance of single-family houses, which consume large urban areas (IDOM-COBRAPE 2015b). According to , the sub-region C3 faces increasing pressure from urbanization, with disorderly expansion and low coverage (only 13%) of an undersized drainage network system characterized as an absolute separator. The model of this region indicates the need to prioritize investment in the implementation of a stormwater drainage system. It is noteworthy that, due to the low terrain elevation and the shallowness of the groundwater level, this region has many aquifer recharge and outcrop areas. Storm surge and erosion are constantly observed in the region and require careful coastal planning.
The low performance of the MEIC C3 model suggests the need for a more detailed analysis that considers additional environmental and artificial variables. One of the hypotheses is a subsidence process in this sub-region, considering the constant groundwater extraction by the local water supply company and the local community itself.

Conclusions
We highlight in this paper a strategy based on sociohydrological principles, which considers the population as a decisive part of the system, contributing to the observation, understanding, and dissemination of phenomena in the places where the people live. We emphasize that hydrology can be improved by reconstructing and studying the past, complementing temporal and spatial analyses through human databases (social sphere), bringing essential contributions where the conventional system has failed.
The results highlight the potential of the proposed strategy to spatially model flooding in coastal areas using information available in free and easily accessible online databases. The first results are promising and show that the MEIC strategy can be a statistically robust and effective tool for city planning. In addition, the combination of statistical techniques and geospatial analysis has demonstrated a good capacity to explain the spatial distribution of the flooding points and their relationship with the characteristics of each location.
With 10 available explanatory factors, the best MEIC model can explain 76% of the distribution of flooding events, with an AICc of 41.858. The worst-performing MEIC model can still explain 40% of the flooding events, showing significant spatial autocorrelation in the residuals, which reveals both the complexity of coastal flooding and the limitations of the strategy. The novel finding of our study is that relationships between environmental and artificial variables and flooding are not homogeneous over space.
In addition to the primary function of the MEIC strategy, analyses of the distribution of classes for each influencing factor within their sub-regions provide valuable information. However, it is important to stress that the spatial data sets may be subject to errors that may affect the final quality of the results. When higher-quality input data (for example, with increased accuracy and spatial resolution) are unavailable, the data on influencing factors can be improved by validating the information in loco.
For future studies, research efforts could focus on including other influencing factors (e.g. the process of local subsidence and coastal erosion), taking further advantage of the versatility of the strategy. In addition, other thresholds of absolute loadings could be used to restrict the flooding influence factors, for example, by considering only factors with loadings above 0.75 in each PC. Another feature that could improve the performance of the strategy would be to include the precipitation accumulated over the two to five preceding days, thus integrating the soil moisture conditions.
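The antecedent precipitation suggested above could be derived from a daily rainfall series as a simple trailing sum. The sketch below is one possible formulation, assuming that the event day itself is excluded from the window; the series and the function name are hypothetical.

```python
def accumulated_rainfall(daily_mm, window):
    """Rainfall accumulated over the `window` days preceding each day (that day excluded)."""
    totals = []
    for day in range(len(daily_mm)):
        start = max(0, day - window)
        totals.append(sum(daily_mm[start:day]))
    return totals

# Hypothetical daily rainfall series in mm (not the study data).
rain = [0.0, 12.0, 30.0, 1.5, 0.0, 0.5]
antecedent_2d = accumulated_rainfall(rain, window=2)
print(antecedent_2d)
```

With such a variable, an event recorded on a day with less than 2 mm of rainfall could still be associated with wet antecedent conditions, which the daily total alone does not capture.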