The recent report on human losses from disasters caused by natural hazards, based on EM-DAT data from 2000 to 2019 (CRED & UNDRR, 2020), includes flooding within the hydrological hazard type, along with landslides and wave action. Floods alone were responsible for 44% of all occurrences, 41% of affected people, 9% of casualties (representing 104,614 persons) and 22% of the economic losses (US$ 651 billion) recorded in EM-DAT (CRED & UNDRR, 2020).
As with many other risks, natural and human environments intertwine significantly to generate flood risk (Zischg et al., 2018). The spatial relation of these environments is usually expressed by hazard, exposure and vulnerability, with an emphasis on social vulnerability (Koks et al., 2015; Santos et al., 2020). Several approaches are based on the traditional and current understanding of the risk concept (UNDRR, 2019) and define risk as a product of hazard, exposure and vulnerability. Other approaches perform a statistical analysis of the patterns of flood damages along with the characteristics of the hazard and exposure (Mazzoleni et al., 2020). Regardless of the approach, databases of past floods and the respective damages are essential for calibrating and validating flood hazard and risk models (e.g. Khosravi et al., 2019; Li et al., 2012; Termeh et al., 2018; Santos et al., 2019).
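The multiplicative definition of risk mentioned above can be illustrated with a minimal sketch. It assumes that each component has already been normalised to the interval [0, 1]; the function name `risk_index`, the parish labels and the component values are hypothetical and serve only as an illustration, not as the formulation of any of the cited studies.

```python
def risk_index(hazard, exposure, vulnerability):
    """Multiplicative risk index (assumption: each input is normalised to [0, 1])."""
    components = (("hazard", hazard), ("exposure", exposure),
                  ("vulnerability", vulnerability))
    for name, value in components:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return hazard * exposure * vulnerability


# Hypothetical parishes: (hazard, exposure, vulnerability), illustrative values only.
parishes = {
    "A": (0.8, 0.5, 0.25),
    "B": (0.2, 0.9, 0.9),
    "C": (0.9, 0.0, 0.7),   # high hazard but nothing exposed
}

scores = {name: risk_index(*values) for name, values in parishes.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```

Under a product formulation, a component equal to zero cancels the risk regardless of the other two (parish C), whereas a moderate hazard combined with high exposure and vulnerability (parish B) can outrank a high hazard with low vulnerability (parish A).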
An increasing number of global flood risk indexes have been produced by academic institutions and the business sector. The calculation of such indexes takes advantage of the wide availability of Earth Observation products and cloud computing capacity, producing static risk assessments (Assteerawatt et al., 2016; Phongsapan et al., 2019; Ward et al., 2013, 2015; Wing et al., 2018) or risk assessments updated in near-real time (Dottori et al., 2017; Todini, 1999). Even so, the resolution and diversity of the input data representing exposure and vulnerability vary significantly among the available risk models, according to the scale of the analysis.
Flood risk indexes have been designed worldwide to support risk characterization, analysis and management. In the Portuguese context, a static flood risk index was recently proposed at the municipality level, in support of strategic levels of decision in flood risk management (Santos et al., 2020). Subsequently, the research team involved in that work identified the usefulness of deepening the understanding of the risk drivers (susceptibility, exposure and vulnerability) in comparison with the historical records at a larger scale (parish). Such an approach would be implemented using Geographic Information Systems (GIS) and machine learning.
The combination of GIS with other techniques, such as multivariate statistics, multicriteria analysis, physically-based models and machine learning models, is recognised as appropriate for flood analysis and modelling (Arabameri et al., 2020; Bui et al., 2019). In this context, several methods have been used.
Regarding statistically-based methods, one can highlight the use of bivariate methods such as the weights of evidence (Tehrany et al., 2014) and the frequency ratio (Samanta et al., 2018; Siahkamari et al., 2018). Other data-driven methods include the entropy index (Hong et al., 2018), the statistical index (Khosravi et al., 2016), the evidential belief function (Bui et al., 2019), logistic regression (Ali et al., 2020) and the k-nearest neighbour (Costache et al., 2020; Costache et al., 2020; Costache et al., 2020). However, these procedures rely heavily on the relation between dependent and independent variables and are strongly influenced by the size of the datasets (McLay et al., 2001).
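As an illustration of the bivariate methods listed above, the frequency ratio of a class of a conditioning factor is commonly computed as the share of flood occurrences falling in that class divided by the share of the study area occupied by that class. The sketch below assumes this common formulation; the slope classes and the cell counts are hypothetical and are not taken from any of the cited studies.

```python
def frequency_ratio(class_cells, class_flood_cells):
    """Frequency ratio per class: (share of floods in class) / (share of area in class)."""
    total_cells = sum(class_cells.values())
    total_floods = sum(class_flood_cells.values())
    return {
        name: (class_flood_cells[name] / total_floods) / (class_cells[name] / total_cells)
        for name in class_cells
    }


# Hypothetical slope classes of a raster: number of cells and of flooded cells.
cells = {"flat": 500, "moderate": 300, "steep": 200}
flooded = {"flat": 40, "moderate": 8, "steep": 2}

fr = frequency_ratio(cells, flooded)
```

Values above 1 indicate classes where floods are over-represented relative to the area they occupy (here the flat class), while values below 1 indicate under-representation. The dependence on the sample is visible in the steep class, whose ratio rests on only two flooded cells.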
As for multicriteria analysis applied to flood evaluation, several methods have also been used (Souissi et al., 2020), such as the analytical network process (Cao et al., 2016), the analytical hierarchy process (Ghosh & Kar, 2018; Luu et al., 2018; Tang et al., 2018), simple additive weighting (Khosravi et al., 2019) and the technique for order preference by similarity to ideal solution (Khosravi et al., 2019). These approaches rely on expert knowledge, which can be distorted by inconsistent judgements and ambiguity (Miles & Snow, 1984).
More recent approaches rely on machine learning techniques (Termeh et al., 2018; Zhao et al., 2018), such as Naive Bayes (Chen et al., 2020), decision tree models like random forest (Lee et al., 2017), artificial neural networks (Chapi et al., 2017), support vector machines (Choubin et al., 2019), support vector machine neuro-fuzzy inference system (Wang et al., 2019) and deep learning neural networks (Bui et al., 2020).
As far as we know, among the diverse decision tree models, such as Random Forest (RF), Quick, Unbiased and Efficient Statistical Tree (QUEST), Classification and Regression Trees (CART) and Chi-squared Automatic Interaction Detection (CHAID), only RF (Lee et al., 2017) and CHAID (Tehrany et al., 2013) have so far been applied in flood analysis and modelling.
In this work, the CART algorithm is used because it has proved to work well on processes with nonlinear behaviour and substantial inner heterogeneity (Ji et al., 2013). CART has several strengths, such as its insensitivity to outliers and to the spatial distribution of the data, its capability of integrating both categorical and continuous variables in the model and its ability to use several trees to characterize the modelling processes (Choubin et al., 2018).
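To make the core mechanism of CART concrete, the sketch below searches for the single binary split that minimises the weighted Gini impurity, which is the criterion commonly used by CART for classification. It is only an illustration of one splitting step under stated assumptions, not the implementation used in this work: a full tree applies the search recursively and is usually pruned. The function names, the two predictors (susceptibility and exposure scores) and the toy values are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure or empty node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best binary split."""
    best = None
    n = len(rows)
    for j in range(len(rows[0])):
        values = sorted({row[j] for row in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [y for row, y in zip(rows, labels) if row[j] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[j] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, threshold, score)
    return best


# Hypothetical parishes: (susceptibility, exposure) scores; label 1 = losses recorded.
rows = [(0.1, 0.9), (0.2, 0.3), (0.3, 0.8), (0.7, 0.2), (0.8, 0.7), (0.9, 0.4)]
labels = [0, 0, 0, 1, 1, 1]

feature, threshold, impurity = best_split(rows, labels)
```

Because a split depends only on whether a value lies below or above the threshold, and not on how far it lies from it, an extreme value in a predictor does not move the split. This is one way to read the insensitivity to outliers noted above.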
The main objective of this paper is to understand the drivers of flood disaster risk at the parish level in the Northern region of Portugal. This study has the following specific objectives:
a) To identify the role of susceptibility, exposure and vulnerability, as the main drivers of flood risk, in explaining the human losses and damages caused by floods in the 21st century (2000–2015), and to propose a classification of parishes based on that role;
b) To discuss the selection of flood risk areas under the framework of the Floods Directive, according to the dominant driving forces of flood disasters in each parish.