Risk assessment of dammed lakes in China based on Bayesian network

Scientific risk assessment of dammed lakes is vitally important for emergency response planning. In this study, based on the evolution process of the disaster chain, the logic topology structure of dammed lake risk was developed. Then, a quantitative risk assessment model of dammed lakes using a Bayesian network was developed, which includes three modules: dammed lake hazard evaluation, outburst flood routing simulation, and loss assessment. In the model, the network nodes of each module were quantified using statistical data, empirical models, logical inference, and the Monte Carlo method. The failure probability of a dammed lake and the losses of life and property were calculated. After the uniformization of each loss type, these quantities can be multiplied to assess the risk that a dammed lake poses. Based on the socio-economic development and longevity statistics of dammed lakes, a risk-level classification method for dammed lakes is proposed. The Baige dammed lake, which emerged in China in 2018, was chosen as a case study and a risk assessment was conducted. The results showed that the comprehensive risk index of the Baige dammed lake is 0.7339 without manual intervention, which places it at the extra-high level according to the classification. These results are in accordance with the actual condition, which corroborates the reasonability of the proposed model. The model can quickly and quantitatively evaluate the overall risk of a dammed lake and provide a reference for decision-making in a rapid emergency response scenario.


Introduction
Triggered by precipitation events or earthquakes, dammed lakes often form in mountainous areas where water is stored after landslides have blocked valleys and rivers (Costa and Schuster 1988; Zhong et al. 2018). Based on an investigation of the longevity of 352 dammed lake cases around the world, Shen et al. (2020) reported that 84.4% of landslide dams lasted less than 1 year, 80.4% lasted less than 6 months, 68.2% lasted less than 1 month, and 48.3% lasted less than 1 week. If a dammed lake breaks, it often causes serious floods that pose a huge threat to the safety of people's lives and properties in downstream areas (Korup 2002; Ermini and Casagli 2003; Strom 2010). A scientific risk assessment of these dammed lakes is conducive to an appropriate emergency response, which can help reduce loss of life and property.
There are many definitions of natural disaster risk, which leads to different expressions of the risk degree. UNDRO (1991) defined risk as the product of hazard and vulnerability, which has been recognized by many scholars. ISO (2018) has a similar definition, which describes risk as the combination of the likelihood of a risk event and its consequences. According to the definitions, the risk assessment of a dammed lake can be quantified as the expected value of the consequences, which is the product of the failure probability of the dammed lake and the expected losses.
Since the 1980s, considerable research has focused on risk assessment of dammed lakes. After decades of development, great progress has been achieved in theoretical and practical aspects. Most studies have focused on the hazard evaluation of dammed lakes, outburst flood of dammed lakes, and loss assessment due to dammed lake breaching (Costa and Schuster 1988; Korup 2002; Liu et al. 2019; Fan et al. 2019, 2020; Zhong et al. 2021). Geomorphological index-based methods for landslide dam stability are the most popular approaches for hazard evaluation of dammed lakes; further, the particle composition of landslide dam material has recently been taken into account. Utilizing datasets for formed-stable and formed-unstable landslide dams, researchers have developed a series of formulas and judgement criteria based on mathematical statistics methods to rapidly evaluate landslide dam stability. Methods based on mathematical statistics can be divided into two categories, namely multiple regression methods (e.g., Casagli and Ermini 1999; Ermini and Casagli 2003; Korup 2004; Stefanelli et al. 2016) and logistic regression methods (e.g., Dong et al. 2011; Shan et al. 2020). To predict the outburst flood of a dammed lake, the numerical simulation process includes dammed lake breaching and outburst flood routing. For dammed lake breaching, numerical models are generally divided into three categories (ASCE/EWRI Task Committee on Dam/Levee Breach 2011), namely empirical models (e.g., Walder and O'Connor 1997; Peng and Zhang 2012a; Shan et al. 2022), simplified physically based models (Chang and Zhang 2010; Chen et al. 2015; Zhong et al. 2020a), and detailed physically based models (Wu et al. 2012; Cristo et al. 2016; Cantero-Chinchilla et al. 2016; Takayama et al. 2021).
For outburst flood routing, commonly used software programs include FLDWAV (Fread and Lewis 1988), DHI MIKE FLOOD (Danish Hydraulic Institute 2021), and HEC-RAS (US Army Corps of Engineers 2021). In addition, some detailed physically based models can be used (e.g., Li et al. 2021). Loss assessment can be divided into two aspects: loss of life (Graham 1999; Jonkman and Penning-Rowsell 2008; Peng and Zhang 2012b; Mahmoud et al. 2020) and loss of property (Das and Lee 1988; Li et al. 2012).
The above approaches for hazard evaluation, outburst flood, and loss assessment of dammed lakes are independent of each other and not highly quantified. In recent years, integrated outburst flood and loss assessment and deterministic analysis methods have been used to assess dammed lake risk (Shi et al. 2016; Yang et al. 2022). Furthermore, uncertainty analysis methods, such as the fuzzy comprehensive method (Liao et al. 2018) and the fuzzy analytic hierarchy process (Xue et al. 2021), can also be used in risk assessment. However, the risk assessment of dammed lakes has not yet been developed into a complete system with a rigorous theoretical framework. A thorough study of risk assessment is still needed, particularly one that focuses on the evolution of the disaster chain caused by the damming of rivers by landslides.
In this study, a quantitative risk assessment of dammed lakes was conducted based on a Bayesian network. The assessment comprises four steps: (1) hazard assessment of dammed lakes to quantify their failure probability; (2) flood routing simulation to obtain the inundation situation of the outburst flood; (3) quantitative assessment and uniformization of losses in the inundated area; and (4) use of the product of failure probability and losses to assess the risk of dammed lakes based on the risk-grading standard.

Bayesian network
The Bayesian network is a directed acyclic graph that represents the probability relationship between variables. It can be considered as a combination of probability and graph theories (Pearl 2003; Aguilera et al. 2011; Cai et al. 2021a). The nodes in the network represent the random variables, and the directed connections between the nodes represent the conditional causal relationships between the random variables. Each node corresponds to a conditional probability that represents the strength of the relationship between that variable and its parent nodes. Nodes without a parent node are described by prior probabilities (Cai et al. 2020a). According to the theorem of the Bayesian network (Jensen 2001), the joint probability can be calculated as follows:

$$P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P\left(X_i \mid \mathrm{Pa}(X_i)\right) \quad (1)$$

where $\mathrm{Pa}(X_i)$ is the parent set of $X_i$. If $X_i$ has no parents, the function reduces to the unconditional probability $P(x_i)$.
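As a minimal sketch of this factorization, the following Python snippet evaluates the joint probability of a small three-node network as the product of each node's conditional probability given its parents. The node names, states, and all probability values are hypothetical and are not taken from the model in this study.

```python
# Hypothetical three-node network: DamHeight -> Failure <- LakeVolume.
# All probabilities are illustrative assumptions, not values from the study.
parents = {
    "DamHeight": [],
    "LakeVolume": [],
    "Failure": ["DamHeight", "LakeVolume"],
}

# Conditional probability tables: (tuple of parent states) -> {state: probability}.
cpt = {
    "DamHeight": {(): {"low": 0.6, "high": 0.4}},
    "LakeVolume": {(): {"small": 0.7, "large": 0.3}},
    "Failure": {
        ("low", "small"): {"stable": 0.8, "unstable": 0.2},
        ("low", "large"): {"stable": 0.5, "unstable": 0.5},
        ("high", "small"): {"stable": 0.6, "unstable": 0.4},
        ("high", "large"): {"stable": 0.1, "unstable": 0.9},
    },
}


def joint_probability(assignment):
    """Product over all nodes of P(x_i | Pa(X_i)) for a full assignment."""
    prob = 1.0
    for node, node_parents in parents.items():
        parent_states = tuple(assignment[p] for p in node_parents)
        prob *= cpt[node][parent_states][assignment[node]]
    return prob


example = {"DamHeight": "high", "LakeVolume": "large", "Failure": "unstable"}
p_example = joint_probability(example)  # 0.4 * 0.3 * 0.9
```

Root nodes use an empty parent tuple, so their prior probabilities enter the product in the same way as the conditional probabilities of child nodes.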
In recent years, Bayesian networks have been widely applied to engineering risk analysis (Khakzad et al. 2013; Zhang et al. 2021; Cai et al. 2020b, c). Applications in dam engineering have also been reported (Smith 2006; Zhu et al. 2021; Peng and Zhang 2012a, c). To apply the Bayesian network to the risk assessment of dammed lakes, the causal relationship among the influencing factors should first be determined. In this study, using the Bayesian software Netica Application 6.08 (2020), risk assessment of dammed lakes was conducted within the three modules of hazard evaluation of dammed lakes, outburst flood routing simulation, and loss assessment. The Bayesian network model contains two parts, a structure model and a parameter model (Cai et al. 2021b). The structure model is usually built on the basis of causal relationships. Figure 1 shows the network structure and its nodes. The parameter model is usually built based on expert experience and machine learning. In this study, the Bayesian network model was established by combining the two methods, and hazard evaluation was mainly based on machine learning. Flood simulation and loss assessment were determined using empirical formulas.

Hazard evaluation of dammed lakes
As most dammed lakes fail within a short period of time after formation (Shen et al. 2020), a reasonable hazard evaluation is a prerequisite and foundation for scientific risk assessment. Previous studies have shown that the hazard of a dammed lake is related to its morphological characteristics, material composition, and hydrodynamic conditions (Ermini and Casagli 2003; Korup 2004; Dong et al. 2011; Stefanelli et al. 2016; Shan et al. 2020).

Morphological characteristics of landslide dams
The dam height (i.e., the vertical altitude difference from the valley floor to the lowest point on the landslide dam) determines the potential energy of the water body and the volume of the upstream dammed lake. The dam height is an important factor affecting the stability of the landslide dam. The dam width (i.e., the base width of the landslide dam measured parallel to the main valley axis) affects the maximum hydraulic gradient. The maximum hydraulic gradient refers to the ratio of the maximum water head (dam height) to the dam width, which affects the permeability of the dam body itself. When the upstream and downstream water levels are constant, the shorter the dam length (i.e., the crest length of the landslide dam measured perpendicular to the major valley axis), the greater its constraint by the mountains on both sides and the better its stability. The volume of the dam, which is determined by its own length, width, and height, affects its weight and global stability.
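The maximum hydraulic gradient defined above is a simple ratio, which can be sketched as follows. The dam dimensions in the example are hypothetical.

```python
def max_hydraulic_gradient(dam_height_m, dam_width_m):
    """Ratio of the maximum water head (taken as the dam height) to the dam width."""
    if dam_width_m <= 0:
        raise ValueError("dam width must be positive")
    return dam_height_m / dam_width_m


# Hypothetical dam: 60 m high and 1200 m wide along the valley axis.
gradient = max_hydraulic_gradient(60.0, 1200.0)
```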

Material composition of landslide dams
The material composition of the landslide dam determines its anti-erosion ability. In general, the larger the size of particles that form the dam, the stronger its ability to resist erosion. In this study, the proportion of soil and stone was utilized to quantitatively represent the material composition. A landslide dam in which soil (particle size < 60 mm) exceeds 70% of the total weight is categorized as a soil dam, a dam in which stone (particle size > 200 mm) exceeds 70% of the total weight is categorized as a stone dam, and all other landslide dams are categorized as mixed dams (Shan et al. 2022).
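The classification rule above can be written as a short function. This is a sketch only: the function name and the use of weight fractions between 0 and 1 as inputs are assumptions made for illustration.

```python
def classify_dam_material(soil_share, stone_share):
    """Classify a landslide dam from weight fractions (0 to 1).

    soil_share:  fraction of total weight with particle size < 60 mm
    stone_share: fraction of total weight with particle size > 200 mm
    """
    if not (0.0 <= soil_share <= 1.0 and 0.0 <= stone_share <= 1.0):
        raise ValueError("shares must lie between 0 and 1")
    if soil_share + stone_share > 1.0 + 1e-9:
        raise ValueError("soil and stone shares cannot exceed the total weight")
    if soil_share > 0.70:
        return "soil dam"
    if stone_share > 0.70:
        return "stone dam"
    return "mixed dam"
```

For example, a dam with 45% soil and 40% stone by weight falls into neither of the first two categories and is therefore treated as a mixed dam.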

Hydrodynamic conditions of dammed lakes
The volume of the dammed lake, which is generally limited by the dam height and catchment area, determines its water storage capacity and plays a decisive role in landslide dam stability. Here, the catchment area determines the upstream inflow and directly affects the impounding speed. In general, the larger the catchment area, the greater the risk.

Determination of network structure and node states
The determination of the network structure is generally based on three approaches: expert experience, machine learning, or a combination of both. However, when there are many nodes and no suitable learning algorithm, both the efficiency of machine learning and the accuracy of the resulting network structure are low. In this section, therefore, the expert experience method is employed to develop the network model. Dam height, length, width, volume, and maximum hydraulic gradient were selected to characterize the morphological characteristics of landslide dams. The lake volume and the catchment area were selected to represent the hydrodynamic conditions of dammed lakes, and the material composition of the landslide dam was also considered. Nodes in the model are either discrete or continuous. Most nodes are continuous and include initial and intermediate nodes (e.g., dam height and lake volume). Their discretization is mostly based on expert experience or logical inference. Discrete nodes are mostly target nodes, which commonly take the form of output results (e.g., failure probability). According to the relationship between the nodes, the hazard topology network of dammed lakes is shown in Fig. 1, and the states and value ranges of each node are presented in Table 1 (Liao et al. 2018). Figure 1 shows that the network for hazard evaluation of dammed lakes contains 9 nodes and 11 links.

Quantifying the network
Quantifying the Bayesian network means finding the conditional probability relationship between nodes. In general, there are four sources of data for quantifying a Bayesian network: statistical data, experience-based judgment, existing physical or empirical models, and logical inference (Peng and Zhang 2012a). Quantification of dammed lake hazard network nodes is mainly based on historical statistical data. To quantify network nodes, a database containing 313 documented dammed lake cases across the world was compiled (Schuster 1985; Costa and Schuster 1991; Chai et al. 1998; Tahata et al. 2002; Ermini and Casagli 2003; O'Connor and Beebee 2009; Nie et al. 2004; Yan 2006; Tong 2008; Cui et al. 2011; Dong et al. 2011; Peng and Zhang 2012b; Shi et al. 2014; Stefanelli et al. 2015; Zhang et al. 2016, 2019; Jia et al. 2019) (see Supplementary Table 1). This includes 186 formed-unstable cases and 127 formed-stable cases. Of these documented cases, 151 occurred in Italy, 48 in Japan, 40 in China, 31 in the USA, and 43 in other countries (see Fig. 2). However, for various objective reasons, information for certain cases is incomplete. Table 2 presents the statistics of all documented dammed lake cases in the database.
In this study, 248 cases were selected from the database as training sets for data learning, where the missing rates of $V_d$, $V_l$, and $A_b$ were 17.34%, 41.94%, and 38.71%, respectively. As the learning cases contain incomplete data, the expectation-maximization (EM) algorithm is adopted to handle the probability calculation in the case of missing data (Dempster et al. 1977). Here, $Y$ denotes the observation data for the failure probability of the dammed lake, which include the input parameters, node states, and value ranges (Table 1). The missing node data $Z$ are an implicit variable representing the unobserved outcome of the node; the joint probability of $Y$ and $Z$ is $P(Y, Z \mid \theta)$, and the conditional probability distribution is $P(Z \mid Y, \theta)$. First, the counting probability in the cases is taken as the initial value of the parameter to start the iteration. Denoting by $\theta^{(i)}$ the estimate of the parameter $\theta$ at the $i$th iteration, the Q function can be calculated as follows:

$$Q\left(\theta, \theta^{(i)}\right) = \sum_{Z} \log P(Y, Z \mid \theta)\, P\left(Z \mid Y, \theta^{(i)}\right) \quad (2)$$

where $P(Z \mid Y, \theta^{(i)})$ is the conditional probability distribution of the implicit variable data $Z$ given the observation data $Y$ and the current parameter estimate $\theta^{(i)}$. To find the $\theta$ that maximizes $Q(\theta, \theta^{(i)})$, the estimated parameter $\theta^{(i+1)}$ of the $(i+1)$th iteration can be expressed as follows:

$$\theta^{(i+1)} = \arg\max_{\theta} Q\left(\theta, \theta^{(i)}\right) \quad (3)$$

The iteration is repeated until convergence, and the convergence condition is set as $\|\theta^{(i+1)} - \theta^{(i)}\| < 0.01\%$. Once the maximum likelihood Bayesian network is found, the final prior (conditional) probability distribution is obtained, and the Bayesian network model is obtained (see Fig. 3). The model can obtain the real-time failure probability of the dammed lake according to the input parameters.
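To illustrate how expectation maximization treats missing node data, the following sketch applies it to a deliberately small, hypothetical network with one binary parent X and one binary child Y, in which some values of X are unobserved. It is not the Netica procedure used in this study; the network, the data, the add-one smoothing of the initial counts, and the function name are all assumptions made for illustration. The E-step distributes each incomplete case over the states of X according to the current parameters, and the M-step re-estimates the parameters from the expected counts.

```python
def em_binary_network(data, tol=1e-4, max_iter=1000):
    """EM for a two-node network X -> Y with binary states and missing X values.

    data: list of (x, y) pairs with x in {0, 1, None} and y in {0, 1}.
    Returns (p_x1, p_y1) where p_y1[x] = P(Y=1 | X=x).
    Assumes both states of X and of Y occur among the complete cases.
    """
    observed = [(x, y) for x, y in data if x is not None]
    # Initial values from counts over the complete cases (add-one smoothing).
    p_x1 = (sum(x for x, _ in observed) + 1) / (len(observed) + 2)
    p_y1 = []
    for state in (0, 1):
        ys = [y for x, y in observed if x == state]
        p_y1.append((sum(ys) + 1) / (len(ys) + 2))

    for _ in range(max_iter):
        # E-step: expected counts, weighting each missing X by P(X=1 | y, theta).
        n_x = [0.0, 0.0]    # expected number of cases with X = 0, 1
        n_xy1 = [0.0, 0.0]  # expected number of cases with X = x and Y = 1
        for x, y in data:
            if x is None:
                like1 = p_x1 * (p_y1[1] if y else 1 - p_y1[1])
                like0 = (1 - p_x1) * (p_y1[0] if y else 1 - p_y1[0])
                w1 = like1 / (like1 + like0)
            else:
                w1 = float(x)
            n_x[0] += 1 - w1
            n_x[1] += w1
            n_xy1[0] += (1 - w1) * y
            n_xy1[1] += w1 * y
        # M-step: parameters that maximize Q for the expected counts.
        new_p_x1 = n_x[1] / len(data)
        new_p_y1 = [n_xy1[s] / n_x[s] for s in (0, 1)]
        changes = [abs(new_p_x1 - p_x1)]
        changes += [abs(a - b) for a, b in zip(new_p_y1, p_y1)]
        p_x1, p_y1 = new_p_x1, new_p_y1
        if max(changes) < tol:  # tolerance of 0.01%, as in the text
            break
    return p_x1, p_y1


# Hypothetical cases: six complete and two with X missing.
cases = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 1), (None, 1), (None, 0)]
p_x1, p_y1 = em_binary_network(cases)
```

When every case is complete, the expected counts equal the observed counts and the procedure returns the ordinary relative frequencies after a single update.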
To verify the accuracy of the model, the 65 remaining cases in the database were used as a test set. Since the Bayesian model gives a probabilistic result and the target node (the failure probability of the dammed lake) has only two states, the state with the higher probability is taken as the output (i.e., when the probability of instability is greater than 50%, instability is taken as the output). This qualitative reading is only used for model verification with the test set, while the output of the comprehensive risk analysis remains a probability. Testing the model with the test set showed that the output agreed with the observed state in 53 cases, giving an absolute accuracy of 81.54% (53/65). In nine further cases, the model output was unstable although the observed state was stable; because such errors are on the safe side, counting them as acceptable gives a conservative accuracy of 95.38% (62/65).
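The two accuracy figures follow directly from the reported case counts, as the short calculation below shows. The variable names are illustrative; the last line derives, by elimination, the number of cases predicted stable but observed unstable.

```python
n_test = 65         # test cases (313 documented cases minus 248 training cases)
n_correct = 53      # output state equals the observed state
n_conservative = 9  # output unstable, observed stable (error on the safe side)

absolute_accuracy = n_correct / n_test
conservative_accuracy = (n_correct + n_conservative) / n_test
# Remaining misclassifications: output stable, observed unstable.
n_unsafe = n_test - n_correct - n_conservative
```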
Although the failure probability node has only two states, formed stable and formed unstable, the failure probability of the dammed lake is itself a quantitative result. The risk value is the product of the failure probability of the dammed lake and the loss due to a breach of the dam. Once the failure probability is determined, the risk value of the dammed lake can therefore be obtained by combining that probability with the loss due to a breach of the dam.
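A minimal sketch of this product is given below. The input values are hypothetical, and the loss is assumed to be supplied as an already normalized index.

```python
def risk_value(failure_probability, loss_index):
    """Risk as the product of failure probability and (normalized) loss."""
    if not 0.0 <= failure_probability <= 1.0:
        raise ValueError("failure probability must lie between 0 and 1")
    return failure_probability * loss_index


# Hypothetical inputs: failure probability 0.8 and normalized loss index 0.5.
example_risk = risk_value(0.8, 0.5)
```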

Outburst flood routing simulation of dammed lakes
The outburst flood routing simulation is an important module in the risk assessment of a dammed lake, and it is the pivot connecting hazard evaluation with loss assessment. This module can be divided into two parts: flood severity and duration. The node states and value ranges of the outburst flood routing simulation of dammed lakes are shown in Table 3 (Peng and Zhang 2012a; Clausen and Clark 1990).

Flood severity
Flood severity is the product of flow velocity and water depth at a certain cross section, which is an important parameter for measuring damage to the inundation area. Clausen and Clark (1990) partitioned flood severity as follows: an area where the product of water depth and flow velocity is less than or equal to 3 m²/s is defined as the inundation zone, an area with a product greater than 3 m²/s and less than or equal to 7 m²/s is defined as the partial damage zone, and an area with a product greater than 7 m²/s is defined as the complete damage zone. Accordingly, in this study, flood severity is classified into three levels: high, medium, and low (see Table 3).
Nodes involved in flood severity mainly include water depth, flow velocity, distance to dam site, dam height, and lake volume. In Table 4, the dam breach cases with flood characteristics are summarized and used to quantify each node (Zhou 2007; Cui et al. 2009; Peng and Zhang 2012c; Wang 2018). Among these cases, the prior probabilities of the distance to the dam site and the flow velocity can be obtained directly from the statistical probability, while the conditional probability of the water depth can be obtained from the case data. Flood severity is an important index for describing flood characteristics, and it is mainly affected by water depth and flow velocity (see Fig. 1). The conditional probability distribution of flood severity is calculated using the zoning standard proposed by Clausen and Clark (1990) and the Monte Carlo simulation method (see Table 5).
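The combination of the zoning thresholds with Monte Carlo sampling can be sketched as follows. Several points are assumptions made for illustration: the mapping of the inundation, partial damage, and complete damage zones to the low, medium, and high levels; uniform sampling within each interval; and the interval bounds themselves, which are hypothetical and do not reproduce Table 3.

```python
import random


def severity_level(depth_m, velocity_m_s):
    """Severity level from the product of water depth and flow velocity (m^2/s)."""
    dv = depth_m * velocity_m_s
    if dv <= 3.0:
        return "low"      # inundation zone
    if dv <= 7.0:
        return "medium"   # partial damage zone
    return "high"         # complete damage zone


def severity_distribution(depth_range, velocity_range, n_samples=100_000, seed=0):
    """Estimate P(severity | depth interval, velocity interval) by sampling."""
    rng = random.Random(seed)
    counts = {"low": 0, "medium": 0, "high": 0}
    for _ in range(n_samples):
        depth = rng.uniform(*depth_range)
        velocity = rng.uniform(*velocity_range)
        counts[severity_level(depth, velocity)] += 1
    return {level: count / n_samples for level, count in counts.items()}


# Hypothetical node states: depth 1-3 m and velocity 1-4 m/s.
distribution = severity_distribution((1.0, 3.0), (1.0, 4.0))
```

Each call fills one row of a conditional probability table; repeating it for every combination of depth and velocity states yields the full table.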

Flood duration
Flood duration is mainly characterized by the flood arrival time and the flood rise time. Flood arrival time is the time from dam breaching to the time when the flood reaches the affected area, which is also one of the key influencing parameters for the evacuation of the downstream population at risk and movable property. Affected by dam height and lake volume, the flood arrival time can be calculated using the empirical formula of the Yellow River Water Conservancy Commission (1977), in which $T_1$ is the flood arrival time; $K_1$ is the time coefficient, $K_1 = 0.7 \times 10^{-3}$; $H_0$ is the average base flow depth, which can be assumed as 2 m; $x$ is the distance to the dam site; $V_l$ is the lake volume; and $H_d$ is the dam height.
The Monte Carlo simulation method was used to calculate the conditional probability of the flood arrival time. Taking the inundation area far from the dam site (12-36 km) as an example, the probability distribution of flood arrival time is shown in Table 6.
Flood rise time refers to the interval between the time when the flood reaches the affected area and the time when it rises to the dangerous water depth. It is mainly affected by the water depth, the dam breach duration, and the distance to the dam site, and can be calculated by the formula of Peng and Zhang (2012a), in which $T_2$ is the flood rise time; $B_t$ is the dam breach duration; $\alpha$ is the ratio of the time to peak and the dam breach duration, for which a mean value of 0.2225 is suggested; $\beta = 1$ when the peak water depth $h_p < 1.5$ m, and $\beta = 1.5/h_p$ when $h_p > 1.5$ m; and $\gamma$ is the coefficient of energy consumption, whose values are assumed as 1, 1.05, 1.1, and 1.2 when the distances to the dam site are 0-4.8 km, 4.8-12 km, 12-36 km, and > 36 km, respectively. Taking the inundation area very far from the dam site (> 36 km) as an example, the probability distribution of flood rise time is shown in Table 7. The probability distribution of each node is the input, and the Bayesian network of the evolution of the dammed lake outburst flood is obtained (see Fig. 4).
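The piecewise coefficients β and γ can be expressed as simple lookup functions, sketched below. The function names are illustrative, and the assignment of boundary distances (exactly 4.8, 12, or 36 km) to the nearer band is an assumption, since the text does not specify it.

```python
def beta_coefficient(peak_depth_m):
    """beta = 1 for a peak water depth below 1.5 m, otherwise 1.5 / h_p."""
    if peak_depth_m <= 0:
        raise ValueError("peak water depth must be positive")
    return 1.0 if peak_depth_m < 1.5 else 1.5 / peak_depth_m


def gamma_coefficient(distance_km):
    """Coefficient of energy consumption by distance band to the dam site."""
    if distance_km < 0:
        raise ValueError("distance must be non-negative")
    if distance_km <= 4.8:
        return 1.0
    if distance_km <= 12.0:
        return 1.05
    if distance_km <= 36.0:
        return 1.1
    return 1.2
```

Because 1.5 / h_p equals 1 at h_p = 1.5 m, β is continuous at the threshold regardless of which branch handles that value.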

Loss assessment due to dammed lake breaching
The outburst of a dammed lake causes irreversible damage to the downstream area. The loss caused by dammed lake breaching mainly includes loss of life and property.

Loss of life
When a dammed lake breach occurs, the downstream population at risk should be evacuated to safe zones. Evacuation behavior includes a series of processes, such as decision-making, alarm, reaction, and evacuation. Whether the evacuation is successful depends on the time required and the time available for evacuation. This condition is expressed in terms of the warning time $W_t$, the flood rise time $T_2$, and the time required for the evacuation $T_n$. Loss of life is generally defined as the exposed population lost in the flood and is primarily determined by the level of achieved evacuation and the flood characteristics. The loss of life Bayesian network is mainly composed of six nodes: dam breach time, warning time, evacuation distance, time required for evacuation, evacuation situation, and loss of life. The states of the nodes and the value ranges for the loss of life due to dammed lake breaching are shown in Table 8.
A dam breach is treated as equally likely at any time, so the dam breach time is assumed to be evenly distributed throughout the day. That is, the probabilities of a dammed lake breach occurring in the daytime (8:00-17:00), evening (17:00-22:00), and night (22:00-8:00) are 0.375, 0.208, and 0.417, respectively. In addition, due to limited information on the location of evacuation points, the evacuation distance is assumed to follow a uniform distribution.
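Under the uniform assumption, each probability is simply the length of the period divided by 24 h. A short check is sketched below; representing the night period as running from hour 22 to hour 32 (8:00 on the following day) is an implementation choice made for illustration.

```python
# Period boundaries in hours; night wraps past midnight to 8:00 the next day.
periods = {"daytime": (8, 17), "evening": (17, 22), "night": (22, 32)}

breach_time_probability = {
    name: (end - start) / 24 for name, (start, end) in periods.items()
}
```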
The warning time refers to the time from when the alarm is issued until the flood reaches the affected area, which is one of the key factors affecting the successful evacuation of the population at risk. The warning time consists of the warning initiation time and the flood arrival time. The warning initiation time is generally affected by the dam breach time, because a dam breach event is generally easier to detect in the daytime. The conditional probability of the warning time is calculated by the Monte Carlo method, and the probabilities of 0-0.25 h, 0.25-1 h, 1-3 h, and > 3 h are 0.218, 0.252, 0.259, and 0.271, respectively (see Table 9).
The time required for evacuation is the time for the downstream population at risk to be successfully evacuated to safe zones, which mainly consists of warning initiation time, warning propagation time, response time, and evacuation time. The warning propagation time is the time elapsed between the issuing of warnings and their receipt by the population at risk, which is related to factors such as weather, dam breach time, and flood characteristics. Lindell (2008) suggested using the Weibull distribution (see Eq. (7)) to depict the warning propagation time (see Table 9). Response time refers to the time required for the population at risk to receive the warning and initiate an emergency response, which is also described by the Weibull distribution (see Table 9). The evacuation time is the time required for the evacuation process, which mainly depends on the dam breach time, the evacuation distance, and the evacuation speed of the population at risk (see Table 9). The conditional probability of the time required for evacuation is calculated in combination with the Monte Carlo method, and the probabilities of 0-0.25 h, 0.25-1 h, 1-3 h, and > 3 h are 0.124, 0.523, 0.297, and 0.056, respectively.
In Eq. (7), $P_t$ is the proportion of the population corresponding to a certain state, and $a$ and $b$ are the two coefficients in $W(a, b)$. The evacuation situation is mainly affected by the flood rise time, warning time, and evacuation time. Herein, the Monte Carlo simulation method is used to calculate the conditional probability distribution of the evacuation situation.
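A Monte Carlo estimate of the evacuation situation can be sketched as follows. The success criterion used here, that the available time (warning time plus flood rise time) is at least the time required for evacuation, is an assumption based on the description of available versus required time and is not confirmed as the exact expression used in this study. Uniform sampling within each interval and the interval bounds in the example are likewise hypothetical.

```python
import random


def evacuation_success_probability(warning_range_h, rise_range_h,
                                   required_range_h, n_samples=50_000, seed=0):
    """Estimate P(successful evacuation) for given node-state intervals (hours).

    Assumed criterion: W_t + T_2 >= T_n.
    """
    rng = random.Random(seed)
    successes = 0
    for _ in range(n_samples):
        w_t = rng.uniform(*warning_range_h)   # warning time
        t_2 = rng.uniform(*rise_range_h)      # flood rise time
        t_n = rng.uniform(*required_range_h)  # time required for evacuation
        if w_t + t_2 >= t_n:
            successes += 1
    return successes / n_samples


# Hypothetical states: warning 0.25-1 h, rise 0-0.5 h, required 0.25-1 h.
p_success = evacuation_success_probability((0.25, 1.0), (0.0, 0.5), (0.25, 1.0))
```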
The population at risk in the affected areas can be in four states: successful evacuation, low, medium, and high flood severity. The loss rate of the successfully evacuated population is assumed to be 0 (Jonkman 2007), while the population in the high severity flood area has a high loss rate, with a hypothetical value of 0.9078 (Peng and Zhang 2012a). Compared to the two extremes, low and medium severity floods are more uncertain, which is why they are quantified based on historical data. Jonkman (2007) proposed a lognormal distribution function with water depth as the independent variable to simulate the rate of loss of life:

$$F_D(h) = \Phi\left(\frac{\ln h - \mu_D}{\sigma_D}\right) \quad (8)$$

where $F_D(h)$ is the loss rate for a certain flood severity when the water depth of the downstream affected area is $h$; $\Phi$ is the standard normal distribution function; and $\mu_D$ and $\sigma_D$ are the mean and standard deviation, respectively. Areas with low flood severity generally have slow flow velocity and shallow water depth, which poses little threat to the safety of the affected population. Low severity flood cases were collected from the datasets of Jonkman (2007) and Zhou (2007), and a lognormal distribution curve was used for data fitting (see Fig. 5). Based on the fitting curve, $\mu_D$, $\sigma_D$, and $R^2$ (coefficient of determination) are 3.71, 1.382, and 0.497, respectively.

Fig. 5 Fitting curve of the rate of loss of life for low severity floods
Areas with medium flood severity generally have greater water depths and faster flow rates than areas with low flood severity, with a correspondingly greater potential for loss of life. Medium severity flood cases were collected from the datasets of Jonkman (2007), Zhou (2007), and McClelland and Bowles (2002), and a lognormal distribution curve was also used to fit the data (see Fig. 6). Based on the fitting curve, $\mu_D$, $\sigma_D$, and $R^2$ are 1.624, 0.472, and 0.658, respectively.
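The fitted parameters can be used to evaluate the rate of loss of life at a given water depth, as sketched below. The functional form is assumed to be the standard lognormal cumulative distribution in water depth, Φ((ln h − μ_D)/σ_D); the function names and the example depth of 2 m are illustrative.

```python
import math


def standard_normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def loss_of_life_rate(depth_m, mu, sigma):
    """Assumed lognormal form: Phi((ln h - mu) / sigma); zero for non-positive depth."""
    if depth_m <= 0:
        return 0.0
    return standard_normal_cdf((math.log(depth_m) - mu) / sigma)


LOW_SEVERITY = (3.71, 1.382)      # fitted mu_D and sigma_D, low severity floods
MEDIUM_SEVERITY = (1.624, 0.472)  # fitted mu_D and sigma_D, medium severity floods

rate_low = loss_of_life_rate(2.0, *LOW_SEVERITY)
rate_medium = loss_of_life_rate(2.0, *MEDIUM_SEVERITY)
```

With these parameters, the medium severity curve gives a higher loss rate than the low severity curve at the same depth, and each curve reaches a rate of 0.5 at a depth of exp(μ_D).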
Based on the Monte Carlo simulation method, the conditional probability distribution of the loss of life in the Bayesian network can be obtained (see Fig. 7).

Loss of property
The loss of property due to dammed lake breaching can be divided into direct and indirect economic losses. Of these, direct economic loss is the main component, which is generally represented by the direct economic loss rate. The economic loss rate refers to the ratio of the value lost by disaster-bearing bodies of various economic types in a flood to their economic value before the disaster or in normal years (Das and Lee 1988). This can be expressed as:

$$R_{Loe}(i) = \frac{F_i - F_i'}{F_i} \quad (9)$$

where $R_{Loe}(i)$ is the direct economic loss rate of economy type $i$; $F_i$ is the pre-disaster value of economy type $i$; and $F_i'$ is the post-disaster value of economy type $i$.
The economic loss rate is usually determined by examining flood losses in previous years to establish the relationship function between the loss rate, the water depth, and other factors; it can also be determined in relation to the loss rates of other catchments. In this study, as an example, actual loss rates of 14 different types of economic losses in China were collected as data sources (Kang et al. 2006). According to the trends of water depth and loss rates, loss of property can be divided into four categories: loss of agriculture and fishery; loss of forestry and animal husbandry; loss of industry, commerce, and residential property; and loss of tertiary industry and infrastructure. Ideally, the property values of the different economic types would be taken as weights; however, such data are not available. Therefore, the proportions of Gross Domestic Product (GDP) for each economic type are taken as weights to calculate the loss rate of each loss classification. This can be expressed as follows:

$$R_{Loe}(i, h) = \sum_{j} Gr_{ij}\, R_{Loe}(i, j, h) \quad (10)$$

where $R_{Loe}(i, h)$ is the direct economic loss rate corresponding to the economy type $i$ at water depth $h$; $Gr_{ij}$ is the share of GDP of category $j$ in the economy type $i$, where $i$ is the economic type classification and $j$ is the economic subcategory within category $i$; and $R_{Loe}(i, j, h)$ is the direct economic loss rate corresponding to subcategory $j$ of economy type $i$ at water depth $h$.
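The GDP-weighted loss rate of one economy type can be sketched as a weighted sum over its subcategories. The subcategory names, GDP shares, and loss rates below are hypothetical, and the normalization of the shares so that they sum to one within the economy type is an assumption.

```python
def weighted_loss_rate(gdp_shares, sub_loss_rates):
    """Loss rate of an economy type as the GDP-share-weighted mean of its subcategories.

    gdp_shares:     {subcategory: GDP share or GDP value}
    sub_loss_rates: {subcategory: loss rate at the water depth of interest}
    """
    if set(gdp_shares) != set(sub_loss_rates):
        raise ValueError("both inputs must cover the same subcategories")
    total = sum(gdp_shares.values())
    if total <= 0:
        raise ValueError("GDP shares must sum to a positive value")
    return sum(gdp_shares[k] / total * sub_loss_rates[k] for k in gdp_shares)


# Hypothetical values for the agriculture and fishery type at one water depth.
shares = {"agriculture": 0.8, "fishery": 0.2}
rates = {"agriculture": 0.5, "fishery": 0.3}
combined_rate = weighted_loss_rate(shares, rates)
```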
Based on the economic statistics for China (Kang et al. 2006), the relationship between water depth and the loss rate of each economic type can be calculated using Eq. (11), and the curve is obtained by fitting a power function to these data (see Fig. 8). The R 2 values of the four curves were 0.919, 0.959, 0.994, and 0.995, indicating a good fitting effect. It is worth mentioning that in this study, the basic information was collected from China; the same fitting method can be adopted for other countries. In addition, the economic loss rate changes little when the water depth exceeds 3.5 m. Therefore, the loss rate at a water depth of 3.5 m is taken as the upper limit of the loss rate.
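A power-function fit with an upper limit at a water depth of 3.5 m can be sketched as follows. The form R = x·h^y, the use of least squares on log-transformed data, and the synthetic data points are all assumptions made for illustration; the fitting procedure actually used in this study is not specified here.

```python
import math


def fit_power_function(depths, loss_rates):
    """Least-squares fit of R = x * h**y on log-transformed data; returns (x, y)."""
    log_h = [math.log(h) for h in depths]
    log_r = [math.log(r) for r in loss_rates]
    n = len(depths)
    mean_h = sum(log_h) / n
    mean_r = sum(log_r) / n
    sxy = sum((a - mean_h) * (b - mean_r) for a, b in zip(log_h, log_r))
    sxx = sum((a - mean_h) ** 2 for a in log_h)
    y = sxy / sxx
    x = math.exp(mean_r - y * mean_h)
    return x, y


def capped_loss_rate(depth_m, x, y, cap_depth_m=3.5):
    """Loss rate from the fitted curve, held constant beyond the cap depth."""
    return x * min(depth_m, cap_depth_m) ** y


# Synthetic points generated from R = 0.2 * h**0.8 (hypothetical coefficients).
sample_depths = [0.5, 1.0, 2.0, 3.0]
sample_rates = [0.2 * h ** 0.8 for h in sample_depths]
x_fit, y_fit = fit_power_function(sample_depths, sample_rates)
```

Fitting in log space turns the power function into a straight line, so the coefficients follow from ordinary linear regression; with real data, a direct nonlinear fit may give slightly different values.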
In the fitted power function, $R(h)$ is the direct economic loss rate corresponding to the water depth $h$, and $x$ and $y$ are the fitting coefficients. At present, most studies on the economic loss rate consider only water depth. However, the economic loss rate is also related to warning time, flow velocity, and flood duration. In this study, based on the function of water depth and loss rate, the disaster-causing environment and other influencing factors are considered to revise the expression of the economic loss rate.
Agriculture and fishery are economic activities that people have relied on for survival since ancient times. Clearly, the deeper the water, the greater the loss, and most crops die when the water depth exceeds 0.5 m. Furthermore, crops are sensitive to the duration of flood and have poor inundation tolerance. According to previous studies on the relationship between the duration of inundation and the loss rate in China (Wang 2009; Yang et al. 2010), the suggested loss rate values for agriculture and fishery related to flood duration are shown in Table 10.

Table 10
Flood duration (h): 0-2, 2-4, 4-6, > 6 | 0-2, 2-4, 4-6, > 6 | 0-2, 2-4, 4-6, > 6
Loss rate (%): 65, 80, 95, 100 | 85, 95, 100, 100 | 95, 100, 100, 100

Although forestry, animal husbandry, tertiary industry, and infrastructure have better water tolerance, they still have high loss rates due to the influence of flow velocity. A correction coefficient for flow velocity based on statistics for China is given by Liu et al. (2016), in which $f_v$ is the correction coefficient for the flow velocity; $m$ is the reciprocal of the economic loss rate $R_{Loe}(i, h)$ corresponding to the water depth $h$; $v$ is the flow velocity; and $v_2$ is assumed to be 3 m/s based on the loss statistical information for China.
In addition to water depth and flow velocity, warning time is also a major factor for the loss assessment of industrial, commercial, and residential property due to their shared characteristic of partial portability. In general, the longer the warning time, the lower the economic loss rate (up to a limiting value). A correction coefficient for the warning time based on statistics in China is given by Liu et al. (2016), in which $f_T$ is the correction coefficient for the warning time; $n$ is the property transfer coefficient, which is assumed to be 0.65 when the rescue level is uncertain; and $T_1$ is the time required for residents to move portable properties to the safe zone, which is assumed to be 1.5 h based on a questionnaire survey in China (Shi et al. 2009).
According to the function of water depth and loss rate, and considering the correction coefficients of each influencing factor, the direct economic loss rate of the dammed lake breaching can be expressed as: where $R_{Loe}(i)$ is the direct economic loss rate of the economy type $i$ and $f_d$ is the correction coefficient of the flood duration (see Table 10).
In Eq. (14), only the correction coefficient corresponding to the node needs to be calculated, and all other correction coefficients are assumed to be 1. The Bayesian network for the loss of economy is established in Fig. 9.
Based on the direct economic loss rate in each disaster-causing environment, the regional loss of the economy can be expressed as: where $L_E$ is the regional loss of the economy; $F_i$ is the pre-disaster value of the economy type $i$; $p_i$ is the indirect loss coefficient, which can be 15%, 15%, 35%, and 20% for loss of agriculture and fishery, loss of forestry and animal husbandry, loss of industry, commerce, and residential property, and loss of tertiary industry and infrastructure, respectively (Wahlstrom et al. 1999); $r$ is the comprehensive annual economic growth rate, which can be determined by local economic development; and $t$ is the number of years between the forecast year and the base year.

Fig. 9 Bayesian network for loss of property

Comprehensive loss
The losses caused by the dammed lake breaching are multifaceted, and each type of loss has a different degree of impact. Consequently, how to comprehensively evaluate the loss caused by the outburst flood is an important issue. Losses of life and economy are normalized according to their degree of influence, so that different losses can be analyzed quantitatively under a uniform standard. The normalized functions based on statistics for China can be expressed as (Li et al. 2006): where $F_l$ and $F_e$ are indices of losses of life and economy, respectively, and $Lol$ and $Loe$ are the values of losses of life and economy, respectively.
A comprehensive loss assessment can be conducted by linear weighting of losses of life and economy:

$$F = S_l F_l + S_e F_e$$

where $F$ is the comprehensive loss; $S_l$ and $S_e$ are the weighting coefficients of losses of life and economy, which are assumed to be 0.875 and 0.125, respectively, according to statistics for China (Li et al. 2006).
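The linear weighting can be illustrated with a minimal sketch. The function name and the example index values are hypothetical; only the weights 0.875 and 0.125 come from the text, and the inputs are assumed to be already normalized to the range [0, 1].

```python
def comprehensive_loss(f_life, f_economy, s_life=0.875, s_economy=0.125):
    """Linearly weight the normalized loss-of-life and economic-loss indices.

    The default weights are the values quoted in the text (Li et al. 2006).
    """
    for name, value in (("f_life", f_life), ("f_economy", f_economy)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return s_life * f_life + s_economy * f_economy


# Hypothetical normalized indices, chosen only for illustration.
example = comprehensive_loss(f_life=0.8, f_economy=0.4)
print(round(example, 3))  # 0.75
```

Because the weights sum to 1, the comprehensive loss stays in [0, 1] whenever both indices do, and the loss-of-life index dominates the result.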

Risk level of a dammed lake
The risk level of a dammed lake is mainly determined by the failure probability of the dammed lake and the losses caused by the outburst flood. The Bayesian network for dammed lake risk assessment, which integrates all modules, is shown in Fig. 10. The model includes 21 nodes and 41 links, and provides 2122 possible scenarios.
Classifying the risk level of a dammed lake requires quantitative definitions of the failure probability and loss. There is no unified quantitative definition of dammed lake failure. Certain documented disposal cases of dammed lakes in the past 20 years are presented in Table 11 (Cai et al. 2021c), in which the average, median, and maximum disposal times were 13.37 days, 12 days, and 31 days, respectively. Furthermore, based on 352 dammed lake cases around the world, Shen et al. (2020) reported that 60% of cases failed within two weeks, 68.2% within a month, and 84.4% within a year; thus, the longevity of a dammed lake often determines the danger it poses. In this study, longevities of two weeks, one month, and one year are taken as the boundaries that separate the extra-high, high, medium, and low hazard classes. In general, the shorter the longevity of a dammed lake, the greater the hazard. Based on the dataset, one month is assumed to be a manageable time for disposal of dammed lakes, so the failure probability of a dammed lake within a month is used to determine the inverse proportional function: where $P$ is the failure probability of the dammed lake and $P_t$ is the failure probability of the dammed lake within the time period $t$. The boundaries between extra-high, high, medium, and low risks are calculated as 0.775, 0.682, and 0.551. That is, the failure probability of a dammed lake is extra-high if it exceeds 77.5%, high if it exceeds 68.2%, medium if it exceeds 55.1%, and low if it is less than or equal to 55.1%.
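The reported boundaries can be reproduced with a short sketch. It assumes the inverse proportional function has the form P = k / P_t, with k fixed so that the one-month value (68.2%) maps to itself, that is, k = 0.682². This form is an inference from the reported numbers rather than a formula quoted from the source, and the function names are hypothetical.

```python
# Cumulative fractions of dammed lakes that failed within each period
# (Shen et al. 2020, as quoted in the text).
P_TWO_WEEKS = 0.60
P_ONE_MONTH = 0.682
P_ONE_YEAR = 0.844

# Assumed form: P = K / P_t, with K chosen so the one-month value maps to itself.
K = P_ONE_MONTH ** 2


def boundary(p_t):
    """Failure-probability boundary for a longevity fraction p_t (assumed form)."""
    return K / p_t


def hazard_level(p_failure):
    """Classify a failure probability using the boundaries stated in the text."""
    if p_failure > 0.775:
        return "extra-high"
    if p_failure > 0.682:
        return "high"
    if p_failure > 0.551:
        return "medium"
    return "low"


boundaries = [round(boundary(p), 3) for p in (P_TWO_WEEKS, P_ONE_MONTH, P_ONE_YEAR)]
print(boundaries)           # [0.775, 0.682, 0.551]
print(hazard_level(0.813))  # extra-high
```

Under this assumed form, a shorter longevity (smaller P_t) gives a higher boundary, which matches the statement that shorter-lived dammed lakes pose the greater hazard.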
The risk index $I_r$ is the product of the probability of dammed lake failure and the loss due to dammed lake breaching. Based on the results of hazard evaluation and outburst flood routing simulation of the dammed lake, while simultaneously considering the socio-economic development in China, the risk can be divided into four levels: risk level IV ($0 \le I_r \le 0.439$), risk level III ($0.439 < I_r \le 0.583$), risk level II ($0.583 < I_r \le 0.687$), and risk level I ($0.687 < I_r \le 1$) (see Table 12). A classification of the risk level of a dammed lake can provide a reference for emergency response.
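The classification can be sketched as a small lookup. The function names and the example inputs are hypothetical; the thresholds are those given above, and the level labels follow the paper's low, medium, high, and extra-high naming.

```python
# Labels for the four risk levels, ordered from lowest to highest risk.
LEVEL_LABELS = {"IV": "low", "III": "medium", "II": "high", "I": "extra-high"}


def risk_index(p_failure, loss):
    """Risk index as the product of failure probability and comprehensive loss."""
    return p_failure * loss


def risk_level(i_r):
    """Map a risk index in [0, 1] to one of the four risk levels."""
    if not 0.0 <= i_r <= 1.0:
        raise ValueError(f"risk index must lie in [0, 1], got {i_r}")
    if i_r <= 0.439:
        return "IV"
    if i_r <= 0.583:
        return "III"
    if i_r <= 0.687:
        return "II"
    return "I"


# Hypothetical inputs, chosen only for illustration.
level = risk_level(risk_index(p_failure=0.8, loss=0.9))
print(level, LEVEL_LABELS[level])  # I extra-high
```

Each upper bound is inclusive, so a risk index of exactly 0.687 falls in level II rather than level I.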

Case study
To test the rationality and applicability of the risk assessment method, the case of Baige dammed lake in China is chosen, for which detailed measured information on the dam breaching process and flood routing, as well as loss statistics, is available. On November 3, 2018, a massive landslide occurred on the right bank of the Jinsha River near Baige Village, Tibet Autonomous Region, China, which blocked the Jinsha River and formed a dammed lake (Zhong et al. 2020b) (see Fig. 11).

Failure probability of the Baige dammed lake
The "11.03" Baige landslide deposits were tongue-shaped in plan view, and the landslide dam was composed of a long-range debris flow, which had a height, length, width, and volume of 86 m, 600 m, 1200 m, and $2.83 \times 10^{6}$ m$^3$, respectively (Cai et al. 2020c). The dam was mainly composed of soil gravel and block stone, with a ratio of soil to rock of about 7:3. The dammed lake had a maximum volume of $7.9 \times 10^{8}$ m$^3$, and the upstream catchment area was about $1.7 \times 10^{5}$ km$^2$ (Cai et al. 2020c; Zhong et al. 2020b; Shan et al. 2022). After inputting the relevant parameters into the Bayesian network of hazard assessment, the failure probability of Baige dammed lake was calculated to be 81.3% (see Fig. 12).

Loss assessment due to breaching of Baige dammed lake
In this section, the evacuation situation and flood characteristics of each region are considered comprehensively, and the loss of life is quantitatively evaluated based on possible fatality in each region. The inundation area due to the Baige dammed lake outburst flood mainly covers both sides of the Jinsha River from the dam site to the Liyuan Hydropower Station. Therefore, a loss assessment is conducted for this area, which includes the cities of Diqing and Lijiang in Yunnan Province. The statistical information is based on the post-disaster investigations reported by the Yunnan Province Flood Control and Drought Relief Headquarters (2018). After hours of erosion by overtopping flow, Baige dammed lake breached, and the peak flow occurred at about 18:00 on November 13, 2018 (Zhong et al. 2020b). Based on the measured data, the dam breach lasted about 14 h. To facilitate the analysis, the water depth and flow velocity in the inundation area were calculated using the empirical formulas shown in Eqs. (19) and (20), respectively (Xie 1993): where $h_x$ is the water depth at the cross section $x$ km away from the dam site; $h_0$ is the water depth at the dam site; $m$ is the index of the cross-sectional shape; $i_0$ is the longitudinal slope at the bottom of the river; and $\lambda$ is the coefficient of the cross section, which is approximately the width of the cross section.
where $v_x$ is the flow velocity at the cross section $x$ km away from the dam site; $Q_x$ is the peak flow at the cross section $x$ km away from the dam site, which can be expressed as: where $Q_p$ is the peak breach flow at the dam site; $v_K$ is the empirical coefficient of the flow velocity, with $v_K = 7.15$ for the mountainous area and $v_K = 3.13$ for the flat area.
The parameter values of each node are added into the network to obtain the rate of loss of life (see Fig. 13). Due to timely manual intervention, the population at risk was evacuated before Baige dammed lake breached; hence, the calculated loss of life is 0 (see Table 13). Table 13 also presents the calculated loss of life without manual intervention. Through the emergency response, the population at risk was safely evacuated, but the flood caused irreversible economic losses. Based on the statistics of the disaster situation of various industries in the inundation areas, the rates of loss of property for various industry types were calculated using the Bayesian network (see Fig. 13), and the total loss of property was calculated using Eq. (15). A comparison between the calculated and actual losses of property in the two affected cities in the inundation area is shown in Table 14. Only the actual direct economic loss was available from statistical data; therefore, the comparison was conducted on the direct economic loss. The comparison between the calculated and the actual direct economic losses verified the rationality of the proposed model. In addition, the calculated total loss of property was also provided, and the risk index was calculated based on that value.

Fig. 12 Failure probability analysis of the Baige dammed lake
In summary, the failure probability of Baige dammed lake was 81.3%, and breaching without manual intervention would have resulted in the loss of 35 lives and an economic loss of CNY 8939.2 million. Hence, the risk index is 0.7339, which corresponds to the extra-high level. We also conducted a risk assessment for the Tangjiashan and Xiaogangjian landslide-dammed lakes in China. The results are listed in Table 15. The risk index for Tangjiashan lake is 0.762; thus, it belongs to the extra-high level. The risk index for Xiaogangjian lake is 0.683, which corresponds to the high level. Overall, the results for the three representative cases matched the actual risk situations.
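As a consistency check on these figures, and assuming the risk index is the product of the failure probability and the comprehensive loss, the comprehensive loss implied for the Baige case without manual intervention can be backed out. This value is derived here for illustration and is not reported in the source:

$$F = \frac{I_r}{P} = \frac{0.7339}{0.813} \approx 0.903$$

The three reported indices also fall where the classification says they should: $0.7339 > 0.687$ and $0.762 > 0.687$ both give risk level I (extra-high), while $0.583 < 0.683 \le 0.687$ gives risk level II (high).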

Conclusions
Based on the Bayesian network, a quantitative risk assessment model for dammed lakes was developed, which includes modules for dammed lake hazard assessment, outburst flood routing simulation, and loss assessment. A risk classification method for dammed lakes was also proposed and applied to the Baige dammed lake. The following conclusions can be drawn:
(1) According to the disaster chain formed by the dammed lake breaching, the risk topological structure is proposed, and the parameters are selected according to the logical causality of each module; then, a Bayesian network for dammed lake risk assessment with 27 nodes and 41 arcs was developed.
(2) A database containing 313 documented dammed lake cases around the world has been compiled. Considering the morphological characteristics and material compositions of these landslide dams, as well as the hydrodynamic conditions of the dammed lakes, the network nodes were quantified. Then, the failure probability of dammed lakes and the loss rates of life and economy due to dammed lake breaching were obtained.
(3) Based on the socio-economic development of China, as well as the longevity statistics of dammed lakes around the world, a new quantitative risk-level classification method based on the losses of life and economy for dammed lakes is proposed. Taking 0.439, 0.583, and 0.687 as the boundary values, the risk level of the dammed lake is divided into low, medium, high, and extra-high, which is more scientific and feasible than the previously used qualitative classification method.
(4) The results of the risk assessment show that the failure probability of the Baige dammed lake was 81.3%. The comparison between the calculated and actual losses of life under manual intervention, as well as the direct economic losses, verified the rationality of the proposed model. Without manual intervention, the loss of life would be 35, and the loss of property would be CNY 8939.2 million; therefore, the calculated risk index is 0.7339, which belongs to the extra-high level.

Acknowledgements
This work was financially supported by the National Natural Science Foundation of China (Grant No. U2040221).

Funding
The authors have not disclosed any funding.

Data availability
Data used and obtained in the current work are available from the corresponding author upon request.

Conflict of interest
The authors declare that they have no conflict of interest.