Study On Multiscale-Multivariate Attribution Analysis and Prediction of Urban Rainstorm Flood

Abstract: To explore the impact of the changing environment on urban rainstorm floods and to reveal the relationship between flood volume and its influencing factors at the micro level, the rainfall and flood volume series are decomposed by wavelet analysis to perform a multiscale attribution analysis. A multiscale-multivariate prediction model of urban rainstorm floods is then constructed for the Jialu River Basin in Zhengzhou, China. The results show that the main influencing factors of flood volume are rainfall and the underlying surface, and that the latter caused the mutations of flood volume in 1994 and 2005. At the micro level, there is a constant linear relationship between rainfall and flood volume in d1, d2 and d3, while the impact of the underlying surface on flood volume is mainly reflected in a3. The multiscale-multivariate prediction model simulates the flood volume of the first 45 rainstorm floods well: NSE, R² and Re are 0.966, 0.964 and 10.80%, respectively. The model also predicts well: the relative errors between the predicted and observed flood volumes of the 46th to 50th rainstorm floods are all less than 20%.


Introduction
In recent years, urban rainstorm flood disasters have occurred frequently, causing a great threat [...]

where $\bar{\psi}\left(\frac{t-b}{a}\right)$ is the complex conjugate function of $\psi\left(\frac{t-b}{a}\right)$. In real hydrological problems, time series are usually discrete rather than continuous (Roushangar et al. 2018); therefore, the discrete wavelet transform (DWT) in the following form is usually used:

(3)

where $\Delta t$ is the sampling interval.

Linear Regression Model and Ridge Regression Model
The core of the linear regression model is to establish a linear relationship between the dependent variable and one or more independent variables (Liu et al. 2016). When the number of independent variables is p and the number of samples is n, the linear regression model is formulated as follows:

$$Y = X\beta + \varepsilon$$

where Y (n×1) is the vector of the dependent variable, X (n×p) is the regression matrix, ε (n×1) is the vector of random error terms that remain after removing the influence of the independent variables on the dependent variable, and β (p×1) is the parameter vector, which can be estimated by the least squares method with the following equation (Zhao et al. 2020):

$$\hat{\beta} = (X^{T}X)^{-1}X^{T}Y$$

When the independent variables are strongly correlated with each other, the least squares estimator may become ill-conditioned, that is, the estimated parameters do not match the actual situation. The ridge regression model can handle this problem effectively. Ridge regression is a biased estimation method for the analysis of multicollinear data. The motivation for the ridge estimator is to add a constant matrix kI (k>0) to the matrix $X^{T}X$, which greatly reduces the probability of $X^{T}X + kI$ approaching singularity (Rabiei et al. 2019). Therefore, the parameters of the ridge regression model can be obtained with the following equation:

$$\hat{\beta}(k) = (X^{T}X + kI_p)^{-1}X^{T}Y$$

where k≥0 is the ridge parameter and $I_p$ is the p-dimensional identity matrix (Choi et al. 2019).

The Nash-Sutcliffe efficiency (NSE), the coefficient of determination (R²) and the mean relative error (Re) are used to evaluate the performance of the model. The closer NSE and R² are to 1.0 and the closer Re is to zero, the better the model performs. These performance indexes can be written as:

$$NSE = 1 - \frac{\sum_{i=1}^{n}(y_i - y_i')^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$$

$$R^2 = \frac{\left[\sum_{i=1}^{n}(y_i - \bar{y})(y_i' - \bar{y}')\right]^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2 \sum_{i=1}^{n}(y_i' - \bar{y}')^2}$$

$$R_e = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i' - y_i}{y_i}\right| \times 100\%$$

where $y_i$ and $y_i'$ are the i-th observed and simulated values; $\bar{y}$ and $\bar{y}'$ are the average observed and simulated values, respectively; and n is the number of observations.
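The ridge estimator described in this section adds kI to X^T X before solving the normal equations. The following is a minimal, dependency-free sketch of that idea, not the implementation used in the paper: the function names, the sample data and the choice k = 1.0 are all hypothetical, and no intercept term is included.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge_fit(x_rows, y, k):
    """Return (X^T X + k I_p)^{-1} X^T Y; k = 0 gives ordinary least squares."""
    p = len(x_rows[0])
    xtx = [[sum(row[i] * row[j] for row in x_rows) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(x_rows, y)) for i in range(p)]
    for i in range(p):
        xtx[i][i] += k  # the ridge term kI
    return solve_linear_system(xtx, xty)


# Hypothetical data: the second column is almost a copy of the first,
# so the two regressors are strongly collinear.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 1.9, 3.2, 3.8, 5.0]
X = [[u, v] for u, v in zip(x1, x2)]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]

beta_ls = ridge_fit(X, Y, 0.0)     # least squares estimate
beta_ridge = ridge_fit(X, Y, 1.0)  # ridge estimate with assumed k = 1.0
print(beta_ls, beta_ridge)
```

For any k > 0 the ridge coefficients have a smaller Euclidean norm than the least squares coefficients, which is the stabilising effect the ridge term is meant to provide on collinear data.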
Assuming that these two rainstorm floods are mutation points of the rainfall-flood volume relationship, a significance t-test is performed separately on the rainfall and on the flood volume at a significance level of 0.01; the results are shown in Table 2. If the absolute value of T exceeds the critical value, the significance test is passed.

It can be seen from Table 2 that the flood volume decreases abruptly at the 12th rainstorm flood and increases abruptly at the 24th rainstorm flood, and the mutation at the latter is more significant. Moreover, the rainfall does not change abruptly at these two rainstorm floods. Therefore, the 12th and 24th rainstorm floods are mutation points of the relationship between rainfall and flood volume, that is, the characteristics of runoff generation and concentration in the study area changed significantly in 1994 and 2005.
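The mutation test above compares |T| with a critical value. The text does not state which form of the t-test was used, so the sketch below assumes a pooled-variance two-sample t statistic computed on the values before and after a candidate mutation point. The function names and all sample values are hypothetical; the critical value 3.169 is the standard two-sided t-table entry for a significance level of 0.01 with 10 degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(before, after):
    """Two-sample t statistic with pooled variance, plus its degrees of freedom."""
    n1, n2 = len(before), len(after)
    dof = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(before) + (n2 - 1) * variance(after)) / dof
    t = (mean(before) - mean(after)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, dof


def is_mutation(before, after, critical_value):
    """The test is passed when |T| exceeds the critical value."""
    t, _ = pooled_t_statistic(before, after)
    return abs(t) > critical_value


# Hypothetical flood volumes and rainfall on either side of a candidate point.
flood_before = [5.1, 4.8, 5.6, 5.0, 5.3, 4.9]
flood_after = [2.9, 3.2, 2.7, 3.1, 3.0, 2.8]
rain_before = [60.0, 75.0, 58.0, 90.0, 66.0, 71.0]
rain_after = [62.0, 80.0, 55.0, 88.0, 70.0, 69.0]

T_CRITICAL = 3.169  # two-sided, alpha = 0.01, 10 degrees of freedom
print(is_mutation(flood_before, flood_after, T_CRITICAL))
print(is_mutation(rain_before, rain_after, T_CRITICAL))
```

A point where the flood volume passes the test while the rainfall does not corresponds to the situation described for the 12th and 24th rainstorm floods: the rainfall-flood volume relationship changes although the rainfall itself does not.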

Determination of the Main Influencing Factors of Flood Volume
The land use/cover of the study area is divided into six categories: cultivated land (including dry land and paddy fields), forest land, grassland, waters, impermeable land and unutilized land (shown in Figure 5), where the unutilized land mainly includes waste grassland, saline-alkali land, swamp, sand, bare land, bare rock, etc. The areas of the different underlying surface types are shown in Figure 6 (the unutilized land area is so small that it can be ignored). [...] Table 3.

Table 3 shows that both cultivated land area and impermeable land area change sharply in [...]

Theoretically, the flood volume is affected by rainfall elements such as rainfall, rainfall duration and rainfall intensity. According to the selection principle of rainstorm floods and the analysis of the relationship between rainfall and flood volume, rainfall is selected in this paper as the main influencing factor of flood volume among the rainfall elements.

Furthermore, an increase in forest and grassland area or in water area leads to a decrease in the runoff coefficient, reducing the flood volume generated by a rainstorm of the same magnitude, while an increase in impermeable land area leads to the opposite result.

The above conclusion is consistent with the previous study on the annual runoff coefficient in the study area, that is, the underlying surface is the main driving factor for urban rainstorm flood evolution. Therefore, rainfall and the underlying surface are the main influencing factors of flood volume in the Jialu River Basin of the Zhengzhou central urban area.

Multiscale Attribution Analysis of Flood Volume
In this paper, the db6 wavelet (Daubechies wavelet of order 6) is selected to decompose the rainfall sequence and the flood volume sequence of the first 45 rainstorm floods, yielding five detail components and one trend component for each sequence, namely d1, d2, d3, d4, d5 and a5 (shown in Figure 7). The original sequence consists of these six component sequences, and a5 reflects the overall trend of the original sequence.

[...] Pb3-Rb3 and Pa3-Ra3 are respectively constructed (shown in Figure 8). [...] volume is mainly reflected in a3.
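The idea that a sequence splits into detail components plus one trend component, which together rebuild the original, can be illustrated with a small sketch. It is only an illustration under stated assumptions: it uses the Haar wavelet instead of db6 so that it needs no wavelet library, it requires the series length to be a multiple of 2 to the power of the number of levels, and the function name and sample series are hypothetical.

```python
def haar_components(series, levels):
    """Split a series into additive Haar details d1..dJ and a trend aJ.

    For the Haar wavelet, the level-j approximation mapped back to the
    original time axis is the mean over blocks of 2**j samples, and the
    level-j detail is the difference between successive approximations.
    """
    if len(series) % (2 ** levels) != 0:
        raise ValueError("length must be a multiple of 2**levels")
    approx_prev = [float(v) for v in series]
    details = []
    for j in range(1, levels + 1):
        width = 2 ** j
        approx = []
        for start in range(0, len(series), width):
            block_mean = sum(series[start:start + width]) / width
            approx.extend([block_mean] * width)
        details.append([p - a for p, a in zip(approx_prev, approx)])
        approx_prev = approx
    return details, approx_prev


# Hypothetical flood volume series of length 8, decomposed to 3 levels.
flood = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
details, trend = haar_components(flood, 3)

# Summing every detail component and the trend recovers the series.
rebuilt = [sum(d[i] for d in details) + trend[i] for i in range(len(flood))]
print(rebuilt)
```

With db6 the components are smoother and boundary handling matters, but the additive structure (details plus trend equal the original sequence) is the same property that lets each component be modelled separately and the predictions summed.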

Construction of Prediction Model for Each Flood Volume Component
According to the multiscale attribution analysis of flood volume, the univariate model [...] sum of cultivated land area (S1), forest and grassland area (S2), water area (S3) and impermeable land area (S4) is basically constant; thus, S1 is discarded to avoid strong collinearity among the independent variables, and up to three of the other variables are selected as independent variables.
Taking the years after 2000 as an example, the correlation coefficients of S2, S3 and S4 are 0.53, -0.98 and -0.55 by combining Table 3 [...]

The simulation results of each model are shown in Figure 11 and the performance indexes of each model are shown in Table 4.

Figure 11. The simulation results of each model

From Figure 11 and Table 4, it can be seen that Model 1 has the best simulation effect [...] respectively. The simulated flood volume component is shown in Figure 12. [...] Table 5.
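The performance indexes used to compare the models (NSE, R² and Re) can be computed along the following lines. This is a sketch under assumptions rather than the paper's own code: R² is taken as the squared correlation between observed and simulated values, Re is taken as the mean absolute relative error in percent, and the function names and sample values are hypothetical.

```python
def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus squared error over observed variance."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - num / den


def r_squared(obs, sim):
    """Squared correlation between observed and simulated values (assumed form)."""
    mean_obs = sum(obs) / len(obs)
    mean_sim = sum(sim) / len(sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov ** 2 / (var_obs * var_sim)


def mean_relative_error(obs, sim):
    """Mean absolute relative error in percent (assumed form of Re)."""
    return 100 * sum(abs(s - o) / o for o, s in zip(obs, sim)) / len(obs)


# Hypothetical observed and simulated flood volumes.
observed = [10.0, 20.0, 30.0]
simulated = [11.0, 22.0, 27.0]
print(nse(observed, simulated),
      r_squared(observed, simulated),
      mean_relative_error(observed, simulated))
```

NSE and R² approach 1.0 and Re approaches zero as the simulated values approach the observed ones, which is the criterion used to rank the models.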