With extreme weather events and severe urban flooding becoming increasingly frequent, this study combines the advantages of data-driven and process-driven models to construct a new hybrid model, LSTM-SWMM. The model is efficient and convenient to apply, and it still accounts for the physical processes of runoff yield and concentration. In the LSTM-SWMM model, hydrological process variables output by the SWMM model are supplied to the LSTM neural network as additional input data. The framework of this study is presented in **Figure.1**.

## 2.1 Storm Water Management Model (SWMM)

SWMM is an integrated urban water system simulation model developed by the U.S. Environmental Protection Agency in 1971 (Ma et al.,2021). It is mainly used to simulate urban water quantity and quality, either for single precipitation events or over the long term. Its main functional modules cover runoff yield, runoff concentration, and pipe network hydrodynamics. In this study, runoff yield is computed with the Horton infiltration equation, and runoff concentration is computed by treating each sub-catchment as a nonlinear reservoir. Pipe flow is routed with the dynamic wave method, the most complete of SWMM's routing options, which solves the full one-dimensional Saint-Venant equations:

$$\frac{\partial A}{\partial t}+\frac{\partial Q}{\partial L}=0 \tag{1}$$

$$-\frac{\partial Z}{\partial L}={S}_{f}+\frac{1}{g}\frac{\partial v}{\partial t}+\frac{v}{g}\frac{\partial v}{\partial L} \tag{2}$$

where \(A\) is the flow cross-sectional area, m²; \(Q\) is the discharge through the cross-section, m³/s; \(L\) is the distance along the flow path, m; \(Z\) is the water level, m; \(v\) is the mean cross-sectional velocity, m/s; \(g\) is the gravitational acceleration, m/s²; \({S}_{f}\) is the friction slope calculated by Manning's formula, usually expressed as \({S}_{f}={Q}^{2}/{K}^{2}\); and \(K\) is the conveyance (discharge modulus), m³/s (Dong et al.,2002).
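The relation \(S_f = Q^2/K^2\) can be made concrete with a short Python sketch that computes the Manning conveyance \(K = A R^{2/3}/n\) in SI units and the corresponding friction slope. The full-pipe geometry, the roughness value \(n = 0.013\), and all function names are illustrative assumptions. Writing \(S_f = Q\lvert Q\rvert/K^2\) keeps the friction term's sign aligned with the flow direction, and it reduces to \(Q^2/K^2\) for positive flow.

```python
import math


def conveyance(area_m2: float, hyd_radius_m: float, manning_n: float) -> float:
    """Manning conveyance K (m^3/s) in SI units: K = A * R**(2/3) / n."""
    return area_m2 * hyd_radius_m ** (2.0 / 3.0) / manning_n


def friction_slope(q_m3s: float, k_m3s: float) -> float:
    """Friction slope S_f = Q|Q| / K^2 (equal to Q^2/K^2 when Q >= 0)."""
    return q_m3s * abs(q_m3s) / k_m3s ** 2


def full_pipe_geometry(diameter_m: float) -> tuple[float, float]:
    """Flow area and hydraulic radius (D/4) of a circular pipe flowing full."""
    return math.pi * diameter_m ** 2 / 4.0, diameter_m / 4.0


if __name__ == "__main__":
    area, radius = full_pipe_geometry(1.0)  # hypothetical 1 m pipe
    k = conveyance(area, radius, 0.013)     # n = 0.013, an assumed roughness value
    sf = friction_slope(1.0, k)             # Q = 1 m^3/s
    print(f"K = {k:.2f} m^3/s, S_f = {sf:.6f}")
```

Because \(Q = K\sqrt{S_f}\) under steady uniform flow, multiplying the computed \(K\) by \(\sqrt{S_f}\) recovers the input discharge, which is a quick consistency check.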

A schematic of the surface runoff concept used by SWMM is shown in **Figure.2**. The surface of each sub-catchment is treated as a nonlinear reservoir. Its inflows come from precipitation and from any designated upstream sub-catchments (Ma et al.,2022), and its outflows include infiltration, evaporation, and surface runoff. The capacity of this "reservoir" is the maximum depression storage \({d}_{p}\), which represents the maximum surface storage provided by ponding, depression filling, and interception by vegetation. Surface runoff per unit area, \(Q\), occurs only when the water depth \(d\) in the "reservoir" exceeds \({d}_{p}\), and the outflow is then calculated with Manning's formula. The water depth \(d\) (in feet) in the sub-catchment is updated continuously over time by numerically solving the water balance equation for the sub-catchment.
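The runoff-yield (Horton) and nonlinear-reservoir concepts described above can be combined in a small water-balance loop. The Python sketch below is a simplified illustration under stated assumptions, not SWMM's solver:

- It uses SI units (metres, seconds) rather than feet.
- Infiltration follows the basic Horton curve \(f(t)=f_c+(f_0-f_c)e^{-kt}\), with \(t\) measured from the start of the simulation. SWMM's own Horton routine also tracks cumulative infiltration and recovery between storms.
- Runoff uses the SI form of Manning's formula, \(Q=(W/n)S^{1/2}(d-d_p)^{5/3}\).
- The balance is advanced with a simple explicit time step instead of SWMM's own ODE solver.

All class, field, and function names and all parameter values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Subcatchment:
    area_m2: float    # sub-catchment area A
    width_m: float    # characteristic width W
    slope: float      # average surface slope S (m/m)
    manning_n: float  # overland-flow roughness n
    dp_m: float       # maximum depression storage d_p
    f0_m_s: float     # Horton initial infiltration rate
    fc_m_s: float     # Horton final infiltration rate
    k_per_s: float    # Horton decay constant


def horton_capacity(sc: Subcatchment, t_s: float) -> float:
    """Basic Horton curve f(t) = fc + (f0 - fc) * exp(-k t), t from simulation start."""
    return sc.fc_m_s + (sc.f0_m_s - sc.fc_m_s) * math.exp(-sc.k_per_s * t_s)


def manning_outflow(sc: Subcatchment, depth_m: float) -> float:
    """Runoff Q (m^3/s); zero unless the depth exceeds depression storage."""
    excess = depth_m - sc.dp_m
    if excess <= 0.0:
        return 0.0
    return sc.width_m / sc.manning_n * math.sqrt(sc.slope) * excess ** (5.0 / 3.0)


def simulate(sc: Subcatchment, rain_m_s, evap_m_s, dt_s: float) -> list:
    """Explicit steps of dd/dt = i - e - f - Q/A; returns runoff Q per step."""
    depth, runoff = 0.0, []
    for step, (i, e) in enumerate(zip(rain_m_s, evap_m_s)):
        available = depth / dt_s + i  # water that could infiltrate this step (m/s)
        f = min(horton_capacity(sc, step * dt_s), available)
        # add net vertical inflow, never letting the depth drop below zero
        depth = max(depth + (i - e - f) * dt_s, 0.0)
        q = manning_outflow(sc, depth)
        if q > 0.0:
            # cap the step's outflow at the water stored above d_p
            q = min(q, (depth - sc.dp_m) * sc.area_m2 / dt_s)
        depth -= q * dt_s / sc.area_m2
        runoff.append(q)
    return runoff


if __name__ == "__main__":
    mm_h = 1.0 / 1000.0 / 3600.0  # converts mm/h to m/s
    sc = Subcatchment(area_m2=10_000, width_m=100, slope=0.01, manning_n=0.015,
                      dp_m=0.002, f0_m_s=76.2 * mm_h, fc_m_s=3.3 * mm_h,
                      k_per_s=4.0 / 3600.0)
    rain = [50.0 * mm_h] * 60       # hypothetical 50 mm/h storm lasting one hour
    q = simulate(sc, rain, [0.0] * 60, dt_s=60.0)
    print(f"peak runoff = {max(q):.4f} m^3/s")
```

Capping each step's outflow at the water stored above \(d_p\) keeps the explicit scheme from draining the reservoir below depression storage when the time step is coarse.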

## 2.2 The long short-term memory (LSTM) neural network

The LSTM neural network, proposed by Hochreiter & Schmidhuber in 1997, is a special type of recurrent neural network (RNN) (Xu et al.,2020). Its distinctive "gate" structure alleviates the vanishing and exploding gradient problems that arise when training on long time sequences, which improves simulation accuracy for long sequential processes. The memory cell unit consists of a forget gate, an input gate, and an output gate, which allow information to pass through selectively (Li et al.,2022). An LSTM network comprises an input layer, a hidden layer, and an output layer. Memory cell units are added to the hidden layer so that the memory of the time series can be controlled, which effectively handles long-term dependencies in the data. The structure of the memory cell unit in LSTM is shown in **Figure.2**.

The memory cell unit, composed of an input gate, an output gate, and a forget gate, can selectively remember and forget input data. Here, \({Q}_{t-1}\) is the hidden state at the previous time step (written \({h}_{t-1}\) in Eqs. (3) to (8)). \({C}_{t-1}\) and \({C}_{t}\) are the cell states at the previous time step and after passing through the memory cell unit, respectively. \({f}_{t}\), \({i}_{t}\), and \({o}_{t}\) are the forget, input, and output gates, respectively. The input \({I}_{t}\) (written \({x}_{t}\) in the equations) passes through the forget gate, the input gate, and the output gate in turn, and the cell state changes from \({C}_{t-1}\) to \({C}_{t}\). In the process, some information is discarded from memory while other information is added to it (Cui et al.,2021).

Input data first enter the LSTM through the forget gate \({f}_{t}\), which determines what information the memory cell unit discards from the previous cell state. \({f}_{t}\) is expressed as:

$${f}_{t}=\sigma ({W}_{f}{x}_{t}+{U}_{f}{h}_{t-1}+{b}_{f}) \tag{3}$$

where \({f}_{t}\) is the forget gate, whose values lie in (0, 1); \(\sigma\) is the logistic sigmoid function; and \({W}_{f}\), \({U}_{f}\), and \({b}_{f}\) are adjustable weight matrices and a bias vector, respectively.

The next step determines which information is added to the memory cell unit. The sigmoid function in the input gate \({i}_{t}\) determines which values to update, and a \(\tanh\) layer generates a candidate update vector \({\tilde{C}}_{t}\). \({i}_{t}\) and \({\tilde{C}}_{t}\) are calculated with Eqs. (4) and (5):

$${i}_{t}=\sigma ({W}_{i}{x}_{t}+{U}_{i}{h}_{t-1}+{b}_{i}) \tag{4}$$

$${\tilde{C}}_{t}=\tanh ({W}_{c}{x}_{t}+{U}_{c}{h}_{t-1}+{b}_{c}) \tag{5}$$

where \({i}_{t}\) is a vector whose values lie in (0, 1); \({W}_{i}\), \({U}_{i}\), and \({b}_{i}\) are the learnable parameters of the input gate; and \({W}_{c}\), \({U}_{c}\), and \({b}_{c}\) are the learnable parameters of the candidate layer.

Once the information to discard and to retain has been determined, the cell state is updated as:

$${C}_{t}={f}_{t}\odot {C}_{t-1}+{i}_{t}\odot {\tilde{C}}_{t} \tag{6}$$

where ⊙ denotes element-wise multiplication, \({f}_{t}\odot {C}_{t-1}\) determines which information stored in \({C}_{t-1}\) is retained (and hence which is forgotten), and \({i}_{t}\odot {\tilde{C}}_{t}\) determines which new information is added to the cell state \({C}_{t}\).

The final step computes the output gate \({o}_{t}\), which determines the hidden state \({h}_{t}\). The output gate \({o}_{t}\) is calculated with the sigmoid function, and \({h}_{t}\) is obtained by multiplying \({o}_{t}\) element-wise by the \(\tanh\) of the cell state \({C}_{t}\):

$${o}_{t}=\sigma ({W}_{o}{x}_{t}+{U}_{o}{h}_{t-1}+{b}_{o}) \tag{7}$$

$${h}_{t}={o}_{t}\odot \tanh \left({C}_{t}\right) \tag{8}$$

where \({o}_{t}\) is a vector whose values lie in (0, 1), and \({W}_{o}\), \({U}_{o}\), and \({b}_{o}\) are the learnable parameters of the output gate.
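The gate equations, Eqs. (3) to (8), can be traced step by step in a dependency-free Python sketch of a single LSTM cell. The parameter layout (a dictionary of `(W, U, b)` triples keyed by gate), the random initialisation, and all function names are illustrative assumptions. The sketch shows only the forward computation; training would normally be done with a deep-learning framework.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def matvec(m, v):
    """Matrix (list of rows) times vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lstm_step(x, h_prev, c_prev, params):
    """One forward step of Eqs. (3)-(8); params[g] = (W, U, b) for g in 'f', 'i', 'c', 'o'."""
    def pre(g):
        W, U, b = params[g]
        return [wx + uh + bg for wx, uh, bg in zip(matvec(W, x), matvec(U, h_prev), b)]

    f = [sigmoid(z) for z in pre("f")]          # forget gate, Eq. (3)
    i = [sigmoid(z) for z in pre("i")]          # input gate, Eq. (4)
    c_cand = [math.tanh(z) for z in pre("c")]   # candidate vector, Eq. (5)
    c = [ft * cp + it * cc for ft, cp, it, cc in zip(f, c_prev, i, c_cand)]  # Eq. (6)
    o = [sigmoid(z) for z in pre("o")]          # output gate, Eq. (7)
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]  # hidden state, Eq. (8)
    return h, c


def init_params(n_in: int, n_hidden: int, seed: int = 0):
    """Small random weights and zero biases (hypothetical initialisation)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {g: (mat(n_hidden, n_in), mat(n_hidden, n_hidden), [0.0] * n_hidden) for g in "fico"}


def run_sequence(xs, params, n_hidden: int):
    """Feed a sequence of input vectors through the cell, starting from zero states."""
    h, c = [0.0] * n_hidden, [0.0] * n_hidden
    for x in xs:
        h, c = lstm_step(x, h, c, params)
    return h, c


if __name__ == "__main__":
    # Four hypothetical inputs per step, e.g. rainfall, evaporation, infiltration, discharge.
    seq = [[0.2, 0.0, 0.1, 0.05], [0.8, 0.0, 0.3, 0.4], [0.1, 0.01, 0.05, 0.6]]
    h, c = run_sequence(seq, init_params(n_in=4, n_hidden=3), n_hidden=3)
    print("h_T =", [round(v, 4) for v in h])
```

With all weights and biases set to zero, every gate outputs 0.5 and the candidate vector is 0. The cell state is then simply halved at each step, which makes a convenient hand check of Eq. (6).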

## 2.3 LSTM-SWMM hybrid model

An LSTM neural network usually requires antecedent precipitation and runoff data as inputs to ensure accurate simulation and prediction. The LSTM-SWMM hybrid model also uses physical process variables output by the SWMM model (evaporation and infiltration), together with rainfall and runoff information, as input variables of the LSTM model.

Based on the above models and algorithms, this study constructed a hybrid hydrological model of LSTM and SWMM. The LSTM-SWMM modelling process consists of the following steps **(Figure.2)**:

1. Input the design rainfall data and observed evaporation data into the SWMM model.

2. Use the time series of discharge, infiltration, and evaporation output by SWMM, together with the design rainfall, as input data for the LSTM model.

3. Train and validate the LSTM model.
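Step 2 amounts to assembling a supervised data set from time series. The Python sketch below shows one common sliding-window approach. The window length, the feature names, the hypothetical values, and the choice of target (discharge at the step following each window) are all illustrative assumptions, as is the function name `make_windows`.

```python
from typing import Dict, List, Sequence, Tuple


def make_windows(
    features: Dict[str, Sequence[float]],
    target: Sequence[float],
    window: int,
) -> Tuple[List[List[List[float]]], List[float]]:
    """Build (X, y) pairs: X[k] holds `window` consecutive steps of every feature,
    and y[k] is the target value at the step immediately after that window."""
    names = list(features)
    n = len(target)
    if window < 1 or window >= n:
        raise ValueError("window must be between 1 and len(target) - 1")
    if any(len(features[name]) != n for name in names):
        raise ValueError("all series must share the same time index")
    X, y = [], []
    for t in range(window, n):
        X.append([[features[name][s] for name in names] for s in range(t - window, t)])
        y.append(target[t])
    return X, y


if __name__ == "__main__":
    # Hypothetical SWMM output series (6 time steps) and an observed discharge target.
    swmm = {
        "rainfall": [0.0, 2.0, 8.0, 5.0, 1.0, 0.0],
        "evaporation": [0.1] * 6,
        "infiltration": [0.0, 1.5, 3.0, 2.5, 1.0, 0.5],
        "discharge": [0.0, 0.1, 0.9, 1.4, 0.8, 0.3],
    }
    observed = [0.0, 0.1, 1.0, 1.5, 0.7, 0.3]
    X, y = make_windows(swmm, observed, window=3)
    print(len(X), "samples of shape", (len(X[0]), len(X[0][0])))
```

In practice the series are usually normalised before training. Computing the scaling statistics on the training period only prevents information from the validation period from leaking into training.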

## 2.4 Model evaluation indexes

The Nash-Sutcliffe efficiency coefficient (\(NSE\)) and the coefficient of determination (\({r}^{2}\)) are adopted to evaluate model performance, as defined in Eqs. (9) and (10).

(1) Nash-Sutcliffe efficiency coefficient (\(NSE\))

This metric is defined as follows [29]:

$$NSE=1-\frac{\sum _{t=1}^{T}{({Q}_{o}^{t}-{Q}_{s}^{t})}^{2}}{\sum _{t=1}^{T}{({Q}_{o}^{t}-\overline{{Q}_{o}})}^{2}} \tag{9}$$

where \({Q}_{o}^{t}\) is the measured discharge at time \(t\); \({Q}_{s}^{t}\) is the simulated (forecast) discharge at time \(t\); \(\overline{{Q}_{o}}\) is the mean measured discharge of each flood event; and \(T\) is the number of time steps.

\(NSE\) measures how well the model reproduces the deviations of the observations from their mean. It gives the proportion of the initial variance of the observations accounted for by the model and ranges from -∞ to 1 (perfect fit), with values closer to 1 indicating more accurate predictions (Yin et al.,2020). An \(NSE\) of 0 means the model is no better than the mean of the observations.
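Eq. (9) translates directly into code. The following plain-Python sketch (the function name is hypothetical) rejects series of unequal length and constant observed series, because the denominator of Eq. (9) is zero in that case.

```python
from typing import Sequence


def nse(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency, Eq. (9)."""
    if len(observed) != len(simulated) or len(observed) == 0:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))  # numerator
    sst = sum((o - mean_obs) ** 2 for o in observed)               # denominator
    if sst == 0.0:
        raise ValueError("observed series is constant; NSE is undefined")
    return 1.0 - sse / sst


if __name__ == "__main__":
    obs = [0.2, 0.8, 2.5, 1.6, 0.7, 0.3]   # hypothetical discharges, m^3/s
    sim = [0.3, 0.9, 2.2, 1.7, 0.6, 0.3]
    print(f"NSE = {nse(obs, sim):.3f}")
```

Predicting the observed mean at every step gives an \(NSE\) of exactly 0, so that constant forecast is a natural baseline when judging a model's score.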

(2) The coefficient of determination (\({r}^{2}\))

\({r}^{2}\) describes the degree of linear fit between the simulated and measured data. The closer \({r}^{2}\) is to 1, the stronger the agreement; the closer it is to 0, the weaker the relationship. It is defined as follows (Jackson et al., 2019):

$${r}^{2}=\frac{{\left[\sum _{t=1}^{T}({Q}_{s}^{t}-\overline{{Q}_{s}})({Q}_{o}^{t}-\overline{{Q}_{o}})\right]}^{2}}{\sum _{t=1}^{T}{\left({Q}_{s}^{t}-\overline{{Q}_{s}}\right)}^{2}\sum _{t=1}^{T}{\left({Q}_{o}^{t}-\overline{{Q}_{o}}\right)}^{2}} \tag{10}$$

where \(\overline{{Q}_{s}}\) is the mean simulated discharge of each flood event.
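The coefficient of determination defined here is the square of the Pearson correlation between simulated and observed discharge. It can be sketched in plain Python as follows (function name hypothetical).

```python
from typing import Sequence


def r_squared(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Coefficient of determination as the squared Pearson correlation."""
    n = len(observed)
    if n != len(simulated) or n < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    mo = sum(observed) / n
    ms = sum(simulated) / n
    cov = sum((s - ms) * (o - mo) for o, s in zip(observed, simulated))
    var_s = sum((s - ms) ** 2 for s in simulated)
    var_o = sum((o - mo) ** 2 for o in observed)
    if var_s == 0.0 or var_o == 0.0:
        raise ValueError("r^2 is undefined for a constant series")
    return cov * cov / (var_s * var_o)


if __name__ == "__main__":
    obs = [0.2, 0.8, 2.5, 1.6, 0.7, 0.3]   # hypothetical discharges, m^3/s
    biased = [2.0 * o for o in obs]         # perfectly correlated but biased
    print(f"r^2 = {r_squared(obs, biased):.3f}")
```

Because \(r^2\) ignores systematic bias, a simulation that is consistently twice the observed flow still scores \(r^2 = 1\). This is one reason to read it together with \(NSE\), which penalises such bias.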