## 2.1 Study area and data

The YRB refers to a vast region through which the mainstream and tributaries of the Yangtze River flow through the hinterland of China, spanning the three major economic zones of Western, Central, and Eastern China. Although the YRB has abundant water resources, the spatial and temporal distribution of precipitation in the basin is very uneven. The average annual precipitation ranges from less than 300 mm in the west to over 2400 mm in the east, where > 60% of the rainfall is concentrated in summer.

This study established 1961–2019 as the historical period, 2020–2100 as the future period. 138 meteorological stations in the YRB were selected. Since the establishments of the meteorological points varied, the meteorological data for 59 years (January 1961 - December 2019) were uniformly selected, and stations with missing meteorological data that accounted for more than 5% of the time series were excluded. Missing values were supplemented by linear interpolation. The Global SPEI Monthly Dataset was at a spatial resolution of 0.5° × 0.5° (http://spei.csic.es/database.html).

The climate model data for this study were derived from CMIP6 (https://esgf-node.llnl.gov/search/cmip6/) (Table 1), which were combined with the MME model (Multi-Model Ensemble) that averaged 10 CMIP6 models. The simulations for SSP1-2.6 (SSP1 + RCP2.6, a green development pathway), SSP2-4.5 (SSP2 + RCP4.5, an intermediate development pathway), and SSP5-8.5 (SSP5 + RCP8.5, a high development pathway) 3 scenarios were selected. (Li et al. 2020).

Table 1

Basic data for the 10 models selected from CMIP6 used in this study.

Code | Model | Abbreviation | Spatial Resolution | Institution/Country |

1 | BCC-CSM2-MR | BCCC | 320×160 | BCC/China |

2 | CESM2-WACCM | CEWA | 288×192 | NCAR/America |

3 | CanESM5 | CANE | 128×64 | CCCma/Canada |

4 | FGOALS-g3 | FGOG | 180×80 | CAS/China |

5 | GFDL-ESM4 | GFES | 288×180 | NOAA-GFDL/America |

6 | IPSL-CM6A-LR | IPSL | 144×143 | IPSL/France |

7 | MIROC-ES2L | MIRE | 128×64 | JAMSTE、CAORI、NIES、RCCS/Japan |

8 | MIROC6 | MIRO | 256×128 | JAMSTEC、AORI、NIESR-CCS/Japan |

9 | MRI-ESM2-0 | MRIE | 320×160 | MRI/Japan |

10 | UKESM1 | UKES | 192×144 | NCAS/British |

## 2.2 Methodology

For this study, the SPEI was employed to classify drought grading standards. Grade classification standards refer to "GB/T 20481 − 2017 Meteorological Drought Grade”, as shown in Table 2:

Table 2

SPEI drought grade divisions.

Drought grade | SPEI | Category |

0 | -0.5 and above | No drought |

1 | -1.00 to -0.50 | mild drought |

2 | -1.50 to -1.00 | moderate drought |

3 | -2.00 to -1.50 | severe drought |

4 | Below and − 2.00 | extreme drought |

The SPEI index was calculated based on two different potential evapotranspiration formulas due to different parameter acquisition in the CMIP6 data (Vicente-Serrano et al. 2010). The PET based on the Penman formula (PM) was to calculate the SPEI for the historical period. The PET based on the Thornthwaite formula (Tho) was used to calculate the future SPEI index (Thornthwaite 1948):

Tho:

$$E{T}_{0}=k(10T/I{)}^{\alpha }\times uN/360$$

PM:

$$E{T}_{0}=\frac{0.408\varDelta \left({R}_{n}-G\right)+\gamma \frac{900}{T+273}{U}_{2}({e}_{s}-{e}_{\alpha })}{\varDelta +\gamma (1+0.34{U}_{2})}$$

Record the difference between monthly precipitation and potential evapotranspiration as \({D}_{i}\):

$${D}_{i}={P}_{i}-(E{T}_{0}{)}_{i}$$

Normalize the \({D}_{n}^{k}\) and get the SPEI value:

$${D}_{n}^{k}=\sum _{i=0}^{k-1}[{P}_{n-i}-\left(E{T}_{0}{)}_{n-i}\right](n\ge k)$$

In the above normalization process, Vicente-Serrao et al. compared the Log-logistic distribution, Pearson III distribution, Log normal distribution and Generalized extreme value distribution with the sequence \({D}_{n}^{k}\), and the results showed that the Log-logistic distribution was the best fit to the sequence \({D}_{n}^{k}\) (Thornthwaite 1948).

Drought frequency describes the frequency of drought occurrence at a station or grid point during the study period, and the larger the value, the higher the frequency of drought occurrence, calculated as follows:

$$P=\frac{n}{N}100\text{\%}$$

Where \(n\) is the total number of years, that the drought occurred at the site, and \(N\) is the total length of the data series of the site, i.e. the total number of time.

The drought station sub-ratio describes the extent of drought impact at a certain point in time when drought occurs in the study area, and the larger the value indicates the greater the extent of drought impact, calculated as follows:

$$K=\frac{m}{M}100\text{\%}$$

The Mann-Kendall (MK) test is often used to evaluate trends of long series of hydrological, meteorological, and other data (Mann 1945). Modified Mann-Kendall (MMK) was proposed by a modified MK test (Hamed et al. 1998) to test the significance of SPEI sequences for different time periods and scenarios. Below are the calculation steps:

for a time series \(X=\left\{{x}_{1},{x}_{2},\cdots ,{x}_{n}\right\}\),The period, tatistic \(S\) is:

$$S=\sum _{i<j} {a}_{ij}=\sum _{i<j} \text{sgn}\left({x}_{j}-{x}_{i}\right)$$

Where \(\text{s}\text{g}\text{n}\) is the sign function, the var (\(S\)) is calculated as the following formula:

$$\text{v}\text{a}\text{r}\left(S\right)=\eta \times \frac{n(n-1)(2n+5)}{18}$$

\(\eta =1+\frac{1}{n(n-1)(n-2)}\times \sum _{i=1}^{n-1} (n-i)(n-i-1)(n-i-2){r}_{i}\) , the statistic \(Z\) in the MMK trend test can be obtained, and the calculation formula is as follows:

$$Z=\left\{\begin{array}{cc}\frac{S-1}{\sqrt{\text{v}\text{a}\text{r}\left(S\right)}}& S>0\\ 0& S=0\\ \frac{S+1}{\sqrt{\text{v}\text{a}\text{r}\left(S\right)}}& S<0\end{array}\right.$$

Sen’s slope estimator is a simple nonparametric estimator (Sen 1968). Sen + Mann-Kendall combination can improve the accuracy of judging climate trend changes (Li et al. 2015). The slope between all points \({S}_{i}\) of a given time series is calculated, \(X=\left\{{x}_{1},{x}_{2},\cdots ,{x}_{n}\right\}\):

$${S}_{i}=\frac{{x}_{j}-{x}_{k}}{j-k}(i=\text{1,2},\dots ,N)$$

In this study, to increase the applicability of the simulation results in the research area, Bias Corrected Spatial Disaggregation (BCSD) (Li et al. 2010) was used. The data of each mode was resampled to a resolution of 0.5°×0.5°. The Bias Correction was as follows:

$${X}_{m-p,adj}={X}_{m-p}+{F}_{\text{o}-c}^{-1}\left[{F}_{m-p}\left({X}_{m-p}\right)\right]-{F}_{m-c}^{-1}\left[{F}_{m-p}\left({X}_{m-p}\right)\right]$$

The Taylor diagram was adopted to appraise the simulation of all the bias-corrected CMIP6 climate patterns, which provided a concise statistical summary of how well the patterns resembled each other in terms of their centered RMS (root-mean-square) difference, standard deviation, and correlation (Taylor 2001):

Measured data mean square error:

$$S{D}_{obs}={\left[\frac{1}{N}\sum _{n=1}^{N} {\left({O}_{n}-\stackrel{-}{O}\right)}^{2}\right]}^{1/2}$$

Mode data mean square error:

$${SD}_{X}={\left[\frac{1}{N}\sum _{n=1}^{N} {\left({X}_{n}-\stackrel{-}{X}\right)}^{2}\right]}^{1/2}$$

Correlation coefficient:

$$R=\frac{\frac{1}{N}\sum _{n=1}^{N} \left({X}_{n}-\stackrel{-}{X}\right)\left({O}_{n}-\stackrel{-}{O}\right)}{S{D}_{obs}{SD}_{X}}$$

Root mean square error:

$$RMS={\left\{\frac{1}{N}\sum _{n=1}^{N} {\left[\left({X}_{n}-\stackrel{-}{X}\right)\left({O}_{n}-\stackrel{-}{O}\right)\right]}^{2}\right\}}^{1/2}$$

The EOF method (Empirical orthogonal function decomposition), was initially proposed by statistician Pearson in 1902. Lorenz integrated it into atmospheric science research in the mid-1950’s. The basic principle was to decompose the field containing m spatial points over time, and subsequently to extract several major independent and orthogonal spatially distributed and time-varying characteristic patterns from the elemental variable fields.

$$X=\left({x}_{ij}\right)=\left({x}_{1},{x}_{2},\dots ,{x}_{j}\right)=\left[\begin{array}{ccc}{x}_{11}& \cdots & {x}_{1n}\\ ⋮& \ddots & ⋮\\ {x}_{m1}& \cdots & {x}_{mn}\end{array}\right]$$

The EOF decomposition gives:

$$V=\left({v}_{1},{v}_{2},\dots ,{v}_{m}\right)=\left[\begin{array}{ccc}{v}_{11}& \cdots & {v}_{m1}\\ ⋮& \ddots & ⋮\\ {v}_{1n}& \cdots & {v}_{mm}\end{array}\right]$$

$$T= {V}^{T}X=\left[\begin{array}{c}{t}_{1}\\ {t}_{2}\\ ⋮\\ {t}_{m}\end{array}\right]=\left[\begin{array}{ccc}{t}_{11}& \cdots & {t}_{1n}\\ ⋮& \ddots & ⋮\\ {t}_{m1}& \cdots & {t}_{mn}\end{array}\right]$$

V is a function of spatial coordinates only, and is the eigenvector of the covariance array quantity, which is the eigenvalue of the explained variance corresponding to the covariance array; T is the time function, which is uniquely determined by V and X.