Projected future changes of tropical cyclone genesis frequency in the Northern Hemisphere based on a multi-timescale regression model

Large uncertainties exist in the projected future tropical cyclone (TC) genesis frequency (TCGF) because internal climate variability on various timescales and external forcing both contribute. Here, we introduce a statistical multi-timescale TCGF regression model that includes contributions from three interannual modes, two interdecadal modes, and a global warming mode. The model captures well the present-day multi-timescale changes in TCGF in the major TC basins of the Northern Hemisphere. The model results demonstrate that changes in TCGF over the western North Pacific are predominantly modulated by internal climate variability, while those over the eastern North Pacific are dominated by global warming and those over the North Atlantic are controlled about equally by internal climate variability and global warming. Consistently, the model projects a significant increase in TCGF over the eastern North Pacific and North Atlantic, with an insignificant trend over the western North Pacific.


Background
Numerous efforts have been made to understand tropical cyclone (TC) variability and to project future changes in TCs owing to their large societal impacts 1 . In particular, understanding TC genesis is critically important but extraordinarily difficult 2 . TC genesis involves interactions among motions on various scales, from large-scale circulation to small-scale convection. The large-scale environmental controls are modulated not only by local sea surface temperature (SST) modes but also by trans-basin SST anomalies related to internal (natural) climate modes (interannual and interdecadal/multidecadal variabilities) and long-term climate trends associated with external forcing (such as anthropogenic greenhouse gas emissions). Numerical modeling and dynamical downscaling studies have projected either an increasing or a decreasing trend in TC genesis frequency (TCGF), leaving large uncertainty in future projections [3][4][5][6] . Therefore, understanding and projecting future TCGF changes is a topic of profound societal concern and scientific interest.
Considerable progress has been made in the past decades in identifying and understanding large-scale factors that affect the variability of TCGF. On the interannual timescale, the El Niño-Southern Oscillation (ENSO) has a pan-tropical impact on TCGFs over the western North Pacific (WNP) 7 , the eastern North Pacific (ENP) 8,9 and the North Atlantic (NA) 10 . During the warm phase of ENSO, TCGF over the ENP increases mainly due to favorable atmospheric conditions and a large oceanic energy supply, while TCGF over the NA is significantly suppressed due to increased vertical wind shear and negative low-level vorticity anomalies, indicating a significant interannual seesaw across basins. Over the WNP, the canonical ENSO tends to induce a dipole response of TCGF between the southeastern and northwestern quadrants of the basin 7 .
WNP TCGF is also linked to the SST gradient between the Southwest Pacific and the western Pacific warm pool 11 , and to the SST anomalies (SSTAs) over the eastern Indian Ocean (EIO) 12 and the tropical NA (TNA) 13 . Moreover, the inter-basin SST gradient (also called "relative" SST) between the ENP and the NA affects TCGFs over the ENP and the NA [14][15][16] .
TCGF also exhibits strong decadal/interdecadal or multidecadal variabilities [17][18][19][20][21] . Many studies have shown that the Atlantic Multidecadal Oscillation (AMO) 24 affects TCGFs over the NA, ENP and WNP by modulating the local and remote atmospheric circulations. Besides the AMO, the Pacific Decadal Oscillation (PDO) or Interdecadal Pacific Oscillation (IPO) 22,26 plays an important role in modulating TCGFs over the North Pacific, with a basin-wide uniform increase in its positive phase and a decrease in its negative phase 18,[21][22][23] . The TCGFs over the NA and North Pacific also exhibit a seesaw due to the trans-basin effects of SST on the decadal timescale 22,[24][25][26] .
In addition to internal climate variability, anthropogenic greenhouse-gas-induced global warming (GW) can also influence TCGF 6 . However, how and to what extent greenhouse-gas-induced GW affects TCGF remains uncertain. Most climate models projected a future decrease, while some other studies projected an increase in global TCGF 3,5,27 . Such opposing results call for caution in interpreting existing climate simulations and limit confidence in the TCGF changes projected by climate models 28 . Uncertainties in the SST warming pattern and model biases are important factors that can lead to a large diversity in the projected future changes in TCGF [29][30][31] .
The TC genesis potential index (GPI) is commonly used to explore TCGF in the present-day and future climates. Gray 32 first constructed a seasonal genesis parameter based on several environmental variables, which was further modified and reconstructed in many other studies [33][34][35][36][37] . Among them, the GPI developed by Emanuel and Nolan 33 has been widely used and shows skill in replicating the seasonal and interannual variabilities of the observed TCGF in various TC basins 38 . However, the GPI does not always perform well in either reproducing the long-term variability or projecting future changes in TCGF 35 .
It is well known that SST not only affects the atmospheric and oceanic thermodynamic state, but also modulates the large-scale dynamical circulation both locally and remotely. Therefore, in this study, we introduce a multi-timescale regression model of TCGF based on five key SST factors and a GW factor, all derived from SST data. We will show that the new model can reproduce the observed multi-timescale variability in TCGF in the Northern Hemisphere (NH) in the present-day climate and can well delineate the observed TCGF changes on interannual and interdecadal timescales as well as the long-term trend. The new model is also used to project future changes in TCGF in the three main NH TC basins, namely the WNP, ENP, and NA, with the future SST pattern and key SST factors constructed partly from the historical data and partly from state-of-the-art climate model projections, including the GW trend. The results show a steady future increase in TCGF over both the ENP and the NA, with insignificant trends over the WNP.

Results
Figure 1 shows the feedback coefficients of the responses of TCGF to the six individual key factors (Supplementary Figs. S1 and S2; see Methods) based on the generalized equilibrium feedback assessment (GEFA) method 39,40 (see Methods). A positive (negative) response means that the TCGF is above (below) normal when the factor has a positive (negative) anomaly. All six factors have a significant influence on TCGFs over the three TC basins in the NH but exhibit different spatial patterns, which are consistent with the previous studies introduced above. These significant influences can be explained by the responses of the large-scale environmental conditions to these factors (Supplementary Figs. S3 and S4). Namely, the large-scale atmospheric anomalies induced by these factors provide either favorable or unfavorable conditions for TC genesis on multiple timescales over different basins in the NH.
The response coefficients also reflect the relative contributions of individual factors to the TCGF. Over the NA, the TNA, AMO and GW contribute the most to TCGF, and the ENSO and EIO contribute the second most. Over the ENP, GW and AMO contribute the most, and ENSO contributes the second most, with significant negative (positive) contributions over the eastern (western) ENP. Over the WNP, ENSO contributes the most, with opposite signs over the northwestern and southeastern quadrants, and AMO and GW contribute the second most. AMO seems to be the most important factor for TCGF over the South China Sea. These results consistently show that internal climate variability is more important than GW for TCGF over the WNP, although GW also exerts a significant impact, while GW dominates the other factors over the ENP, and internal variability and GW are about equally important over the NA.
Performance of the regression model in reproducing the multi-timescale variability of TCGF
Here, the performance of the regression model in reproducing the multi-timescale variability of TCGF in the NH is evaluated, and the estimated TCGF is compared with the commonly used GPI as well as state-of-the-art climate models. Figure 2 compares the climatological means of the observed TCGF (Fig. 2a), the leave-one-out regressed TCGF (Fig. 2b), which is independent of the withheld samples, and the commonly used GPI (Fig. 2c), all averaged during JJASON for the period 1960-2019 in the NH. Overall, the spatial pattern correlation between the observed and regressed TCGFs is higher than that between the observed TCGF and the commonly used GPI. Specifically, the GPI shows considerable discrepancies in both the amplitude and spatial distribution of the climatological TCGF (Fig. 2c). In sharp contrast, the regressed TCGF shows good consistency with the observed climatological TCGF, with a spatial correlation with the observations as high as 0.99 (Fig. 2b).
We further examined how skillful the new model is in reproducing the observed TCGFs in the three NH TC basins. Figures 2d-2f show the correlation coefficients of the observed TCGF with the regressed TCGF and the commonly used GPI in the 1960-2019 period. The red bars represent the correlation coefficients calculated for the raw time series, while the blue bars represent those for the components beyond the interannual timescale. In general, the GPI fails to reproduce the multi-timescale variations of the observed TCGF over the ENP and WNP (red bars), especially for the long-term variations (blue bars). The correlation between the GPI based on the NCEP reanalysis and the observed TCGF is weak over the three basins, with correlation coefficients of 0.18 over the NA, 0.33 over the ENP, and -0.16 over the WNP. The corresponding correlations based on the ERA5 reanalysis are 0.6, 0.37 and -0.16, respectively. In sharp contrast, the correlation coefficients between the regressed and the observed TCGFs reach 0.63, 0.48 and 0.34 over the NA, the ENP, and the WNP, respectively, all being statistically significant above the 95% confidence level (Fig. 2d-f). More importantly, after filtering out the interannual variability, we found that the regressed TCGF captures well the long-term variability of the observed TCGF, with correlation coefficients reaching 0.87, 0.74 and 0.48 over the NA, ENP and WNP, respectively, all being statistically significant above the 95% confidence level (blue bars in Fig. 2d-f). However, the correlations between the observed TCGF and the GPI are insignificant except over the NA for the ERA5 data, and are insignificant in all three basins for the NCEP data. Furthermore, for both reanalysis datasets, the long-term variations of the GPI and the observed TCGF are negatively correlated over the WNP (Fig. 2f).
This strongly suggests that the regression model developed in this study represents the multi-timescale changes in TCGF in the three NH TC basins, especially the long-term changes, better than the commonly used GPI.
Currently, state-of-the-art climate models are widely used in studying the variability and future change in TCGF 5,6 . We examined the performance of the European Centre for Medium-Range Weather Forecasts'
To understand why our multi-timescale regression model is skillful in reproducing the observed TCGF, we analyzed the connection of the estimated TCGF and GPI with the six key factors used in the regression model. Figure 3 shows the composite differences in the observed TCGF, the regressed TCGF, and the GPI between the positive and negative phases of the first five factors, and their trend increments as the GW impact. To confirm that the results do not depend on the dataset, the composite differences in GPI based on the ERA5 reanalysis data are shown in SI Appendix Fig. S6. As expected, the regressed TCGF matches well the responses of the observed TCGF anomalies to the six key factors over the three TC basins, with a spatial distribution consistent with that of the observed TCGF. Since these key factors largely control the TCGF change from interannual and interdecadal variations to the long-term trend, and also include both intra-basin and inter-basin influences, the regression model can reproduce well the multi-timescale variability of the observed TCGF over the major TC basins in the NH. In contrast, the commonly used GPI based on the two reanalysis datasets shows large discrepancies from the observed TCGF. The largest discrepancy is found in the responses to both the interdecadal factors and the GW mode, giving rise to nearly opposite signs between the GPI and the observed TCGF in the three basins. Although the GPI reproduces the observed TCGF responses to the interannual factors reasonably well, discrepancies still exist in the extreme values and their spatial locations.
These results are consistent with those found in previous studies based on observations and climate models 35,42 and are also confirmed by comparing the correlation coefficients between these key factors and, respectively, the observed TCGF, the regressed TCGF, and the commonly used GPI (SI Appendix Table S1).
Projected future change of TCGF in the NH using the multi-timescale regression model
Since the regression model captures well the long-term changes in the observed TCGF in the NH, it can be used to project future changes in TCGF in the NH if the future changes in the six key factors are known. For example, the future changes in the six key factors can be obtained from global model ensemble simulations of CMIP5 or CMIP6. However, as mentioned above, climate models from the CMIP6 HighResMIP have low skill in reproducing the responses of the TCGF to long-term climate changes. In   (Figs. 4b, 4e and 4h), the probability of significant increasing trends in TCGF under the slow warming condition is the highest among the three NH TC basins, at about 70% (Fig. 4b), while the probability reaches 100% in both the medium and extreme warming conditions. Over the WNP (Figs. 4c, 4f and 4i), the probabilities of a significant linear trend in TCGF during 2019-2069 are small in all three warming conditions, and the trend is generally not significant. This strongly suggests that more TCs would form over the ENP and the NA under GW, with higher confidence for the ENP, while the future change in TCGF over the WNP is strongly modulated by internal climate variability.
The projected increase in TCGF in the ENP and NA is consistent with some previously reported downscaling results 3,27 , but differs from the decrease projected by the majority of climate models 5 . The climate-model-projected decrease is ranked as low-to-medium confidence, and even lower for individual TC basins. This is because state-of-the-art climate models have very limited skill in representing the TCGF change beyond the interannual timescale 28,46 . In addition, our results strongly suggest that the impact of GW on TCGF is predominant over the ENP and NA but insignificant over the WNP, where internal climate variability may play the dominant role in controlling the long-term changes in TCGF, as also indicated in some previous studies 23,30 .

Conclusions and discussion
In this study, a multi-timescale TCGF regression model has been constructed with six key climate factors that significantly control TCGF at different timescales in the NH. The regression model is skillful in reproducing the multi-timescale variability of TCGF over the major TC basins in the NH. Using this model and the projected key factors under three assumed GW scenarios, we have projected a steady increase in TCGF over the ENP and NA in a warmer climate but an insignificant trend over the WNP, where the future long-term changes in TCGF are dominated by internal climate (mainly interdecadal) variability.

In addition to the six well-documented key factors used in the regression model, previous studies have also revealed significant impacts of other factors on TCGF in the NH, such as the Pacific Meridional Mode and the inter-hemispheric SST gradient 11,47 . Nevertheless, their impacts could have been partially included in the six key factors. This may be the reason why the regression model with the six key factors is skillful in reproducing the multi-timescale variability of the observed TCGF. To project the future TCGF change, some assumptions have been made, including that the interannual variations remain the same as in the present-day climate and that state-of-the-art climate models simulate the decadal/interdecadal SST modes reasonably well under the assumed GW scenarios. In addition, possible nonlinear interactions among motions on various timescales are not considered in our projections of the key factors and are assumed to be quite weak. Nevertheless, the projected future TCGF in this study supports the downscaling results in several previous studies 3,27 . Therefore, the newly developed multi-timescale regression model can provide an additional reference based on an alternative, independent approach.

Spatial distribution of TCGF
Inspired by the kernel density method for TCGF 48 , the NH domain is divided into 1° × 1° grid boxes, and the TCGF in each grid box is counted as the number of TCs that formed within 10° in the zonal direction and 5° in the meridional direction of that grid box, i.e., within a 20° × 10° domain centred on it. This method acts as a smoother: it not only reduces the uncertainty arising from the scattered nature of TC genesis or from biases in genesis location in the TC data, but also helps establish a stable relationship between the observed TCGF and the related environmental factors. We chose a 20° × 10° domain because the synoptic disturbances that trigger TC genesis, such as equatorial Rossby waves, mixed Rossby-gravity waves, and easterly waves, have a horizontal scale of about two thousand kilometers in the zonal direction and an e-folding decay in the meridional direction 49 . We also tested 30° × 15° and 10° × 5° domains. The 30° × 15° domain yields correlations similar to those obtained with the 20° × 10° domain; however, the resulting climatological TC genesis density over the WNP is higher than that over the ENP because of the large TC-counting domain, which is unrealistic. The result from the 10° × 5° domain is too noisy owing to the small sample size. We focus on TCGF during June-November (JJASON) in the NH, where about 85% of global TCs form each year 50 . Note that since TCs over the North Indian Ocean mainly form in the pre-monsoon (April and May) and post-monsoon (October and November) seasons 51 , the TCGF over the North Indian Ocean is not included in this study. Therefore, this study focuses on TCGFs over the three main TC basins in the NH: the WNP, the ENP, and the NA.
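The box-counting step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `tcgf_density`, the 0°-60°N latitude range, and the treatment of the counting box as half-widths of 10° (zonal) and 5° (meridional) are assumptions made for the example.

```python
import numpy as np

def tcgf_density(gen_lon, gen_lat, half_width_lon=10.0, half_width_lat=5.0,
                 lon_range=(0.0, 360.0), lat_range=(0.0, 60.0), step=1.0):
    """Count genesis points inside a box centred on each grid point.

    Assumption: the counting box spans +/- half_width_lon by +/- half_width_lat,
    i.e. a 20 x 10 degree domain for the default values.
    """
    lons = np.arange(lon_range[0], lon_range[1], step)
    lats = np.arange(lat_range[0], lat_range[1] + step, step)
    gen_lon = np.asarray(gen_lon, dtype=float) % 360.0
    gen_lat = np.asarray(gen_lat, dtype=float)

    # Shortest zonal distance, allowing for wrap-around at 0/360 degrees.
    dlon = np.abs(lons[:, None] - gen_lon[None, :])
    dlon = np.minimum(dlon, 360.0 - dlon)
    dlat = np.abs(lats[:, None] - gen_lat[None, :])

    in_lon = (dlon <= half_width_lon).astype(int)  # shape (nlon, ntc)
    in_lat = (dlat <= half_width_lat).astype(int)  # shape (nlat, ntc)

    # counts[j, i] = number of genesis points inside the box at (lats[j], lons[i]).
    counts = in_lat @ in_lon.T
    return lats, lons, counts
```

Calling the function once per season and stacking the results would give a `[year, lat, lon]` array of seasonal TCGF; changing the two half-widths reproduces the 30° × 15° and 10° × 5° sensitivity tests.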

Six oceanic factors defined by SST
The SST data are used to define the key SST indices controlling the interannual variability of TCGFs, namely the ENSO, EIO and TNA indices, with the decadal signal (≥10 years) removed using the fast Fourier transform. Here, we define the JJASON SST averaged over the Niño3.4 region (5°S-5°N, 120°W-170°W) as the ENSO index, that averaged over the EIO (10°S-22.5°N, 75°E-100°E) as the EIO index 12 , and that averaged over the tropical Atlantic (5°N-25°N, 30°W-70°W) as the TNA index 13 . Warm and cold ENSO, EIO and TNA events are selected as those with index values above 0.7 and below -0.7 standard deviations, respectively. Based on these criteria, 20 ENSO events (9 El Niño and 11 La Niña), 23 EIO events (10 warm and 13 cold), and 20 TNA events (10 warm and 10 cold) are identified (Supplementary Fig. S1). Since the IPO and AMO indices mainly control the variability of TCGF on timescales beyond the interannual, the interannual variability (<10 years) is removed from both indices using the fast Fourier transform. The IPO experienced two negative phases (1960-1975 and 1998-2019) separated by one positive phase, while the AMO exhibited one negative phase followed by one positive phase (1996-2019) in our study period. The global warming time series is defined as the global mean SST over 45°S-45°N, 0°-360°.
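The filtering and event-selection steps can be illustrated with a short sketch. It assumes annual (one value per JJASON season) index series and a sharp spectral cutoff at a 10-year period; the function names `fft_filter` and `select_events` are hypothetical, and the exact filter design used in the study is not specified in the text.

```python
import numpy as np

def fft_filter(series, cutoff_years=10.0, keep="high"):
    """Split an annual series at a cutoff period using the FFT.

    keep="high" retains periods shorter than the cutoff (interannual part);
    keep="low" retains periods of cutoff_years or longer (decadal part).
    The time mean is removed, so the output is an anomaly series.
    """
    x = np.asarray(series, dtype=float)
    spec = np.fft.rfft(x - x.mean())
    freq = np.fft.rfftfreq(x.size, d=1.0)   # cycles per year
    low = freq <= 1.0 / cutoff_years         # periods >= cutoff_years
    spec[low if keep == "high" else ~low] = 0.0
    return np.fft.irfft(spec, n=x.size)

def select_events(index, threshold=0.7):
    """Return indices of warm (> +threshold std) and cold (< -threshold std) years."""
    index = np.asarray(index, dtype=float)
    z = (index - index.mean()) / index.std()
    return np.where(z > threshold)[0], np.where(z < -threshold)[0]
```

Under these assumptions, the ENSO, EIO and TNA indices would use `keep="high"` before event selection, while the IPO and AMO indices would use `keep="low"`.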

Construction of a multi-timescale TCGF regression model in the NH
The new multi-timescale regression model developed in this study predicts the TCGF in the NH from factors representing both internal climate variability and external forcing, and is expressed in the form of Eq. (1). We consider the terms on the right-hand side (RHS) of Eq. (1) as being linearly proportional to six key factors controlling the TCGF in the NH, with five related to internal climate variability and one representing the external forcing. Among the internal climate factors, there are three interannual climate modes (ENSO, EIO SST, and TNA SST) and two interdecadal climate modes (the IPO and the AMO). The external forcing (GWSST) is represented by the increasing global mean SST averaged over 45°S-45°N, 0°-360°. These factors have been recognized to greatly modulate the interannual and longer-timescale variabilities of TCGF. The regression coefficients in Eq. (1) vary with basin, as shown in Table 1. The normalized time series of these factors are shown in Supplementary Fig. S1, and the composite difference in SST anomalies between the positive and negative phases of each factor is displayed in Supplementary Fig. S2. The SST anomalies also show a significant GW pattern in response to the increasing CO2 concentration (Supplementary Fig. S2f). It is noted that the SST warming rate differs among basins, suggesting that the GW impact on TCGF could differ across basins. To confirm the significance of these six factors, we examined the relationships between the observed TCGF and these factors and the physical mechanisms possibly involved.
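As an illustration of how such a model can be fitted, the sketch below assumes that Eq. (1) is an ordinary linear combination of the six factors at each grid box, $\mathrm{TCGF}(x,y,t) \approx a_0(x,y) + \sum_{i=1}^{6} a_i(x,y)\,F_i(t)$, estimated by least squares. The intercept term, the symbol names, and the functions `fit_coeffs` and `loo_hindcast` are assumptions for this example; the published coefficients are those given in Table 1. A leave-one-out hindcast is included because the study evaluates the model with withheld years.

```python
import numpy as np

def fit_coeffs(factors, tcgf):
    """Least-squares fit of tcgf[time, ...] on factors[time, nfactor].

    Returns an array of shape (nfactor + 1, ngrid); row 0 is the intercept.
    """
    factors = np.asarray(factors, dtype=float)
    y = np.asarray(tcgf, dtype=float).reshape(len(factors), -1)
    design = np.column_stack([np.ones(len(factors)), factors])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

def loo_hindcast(factors, tcgf):
    """Leave-one-out hindcast: each year is predicted from a fit that excludes it."""
    factors = np.asarray(factors, dtype=float)
    tcgf = np.asarray(tcgf, dtype=float)
    n = len(factors)
    y = tcgf.reshape(n, -1)
    out = np.empty_like(y)
    for k in range(n):
        keep = np.arange(n) != k                      # withhold year k
        beta = fit_coeffs(factors[keep], y[keep])
        out[k] = np.concatenate([[1.0], factors[k]]) @ beta
    return out.reshape(tcgf.shape)
```

Here `factors` would hold the normalized ENSO, EIO, TNA, IPO, AMO and GWSST series as columns, and `tcgf` the seasonal gridded TCGF; fitting each grid box independently is a simplification of whatever basin-dependent procedure the authors used.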
The GPI and generalized equilibrium feedback assessment (GEFA)
The performance of the newly constructed multi-timescale regression model is compared with the widely used GPI developed by Emanuel and Nolan 33 , which is a function of several environmental variables:
$$\mathrm{GPI} = \left|10^{5}\,\eta\right|^{3/2} \left(\frac{\mathrm{Rhum}}{50}\right)^{3} \left(\frac{V_{pot}}{70}\right)^{3} \left(1 + 0.1\,\mathrm{VWS}\right)^{-2}$$
where $\eta$ is the absolute vertical vorticity at 850 hPa, Rhum is the relative humidity at 600 hPa, $V_{pot}$ is the TC maximum potential intensity 52 , and VWS is the vertical shear of horizontal winds between 200 and 850 hPa.
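For reference, the commonly cited Emanuel and Nolan form of the GPI can be evaluated as in the sketch below. The function and argument names are hypothetical, and the units (vorticity in s^-1, relative humidity in percent, potential intensity and shear in m/s) are the ones conventionally used with this index rather than values stated in the text.

```python
import numpy as np

def genesis_potential_index(abs_vort_850, rhum_600, vpot, vws):
    """Emanuel-Nolan GPI in its commonly cited form.

    abs_vort_850 : absolute vorticity at 850 hPa (s^-1)
    rhum_600     : relative humidity at 600 hPa (%)
    vpot         : maximum potential intensity (m/s)
    vws          : 200-850 hPa vertical wind shear magnitude (m/s)
    Inputs may be scalars or NumPy arrays of matching shape.
    """
    return (np.abs(1.0e5 * abs_vort_850) ** 1.5
            * (rhum_600 / 50.0) ** 3
            * (vpot / 70.0) ** 3
            * (1.0 + 0.1 * vws) ** -2)
```

With monthly reanalysis fields as inputs, the result can be averaged over JJASON to give a seasonal GPI comparable to the gridded TCGF.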
The newly constructed multi-timescale regression model describes the relationship between the TCGF averaged over a TC season and the key factors that are known to control different timescale variabilities of TCGF and thus can reproduce the TCGF in each grid box in the NH. To confirm the validity of the multi-timescale regression model, the leave-one-out cross-validation method 53

Statistics
In this study, statistical significance was checked using three methods: the two-tailed Student's t test, the two-sided Mann-Kendall trend test and the Monte Carlo test. The Student's t test is adopted to test whether the composites of TCTD and circulation fields are significant. The degrees of freedom are 58 for the 60-year data. The Mann-Kendall trend test is used to verify whether the linear trends of the projected and simulated TCGF are significant, with 60 degrees of freedom for the historical data and 50 for the future projection results. The Monte Carlo test repeated the random sampling 100 times and tested whether the GEFA contribution is significant above the 90% confidence level.
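As an illustration of the trend test, the sketch below implements a standard two-sided Mann-Kendall test with the usual tie correction and a normal approximation for the p-value. It is a generic textbook version written for this example (the function name `mann_kendall` is hypothetical), not the authors' implementation.

```python
import math
import numpy as np

def mann_kendall(series):
    """Two-sided Mann-Kendall trend test.

    Returns (S, z, p): the S statistic, the continuity-corrected z score,
    and the two-sided p-value from the normal approximation.
    """
    x = np.asarray(series, dtype=float)
    n = x.size

    # S: sum of signs of all later-minus-earlier differences.
    s = 0.0
    for i in range(n - 1):
        s += np.sign(x[i + 1:] - x[i]).sum()

    # Variance of S, corrected for groups of tied values.
    _, counts = np.unique(x, return_counts=True)
    var_s = (n * (n - 1) * (2 * n + 5)
             - (counts * (counts - 1) * (2 * counts + 5)).sum()) / 18.0

    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0

    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value
    return s, z, p
```

A trend would be called significant at the 95% level when `p < 0.05`; applying the test to each of the 100 projected series gives the fraction of cases with a significant trend.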

Data availability
The 6-hourly NH TC best-track dataset is obtained from the International Best Track Archive for Climate Stewardship version 4 (IBTrACS v4; https://www.ncei.noaa.gov/data/international-best-track-archive-for-climate-stewardship-ibtracs/v04r00/access/netcdf/), which includes the central position (longitude and latitude) and the intensity, in terms of maximum sustained near-surface wind speed, of each TC during the 1960-2019 period. The TC genesis location is defined as the central position where a TC reaches 35 knots for the first time.
The monthly Extended Reconstructed SST version 5 data 54 with a horizontal resolution of 2° × 2° were obtained from the National Oceanic and Atmospheric Administration (NOAA). For the historical period, the IPO index (https://www.esrl.noaa.gov/psd/data/timeseries/IPOTPI/) and the AMO index (https://www.esrl.noaa.gov/psd/data/timeseries/AMO/) were downloaded from the Earth System Research Laboratory (ESRL) of NOAA; both were constructed based on the same monthly SST dataset.
The atmospheric data, including winds and relative humidity, were obtained from the National Centers for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR) reanalysis I with a horizontal resolution of 2.5° × 2.5° (https://psl.noaa.gov/data/gridded/data.ncep.reanalysis.html) 56 .
The dots indicate areas where the differences are statistically significant at the 95% confidence level by the Student's t test.
The contours represent the observed TCGF and the shades stand for hindcast TCGFs by the multi-timescale regression model (left) and the GPI (right).
The yellow lines represent the projections for 100 random cases from 100 members and the shades indicate the spread. Here, the climatological mean TCGFs (11.6, 16.11, 22.03 for respective NA, ENP and