Bayesian Poisson Autoregressive Modeling of the Philippine Storm Frequencies

The study extends the log-linear Poisson autoregressive model proposed by Fokianos and Tjøstheim (2011) into a Bayesian framework, and applies it to modeling the yearly tropical cyclone frequencies within the Philippine Area of Responsibility. The data, taken from the Japan Meteorological Agency's Regional Specialized Meteorological Center, with time points running from 1951 to 2021, are modeled at a semester-level granularity to take advantage of periodic patterns suitable for autoregressive modeling. The estimation is done using a Markov chain Monte Carlo algorithm, the No-U-Turn Sampler. The proposed model can infer the characteristics of the Philippine storm frequencies, both in terms of point forecasts and the associated uncertainties.


Introduction
With a maximum sustained wind speed of 315 kilometers per hour (kph) or 170 knots, and gusts reaching 379 kph (205 knots) just before landfall (Lagmay et al, 2015), super typhoon Haiyan, known locally as Yolanda, became one of the most powerful tropical cyclones ever recorded. It left the Philippines with 6300 dead, 1061 missing and 28689 injured (Lagmay et al, 2015). The disaster forced coconut farmers to explore other sources of income, such as raising livestock and growing other crops, after Yolanda reduced their yearly income by more than 400 USD/hectare (Seriño et al, 2021). These figures are the very reason why it is important to study the characteristics of tropical cyclones (TCs), especially for the Philippines.
Among the efforts to understand these characteristics is the study of the annual frequencies of TCs, where the aim is to come up with a model that can accurately forecast future storm activities and be used as part of early warning systems for policy makers. Early works for the Western North Pacific (WNP) region, where the Philippines is located, focused on finding independent covariates that can be used as predictors for modeling the TC frequencies. These covariates are atmospheric/oceanic signals such as the El Niño/Southern Oscillation (ENSO) phenomenon: indices known to affect this event, like the sea-surface temperature (SST), were studied by Chan (1985), whose findings motivated Chan (1995) to also study indices for the quasi-Biennial Oscillation (QBO) as possible predictors for TC activities in the WNP.
Using the findings mentioned above, Chan et al (1998) developed a statistical model for WNP TC activities using the following predictors: i. ENSO indices; ii. the environmental conditions over East Asia and the WNP from April of the previous year to March of the current year; and, iii. climatology and persistence predictors (see Murphy (1992)). The model turned out to do quite well despite the findings of Lander (1994) that the observed annual TC totals in the WNP region are not significantly affected by any ENSO indices. The model used by Chan et al (1998) was a generalization of Projection Pursuit Regression (PPR) named Smooth Multiple Additive Regression Technique (Friedman, 1985), which was further improved in terms of predictors by Chan et al (2001). It should be noted that Wang and Chan (2002) showed how strong the effect of ENSO on TC activities in the WNP region is, addressing the earlier findings of Lander (1994). The quest for other indices drove Fan (2007) to use a North Pacific Oscillation (NPO) index as part of the covariates, with a multiple linear regression model as the prediction equation. Another related work is the study by Choi et al (2010) on the use of three teleconnection patterns: the Siberian High Oscillation (SHO) index in the East Asian continent; the NPO; and, the Antarctic Oscillation (AAO) near Australia during the boreal spring (April-May). The indices found for these three teleconnection patterns were used as predictors in their multiple linear regression model. Finally, Fan and Wang (2009) proposed a new approach to modeling TC activities: instead of modeling the yearly totals, they modeled the year-over-year change of the frequency. The model used was also a multiple linear regression model.
Statistically, TC frequencies can be assumed to be Poisson distributed, and later papers have incorporated this into their modeling. For example, Zhang et al (2018) developed a Poisson regression model to predict typhoon genesis in the WNP using April SST anomalies in the North Pacific region and SST anomalies over the Coral Sea off the east coast of Australia. McDonnell and Holbrook (2004a) also used Poisson regression, with the saturated equivalent potential temperature gradient as part of its predictors, for the Australian region (see also McDonnell and Holbrook (2004b)). Moreover, recent papers such as Magee et al (2021) used multivariate Poisson regression to develop a location-specific TC frequency outlook using 16 covariates, among which is the Trans NINO Index (TNI), the difference in normalized SST anomalies between the NINO1+2 (0°-10°S, 90°-80°W) and NINO4 (5°N-5°S, 160°E-150°W) regions. The locations targeted by the said paper are countries in the South East Asia region, including the Philippines.
Combinations of dynamical and statistical modeling have been explored as well. Among these works, Li et al (2013) combined the National Centers for Environmental Prediction (NCEP) Climate Forecast System version 2 (CFSv2), a fully coupled ocean-land-atmosphere dynamical model, with a linear regression model. The idea of a hybrid model is that the forecast of the dynamical model is used as a predictor for the statistical model. In the same vein, Zhang et al (2017) used dynamical model forecasts of the GFDL Forecast-Oriented Low Ocean Resolution version of CM2.5 with flux adjustments (FLOR-FA) as input for their Poisson regression. Finally, Tian and Fan (2019) used the approach of Fan and Wang (2009) but with a hybrid model.
While Poisson log-linear regression modeling does appropriately capture, or at least approximate, the true variations of the TC counts, the past values of the counts should also be taken advantage of. Indeed, in time series modeling, the core motivation is that variations of the response variable are expected to be time dependent. As for climatic events, the fact that the dynamics are continuous over time, from gradual changes in temperature to movements in atmospheric circulation, suggests that, by nature, current weather patterns can be described by their preceding state. However, when these continuous changes are aggregated to discrete time points summarized across a wide temporal horizon, such as annual totals, much information is lost, making it difficult to model the variations; hence most of the literature resorts to finding sets of independent covariates. This issue is of particular interest in this study. Among the works mentioned, Tian and Fan (2019) addressed the above limitation by incorporating an autoregressive term, albeit not in a Poisson regression, since they modeled the yearly change of the frequency count.
Early work on the use of Autoregressive (AR) models (see Chapter 8 of Hyndman and Athanasopoulos (2018)) for count time series was seen as problematic, mainly due to the "multiplication problem" (Weiss, 2018). To address this, McKenzie (1985) proposed some solutions which motivated the integer-valued AR(1) (INAR(1)) and Poisson INAR(1) models (see also Al-Osh and Alzaid (1987); and, Alzaid and Al-Osh (1988)). Another approach to addressing the "multiplication problem" is the Integer-Valued Generalized AR Conditional Heteroscedasticity (INGARCH) model of Ferland et al (2006), which models the count time series as a linear regression of the conditional means, so that the count at a given time is generated by using, say, a conditional Poisson distribution (Weiss, 2018). This model led Fokianos et al (2009) to develop the Poisson AR (PoisAR) model, whose log-linear version (Fokianos and Tjøstheim, 2011) is the one used in this paper. Other related works were studied by Fokianos (2012), and Fokianos and Fried (2012).
One thing to emphasize, however, is that this paper further extends the work of Fokianos and Tjøstheim (2011) by proposing a Bayesian approach to it, which, to the best of the author's knowledge, has not been explored before. The direct benefit of using Bayesian inference is the ability to study the behavior of the parameters through their marginal posterior distributions. These distributions allow researchers to study the uncertainty of the parameters and give insight into the full characteristics of the estimates. Most importantly, Bayesian inference naturally provides prediction intervals due to its probabilistic nature, which is very useful for estimating TC activities. Further, the work of Brandt and Sandler (2012) on the Bayesian Poisson Vector Autoregressive (Bayesian PoisVAR) model is based on a different specification, and is not the same model (at least in its univariate counterpart) as the one by Fokianos and Tjøstheim (2011), which is the model extended to a Bayesian framework in this paper.
Moving on, in terms of the data, the interest of the paper is not in the broader WNP region, but rather in a specific country, the Philippines, in particular the Philippine Area of Responsibility (PAR). Hence, the frequency defined here is not based on TC geneses, since these often develop outside the PAR, but rather on the actual TCs entering the PAR. This study can therefore be compared to one of the locations studied by Magee et al (2021), the Philippines.
As for the covariates, Magee et al (2021) summarized the indices often used by their automated variable selection algorithm as predictors for the Philippine TC counts. As far as we know, theirs is the first study to have come up with indices for the Philippines. The indices identified are: NINO3.4 (average SSTs in the NINO3.4 region defined over 5°N-5°S, 170°-120°W), IOD E (SST anomalies in the IOD E region defined over 0°-10°S, 90°-110°E), IOD W (SST anomalies in the IOD W region defined over the western pole of the DMI at 10°N-10°S, 50°-70°E), DMI (difference in SST anomalies between IOD W and IOD E), PNA (Rotated Principal Component Analysis (RPCA) applied to monthly standardised 500mb height anomalies between 20° and 90°N), QBO (zonal average of the equatorial 30mb zonal wind) and PMM (Maximum Covariance Analysis (MCA) applied to SSTs at 21°S-32°N, 74°W-15°E). These covariates, however, are not explored in this paper. Instead, this work aims to provide a simpler model that is easy to operationalize but has competitive performance. By this we mean using historical data as the only predictor, autoregressed against future values via a canonical link function, in this case the log function.
The absence of independent covariates makes the proposed model simpler, if not the simplest, compared to most models proposed in the literature mentioned earlier. It comes with an obvious cost: the model error. This is especially a problem at annual resolution, since most temporal patterns are lost due to aggregation. To address this, the study proposes modeling at a semester-level granularity instead. This follows from the study of Asaad (2021) for the case of the Philippines, where the first semester (January to June) is not a TC season, so zero to very few typhoons are recorded, whereas the second semester is when most of the typhoons occur. Therefore, the first semester is expected to have lower TC totals, while the second semester is expected to have higher frequencies. This annually repeating pattern is suitable for autoregressive (AR) models, which use past values as indicators for forecasting future values. Further, most of the studies mentioned model only the typhoon season months, which in the case of the PAR correspond to the second semester. However, the benefit of including the first semester, the non-TC season, follows from the assumption that the first semester's variations can affect the second semester's variations, and that the second semester's variations can affect the succeeding first semester's variations. This assumption is based on the weather dynamics discussed earlier, where the current state of the weather can be explained by its preceding state. Moreover, the simplicity of the model makes it easy to operationalize. Lastly, Magee et al (2021) used June-November as the TC season for the Philippines, which is more or less the second semester considered here.
As for the Bayesian inference proposed here, Chand et al (2010) also used a Bayesian approach for Poisson regression modeling of TC activities in the Fiji region. The authors used Gibbs sampling as their Markov chain Monte Carlo (MCMC) algorithm (Hastings, 1970), run for 10000 iterations with a 3000-sample burn-in. They also compared the result with the frequentist maximum likelihood estimate (MLE). Further, Werner and Holbrook (2011) used a Bayesian model for TC formation in the Australian region, with Gaussian priors on the parameters and Slice sampling (Neal, 2003) as the MCMC algorithm. We emphasize that, while the said papers used Bayesian inference for Poisson regression, they did not use the Bayesian Poisson autoregression model proposed in this paper.
In terms of MCMC algorithms, new innovations have been made in the past decade. Among these is the incorporation of physical dynamics into the sampler to help generate proposal steps when exploring the posterior distribution; for example, the Hamiltonian Monte Carlo (HMC) algorithm (Neal, 2011) uses Hamiltonian dynamics to properly explore, or sample from, the target distribution. Given this success, HMC has been extended to address other limitations as well; for example, the Stochastic Gradient HMC by Chen et al (2014) is aimed at applying HMC to large datasets, which Asaad and Magadia (2019) applied to Bayesian time series modeling. For this paper, the data is not large enough for that to matter computationally. Hence, another recent extension of HMC is used instead: the No-U-Turn Sampler (NUTS) proposed by Hoffman and Gelman (2014), which extends HMC to eliminate the need to set the hyperparameter L (the number of Leapfrog steps).
In summary, this paper has three main contributions to the literature on TC activities: i. the use of the PoisAR model based on the specification of Fokianos and Tjøstheim (2011); ii. the extension of PoisAR to a Bayesian framework; and, iii. the first TC count model for the Philippines with prediction intervals. Lastly, since the study proposes a new methodology for forecasting TC frequencies, details are given for TC researchers and those dealing with count time series modeling in other domains.

Data and methods
The data is publicly available from the Japan Meteorological Agency's (JMA's) Regional Specialized Meteorological Center (RSMC). The available time points run from 1951 through 2021. In total, there are 1852 unique TC international IDs (IDs).
To count the frequency of TCs in the PAR, the 1852 TC IDs are further filtered to only those that entered the PAR. These include tracks that made a recursive path, that is, entered the PAR briefly and then left toward the north. In total, 1250 TC IDs fell within the PAR. It should be noted that the Philippines has its own local weather agency (PAGASA, short for Philippine Atmospheric, Geophysical and Astronomical Services Administration) tracking TCs, and there can be minor differences in TC counts compared with the JMA; nonetheless, the difference should be marginal. Fig. 1a shows the 1852 TCs in the WNP, and Fig. 1b shows the 1250 TCs in the PAR (tails and heads of the tracks beyond the PAR were cut out for emphasis). These data points were then grouped per semester annually, so that Fig. 2 depicts the time series of the historical frequencies of TCs in the PAR. It should be noted that Asaad (2021) defines Philippine storms as those that went straight to the Philippines, and classified those that made a recursive track, including those that briefly entered the PAR, as belonging to a different cluster. While the definitions differ, the one proposed in this paper is already a good proxy for the number of storms that passed straight through the country.
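As an illustration of this filtering step, the sketch below flags tracks that enter a region using a ray-casting point-in-polygon test. The polygon and function names here are ours for illustration; in particular, the rectangle below is a simplified placeholder, not PAGASA's official PAR boundary.

```python
def point_in_polygon(lon, lat, poly):
    """Return True if (lon, lat) falls inside the polygon (list of (lon, lat) vertices)."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Count edge crossings of a horizontal ray going right from the point.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def track_enters_region(track, poly):
    """A track (list of (lon, lat) fixes) counts toward the tally if any fix lies inside."""
    return any(point_in_polygon(lon, lat, poly) for lon, lat in track)

# Illustrative rectangle standing in for the PAR boundary (NOT the official one):
PAR_LIKE = [(115.0, 5.0), (135.0, 5.0), (135.0, 25.0), (115.0, 25.0)]
```

Tracks that only brush the region still count, matching the definition above that includes TCs briefly entering the PAR.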
In this study, the data is partitioned into training and testing sets, with training accounting for 80% of the data, that is, up until the first semester of 2007, and the rest accounting for the test data (up until the first semester of 2021). The model parameters are estimated using the training data, and then validated on the test data. The software used for this study is the Julia programming language (Bezanson et al, 2017), with Turing.jl (Ge et al, 2018) as the core library for Bayesian inference. MCMC diagnostics using the Gelman-Rubin statistic are done in the R programming language (R Core Team, 2020), executed within Julia using RCall.jl.
The remainder of the paper is organized as follows: Section 2.1 introduces the log-linear PoisAR model; Section 2.2 specifies the prior distribution to the parameters of the model; Section 2.3 discusses the model estimation using Bayesian inference; Section 2.4 tackles the necessary diagnostics for the Markov chains; Section 2.5 discusses the model inference; Section 3 presents the results; and, Section 4 is the conclusion and future works.

Log-linear PoisAR model
The log-linear PoisAR model proposed by Fokianos and Tjøstheim (2011) is defined in Definition 1.
Remark 1. The η t−q in Equation (1) is the feedback mechanism of the model.
The choice of the appropriate temporal lag of Y t can be determined using the sample Autocorrelation Function (ACF). The paper, however, settles on temporal lag 2, mainly because the current variations of the frequencies are assumed to be well explained by the preceding two semesters, which form a contrasting effect annually (first-semester TC totals are always lower than the second semester's). Further, the historical data at the semester level is periodic, as shown in Fig. 2, making the ACF decay slowly due to high correlations between temporal lags. As for the feedback mechanism, the paper aims for parsimony and sets it to lag 1. Hence, the model considered in this paper is given by

η t = κ + α η t−1 + ϕ 1 log(Y t−1 + 1) + ϕ 2 log(Y t−2 + 1). (2)

The addition of 1 inside log(Y t−i + 1), i ∈ {1, 2}, is meant to handle TC totals of 0, as Fokianos and Tjøstheim (2011) pointed out. This is particularly relevant for the Philippines, where the first semester (non-TC season) can have 0 TC totals, as seen in Fig. 2.
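To make Equation (2) concrete, the following is a minimal simulation sketch of the log-linear PoisAR(2) process. It is written in Python/NumPy purely for illustration (the paper's own implementation is in Julia with Turing.jl), and the parameter values in the usage line are placeholders chosen near the posterior means reported later.

```python
import numpy as np

def simulate_poisar2(kappa, alpha, phi1, phi2, T, eta0=1.0, seed=None):
    """Simulate Y_t ~ Poisson(exp(eta_t)) with
    eta_t = kappa + alpha*eta_{t-1} + phi1*log(Y_{t-1}+1) + phi2*log(Y_{t-2}+1)."""
    rng = np.random.default_rng(seed)
    y = np.zeros(T, dtype=int)
    eta = np.zeros(T)
    eta_prev, y1, y2 = eta0, 0, 0  # initial feedback and lagged counts (zero counts)
    for t in range(T):
        eta[t] = kappa + alpha * eta_prev + phi1 * np.log(y1 + 1) + phi2 * np.log(y2 + 1)
        y[t] = rng.poisson(np.exp(eta[t]))  # log link: conditional mean is exp(eta_t)
        eta_prev, y2, y1 = eta[t], y1, y[t]
    return y, eta

# Placeholder parameter values near the posterior means of Section 3.1:
y, eta = simulate_poisar2(1.57, -0.22, -0.19, 0.59, T=142, seed=1)
```

The "+1" inside the logs is exactly the zero-count safeguard discussed above.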

Prior distribution specification
The model in Equation (2) is estimated using Bayesian inference. This is done through Bayes' Theorem, which requires prior distributions on the target parameters. For this paper, the priors are as follows:

κ ∼ Uniform(0, 15)
α ∼ Normal(0, σ 1)
ϕ 1 ∼ Normal(µ 1, σ 2), ϕ 2 ∼ Normal(µ 2, σ 3), with µ 1 and µ 2 of opposite signs
σ 1, σ 2, σ 3 ∼ InverseGamma(3, 1)

The specification assumes that ϕ 1 and ϕ 2 have opposite effects, since the two preceding frequencies used as predictors are expected to have contrasting values; hence the difference in the means of their priors. The κ, which serves as the constant of the model, is intended to compensate for the values needed to get close to the target frequencies, and is therefore assigned a Uniform distribution over the domain 0 to 15. On the other hand, α is simply assumed to be Gaussian distributed with location parameter 0. Finally, the hyperparameters σ 1, σ 2 and σ 3 are each given an Inverse Gamma prior with parameters 3 and 1. Fig. 3 shows the corresponding plate diagram of the proposed hierarchical Bayesian model. From the said figure, it should be understood that there are T − 2 plates since the proposed log-linear model is of order 2 (see Equation (2)). Further, the square shapes in Fig. 3 are constants and the circles are probability distributions. Shaded shapes are known quantities, whereas unshaded ones are unknown. Variables inside the large rectangle (the plate) vary across time.
Finally, the densities corresponding to the prior distributions specified in Section 2.2 enter the joint posterior through Bayes' Theorem. The main problem with the resulting posterior in Equation (11) is the high-dimensional integration needed in the denominator, which is often intractable or simply difficult to simplify. Hence, it is common practice to approximate Pr(θ 1, θ 2 | y t) using MCMC (Hastings, 1970). There are several MCMC algorithms, but the one used here is NUTS (Hoffman and Gelman, 2014), an extension of HMC (Neal, 2011).
The NUTS algorithm requires the gradient of the log of the numerator with respect to the parameters. These differentiations are handled by the automatic differentiation (Griewank and Walther, 2008) built into Turing.jl (Ge et al, 2018), which feeds the resulting derivatives into its NUTS algorithm as defined in Algorithm 6 of Hoffman and Gelman (2014, p. 1611). In practice, MCMC algorithms need a few iterations to "warm up" (the burn-in portion), and hence that portion of the Markov chain is discarded, leaving what are assumed to be sample points from the marginal posterior distribution. Moreover, it is recommended to run several Markov chains in parallel to check that the algorithm converges to a stationary distribution. For this study, 4 Markov chains are sampled, each with 20000 samples. The burn-in is 10000, leaving 10000 samples per Markov chain as the data points for estimating the marginal posterior distributions.
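The paper's estimation runs NUTS through Turing.jl. As a self-contained illustration of the MCMC approximation idea only (not the paper's sampler, and with simplified priors: an assumed Uniform(0, 15) on κ and standard-normal priors on the rest), the sketch below implements a plain random-walk Metropolis over (κ, α, ϕ 1, ϕ 2) for the model in Equation (2).

```python
import numpy as np

def log_post(theta, y, eta0=1.0):
    """Unnormalized log posterior for Equation (2) under simplified priors."""
    kappa, alpha, phi1, phi2 = theta
    if not (0.0 < kappa < 15.0):                   # kappa ~ Uniform(0, 15)
        return -np.inf
    lp = -0.5 * (alpha**2 + phi1**2 + phi2**2)     # illustrative N(0, 1) priors
    eta_prev = eta0
    for t in range(2, len(y)):
        eta = (kappa + alpha * eta_prev
               + phi1 * np.log(y[t-1] + 1) + phi2 * np.log(y[t-2] + 1))
        lp += y[t] * eta - np.exp(eta)             # Poisson log-likelihood, up to a constant
        eta_prev = eta
    return lp

def metropolis(y, n_iter=2000, step=0.05, seed=0):
    """Random-walk Metropolis; returns an (n_iter, 4) chain of (kappa, alpha, phi1, phi2)."""
    rng = np.random.default_rng(seed)
    theta = np.array([1.0, 0.0, 0.0, 0.0])
    cur = log_post(theta, y)
    chain = np.empty((n_iter, 4))
    for i in range(n_iter):
        prop = theta + step * rng.standard_normal(4)
        lp = log_post(prop, y)
        if np.log(rng.random()) < lp - cur:        # Metropolis acceptance rule
            theta, cur = prop, lp
        chain[i] = theta
    return chain
```

As described above, a burn-in portion of the returned chain would be discarded before the remaining draws are treated as posterior samples; NUTS replaces the blind random walk with gradient-guided proposals.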

Markov chain diagnostics
MCMC algorithms work in principle by sampling from the posterior through a random walk. The realized independent Markov chains from parallel random walks (with a mechanism for accepting/rejecting the samples) are assumed to have converged to some stationary distribution, with burn-in samples already removed. The popular statistical test for checking the convergence of multiple Markov chains is the Gelman-Rubin statistic proposed by Gelman and Rubin (1992) and further refined by Brooks and Gelman (1998). It should be noted that Chand et al (2010) diagnosed model convergence through repeated samplings with different initial conditions, then checked whether the KDEs are identical (meaning the chains converged). As for Werner and Holbrook (2011), they argued that the relatively short (500 iterations) iterative burn-in they used achieves quick convergence. Neither study, however, used the Gelman-Rubin statistic.
The Gelman-Rubin statistic works by comparing the estimated between-variance and within-variance of the Markov chains. In particular, the between-variance is the variance of the averages of the independent Markov chains, while the within-variance is the average of the variances of the independent Markov chains. Hence, if the Markov chains did converge, then the two quantities must be approximately equal. For this study, the parameters κ, α, ϕ 1 and ϕ 2, and the hyperparameters σ 1, σ 2 and σ 3, are each sampled from the posterior distribution using 4 parallel Markov chains. These independent chains are then tested for convergence using the Gelman-Rubin statistic.
The final point forecast is obtained by averaging over the posterior draws; that is, with ŷ t := [exp(η 1t ), · · · , exp(η N t )] ⊤ collecting the conditional means across the N draws, the point forecast is the average of the elements of ŷ t. The spread of these elements describes the uncertainty of the mean forecast. This uncertainty is referred to here as the epistemic uncertainty (a term coined, to the best of the author's knowledge, by Kendall and Gal (2017)). This is because each element of ŷ t is a realization of our hypothesis, or belief, about the function that generated the data; the specified hypothesis assumes that the data were generated by a log-linear model (i.e., Equation (2)). Hence, the variance of the elements of ŷ t describes the uncertainty of this belief. One thing to note, however, is that Equation (23) treats y t−1 and y t−2 as known values; that is, the inferred ŷ t is not used as input to forecast y t+1. The type of forecast that does reuse ŷ t in this way is referred to here as rolling forecast, which is discussed later in this section. Moreover, η t−1 is a vector for t ≥ 2, since for t = 1, η t−1 = η 0, which is a scalar. In each case, η t is used to infer η t+1, making η t the feedback mechanism for t + 1.
Moreover, each element ŷ it of ŷ t is a forecast of the conditional mean of the observation y t, and because y t is assumed to be Poisson distributed, it follows that y t | F Y,η t−1 ∼ Poisson(ŷ it ), ∀i, t. The variance of this Poisson distribution describes the uncertainty of the predicted observations, which Kendall and Gal (2017) named aleatoric uncertainty, a term also adopted here.
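Given posterior draws η 1t, …, η N t for a fixed t, the two uncertainty components above can be read directly off the draws. A minimal sketch (an illustrative helper of ours, not from the paper):

```python
import numpy as np

def forecast_uncertainty(eta_draws):
    """Point forecast plus epistemic/aleatoric variances from posterior draws of eta_t."""
    y_hat = np.exp(np.asarray(eta_draws, dtype=float))  # conditional mean per draw
    point = y_hat.mean()        # point forecast: average of the conditional means
    epistemic = y_hat.var()     # spread across draws = uncertainty of the belief
    aleatoric = point           # Poisson variance equals its mean
    return point, epistemic, aleatoric
```

Note how the aleatoric part persists even when all draws agree (epistemic variance zero), since a Poisson observation is noisy regardless of how certain the mean is.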
Since every element of ŷ t corresponds to a predicted mean, there are multiple quantiles/percentiles of the Poisson distribution at time t, one for each element of ŷ t. To properly display this, we first compute the following order statistics for all t: the minimum ŷ (0)t := min{ŷ 1t, · · · , ŷ N t} and the maximum ŷ (N )t := max{ŷ 1t, · · · , ŷ N t}.
We then compute the prediction intervals at 95% (lower bound at the 2.5th percentile, upper bound at the 97.5th percentile), 75% (12.5th and 87.5th percentiles) and 50% (25th and 75th percentiles). The percentile/quantile q is computed as ŷ (q)t := F −1 (q | ŷ (0)t) for percentiles below the median (i.e., the 50th percentile), and ŷ (q)t := F −1 (q | ŷ (N )t) for percentiles above the median.

Moving on, the main challenge of rolling forecasting is the cost it entails for models with independent covariates, which in the literature reviewed here are often given as indices. It is costly in the sense that the covariates must also be available at the future time points to be forecast, which means these predictors need to be modeled as well. This is not a problem for models without independent covariates, such as the one used here, which depends only on its past values. To do this, the following iterative forecasting is used, where z t−l := [log(ŷ 1(t−l) + 1), · · · , log(ŷ N (t−l) + 1)] ⊤, ∀l ∈ {1, 2}. The main difference between Equation (28) and Equation (23) is that the former uses the forecast ŷ t as input to forecast ŷ t+1, which means y t is not available to forecast t + 1. Rolling forecasting should be taken with caution, however, since the model residuals can quickly accumulate over the length of the target horizon. Hence, the proposed model in this paper is only recommended for one-semester-ahead forecasting using Equation (23). Finally, η 0 evolved from forecasting the training dataset can also be used as the initial feedback mechanism for the test data, as opposed to simply fixing it to 1, as was done in estimating the posterior. Nonetheless, both specifications give almost the same results.
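The rolling recursion of Equation (28) can be sketched as follows, feeding the forecast means back in place of the unavailable observations. The posterior draws in the usage example are placeholders, and η 0 is fixed to 1 as in the estimation.

```python
import numpy as np

def rolling_forecast(draws, y_lag1, y_lag2, horizon):
    """draws: (N, 4) posterior draws of (kappa, alpha, phi1, phi2).
    y_lag1, y_lag2: last two observed counts. Returns mean forecasts per step."""
    draws = np.asarray(draws, dtype=float)
    N = draws.shape[0]
    eta = np.ones(N)                           # initial feedback eta_0 = 1
    z1 = np.full(N, np.log(y_lag1 + 1.0))      # log(Y_{t-1} + 1)
    z2 = np.full(N, np.log(y_lag2 + 1.0))      # log(Y_{t-2} + 1)
    means = []
    for _ in range(horizon):
        eta = draws[:, 0] + draws[:, 1] * eta + draws[:, 2] * z1 + draws[:, 3] * z2
        y_hat = np.exp(eta)                    # conditional means, one per draw
        means.append(y_hat.mean())
        z2, z1 = z1, np.log(y_hat + 1.0)       # forecasts replace the missing observations
    return np.array(means)
```

The residual-accumulation caveat above is visible here: after the first step, every input to the recursion is itself a forecast.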

Results and discussions
The results are presented in the following sections: Section 3.1 discusses the kernel density estimates (KDEs) of the sampling distributions of the marginal posteriors of the parameters; Section 3.2 presents the statistical tests on the convergence of the Markov chains; Section 3.3 discusses the sensitivity of the model with respect to η 0; and, Section 3.4 discusses the corresponding forecasts of the model.

Fig. 4 Kernel density estimates of the posteriors of the parameters: a κ; b α; c ϕ 1; d ϕ 2; e σ 1; f σ 2; and, g σ 3

Marginal posterior distributions
The statistics of the KDEs are given in Table 1, with the corresponding plots provided in Fig. 4. More specifically, Fig. 4a-g correspond to the KDEs of the parameters κ, α, ϕ 1, ϕ 2, σ 1, σ 2 and σ 3, respectively. Starting with the KDE of κ, the marginal distribution seems to follow a Gaussian density function centered around 1.57, with an estimated standard deviation of 0.21 (refer to the standard error (SE) in Table 1). Interestingly, the NUTS algorithm managed to explore the marginal of κ despite the initial prior being Uniform(0, 15); that is, the range initially went all the way up to 15, but has shrunk significantly after conditioning on the observations. Further, the parameter α, which was initially given a Gaussian prior centered at 0, has shifted slightly toward negative values, with an estimated mean of -0.22 and a standard error of 0.06. As for the KDEs of ϕ 1 and ϕ 2, the two have contrasting supports: ϕ 1 has a negative domain centered at -0.19, while ϕ 2 is on the positive side with mean 0.59. One can also observe that ϕ 1 and ϕ 2, like κ and α, have symmetric curves with skewness near 0. Finally, the three hyperparameters σ 1, σ 2 and σ 3 have marginal posteriors closely resembling an Inverse Gamma distribution.

Markov chain diagnostics
The Gelman-Rubin statistic (Gelman and Rubin, 1992; Brooks and Gelman, 1998) returns a real-valued shrink factor. If the shrink factor of the Markov chains of a given parameter is 1, then the between-variance and within-variance (see the discussion in Section 2.4) are equal, meaning the parallel Markov chains tested have all converged to a stationary distribution. For the proposed model, the Markov chains of the estimates of the parameters all have Gelman-Rubin statistics equal to 1.
To see this visually, Fig. 5 shows the evolution of the shrink factor across iterations. Towards the end, both the median and even the 97.5th percentile of the shrink factor are already at or near 1. Convergence is usually declared if the shrink factor is less than 1.1 (the "stringent standard," see Brooks and Gelman (1998)). Fig. 5 helps users specify the ideal burn-in. This study, however, settled on 10000 samples as burn-in, leaving 10000 remaining samples as the estimate of the marginal distribution. Note that these remaining 10000 samples account for 1 Markov chain only; therefore, across the 4 Markov chains run for each parameter, the total number of samples used to estimate the marginal distribution is 10000 × 4 = 40000.

Model sensitivity to η 0
The proposed model requires initialization of its feedback mechanism η t. It is therefore important to understand the sensitivity of the model with respect to η 0. To analyze this, Fig. 6 shows the variability of the model metrics with respect to the values of η 0 ∈ {1, 1.5, 2, 2.5, · · · , 15} on both the training and testing data. Fig. 6a-d correspond to the MSE (Mean Squared Error), RMSE (Root Mean Squared Error), MAE (Mean Absolute Error) and Correlation, respectively.

Fig. 5 Evolution of the Gelman-Rubin shrink factor for the test of stationarity for a κ; b α; c ϕ 1; d ϕ 2; e σ 1; f σ 2; and, g σ 3

One thing to emphasize is the high errors/low correlations at η 0 = 2.5 and η 0 = 3.0. Upon inspection, the model does perform poorly under these initializations; however, the corresponding marginal distributions estimated using MCMC are not stationary, that is, the NUTS algorithm could not properly explore the marginal posteriors for the said η 0 s. This is due to the seed fixed for NUTS across all η 0 ∈ {1, 1.5, 2, 2.5, · · · , 15} for reproducibility purposes. One can thus argue that, under a different seed, Turing.jl's NUTS could have positioned the first step of the Markov chain properly within the target posterior. In spite of that, the values of the four metrics for the other η 0 values vary little. In fact, excluding the values for η 0 = 2.5 and η 0 = 3.0, the coefficients of variation (CVs) of the sensitivity across metrics for the training dataset are 0.04 (MSE), 0.02 (RMSE) and 0.03 (MAE). Lastly, the sensitivity under the test data was computed using the post feedback mechanism η t, evolved from computing over the training data, as the initial value of η 0 for forecasting the test data. Although not shown here, this is not significantly different from, say, fixing it to η 0 = 1 (as initially used for estimating the posterior) for the validation data.
This therefore shows that, in terms of operationalizing the model, η0 can be set to any value, at least within the range studied here.
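The CV summary above can be sketched as follows. The per-η0 errors here are hypothetical placeholders (the paper reports only the resulting CVs), with the two non-stationary runs at η0 = 2.5 and 3.0 excluded as in the text.

```python
import numpy as np

def coefficient_of_variation(values):
    """CV = sample standard deviation divided by the mean."""
    values = np.asarray(values, dtype=float)
    return values.std(ddof=1) / values.mean()

# Hypothetical training MSEs over eta0 in {1, 1.5, ..., 15}; the runs at
# eta0 = 2.5 and 3.0 mimic the two non-stationary chains.
rng = np.random.default_rng(42)
eta0 = np.arange(1.0, 15.5, 0.5)
mse = 10.0 + rng.normal(0.0, 0.3, size=eta0.size)   # stable runs
mse[(eta0 == 2.5) | (eta0 == 3.0)] = 50.0           # the two outliers
keep = (eta0 != 2.5) & (eta0 != 3.0)
cv = coefficient_of_variation(mse[keep])
print(round(cv, 3))
```

A small CV after excluding the two outliers is what supports the claim that η0 can be set to any value in the studied range.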

Forecasts
The errors of the model are provided in Table 2. The corresponding forecasts are given in Fig. 7 for the training data and Fig. 8 for the test data. From both figures, the model captures the characteristics of the time series well, although it misses the significant spike in the second semester of 1964 in the training dataset. For the testing dataset, it also misses a few dips. These misses, however, are mostly within the prediction intervals, as emphasized by the aleatoric uncertainty. Further, the first two points show that the model is not confident in its forecasts; this is emphasized by the epistemic uncertainty, which is higher compared to other points, with the exception of the second semester of 2016, where the model also returned a pronounced epistemic uncertainty despite capturing the actual frequency in its forecast mean. Moreover, it is common practice in the TC count forecasting literature (for example, Zhang et al (2018) and McDonnell and Holbrook (2004b), among others) to also check the Pearson correlation between the forecast and the actual values. For the proposed model, the correlation is high since the model takes advantage of the semester-level data, where periodic patterns recur every year.
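The metrics reported in Table 2 are all standard; a compact sketch of how they are computed from paired forecast/actual series follows. The counts below are illustrative, not the paper's data, but they show how a strong seasonal cycle drives a high Pearson correlation even when individual forecasts miss by a few storms.

```python
import numpy as np

def forecast_metrics(actual, forecast):
    """MSE, RMSE, MAE and Pearson correlation between actual and forecast counts."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    err = actual - forecast
    mse = np.mean(err ** 2)
    return {
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MAE": np.mean(np.abs(err)),
        "Correlation": np.corrcoef(actual, forecast)[0, 1],
    }

# Illustrative semester-level counts: low first semesters, high second semesters.
actual = [3, 11, 4, 13, 2, 12, 5, 10]
forecast = [4, 10, 3, 11, 3, 13, 4, 11]
m = forecast_metrics(actual, forecast)
print({k: round(v, 3) for k, v in m.items()})
```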
As for accuracy, while there are differences in modeling setup, validation method, and granularity of the data, we compare the proposed model to that of Magee et al (2021) for the Philippines. In terms of MAE, their model performed well with an error of less than 2, which is expected since they used at least 5 independent covariates, whereas the proposed model uses only the historical data. In terms of correlation, however, the semester-level data used here benefited the model, which scores higher than Magee et al (2021), whose correlation peaks at around 0.8. In addition, the prediction intervals provided by the proposed model offer further insights that can be useful for decision-making.
Table 3 presents the details of the forecasts for the first and second semesters of 2022. The two-step-ahead forecast horizon is made possible by using the mean forecast of the first semester as input for forecasting the second semester, that is, a rolling forecast was done for the second semester. From the inference, about 4 TCs are expected in the first semester of 2022, although the count can be as low as 2 according to the 50% prediction interval. For the second semester, about 8 TCs are expected, but the count can be as high as 13 according to the 87.5th percentile, the upper limit of the 75% prediction interval. Lastly, it can be observed that, for the second semester, the model has greater variability in its epistemic uncertainty, with a range of 2.98 (9.5 − 6.52) compared to the first semester's range of 1.31. This is expected since the forecast for the second semester is based on a rolling forecast, which accumulates the error from the previous forecast. Nevertheless, the prediction intervals provided by the aleatoric percentiles give more information about the expected TC counts.
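The rolling forecast mechanism can be sketched under one common parameterization of the log-linear Poisson autoregression of Fokianos and Tjøstheim (2011): ηt = d + a·ηt−1 + b·log(1 + Yt−1), λt = exp(ηt). This is a simplified skeleton; the coefficients d, a, b below are illustrative, not the paper's posterior means, and the seasonal components of the full model are omitted.

```python
import numpy as np

def poisar_roll_forecast(y_last, eta_last, d, a, b, horizon=2):
    """Multi-step mean forecasts for a log-linear Poisson AR model:

        eta_t = d + a * eta_{t-1} + b * log(1 + Y_{t-1}),  lambda_t = exp(eta_t)

    Beyond one step ahead, the unobserved count Y_t is replaced by the mean
    forecast lambda_t (a rolling forecast, as done for the second semester).
    """
    eta, y = float(eta_last), float(y_last)
    means = []
    for _ in range(horizon):
        eta = d + a * eta + b * np.log1p(y)
        lam = np.exp(eta)
        means.append(lam)
        y = lam                  # feed the mean forecast back in as the next input
    return means

# Hypothetical inputs and coefficients, for illustration only.
h1, h2 = poisar_roll_forecast(y_last=9, eta_last=1.5, d=0.2, a=0.4, b=0.3)
print(round(h1, 2), round(h2, 2))
```

Because the second step conditions on a forecast rather than an observation, its error (and hence the epistemic uncertainty reported in Table 3) accumulates that of the first step.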

Conclusion and future works
The aim of the study was to explore the use of the PoisAR model, in the context of Bayesian inference, for modeling the Philippine storm frequencies. The results of the inference suggest that the proposed model, while simple compared to those in the wider TC count literature, describes the characteristics of the Philippine TC frequencies well. In particular, the RMSE of the proposed model is 3.18 for training (80% of the data) and 3.12 for testing (20% of the data). Further, actual frequencies not captured by the forecast mean are mostly within the prediction intervals. Moreover, the availability of both epistemic and aleatoric uncertainties gives insights into the confidence of the model, which are useful for policy-makers. Lastly, the absence of covariates means that the model is easy to maintain and operationalize. We demonstrated this by forecasting 2022, where a rolling forecast was done to predict the TC count for the second semester: the model inferred about 4 TCs for the first semester of 2022 and about 8 TCs for the second semester. There is still room for improvement in the proposed model. The first obvious extension is the use of independent covariates to capture variations that the model cannot explain; second, the exploration of a non-linear version of PoisAR; and finally, the exploration of multiple-horizon forecasting instead of one-step-ahead forecasts.