Combining and comparing regional epidemic dynamics in Italy: Bayesian meta-analysis of compartmental models and model assessment via Global Sensitivity Analysis

During autumn 2020, Italy faced a second important SARS-CoV-2 epidemic wave. We explored the time pattern of the instantaneous reproductive number, R 0 ( t ) , and estimated the prevalence of infections by region from August to December calibrating SIRD models on COVID19-related deaths, ﬁxing at values from literature Infection Fatality Rate (IFR) and infection duration. A Global Sensitivity Analysis (GSA) was performed on the regional SIRD models. Then, we used Bayesian meta-analysis and meta-regression to combine and compare the regional results and investigate their heterogeneity. The meta-analytic R 0 ( t ) curves were similar in the Northern and Central regions, while a less peaked curve was estimated for the South. The maximum R 0 ( t ) ranged from 2.61 (North) to 2.15 (South) with an increase following school reopening and a decline at the end of October. Average temperature, urbanization, characteristics of family medicine and health care system, economic dynamism, and use of public transport could partly explain the regional heterogeneity. The GSA indicated the robustness of the regional R 0 ( t ) curves to different assumptions on IFR. The infectious period turned out to have a key role in determining the model results, but without compromising between-region comparisons.


Introduction
After the first SARS-CoV-2 outbreak during spring 2020, Italy faced a second epidemic wave during the autumn. In order to reduce the rate of contagion and prevent the collapse of the health care system, the Italian government introduced regional-level measures of social distancing of different degrees, starting from November 6th, 2020. Among others, a national curfew from 10 pm to 5 am was implemented, and the regions were weekly classified as low risk, medium risk, and high risk zones (yellow, orange, and red zones, respectively), according to indicators centrally calculated by the Istituto Superiore di Sanità (ISS). The timing and the degree of the containment measures likely influenced the epidemic dynamics 1 , but also socio-economic, demographic characteristics of the population, and environmental factors may have had a role in determining and moderating the level of contagion and its pattern over time.
Several studies investigated different aspects related to the epidemic dynamics during the first and/or the second waves in Italy by using compartmental models [2][3][4][5] or different approaches [6][7][8] . Some studies performed descriptive comparisons among regions (mostly during the first wave) [9][10][11][12] or between the first and the second epidemic wave 13 . Others explored possible determinants of the heterogeneity in COVID19 incidence and mortality across the country, focusing on the beginning of the emergency in the spring 2020 [14][15][16] .
The aim of our study was to describe the epidemic dynamics in Italy from August 1st, 2020 to the end of the same year, in order to obtain an overall picture of the second wave in the country with a special focus on the contagion spread, highlighting and investigating the heterogeneity among regions. To this end, we adopted a two-step procedure.
First, we estimated a compartmental model of SIRD-type for each region in order to investigate the trend of the contagions over time. Compartmental models, which take their name from the fundamental assumption that at each time during the epidemic the population is divided into homogeneous groups or "compartments", are widely used in the literature for forecasting and inference purposes, with examples also in COVID19 research 17 . When the interest is to make inference, the model parameters are estimated minimizing the distance between observed data and model predictions (calibration). In our study, we calibrated the regional SIRD models on the daily number of notified COVID19-related deaths, made publicly available by the Protezione Civile 18 . Compared with the number of notified cases, mortality data -although possibly subject to some notification delays -can be considered as more reliable and likely less influenced by the capacity of the health system to detect infections. The calibration procedure allowed us to investigate the behavior of the contagion over time in terms of instantaneous reproductive number R 0 (t), which quantifies the average number of secondary infections caused by a single infected individual over time, as well as in terms of the number of infected individuals, which can exceed, even by far, the number of the notified ones.
At the second step, we combined and compared the regional results by using Bayesian multivariate and univariate metaanalytic techniques, and we investigated through meta-regressions the possible role of region-level variables in explaining between-region observed discrepancies in terms of contagion spread. In the regional SIRD models, we set the infection fatality rate (IFR) and the average time from infection onset to infection resolution to plausible values arising from the literature, in order to assure parameter identifiability. Treating part of the parameters in the compartmental models as fixed is a common practice that poses the problem, usually not addressed, of evaluating the robustness of the results when different values are specified for these unestimated parameters 19 . In this paper, we performed a sensitivity analysis on the estimated R 0 (t) curve and prevalence of infections, by changing one at a time the values of IFR and infection duration in the SIRD models. Additionally, we implemented a Global Sensitivity Analysis (GSA) procedure 20 to quantify and characterize the uncertainty around the calibration results that propagated from the uncertainty around the values of those parameters that were not the object of the inference. Despite GSA is not yet widely used in epidemiology, it appears as one of the recommendations in the manifesto by Saltelli and colleagues, which offers a critical view of modeling in time of pandemic 21 .
Finally, as an ancillary result, we obtained an evaluation of the submerged fraction of contagion and an indirect assessment of the admissible IFR values, by comparing the number of infections estimated by the regional SIRD models, which by definition includes both notified and not notified cases, with the observed number of notified positives reported by the Protezione Civile 18 .

Data
For our analyses, we used the national database on the evolution of the COVID19 pandemic, made available on a daily basis by the Protezione Civile 18 . This database collects the numbers of notified positive, hospitalized, recovered and deceased subjects by region. In the estimation phase, we used the daily number of COVID19-related deaths from August 1st, 2020 to January 14th, 2021 for all Italian regions, and the number of notified circulating infections on July 31st. The daily number of new infections and the daily number of circulating infections were used in a descriptive way to indirectly assess the submerged fraction of contagion. Although provided separately, we merged data of the two autonomous provinces of Trentino Alto Adige (Bolzano-Alto Adige and Trento), in order to obtain a more stable estimate for the entire region.We removed from the death counts of Emilia-Romagna 154 cases that, although happened during spring 2020, have been added to the data set on August 15th. In the map of Supplementary Figure S1, we represented, for each Italian region, the population size and the total number of COVID19-related deaths notified during the study period (see also Supplementary Table S1).
We collected socio-demographic and economic indicators measured at the regional level for 2020 or for the last available year from the website of the Italian National Institute of Statistics (ISTAT) 22 (see Section 2.3 for further details). We calculated the regional average temperatures during the study period from the daily temperature measurements reported on the website www.ilmeteo.it.

Regional SIRD models
For each region, we adopted a compartmental model of SIRD type, described by the following system of differential equations: where S(t), I(t), R(t) and D(t) are the sizes of the Susceptible, Infected, Recovered and Deceased compartments at time t 23 . For each region, we fixed I(0) to the number of notified circulating infections on July 31st (from now on denoted by i 0 ) as reported by the Protezione Civile 18 . We approximated the number of susceptible people at time 0 (August 1st) as S(0) = N − I(0), with N regional population size at January 1st 2020 (Permanent Census of Population 22 ). We set D(0) = 0 and R(0) = 0, thus starting to count deaths and recoveries from August 1st. The parameters α and δ are the transition rates from the compartment of the infected individuals to the compartments of the recovered and deceased ones, respectively. They depend on the IFR, p, on the average times from infection to death, T D , and from infection to recovery, T R . Having set T D = T R = T , the following relationships hold: α = 1−p T , δ = p T 24 . The infection rate β (t) is related to the instantaneous reproductive number R 0 (t), modelled as time-dependent, as follows: At the beginning of the epidemic, R 0 (t) corresponds to the basic reproductive number, defined as the number of secondary infections generated by the first infected individual in the population. R 0 (t) is also related to the effective reproductive number, R t = R 0 (t)S(t)/S(0) 25 , that measures the actual transmission at a specific time accounting for the natural depletion of susceptible individuals as the contagion spreads. R t departs from R 0 (t) only at an advanced stage of the epidemic . To get a flexible estimate of R 0 (t), we modelled it through a natural cubic regression spline 26 , with 4 internal equi-spaced knots (6 degrees of freedom): R 0 (t) = s(t; ϑ ϑ ϑ ), where ϑ ϑ ϑ is a vector of unknown coefficients, to be estimated.
We assured parameter identifiability by fixing in the model T = 14 days and p = 1.15%. The value p = 1.15% is the IFR estimate reported for the upper-income countries by the Imperial College COVID19 response team 27 . It is also consistent with the value of 1.14% estimated for Italy by the Italian Institute for International Political Studies 28 and used in a previous paper by the authors 29 . Regarding T , the value of 14 days is in line with both the median time from symptoms onset to death reported by ISS for Italy (12 days) 30 and the estimated average time from infection onset to recovery of 13.4 days arisen from a recent meta-analysis 31 . We explored these two choices explored via sensitivity analysis and GSA.
In the estimation phase, we discretized the differential equations in (1) and evaluated the size of the compartments by considering unit time intervals (details in Supplementary Section S1). This allowed us to estimate the model minimizing over ϑ ϑ ϑ the following sum of squares: where t = 1 corresponds to August 1st 2020, t = K to January 14th 2021, and D obs (t) denotes the cumulative number of deaths observed from August 1st. Notice that deaths observed during the first two weeks of January provide information about the contagion at the end of the year, given that deaths are assumed to happen on average T = 14 days after the disease onset. We performed the minimization of (3) through the auglag function of the nloptr package of R software (http: //ab-initio.mit.edu/nlopt), constraining the function R 0 (t) to positive values. We ran the estimation algorithm 100 times, using different initial values sampled from a multivariate grid defined on the values of ϑ ϑ ϑ . Among the 100 parameters estimates thus obtained, we selected the estimateθ ϑ ϑ associated to the lowest value of Q(·). We implemented a parametric bootstrap procedure, in order to quantify the sampling variability around the estimates. Following a consolidated procedure 32, 33 , we assumed a Negative Binomial distribution on the daily increments of the estimated time series D(t;θ ϑ ϑ ) and generated 500 bootstrap samples to be used as observed time series in as many calibrations. The 90% confidence intervals or bands for the quantities of interest have been calculated as the 5th and 95th percentiles of the bootstrap distributions (see Supplementary Section S2 for further details).

Sensitivity analyses
To investigate how changes in the values of p and T affected the estimates of R 0 (t) and the shapes of the epidemiological curves arising from the regional SIRD models, we repeated the analyses for p = 1.79%, 0.78% and T = 10, 18. The values p = 0.78% and p = 1.79% are the 95% confidence interval bounds of the IFR estimate in Brazeau et al. 27 . We performed an additional analysis fixing p = 0.5%, as an extreme lower bound for the IFR. The value T = 10 is consistent with the 95% lower bound of the estimated mean time of Byrne et al. 31 , while T = 18 is the estimated mean duration of the maximal infectious period from the same study.
Then, we went beyond the previous one-factor-at-a-time sensitivity analysis by performing a GSA, calculating the Sobol's variance indexes 20 . Given a function that relates inputs to outputs, the GSA explores how the outputs vary as the inputs change, to determine the inputs most contributing to the outputs' behavior (factor prioritization), finding non-influential inputs (model simplification), and investigating interaction effects between inputs. This can be done by propagating the uncertainty around the inputs to the outputs via MC simulations, then using Sobol's decomposition of the variance of each output thus obtained, and apportioning it among the different inputs 34 (see Supplementary Section S3). The contribution of each input to the output variance can be quantified by computing first order indexes (and superior order indexes) and total variance indexes. In particular, the first order index of a given input represents the proportion of the output variance which is due to the main effect of the input (i.e. the first-order effect), while the total effect index represents the proportion of the output variance which is due to the main effect of the input and all its interactions with the other inputs (the higher-order effects).
In our application, we considered as inputs the fixed parameters of the SIRD model (p, T R , T D , i 0 ) and as outputs the parameters estimated by calibration, as well as derived quantities, such as the maximum and minimum R 0 (t), the peak of infections, and the dates at which they occurred, together with the first date when R 0 (t) reached the threshold of 1 after the maximum infection peak. The model was the calibration algorithm, given the observed data.
We calculated the first order and total variance indexes of each input on each output, relying on the results of 5'040 MC simulations, that in our case corresponded to 5'040 calibrations of the SIRD model. Each calibration was performed under a different combination of inputs, obtained by sampling p from the continuous uniform distribution U [0.0078, 0.0179] 27 , and the transition times and I(0) from the following discrete uniform distributions: For each input, the aggregate total variance index on the vector ϑ ϑ ϑ was calculated as a weighted average of the total variance indexes of the single spline coefficients 35 .
Given the huge computational effort required by the GSA implementation, we performed it on a virtual machine with 16 vCPU only for one region (Tuscany). However, we expect that the results can be generalized to the other regions. The GSA was conducted by using the soboljansen and the sobolMultOut functions of the R package sensitivity (https: //cran.r-project.org/web/packages/sensitivity/sensitivity.pdf).

Bayesian meta-analyses
We used a multivariate Bayesian random effects meta-analysis model to combine the estimated region-specific R 0 (t) curves, accounting for the heterogeneity among regions, and to combine the regional estimates of the monthly average prevalence of infection from September to December.
Additionally, we conducted univariate Bayesian random effects meta-analyses on the following quantities obtained from the regional R 0 (t) curves: • average value of R 0 (t) from October 1st to December 31st; • maximum value of R 0 (t) during the study period; • variation of R 0 (t) during the 4 weeks following November 6th, date of the introduction of social distancing measures by the central government, with a classification of the regions according to three risk levels; • variation of R 0 (t) during the 4 weeks following the beginning of the school.
For the variation of R 0 (t) during the 4 weeks following November 6th, we performed also meta-analyses by intensity assigned in the first week of the region-specific measures. Univariate and multivariate meta-analyses were conducted on all regions and separately by geographical area: Southern regions (Basilicata, Calabria, Campania, Molise, Puglia, Sardegna, Sicilia), Central regions (Abruzzo, Lazio, Marche, Toscana, Umbria), Northern regions (Emilia Romagna, Friuli Venezia Giulia, Liguria, Lombardia, Piemonte, Trentino, Valle d'Aosta, Veneto). Finally, Bayesian meta-regression analyses were performed on the average R 0 (t) from October 1st to December 31st in order to investigate possible sources of between-region heterogeneity. Specifically, we evaluated as possible effect modifiers the following variables measured at the regional level (for 2020 or for the last available year): percentage of people with at least two chronic diseases in the population, number of general practitioners per 10'000 residents, number of pediatricians per 10'000 children, number of hospitals per 1'000 residents, and percentage of public hospitals over the total number of hospitals, aging index (number of over 65 per 100 individuals younger than 15), mean age of the population, average size of households, percentage of people with an academic degree per 10'000 residents, schooling rate (percentage of the individuals aged 20-24 who have at least a high school diploma), percentage of children attending kindergartens, percentage of workers using public transports to go to work, percentage of people (18)(19)(20)(21)(22)(23)(24)(25)(26)(27)(28)(29)(30)(31)(32)(33)(34) using public transports to go to school, poverty index (percentage of people living in households below the poverty threshold), employment rate (percentage of employed persons 15-64), tourism rate (days of presence of tourists during the year per inhabitant), total energy consumption of industries and manufactures, percentage of residents living in high-urbanization areas, average temperature from October to December 2020 (see Supplementary Table S2 for details on meta-regressors and related references). We specified separate meta-regression models each of which included only one meta-regressor.
In all analyses, non-informative priors were assumed on the model hyperparameters. We get a sample from the joint posterior distribution of the hyperparameters via MCMC algorithms. A detailed description of the models and the software used for the analysis is reported in Supplementary Section S4.

4/17
Code availability R code used for calibration of the regional SIRD models and for sensitivity analysis and GSA are available from the authors upon request.

Results of the main analysis
The fit of the SIRD model was good for all regions, with the expected cumulative deaths close to the observed ones (Supplementary Figure S2).

Sep
Nov Jan In Figure 1 we reported the estimated curves of R 0 (t) arising from the regional SIRD models with their 90% point-wise confidence bands. For the regions where the first COVID19-related death was observed after the second week of August, the curve was shown starting 14 days before the first death, because of the extremely poor information on the previous period. Moreover, we highlighted in grey the last 14 days of the study period, to stress that the estimates for that period were quite unreliable, having assumed an average time from infection onset to death of two weeks. This is also the reason why, in order to draw valid conclusions about R 0 (t) until the end of the year, we calibrated the model up to January 14th, 2021. The pattern of R 0 (t) was heterogeneous among regions, but, with few exceptions, a mid/late-October peak was visible as the culmination of growth started in early/mid September. In several regions, R 0 (t) declined and then increased again starting from mid December. Figure 2 summarizes these regional results showing the overall meta-analytic Italian R 0 (t) curve and the meta-analytic curves by geographical area, obtained from Bayesian multivariate meta-analyses (pointwise posterior mean and 90% credible intervals). It is quite evident that during the second epidemic wave the shape of R 0 (t) was similar in the Northern and Central regions, with an initial increase and a clear peak around middle October. On the contrary, the overall curve for Southern regions was flatter, without the initial decrease. The combination of all the regional curves leads to an overall R 0 (t) curve (left plot) similar to the one obtained for Northern and Central regions, even though with tighter credible bands.
In Figure 3 (A), we reported the regional curves describing the prevalence of infections per 1 ′ 000 inhabitants over time, arising from the SIRD models (see also Supplementary Figures S3 and S4 and Supplementary Table S4). These estimates are inclusive of detected and undetected cases, thus in principle they should be an upper bound for the number of notified cases provided by Protezione Civile (see Section 3.3 for a discussion about this point). Valle d'Aosta exhibited the largest peak of prevalence, reached in the first half of November, with more than 50 circulating infections every 1'000 inhabitants. It was followed -though with less than half the value of its prevalence -by Friuli Venezia Giulia, Veneto, Piemonte, Lombardia, Trentino Alto Adige and then by Liguria and Emilia Romagna. The lowest prevalence was estimated for Calabria. Liguria and Valle d'Aosta reached the peak of circulating infections before the other regions, while Veneto was the last one. The overall estimates of the average monthly prevalence arising from the Bayesian multivariate meta-analysis (Figure 3, B) highlight how the prevalence of infections in the Northern regions was larger than in the Central and Southern ones, exceeding on average 15 cases every 1'000 inhabitants during the month of November.  Table 1 Results of the Bayesian meta-analyses (posterior mean of the quantity of interest and posterior median of the I 2 index, with associated 90% credible intervals) conducted on: average value of R 0 (t) from October to December, maximum value of R 0 (t) over the study period, changes in R 0 (t) after school re-opening and after the introduction of national restrictions on November 6th. p = 1.15%, T = 14 days. Table 1 summarizes the results of the Bayesian meta-analyses on the quantities derived from the regional R 0 (t) curves: average value of R 0 (t) from October to December, maximum value of R 0 (t) over the study period, changes in R 0 (t) arising in four weeks after school re-opening and after the introduction of national restrictions on November 6th. For all quantities, the heterogeneity among regions was high, with the lowest I 2 estimated among the Southern regions. The average level of R 0 (t), as well as its maximum, was higher in the Central and Northern regions than in the Southern ones. An overall increase of R 0 (t) equal to 0.50 was estimated during the first 4 weeks after school re-opening in September (the dates of school re-opening in every region are reported in Supplementary Table S3). This increase was lower in the South than in the Central and Northern regions.

Results of univariate meta-analyses and meta-regressions
We found also evidence of an overall decline of R 0 (t) equal to 0.75 during the four weeks following the introduction of the restrictions on November 6th. This decline was similar across geographical areas (Table 1), but appeared to be associated with the strength of the measures adopted at a regional level, as shown by the posterior distributions reported in Figure 4 (A). In the four regions initially classified as at high risk (red regions: Calabria, Lombardia, Piemonte, Valle d'Aosta), where stronger restrictions were immediately adopted, the decline was steeper than in the two classified as at medium risk (orange regions: Puglia, Sicilia) and in the remaining 14 low risk regions (yellow), subject to lighter measures. Figure 4 (B) shows the results of the meta-regressions on the average value of R 0 (t) from October to December, in terms of change in the average value of R 0 (t) associated with an interquartile range variation in the meta-regressor. The change was multiplied by 100 so that the reported values indicated the changes in the number of secondary infections derived from 100 infected individuals at time t. We considered as relevant a meta-regressor if the posterior probability that the change was larger than 5 or lower than -5 exceeded 50%. According to this criterion, it was evident a marginal positive association of the average R 0 (t) with employment rate, use of kindergartens, tourism rate and, to a less extent, with population in high density areas, schooling index, and use of public transport to go to school. A negative association was found with number of practitioners and pediatricians, poverty index, temperature, prevalence of people with at least two chronic diseases, and, to a less extent, with the number of hospitals and the proportion of the public ones. The results of the meta-regressions are reported also in Supplementary Table S5. The shape of the regional R 0 (t) curves appeared quite similar under different IFR scenarios when T = 14, as shown by the comparison of the overall meta-analytic curves obtained by fixing the IFR to different values in the regional SIRD models ( Figure 5, A, left). Some discrepancy was observed only at the beginning of the study period, when larger values of R 0 (t) were obtained in correspondence of smaller values of p. The regional curves obtained changing p are shown in Supplementary  Figures S5-S7. Figure 5 (A, right) compares the overall meta-analytic curve obtained by assuming different values for T , having fixed p = 1.15%. A shorter infection duration corresponded to a less peaked curve, but the date when R 0 (t) was maximum and the date when it crossed the value of 1 after the introduction of containment measures were preserved.
In the lower panel of Figure 5, we reported the meta-analytic estimates of the monthly averaged prevalence of infections per 1'000 inhabitants from September to December, when changing p, fixed T (second row), and when changing T , fixed p (third row). The estimated prevalence decreased consistently with the value of p and increased with the values of T . Notice that the number of prevalent infections when T = 10 and p = 1.15% and when T = 14 and p = 1.79% resembled one another. The same happened with the pairs of parameters (T = 14, p = 0.78%) and (T = 18, p = 1.15%). This suggests that increasing (decreasing) p produces the same effect of decreasing (increasing) T , when the other parameter is fixed, and that a global evaluation of the impact of these quantities on the results is required. Table 2 shows the total variance indexes, including the aggregate index for the vector ϑ ϑ ϑ , and the first order indexes obtained from the GSA procedure. As suggested by the magnitude of the total variance indexes, changes of the fixed parameters p, T D , and i 0 had a very low impact on the coefficients vector ϑ ϑ ϑ . On the contrary, T R exhibited the greatest aggregated total 9/17  Table 2 Total variance indexes and first order indexes of each model input (by row) on the coefficients of the R 0 (t) regression spline, maximum and minimum values of R 0 (t) with the corresponding dates, date in which R 0 (t) crossed the value one, infection peak with the corresponding date (by column).
variance index (0.72), proving to be an input that contributes very much to the R 0 (t) curve as a whole. Regarding the impact on the derived quantities, T R was still the most relevant input (the only relevant on the maximum values of R 0 (t)), even if non-negligible indexes were found also for the other inputs, especially on the date of occurrence of the maximum R 0 (t), on the date when R 0 (t) crossed the threshold of 1 and on the date of the peak of infections. In order to correctly interpret these results, one should however consider that some outputs could vary within a small range of values as the inputs change. Hence, the corresponding total variance indexes, although high, could actually derive from the apportionment of a small total variance. This is in itself indicative of overall robustness of these outputs to inputs perturbations. For instance (Supplementary Figure S8 and Table S5), the date of the infection peak (coefficient of variation 0.002) was almost unaffected by variations of the model inputs as well as the date in which R 0 (t) crossed the value of 1 (coefficient of variation 0.01). Analogously ϑ 4 had the most dispersed distribution among the other coefficients, suggesting its greater sensitivity to changes of the inputs values. This larger dispersion was considered in the calculation of the aggregate total variance index, which averages the single total variance indexes with weights proportional to the ϑ i variances 35 . While in the regional SIRD models we constrained T D = T R = T , the GSA was conducted allowing T D to vary independently from T R . This choice emphasized that T D was less relevant than T R on output variability. Therefore, we can conclude that, when T R is properly set, misspecification of T D has a negligible effect. In such a case, forcing T D to be equal to T R , as in our regional analysis, produces a simplification of the SIRD model without inducing relevant variations in the outputs.
Finally, being the total variance indexes for ϑ ϑ ϑ only slightly higher than the corresponding first order indexes, it was evident that interactions among inputs were not relevant on the estimated R 0 (t) curve. On the contrary, interactions were a relevant source of variability in the derived quantities. The fact that the first order indexes on the derived quantities were sometimes negative, in most cases close to zero, was due to a poor MC approximation. Negative indexes are not unusual when the contribution of the inputs is negligible and can be avoided by increasing the number of MC simulations 20 .

Submerged infections and role of the IFR
The number of new notified infections estimated by the SIRD model should be interpreted as inclusive of the undetected cases, thus one would expect that it was an upper bound for the observed number of new notified cases. However, when fixing p = 1.15%, for some regions the observed number of new notified infections sometimes exceeded the number of new infections estimated by calibrating the SIRD model on the observed COVID19-related deaths (Supplementary Figure S3). This paradoxical result could be partly due to systematic errors (e.g. notifications concentrated on particular days of the week) and delays in the notification of cases, but also suggests that an IFR lower than 1.15% could be more consistent with the observed infection dynamics.
As an example, in Figure 6 we compared the predicted number of new infections for Campania and Liguria under different IFR scenarios. As the value assumed for the IFR increased, the number of new infections estimated by the model became smaller, to the extent of being, in the case of Campania, lower than the observed number of new notified infections when p = 1.15% and p = 1.79%. On the contrary, for Liguria the estimated curves were overall consistent with the observed number of new notified infections, regardless of the IFR used in the analysis. The only exception was for p = 1.79%, when the observed number of new cases notified during August and September slightly exceeded the model predictions.

Discussion
In this paper, we used data publicly available to study the SARS-CoV-2 epidemic dynamics in Italy and to investigate regional heterogeneity. We conducted our analyses on the time window corresponding to the second epidemic wave in Italy, which did not include the first months of 2021, when an extensive vaccination campaign began in the country. This permitted us to specify regional models that did not account for the immunity progressively acquired by part of the population. A special focus was on the instantaneous reproductive number, R 0 (t), that we modeled through a regression spline in order to capture how the strength of contagion varied over time. R 0 (t) describes the speed of the contagion and depends directly or indirectly on countless factors, including virus infectiousness, socio-demographic and economic characteristics of the population, the efficacy of contact-tracing procedures, restriction policies. Being directly related to the number of contacts across the population, it is very sensitive to the introduction of social distancing measures. Alternative methods exist to estimate R 0 (t) 25 ; the main advantage of estimating R 0 (t) through compartmental models is that they allow quantifying also the number of incident and prevalent infections over time, and, more in general, the size of all defined compartments. Additionally, they permit to estimate R 0 (t) relying on the observed time series of daily COVID19-related deaths, which, in our context, was the most reliable quantity among those reported in the Protezione Civile database. Even though we cannot exclude a certain amount of mortality under-reporting, this was probably negligible during the second epidemic wave.

Results of the main analysis
Our results indicated that in Italy, during the second SARS-CoV-2 epidemic wave, the instantaneous reproductive number changed over time heterogeneously across regions, but with some important common elements. Among them, the increase of R 0 (t) during September, stronger in the Northern and Central regions than in Southern ones, which led to a dangerously high level of contagion in mid-October, followed by a decline of the contagion spread. We cannot exclude that this pattern was due to a phenomenon of seasonal variation typical of respiratory infections 36 , but it was also suggestive of a possible role of school reopening in September in amplifying infection spreading, as argued also by other authors 37 . Although contagion within school has been sometimes considered as not higher than in other contexts 38,39 , this issue is still debated 40 . In particular, reopening schools may have facilitated contagions through intensifying the number and the duration of interpersonal contacts within and outside schools, especially with the use of the public transport 1 . This seems to be confirmed by the meta-regression results: the regions where the percentage of students using public transport was higher tended to be characterized by higher average levels of R 0 (t). Also meteorological conditions (temperature, humidity and UV radiations) may have had an impact on contagion spread through modulating SARS-CoV-2 infectiousness 36,[41][42][43] . Additionally, with the arrival of the autumn season, recreational and sport activities moved, as usual, to closed places, with a consequent increase of the contagion risk 44,45 . The possible role of ambient temperature as a moderator of the contagion was suggested also by the meta-regression: higher average regional temperatures were associated with lower values of the average R 0 (t) from October to December. A factor that could have contributed to reach the maximum of the R 0 (t) curve in October was also the full recovery of many work activities after the summer holidays. Finally, at the end of September (20th-21st) regional elections took place in seven regions (Toscana, Marche, Campania, Puglia, Veneto, Liguria, Valle d'Aosta) and administrative elections took place in 1'184 municipalities around the country. The fact that, as per tradition, voting stations have been set up in school buildings during the weekend and that people moved within and between regions to reach their places of residence in order to vote should not be a priori ruled out as a possible source of contagion amplification, as documented elsewhere 46,47 .
The decline of the R 0 (t) curves after the October peak was likely related to the restrictions progressively put in place by the central government. This decline tended to be steeper in the regions earlier classified as high-risk zones, where stronger restrictions were immediately introduced. However, it should be also noticed that, especially in the Northern and Central regions, the decline apparently started before the introduction of restrictions on November 6th. This could be indicative of the efficacy of local measures introduced in some regions before the national ones or reflect spontaneous changes in behavior of the population in the face of worrying levels of contagion.
At the end of November, the instantaneous reproductive number was below or very close to the threshold of 1 everywhere. Then, a new increase was observed during the second half of December. This was particularly evident in the Southern regions, probably due to travels within and between regions related to the Christmas holidays.
The "flat" R 0 (t) curve estimated for the Southern regions, with a maximum value slightly larger than 2, versus a maximum close to 3 in the rest of the country, was indicative of contagion dynamics in the South apparently different, possibly related to specific environmental, demographic and socio-economic features or a greater capacity of contagion control. Focusing on the average level of R 0 (t), we tried to explain through meta-regression part of the observed between-region heterogeneity. Our meta-regression results should not be used to draw conclusions about the existence of causal relationships linking the strength of contagion with the regional features, but suggest hypotheses that should be investigated through ad-hoc studies. The positive association of the average value of R 0 (t) with the employment rate and the schooling rate, and its negative association with the poverty index seem to indicate that population lifestyles typical of richer socio-economic contexts may have helped the virus spread. Also, population density, here measured in terms of percentage of people residing in high-density areas, was one of the possible predictors of a higher instantaneous reproductive number, as well as the tourism attractiveness of the region. However, in interpreting this latter result, one should consider that tourism flows have undergone considerable changes due to the COVID19 emergency and that in our meta-regression we included the tourism rate of the year 2018, the last available. As already discussed, our meta-regressions highlighted the potential role of school-related commuting in enhancing the contagion. We also found that the percentage of children attending kindergarten had a positive association with the R 0 (t) level. This result could simply reflect the fact that taking advantage of the kindergarten service is a proxy of the employment level in the region or could be indicative that kindergarten themselves had a role in spreading infections. Notice that kindergartens have been usually kept open during the second epidemic wave. Interestingly, we found that the average R 0 (t) was lower when the number of family physicians and pediatricians per inhabitants, as well as the proportion of public health care institutes, were higher. These results could indicate the important role of family medicine and public sanitary service in preventing contagion. In interpreting the meta-regression results, we should account that some of the meta-regressors followed a North-South gradient so that it was difficult to disentangle their role from that of possibly unobserved factors that varied with the latitude. Additionally, some observed associations could reflect a reverse causality phenomenon. Finally, we did not focus on environmental variables other than the regional average temperature in the study period. Other meteorological factors, e.g. solar radiation, humidity, and air pollution have been suggested as affecting the contagion rate as well as COVID19-related mortality or hospitalization, but exploring these complex relationships was outside the scope of our paper 48-50 .

Sensitivity analyses
Compartmental models can be very complex when many compartments are defined and many transitions between them are allowed. Complexity goes hand in hand with an increase in the number of unknown parameters that cannot be estimated due to structural and practical identifiability problems. Most of the complicated compartmental models proposed in the literature rely on fixing the values of a large number of parameters without even studying the impact on the results of those arbitrarily chosen values 21,51 .
In this paper, we performed both one-factor-at-time sensitivity analysis and a GSA. The two sensitivity analyses had a different interpretation and their results were not directly comparable. In fact, with the one-factor-at-time sensitivity analysis, we ignored how the inputs interacted and inspected the model outputs only for a few values of IFR and infection duration. On the contrary, with the GSA, we explored the whole space of the unestimated parameters of the SIRD modelThe GSA indicated that there was interaction among the inputs and that there were no negligible inputs. The initial number of infected i 0 resulted to be the less influential fixed parameter of the SIRD model. Notice that we were interested in evaluating the relevance of i 0 since in the main analysis we forced it to be equal to the notified number of circulating infections on July 31st, thus assuming that the regional screening and tracing systems were initially able to detect all new infections. A second important result was related to the model simplification that we adopted in the regional SIRD models, assuming T D = T R = T : this simplification was admissible if the infection duration in the SIRD was set to plausible values for T R . On the other hand, T R resulted to be the most influential parameter, suggesting that an accurate knowledge of T R would be needed for better predictions and understanding of the pandemic. Unfortunately, improving the empirical evidence on T R is not trivial, because of its complex nature (T R is both the average time from infection onset to recovery and the average time of infectiousness) and because of its dependence on undetected infections. Finally, the GSA confirmed the robustness of the estimated R 0 (t) curve to the value assigned to p, already documented elsewhere 29 . Under the reasonable assumption that the average infection time was homogeneous across the regions (the infection time is mainly related to virus and disease characteristics) and considering that changing the value of T the regional R 0 (t) curves inflated/deflated but preserved their overall shape, this robustness should assure that the conclusions drawn from the comparison of the regional R 0 (t) curves were not affected by specific choices of IFR and T .

Submerged infections and role of the IFR
Our model produced regional estimates of the number of circulating infections, inclusive of the submerged fraction of the contagion. This quantity, combined with R 0 (t), determines the number of new infections, and it is useful to assess the actual impact of the contagion on the health care system: if a large instantaneous reproductive number can be sustainable at the beginning of the epidemic when the number of cases is low, a low instantaneous reproductive number may not be sustainable when the number of circulating infected is large.
For identifiability reasons, we did not estimate the IFR, but our analysis provided indirectly indications about plausible values for this parameter through a comparison between the number of infections predicted by the SIRD model and the observed cases reported by Protezione Civile 18 . Under the assumption of an average waiting time from infection onset to infection resolution (death or recovery) equal to 14 days, our findings supported the hypothesis that during the study period the IFR varied over time and space. In particular, according to our results, the IFR was likely lower than 1.15% in the initial phase of the second epidemic wave, when the virus mainly circulated among younger people 52 , as well as in many Southern regions, including Campania, which is, not by chance, the region with the lowest aging index in Italy 22 . Conversely, an IFR equal to 1.79%, the upper bound considered in our analysis, was not consistent with the observed infection dynamics in most regions, because it led to a predicted number of infections smaller than the observed number of notified cases.
Fixing p = 0.5%, not far from the IFR estimate reported in the review by Ioannidis 53 , we obtained quite high estimates of infection prevalence which could be reasonably considered as an extreme -even if not impossible -lower bound.
Notice that, in principle, the observed inconsistency between predicted infections and notified infections in the Southern regions could partly be solved by assuming for them a longer average waiting time T (e.g. T = 18 days), with p = 1.15%. Indeed, as shown by the sensitivity analysis results, the parameters p and T jointly affect the epidemic curve generated by the SIRD. However, as already explained, this hypothesis seems less plausible than the hypothesis of a heterogeneous IFR.

Study limitations
Our study has some limitations. Regarding the regional analyses, the SIRD model relied on strong assumptions that could be partly unrealistic: the population was homogeneously mixed, with people making contact at random, and closed, with no contacts among individuals belonging to different regions or countries; transition parameters were constant across individuals who were present at the same time in the same compartment; there was not reinfection and no incubation period. Importantly, an individual becoming infected on the day t was supposed to be infectious starting from the day t + 1 until infection resolution. This was quite an unrealistic assumption for the infected individuals that were notified, thus isolated. Additionally, as usual in compartmental models in their simplest form, we implicitly assumed that the transition times were exponentially distributed 54 , inducing a non-negligible probability that the infection duration was much longer than the average, as well as a high probability of very short waiting times. As already discussed in Section 4.3, also the assumption of an IFR which is constant over time was questionable, in the light of the comparison between the observed notified cases and the infections predicted by the SIRD model. An additional limitation concerns the fact that we calibrated the SIRD models only on the observed COVID19 related deaths, without exploiting the availability of other observed time series, like the ones of notified infected. Calibrating on the notified infected would have required the formulation of a more complex compartmental model with separate compartments for detected and undetected cases and the introduction of additional unknown transition parameters [2][3][4]29 . On the other hand, this more complex model would have also allowed us to take into account the fact that the detected and isolated infected individuals are not able to spread the contagion as the undetected ones.
Additionally, the regional R 0 (t) curves could be quite sensitive to spline knots position. One could extend the procedure to use penalized splines 26 .
Following a procedure which is frequently used in compartment models literature 32, 55 , we did not make assumptions about data distribution in the phase of estimation of the SIRD parameters, but we generated bootstrap samples for confidence intervals construction by assuming a Negative Binomial distribution on the daily increments of deaths. Alternative approaches rely on likelihood maximization or Bayesian inference, but compartmental models often exhibit complex likelihood requiring particle filtering methods to be maximized 56 or computationally intensive methods based on data augmentation procedures 57 . In the Bayesian framework, likelihood-free approaches have been also proposed 58 .
The bootstrap percentile intervals could have lower coverage than the nominal one 59 . Bias-corrected and accelerated bootstrap confidence intervals could have better performance 60 , but they are too computationally demanding to be applied in this context.
Finally, regarding meta-regression, a finer geographical detail, e.g. considering provinces or municipalities instead of regions, would have allowed us to better appreciate the sources of heterogeneity in contagion spread. However, estimating R 0 (t) at a provincial or at finer level would have led to a more unstable inference. Not to mention that the assumption of a close population would be, at this geographical detail, even more exceptionable.

Conclusions
Despite the difficulty of drawing information from poor and limited data, our approach allowed us to estimate R 0 (t) and get an evaluation of the prevalence of infections in Italy at a regional level. Besides the common elements -including a mid-October peak and a decline during the month of November, steeper in those regions where stronger restriction measures were earlier applied -the rate of contagion changed over time heterogeneously across regions. This proves that models treating the phenomenon mostly at a national level could ignore important characteristics, specific to certain areas. Meta-regression results show that heterogeneity can be partly explained by socioeconomic and demographic factors, like the grade of urbanization, family medicine and health care system, employment rate, use of public transport for school commuting. Higher temperatures were associated to lower R 0 (t) levels. These factors should be further explored with analyses at a finer geographical scale.
Although we fixed some parameters to make the model identifiable, the results of the sensitivity analyses reassured us that the overall shape of the estimated R 0 (t) curves and the conclusions drawn from comparisons among regional curves were robust to changes of those parameters. On the contrary, the estimate of the prevalence of infections was strongly related to changes in the IFR. Moreover, our findings seemed to support the hypothesis of a heterogeneous IFR over time and among regions, that could have been lower than 1% in some regions, especially at the beginning of the epidemic wave.
Regarding sensitivity analysis, we want to stress that GSA has been proved to be a powerful and promising tool and its use should be encouraged in this and other research contexts.

Data availability
The datasets used for the current study are available in the Protezione Civile github repository, https://github.com/ pcm-dpc/COVID19. All methods were carried out in accordance with relevant guidelines and regulations.