Spatiotemporal Modelling of COVID-19 Infection Risk in Portugal

Since its outbreak, the SARS-CoV-2 pandemic has shown complex dynamics in both time and space. These dynamics result from a combination of factors, including the spatial distribution of the population's social and economic levels and its mobility patterns within a given country. Now that the risk of infection and its associated uncertainty have been assessed from infection rates by municipality, one of the most important challenges facing health authorities is the ability to predict second waves and interactions with epidemics of other viruses (e.g. influenza). We propose characterizing the local spatiotemporal behaviour of the risk of infection from existing historical data and classifying the local spatiotemporal patterns of the time series, thus allowing new waves to be managed by region. We combined functional data analysis with geostatistical simulation to model the spatiotemporal evolution of COVID-19 infection risk in mainland Portugal. Daily counts of infections by municipality, reported by the Portuguese Directorate-General for Health, were used to build infection time series from the beginning of the outbreak in Portugal. We reduce the dimensionality of these curves using functional principal component analysis. This step has two objectives: to detect municipalities with a similar temporal evolution, and to obtain a small number of coefficients that describe the temporal pattern of each series. The low-dimensional coefficients are then used as experimental data to map the infection time series spatially with geostatistical simulation. This step recovers high-resolution maps of COVID-19 infection risk at any time step, allowing the simultaneous modelling of time and space. With the resulting spatiotemporal models, authorities can identify locations where the disease exhibits similar behaviours and, therefore, devise mitigation actions based on


Introduction
One of the crucial factors that has most conditioned approaches to characterizing the risk of infection by the SARS-CoV-2 virus is the lack of knowledge about the virus's behaviour history, the uncertainty about how it spreads when predicting new waves, and the inability to discriminate it, over space and time, from other viruses that present similar symptoms (e.g. seasonal influenza) [1]. Azevedo et al. (2020) [2] proposed a method for mapping the risk of infection and its uncertainty based on Poisson kriging [3] and a geostatistical stochastic simulation model [4], which considers the daily infection rates by municipality and the uncertainty associated with the number of inhabitants. This work resulted in daily updated maps of infection risk and associated uncertainty for Portugal, a useful tool for managing pandemic risk in the country.
After approximately six months of gathering data and knowledge on the risk of infection, one of the most critical challenges facing health authorities is the ability to discriminate the space-time behaviour of the infection risk given the socio-economic factors in each region, such as the concentration of risk groups in urban areas and the existence of retirement homes.
The spatiotemporal characterization of the risk of COVID-19 will also help identify the presence of second waves by region and make it easier to distinguish the incidence of COVID-19 from that of other, less dangerous viruses with identical symptoms, which is significant for the management of medical resources.
When adding six months of daily infection rate data from a large set of municipalities, the dimension of the problem must be reduced so that the local time series can be better understood and these challenges addressed. Here we build upon the work by Azevedo et al. (2020) [2] and propose adding the temporal dimension to the characterization of infection risk maps (i.e. the simultaneous modelling of space and time), combining functional data analysis [5] and geostatistical stochastic models [6].
The next section describes the methodology adopted and the available data in detail, followed by the application of the method to the data set and the results obtained. The final sections discuss the main results and present the conclusions of this study.

Methodology
This section describes the methodology adopted to model the spatiotemporal evolution of COVID-19 infection risk in mainland Portugal.
We analysed the temporal evolution of the daily infection rates, $R(\mathbf{u}_\alpha, t)$, based on the cumulative number of notified positive cases of infection made publicly available by the Portuguese Directorate-General for Health (DGS) between 1 March and 28 May 2020. The infection rate is defined as:

$$ R(\mathbf{u}_\alpha, t) = \frac{d(\mathbf{u}_\alpha, t)}{n(\mathbf{u}_\alpha)}, \qquad \alpha = 1, \ldots, N \tag{1} $$

where $d(\mathbf{u}_\alpha, t)$ is the number of confirmed positive tests in municipality $\alpha$ (with $\alpha = 1, \ldots, N$ municipalities) from when the COVID-19 pandemic was declared up until day $t$, and $n(\mathbf{u}_\alpha)$ is the size of the population at risk (i.e. the resident population of that municipality) [2]. Each cumulative infection rate curve is referenced to its municipality by the geometric centroid $\mathbf{u}_\alpha$, with coordinates $(x_\alpha, y_\alpha)$. To integrate the temporal component into the infection risk maps, we couple functional data analysis and geostatistical simulation:
i) First, the time series of cumulative infection rates at each municipality location is fitted by piecewise-polynomial (spline) interpolation using functional data analysis (FDA) [5].
ii) Then, functional principal component analysis (FPCA) [5] is performed on the fitted coefficients to extract the main temporal features of the infection rate time series, summarized in the first n principal components (PCs). This allows each infection curve to be summarized by a small number of PC scores.
iii) Finally, the spatial distribution of infection rates over the entire country is modelled using geostatistical stochastic simulation. The PC scores retained after FPCA are used as experimental data to generate maps of the scores, from which the spatial distribution of the infection curves can be predicted at high resolution for mainland Portugal.
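As a minimal sketch of Eq. (1), assuming the DGS data have been arranged as a matrix of daily new confirmed cases per municipality (the array layout, function name and toy numbers below are hypothetical), the cumulative infection rate curves can be computed as:

```python
import numpy as np

def cumulative_infection_rate(daily_cases, population):
    """Cumulative infection rate R(u_alpha, t) = d(u_alpha, t) / n(u_alpha).

    daily_cases: (N, T) array of new confirmed cases per municipality and day
                 (hypothetical layout; the DGS data may need reshaping first).
    population:  (N,) resident population of each municipality.
    """
    daily_cases = np.asarray(daily_cases, dtype=float)
    population = np.asarray(population, dtype=float)
    cumulative = np.cumsum(daily_cases, axis=1)   # d(u_alpha, t): cases up to day t
    return cumulative / population[:, None]       # divide each row by its population

# Toy example: two municipalities, four days.
rates = cumulative_infection_rate([[1, 0, 2, 1], [0, 3, 0, 1]], [1000, 500])
print(rates)
```

If the source already reports cumulative counts, the `np.cumsum` step is simply dropped; any per-capita scaling (e.g. per 10,000 inhabitants) is not specified in the text and is left out here.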
We next describe the FPCA used to model the evolution of the cumulative infection time series per municipality. We then use stochastic sequential simulation to map the relevant PCs at the country level at high resolution. The resulting maps can be used to recover the cumulative infection rate curves locally at the resolution of the simulation grid, which we defined here with a two-square-kilometre grid cell.

Functional Principal Component Analysis (FPCA)
FPCA is a well-known mathematical model-order reduction technique [5]. In FPCA, functional data analysis (FDA) is combined with principal component analysis (PCA) to summarize time series into a relatively small number of coefficients [5] [7] [8]. FPCA is a robust dimension-reduction method, even in the presence of noise and non-periodic data, such as the temporal evolution of new COVID-19 infections.
Looking at the infection time series per municipality, some municipalities show steeper behaviours and others smoother ones, and it is possible to hypothesize that this behaviour is related to local variables such as the socio-economic context of a given municipality. Also, similar curves, representing municipalities with similar behaviours and possibly similar future outcomes, may reveal regional spatial patterns.
To model the spatiotemporal evolution of the infection, FPCA first fits and smooths the cumulative infection time series and then summarizes their key components (i.e. the functional PCs). By analysing the cumulative infection curves with functional PCA, we reduced the dimensionality of the infection time series from 88 days per municipality (N = 278) to three principal component scores per municipality.
FDA is a data smoothing technique that represents each time series as a linear combination of basis functions [5]:

$$ x_\alpha(t) = \sum_{k=1}^{K} c_{\alpha k}\,\phi_k(t) \tag{2} $$

where $\phi_k(t)$ are the basis functions and $c_{\alpha k}$ are their coefficients. In this case study, we used third-order splines as basis functions rather than Fourier basis functions, as the cumulative infection time series did not show periodic behaviour. The coefficients $c_{\alpha k}$ then characterize each cumulative infection curve in terms of the basis functions $\phi_k(t)$ (Eq. 2). The piecewise-polynomial fit results in a smoother curve. For a detailed description of FDA, we recommend the seminal work by Ramsay (2006) [5].
The number of basis functions, $K$, needed to fit a given time series accurately depends on the nature of the time series and is typically determined by trial and error. In this case study, we used 89 basis functions to fit the cumulative infection curves with the FDA Matlab® package [10]. The number of basis functions was computed following [10] as:

$$ K = \#t + m - 2 \tag{3} $$

where $\#t$ is the number of days in the cumulative infection rate curves and $m$ is the order of the splines used as basis functions (here, $88 + 3 - 2 = 89$).
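To make the basis expansion concrete, the sketch below builds an order-$m$ B-spline basis with a break at every day (so that $K = \#t + m - 2$) and fits the coefficients $c_{\alpha k}$ by least squares. It is a NumPy re-implementation of the Cox–de Boor recursion written for illustration, not the FDA Matlab® package [10]. Because $K$ exceeds the number of days here, `np.linalg.lstsq` returns the minimum-norm (interpolating) solution, whereas FDA software typically adds a roughness penalty. The function name and the logistic toy curve are hypothetical.

```python
import numpy as np

def bspline_basis(t, breaks, order):
    """Evaluate all B-spline basis functions of the given order (degree = order - 1)
    at points t, with a knot at each break and boundary knots repeated `order` times."""
    t = np.asarray(t, dtype=float)
    knots = np.concatenate([np.repeat(breaks[0], order - 1), breaks,
                            np.repeat(breaks[-1], order - 1)])
    # Order-1 (piecewise-constant) basis on each knot interval.
    B = np.zeros((t.size, knots.size - 1))
    for i in range(knots.size - 1):
        B[:, i] = (knots[i] <= t) & (t < knots[i + 1])
    # Assign the right endpoint to the last non-empty interval.
    last = np.nonzero(knots[:-1] < knots[1:])[0][-1]
    B[t == knots[-1], last] = 1.0
    # Cox-de Boor recursion up to the requested order.
    for k in range(2, order + 1):
        B_next = np.zeros((t.size, knots.size - k))
        for i in range(knots.size - k):
            den_l = knots[i + k - 1] - knots[i]
            den_r = knots[i + k] - knots[i + 1]
            left = (t - knots[i]) / den_l * B[:, i] if den_l > 0 else 0.0
            right = (knots[i + k] - t) / den_r * B[:, i + 1] if den_r > 0 else 0.0
            B_next[:, i] = left + right
        B = B_next
    return B

days = np.arange(88, dtype=float)   # 88 daily observations (1 March to 28 May)
order = 3                           # third-order splines
basis = bspline_basis(days, days, order)
K = days.size + order - 2           # K = #t + m - 2 = 88 + 3 - 2 = 89
print(basis.shape, K)               # (88, 89) 89

# Hypothetical cumulative infection curve (logistic shape) for one municipality.
curve = 0.004 / (1.0 + np.exp(-(days - 30.0) / 6.0))
coef, *_ = np.linalg.lstsq(basis, curve, rcond=None)   # coefficients c_{alpha k}
fitted = basis @ coef
```

Evaluating `bspline_basis` on a finer time grid and multiplying by `coef` gives the smoothed curve between the observed days.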
After performing FDA, we applied principal component analysis (PCA) [10] to the functional data coefficients, $c_{\alpha k}$, to reduce the dimensionality of the problem at hand. PCA reduces the dimensions of a data set by transforming it into $p$ principal components, where $p$ is considerably smaller than the original number of dimensions of the data set. By construction, PCs are linearly uncorrelated. Each cumulative infection curve is then described by its coefficients, or scores, on each principal component. As with FDA, the fitted time series can be reconstructed by multiplying the scores by the PCs and summing them (Eq. 4). For a more detailed description of PCA, see [10].
Because they explain most of the variance of the original temporal infection rate curves, we used the first three PCs to model the spatiotemporal evolution of COVID-19 infection in mainland Portugal (Figure 3).
From the FPCA, each municipality has three scores, or coefficients, that can be used to reconstruct its cumulative infection curve, $\hat{x}_\alpha(t)$, by multiplying the functional principal components (FPCs), $\xi_j(t)$, by their scores, $s_{\alpha j}$:

$$ \hat{x}_\alpha(t) = \mu(t) + \sum_{j=1}^{3} s_{\alpha j}\,\xi_j(t) \tag{4} $$

where $\mu(t)$ is the mean cumulative infection curve across all municipalities.
Based on the spatial continuity patterns of the scores $s_j$, $j = 1, \ldots, 3$, one can obtain the spatial distribution of the scores of each FPC over mainland Portugal and, with Eq. 4, reconstruct a cumulative infection curve at any location within the region (i.e. mainland Portugal).
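The reconstruction in Eq. (4) can be sketched by applying PCA to the matrix of spline coefficients (one row per municipality) through a singular value decomposition and rebuilding each row from the mean plus the first few components. This assumes plain PCA on the coefficients, as described above; a full FPCA implementation would usually also weight the coefficients by the Gram matrix of the basis. The function names and the random coefficient matrix are hypothetical stand-ins for real data.

```python
import numpy as np

def pca_on_coefficients(C, n_components):
    """PCA of an (N municipalities x K basis coefficients) matrix.

    Returns the mean coefficient vector, the first `n_components` loadings
    (rows, in coefficient space), the scores s_{alpha j} and the explained-variance ratios."""
    mean = C.mean(axis=0)
    centred = C - mean
    _, sing, vt = np.linalg.svd(centred, full_matrices=False)
    explained = sing**2 / np.sum(sing**2)
    loadings = vt[:n_components]
    scores = centred @ loadings.T
    return mean, loadings, scores, explained

def reconstruct(mean, loadings, scores):
    """Coefficient-space version of Eq. (4): mean + sum_j s_{alpha j} * loading_j."""
    return mean + scores @ loadings

rng = np.random.default_rng(0)
# Hypothetical coefficients: 278 municipalities, 89 basis functions, low rank plus noise.
latent = rng.normal(size=(278, 3)) * np.array([10.0, 2.0, 0.5])
C = latent @ rng.normal(size=(3, 89)) + 0.01 * rng.normal(size=(278, 89))

mean, loadings, scores, explained = pca_on_coefficients(C, n_components=3)
C_hat = reconstruct(mean, loadings, scores)
print(scores.shape, explained[:3].sum())
```

Multiplying `C_hat` by the transpose of a basis matrix (days x K) turns the reconstructed coefficients back into curves; in that view, `mean` plays the role of $\mu(t)$ and each row of `loadings` the role of $\xi_j(t)$.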
We used stochastic sequential simulation [11] to perform this task, as it allows us to model the spatial distribution of a given random variable (i.e. the FPCA scores) with high spatial resolution.

Stochastic sequential simulation
In stochastic sequential simulation, such as direct sequential simulation (DSS) [11], the nodes of the simulation grid are visited following a pre-defined random path. In this case study, mainland Portugal was overlaid with a regular grid of two-square-kilometre cells. Following the random path, at each node of the simulation grid, $\mathbf{x}_u$, the local mean is estimated by simple kriging:

$$ z^*(\mathbf{x}_u) = m + \sum_{\alpha=1}^{n} \lambda_\alpha(\mathbf{x}_u)\,\big[z(\mathbf{x}_\alpha) - m\big] \tag{5} $$

where $z^*(\mathbf{x}_u)$ is the simple kriging estimate at location $\mathbf{x}_u$, conditioned to the $n$ existing experimental data, $z(\mathbf{x}_\alpha)$ (i.e. the FPCA scores per municipality), $m$ is the global mean of the experimental data, and $\lambda_\alpha(\mathbf{x}_u)$ are the kriging weights. The local kriging variance, $\sigma^2_{SK}(\mathbf{x}_u)$, is defined as:

$$ \sigma^2_{SK}(\mathbf{x}_u) = C(\mathbf{0}) - \sum_{\alpha=1}^{n} \lambda_\alpha(\mathbf{x}_u)\,C(\mathbf{x}_u - \mathbf{x}_\alpha) \tag{6} $$

where $C(\mathbf{h})$ is the spatial covariance, as expressed by a variogram model, between two points separated by the vector $\mathbf{h}$.
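The simple kriging estimate of the local mean and its kriging variance can be sketched for a single grid node as follows. The exponential covariance model, its sill and range, the function names and the toy coordinates are assumptions for illustration; in the study, the covariance comes from the variogram models fitted to each FPC score.

```python
import numpy as np

def exponential_cov(h, sill=1.0, practical_range=50.0):
    """Exponential covariance C(h) = sill * exp(-3h / range) (assumed model)."""
    return sill * np.exp(-3.0 * np.asarray(h, dtype=float) / practical_range)

def simple_kriging(x_u, coords, values, mean, cov=exponential_cov):
    """Simple kriging estimate z*(x_u) and kriging variance sigma^2_SK(x_u)."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    # Data-to-data covariance matrix and data-to-node covariance vector.
    d_data = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    d_node = np.linalg.norm(coords - np.asarray(x_u, dtype=float), axis=-1)
    C_dd = cov(d_data)
    c_du = cov(d_node)
    weights = np.linalg.solve(C_dd, c_du)              # kriging weights lambda_alpha(x_u)
    estimate = mean + weights @ (values - mean)        # local mean z*(x_u)
    variance = float(cov(0.0)) - weights @ c_du        # C(0) - sum lambda * C(x_u - x_alpha)
    return float(estimate), float(variance)

# Hypothetical municipality centroids (km) and FPC scores.
coords = [[0.0, 0.0], [30.0, 10.0], [15.0, 40.0], [60.0, 25.0]]
scores = [1.2, -0.4, 0.3, 0.8]
m = float(np.mean(scores))
print(simple_kriging([20.0, 15.0], coords, scores, m))
```

At a data location the estimate reproduces the datum with zero variance, and far from all data it falls back to the global mean $m$ with variance equal to the sill, which is the behaviour the sequential draw relies on.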
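A value is then drawn from an interval of the global distribution function, $F_Z(z)$ (i.e. the probability distribution function inferred from the FPCA scores), centred on the estimated local mean, $z^*(\mathbf{x}_u)$, and with a width defined by the estimated local variance, $\sigma^2_{SK}(\mathbf{x}_u)$. The simulated value is assigned to the simulation grid and included in the experimental data set, $z(\mathbf{x}_\alpha)$. All nodes of the simulation grid are visited in turn following the random path. Because the random path changes on each run, the conditioning data at every grid node also change, so the different sequential simulation runs, commonly designated as realizations, produce distinct spatial models.

Each realization of the FPCA scores is transformed into a spatial risk map (Eq. 4) for the entire country. This space-time approach has the following important properties: i) the spatial risk map obtained for a given period, $t$, is conditioned by the entire historical time series of all municipalities up to time $t$; ii) the ensemble of geostatistical realizations of risk allows the average risk map resulting from the space-time model to be assessed for period $t$. This means that the average risk value at period $t$ at a given municipality location is not necessarily equal to the value measured in that municipality at that period (as it is in the method proposed by Azevedo et al. 2020 [2]); instead, it is an average risk resulting from the historical time series of that municipality.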
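The draw step can be illustrated with a simplified sketch. The local kriging mean is mapped to its position in the empirical global distribution of the scores, a Gaussian value is drawn around that position with a spread set by the local kriging standard deviation (rescaled by the global standard deviation, an assumption of this sketch), and the result is mapped back through the empirical quantiles, so every simulated value lies within the range of the observed scores. This only approximates the idea described above; it is not the exact sampling rule of DSS in [11]. The function name and toy data are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def draw_from_global(z_star, sk_var, global_values, rng):
    """Simplified DSS-style draw from the empirical global distribution F_Z."""
    std_normal = NormalDist()
    sorted_vals = np.sort(np.asarray(global_values, dtype=float))
    n = sorted_vals.size
    # Position of the local mean in the empirical cdf, kept strictly inside (0, 1).
    p_star = (np.searchsorted(sorted_vals, z_star, side="right") + 0.5) / (n + 1)
    y_star = std_normal.inv_cdf(float(p_star))
    # Local spread expressed relative to the global spread (assumption of this sketch).
    sigma_y = np.sqrt(max(sk_var, 0.0)) / sorted_vals.std()
    y_sim = rng.normal(y_star, sigma_y)
    # Map back to the data scale through the empirical quantile function.
    return float(np.quantile(sorted_vals, std_normal.cdf(float(y_sim))))

rng = np.random.default_rng(42)
observed_scores = rng.normal(0.0, 2.0, size=278)   # hypothetical FPC scores
path = rng.permutation(1000)                       # random visiting order of 1000 grid nodes
draw = draw_from_global(z_star=0.5, sk_var=1.5, global_values=observed_scores, rng=rng)
print(path[:5], draw)
```

In a full simulation, each node in `path` would get its local mean and variance from simple kriging, receive a value from a draw like this one, and then join the conditioning data for the nodes visited after it.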

Spatiotemporal infection risk maps
The cumulative infection rate curves, built as defined in the previous section, were approximated with FDA using third-order splines as basis functions [5] (Eq. 2). We then applied PCA to reduce the dimension of the functional data coefficients. We retained the first three PCs, as they represent more than 90% of the total variance of the original data.
The scores of each principal component per municipality were used as experimental data in the DSS of the three variables, which were simulated individually [11]. The scores were assigned to the centroid of each municipality. A set of 100 realizations was generated independently for each component on a two-square-kilometre simulation grid. The point-wise average of the 100 realizations was then used to reconstruct the average approximated time series with FDA. We illustrate the methodology by showing maps of the cumulative infection rate at a given moment for all of mainland Portugal.
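Because Eq. (4) is linear in the scores, reconstructing curves from the point-wise average of the score realizations gives the same average curve as averaging the curves reconstructed from each realization. The sketch below checks this on hypothetical arrays and also computes a per-cell interquartile range of the reconstructed curves, one possible way of summarizing the spread of the ensemble (a variant of computing the range on the FPC scores before reconstruction). The function name, array shapes and values are assumptions.

```python
import numpy as np

def ensemble_curves(score_realizations, mean_curve, fpcs):
    """Reconstruct curves (Eq. 4) for every realization and grid cell.

    score_realizations: (n_real, n_cells, n_fpc) simulated FPC scores
    mean_curve: (n_days,) mean curve mu(t); fpcs: (n_fpc, n_days) xi_j(t)
    Returns an array of shape (n_real, n_cells, n_days)."""
    return mean_curve + score_realizations @ fpcs

rng = np.random.default_rng(1)
n_real, n_cells, n_fpc, n_days = 100, 50, 3, 88
scores = rng.normal(size=(n_real, n_cells, n_fpc))          # hypothetical DSS output
mean_curve = np.linspace(0.0, 0.004, n_days)                # hypothetical mu(t)
fpcs = rng.normal(scale=1e-4, size=(n_fpc, n_days))         # hypothetical xi_j(t)

curves = ensemble_curves(scores, mean_curve, fpcs)
average_curve = curves.mean(axis=0)                         # point-wise ensemble mean
from_mean_scores = mean_curve + scores.mean(axis=0) @ fpcs  # reconstruct from averaged scores
q25, q75 = np.percentile(curves, [25, 75], axis=0)
iqr = q75 - q25                                             # per-cell, per-day spread
print(np.allclose(average_curve, from_mean_scores), iqr.shape)
```

The equality holds only for linear summaries such as the mean; quantile-based measures like the interquartile range generally differ depending on whether they are taken before or after reconstruction.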

Temporal modelling of the infection rate curves by municipality
We modelled the temporal evolution of the daily infection risk, as defined in Eq. (1), for the period between 1 March and 28 May 2020. As an illustration, we show the temporal evolution of the cumulative infection rate curves of the five municipalities with the highest infection rates on 28 May (Figure 1). During this period, the infection hotspots were located mainly in the north of Portugal, where the initial outbreak occurred. The cumulative infection rate curves were then approximated with FDA, following the methodology described above: a third-order spline basis was fitted to each observed curve. The FDA approximates the original temporal evolution of the infection risk per municipality without introducing considerable noise (Figure 2). We decomposed the approximated curves of infection risk using FPCA [5] and retained the first three FPCs, as these explain most of the total variance in the original data (Figure 3). The first FPC explains 96% of the total variance, the second 2% and the third 0.5%. In total, the three FPCs explain 98% of the total variability of the original cumulative infection rate curves, which can therefore be reconstructed from only three FPCs.
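The reported shares can be checked with a short cumulative sum. The per-component values below are the rounded figures given in the text, so the total of 98.5% is consistent with the reported total of about 98%.

```python
from itertools import accumulate

# Explained-variance shares reported for the first three FPCs (rounded).
shares = [0.96, 0.02, 0.005]
cumulative = list(accumulate(shares))
print([round(c, 3) for c in cumulative])   # [0.96, 0.98, 0.985]
```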

Spatial modelling of infection rate curves
The scores of the three FPCs were then mapped spatially using DSS [11]. After verifying that there is no correlation between the FPCs (Figure 4), we generated 100 independent realizations for each FPC.
It is important to note that the stochastic simulation is used to generate realizations of the FPC scores, which are then used to reconstruct the evolution curves of the infection rates. At each point in space and time, the average of the simulated infection rates is, by definition, the average risk of infection [2].
With the proposed methodology, we characterize areas with homogeneous space-time risk behaviour. The stochastic simulation allows us to calculate, for every node of the regular grid, both the average curve of risk evolution over time and the spatial and temporal uncertainty associated with these curves. Local areas in which the temporal evolution of risk is very homogeneous will show small spatial uncertainty in the different time slices, whereas transition zones between distinct behaviours of the temporal evolution of infection risk will show larger uncertainties. To simulate each FPC map, we assigned the corresponding FPC score to the geometric centroid of each municipality and used it as experimental data for the three random variables of the geostatistical simulation (Figure 5). The simulation grid was defined with a regular two-square-kilometre cell size, and the horizontal variogram models of each variable were manually fitted to experimental variograms weighted by population size, as described in [12], [13] and [2]. For each realization, we reconstructed the infection rate curves (Eqs. 2 and 4) at each cell of the simulation grid. In this way, we were able to assess infection rate curves at high spatial resolution (i.e. two square kilometres) over the timespan considered (i.e. we couple the space and time domains of the pandemic's evolution).
A reconstructed realization for the last day of the time series considered (28 May 2020) is shown in Figure 7, along with the reconstructed map from the average model of the sets of geostatistical realizations for the three FPCs (Figure 6). For comparison, we show the average infection risk model obtained using the methodology presented in [2]. Both maps show similar spatial patterns in terms of hot and cold spots. From the set of simulated FPCs, we can also assess the spatial uncertainty of the predictions, for example by calculating the interquartile range (Q75-Q25) across the set of simulated FPCs and then performing the reconstruction of the variable. Figure 8 shows the interquartile range maps for three distinct days (10, 20 and 30 days after the beginning of the time series) and compares them with those obtained with the geostatistical simulation. Finally, as an illustration, we show the reconstructed cumulative infection rate curves for three locations within the simulation grid where the evolution of the disease behaves distinctly. This example is a preliminary illustration of how the approach supports risk analysis by identifying regions of the country with similar behaviour (i.e. similar curves), which can promote early mitigation actions.
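A population-weighted experimental variogram of the FPC scores can be sketched as below. The pair weight $n_\alpha n_\beta/(n_\alpha + n_\beta)$ is an assumption borrowed from population-weighted variogram estimators used alongside Poisson kriging; the exact estimator in [12], [13] and [2] may differ, for example by an additional bias-correction term. The function name, coordinates, scores and populations are hypothetical.

```python
import numpy as np

def weighted_variogram(coords, values, pop, lag_edges):
    """Experimental semivariogram with pair weights n_a * n_b / (n_a + n_b) (assumed)."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    pop = np.asarray(pop, dtype=float)
    iu = np.triu_indices(values.size, k=1)                     # each pair counted once
    h = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)[iu]
    sq = ((values[:, None] - values[None, :]) ** 2)[iu]
    w = (pop[:, None] * pop[None, :] / (pop[:, None] + pop[None, :]))[iu]
    gamma = np.full(len(lag_edges) - 1, np.nan)                # NaN for empty lag classes
    for b in range(len(lag_edges) - 1):
        in_lag = (h >= lag_edges[b]) & (h < lag_edges[b + 1])
        if in_lag.any():
            gamma[b] = 0.5 * np.sum(w[in_lag] * sq[in_lag]) / np.sum(w[in_lag])
    return gamma

# Three hypothetical centroids on a line, their FPC scores and populations.
g = weighted_variogram([[0, 0], [1, 0], [3, 0]], [0.0, 1.0, 3.0], [1, 1, 2], [0.0, 1.5, 3.5])
print(g)   # [0.5  3.25]
```

A variogram model (e.g. spherical or exponential) would then be fitted to these lag values, manually as in the text or by least squares, and used as the covariance in the simulation.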

Conclusions
We developed and implemented a methodology based on FPCA and geostatistics to model local time series of COVID-19 infection risk. With this methodology, we were able to describe the temporal pattern of each series with a small number of coefficients and to identify similarities between the temporal series. Beyond identifying different temporal patterns, it was also possible to recover high-resolution maps of COVID-19 infection risk at any time step. This allows the simultaneous modelling of time and space, enabling a full reconstruction of the history of the pandemic and the exploration of its relationship with socio-economic variables and with the effectiveness of health policies, for example the impact of local, regional or national mitigation and prevention actions.
Comparing current spatiotemporal patterns with past ones will help design more effective mitigation actions. Moreover, although we did not explore it here, we believe these models could also be used for short-term forecasting as a simple data-driven proxy for full SEIR models.

Ethics approval and consent to participate
Not applicable.

Figure 1: (a) Geographical location of the five Portuguese municipalities with the highest infection risk on 28 May 2020; (b) temporal evolution of the infection rate from 1 March to 28 May 2020.
Figure 2: Temporal evolution of the infection risk curves approximated with FDA.
Figure 3: Scree plot of the FPCA; only the first 20 FPCs are shown. The black circles
Figure 4: Cross-plot between the first three FPCs showing there is no correlation between them.
Figure 5: Geographical location of the scores of each FPC used as experimental data for the geostatistical simulation.
Figure 6: Figures 6a, 6b and 6c show one geostatistical realization for each PC. It is worth noting
Figure 7: Reconstructed infection risk maps from (a) a single realization of the first three FPCs, (b) the average maps of the first three FPCs and (c) the infection risk map obtained with geostatistical simulation as described in [2].
Figure 8: Uncertainty infection risk maps for: (a) to (c) three different days along the

No. 391) granted by the FCT. MCR acknowledges FCT support for the research contract established under the transitional rule of Decree-Law 57/2016 (IST-ID/175/2018).