In autumn-winter periods (i.e., between October and February), especially during season 2021-22,) incidence rates exhibited higher variability in amplitude, when compared to the other seasons (see Fig. 1A).
The original time-series data were transformed into functional data composed of 278 curves, each corresponding to 570 days of incidence rates per municipality (Fig. 1B). The smoothed correlation and variance-covariance plot surfaces are illustrated in Fig. 2, showing higher correlations between consecutive or close days. The higher covariance values were observed in winter periods, as illustrated by the variance-covariance plot.
To find the dominant types of variation in the functional data, we first centered the curves (mean adjustment) and computed principal components. Over 70% of overall variability observed in the curves was explained by the first 3 principal components (see Fig. 3).
Figure 4 illustrates the results obtained for the first three functional principal components (accounting for 72% of overall variability), as they represent the main types of variation observed in the data.
The first functional principal component (PC) represents 47% of overall variability and defines the main type of variation shared by the 278 incidence curves. It reflects the major trends in the curves, as seen in Fig. 1, with higher incidence amplitudes in autumn-winter seasons and more stable evolutions in spring-summer seasons. We may refer to this component as the “global effect”. Looking at the first PC curve we can identify four main phases of pandemic evolution, the first phase (August to October 2020) shows low differences in pandemic evolution across municipalities, in the second (October 2020 to January 2021) differences between curves increased and reached a maximum around mid-November 2020 and, after that, began to decrease till mid-January 2021. The third phase (January 2021 to December 2021) is again characterized by low differences across municipalities. In the last phase (December 2021 to February 2022), considerable differences in incidence curves occurred across municipalities, as the pandemic curve increased severely and peaked in February 2022.
The second functional PC retained 16% of overall variability. The curves sharing this type of variation contributed to the decrease of the average incidence observed in the winter 2020-21 and between Christmas 2021 up to the end of January 2022. The municipalities with high negative scores in this component were the most relevant contributors for lowering the mean incidence rate observed in winter periods. This functional PC will be referred as “preventive measures effect”.
The third functional component accounts for 9% of overall variability and corresponds to the increase of incidence caused by curves with high incidence rates in autumn-winter periods, especially in autumn 2020-21. This functional PC was negative until December 2020 and since then, became positive and kept increasing to reach the peak on late-January 2021. After that, the curve started decreasing but stayed positive till the end of February 2021. Over the autumn-winter period 2021-22, it presents a similar pattern to the second functional component except for a smaller amplitude. This component will be referred as the “outbreak effect”. Remaining components represent small amounts of variation within the original 278 incidence curves (e.g., 6% and 4% for the fourth and fifth components, respectively), are more difficult to interpret. Consequently, from now on the results to be presented are restricted to the first three principal components.
From a spatial perspective, it is interesting to visualize the functional PC scores, as they may contribute to find geographical patterns, clusters or to detect outliers.
For example, on the first PC (viewed as the “global effect”) it is clear that the highest scores correspond to municipalities clustered along the northern coast and adjacent regions, where municipalities tend to be small and close to each other (Fig. 5A). Municipalities from those regions were the most relevant contributors for increasing the average incidence curve as they reported high incidence rates over autumn-winter seasons. On the other hand, the second functional PC (interpreted as the “preventive measures effect”) showed highest negative scores in municipalities dispersed along Portugal’s hinterland where municipalities are characterized by being separated by longer distances and having sparse populations, plus the municipalities from the region of Lisbon, that have been under severe restrictive measures (e.g., mobility restrictions, lockdown, and school closures) during pandemic (Fig. 5B). In this case, negative scores are linked to municipalities that contributed to lower average incidence rates during autumn-winter seasons. In the third component (Fig. 5C), interpreted as the “outbreak effect”, the municipalities with high positive scores are spread over the country in places where population sizes are small, and disease outbreaks (which occurred mainly in nursery houses) have been reported.
To simplify the interpretation of principal components scores, we further classify municipalities into homogenous groups using the hierarchical classification algorithm as described above. To weight the dissimilarity matrix we used an Exponential variogram model with a nugget-effect of 31131 (38%), an estimated partial sill equal to 51313 (62%) (units: 14-day cumulative number of new cases/105 habitants)2 and an estimated range parameter of 227 kilometers (Fig. 6), which is close to the width of mainland across east-west.
The estimated parameters indicates that functional data from municipalities separated up to 227 km are still correlated, but almost two fifths of the variability (38%) is not explained by the distances separating them (i.e., the proportion of variability with no spatial pattern). The dissimilarity matrix used is based on the Euclidean distances among the coefficients of the cubic-spline basis functions, while the ward-D2 linkage is applied as the agglomeration method. The dendrogram (not presented here) obtained was splitted in different ways to find an interpretable number of partitions. The tree cutting procedure (i.e., pruning) was guided by visual inspection of results and attempted to find a number of partitions neither too small to identify any spatial clustering nor too big to miss the effect of spatial dependence found in the trace-variogram of functional data. As a result, the final dendrogram was cut into six groups, or clusters, where it is possible to detect spatially correlated patterns of municipalities groups according to the clusters assigned (Fig. 7).
After classification of functional data, we found clusters of municipalities located along the northern coast and adjacent regions (cluster 1), central coast (cluster 2), hinterland regions from north to south plus southern coast (clusters 3, 4 and 5), and the metropolitan area of Lisbon plus municipalities lying deep along the hinterland (cluster 6). The curves from municipalities classified in each cluster are presented in Fig. 8 along with the estimated mean national curve. The differences between clusters are mainly defined by the average height of the curves (i.e., amplitude) in autumn-winter seasons. We note, for example, the differential incidence between the curves in cluster 1 and cluster 6 in autumn-winter season 2021-22.
The scatterplots of the first three principal components scores (Fig. 9) capture main patterns of variations found in incidence curves. In the scatterplots, municipalities scores are classified according to the groups returned by hierarchical clustering to facilitate the interpretation of results.
Figure 9A illustrates the projection of municipalities scores on the first two functional PCs. The first component scores (fPC 1) correspond to the “global-effect” and separates most of the municipalities from clusters 1 (red) and 5 (grey) with positive scores, from municipalities of clusters 4 (yellow) and 6 (light blue) with negative scores. Positive scores are associated with incidence curves above the overall average incidence, while negative scores are associated to incidence curves below the overall average. Most of municipalities from clusters 1 and 5 are located along the northern coast and adjacent regions, while municipalities from clusters 4 and 6 are mostly located in inland areas of Portugal.
The second component shows other type of variability that splits most municipalities of cluster 2 (dark blue) with positive scores from municipalities in clusters 5 (grey) and 6 (light blue) with negative scores. This component is interpreted as corresponding to the “preventive measures effect”, showing cluster 5 and 6 municipalities as the most relevant contributors for the lower incidence rates observed along autumn-winter periods.
The position of municipalities (scores) in the third component (corresponding to the “outbreak effect”) are illustrated by Fig. 9B (y-axis). It is possible to highlight four municipalities from cluster 4 with the highest scores on this component. These municipalities have small size populations and have all suffered very high incidence rates in winter 2020-21, mainly due to outbreaks occurring in nursing homes. The incidence curves of these four municipalities are shown in Fig. 10.