Recife is prominent in the Brazilian context: it is the second most densely populated city in the country, with an estimated population of 1.55 million people in an area of 218 km². Recife is the capital of the third most populous state in the Northeast region, with the highest GDP per capita in that region 29. We therefore aimed to explore how Covid-19 cases in Recife neighbourhoods relate to a set of socio-economic and demographic characteristics and to the provision of essential services. Data on the disease were obtained from the municipality's Department of Health 30. The evolution of cases over time is plotted in Fig. 1, together with the relevant actions taken by the State Government and the City Council.
These authorities decided early, when the first cases were reported, to close facilities, so that only supermarkets, grocery stores, bakeries, pharmacies, gas stations and pet shops were allowed to open 31. As daily infections increased, the state governor issued a decree obliging the population to wear masks in public places. However, as fines were imposed not on shoppers but on the owners of the commercial facilities they entered, adherence to the measure depended on the willingness of the general public to wear masks and on intense supervision at the entrances to commercial premises.
These measures were not enough to slow the increase in infections, so in mid-May a 15-day strict quarantine was established in five municipalities in Pernambuco, including Recife. People were only allowed to leave their homes to seek essential services, for which they had to show proof, and vehicles were only allowed on the roads according to a rotation system based on the last digit of the number plate. The spread of the virus peaked in late May and cases appear to have stabilised at a low level as of July 8th, 2020, even though restrictions have been gradually relaxed and places in which people congregate, such as shopping malls and commercial premises, have been reopened.
4.1 Spatial cluster analysis
Hotspots of the total number of reported cases per neighbourhood were identified by means of Moran's I. As Fig. 2 shows, cases of the disease were at first concentrated in the South Zone of the city, specifically in the well-developed and densely populated neighbourhood of Boa Viagem. As cases increased, other hotspots of the High-High and High-Low types were found in the South, North and West Zones of the city. A quartile analysis was carried out using 15 census indicators 25 concerning residents’ average income; government support for piped water, sewage disposal, electricity and garbage collection; home ownership; population density; and five age groups, which were considered in absolute numbers. The results revealed social disparities between these areas and Boa Viagem: in terms of access to sanitation, garbage collection, income and literacy, these other hotspots are at least 50% below the indices for these variables in all other neighbourhoods in Recife.
Hotspots of Covid-19 case-fatality were also examined and are shown in Fig. 3, which highlights areas with a high case-fatality rate that are surrounded by areas with the same pattern, the so-called high-high clusters. Their spatial pattern differs from that of the clusters of cases. The case-fatality clusters remained mostly stable over time: hotspots usually formed in the North and Southwest zones, while cold spots were seen in the North and West zones.
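The High-High and High-Low cluster types mentioned above correspond to quadrants of the Moran scatterplot. The following is a minimal sketch of that labelling, assuming a local Moran's I with row-standardised weights (the text does not state which variant or weighting was used). The case counts and the neighbour structure are hypothetical, and the permutation-based significance filtering that GIS software normally applies is left out.

```python
from statistics import mean

def moran_quadrants(values, neighbours):
    """Label each area by Moran scatterplot quadrant and return local I values.

    values: one count per area.
    neighbours: dict mapping an area index to the indices of its neighbours
    (row-standardised weights are assumed, i.e. a plain neighbour average).
    """
    mu = mean(values)
    z = [v - mu for v in values]                 # deviations from the mean
    m2 = sum(d * d for d in z) / len(z)          # population variance
    labels, local_i = [], []
    for i, zi in enumerate(z):
        lag = mean(z[j] for j in neighbours[i])  # average deviation of neighbours
        local_i.append(zi * lag / m2)            # local Moran's I for area i
        own = "High" if zi > 0 else "Low"
        around = "High" if lag > 0 else "Low"
        labels.append(f"{own}-{around}")
    return labels, local_i

# Hypothetical data: six areas along a line, each touching its adjacent areas.
cases = [90, 80, 85, 10, 12, 8]
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
labels, local_i = moran_quadrants(cases, neighbours)
```

In this toy layout the area on the boundary between the two groups is labelled High-Low, because its own count is above the mean while its neighbours average slightly below it.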
Although Boa Viagem has been a relevant cluster for reported cases since the first date analysed, the same has not occurred for case-fatality. This neighbourhood has a high number of cases, but its case-fatality rate has grown proportionally less. The first death in Boa Viagem was confirmed on April 10th, after 63 cases had been recorded, which represented 15% of the Covid-19 cases in Recife at that time. Boa Viagem led the number of deaths in the city from April 27th until July 8th. However, in terms of the case-fatality rate, i.e., the ratio of deaths to cases, this neighbourhood was not a hotspot in any of the periods analysed: Boa Viagem has concentrated the highest number of cases but has had a low number of deaths per case. On July 3rd, the neighbourhood reached its maximum case-fatality rate, 18%, which is nevertheless considerably lower than that of the case-fatality hotspots, where the rates were between 34% and 54%.
The results of the quartile analysis show that hotspots of case-fatality rate (the high-high clusters) usually have environmental characteristics similar to those of the hotspots of confirmed cases, disregarding Boa Viagem. For instance, most of these areas are characterised by precarious public service provision and a low-income population. The opposite characteristics were found in the low-low clusters. Neighbourhoods in low-low clusters (of both cases and case-fatality rates) have fewer residents per household than 75% of all other neighbourhoods.
Spearman correlation tests were applied to the case-fatality rate and the socio-economic factors, using data from July 3rd, 2020, as a way of finding monotonic relationships. Results were considered significant for p-values below 0.05. Regarding population characteristics, a negative association was identified with income (\(\rho\) = -0.51), literacy rate (\(\rho\) = -0.44) and percentage of people over 60 years old (\(\rho\) = -0.32), while residents per household showed a positive one (\(\rho\) = 0.41). As for government support, access to the sewage system (\(\rho\) = -0.47) and garbage collection (\(\rho\) = -0.27) were both negatively related to case-fatality.
Taken together, the previous results indicate that some places tend to suffer fewer deaths due to Covid-19 when their residents have levels of income and literacy well above the average for Recife and when the number of residents per household is lower than elsewhere in the city. These better-off areas also have more access to public services. The residents of these privileged areas therefore had the infrastructure and the access to goods and services (public and private) needed to isolate themselves in relative safety and to respect quarantine restrictions. However, contrary to what was expected, the results also showed that the percentage of elderly people was highest in the low-low clusters of case-fatality (when comparing the distribution of elderly people over the studied areas). In other words, although the elderly are more prone to developing severe forms of Covid-19 32, such outcomes were less frequent in clusters with a high concentration of elderly people.
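The distinction drawn above between the number of deaths and the case-fatality rate comes down to simple arithmetic. A minimal sketch follows; the counts are hypothetical and were chosen only to reproduce the kind of contrast described, they are not the study's data.

```python
def case_fatality_rate(deaths, cases):
    """Return deaths per confirmed case as a percentage (0.0 when there are no cases)."""
    return 100.0 * deaths / cases if cases else 0.0

# Hypothetical counts: one area leads in deaths, the other in case-fatality.
areas = {
    "many cases": {"cases": 1000, "deaths": 180},
    "few cases": {"cases": 50, "deaths": 20},
}
rates = {name: case_fatality_rate(a["deaths"], a["cases"]) for name, a in areas.items()}
```

Here the first area has nine times as many deaths as the second, yet its case-fatality rate is less than half as large, which is why the two kinds of hotspot map need not coincide.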
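As an illustration of the statistic used above, Spearman's \(\rho\) is the Pearson correlation of the ranks of the two variables. The sketch below is dependency-free and uses hypothetical income and case-fatality values; it computes only the coefficient, not the p-value, for which a statistics library would normally be used.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1           # average 1-based position of the tie group
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical neighbourhood values: average income and case-fatality rate (%).
income = [1200, 3400, 800, 5600, 2100]
cfr = [30.0, 12.0, 41.0, 9.0, 18.0]
rho = spearman_rho(income, cfr)
```

Because only ranks enter the calculation, the coefficient captures any monotonic relationship, not just a linear one, which is the reason given in the text for choosing it.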
4.2 Analysis of determinant factors
Local determinants for Covid-19 were explored in greater depth by using spatial regression analysis. An initial set of 15 explanatory variables was compiled from census indicators, as explained above, such as income, literacy and the provision of public services.
A second set was compiled of places that typically attract crowds and that kept operating even during the strict quarantine, and so could have become centres of coronavirus infection. These were called essential services, and six of them were evaluated by means of Pearson correlation tests: bakeries, banks, bus terminals, grocery stores/supermarkets, official lottery establishments (which also act as agents of a public bank) and pharmacies. Grocery stores range from small shops to large supermarkets that sell food and general items for domestic use.
Results indicated that all the places considered have significant and positive correlations with confirmed cases of Covid-19 at a significance level of 0.05. The Pearson coefficient varied between 0.23 and 0.88, with bakeries (r = 0.88), pharmacies (r = 0.86) and grocery stores (r = 0.71) being the three highest. These six factors thus formed a second set of determinants.
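The screening of facility types described above can be sketched as a ranking by Pearson coefficient. The counts below are hypothetical and only two facility types are shown; the significance test is omitted.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical data for five neighbourhoods.
cases = [12, 40, 95, 150, 310]
facilities = {
    "bus terminals": [0, 1, 0, 2, 1],
    "bakeries": [1, 3, 6, 9, 20],
}

# Coefficient per facility type, then facility types from strongest to weakest.
coefficients = {name: pearson_r(counts, cases) for name, counts in facilities.items()}
ranking = sorted(coefficients, key=coefficients.get, reverse=True)
```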
4.2.1 Analysis of socio-economic versus essential services factors
Multiple regression models were built for different days between April and July using the OLS method, taking the total reported cases as the dependent variable. The models were fitted twice: once for essential services and once for socio-economic factors. In each model, highly correlated variables were removed when they had a variance inflation factor (VIF) greater than 7.5. The remaining determinants were submitted to a stepwise selection based on the Akaike Information Criterion (AIC), thereby seeking to reduce them to a non-redundant set 33 that can sufficiently explain the spread of SARS-CoV-2 within and between Recife's neighbourhoods. Results are shown in Table 1.
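The two-stage variable selection described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the stepwise search is implemented as backward elimination (the text does not give the direction), the AIC is the Gaussian form up to an additive constant, the least-squares solver is a small dependency-free helper, and all data are hypothetical.

```python
import math

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols_rss(columns, y):
    """Fit y on an intercept plus the given columns; return the residual sum of squares."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2 for row, yi in zip(X, y))

def vif(data, name):
    """VIF of one predictor: 1 / (1 - R^2) from regressing it on all the others."""
    target = data[name]
    others = [col for key, col in data.items() if key != name]
    mean_t = sum(target) / len(target)
    tss = sum((t - mean_t) ** 2 for t in target)
    r2 = 1.0 - ols_rss(others, target) / tss
    return math.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)

def drop_collinear(data, threshold=7.5):
    """Remove the predictor with the highest VIF until none exceeds the threshold."""
    kept = dict(data)
    while len(kept) > 1:
        worst = max(kept, key=lambda name: vif(kept, name))
        if vif(kept, worst) <= threshold:
            break
        del kept[worst]
    return kept

def aic(data, names, y):
    """Gaussian AIC up to an additive constant: n ln(RSS / n) + 2 (number of coefficients)."""
    n = len(y)
    rss = ols_rss([data[v] for v in names], y)
    return n * math.log(rss / n) + 2 * (len(names) + 1)

def backward_stepwise(data, names, y):
    """Drop one variable at a time for as long as doing so lowers the AIC."""
    current = list(names)
    best = aic(data, current, y)
    while len(current) > 1:
        trials = [(aic(data, [v for v in current if v != d], y), d) for d in current]
        score, dropped = min(trials)
        if score >= best:
            break
        best = score
        current.remove(dropped)
    return current

# Hypothetical counts for eight neighbourhoods; pharmacies closely track bakeries.
predictors = {
    "bakeries": [15, 15, 15, 15, 35, 35, 35, 35],
    "pharmacies": [14, 14, 16, 16, 32, 32, 38, 38],
    "banks": [3, 3, 5, 5, 3, 3, 5, 5],
    "bus terminals": [1, 3, 1, 3, 1, 3, 1, 3],
}
cases = [39, 41, 83, 77, 121, 119, 153, 167]

kept = drop_collinear(predictors)                       # stage 1: VIF filter
selected = backward_stepwise(kept, list(kept), cases)   # stage 2: AIC stepwise
```

In this toy data set the VIF filter removes the redundant "pharmacies" column and the stepwise stage then discards "bus terminals", which adds almost nothing to the fit. In practice a regression library would replace the hand-written solver.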
Table 1
The performance of OLS models over time using separate datasets of determinants
| Date | GM* | Essential services | Adj. R² (essential services) | Socioeconomic factors | Adj. R² (socioeconomic factors) |
| --- | --- | --- | --- | --- | --- |
| April 16th | 1 | Bakeries, bus terminals | 0.7793 | Age 0 to 9, age over 60, residents per household, income, literacy, rented home | 0.8778 |
| April 23rd | 1; 2 | Bakeries, grocery stores, banks | 0.7558 | Age over 60, residents per household, income, literacy, piped water, garbage collection, owned home, sewage system | 0.8597 |
| May 3rd | 1; 2 | Bakeries, grocery stores, pharmacies | 0.8016 | Age over 60, residents per household, income, rented home | 0.8939 |
| May 12th | 1; 2 | Bakeries, grocery stores, banks, pharmacies | 0.8272 | Age over 60, piped water | 0.9086 |
| May 19th | 1; 2; 3 | Bakeries, grocery stores, banks, pharmacies | 0.8415 | Age 0 to 9, age over 60, piped water, garbage collection | 0.9279 |
| May 27th | 1; 2; 3 | Bakeries, grocery stores, banks, pharmacies | 0.8583 | Age 0 to 9, age over 60, piped water, garbage collection | 0.9398 |
| June 3rd | 2; 4 | Bakeries, grocery stores, banks, pharmacies | 0.8583 | Age 0 to 9, age over 60, piped water, garbage collection | 0.9436 |
| June 12th | 2; 4 | Bakeries, grocery stores, banks, pharmacies | 0.8631 | Age 0 to 9, age over 60, piped water, garbage collection | 0.9477 |
| June 24th | 2; 4; 5; 6 | Bakeries, grocery stores, banks, pharmacies | 0.8639 | Age 0 to 9, age over 60, piped water, garbage collection, residents per household | 0.9509 |
| July 3rd | 2; 4; 5; 6 | Bakeries, grocery stores, banks, pharmacies | 0.8659 | Age 0 to 9, age over 60, piped water, garbage collection, residents per household | 0.9521 |
* Government measures (GM) in 2020:
1. Closing of non-essential commercial activities
2. Mandatory use of masks
3. Lockdown
4. Reopening of building supply stores
5. Reopening of beauty salons and suburban retailers
6. Reopening of malls and places of worship
The GM (government measures) column indicates which measures local authorities had in force on each date, in their attempt to gradually restrict or allow the movement of people and traffic on the streets during the successive stages of the pandemic to date.
All regression models were significant (p < 2.2E-16) and most explanatory variables were positively associated with the number of Covid-19 cases, except for ‘residents per household’, ‘people aged from 10 to 19’, ‘bus terminals’ and ‘banks’. In both sets of determinants, the adjusted coefficient of determination R² mostly increased along with the cumulative total of confirmed cases, which reinforces the link between the original set of factors and the variance in the data. Its value stabilised from early June, which is likely due to the slow growth of cases in Recife in that period.
The final set of relevant explanatory variables for essential services alone remained stable from May 12th, while the main socio-economic factors that explained the results started repeating themselves after May 19th. Although the local government has since implemented relaxation measures, as seen in Fig. 1, the persistence of these variables indicates their importance for predicting cases. When determinants were evaluated individually, it was noted that ‘bakeries’ could explain 77% of the variability in cases, and ‘total residents’ nearly 90%.
The subset of determinants obtained with the most recent data was taken for further analysis with Geographically Weighted Regression (GWR): number of bakeries, number of grocery stores, number of banks and number of pharmacies. Unfortunately, the socio-economic factors model could not meet GWR specifications because the values of its variables did not vary sufficiently across space. The cumulative confirmed cases on July 3rd, 2020 were taken as the dependent variable. An R² of 0.903 was obtained considering 62 neighbours for every neighbourhood based on inverse distance. GWR residuals were tested for spatial autocorrelation by means of a Moran’s I test, which resulted in an index of -0.002 (p = 0.81). This means there is no significant evidence of spatial autocorrelation in the residuals, i.e. they are consistent with a random spatial distribution, so the model is adequate.
Note that there is an improvement from the OLS results (global analysis) to those of the GWR (local analysis), since adjusted R² increased from 0.866 to 0.903 and the AICc statistic fell from 935.99 to 916.19. Turning to the GWR outputs, Fig. 4 illustrates the spatial distribution of local R² across neighbourhoods, which were allocated to five categories by natural breaks. Hotspots for reported cases found on July 3rd, 2020 were highlighted on the map as a way of better understanding the behaviour of the model in these more affected areas. Values were generally high (at least 0.737), and the performance of the model was highest in the areas in which most of the hotspots are located.
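The residual check described above can be sketched with a global Moran's I and a permutation-based pseudo p-value. The text does not say how the reported p-value was obtained, so the permutation approach is an assumption, as are the binary adjacency weights and the hypothetical residuals used here.

```python
import random

def morans_i(values, weights):
    """Global Moran's I for values and a square spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)        # sum of all weights
    num = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in z)
    return (n / s0) * (num / den)

def permutation_p_value(values, weights, n_perm=999, seed=0):
    """Two-sided pseudo p-value: share of random relabelings at least as extreme."""
    rng = random.Random(seed)
    expected = -1.0 / (len(values) - 1)          # expectation of I under randomness
    observed = abs(morans_i(values, weights) - expected)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(morans_i(shuffled, weights) - expected) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Eight areas along a line; each area neighbours the areas next to it.
w = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(8)] for i in range(8)]
residuals = [0.4, -1.1, 0.7, 0.2, -0.5, 0.9, -0.3, -0.3]   # hypothetical GWR residuals
index = morans_i(residuals, w)
p_value = permutation_p_value(residuals, w)
```

A large p-value means the observed index is unremarkable among random relabelings, so spatial randomness of the residuals cannot be rejected. That is weaker than positive evidence of randomness, but it is the usual criterion for accepting a spatial model.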
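For reference, the two statistics compared above are commonly defined as follows. These are the standard textbook forms and not necessarily the exact expressions used by the software in this study; in GWR, the parameter count is usually replaced by an effective number of parameters. Here \(n\) is the number of neighbourhoods, \(p\) the number of explanatory variables and \(K\) the number of estimated parameters.

\[
R^2_{\mathrm{adj}} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1},
\qquad
\mathrm{AICc} = \mathrm{AIC} + \frac{2K(K + 1)}{n - K - 1}
\]

With the values reported, the change from OLS to GWR is \(\Delta\mathrm{AICc} = 935.99 - 916.19 = 19.80\) and \(\Delta R^2_{\mathrm{adj}} = 0.903 - 0.866 = 0.037\).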
Figure 5 reveals the impact of the selected determinants on predicting cases across neighbourhoods. Categories were built by natural breaks in the range of the coefficients of the variables obtained from the GWR results. For the hotspots of cases mentioned previously, the Southern hotspots were more impacted by the presence of bakeries and pharmacies, while banks and grocery stores had a negative influence on predicting cases. The Western cluster is best described by local banks, grocery stores and pharmacies. The strongest influence on the incidence of disease in the Northern hotspot came from banks and grocery stores.
4.2.3 Analysis of socio-economic and essential services factors combined
New assessments were made by means of regression analysis based on a final set of 21 determinants. As was done beforehand for the separate datasets, the same ten dates were first considered to obtain the total number of reported cases of Covid-19, which were tested as response variables in multiple regression models based on the Ordinary Least Squares method. A smaller and significant set of explanatory variables was identified for each day by excluding correlated variables and using the stepwise method as applied previously. Findings are summarised in Table 2 and, although the determinants were analysed together, the final sets are shown separately to clarify patterns.
Table 2
The performance of OLS models over time using combined datasets of determinants
| Date | GM* | Essential services | Socioeconomic factors | Adj. R² |
| --- | --- | --- | --- | --- |
| April 16th | 1 | Bakeries, grocery stores, lotteries, bus terminals | Owned home, income | 0.8117 |
| April 23rd | 1; 2 | Bakeries, grocery stores, banks | Age 0 to 9, owned home, sewage system | 0.8159 |
| May 3rd | 1; 2 | Bakeries, grocery stores | Age 0 to 9, income | 0.8331 |
| May 12th | 1; 2 | Bakeries, grocery stores, lotteries, bus terminals | Age 0 to 9, owned home, income, literacy | 0.8709 |
| May 19th | 1; 2; 3 | Bakeries, grocery stores, lotteries | Age 0 to 9, income | 0.8958 |
| May 27th | 1; 2; 3 | Bakeries, grocery stores, lotteries | Age 0 to 9, income | 0.9100 |
| June 3rd | 2; 4 | Bakeries, grocery stores, lotteries | Age 0 to 9, income | 0.9162 |
| June 12th | 2; 4 | Bakeries, grocery stores, lotteries | Age 0 to 9, owned home, literacy | 0.9215 |
| June 24th | 2; 4; 5; 6 | Bakeries, grocery stores, lotteries | Age 0 to 9, owned home, literacy | 0.9255 |
| July 3rd | 2; 4; 5; 6 | Bakeries, grocery stores, pharmacies | Age 0 to 9, income | 0.9260 |
* Government measures (GM) in 2020:
1. Closing of non-essential commercial activities
2. Mandatory use of masks
3. Lockdown
4. Reopening of building supply stores
5. Reopening of beauty salons and suburban retailers
6. Reopening of malls and places of worship
Every model was found to be significant (p < 2.2E-16) for predicting Covid-19 cases, and all determinants contributed positively to the predictions, except for bus terminals. As happened previously for the separate databases, the adjusted coefficient of determination R² tended to increase until early June and then stabilised. From late April, the final subset of determinants could be split between commercial facilities and residents' socio-economic attributes. Some of them are frequently repeated, such as bakeries, grocery stores, income and people from 0 to 9 years old, which indicates their importance for predicting cases.
Data from the last day explored, July 3rd, 2020, were kept for a subsequent evaluation using a spatial regression approach by means of GWR. The following set of relevant explanatory variables was considered for predicting reported cases: number of grocery stores, number of pharmacies, number of bakeries, the average income of residents and the total number of residents aged 0 to 9 years old. Results indicated an R² of 0.960 considering 54 neighbours for every neighbourhood based on inverse distance, which represents an improvement on the essential services-only analysis conducted for separate databases. The adequacy of this model was supported by a Moran's I test applied to the GWR residuals: an index of -0.028 (p = 0.63) gave no significant evidence of spatial autocorrelation, i.e. the residuals were consistent with a random spatial distribution.
As noted for the analysis of separate databases, there was an improvement from the OLS to the GWR results, since adjusted R² increased from 0.926 to 0.944 and the AICc statistic fell from 881.42 to 873.78. GWR produces local predictions that reveal spatial variations across the region of interest 28, so the distribution of local R² in each Recife neighbourhood is illustrated in Fig. 6. With a view to initiating a more in-depth exploration of areas where Covid-19 infections are concentrated, hotspots for cases identified on July 3rd, 2020 were highlighted. Values were remarkably high: the minimum local R² corresponds to 82.8% of the variation in reported cases being explained, and the highest values occurred mainly where hotspots were detected.
A further exploration was made of the July 3rd, 2020 data, this time seeking to clarify how the contribution of each relevant determinant to the spatial regression model influences the predicted number of cases. Figure 7 classifies the coefficients of the local regression equations into five categories. When the previously mentioned hotspots are analysed, it is noted that the Southern ones were impacted most by the presence of bakeries and by their residents' average income. The Western cluster is best described by local pharmacies and grocery stores. The strongest influence on the incidence of disease in the Northern hotspot came from bakeries, pharmacies and people aged from 0 to 9.
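The core idea of GWR, one weighted least-squares fit per location using only nearby observations, can be sketched as follows. This is a simplified illustration, not the software used in the study: the weight function 1 / (1 + distance) over the k nearest neighbours (including the location itself) is an assumption standing in for the unspecified inverse-distance scheme, and the coordinates and counts are hypothetical.

```python
import math

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def gwr_coefficients(coords, X, y, k):
    """Return one coefficient vector [intercept, b1, ...] per location.

    Each local model is a weighted least-squares fit on the k nearest
    observations, weighted by 1 / (1 + distance) (assumed kernel).
    """
    p = len(X[0]) + 1
    results = []
    for ci in coords:
        dist = [math.dist(ci, cj) for cj in coords]
        nearest = sorted(range(len(coords)), key=dist.__getitem__)[:k]
        w = {j: 1.0 / (1.0 + dist[j]) for j in nearest}
        rows = {j: [1.0] + list(X[j]) for j in nearest}
        xtwx = [[sum(w[j] * rows[j][a] * rows[j][b] for j in nearest)
                 for b in range(p)] for a in range(p)]
        xtwy = [sum(w[j] * rows[j][a] * y[j] for j in nearest) for a in range(p)]
        results.append(solve(xtwx, xtwy))
    return results

# Hypothetical data: a western and an eastern group of five areas each, in which
# cases grow with the number of bakeries at different rates (2 versus 5 per bakery).
coords = [(float(i), 0.0) for i in range(5)] + [(100.0 + i, 0.0) for i in range(5)]
X = [[1], [2], [3], [4], [5]] * 2                       # bakeries per area
y = [1 + 2 * b for b in (1, 2, 3, 4, 5)] + [1 + 5 * b for b in (1, 2, 3, 4, 5)]
coefs = gwr_coefficients(coords, X, y, k=4)
```

A single global OLS fit would average the two slopes, whereas the local fits recover a different bakery coefficient for each group. This spatial variation in coefficients is what coefficient maps such as Fig. 7 display.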