The spatial distribution and Calculator for Severity and Death for COVID-19


Background: Latin America has one of the highest COVID-19 mortality rates in the world, driven by inequality in its population. We aimed to analyze the spatial distribution and factors associated with the risk of severe acute respiratory syndrome (SARS) and death in COVID-19 cases using routine record data, and to develop and validate a prognostic tool for the risk of death from COVID-19. Methods: A cross-sectional study was conducted from March 2020 to April 2021 in the South Zone of the city of São Paulo, SP, Brazil, with 16,061 positive cases of COVID-19. The data were obtained from the records of the Brazilian Ministry of Health notification systems for flu-like syndrome (eSUS-VE) and hospitalized SARS (SIVEP-Gripe). The spatial distribution of the cases was represented by 2D kernel density. To assess the factors potentially associated with the outcomes, generalized linear and generalized additive logistic models were fitted. Discriminatory power was assessed with the C-statistic. A calculator was developed based on a prognostic model for the risk of death and validated with accuracy measures in the build sample, in internal validation and in temporal validation. Results: The average age of patients was 42.1 years. A total of 925 patients (11.98%) progressed to SARS and 375 (2.37%) died. The comorbidities associated with a higher risk of SARS were obesity (OR=25.32) and immunodepression (OR=12.15). The comorbidities associated with a higher risk of death were renal diseases (OR=11.8) and obesity (OR=8.49), with clinical and demographic information being more important than the spatial area. The COVID-19 risk of death calculator showed an accuracy of 92.2% in the build sample, 92.3% in the internal validation and 80% in the temporal validation. Conclusions: Age and comorbidities were the factors most strongly associated with the severity of the disease. For death, socioeconomic condition was associated in addition to these factors. The calculator can be implemented in services of varying complexity because it relies on easily accessible information to guide prevention and care.

Additionally, studies [4] show that there is a large inequality in the distribution of the disease across demographic and socioeconomic contexts, with poorer and younger populations [5] being more affected. The populations at higher risk of death are the elderly, men [6], people of black or brown race, and those with lower income and education levels, although these associations were assessed on an ecological rather than an individual basis [7][8][9].
In spite of the vast literature on the factors associated with COVID-19, few studies consider socioeconomic factors using individual data that can be controlled for age and the presence of comorbidities and that can aid decision-making in the context of population health. From this perspective, some predictive tools have been proposed to assess the risk of severity and death from COVID-19 and to assist clinical decisions. However, these tools are restricted to elderly or hospitalized populations, or they require laboratory support to assess the markers [10][11][12][13].
We aimed to evaluate the risk factors for severity and death by COVID-19 in regions with social inequality and to develop a tool from individual data that helps in evaluating the risk of death by COVID-19.

Methods
A cross-sectional study was conducted from March 2020 to April 2021 in the South Zone of the city of São Paulo, SP, Brazil, including 16,061 men and women. The eligibility criteria included individuals aged 18 years or more with positive cases of COVID-19.
The area is composed of two administrative districts, Campo Limpo and Vila Andrade, which have an estimated population of nearly 400,000 inhabitants [14] (Figure 1A). In this region, social inequality was measured by the Paulista Index of Social Vulnerability (Índice Paulista de Vulnerabilidade Social - IPVS) [15], on a scale from 0, the lowest vulnerability (navy blue), to 6, high vulnerability (purple) (Figure 1B). The socioeconomic level was evaluated using the Geographic Socioeconomic Index for Health Studies (Índice Socioeconômico do Contexto Geográfico para Estudos em Saúde - GeoSES) [16], in which lighter areas indicate lower values and reflect conditions of lower income, education levels and mobility (Figure 1C). This index ranges from -1 (lowest socioeconomic level) to 1 (highest socioeconomic level), and in the area of this study it ranged from -0.56 to 0.3.
The data were obtained from the Brazilian Ministry of Health information systems: (1) information on mild cases of the disease was extracted from the eSUS Epidemiological Monitoring System (eSUS-VE) for flu-like syndrome (FLS) notifications, which are submitted by any type of health care service; and (2) information about Severe Acute Respiratory Syndrome (SARS), i.e., from patients who were hospitalized due to the disease, was extracted from the Epidemiological Monitoring Information System for the Flu (SIVEP-Gripe). As the system was adapted during the period of the study to include standardized fields for some symptoms and comorbidities, a search in the open fields for "other symptoms" and "other comorbidities" was performed to identify this information in records prior to the standardization.
To analyze the use of the different health care services sought by patients in the study, a map showing the services categorized as private or public was included in Supplementary Material 1. They were categorized according to their registration in the National Registry of Health Establishments (Cadastro Nacional de Estabelecimentos de Saúde - CNES). This is an important indicator of health care access because everyone has access to public services, but only individuals with medical insurance or good financial conditions can access private services.
The patients' residences were geocoded by the GISA/CEINFO - SMS, identified in the area, and made available by the Campo Limpo Health Monitoring Unit. The database was anonymized; therefore, no information could be used to identify the patients in the study. This project was approved by the Research Ethics Committee at the Hospital Israelita Albert Einstein (4.462.994) and the São Paulo Municipal Health Department (4.648.956).
The studied population consisted of individuals with a COVID-19 diagnosis confirmed via the polymerase chain reaction (PCR) method. This criterion provided the scientific rigor necessary for this study, despite limited access to laboratory results during some periods of the pandemic. The criteria for inclusion in the study were: residing in the area of the study, being over age 18, and having laboratory confirmation of COVID-19 via the PCR method. When the same patient had both a flu-like syndrome notification and a SARS notification, the duplicate was excluded and only the SARS notification was kept (Figure 2).
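The duplicate-handling rule described above can be illustrated with a minimal Python sketch. The field names `patient_id` and `source` are hypothetical and are not taken from eSUS-VE or SIVEP-Gripe; the sketch only shows the stated rule of keeping the SARS notification when a patient appears in both systems.

```python
def deduplicate_notifications(records):
    """Keep one record per patient, preferring a SARS notification over an FLS one.

    Each record is a dict with hypothetical keys:
      "patient_id": any hashable identifier
      "source": "SARS" (SIVEP-Gripe) or "FLS" (eSUS-VE)
    """
    priority = {"SARS": 0, "FLS": 1}  # lower value wins
    kept = {}
    for rec in records:
        current = kept.get(rec["patient_id"])
        if current is None or priority[rec["source"]] < priority[current["source"]]:
            kept[rec["patient_id"]] = rec
    return list(kept.values())


# Illustrative records: patient "A" was notified in both systems.
records = [
    {"patient_id": "A", "source": "FLS"},
    {"patient_id": "A", "source": "SARS"},
    {"patient_id": "B", "source": "FLS"},
]
unique = deduplicate_notifications(records)
```

In this toy input, patient "A" is retained once, with the SARS record, and patient "B" keeps the single FLS record.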
The outcome variables studied were: 1) progression to SARS (categorized as FLS or SARS) and 2) death (categorized as discharged from hospital or death).
The independent variables considered were: socioeconomic characteristics, comorbidities and symptoms, obtained from the official notification systems; the IPVS and GeoSES socioeconomic indicators, obtained from the patient's residential address; and the residential address geocoded as latitude and longitude.
The data are presented as absolute and relative frequencies. For a simple comparison between outcome groups, the chi-squared test and Student's t-test were used. The spatial distribution of the studied cases is represented by 2D kernel density.
To assess the factors potentially associated with the outcomes, generalized linear models and generalized additive logistic models were fitted, the latter to include the spatial location of the patient's residence. First, simple models were fitted; variables with a p-value of at most 0.2 and less than 10% missing data were considered for the multiple models. Imputed data were not used; only variables with complete data were used in the analysis. With the aim of assessing factors prior to the disease to better direct public policies and preventive actions, e.g. vaccination, symptoms were not considered in this model, so as not to mix the results with other factors. The variables for the final model were selected using backward stepwise selection with a final p-value of <0.05. In the final model, the quality of the fit was verified by the magnitude of the standard errors, the influence of the inclusion or exclusion of variables on the estimated odds ratios, and the variance inflation factor, which did not surpass 1.2. To evaluate the discriminatory power of each variable and of the final model, the C statistic (the area under the ROC curve) was calculated.
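As an illustration of what a 2D kernel density represents, the following Python sketch evaluates a Gaussian kernel density on a grid. The Gaussian kernel, the single fixed bandwidth and the toy coordinates are assumptions made for this example; the text does not state which kernel or bandwidth was used for the maps.

```python
import math


def kde2d(points, grid_x, grid_y, bandwidth):
    """Evaluate a Gaussian 2D kernel density estimate on a rectangular grid.

    points: list of (x, y) coordinates (assumed to be in a projected, planar system)
    grid_x, grid_y: coordinates at which the density is evaluated
    bandwidth: kernel standard deviation, same unit as the coordinates (assumed)
    Returns density[iy][ix] for grid_y[iy], grid_x[ix].
    """
    n = len(points)
    norm = 1.0 / (n * 2.0 * math.pi * bandwidth ** 2)
    density = []
    for gy in grid_y:
        row = []
        for gx in grid_x:
            total = 0.0
            for px, py in points:
                d2 = (gx - px) ** 2 + (gy - py) ** 2
                total += math.exp(-d2 / (2.0 * bandwidth ** 2))
            row.append(norm * total)
        density.append(row)
    return density


# Toy data: three cases clustered near the origin and one isolated case.
points = [(0.0, 0.0), (0.1, 0.2), (-0.2, 0.1), (3.0, 3.0)]
grid = [0.0, 1.5, 3.0]
density = kde2d(points, grid, grid, bandwidth=0.5)
```

With these toy points, the estimated density is highest at the grid node next to the cluster, which is the pattern a case-density map is meant to reveal.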
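The C statistic mentioned above equals the probability that a randomly chosen patient with the outcome receives a higher predicted score than a randomly chosen patient without it. The sketch below computes it by direct pairwise comparison; the function name and the toy data are illustrative and do not come from the study.

```python
def c_statistic(y_true, scores):
    """Area under the ROC curve via pairwise concordance (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


# Toy example: 3 of the 4 (event, non-event) pairs are correctly ordered.
auc = c_statistic([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

The pairwise form is quadratic in the sample size, so for datasets of the size used here a rank-based implementation would be preferable in practice.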
Since the multiple model reached good discrimination, a risk of death calculator was created using the logistic model for death with the principal clinical and demographic characteristics of the patients at the time they accessed the health service. For this, the data were randomly separated into two datasets in a proportion of 2:1: one set for building the model and one set for its internal validation. In this analysis the symptoms were considered, for the purpose of guiding actions to be taken after the disease is identified. The final multiple model was obtained by following the same steps as the previous one, and the calculator was obtained from the probability predicted by the logistic model [17].
The prognostic performance of the model was assessed by measures of accuracy, sensitivity, specificity and discriminatory power, and by the Hosmer and Lemeshow test [17]. For the temporal validation, data were used from COVID-19 positive patients over age 18 who began presenting symptoms between March and April 2021. This database contains information from 3,798 patients, 104 of whom died of COVID-19 during the period of the study.
R software, version 3.6.3, with the mgcv, ggplot2 and viridis packages, was used. A significance level of 5% was adopted [18-21]. High-resolution Google Earth and Esri World Imagery (arcgis.com) images were also used.
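A random 2:1 separation into build and internal validation sets can be sketched as follows. The seed and the function name are assumptions for reproducibility of the example only; the text does not describe how the random split was generated.

```python
import random


def split_build_validation(records, seed=0):
    """Randomly split records into build and validation sets in a 2:1 proportion."""
    shuffled = list(records)               # copy, so the input is not modified
    random.Random(seed).shuffle(shuffled)  # seeded generator (assumed) for repeatability
    n_build = (2 * len(shuffled)) // 3
    return shuffled[:n_build], shuffled[n_build:]


# Toy example with 300 record identifiers.
build, validation = split_build_validation(range(300), seed=42)
```

Every record ends up in exactly one of the two sets, so the validation set is never used to fit the model.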
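Accuracy, sensitivity and specificity all follow from the confusion matrix obtained by thresholding the predicted probability. The sketch below uses a hypothetical cutoff of 0.5 and toy data; it does not reproduce the cutoff or the results of the study, and it does not implement the Hosmer and Lemeshow test.

```python
def classification_metrics(y_true, y_prob, cutoff):
    """Accuracy, sensitivity and specificity for a given probability cutoff."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= cutoff else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both outcome classes are required")
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # proportion of deaths correctly flagged
        "specificity": tn / (tn + fp),  # proportion of survivors correctly cleared
    }


# Toy example: 2 deaths and 3 survivors, hypothetical cutoff of 0.5.
metrics = classification_metrics([1, 1, 0, 0, 0], [0.9, 0.3, 0.6, 0.2, 0.1], cutoff=0.5)
```

Because deaths are rare in this kind of data, accuracy alone can look high even when sensitivity is poor, which is one reason to report all three measures together.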

Results
The study analyzed 16,061 patients aged 18-103 years, 54.8% female, with positive cases of COVID-19 between March 2020 and February 2021. Of these, 925 (11.98%) progressed to SARS and 375 (2.37%) died. The average age of patients was 42.1 years (standard deviation of 14.9), with a median age of 40. Table 1 shows the demographic and socioeconomic characteristics of the patients, the missing data, the distribution of the outcomes, and the C statistic.
Table 1 notes. C: C statistic, which represents the area under the ROC curve. Measurements expressed as absolute frequency (%), or as average (standard deviation) where indicated. * P-value for hypothesis tests <0.05. ** P-value for hypothesis tests <0.001. Tests performed: chi-squared and Student's t-test.
The main variables associated with death and SARS were age; the symptoms dyspnea, fever and cough; and the comorbidities heart disease and diabetes. Figure 3 shows the locations with a higher density of cases identified in the area. A higher density of positive COVID-19 cases was seen in Paraisópolis, one of the largest favelas in the city of São Paulo. This area is characterized by a large, dense population that is vulnerable and predominantly young.
Progression to SARS was studied in the 16,061 patients. Patients without information on comorbidities were excluded, and the multiple model was fitted with information from 15,235 patients (94.9% completeness).
In the simple model, the information on the patient's address showed a p-value of <0.001, an area under the curve (AUC) of 0.57 and an explained deviance of 0.74% (Figure 4).
In the final multiple model, the clinical and demographic information was more important than the area information, which ceased to be significant (Table 2). The comorbidities associated with a higher risk of SARS were obesity and immunodepression. No socioeconomic information remained significant or had sufficient data for the multiple model. The model reached an AUC of 0.96 and an explained deviance of 50%.
In the simple model, the information on the patient's residence showed a p-value of 0.001, an AUC of 0.62 and an explained deviance of 1.8% (Figure 5).
In the final multiple model, the clinical and demographic information was more important than the information about the area, which ceased to be significant. Based on the risk score from the final model, the predicted probability can be calculated as follows: Predicted risk (or probability) = $1 / (1 + e^{-\text{Risk score}})$. The tool is available at: https://redcap.einstein.br/surveys/?s=DMDT7TCJ9R.
To establish a cutoff point with the best balance between sensitivity, specificity and accuracy, the ROC curve was used to obtain the point of 0
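The conversion from risk score to predicted probability stated above can be sketched in Python. The intercept, coefficients and variable names below are hypothetical placeholders chosen only to make the example run; they are not the coefficients of the published calculator.

```python
import math


def risk_score(intercept, coefficients, features):
    """Linear predictor of a logistic model: intercept plus weighted feature values."""
    return intercept + sum(coefficients[name] * value for name, value in features.items())


def predicted_risk(score):
    """Predicted probability = 1 / (1 + e^(-score)), written to avoid overflow."""
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)
    return e / (1.0 + e)


# Hypothetical model: NOT the coefficients of the study's calculator.
intercept = -6.0
coefficients = {"age_years": 0.05, "obesity": 1.0}
patient = {"age_years": 60, "obesity": 1}

score = risk_score(intercept, coefficients, patient)  # -6 + 3 + 1 = -2
risk = predicted_risk(score)                          # about 0.119
```

A score of 0 corresponds to a predicted risk of 50%, and each unit increase in the score multiplies the odds of death by e.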

Discussion
The study analyzed the spatial distribution of COVID-19 cases and the factors associated with the risk of SARS and of death due to COVID-19, and proposed a risk of death calculator. The risk factors associated with SARS were age, obesity and immunodepression. Our results are similar to those of systematic reviews that assessed the most prevalent risk factors for the aggravation of COVID-19 [22][23][24][25][26]. However, none of these reviews included the environmental factor as a relevant aspect of the outcome of the disease, whether for aggravation of or death from COVID-19. Some observational studies demonstrated a greater prevalence of mortality among vulnerable populations, but there is no solid evidence of this association [27,28].
The second result of this study concerned the increased risk of mortality, for which the factors, in descending order of importance, were: age, comorbidities, sex, and situation of vulnerability. The comorbidities most associated with a greater risk of mortality were, from highest to lowest: renal disease, obesity, immunodepression, heart disease, respiratory disease and diabetes.
The distribution of the density of confirmed COVID-19 cases in the area suggests that the virus spreads independently of the vulnerability status of the region, with density being greater in more populated and vulnerable areas.
A reflection on the vulnerability of specific groups is necessary, since the impact of the pandemic on the area needs to be contextualized, given the diversity of urban social vulnerability scores in the largest Brazilian city.
In Paraisópolis, one of the largest favelas of São Paulo, a higher density of confirmed COVID-19 cases was observed, as also reported by other Brazilian studies showing a greater spread of the virus in situations of high vulnerability and population density [29,30]. However, we clearly see two areas with high socioeconomic levels that, in spite of low incidences, show a high chance of aggravation and death. This corroborates the study by Bermudi et al. [30] that shows a high mortality rate in areas with the worst social conditions in São Paulo.
The findings of the study corroborate the importance of controlling chronic conditions and caring for the elderly in health care services in order to reduce mortality [31]. There is also evidence of the need to prioritize such actions in regions that present higher vulnerability and adverse regional conditions. Considering that the difficulties inherent in the region of residence affect health conditions, causing unfavorable outcomes for these populations, equity, one of the guidelines of the Unified Health System (SUS), should be implemented.
Although it is plausible that poverty worsens health conditions [32], this should not preclude controlling the risk factors amenable to more immediate interventions, or strategies that provide isolation for the most vulnerable. The SUS, free and universal, and the strong Primary Health Care across the country are transformative for this reality [33,34]. In countries that do not have universal healthcare coverage, such as the USA, there has been a high mortality rate from COVID-19 [35].
A prediction model for more serious forms of COVID-19 recommended including symptomatology to support medical care, resource planning and improved monitoring of COVID-19 patients [13]. Booth et al. [39] also proposed a tool to predict death that considered molecular biomarkers in laboratory samples from PCR exams, but it presented limitations related to the cost and time needed to obtain the risk measurement, as well as quality measures inferior to those of this study: AUC (93%), sensitivity (91%) and specificity (91%).
Other studies focused only on the elderly or on hospitalized cases [13,39]. Broadening the scope to include all adults over 18 years old makes the tool more suitable for use in the general population.
Thus, the proposed tool is a powerful response to support managers and professionals in health care service planning and the prioritization of more directed and assertive strategies and actions, such as monitoring positive cases with a higher prediction of death and directing and distributing resources.
We highlight the importance of this calculator as a tool to aid any health care service, including Primary Health Care.
In addition to the risk factors highlighted by the results of this study, many countries are in ethical, political and economic crises. Brazil, for example, has been experiencing a failure to respond, management policies against social distancing, a lack of federal coordination, negationism and neoliberal policies that can impact the outcomes of the pandemic. Thus, the response to the pandemic in these contexts needs to be questioned, and larger studies are needed that incorporate these variables into interdisciplinary analyses [40][41][42] that consider clinical, demographic, socioeconomic and geospatial aspects within a single model.
This study has several strengths, such as the use of data from the Ministry of Health information systems on the records of patients with a confirmed COVID-19 diagnosis, including flu-like syndrome; the use of GeoSES and IPVS; a large number of participants; and the proposal of a risk of death calculator, a tool to aid any health care service. Despite the richness of the data, this study has limitations. First, the study used data that had not been collected for a scientific purpose; healthcare professionals collected the data, with a high percentage of missing data. However, the data are still robust enough to support our results. Second, during the period of the study the principal variant in circulation changed: in January the P1 variant was introduced and the vaccination campaign began. However, in the temporal validation the performance measures decreased but still showed excellent discrimination. The risk calculator shows great potential to be evaluated regarding its implementation.
In conclusion, the study contributed to the global effort in the fight against COVID-19, with evidence that brings together various contexts in the spread of the virus, as well as risk factors present in areas of greater vulnerability, including those with a high population density and poverty, such as favelas.

Ethics approval and consent to participate
The study was performed in accordance with the guidelines and regulations of the Brazilian National Health Council. We confirm that we were exempted from applying the consent form because we used de-identified secondary data. All protocols and the research project were approved by the Research Ethics Committee of the

Figure caption: Predicted probability of evolution to SARS by the spatial-only model. N=16,601.