HLA Haplotypes and Differential Regional Incidence of COVID-19 in Brazil: A Population Study Based in a Large Bone Marrow Donors Bank Dataset

Background: Coronavirus disease 2019 (COVID-19) rapidly spread all over the world, causing high morbidity and mortality. Brazil is currently the third country in the world in the number of COVID-19 cases. Even though all Brazilian regions and states have reported a high number of cases, mortality rates vary among them. Environmental and genetic factors may influence the immune response towards SARS-CoV-2. The Brazilian population is highly heterogeneous, with different colonization and immigration histories in each region resulting in different genetic backgrounds. Here, we test whether specific HLA haplotypes are associated with COVID-19 incidence and mortality in Brazil.
Methods: HLA data were obtained from the Brazilian Voluntary Bone Marrow Donors Registry (REDOME), which harbors data from more than four million individual donors, and COVID-19 data were retrieved from epidemiological bulletins issued by State Health Secretariats via the Ministry of Health of Brazil. We tested the association between the most frequent HLA haplotypes in Brazil and COVID-19 incidence and mortality using Spearman's correlation analysis.
Results: No correlation between HLA haplotypes and COVID-19 rates was found when we analyzed data from the 26 states and Federal District, or when we analyzed data from the 90 cities with at least 50 deaths registered in São Paulo state. A significant negative correlation (suggestive of protection) between COVID-19 mortality and haplotypes HLA-A*01~B*08~DRB1*03, HLA-A*29~B*44~DRB1*07 and HLA-A*02~B*44~DRB1*04 was found when analyzing data from cities with at least 50 deaths registered in the entire country.
Conclusions: Our results do not support an association of specific HLA haplotypes with an increased risk of contracting SARS-CoV-2 or dying from COVID-19 in Brazil. Nevertheless, using bone marrow donor registries to test for associations between HLA variation and COVID-19 outcomes may represent an additional tool for health policymakers in the fight against COVID-19.

(Southeast region). A few days later, suspected cases were reported from all remaining 26 Brazilian states. By the end of April 2020, the North region had high community transmission and the highest mortality rates in the country. In June and July, the number of daily confirmed cases continued to increase in all regions, reaching peaks on July 29th, 2020 and, again, on January 7th, 2021. The actual number of cases could be underestimated due to the low number of tests performed [3]. Even though all Brazilian regions and states have reported a high number of cases, mortality rates varied between them.
Indeed, the symptomatology and mortality rates vary among different geographical regions and depend on the patient's clinical profile, presence of comorbidities, and access to the health system [4][5][6]. Individual environmental and genetic factors may also influence susceptibility and immune response to SARS-CoV-2.
Immune response variability may involve variants of the innate immune system and variants in the specific adaptive immune system [7][8][9][10][11]. Antigen presentation to T lymphocytes is one of the most important steps in determining immune response. The genetic complex of the classic human leukocyte antigen (HLA) encodes the proteins of the major histocompatibility complex (MHC), which mediates intracellular antigen presentation (HLA-A, -B, -C; MHC class I) and extracellular antigen presentation (HLA-DP, -DM, -DO, -DQ, and -DR; MHC class II) to the T cell receptors (TCR) on the surface of the CD8+ and CD4+ T lymphocytes, respectively [12]. Thus, by affecting the ability of T cells to respond to a specific antigen, HLA is one of the most important molecules of the immune system. The HLA alleles are highly polymorphic and their frequency varies widely in human populations, including among Brazilian regions [13]. In fact, the Brazilian population is highly heterogeneous, with different colonization and immigration histories in each region resulting in different genetic backgrounds [14].
Previous case-control and bioinformatics studies have associated a few HLA genotypes with higher susceptibility or protection against the development of severe disease in those infected with SARS-CoV [15][16][17][18]. SARS-CoV-2 is very similar to SARS-CoV regarding amino acid sequences, and, therefore, it is likely that HLA alleles associated with susceptibility to SARS-CoV may also play a role in COVID-19. Recently, HLA-A*02 and HLA-B*15 alleles were found to be the main presenters of SARS-CoV-2 antigens, and HLA-A*25 and HLA-B*46 alleles were shown to have fewer predicted binding SARS-CoV-2 peptides in an in silico analysis [19]. Another study analyzed the regional frequencies of the most common Italian haplotypes from the Italian Bone Marrow Donor Registry and found that the two most frequent HLA haplotypes in the Italian population were correlated with COVID-19 incidence and mortality [20]. The Brazilian Voluntary Bone Marrow Donors Registry (REDOME, in Portuguese Registro de Doadores de Medula Óssea) is the third largest bank of bone marrow donors in the world, with more than 5.2 million individuals registered to date. REDOME keeps information such as HLA-A, -B, and -DRB1 genotypes in low and/or medium resolution and city of residence of the donors.
In this study we evaluate whether the most frequent HLA haplotypes in Brazil correlate with COVID-19 incidence and mortality. We also describe the frequencies of HLA genotypes previously associated with susceptibility or protection to SARS-CoV or SARS-CoV-2 infection in Brazil, which may help in establishing public control policies against COVID-19.

Data sources
For HLA data, we used a dataset composed of 4,148,713 individuals who volunteered as potential hematopoietic stem cell donors registered at REDOME until September 2017. This registry includes information such as city of residence and HLA-A, -B, and -DRB1 genotypes. Donors come from recruitment centers distributed throughout the country and their DNA was genotyped in Health Ministry-accredited Brazilian laboratories. Only data genotyped by molecular methods were included in the analyses. The volunteers are genotyped for the allelic group at the time of registration, and high-resolution genotyping is performed only on those potential donors selected after an initial screening. Thus, only low-resolution (allelic group) genotypes were used in this study. The dataset was subdivided according to both state and city of residence. The Brazilian territory is divided into 26 states and one Federal District and comprises 5,570 municipalities.
Data on rates of COVID-19 cases and deaths were obtained from epidemiological bulletins issued by State Health Secretariats via the Ministry of Health of Brazil, available at https://bigdata-covid19a.icict.fiocruz.br/ [21]. COVID-19 rates for states were obtained on November 16th, 2020 and the rates for cities were obtained on October 24th, 2020.
This study was approved by the Ethics Committee in Research of Hospital de Clínicas de Porto Alegre, Brazil (CAAE 34701720300005327, GPPG 2020-0361), and all methods were carried out in accordance with local guidelines and regulations.

Statistical analysis
The incidence and mortality rates of COVID-19 were obtained based on the confirmed cases and deaths issued by each of the State Health Secretariats. To calculate the coefficient of incidence and mortality of each municipality, the number of confirmed cases or deaths, respectively, was divided by the resident population and multiplied by the population base of 100 thousand inhabitants.
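The coefficient described above can be illustrated with a minimal Python sketch. The function name `rate_per_100k` and the municipality figures are hypothetical and serve only to show the arithmetic; they are not taken from the study data.

```python
def rate_per_100k(events, population, base=100_000):
    """Return the number of events (cases or deaths) per `base` inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing so that whole-number inputs give exact results.
    return events * base / population


# Hypothetical municipality: 1,250 confirmed cases and 80 deaths
# among 500,000 residents (illustrative numbers only).
incidence = rate_per_100k(1250, 500_000)
mortality = rate_per_100k(80, 500_000)

print(f"incidence: {incidence:.1f} per 100,000 inhabitants")
print(f"mortality: {mortality:.1f} per 100,000 inhabitants")
```

The same function applies to states or municipalities, since only the event count and the resident population change.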
Allele and haplotype frequency estimations and the Hardy-Weinberg equilibrium (HWE) test were performed using the GENE[RATE] tools as described elsewhere [22][23][24]. Boxplots of allele and haplotype frequencies were generated in RStudio Version 1.3.1093. Maps of rates of COVID-19 cases and deaths were produced in ArcGIS v10.3. Spearman's correlation test between HLA alleles and the five most frequent haplotypes in Brazil [25] versus rates of COVID-19 cases and deaths was performed using IBM SPSS software, Version 20.0 (IBM Corp., Armonk, NY). We evaluated the correlations between HLA and COVID-19 cases and deaths in the following scenarios: (1) the 26 states and Federal District of Brazil (data obtained until November 16th, 2020); (2) only cities with at least 50 deaths due to COVID-19 registered in the entire Brazilian territory (data obtained until October 24th, 2020); and (3) only cities with at least 50 deaths registered in São Paulo state (data obtained until October 24th, 2020). In scenario #1 we replicated the approach previously performed by Pisanti et al. (2020) [20]; in scenario #2 we used municipalities with a defined minimum number of deaths to increase the number of observations and, therefore, gain statistical power; finally, in scenario #3 we used data from a single state (São Paulo) in order to control for population heterogeneity. São Paulo state was chosen as a study model since it can be considered similar to a European country in terms of population and territorial dimensions. P-values below 0.05 were considered statistically significant. We applied the FDR (false discovery rate) correction for multiple tests to limit the number of false positives in haplotype and allele correlations.
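For readers who wish to reproduce this type of analysis outside SPSS, the two core steps can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the procedure actually run in this study: the FDR method is assumed to be Benjamini-Hochberg (the text above does not name one), the function names are hypothetical, p-values are taken as given rather than computed, and all numbers are invented.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Invented haplotype frequencies (%) and mortality rates for five regions.
haplotype_freq = [1.2, 2.1, 1.8, 3.1, 2.5]
mortality_rate = [40.0, 35.0, 30.0, 20.0, 25.0]
rho = spearman_rho(haplotype_freq, mortality_rate)

# Invented raw p-values for four haplotype tests.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])

print(f"rho = {rho:.3f}")
print("adjusted p-values:", [round(p, 4) for p in adjusted])
```

A negative rho, as in this toy example, corresponds to what the text calls a correlation "suggestive of protection": regions with a higher haplotype frequency show lower mortality.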

Geographical distribution of COVID-19 epidemic in Brazil
The states with the highest incidence of COVID-19 cases registered until November 16th, 2020 were Roraima, the Federal District and Amazonas, respectively, while the states with the lowest incidence were Pernambuco, Minas Gerais and Rio de Janeiro. Regarding COVID-19 deaths, the Federal District, Rio de Janeiro and Mato Grosso had the highest rates, respectively, while the lowest death rates were observed in Minas Gerais, Santa Catarina and Paraná, respectively (Table 1 and Fig. 1). The cities with the highest incidence of COVID-19 cases were Parauapebas (Pará), Boa Vista (Roraima) and Araguaína (Tocantins), all of them located in the North region of the country, while Santa Helena de Goiás (Goiás), Guajará-Mirim (Rondônia) and Rio de Janeiro (Rio de Janeiro), located in the Central-West, North and Southeast regions of Brazil, respectively, had the highest rates of COVID-19 deaths.

Correlation between HLA haplotype frequency and COVID-19 incidence and mortality
Spearman's correlation coefficient was calculated to test whether the regional COVID-19 incidence and mortality correlated with any of the five most frequent haplotypes in the Brazilian population. Table 3 (Table 5).

Discussion
In the present study, we investigated the potential correlation between HLA allele and haplotype frequencies and the different regional distribution of COVID-19 incidence and mortality in Brazil. We included low-resolution HLA data from 4,148,713 REDOME donors and COVID-19 data registered until November 16th, 2020. No correlation between haplotype frequencies and COVID-19 rates was found when we analyzed data from the 26 states and Federal District, or when we analyzed data from the 90 cities with at least 50 deaths registered in São Paulo state (Tables 3 and 5). Pisanti et al. (2020) found a significant correlation between the two most frequent haplotypes in the Italian population and both COVID-19 incidence and mortality [20]. Their study included high-resolution HLA data from 104,135 donors of the Italian Bone Marrow Donors Registry (IBMDR) and COVID-19 mortality and incidence registered until May 24th, 2020. The haplotype HLA-A*01:01g~B*08:01g~C*07:01g~DRB1*03:01g showed a positive correlation (suggestive of susceptibility) and HLA-A*02:01g~B*18:01g~C*07:01g~DRB1*11:04g showed a negative correlation (suggestive of protection) with both COVID-19 incidence and mortality [20]. At low resolution, HLA-A*01~B*08~DRB1*03 is the most frequent HLA haplotype in both Brazil and Italy. However, while this haplotype has been positively correlated with COVID-19 incidence and mortality in Italy, suggesting increased susceptibility, we found, depending on the analyzed dataset, no correlation or a negative correlation with mortality in Brazil, which might suggest protection against COVID-19. One possibility is that the interaction between HLA and environment differs between Italy and Brazil, resulting in opposite, context-dependent associations between HLA variation and COVID-19 risk.
Another possibility is that the positive association in the Italian population reflects regional genetic structure, as both COVID-19 rates [20] and genetic population structure in Italy [26,27] show a North-South gradient. Another important point is that none of the studies took SARS-CoV-2 variants into consideration, which will be increasingly important in future studies.
In general, we observed that the four most common HLA haplotypes had higher frequencies in the South region and lower frequencies in the North of Brazil (Table 2 and Fig. 2). For example, the most frequent haplotype (HLA-A*01~B*08~DRB1*03) ranged from 1.26% in Pará (North) to 3.16% in Rio Grande do Sul (South) (Fig. 2). These regional differences reflect the different contributions of Native American, European, and African populations across the country after a long history of colonization and immigration. From a broad perspective, the South region has more European influence; the Northeast region is characterized by more African influence, while in the North region the Native American influence is comparatively more pronounced [28].
In addition to the study that related HLA haplotypes in the Italian population to the differential regional incidence of COVID-19, two bioinformatic studies predicted the binding affinity between HLA alleles and SARS-CoV-2 antigenic peptides [19,29]. For HLA-A, it has been predicted that -A*02 had the best binding affinity, resulting in a more favorable immune response, while -A*25 had the weakest binding affinity [29]. In line with these results, we found a significant negative correlation between allele -DRB1*01 (-0.246) and mortality (Supplementary file 3). Overall, our analysis provided mixed support for the inferences made by bioinformatics models (Supplementary file 3). This may be related to the complexity of the immune response against SARS-CoV-2, which probably depends on many genetic and environmental factors. For example, variants in other genes related to the immune response, such as genes that code for inflammatory factors and interleukin 6 (IL-6), and genes related to SARS-CoV-2 entry into the cell may also influence the disease course. Recent studies have already correlated and reviewed the role of variants in the IL6 [31], ACE2 [32], and TMPRSS2 [33] genes in COVID-19.
Together with genetic and immune system variation, environmental and social disparities among Brazilian regions may contribute to the differential burden of COVID-19, disproportionately affecting individuals carrying genetic susceptibility factors and/or the most vulnerable people regarding social assistance. For instance, at the time we were collecting our data, Rio de Janeiro and Minas Gerais, both in the Southeast region, had about 1,800 cases/100,000 inhabitants. However, the mortality rate was almost among genetic, social and environmental variability is essential for building an efficient way to prevent, control and understand disease outbreaks [35,36]. Our study focused on genetic variation of the HLA system and is an important initial step in the understanding of COVID-19 dispersion and behavior in Brazil.
An important limitation of this study is that we used HLA data from bone marrow donors instead of directly genotyping individuals affected by COVID-19. However, this approach has some advantages.
Bone marrow donor registries usually include very large sample sizes and a wide geographic coverage, as illustrated by the REDOME registry. In addition, because several countries maintain such large banks, statistically significant associations could reveal regions or populations at higher genetic risk for COVID-19, thus representing an additional tool for health policymakers in the fight against COVID-19.

Conclusions
In summary, we investigated whether there was a correlation between HLA allele and haplotype frequencies and the different regional distribution of COVID-19 incidence and mortality in the country. No correlation between HLA haplotypes and COVID-19 rates was found when we analyzed data from the 26 states and Federal District, or when we analyzed data from the 90 cities with at least 50 deaths registered in São Paulo (Tables 3 and 5).

Availability of data and materials
COVID-19 data used in this study are freely available from the source cited. HLA data are available from the corresponding author on reasonable request.

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
JAB participated in study design, obtained HLA data, analyzed and interpreted HLA and COVID-19 data and was a major contributor in writing the manuscript; FSLV participated in study design, analyzed and interpreted HLA and COVID-19 data and participated in writing and reviewing the manuscript; NJRF was a major contributor to statistical analysis and participated in manuscript writing and revision; LS was responsible for graph and map design and contributed to data interpretation; MB participated in study design and statistical analysis; MZO participated in COVID-19 data collection and map design; TFA participated in COVID-19 data collection and in writing and reviewing the manuscript; LCMSP participated in study design and availability of HLA data; JABC participated in study design and in writing and reviewing the manuscript; LS-F participated in writing and reviewing the manuscript; PA-P participated in writing and reviewing the manuscript; and CR was responsible for study design and approval, participated in analysis and interpretation of HLA and COVID-19 data, and was a major contributor in writing and reviewing the manuscript. All authors read and approved the final manuscript.