In the present study, we investigated the potential correlation between HLA allele and haplotype frequencies and the different regional distribution of COVID-19 incidence and mortality in Brazil. We included HLA data at low resolution of 4,148,713 donors from REDOME and COVID-19 data registered until November 16th, 2020. No correlation between haplotype frequencies and COVID-19 rates was found when we analyzed data from the 26 states and Federal District, or when we analyzed data from the 90 cities with at least 50 deaths registered in São Paulo state (Tables 3 and 5). When analyzing data from cities with at least 50 deaths registered in the entire country (377), a significant negative correlation (suggestive of protection) was observed between COVID-19 mortality and haplotypes #1 (HLA-A*01 ~ B*08 ~ DRB1*03), #2 (HLA-A*29 ~ B*44 ~ DRB1*07) and #4 (HLA-A*02 ~ B*44 ~ DRB1*04), with haplotype #1 showing a larger effect. Recently, Pisanti et al. (2020) found significant correlation between the two most frequent haplotypes in the Italian population with both COVID-19 incidence and mortality . In their study, HLA data at high resolution of 104,135 donors from Italian Bone Marrow Donors Registry (IBMDR) and COVID-19 mortality and incidence registered until May 24th, 2020 were included. The haplotype HLA-A*01:01g ~ B*08:01g ~ C*07:01g ~ DRB1*03:01g showed a positive correlation (suggestive of susceptibility) and HLA-A*02.01g ~ B*18.01g ~ C*07.01g ~ DRB1*11.04g showed negative correlation (suggestive of protection) with both COVID-19 incidence and mortality . At low resolution, HLA-A*01 ~ B*08 ~ DRB1*03 is the most frequent HLA haplotype in both Brazil and Italy. However, while this haplotype has been positively correlated with COVID-19 incidence and mortality in Italy, suggesting increased susceptibility, we found, depending on the analyzed dataset, no correlation or a negative correlation with mortality in Brazil, which might suggest protection against COVID-19. One possibility is that HLA versus environment interaction is different in Italy and Brazil resulting in opposite associations between HLA variation and COVID-19 risk, in a context-dependent manner. Another possibility is that the positive association in the Italian population reflects regional genetic structure, as both COVID-19 rates  and genetic population structure in Italy [26, 27] show a North-South gradient. Another important point to highlight is that none of the studies took into consideration the SARS-CoV-2 variants, and it becomes increasingly important to consider this point in future studies.
In general, we observed that the four most common HLA haplotypes had higher frequencies in the South region and lower frequencies in the North of Brazil (Table 2 and Fig. 2). For example, the most frequent haplotype (HLA-A*-01 ~ B*08 ~ DRB1*03) ranged from 1.26% in Pará (North) to 3.16% in Rio Grande do Sul (South) (Fig. 2). These regional differences reflect the different contributions of Native American, European, and African populations across the country after a long history of colonization and immigration. From a broad perspective, the South region has more European influence; the Northeast region is characterized by more African influence, while in the North region the Native American influence is comparatively more pronounced .
In addition to the study that evaluated HLA haplotypes in the Italian population to the differential regional incidence of COVID-19, two bioinformatic studies predicted the binding affinity between HLA alleles and SARS-CoV-2 antigenic peptides [19, 29]. For HLA-A, it has been predicted that -A*02 had the best binding affinity, resulting in a more favorable immune response, while -A*25 had the weakest binding affinity, resulting in a less favorable response. Recently, Shkurnikov et al. (2021)  performed HLA class 1 genotyping through next-generation sequencing in deceased patients with COVID-19. In their analysis, HLA-A*01:01 allele (especially in homozygosis) was associated with high risk to present SARS-CoV-2, while HLA-A*02:01 and HLA-A*03:01 mainly contributed to low risk. In our study, considering the scenario #2, for which we have more statistical power, we found significant negative correlations for HLA-A*01 (-0.276), HLA-A*02 (-0.175), HLA-A*03 (-0.302) and HLA-A*25 (-0.164) (Supplementary file 3). As for HLA-B alleles, both bioinformatics studies suggested that alleles -B*46:01 and -B*15:03 behaved as low and high affinity ligands against COVID-19 peptides, and Barquera et al. (2000) further characterized alleles -B*44:06, -B*51:07, -B*08:03, and B*52:01 as weak binders, and -B*35:10, as well as other molecules of the -B*15 lineage, as strong binders . We found a significant negative correlation for -B*35 (-0.160) with COVID-19 mortality, but, unexpectedly, a significant positive correlation between -B*15 and COVID-19 mortality (0.280) (Supplementary file 3). Finally, for HLA-DRB1, Barquera et al. (2020) suggested that the alleles -DRB1*01:01 and -DRB1*10:01 were the strongest binders, while - DRB1*03:02 and -DRB1*03:03 were the weakest binders . In line with these results, we found a significant negative correlation for allele -DRB1*01 (-0.246) and mortality (Supplementary file 3). Overall, our analysis provided mixed support for the inferences made by bioinformatics models (Supplementary file 3). This may be related to the complexity of the immune response against SARS-CoV-2, which probably depends on many genetic and environmental factors. For example, variants in other genes related to the immune response, like genes that code for inflammatory factors and interleukin 6 (IL-6), and genes related to SARS-CoV-2 entry into the cell may also influence the disease course. Recent studies have already correlated and reviewed the role of variants in IL6 , ACE2 , and TMPRSS2  genes in COVID-19 disease.
Together with genetic and immune system variation, environmental and social disparities among Brazilian regions may contribute to the differential burden of COVID-19, affecting disproportionally individuals carrying genetic factors of susceptibility and/or the most vulnerable people regarding social assistance. For instance, at the time we were collecting our data, Rio de Janeiro and Minas Gerais, both in the Southeast region, had about 1800 cases/100,000 inhabitants. However, the mortality rate was almost 3 times higher in Rio de Janeiro (122.66/100,000 inhabitants) than in Minas Gerais (44.7/100,000 inhabitants) (Table 1). Besides Minas Gerais, only Paraná and Santa Catarina, both in the South region, maintained mortality rates below 50/100,000 inhabitants, while higher mortality rates were found in states located in the Central-West and North regions (Table 1). These numbers should be taken with caution. The number of reported cases depends on the number and type of diagnostic tests performed by each city and state, and with limited testing it is unlikely that asymptomatic subjects would have been diagnosed for SARS-CoV-2 infection. The number of COVID-19 related deaths could be more reliable since, during the period analyzed in this study, the Brazilian health system was able to provide medical assistance for most people showing COVID-19 symptoms. Our study included only confirmed deaths by COVID-19 in our analysis, however, when considering deaths due to SARS reported in DATASUS (Department of Informatics of the Unified Healthcare System of Brazil), it was expected an average underreporting of 40% . Therefore, it is possible that the number of cases have been underestimated in our analysis. The mortality rate may also vary according to the coverage of local health systems, number of hospital beds available, and measures of social isolation, for example. It is important to highlight that in Brazil there was no unified policy regarding COVID-19 protocols, resulting in huge differences in how different state governors and city mayors managed COVID-19 locally. Distinguishing among genetic, social and environmental variability is essential for building an efficient way to prevent, control and understand disease outbreaks [35, 36]. Our study focused on genetic variation of the HLA system and is an important initial step in the understanding of COVID-19 dispersion and behavior in Brazil.
An important limitation of this study is that we used HLA data from bone marrow donors instead of directly genotyping individuals affected by COVID-19. However, this approach has some advantages. Bone marrow donor registries usually include very large sample sizes and a wide geographic coverage, as illustrated by the REDOME registry. In addition, because several countries maintain such large banks, statistically significant associations could reveal regions or populations in higher genetic risk for COVID-19, thus representing an additional tool for health policymakers in the fight against COVID-19.