This study was an ecological study.
Pará is in the northern region of Brazil and is the second largest state in terms of territorial area (1,245,870,707 km2). It is divided into 144 municipalities with a population of 8,690,745 inhabitants13 (Fig. 1). Despite its’ abundance of natural resources, the state has the third lowest Brazilian Human Development Index (HDI = 0.698), high income inequality distribution (Gini = 0.53), and a low coverage rate of primary healthcare services (59.13%). The majority of the primary healthcare places (68.2%) are in urban areas. Thus, people living in rural population zones have difficulty accessing health services14.
The study population consisted of HIV/AIDS cases among young MSM living in Pará. These cases were recorded in the Information System for Notifiable Diseases between 2007 and 2018. All data were provided by the State Department of Public Health in Pará. Only notifications containing the city of residence and age of the individual were included in the study. The following variables were collected: age, year of diagnosis, race, educational level, and city of residence. Data were double-checked and redundant and inconsistent data were removed.
Data analysis
Descriptive analysis was performed using Microsoft Office Excel 365® 2019 (Microsoft Corporation, Santa Rosa, CA, USA). The results were expressed as absolute values (n) and relative frequencies (%).
RStudio 1.4 software (RStudio, Inc. Washington, DC, USA) was used for the temporal analysis. First, the inflection points of the time series were analyzed using the algorithm described by Bai and Perron15 with a confidence interval (CI) adjusted to 95%.
The method of Box and Jenkins16 was used for temporal modeling to adjust the ARIMA model to the time-series data. In this study, we employed monthly HIV/AIDS incidence rates. The number of cases reported in a specific month in Pará was divided by the population projection of young people aged 15–29 in the specific year. It was then multiplied by 100,000 inhabitants to calculate the incidence rate. The best ARIMA model was estimated using the “auto.arima ( )” function, which was validated through the analysis of its’ residuals in terms of autocorrelation, normality distribution, and equality of variance. For these analyses, we employed the Box–Pierce and Ljung–Box tests, the Kolmogorov–Smirnov test, and the F test, respectively. The best time-series model was the one with the lowest Akaike information criterion (AIC), corrected Akaike information criterion (AICc), and Bayes information criterion values.
Subsequently, a temporal forecast for the time series to the adjusted ARIMA model was applied considering a 4-year period (2019–2022). We used the STLF prediction methodology developed by Fan and Hyndman to replicate past patterns in forecasts of future values17. This methodology considers the STLF of the series into their trend, seasonality, error components, and adjusts the ARIMA-type model for the errors.
In the spatial analysis, the spatial distribution and autocorrelation of the HIV/AIDS incidence rate were determined using ArcGIS 10.1, software (ESRI, Redland, California, United States). The incidence rate was calculated for quadrenniums (2007–2010, 2011–2014, and 2015–2018) and for the entire period of the study to avoid annual fluctuations. The calculation of the incidence rate was based on the average population projections of young men in the municipalities for each specific period. Spatial autocorrelation was analyzed using Moran’s global index (I) with 999 permutations. This was followed by the statistical method and local indicators of spatial association (LISA). A queen-type W contiguity matrix was employed in both analyses and neighboring municipalities were considered to share borders and nodes.
Moran's global I index ranges from − 1 to 1, were applied where 1 indicates a direct correlation, 0 indicates randomness, and − 1 indicates an inverse correlation. Four types of clusters can be identified in the LISA map: high–high and low–low indicate a direct correlation, whereas low–high and high–low indicate an inverse correlation.
Spatial scan analysis based on the discrete Poisson model was performed using the SatScan 9.7 software (Kulldorf, Cambridge, MA, USA) to assess the regions at risk for HIV/AIDS18. The following criteria were used: non-overlapping clusters with a maximum size of 50% of the population at risk and 999 replications. Each risk region had a relative risk (RR) and 95% CI. Regions with RR ≥ 1 and p ≤ 0.05, were considered at risk. RStudio 1.4 software was used to calculate the 95% CI.
The geographic variability of the HIV/AIDS epidemic in Pará promoted by SDHs was assessed using geographically weighted regression (GWR) analysis19. The HIV/AIDS incidence rate for the 12 years of the study was considered a dependent variable. The SDHs were the independent variables.
The SDHs were obtained from websites of the Institute of Applied Economic Research and Brazilian Institute of Geography and Statistic. They were then categorized into the following dimensions: Education: 1) percentage of people aged 18 or over without complete primary education and with informal work, 2) elementary school dropout rate, 3) high school dropout rate, 4) percentage of people aged 18 or over with complete primary education, and 5) school attendance; Income: 1) people who live in households with per capita income less than the minimum wage (from 2010) and require more than an hour to arrive at work, 2) percentage of people between 15 and 24 years of age who do not study or work and have a per capita income equal to or less than the minimum wage (from 2010), 3) proportion of people with per capita income less than or equal to half the minimum wage (from 2010), 4) unemployment rate of the population of 18 years of age or over, 5) total number of families registered in the Bolsa Família Programme, 6) total number of families registered in Single Register of Social Programmes, 7) formal employment relationships – Male, and 8) average salary of formal workers – Male; Social prosperity: 1) municipal HDI (HDIm), HDIm-longevity, 3) HDIm-education, 4) HDIm-income, and 5) the social prosperity index; Primary Care Coverage: Environmental and Social Framework coverage.
For GWR, Pearson's correlation analysis was used to verify the correlation between the dependent and independent variables in RStudio. All correlations (p ≤ 0.05) were then analyzed through an ordinary least squares (OLS) regression model using the step-by-step method in MGWR software (ASU, MD, USA). The generated models were evaluated for multicollinearity and only those with variance inflation factor (VIF) values lower than 10 were accepted. The best explanatory model for this phenomenon was defined by the lowest AIC value (p < 0.05). The OLS residuals were then analyzed for spatial autocorrelation to validate the model. After the spatial dependence of the residuals was discarded, GWR was applied considering the fixed-bandwidth kernel. The GWR residuals were also analyzed for spatial autocorrelation. The adjusted R2, AIC, and AICc values obtained from the GWR were used to compare the OLS and GWR models.