Research design and scenario
This was an ecological study[11] carried out in Ribeirão Preto, a municipality in the interior of São Paulo. As regards the care of people in the municipality with TB, the Basic Health Units (BHSs) are responsible for carrying out active searches for respiratory symptoms, with the collection of sputum smear microscopy and/or X-ray requests. However, the treatment and follow-up of TB cases is performed in specialized outpatient clinics for infectious patients[12].
It is noteworthy that in Ribeirão Preto, TRM-TB using the GeneXpert® MTB/RIF system was implemented and started to be used as a diagnostic technology for TB in November 2014, and this was the cut-off point considered in the study.
Population
The study population consisted of TB cases notified to the Tuberculosis Patient Control System (TBWeb) from 2006 to 2017, which were made available through the Epidemiological Surveillance Division of the Ribeirão Preto Municipal Secretariat.
In the state of São Paulo, a decision has been made to use a single system for the notification and monitoring of people with TB. In TBWeb, which started to be used effectively from 2006, notifications are made online; the main advantage of this system is the uniqueness of each patient’s records, and the automatic communication in cases of transfer and hospitalization[13].
Confirmed cases of TB among individuals residing in Ribeirão Preto were considered. Only one record per person was adopted as the selection criterion, with the most current record being selected if there was more than one entry in the system.
The notified TB cases were separated in order to show the different behaviours of the time series for different groups in the municipality: general TB, pulmonary TB, extrapulmonary TB and TB-HIV co-infection.
Analysis plan
Calculation of incidence rates
The monthly incidence rates of TB in the municipality were calculated as the number of cases in the month (corrected by the number of days in that month) divided by the corresponding estimated population for the investigation period (2006 to 2017), thus resulting in 144 monthly time observations[14]. Subsequently, the Average Monthly Percentage Change (AMPC) of incidence rates was calculated, identifying, in average percentage terms, any increase or decrease in rates during the study period.
Detection of structural changes
In an analysis of time series, in addition to the usual variability observed over time, the data can also be influenced by various types of event that can cause structural changes. Thus, in order to identify possible changes in the series, the R CRAN package strucchange was used[15].
Basically, a yiseries is considered and it is assumed that there are m breakpoints in the series, in which the coefficients change from one stable regression relationship to another. There are, therefore, m + 1 segments in which the regression coefficients are constant and the model can be rewritten as:
with xt being the vector of the covariables, βj(where j denotes the segment index) the corresponding regression coefficients, and ut white noise (that is, an uncorrelated series, with zero mean and constant variance)[16].
Modelling and forecasting future values
To model the TB rates, and to predict their future values, incidence rates smoothed by first-order moving averages were considered, as proposed by Becketti[17], using the model for linear time series called the autoregressive integrated moving average (ARIMA) model. The analysis steps proposed by Box and Jenkins[18] were adapted for the chosen model, based on the data structure itself: Identification, Estimation, Verification and Forecasting.
An ARIMA model (p, d, q) allows the variability of a time-related, linear, stationary (d=D=0) or non-stationary (otherwise) process to be described.
The letters p and q represent, respectively, the number of parameters of the autoregressive parts and the moving averages within the period, and the letter d represents the degrees of simple differentiation necessary to transform a non-stationary series into a stationary one[19].
An ARIMA model can be written as follows:
are, respectively, the autoregressive and the moving average polynomials. T is the transformation to stabilize the variance, if this is necessary, and Zt represents the white noise process (a non-correlated process, with zero mean and constant variance).
The KPSS unit root test was performed to determine whether the series was stationary or not, using a significance level of 5%. For a non-stationary series, it is necessary to resort to the usual transformation techniques (Box–Cox and simple differentiations) in order to transform the series into a stationary one and then to determine, through the empirical autocorrelation and partial autocorrelation functions, the p and orders of the ARIMA model.
To estimate the model parameters, the maximum likelihood method was used. For the validation of the model, namely in the analysis of residues, the usual tests of absence of autocorrelation (Portmanteau tests: Ljung–Box and Box–Pierce), randomness (Rank and Turning Point tests), normality (Kolmogorov–Smirnov test) and the t test of average nullity were performed.
It is worth mentioning that the choice of the best model was made taking into account the lowest values of the Akaike information criterion (AIC). Subsequently, data and trend forecasts were made for an eight-year period (2018 to 2025).
A set of tests using the last two years (2016 to 2017) was used to assess the predictive performance of the models. The following measures were considered in order to evaluate this predictive performance: Root Mean Square Error (RMSE), which indicates the difference between the values predicted by a model and the observed values, Mean Absolute Error (MAE), which is a measure of the accuracy of a forecast in the estimation of trends, and Mean Absolute Percentage Error (MAPE), which is the percentage of the predicted values that are incorrect. Forecasts were then made based on the adjusted models for the eight-year period (2018 to 2025).
We have chosen to present possible trajectories for the forecasts, instead of the usual average forecasts (since when the series are stationary or show trends and/or seasonality, the forecasts do not usually tend towards the process average), through a simulation of the adjusted model. Both for the simulation of future trajectories and for the calculation of the respective confidence intervals of the predictions, errors were considered as random variables, normally distributed, with mean and standard deviation equal to their estimated values.
All the analyses were performed using the statistical software R Studio® version 3.5.2 (https://rstudio.com).