Study design:
This is an environmental ecological study, using time series analysis to identify seasonality, harmonic regression to quantify seasonality and a mixed-effect linear model to explain the incidence of TB by province.
Study Setting:
Vietnam has a large diversity in climate due to the fact that it stretches approximately 2000km from north (22oN near the tropic of cancer) to south (8oN) and from sea level to 2000-3000m elevation from east to west. It lies relatively close to the equator and enjoys a tropical climate. Additionally the north is exposed to four seasons, whereas the south only has two, a wet and dry season. The maximal hours of sunlight in the north peaks at the same level as the lowest rates in the south and are out of phase. In addition to this, seasonal climate factors do not have the same temporal associations north to south, that is absolute humidity and precipitation vary in different regions in a different temporal pattern than hours of sunlight. Precipitation levels are the highest in the central region.
There are 63 provinces in Vietnam and data were available for all. The median area of a province is 4,600 km2 (IQR 2,300 – 6,800 km2), with a median population size of 1.175 million (IQR 0.84 – 1.61 million)
Tuberculosis data:
TB is a notifiable communicable disease in Vietnam. The national TB program (Viet Nam TB Information Management Electronic System, VITIMES) collects data using a passive surveillance system covering the entire country. Data is then aggregated quarterly and available by province (n = 63). These data were publicly available from 2010-2015 (23). Classification of TB follows the WHO guidelines to include data on new cases, relapse cases, failure of treatment cases, recurrence cases and others. Any cases with both pulmonary and extra-pulmonary symptoms were classified as pulmonary TB (PTB). Cases with only extra-pulmonary symptoms are classified as extra-pulmonary TB (EPTB) (24). Whole genome sequences from MTB were not yet available from the national program 2010-2015 and therefore those cases defined as “new cases” may be from new transmissions or from latent disease activation.
Influenza-like-illness data:
In the absence of robust data for influenza, influenza-like-illness (ILI) incidence was taken as a surrogate for influenza as a potential confounder or driver of seasonality. The global epidemiological surveillance standards for influenza, published by the WHO suggest ILI data is a useful tool in understanding trends and has been shown to correlate with influenza outbreaks (25). ILI data from Vietnam have previously been shown to be highly seasonal (21). These data were available from the national passive surveillance programme (29 diseases) of the General Department of Preventive Medicine (GDPM) of the Ministry of Health, monthly totals per province are publicly available (26). These data were then aggregated by quarter.
Demographic and socio-economic data:
The General Statistics Office of Viet Nam collects and collates data on a wide variety of topics including socio-economic and health data (27). These data have been systematically organized and made publicly available for research purposes ((28), https://www.gso.gov.vn). The following variables were taken from this resource and were all defined per province per year: population density, proportion of male to female, proportion of urban to rural population, poverty ratio (ratio of the number of people whose income falls below ½ the median income of the country), literacy rates of 15 year olds and above, number of hospitals, immigration intake and prevalence of HIV infection at year end. Age structure was available divided into five year intervals up to the age of 80 for the year 2014. The age structure for 2014 was assumed to remain static over the 6 years of the study. We grouped the ages as young, middle and old, £ 15 years, 16-64, ³ 65 years respectively, in keeping with international studies (29, 30).
Climate data:
Meteorological data was made available from the Vietnamese Institute of Meteorology, Hydrology and Environment for 67 climatic stations positioned throughout Vietnam. This includes monthly averages of daily minimum, maximum and average temperatures (°C), as well as of daily average absolute (g/L) and relative (%) humidity, cumulative precipitation (mm) and hours of sunlight. In order to compute climatic variables per province from these climatic station data, the variables where kriged (31) on a 10,000-cell regular grid (i.e. pixels of 5.75 x 5.75 km) and then aggregated by province. Kriging is a technique of spatial interpolation using a Gaussian process, whereby interpolations are weighted averages of neighboring measures. This is the standard technique in meteorology and climatology. The data from the 67 meteorological station data were used to impute the values of a 10,000-cell regular grid. Local population density was account for using weighted averages depending on the population density of each grid. The relative population densities were computed from the 2009 population density data available from WorldPop project (32). This allowed the computation of province-aggregated climatic variables that are representative of the population of the considered province. Kriging was done by an automatic tuning of the hyperparameters of the variogram estimation as made available by the automap R package (33). Kriging performed well overall, except for rainfall, most likely because this variable typically varies widely in space, resulting in some extreme values through kriging. These outliers were identified by comparison with the original dataset and replaced by the average rainfall for the province from the other 5 years.
Seasonal factors (meteorological and ILI data) were investigated with and without a lag of one or two quarters. This was based on findings that diagnosis of TB was made at least 12 weeks from transmission or disease activation (34). Unfortunately, precise data on delay in diagnosis specific to Vietnam or other similar settings were not available, and therefore we assumed a similar delay in diagnosis as published. Since our data were only available quarterly we added lags so that we could examine the climate conditions at the proposed time of transmission or activation.
We followed the Reporting of studies Conducted using Observational Routinely-collected health Data (RECORD) guidelines and checklist (see supplementary material) (35).
Analysis:
All analysis, graphical representations and maps were produced using R version 3.5.0. Data from the four databases were merged by province, year, and quarter.
Determination of geo-temporal variation in TB incidence:
Incidence of TB, per province per year, was mapped using the “sf” R package (36). Two heat maps of TB incidence, per province per quarter were produced to demonstrate and visualize seasonality in space and time (one normalized per province for temporal variation and one normalized by quarter for geographical variation).
Determination of seasonality by province:
Time series data per province were decomposed into seasonality, trend and residual components using a LOESS regression. The trends indicated in this analysis were complex and province specific and could not be modeled linearly.
For each province, harmonic regression using cosine and sine waves were used to model the seasonality on the de-trended data according to the following formula:
LOESS-detrended incidence of TB ~ a*cos(time) + b*sin(time)
Determination of strength of seasonality by province:
Where seasonal amplitude was characterized as sqrt(a2 + b2) and relative seasonal amplitude was expressed as the seasonal amplitude/mean incidence of TB ratio.
Mixed-effect Linear Model: association between explanatory variables and TB incidence:
A mixed-effect linear model was used, to allow both fixed and random effects to be incorporated, using the “lme4” R package (37).
All variables (except latitude, quarter and age proportions) were de-trended by province using the LOESS method described above. Year and province were introduced as random effects to control for clustering by unmeasured variables or potential trends that could remain when considering all the provinces together. All other variables were introduced as fixed effects with quarter coded as a factor and all others as continuous variables. Likelihood Ratio Tests (LRT) were used to determine significance level of each covariable. Potential confounding effects due to colinearities between covariables were controlled for by computing type-2 sums of squares (using the “car” R package (38)). Covariables were examined after detrending for normal distribution and if skewed, transformed using log transformation in order to linearize potential associations between covariables. Seasonal factors were examined with different lag times and combinations, and the final model chosen from the lowest Akaike information criteria. The full model reads:
TB incidencex,y ~ urbanx + sexx + population densityx + literacyx + youngx + oldx + povertyx + hospitalsx + HIVx + migrationx + quarter + influenza-like-illnessx,y-2 + average temperaturex,y-2 + absolute humidityx,y-2 + rainfallx,y-2 + sunshinex,y-2 + latitudex + (1|province) + (1|year)
Where x = province and y = year quarters. See Table 1 for full description of variables.
Table 1: All variables in final mixed-effect regression model
Variable
|
Definition
|
TB incidence
|
The number of new cases of TB/quarter/population of the province, in x province (x = 1,2,…63) at y quarter (y = 1-4, in a given year), LOESS detrended
|
Urban
|
Proportion of people living in urban compared to rural areas per province per year, LOESS detrended
|
Sex
|
Proportion of male to female per province per year, LOESS detrended
|
Population density
|
Number of individuals per province divided by the geographical area of the province, per province per year, LOESS detrended
|
Literacy
|
Percentage of literate individuals in those ≥ 15 years of age per province per year, LOESS detrended
|
Young
|
Percentage of population under 20 years of age per province per year, LOESS detrended
|
Old
|
Percentage of population ≥ 65 years of age per province per year, LOESS detrended
|
Poverty
|
Ratio of the number of people whose income falls below ½ the median income of the country, per province per year, LOESS detrended
|
Hospitals
|
Number of hospitals per province per year, LOESS detrended
|
HIV
|
Prevalence of HIV at 31st December each year, per province, LOESS detrended
|
migration
|
Rate of in migration into a given province, per province per year, LOESS detrended
|
quarter
|
1-4, beginning in January grouping into 3 month blocks.
|
Influenza-Like-Illness
|
Number of cases of influenza-like-illness/population of the province, per province per quarter, lagged by two quarter, LOESS detrended
|
Average temperature
|
average temperature per province per quarter, lagged by two quarters, LOESS detrended
|
Absolute humidity
|
average absolute humidity per province per quarter, lagged by two quarters, LOESS detrended
|
Rainfall
|
summed rainfall per province per quarter, lagged by two quarters, LOESS detrended
|
Sunshine
|
summed hours of sunlight per province per quarter, lagged by two quarters, LOESS detrended
|
Latitude
|
Latitude value of centroid point of province
|
(1|province)
|
Province as a random effect.
|
(1|year)
|
Year as a random effect
|
Model validation
The validity of the internal assumptions was assessed by examining the presence of trend, temporal and spatial auto-correlation (using the Durbin-Watson test) in the residuals.