The number of weekly scrub typhus cases was obtained from the National Infectious Disease Portal, provided by the Korea Disease Control and Prevention Agency (KDCA), and the period of interest ranged from 2001 to 2019 since data acquisition from this platform was only available from 2001. Surveillance system of scrub typhus was based on mandatory reports of all suspected and confirmed cases district-wise. We obtained weekly reported suspected and confirmed cases in seven metropolitan cities and nine provinces. Suspected cases were defined by clinical manifestations of scrub typhus, such as acute febrile illness, lymphadenopathy and/or skin eschar, and epidemiological association. Confirmed cases were defined based on clinical symptoms and laboratory confirmation. The year 2020 was excluded to avoid the influence of the coronavirus disease in 2019. Our study only used publicly available data; therefore, the requirement for informed consent was waived off (IRB No. 2003-131-1110).
Meteorological data acquisition
We obtained meteorological data from the Weather Data Open Portal provided by the Korea Meteorological Administration. Meteorological data such as AT, RH, precipitation, wind speed, and sunshine duration on a minute-to-minute basis were collected from a total of 102 monitoring stations across the country. We computed the daily average values of these meteorological variables by city and province.
We obtained information on the daily mean, maximum, and minimum AT, mean RH, mean wind speed, sunshine duration, and precipitation. The diurnal temperature range (DTR) was defined as the difference between the daily maximum and minimum temperatures. We then computed the weekly mean AT (mean, minimum, and maximum), mean RH, DTR, mean wind speed, and weekly cumulative sunshine duration and weekly cumulative precipitation.
We also computed the difference between the annual mean AT each year (2001–2019) and the mean AT for the period 1971–2000. Likewise, we computed the difference between the number of heatwave days each year (2001–2019) and the mean number of heatwave days during the years 1971–2000. The number of heatwave days per year was defined by the number of days in which the daily maximum AT was ≥33°C.
1. Descriptive analysis
We adopted locally estimated scatterplot smoothing (LOESS) curve fitting to determine whether scrub typhus incidence showed seasonality 64,65. Time series plots of climate factors and weekly incidence of scrub typhus from January 2001 to December 2019 are shown in Figure 1. We then conducted a bivariate analysis between the meteorological variables and weekly scrub typhus incidence. As there were lags of up to 7 months between the annual peaks of the mean AT and the peaks of scrub typhus incidence, we decided to use lags from 0 to 28 weeks (Figure S1).
The correlation between each meteorological variable (mean, maximum, and minimum AT, mean RH, mean wind speed, weekly cumulative sunshine duration, weekly cumulative precipitation, and DTR) and weekly scrub typhus incidence was explored by applying a generalized additive model (GAM). Minimum, mean, and maximum AT, mean RH, and weekly cumulative precipitation were correlated with weekly scrub typhus incidence. Since the minimum, mean, and maximum AT were correlated, we only selected mean RH and weekly cumulative precipitation as covariates for GAM.
We constructed GAM for the association between AT and weekly scrub typhus incidence by adjusting for mean RH and precipitation, using a quasi-Poisson function, and simple lags and moving average lags from 0 to 28 weeks. Meteorological variables and weekly scrub typhus incidence showed seasonality between 2001 and 2019 (Figure 1 and S1). According to previous literature, Fourier terms up to six harmonics per year were included to reflect seasonality in the GAM 66,67. The GAM formula used is as follows:
Ln [E(Y)] = β0 + β1Xt + Σ Si (Xj) + sin (2πtj /N) + cos (2πtj/N) + year,
where E(Y) denotes expected scrub typhus incidence; β1 is the coefficient (slope) of a meteorological variable for week t; Si(Xj) is a smoothing function for an individual covariate i; tj indicates the week of the year j (j = 1, 2, ... ,52); N is a harmonic for the year (N = 1, 2, ..., 6); and year depicts the year when the scrub typhus cases were reported on an ordinal scale.
Associations between AT and weekly scrub typhus incidence were explored for the seven cities and nine provinces with different lags, from 10 to 15 weeks (Figure S5). Association plots obtained by applying GAM showed that there was a nonlinear relationship between a meteorological factor 10–15 weeks ahead and weekly scrub typhus incidence (Figure S5). Compared with other lags, such as simple lags of 10–14 weeks or moving average lags of 10–15 weeks, GAM plots with a simple lag of 15 weeks consistently showed V or U shape in most of the cities or provinces (Figure S5).
2. The association between weekly AT and scrub typhus incidence (threshold analysis)
To estimate the association between the increase of scrub typhus incidence and AT or RH, which showed “V” or “U” shape, we attempted to find a threshold where the slope of the graph changes from negative to positive (Figure S6 and S7). The R HEAT package uses a generalized linear model to approximate a TP if the beta coefficient (slope) is converted from negative (positive) to positive (negative) 68. In particular, the threshold function in the HEAT package uses piecewise linear regression to approximate a threshold. For instance, β1 was calculated below a threshold, where weekly scrub typhus incidence decreased as mean AT increased and β2 was calculated above a threshold, where scrub typhus incidence increased as mean AT increased. After visual approximation of a threshold for mean AT and mean RH from GAM, we applied the threshold () function to quantify the percent changes in scrub typhus incidence following a unit change of each meteorological variable above or below the threshold. The GLM formula used is as follows:
Ln [E(Y)] = β0 + β1Xt + Σ Si (Xj) + sin (2πtj /N) + cos (2πtj/N) + year, if Xt <TP.
Ln [E(Y)] = β0 + β2Xt + Σ Si (Xj) + sin (2πtj /N) + cos (2πtj/N) + year, if Xt ≥TP,
where E(Y) denotes the expected scrub typhus incidence; β1 is the coefficient (slope) of a meteorological variable for week t below the threshold point (TP) of Xt and β2 is the coefficient (slope) of a meteorological variable for week t above the threshold point (TP) of Xt; Si(Xj) is a smoothing function for an individual covariate i; tj indicates the week of the year j (j = 1, 2, ... ,52); N is a harmonic for the year (N = 1, 2, ..., 6); and year depicts the year when scrub typhus cases were reported.
Since data distributions for meteorological variables and scrub typhus incidence were skewed, we adopted quasi-Poisson regression in the GAM and GLM models. The smoothing parameter was determined by natural cubic spline by selecting an optimal degree of freedom 60, where the unbiased risk estimator (UBRE) 69,70 in the GAM and Akaike information criterion (AIC) 71,72 values in the GLM for each province approached their minimum. Considering all provinces’ UBRE values following increasing degrees of freedom (df), our GAM model used 5 as its df.
For the sensitivity analysis, we used different dfs for each city and province. We also compared GAM with and without harmonics. Harmonics of 2, 4, 6, 8, and 10 were used for further comparison. Furthermore, we compared the GLM by additionally adjusting for mean wind speed, weekly precipitation, and weekly sunshine duration.
3. Long-term association between meteorological factors and scrub typhus incidence
We investigated the long-term association between meteorological factors and scrub typhus incidence using annual cumulative scrub typhus cases. The long-term mean AT difference was defined as the difference between the mean annual AT each year (2001–2019) and the mean AT of the previous 30 years (1971–2000). The difference in the number of heat wave days was defined as the difference between the number of heat wave days each year (2001–2019) and the mean number of heat wave days during the years 1971–2000.
We plotted the long-term trends of meteorological variables from 2001 to 2019 using the LOESS regression. The difference in annual mean AT or annual number of heatwave days from the mean AT or mean number of heatwave days between 1971 and 2000 was grouped by five-year terms to detect long-term trends (Figure 4A, 4B). A simple linear regression model overlaid the point plots to determine if a linear correlation existed between the variables. The fitted model slope, p-value, and coefficient of determination (R2) were calculated accordingly. Although the time variable year best explained increases in cumulative scrub typhus cases in autumn outbreaks in the model selection process, we intended to correlate cumulative scrub typhus cases in autumn outbreaks with differences in the annual mean AT or the number of heatwave days from those of the years from 1971 to 2000. Therefore, point plots, linear regression lines, 95% confidence intervals, slopes, p-values, and R2 values were obtained in the same manner (Figure 4).
To analyze the association between the difference in annual mean AT of each year and mean AT of the past 30 years (1971–2000), we used a generalized linear mixed-effects model (GLME). Covariates were selected from the directed acyclic graph (DAG) (www.dagitty.net) (Figure S10). We considered changes in population and land use or greenness over the last 19 years; however, we did not include them as covariates since population changes were not directly associated with AT, and land use was a causal mediator rather than a confounder (Figure S10). Instead, we used the incidence rate (annual incidence of scrub typhus per 100,000 population) to consider population factors in the regression model. We put the province as a random intercept in the glmer() to diminish spatial interference in the estimation. The analysis was repeated after replacing the difference in annual mean AT by the difference in the number of heatwave days.
All statistical analyses were conducted using the statistical software packages SAS (version 9.4) and RStudio software (ver. 1.2.5042).