The Multivariate Distributed Lag Non-Linear Model (MVDLNM) that Accommodates Cumulative Outcomes

In recent years, the effects of meteorological factors on health outcomes have attracted growing attention because of climate change, which brings not only a general rise in temperature but also abnormal climatic extremes. In place of conventional cross-sectional analyses, which focus on the association between a main predictor and a single dependent variable, the distributed lag non-linear model (DLNM) has been widely adopted in epidemiological studies to examine the association between multiple lagged environmental factors and a health outcome. In this research, we investigate a more complex association structure between lagged mortality and lagged temperature. Five newly proposed strategies, derived from statistical concepts such as summation, autoregression, principal component analysis, and adjustment or offset in the DLNM, are evaluated by a permutation study. Longitudinal climate and daily mortality data from Taipei, Taiwan, from 2012 to 2016 were permuted to simulate the null distribution. According to the simulation results, only one strategy, named MVDLNM, yields valid type-I error rates. In an application to the real data, the MVDLNM, which incorporates both current and lagged mortality, shows a much more significant association than the conventional DLNM, which relies only on current mortality.


Introduction
Extensive studies have indicated an association between temperature and human health, which raises public health concerns because the climate has changed drastically worldwide in recent years owing to global warming [1,2]. After accounting for climate change and other factors, the ways in which hot and cold weather, or their delayed effects, trigger death have been widely studied in different regions, including the United States [3,4], Europe [5], and Northeast Asia [6]. In addition to temperature, it has been documented that exposure to air pollutants, which according to the 2005 WHO Air Quality Guidelines include particulate matter (PM), ozone (O3), nitrogen dioxide (NO2), and sulfur dioxide (SO2), has adverse effects on human health, especially on respiratory and cardiovascular diseases. A number of studies have examined the relationship between PM10, PM2.5, and daily mortality. Some showed that exposure to polluted air over a period harms health, for example by contributing to the development of lung or heart disease, with pollution sources including ambient air, second-hand smoke, ozone, and particulate matter [7,8].
In 2010, Gasparrini [2] introduced the Distributed Lag Non-Linear Model (DLNM) to evaluate the lagged effects of predictors. The DLNM fits the non-linear association between the outcome variable and the predictors. A cross-basis function simultaneously describes the exposure-response relationship along the predictor space and the lag-response relationship along the lag space. In 2018, a new approach assessed both same-day and one-day lagged mortality in the DLNM [9]. Associations involving both lagged outcomes and lagged exposures therefore need more attention in order to describe such a complex structure.
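The cross-basis idea can be illustrated with a minimal sketch. It assumes simple polynomial bases for both the exposure and the lag dimensions, as a stand-in for the spline bases normally used in a DLNM, and all function names and data values are hypothetical. Each cross-basis column sums, over the lags, the product of one exposure basis function and one lag basis function.

```python
def lag_matrix(x, max_lag):
    """Rows are days t = max_lag..n-1; column l holds the exposure on day t - l."""
    return [[x[t - l] for l in range(max_lag + 1)]
            for t in range(max_lag, len(x))]


def cross_basis(x, max_lag, var_degree=2, lag_degree=1):
    """Tensor-product cross-basis with polynomial bases (an assumption, not splines).

    Column (j, k) is sum over lags l of x[t - l]**j * l**k, for
    j = 1..var_degree (exposure basis, no intercept) and k = 0..lag_degree.
    """
    rows = []
    for lagged in lag_matrix(x, max_lag):
        row = []
        for j in range(1, var_degree + 1):
            for k in range(lag_degree + 1):
                row.append(sum((lagged[l] ** j) * (l ** k)
                               for l in range(max_lag + 1)))
        rows.append(row)
    return rows


# Hypothetical daily mean temperatures (degrees Celsius).
temps = [18.0, 21.5, 25.0, 29.5, 31.0, 27.0, 22.5]
cb = cross_basis(temps, max_lag=3)
print(len(cb), "rows x", len(cb[0]), "columns")
```

The resulting matrix would enter the regression as ordinary covariates, so the number of columns (exposure basis size times lag basis size) is the number of coefficients that jointly describe the exposure-lag-response surface.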
In this research, we collected weather and air pollution data as well as daily mortality in Taipei City from 2012 to 2016. Since the DLNM is widely adopted in public health and environmental research [10], we aim to extend the DLNM, fitted as a Poisson regression with natural cubic splines [11], to accommodate lagged outcomes.
The validity and performance of the new methods are evaluated by a permutation study. Finally, a real data application shows the improvement offered by the new method.

Data collection
All-cause mortality in Taipei

Statistical analysis
The DLNM model is defined as follows: To extend the DLNM to accommodate lagged outcomes, we propose five approaches that transform the matrix of lagged outcomes (n rows, with one column for the current day and one for each lagged day) into a one-dimensional dependent variable (n × 1) to be integrated into the DLNM.
For illustration, assume that the Y matrix consists of four days of mortality with two lagged days. Hence, the dimension of Y is (4 × 3). The second column of Y is the one-day lagged mortality. The third column of Y represents the two-day lagged mortality.
i. The n-day lagged mortality is multiplied by coefficients 0.
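The (4 × 3) illustration can be sketched as follows. This is a minimal example under stated assumptions: the death counts are hypothetical, column l is taken to hold the mortality observed l days earlier, and the weights are placeholders chosen for illustration rather than the coefficients used in the study. It shows how a row-wise (optionally weighted) summation collapses the matrix into an n × 1 outcome, which is the general idea behind the summation-based strategies.

```python
def lagged_outcome_matrix(deaths, max_lag):
    """Rows are days t = max_lag..n-1; column l holds deaths on day t - l."""
    return [[deaths[t - l] for l in range(max_lag + 1)]
            for t in range(max_lag, len(deaths))]


def cumulative_outcome(y_matrix, weights=None):
    """Collapse each row to one value by a (weighted) sum across the lags."""
    if weights is None:
        weights = [1] * len(y_matrix[0])
    return [sum(w * y for w, y in zip(weights, row)) for row in y_matrix]


# Hypothetical daily death counts; the first two days only supply lagged values.
deaths = [30, 28, 35, 31, 29, 33]

Y = lagged_outcome_matrix(deaths, max_lag=2)   # 4 rows x 3 columns
unweighted = cumulative_outcome(Y)             # plain summation
weighted = cumulative_outcome(Y, weights=[1, 0.5, 0.25])  # placeholder weights

for row, total in zip(Y, unweighted):
    print(row, "->", total)
```

An unweighted sum of counts remains a non-negative integer, so it stays compatible with a Poisson outcome; a weighted sum with fractional weights generally does not.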

Software
All statistical analyses and simulations were conducted in R with the package "dlnm" by Gasparrini [2].

Results
According to the permuted samples, the observed type-I error for is presented in Table 2. Methods 1 and 2 are both based on a summation of previous outcomes, but with different weights. Therefore, results of , , and are similar and not shown.
Because the principal components contain negative values, and failed to satisfy the assumption of the Poisson distribution model and did not generate any results from the DLNM package. Hence, the type-I errors were not obtained. Note that the type-I error rates for , , , and were much larger than the nominal level of 0.05. The inflation increases with the number of lagged outcomes.
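A short derivation shows why negative values are expected whenever principal component scores are used as a count outcome. It assumes the usual convention that the principal components are computed from column-centred data; the symbols below (y_t for the row of current and lagged mortality on day t, its mean over days, and a loading vector v) are introduced here for illustration only.

```latex
z_t = v^{\top}\,(y_t - \bar{y}), \qquad
\sum_{t=1}^{n} z_t
  = v^{\top} \sum_{t=1}^{n} (y_t - \bar{y})
  = v^{\top}\,(n\bar{y} - n\bar{y})
  = 0 .
```

Since the scores sum to zero, at least one score is negative unless all of them are zero. A Poisson model requires a non-negative outcome, so such scores cannot serve as the dependent variable without further transformation.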
Therefore, these methods are not valid, although the ideas are simple and easily implemented. Finally, Table 4 shows that the type-I error rate of MVDLNM was smaller than 0.05 when the lagged exposure was 10 days. The type-I error rate ranged from 0 to 0.078 when the lagged exposure was 20 days and from 0 to 0.102 when it was 30 days. Therefore, the results indicate that MVDLNM is the only valid test. For lagged exposures of 10, 20, and 30 days, the cumulative mortality outcome could be implemented for up to 10, 10, and 13 days, respectively.
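The way an observed type-I error rate is obtained from permuted samples can be sketched as follows. This is a minimal illustration, not the study's procedure: it assumes the outcome series is the one being shuffled, and it substitutes a simple correlation test (Fisher z approximation) for the DLNM overall test. The function names and simulated data are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_pvalue(x, y):
    """Two-sided p-value from the Fisher z approximation (stand-in test)."""
    r = max(min(pearson_r(x, y), 0.999999), -0.999999)
    z = math.atanh(r) * math.sqrt(len(x) - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def empirical_type1_error(exposure, outcome, pvalue_fn,
                          n_perm=1000, alpha=0.05, seed=1):
    """Share of permuted data sets in which the test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_perm):
        shuffled = list(outcome)
        rng.shuffle(shuffled)          # breaks any exposure-outcome link
        if pvalue_fn(exposure, shuffled) <= alpha:
            rejections += 1
    return rejections / n_perm


# Hypothetical data: simulated temperatures and daily death counts.
rng = random.Random(0)
exposure = [rng.gauss(25, 5) for _ in range(200)]
outcome = [rng.randint(20, 40) for _ in range(200)]

rate = empirical_type1_error(exposure, outcome, correlation_pvalue, n_perm=500)
print("observed type-I error rate:", rate)
```

A valid test should give a rate close to the nominal level, so a rate well above 0.05 on permuted data signals inflation of the kind reported for the invalid methods.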

Applications to the real data
Our previous work using six major cities in Taiwan [15] reported a significant temperature impact on mortality. In this research, data for recent years are available only for Taipei City. Hence, the association based on the conventional DLNM is not statistically significant. However, the MVDLNM could provide more significant overall p-values (

Discussions
Through simulation studies, we examined several novel approaches to characterize the association between the lagged mortality and the lagged temperature measures.
Results suggested that most methods are invalid, although these statistical concepts are

The real data analysis of Taipei City from 2012 to 2016 confirmed that cumulative mortality is significantly associated with lagged temperature measures.
In contrast, the conventional DLNM failed to reproduce the significant results we previously reported [15]. This new strategy is therefore a useful tool that could be adopted in various research fields when the cumulative effect of the outcome is of interest.
Limitations: The data used in this study are limited to Taipei, the capital of Taiwan, whereas the relationship between temperature and mortality may differ in other regions. For example, the accessibility and quality of medical care may be different in smaller towns. Additionally, we considered all-cause mortality because we could not classify the causes of death into finer categories, such as sudden cardiac death or myocardial infarction, which are more likely to be related to temperature and air pollution. As for temperature, only the daily mean temperature was considered in this study. We did not explore the contribution of the daily maximum temperature, the daily minimum temperature, or intraday temperature variation to mortality. Finally, some researchers have proposed that a threshold is needed to differentiate the impacts of hot and cold temperatures on mortality.
Table 4