Modeling tuberculosis transmission flow in China

Background: China has the largest population and the third largest number of tuberculosis cases in the world, and tuberculosis remains a major public health concern. The growing floating population has become an important part of China's socioeconomic development and carries a potential risk of infectious disease transmission within this huge population. Both the flow of the tuberculosis population within the country and the role of the massive floating population in tuberculosis transmission remain unclear.
Methods: Tuberculosis flow data for 14,027 cases were derived from the new smear-positive pulmonary tuberculosis cases reported in China in 2012 through the nationwide Infectious Disease Reporting System. A spatial interaction model was used to model the tuberculosis flow and the regional socioeconomic factors.
Results: The Pearl River Delta in southern China and the Yangtze River Delta along China's east coast were the largest destination and concentration areas of tuberculosis inflows. Socioeconomic factors were determinants of tuberculosis flow. A 10% increase in per capita GDP was associated with a 2.1% decrease in tuberculosis outflows from the provinces of origin, a 0.5% increase in tuberculosis inflows to the destinations and an 18.9% increase in intraprovincial flows. Per capita net income of rural households and per capita disposable income of urban households were positively associated with tuberculosis flows. A 10% increase in per capita net income corresponded to a 3.6% increase in outflows from the origin, a 12.8% increase in inflows to the destinations and a 47.9% increase in intraprovincial flows. Tuberculosis incidence had positive impacts on tuberculosis flows. A 10% increase in the number of tuberculosis cases corresponded to a 1.1% increase in tuberculosis inflows to the destinations, a 2.0% increase in outflows from the origins and a 2.2% increase in intraprovincial flows. A 10% increase in the tuberculosis incidence rate was associated with a 9.9% increase in tuberculosis intraprovincial flows.
Conclusions: TB flows had clear spatial stratified heterogeneity and spatial autocorrelation, which influenced the influx of TB to the neighboring provinces. TB flows had a statistically significant relationship with regional incidence, socioeconomic differences in regional characteristics produced different effects on TB flows in the origin and destination, and income played an important role among the determinants. These findings provide a scientific basis for the joint and precise prevention and control of TB transmission in population inflows to provinces and their neighbors.

Background
Tuberculosis (TB) is a chronic infectious disease caused by Mycobacterium tuberculosis [1].
The disease is transmitted primarily via the respiratory tract and has seriously threatened human health for thousands of years [2]. China has the third largest number of TB cases in the world; about 900,000 new cases are diagnosed each year, and the mortality rate was 1.43 per 100,000 in 2017 [3]. TB therefore remains a leading cause of mortality and economic burden in China.
In recent decades, China has experienced rapid urbanization with the growth rate from 45 Regional economic difference is an important driving force for population mobility [5,10]. Existing research indicated that in areas with poor economic conditions, there is a high TB risk [11,12], and TB transmission has been confirmed to be associated with migration [13]. TB migration flows come from infected migrants among these floating populations. The increasing floating population across regions has a tremendous potential for spreading infectious diseases [14,15]. Spatial interaction exists in the spatio-temporal transmission of an infectious disease [16]. The epidemiological mechanism interacting with socioeconomic factors at different spatial locations determines the variability of the geographical distribution of a disease [17]. The disparities of TB incidence were influenced by geospatial factors, population and socioeconomic heterogeneity, which will further affect the migrant population [18][19][20][21][22]. However, there is still no systematic analysis of how TB population size and socioeconomic heterogeneity affect tuberculosis transmission between the provinces in China.
With detailed information available about human movements, spatial interaction models have become standard tools for describing population mobility dynamics in infectious disease epidemiology [23][24][25][26][27][28]. In previous studies, spatial interaction models were often used to model population movements [29][30][31]. The main advantage of these models is that they can take into account the regional characteristics of the origin and destination that affect migration flows. This matches the actual situation of disease flow, which is affected by origin-specific factors, destination-specific factors and spatial separation factors, such as the number of disease cases in the origin and destination, the distance between the origin and the destination, the level of economic development and differences in income and employment opportunities. Current studies have focused on the spatial distribution characteristics and influential factors of TB spread. Understanding the spatial interaction pattern of TB migration flows is essential for clarifying the mechanisms of transmission and targeting control interventions. Therefore, we applied the spatial interaction model to model TB flow and the socioeconomic factors of origin-destination (OD) regional characteristics.
In this study, we used the q-statistic to measure the spatial stratified heterogeneity of TB transmission among floating populations in China. To further understand the complexity and heterogeneity of socioeconomic factors influencing TB transmission, we applied the spatial interaction model to estimate TB migration flows in association with the potential influential factors. Finally, we interpret the results and compare them with those of other studies.

Data
We used a database of 14,027 migrant TB cases diagnosed with new smear-positive pulmonary TB in 2012. The data covered the 31 provinces of China, which have highly different geographical environments and socioeconomic conditions (Fig. 1). The data were provided by the Chinese Center for Disease Control and Prevention (CDC) and reported directly through the nationwide web-based Infectious Disease Reporting System (IDRS).
The database of migrant TB cases was a collection of all TB case data from 1 January 2012 to 31 December 2012. Each case record in the dataset contained detailed information on age, sex, occupation, permanent residence, current residence, diagnosis month, results of smear microscopy diagnosis and so on.
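Because each record carries both a permanent and a current residence, case records can be aggregated into origin-destination flows. The following is a minimal sketch of that aggregation, assuming that permanent residence is treated as the origin and current residence as the destination; the sample rows and the function names `build_od_flows` and `inflows` are hypothetical and are not taken from the IDRS data.

```python
from collections import Counter

# Hypothetical case records: (permanent residence, current residence).
# These rows are made up for illustration and are not IDRS data.
records = [
    ("Sichuan", "Guangdong"),
    ("Sichuan", "Guangdong"),
    ("Hunan", "Guangdong"),
    ("Henan", "Zhejiang"),
    ("Guangdong", "Guangdong"),  # same province: intraprovincial flow
]


def build_od_flows(rows):
    """Count cases for each (origin, destination) province pair."""
    return Counter((origin, dest) for origin, dest in rows)


def inflows(flows):
    """Sum interprovincial inflows per destination (origin != destination)."""
    totals = Counter()
    for (origin, dest), count in flows.items():
        if origin != dest:
            totals[dest] += count
    return totals


flows = build_od_flows(records)
print(flows[("Sichuan", "Guangdong")])
print(inflows(flows)["Guangdong"])
```

Keeping intraprovincial pairs in the same counter makes it straightforward to separate interprovincial and intraprovincial flows later.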
Some studies have indicated that TB migration flows were influenced by various factors, such as income and social status, physical environments, working conditions, social environments, families, personal health practices and so on [32][33][34]. Generally speaking, there are relationships and differences between generic population flow and TB flow.
On the one hand, TB flow is part of the generic population flow, and TB migration flow refers to infected TB migrants in the floating population. Generic population flow and TB flow have much in common, as both are affected by the socioeconomic and geographical environmental factors of the sending and receiving regions.
On the other hand, TB flow has its own specificity compared with generic population flow. TB migration flows increase the risk of TB transmission from migrants to residents, and the harm is amplified by dense populations and active interactions with the migrating population. In addition, short- and long-distance transmission of infectious diseases results from the interaction between epidemiological mechanisms and socioeconomic and environmental factors. The complexity and variability of disease spread therefore also call for factors specific to TB transmission, such as the number of TB cases and the TB incidence rate.
As TB migration flow is a special kind of migration flow, the key factors affecting it include both those common to the floating population and those characteristic of disease spread. Unbalanced levels of economic development, income and job opportunities are recognized as the key determinants of population migration [35][36][37][38]. Per capita GDP, income and TB incidence have been confirmed to be related to the spread of infectious disease [4,18,20,21,39]; however, the quantitative relationship between them remains unclear.
Therefore, in this study various socioeconomic indexes were employed as variables in the TB migration flow model specifications representing OD regional characteristics (Table 1). TB flow data and all potential factors were calculated at the provincial or municipal level. Figure 2 shows the relationship between TB migration flow and its proxy variables.

q-statistic
Spatial stratified heterogeneity (SSH) reflects uneven distributions of disease transmission that arise because environmental and socioeconomic factors differ between regions [40][41][42]. SSH refers to the phenomenon that the spatial distribution of the disease and its risk determinants are more similar within a geographical region than between geographical regions. Such spatial variation of disease transmission can be measured with the geodetector q-statistic [43]. The q-statistic is as follows:
$$q = 1 - \frac{\sum_{h=1}^{L} N_h \sigma_h^2}{N \sigma^2}$$
where h = 1, 2, ..., L denotes the spatial stratification of the variable y or the factor x, i.e., classification or partitioning; N_h and N are the numbers of units in layer h and the whole region, respectively; and \sigma_h^2 and \sigma^2 are the variances of the y value of layer h and the whole region, respectively. The value range of q is [0,1]. The larger the value of q, the more obvious the spatial heterogeneity of y.
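The q-statistic compares the variance within strata with the variance over the whole region. The following is a minimal sketch of the computation, assuming population variances (NumPy's default `ddof=0`) and a simple array of stratum labels; the function name `q_statistic` and the toy inputs are illustrative and are not the geodetector software itself.

```python
import numpy as np


def q_statistic(y, strata):
    """Return q = 1 - sum_h(N_h * var_h) / (N * var) for values y and labels strata."""
    y = np.asarray(y, dtype=float)
    strata = np.asarray(strata)
    total = y.size * y.var()  # N * sigma^2, using population variance
    if total == 0:
        raise ValueError("y has zero variance, so q is undefined")
    within = 0.0
    for label in np.unique(strata):
        y_h = y[strata == label]
        within += y_h.size * y_h.var()  # N_h * sigma_h^2
    return 1.0 - within / total


# Toy data: strata that separate the values perfectly, then not at all.
print(q_statistic([1, 1, 5, 5], ["a", "a", "b", "b"]))
print(q_statistic([1, 5, 1, 5], ["a", "a", "b", "b"]))
```

In the first call the values are constant within each stratum, so the within-stratum term vanishes; in the second call each stratum is as variable as the whole, so stratification explains nothing.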

Spatial interaction model
Spatial interaction models of the gravity type typically rely on origin-specific, destination-specific and spatial separation factors to explain mean interaction frequencies between origins and destinations of interaction. Origin-specific factors represent the ability of regional characteristics to generate outflows, destination-specific factors characterize the attractiveness of regional characteristics for absorbing inflows, and spatial separation factors characterize the way in which separation impedes interaction between origins and destinations [44,45].
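To illustrate how the three kinds of factors enter a gravity-type model, the sketch below fits a log-linearized gravity equation by ordinary least squares. It is only an illustration under stated assumptions: the data are synthetic, the variable names (`origin_mass`, `dest_mass`, `distance`) and the "true" coefficients are invented, and plain least squares ignores the spatial dependence discussed in this section.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pairs = 200

# Synthetic origin, destination and separation variables (not study data).
origin_mass = rng.uniform(1, 100, n_pairs)
dest_mass = rng.uniform(1, 100, n_pairs)
distance = rng.uniform(10, 2000, n_pairs)

# Assumed coefficients: constant, origin, destination, distance decay.
true_coef = np.array([0.5, 0.8, 1.2, -1.0])

# Log-linearized gravity model: log(flow) = X @ coef + noise.
X = np.column_stack([
    np.ones(n_pairs),
    np.log(origin_mass),
    np.log(dest_mass),
    np.log(distance),
])
log_flow = X @ true_coef + rng.normal(0.0, 0.1, n_pairs)

# Ordinary least squares estimate of the coefficients.
coef, *_ = np.linalg.lstsq(X, log_flow, rcond=None)
print(coef)
```

Because the model is linear in logarithms, each slope can be read as an elasticity of the flow with respect to the corresponding factor.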
Spatial dependence is the key to expressing how spatial proximity affects TB migration flow [4,46]; it refers to the phenomenon in which TB migration flows in a given region are affected by fluctuating TB flows in neighboring regions. In this study, spatial interaction modeling of epidemic flow data, combined with geographic, socioeconomic and demographic information on a country's administrative regions, seeks to explain the variation of TB migration flows at the provincial/municipal level. Estimating multiplicative spatial interaction models in their log-linearized form has long been a widely employed filtering approach [47,48]. Their log-linearized form is as follows:
The use of spatial weight matrices is a convenient way of capturing the spatial dependence of migration flow. In the spatial interaction model (SIM), W represents geographical connectivity between the shared boundaries of n locations and measures the spatial dependence between the geo-referenced locations. W_o is the N-by-N spatial weight matrix of origin-based dependence, W_d is the N-by-N spatial weight matrix of destination-based dependence, and W_w is the N-by-N spatial weight matrix of the origin-to-destination dependence. S(\rho_d, \rho_o, \rho_w) represents the sum of squared errors expressed as a function of the scalar dependence parameters alone, and C denotes a constant.
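For orientation, spatial autoregressive interaction models with origin, destination and origin-to-destination dependence are commonly written in the form below, following the formulation usually attributed to LeSage and Pace. This is a sketch under that assumption and not necessarily the exact specification estimated in this study. The symbols y (vector of log flows), \iota_N (vector of ones), X_d and X_o (destination and origin characteristics), g (log distance), \alpha, \beta_d, \beta_o, \gamma and \varepsilon are assumed notation; W_d, W_o, W_w, S and C are as defined in the text.

```latex
\begin{aligned}
y &= \rho_d W_d y + \rho_o W_o y + \rho_w W_w y
     + \alpha \iota_N + X_d \beta_d + X_o \beta_o + \gamma g + \varepsilon, \\
\ln L &= C + \ln\left| I_N - \rho_d W_d - \rho_o W_o - \rho_w W_w \right|
     - \frac{N}{2} \ln S(\rho_d, \rho_o, \rho_w).
\end{aligned}
```

The second line is the concentrated log-likelihood: once the regression coefficients are profiled out, it depends only on the three scalar dependence parameters.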

Spatial patterns of TB migration flow
Two regional clusters of TB migration flows stood out. The largest destination and concentration areas were located in the Pearl River Delta in southern China and the Yangtze River Delta along China's east coast. These two clusters received flows from the provinces with a large rural surplus labor force (Sichuan, Guizhou, Guangxi, Hunan, Hubei, Henan, Shanxi and Hebei) into the coastal provinces with a high level of economic development (Guangdong, Zhejiang, Jiangsu, Shanghai and Fujian).
TB inflow represented a typical type of interregional migration flow from different regions of origin to the same destination. The top six provinces with inflows were Guangdong, Zhejiang, Fujian, Shanghai, Jiangsu and Beijing, which accounted for 84.5% of the TB migration flows (Additional file S1).
In addition to TB inflow, TB epidemic characteristics and their corresponding socioeconomic conditions also presented significant spatial heterogeneity, as their q-statistics were 0.94 (p < 0.05) and 0.89 (p < 0.05), respectively. In the eastern coastal region of China, TB incidence was low, and the three provinces with the lowest incidence were Beijing, Tianjin and Shanghai, while the highest TB incidence was found in central and western China. The economic index had a similar spatial pattern. The eastern coastal region was economically developed, and the three provinces with the highest per capita GDP were also Beijing, Tianjin and Shanghai, while the lowest per capita GDP was found in central and western China.

Spatial dependence of TB migration flow
TB migration flow had clear spatial dependence (Table 2). The origin dependence (\rho_o) and destination dependence (\rho_d) were both positive, which implied that the TB inflows to the destination or outflows from the origin showed a trend consistent with those in their neighboring regions.
Furthermore, the strength of the spatial dependence was much greater in the origin than in the destination. A 1% increase in TB migration flows in the regions of origin corresponded to a 66% increase in interregional flows in nearby regions. Moreover, a 1% increase in TB migration flows in the destination locations was associated with a 47% increase in interregional flows within nearby locations. By contrast, the origin-to-destination dependence (\rho_w) was negatively correlated with the TB flows. A 1% increase in \rho_w was associated with a 28% decrease in TB flows from the neighbors of a given region.
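The three kinds of dependence can be made concrete by building the corresponding weight matrices from a single province-level contiguity matrix. The sketch below assumes the Kronecker-product construction commonly used for origin-destination flow models, with flows stored as a destination-by-origin matrix Y and stacked column by column; the three-region adjacency matrix, the flow values and the function names are hypothetical.

```python
import numpy as np


def row_normalize(adj):
    """Scale each row of a contiguity matrix to sum to one (zero rows stay zero)."""
    adj = np.asarray(adj, dtype=float)
    sums = adj.sum(axis=1, keepdims=True)
    sums[sums == 0] = 1.0
    return adj / sums


def od_weight_matrices(w):
    """Build origin, destination and origin-to-destination weight matrices."""
    eye = np.eye(w.shape[0])
    w_d = np.kron(eye, w)  # averages flows over neighbors of the destination
    w_o = np.kron(w, eye)  # averages flows over neighbors of the origin
    w_w = np.kron(w, w)    # averages over neighbors of both ends
    return w_o, w_d, w_w


# Hypothetical three regions on a line: 0 - 1 - 2.
adjacency = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
w = row_normalize(adjacency)
w_o, w_d, w_w = od_weight_matrices(w)

# Made-up flow matrix: rows are destinations, columns are origins.
Y = np.arange(1.0, 10.0).reshape(3, 3)
y = Y.flatten(order="F")  # stack columns, i.e. origin by origin

dest_lag = w_d @ y    # equals (w @ Y) stacked the same way
origin_lag = w_o @ y  # equals (Y @ w.T) stacked the same way
print(dest_lag.reshape(3, 3, order="F"))
```

With n regions the flow vector has N = n^2 entries, which is why each weight matrix is N-by-N.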

Determinants of TB migration flows
In this study, we found that economic development level had significant effects on TB migration flow.
Generally, the regions with high per capita GDP were associated with low TB interprovincial outflows.
Furthermore, we found that this factor had different effects on TB migration flow in the origin and destination. A 10% increase in per capita GDP corresponded to a 2.1% decrease in TB outflows from the origin, a 0.5% increase in TB inflows to the destination and an 18.9% increase in TB intraprovincial flows.
The number of TB cases was found to have a positive impact on TB migration flow and was an important risk factor influencing TB migration flow. In this study, we found that a high number of TB cases were associated with increases in TB interprovincial and intraprovincial flows. A 10% increase in the number of TB cases was associated with a 1.1% increase in TB inflows to the destination provinces. By comparison, a 10% increase in the number of TB cases corresponded to a 2.0% increase in TB outflows from the provinces of origin, and a 2.2% increase in TB intraprovincial flows.
TB incidence rate as a key risk factor had positive impacts on TB intraprovincial flows and negative impacts on TB interprovincial flows. A 10% increase in TB incidence rate was associated with a 9.9% increase in TB intraprovincial flows and a 3.9% decrease in TB interprovincial flows.
Rural and urban income level exerted different impacts on TB migration flow in the origin and destination. The per capita net income of rural households had positive impacts on TB outflows from the origins and negative impacts on TB inflows to the destination. A 10% increase in this factor corresponded to a 3.6% increase in TB outflows from the origins, and a 1.7% decrease in TB inflows to the destination.
The per capita disposable income of urban households had negative impacts on TB outflows and positive impacts on TB inflows and intraprovincial flows. A 10% increase in this factor corresponded to a 3.6% decrease in TB outflows from the origins, a 12.8% increase in TB inflows to the destination and a 47.9% increase in TB intraprovincial flows.
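Statements of the form "a 10% increase in a factor corresponds to a given percentage change in flow" can be read as elasticities when both the flow and the factor enter the model in logarithms. The sketch below shows the arithmetic under that assumption. The elasticity of -0.21 is hypothetical, chosen only so that the linear reading reproduces a 2.1% decrease; it is not a coefficient taken from the model output.

```python
def pct_change_linear(elasticity, pct_x):
    """First-order approximation: percent change in flow = elasticity * percent change in x."""
    return elasticity * pct_x


def pct_change_exact(elasticity, pct_x):
    """Exact change implied by a log-log model: ((1 + dx)^elasticity - 1) * 100."""
    return ((1.0 + pct_x / 100.0) ** elasticity - 1.0) * 100.0


beta = -0.21  # hypothetical elasticity, for illustration only
print(round(pct_change_linear(beta, 10), 2))
print(round(pct_change_exact(beta, 10), 2))
```

For small elasticities the two readings are close; they diverge as the elasticity or the size of the change grows, which matters when interpreting the larger intraprovincial effects.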

Discussion
TB migration flows reflected highly complex dynamics and spatial heterogeneity [49,50]. In this study, we used TB flow data and a spatial interaction model of the new gravity type to model interactive tuberculosis transmission in spatially and economically heterogeneous settings. The results showed that the spatial pattern of tuberculosis transmission had clear regional clusters and that OD socioeconomic factors were important determinants that exerted different effects on TB migration flow in the origin and destination.
The gravity model has been widely used to capture the spatial interaction pattern in epidemics and account for distance, population size and other factors [23,51,52]. However, this approach may be insufficient for dyadic OD flows where the origin and destination regions may interact with their neighbors. Spatial interaction models of the new gravity type aim to explain the population migration variation of spatial interaction across geographic space. They focus on dyads of regions instead of individual regions and are increasingly used to understand the regional spread of an infectious disease in epidemiology [24][25][26][27][28].
TB migration inflows have become important factors that change the urban and rural distributions of the TB burden. In particular, the destinations of the two main streams of TB inflows (Guangdong and Zhejiang) have become important hubs for infectious disease transmission between migrants and local residents. The TB outflows were mostly from the less developed provinces in central and western China with a high TB burden, such as Hunan, Guangxi, Guizhou, Sichuan and Yunnan [53]. The pattern of TB migration flow was mostly in agreement with the spatial path of China's population flows [35]. The provinces identified as having severe tuberculosis were consistent with previous studies and the results of several epidemiological surveys in China [12,19,54,55,56]. Accordingly, China's health system should prioritize resource allocation to high-risk areas for TB control in order to reduce the future burden. In particular, it is important to focus on the key intercity paths for TB prevention and control among migrants and local residents.
In this study, we found that economic development level had significant and different effects on TB migration flow. This factor has negative impacts on TB outflows from the origins and positive impacts on TB inflows to the destination and TB intraprovincial flows in a given region. Previous studies indicated that per capita GDP and TB incidence at the provincial level had a negative relationship.
Similar results were observed for other infectious diseases such as SARS and H5N1 [4,57]. Therefore, improving the local economy is beneficial to the control of TB in the origin.
The number of TB cases and incidence rate were also important risk factors influencing TB migration flow in this study. A high number of TB cases and a high incidence rate were associated with an increase in TB inflows to the destination. Our finding highlighted that migration highly corresponded with TB transmission across space, which was consistent with some previous studies [58][59][60][61]. TB transmission presented an apparent regional characteristic. Geospatial clusters of TB cases reflected ongoing transmission or colocation of risk factors. This can account for TB transmission from migrants to local residents regardless of low or high endemic setting. Accordingly, it is critical to optimize effective prevention and control strategies of the TB epidemiology in this high-burden setting.
Income-related factors were found to have positive effects on TB inflows to the destinations in this study. Rural and urban income levels exerted different impacts on TB flow in the origin and destination. Income level was widely considered a factor in tuberculosis transmission in previous studies [18,56,62,63,64,65]. For example, regions with low income levels in central and western China consistently had high tuberculosis rates. Migration flows from the low-income regions involved a heterogeneous and vulnerable group. These migrants had a significantly higher risk of latent TB infection than permanent residents when their health suffered from overwork, hardship and inadequate nutrition [50,66,67]. Although basic free diagnosis and treatment of tuberculosis can be obtained at the inflow locations, it was difficult for migrants to receive the same treatment as the registered population in terms of employment and medical security because of the permanent hukou system in China [6,56].
Socioeconomic interventions can be powerful for tuberculosis control. Strategies to prevent overwork, improve individual living conditions and increase social expenditure per person have been associated with decreased tuberculosis prevalence.
The tuberculosis epidemic is still serious in China, although the incidence of tuberculosis has decreased. In recent years, the epidemic pattern of tuberculosis in cities has undergone a fundamental change because of the rapid increase in the floating population, especially the influx of migrants into metropolitan areas [68]. Tuberculosis among the floating population, which accounts for nearly 20% of China's population, is a great challenge for tuberculosis control in China. The characteristics of the tuberculosis spatial pattern are becoming increasingly complicated and diverse.
This study has some limitations. TB migration flow involves a special mobile population group, and migration flow often occurs at various spatial scales with different socioeconomic levels. The data used in this study were unable to reflect the characteristics of small-scale TB migration flows. Therefore, analysis at a more refined scale will be a topic for future studies. In addition, poverty-related factors such as living standards and living conditions often cause TB to spread among low-income individuals in different regions; thus, on-the-spot investigations of the nutritional status and living conditions of frequently affected populations need to be conducted for in-depth research.

Conclusions
In this study, we found that TB flows had clear spatial stratified heterogeneity and spatial autocorrelation, which influence the influx of TB to the neighboring provinces. We found that the TB flows had a statistically significant relationship with regional incidence, that socioeconomic differences in regional characteristics produced different effects on TB flows in the origin and destination, and that income played an important role among the determinants. These findings provide a scientific basis for the joint and precise prevention and control of TB transmission in population inflows to provinces and their neighbors.

Tables
Table 1. Variables of regional characteristics used in the TB migration flow model specifications
Figure 2. Relationship between TB migration flow and its proxy variables. Note: z represents direct variables, and x characterizes indirect variables.

Supplementary Files
Additional fileS1.docx