Spread of Mycobacterium tuberculosis lineage 4 in South China influenced by Maritime Silk Road and “Huguang Filling Sichuan” population migration

Background Lineage 4 of Mycobacterium tuberculosis complex (MTBC), mainly epidemic in Europe and the Americas, is present in high proportions in South China and is believed to have entered China around the 13th century, a period close to two important human migratory events in China: the Maritime Silk Road and the “Huguang Filling Sichuan” population migration. This study aimed to explain the coincidence of these two events with the high proportion of lineage 4 in South China. Methods Based on spatial interpolation analysis of the genotyping data of 25,575 MTBC isolates, the distribution of lineage 4 was compared with that of targeted surname populations and with the main ports of the Maritime Silk Road. Results The distribution of lineage 4 in China could be mapped to the regions affected by the “Huguang Filling Sichuan” population migration, while the distribution of two sub-lineages of lineage 4 in Asia, Europe, Africa and Oceania could be best explained by the Maritime Silk Road. Our results suggest that these two events might have exerted a crucial shared influence, leading to the greater proportion of lineage 4 in South China. This may contribute to a better understanding of the prevailing tuberculosis landscape in China and facilitate epidemiological investigations and the tracking of emerging MTBC clones.


Introduction
Mycobacterium tuberculosis (MTB) and other members of Mycobacterium tuberculosis complex (MTBC) cause human tuberculosis (TB) [1]. In 2019, about 2 billion people worldwide were MTBC carriers, of whom 5%-10% might develop active disease [2]. Out of 9.96 million new TB cases and 1.21 million deaths observed among HIV-negative patients worldwide in 2019, China accounted for 833,000 new cases of TB and about 31,000 deaths, ranking third in the world [2]. According to phylogenetic analysis based on whole-genome sequencing (WGS), MTBC is divided into seven major lineages [3][4][5][6]: Lineage 1 (Indo-Oceanic), Lineage 2 (East-Asian), Lineage 3 [East-African-Indian (EAI)], Lineage 4 (Euro-American), Lineage 5 (West-Africa 1), Lineage 6 (West-Africa 2) and Lineage 7 (Ethiopian or Aethiops vetus lineage). Stable associations between MTBC strains and their human host populations led to a distinct gradient of [...] In China, more than 80% of MTBC strains belonged to lineage 2 [11], and lineage 4 accounted for about 15% [12]. The distribution of lineage 4 was uneven: its proportion was significantly higher in some provinces of South China (Supplementary Fig. 1) than in other regions [12]. In particular, in Sichuan-Chongqing region (Supplementary Fig. 2), which comprises Sichuan and Chongqing in South China and had underdeveloped traffic and reduced population mobility in ancient times, lineage 2 accounted for only about 60%-70% of all MTBC strains, which was lower than the national level [13][14][15], while lineage 4 accounted for about 25%-35%, significantly higher than in the country as a whole [12,15,16]. Because the expansion and evolution of MTBC are mainly driven by human activities such as migration, trade and wars [12,17,18,19], historical human activities might be an important cause of the distinctive distribution of lineage 4 in Sichuan-Chongqing region and even across South China.
Unfortunately, China's historical documents did not contain a description of any TB pandemic in South China [20,21], making it unclear which historical events may have caused this singular distribution pattern observed.

Liu et al. suggested that an external incoming event of lineage 4 occurred in South China during A.D. 1150-1268 [12]. Further, Li et al. reported that the most recent common ancestor of the largest strain complex of lineage 4 strains (within the same evolutionary branch of the Neighbor-Joining tree) in Sichuan-Chongqing region appeared during A.D. 1069-1498 [16]. The most important human activities close to this period that could have brought lineage 4 from Europe to South China and made it spread in a particular direction were the Maritime Silk Road and the "Huguang Filling Sichuan" population migration. The Maritime Silk Road, the first official international sea-trading route in Chinese history, operated from the 1st century to the middle of the 19th century [22]. It constituted a key passage for foreign exchanges, trade and cultural communication between ancient China and Southeast Asia, the Indian subcontinent, the Arabian peninsula, Somalia, Egypt and Europe [23]. Further, as an important part of ancient world maritime trade, the Maritime Silk Road has been shown to be a route along which infectious diseases could spread across continents [24,25]. The "Huguang Filling Sichuan" population migration refers to the multiple immigration episodes in Chinese history from the 14th century to the end of the 18th century, in which immigrants from Huguang region (Supplementary Fig. 2) and other provinces of South China were forced by the government to settle, or settled spontaneously, in Sichuan-Chongqing region [26,27]. Millions of migrants at the least, including those from the southeast coastal region (Supplementary Fig. 2), made a purposeful move within South China during this event [26].
We hypothesized that the combined influence of the Maritime Silk Road and the "Huguang Filling Sichuan" population migration might be an important factor contributing to the singular distribution pattern of MTBC lineage 4 in South China. Since the distribution of surnames can supply quantitative information on the structure of human populations and migration rates [28,29], we decided to explore available MTBC genotyping data from 32 provinces in China (n = 11,171) and 80 countries (n = 14,404), in conjunction with family names potentially representing migratory movements. The distribution of four representative surnames was used to characterize the migration. In addition, the correlation between the geographical distribution of the main ports associated with the Maritime Silk Road and the distribution of the areas with a high proportion of lineage 4 was analyzed.

Analysis of the Geographical Distribution of MTBC
The following work was mainly completed using ArcMap (version 10.1). Firstly, the typing results of 11,171 MTBC clinical isolates (Supplementary Table 1 [...] Table 2), IDW (Inverse Distance Weighted) interpolation prediction was used to predict the distribution of three sub-lineages of lineage 4 (L4.2, L4.4 and L4.5) in Asia, Europe, Africa and Oceania. In order to improve the accuracy of this prediction, the typing results from most provinces were used to replace the overall data of China.
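As an illustration of the interpolation step, the following is a minimal Python sketch of inverse distance weighting; it is not the ArcMap implementation. It assumes planar distances computed directly on longitude/latitude degrees and a power parameter of 2. The function name `idw_predict` and the sample values are hypothetical and are not taken from the supplementary tables.

```python
import math


def idw_predict(samples, query, power=2.0):
    """Estimate a value at `query` from known points by inverse distance weighting.

    samples: list of ((lon, lat), value) pairs, e.g. the lineage 4 proportion
             observed at each sampled location (hypothetical inputs).
    query:   (lon, lat) of the location to predict.
    power:   distance exponent; 2.0 is an assumed, commonly used setting.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (lon, lat), value in samples:
        # Assumption: straight-line distance in degree units.
        d = math.hypot(query[0] - lon, query[1] - lat)
        if d == 0.0:
            # The query coincides with a sampled point: return its value.
            return value
        w = 1.0 / d ** power
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total


# Hypothetical proportions at two locations, predicted midway between them.
example = idw_predict([((0.0, 0.0), 0.10), ((2.0, 0.0), 0.30)], (1.0, 0.0))
```

Nearby samples dominate the estimate because their weights grow as the distance shrinks, which is why replacing a single national value with province-level values, as described above, changes the predicted surface.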
In addition, based on the predicted distribution of L4.2, L4.4 and L4.5 in Asia, Europe, Africa and Oceania, we chose the coordinates of regions with a high proportion of L4.2 and L4.4, and we obtained the coordinates of the main ports along the Maritime Silk Road. The information on the location of the main ports was from the Old World Trade Routes Project (www.ciolek.com/owtrad.html), and all corresponding coordinates were in GCS: WGS 1984 (Geographic Coordinate System: World Geodetic System 1984). The origin was the point at coordinate (0,0) in GCS: WGS 1984. The relative distances from the coordinates of the main ports and of the high proportion regions to the origin were calculated respectively (Supplementary Table 5, 6). Minitab (version 17.1.0) was used to analyze the correlation between the relative distance from the main ports to the origin and the relative distance from the high proportion regions to the origin. The former was regarded as the independent variable and the latter as the dependent variable to draw the scatter plot (the two sets of relative distance data were paired one by one in order from largest to smallest).
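The pairing and correlation procedure described above can be sketched in Python as follows. This is an illustrative reconstruction, not the Minitab workflow: it assumes that the relative distance is the straight-line distance from (longitude, latitude) to (0,0) in degree units, that both sets contain the same number of points, and that the correlation is Pearson's r. All function names and coordinates are hypothetical.

```python
import math


def distance_to_origin(lon, lat):
    """Relative distance from a coordinate to the origin (0,0), in degrees (assumed)."""
    return math.hypot(lon, lat)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def paired_distance_correlation(ports, regions):
    """Sort both distance sets from largest to smallest, pair them, and correlate."""
    if len(ports) != len(regions):
        raise ValueError("both sets must contain the same number of points")
    port_d = sorted((distance_to_origin(*p) for p in ports), reverse=True)
    region_d = sorted((distance_to_origin(*r) for r in regions), reverse=True)
    # Ports are the independent variable, high proportion regions the dependent one.
    return pearson_r(port_d, region_d)


# Hypothetical coordinates (lon, lat); not taken from Supplementary Table 5 or 6.
r = paired_distance_correlation(
    [(3.0, 4.0), (6.0, 8.0), (0.0, 1.0)],
    [(0.0, 1.0), (3.0, 4.0), (6.0, 8.0)],
)
```

Because the pairs are formed by rank rather than by geographic adjacency, the input order of the two lists does not affect the result.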

Distribution of Four Surname Populations in China
The distribution of four surname populations (Zeng, Tang, Deng and Zhong) in China showed obvious regional characteristics (Figure 2c-f). Among them, the proportions of Zeng and Deng in Sichuan-Chongqing region, Huguang region and southeast coastal region were significantly greater than those in other regions of China (p < 0.01). The proportion of the Tang population in Sichuan-Chongqing region and Huguang region was significantly higher than that in other regions of China (p < 0.01), while the proportion of the Zhong population in Sichuan-Chongqing region and southeast coastal region was significantly larger than that in other regions of China (p < 0.01). These four surname populations were rarely distributed in North China. Interpolation prediction showed that the distribution of the population of these four surnames extended directionally from southeast coastal region and Huguang region to Sichuan-Chongqing region (Figure 2c [...]
Moreover, there was a significant correlation between the relative distance from the main ports along the Maritime Silk Road to the origin (coordinate (0,0)) and that from the regions with a high proportion of L4.2 and L4.4 to the origin (r = 0.980, P < 0.01). The scatter plot constructed from the two sets of relative distances showed significant aggregation (Moran's I index = 0.772861, z score = 7.479802, P < 0.01), and the slope of the regression line was close to 1.00 (Figure 4). The results suggested that there was indeed a geographical correlation between the main ports along the Maritime Silk Road and the regions with a high proportion of L4.2 and L4.4.

Discussion
The results revealed that the proportion of MTBC lineage 4 in some provinces of South China was significantly higher (> 25%) than in other provinces, while it was much lower in North China (9.0%). Although seasonal changes in TB incidence rate tentatively associated with temperature and humidity variations have been observed within a given area [31,32], this may not be considered a determining factor for the particular distribution pattern of lineage 4 observed in our study. Indeed, lineage 4 of MTBC is widely distributed in the world, and high burden countries span different latitudes and climatic conditions [2,33]. Additionally, despite the increased human activity, transportation and expanding globalization of the modern era, the geographical distribution of MTBC has not yet been substantially affected; i.e., its population structure in different regions of the world has remained significantly different [6,33], probably because historical human activities of longer duration shaped its worldwide geographical distribution landscape [34,35]. This could explain why lineage 4 comprises both globally distributed and geographically restricted sub-lineages [33], an observation that led us to presume that modern human activities could not be the focal elements affecting the observed distribution gradient of lineage 4 in China. We subsequently concentrated on how singular historical events may have contributed to the particular distribution of lineage 4 in China.
Previous studies suggested that the early transmission of lineage 4 was related to the "Northern Route" migration of East Asians about 15,000-18,000 years ago [36], and that lineage 4 was introduced into China from Central Asia or Siberia, spreading from north to south [37]. However, because the current proportion of lineage 4 in China is higher in the south than in the north, we looked into other factors that might have significantly influenced the lineage 4 population structure in China.
Bearing in mind the higher genetic diversity of lineage 4 but the lack of genetic differentiation among the circulating strains in Sichuan-Chongqing region [16], and despite the likelihood of successive historical arrivals of lineage 4 strains in this region, it seems probable that only a limited number of incoming events have led to the pool of lineage 4 strains epidemic today. Besides, Liu et al. [12] found that during the period of A.D. [...] [12]. Therefore, one may presume that lineage 4 did originate in this period in South China, an event that constitutes the main factor affecting its current distribution in China.

The potential spread of lineage 4 to South China through the Maritime Silk Road
It seemed crucial to study the path by which lineage 4 spread to China during A.D. 1150-1268. Our results showed that the geographic distribution of L4.5 was continuous in Asia and Europe, which is consistent with the conclusion that lineage 4 might have its origin in South China [12,33], and its spread might be related to the expansion of the Mongol Empire [38]. In contrast, the distribution of regions with high proportions of L4.2 and L4.4 in Asia, Africa, Europe and Oceania was intermittent, suggesting that their geographical distribution was not primarily related to the expansion of the Mongol Empire.
Besides, the possibility that L4.2 and L4.4 spread by a land route could be largely ruled out: if they had spread by land, the countries along the route would have a higher proportion of them, and the lineage 4 isolates of these countries would precede the lineage 4 isolates of China in evolutionary status. However, the proportion of L4.2 and L4.4 was low in Central Asia, Siberia and other regions through which a land route connecting Europe and China had to pass. Additionally, L4.2 and L4.4 isolates from some countries in the Middle East and West Asia were nested within the samples from China on the evolutionary tree [12], signifying that L4.2 and L4.4 isolates in these areas were not earlier than the Chinese samples in evolutionary status. Therefore, it seems likely that lineage 4 was introduced into China earlier (around A.D. 1150-1268) from Europe through the sea route, and that L4.2 and L4.4 spread and diversified in China before being exported to the above areas in the Middle East and West Asia.
Around the 13th-14th centuries, the main maritime migration route was the Maritime Silk Road, and some infectious diseases probably did spread through it in ancient times; e.g., Yersinia pestis, which caused one of the most terrible plague episodes in European history, most likely arrived in Europe from China through the Maritime Silk Road [24,25]. It is therefore feasible that MTBC lineage 4 reached the southeast coastal region of China from Europe through the Maritime Silk Road in a similar manner. In addition, 31 main ports associated with the Maritime Silk Road were geographically related to regions with a high proportion of L4. [...] [39]), when foreign trade was highly developed. The Song dynasty mainly relied on ports such as Guangzhou and Quanzhou in southeast coastal region to conduct trade activities and population movements with Europe through the Maritime Silk Road [22,40,41,42], and numerous immigrants arriving through this route settled in southeast coastal region of China [43,44], thus providing suitable conditions for the arrival and local spread of lineage 4.

The potential spread of lineage 4 in South China caused by "Huguang Filling Sichuan" population migration
Sichuan-Chongqing region is mainly located in Sichuan Basin, a lowland region in the west of South China surrounded by upland regions and mountains on all sides, with underdeveloped traffic and reduced population mobility from ancient times until the 1950s. Such difficult access certainly made the spontaneous spread of MTBC to Sichuan-Chongqing region from outside territories rather difficult. Considering that the external inflow of lineage 4 occurred in the southeast coastal region, it is worth mentioning that in some nearby provinces with easier population mobility, the proportion of lineage 4 was lower than that in Sichuan-Chongqing region. Most likely, a large number of lineage 4 carriers migrated from coastal areas inland in a specific direction, and eventually reached Sichuan-Chongqing region. Furthermore, the most recent common ancestor of the largest strain complex of lineage 4 collected by Li et al. in Sichuan-Chongqing region appeared during A.D. 1069-1498 [16], a period close to the lineage 4 external incoming event discovered by Liu et al. [12]. Therefore, the arrival of this ancestor in Sichuan-Chongqing region was most probably triggered by exceptional population influxes. Since the largest population migration related to Sichuan-Chongqing region is the "Huguang Filling Sichuan" population migration [27], and the distribution of surnames can supply quantitative information on the structure of human populations and migration rates [28,29], we analyzed the distribution of four surname populations (Zeng, Tang, Deng and Zhong) in China; the ancient literature clearly records that populations with these four surnames migrated to Sichuan-Chongqing region during the "Huguang Filling Sichuan" population migration [26].
The results showed that these populations were highly concentrated, and the key distribution areas of the four surname populations were on the "Huguang Filling Sichuan" immigration route (Fig. 2c-f). Interestingly, the four surname populations were mainly distributed in Sichuan-Chongqing region, Huguang region and southeast coastal region, which was basically consistent with the areas that had a high proportion of lineage 4 in South China. This supports a large-scale population migration across these areas, and suggests that the entry of lineage 4 (which spread to South China in A.D. 1150-1268) into Sichuan-Chongqing region (in A.D. 1069-1498) might be related to the "Huguang Filling Sichuan" population migration (Fig. 4), which covered a time span of 400 years [26]. Lineage 4 was then able to flow and spread, resulting in its current distribution and high proportion in South China.

Conclusion
This study suggested that human activity on the Maritime Silk Road and the "Huguang Filling Sichuan" population migration might have contributed to the historic spread of lineage 4, which is epidemic in today's South China. Therefore, our work may provide new evidence for the long-distance transmission of infectious diseases between the East and the West in ancient times, and show that cross-border and regional population migration in history may have a profound impact on the current distribution of TB, which may also be true of other infectious diseases around the world. Such studies may help us better understand the current TB landscape in China and facilitate the tracking of emerging MTBC clones. Further, this study may have guiding significance for global epidemiological investigations and traceability, suggesting that a global perspective is necessary in the epidemiological study of infectious diseases such as TB and COVID-19.

Availability of data and materials
Genotyping data of MTBC isolates, the proportions of targeted surname populations related to "Huguang Filling Sichuan", and the information on the location of the main ports around the Maritime Silk Road are listed in the supplementary files.

Competing interests
The authors declare that they have no competing interests.