This study characterized the molecular biology of Mtb in different TB epidemic areas of southern China. Only two major lineages (L2 and L4) were found in this study setting. The predominant Mtb strain was lineage 2.2 (Beijing family), and its proportion was significantly higher in TB notification hot spots. Through analysis of population genetic structure (AMOVA), comparison of SNPs between cold and hot spots, and multidimensional scaling modeling of each county, we found that the two areas differed in genetic structure, while consistency within each area was relatively high. Specific SNP sites with high Fst estimates between the cold and hot spots mapped to particular proteins that may contribute to differences in Mtb pathogenicity. Three groups that were close both genomically (SNP distance ≤12) and geographically were detected and identified as cases of recent Mtb transmission.
Previous studies have suggested that human-adapted Mtb originated in Africa and diverged into seven lineages over several thousand years of mutation [18-21]. The evolution of Mtb has been linked to human migration and evolution: it spread from Africa to the rest of the world along with human migration, giving rise to the current genotype distribution. Today, the most prevalent Mtb strain in China is lineage 2. In northern China, the proportion of Beijing strains is as high as 80% (mostly Modern Beijing strains), but this proportion decreases as latitude decreases. Moreover, as population mobility increases, the polymorphism of Mtb genotypes becomes more pronounced. As a southern province of China, Guangxi therefore has a higher proportion of Ancestral Beijing strains and greater genetic diversity of Mtb strains than the northern regions. The Protobeijing strain (L2.1) is likely to have originated in southern China, where its proportion is highest. In this study, lineage 4, with three sub-lineages (L4.2, L4.4 and L4.5), also accounted for a large proportion of strains. In contrast to other major human-adapted lineages, lineage 4 occurs at significant frequency on all inhabited continents. Thus, it is geographically the most widespread cause of TB in humans. Within this lineage, L4.4 and L4.5 have mostly been reported from China, although L4 is usually called the Euro-American lineage. Stucki hypothesized that the global spread of L4 may have been caused by European migration and colonization. Yet explaining this spatial distribution in China requires more evidence.
In this study, comparison of all gene loci between the cold and hot spot strains showed that the two populations differed in mutations affecting some regulatory proteins. The mutation in Rv1186c (PruC) is of particular interest. Mtb is an obligate aerobe that needs oxygen to grow. Paradoxically, however, it shows remarkable metabolic flexibility that allows it to survive and metabolize under oxygen-deprived conditions. It has been shown that mycobacteria can grow on proline as the sole carbon and energy source under hypoxia, and that this process is regulated by a unique transcriptional regulator (PruC). An animal study by Smith DA et al. found that mycobacteria with abnormal proline metabolism were nonpathogenic in immune-competent mice. However, Rv1186c was predicted to be a non-essential gene in the papers of DeJesus et al. and Lamichhane et al., but not in that of Sassetti et al. Apart from Rv1186c, the three other genes identified in the SNP analysis (Rv0210, Rv1508c, Rv3900c) have unknown functions and are all predicted to be non-essential. Thus, these mutations are not expected to show a significant association with the pathogenicity of Mtb. Nevertheless, further epidemiological and clinical exploration is needed.
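To illustrate how a single SNP site can separate two populations, the following minimal sketch computes a per-site fixation index from alternate-allele counts in hot and cold spot isolates. It assumes a biallelic site and the classical heterozygosity-based definition Fst = (Ht - Hs) / Ht with sample-size weighting; the estimator actually used in this study may differ, and the function name `site_fst` and its parameters are hypothetical.

```python
def site_fst(alt_hot, n_hot, alt_cold, n_cold):
    """Per-site Fst for one biallelic SNP in two populations (illustrative).

    alt_hot / alt_cold: number of isolates carrying the alternate allele.
    n_hot / n_cold: number of isolates genotyped in each population.
    Uses Fst = (Ht - Hs) / Ht, an assumed textbook definition,
    not necessarily the estimator applied in the study.
    """
    n_total = n_hot + n_cold
    p_hot = alt_hot / n_hot
    p_cold = alt_cold / n_cold
    p_pooled = (alt_hot + alt_cold) / n_total

    # Expected heterozygosity of the pooled population.
    h_total = 2 * p_pooled * (1 - p_pooled)
    if h_total == 0:
        # Monomorphic site: no variation to partition.
        return 0.0

    # Sample-size-weighted mean heterozygosity within populations.
    h_within = (
        n_hot * 2 * p_hot * (1 - p_hot)
        + n_cold * 2 * p_cold * (1 - p_cold)
    ) / n_total

    return (h_total - h_within) / h_total
```

A site at which the alternate allele is fixed in one population and absent in the other yields a value of 1, whereas a site with identical allele frequencies in both populations yields 0.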
As the dominant genotype, Beijing family strains have been shown to cluster more frequently. This suggests that recent Mtb transmission is more likely to occur among such strains [35, 36]. Some scholars have proposed that recent transmission or an Mtb outbreak (transmission within two or three years) can be identified by a WGS-based genomic distance cut-off of 12 SNPs or fewer. However, only three recent transmission groups were detected in this study. In previous research, samples containing recent transmission cases generally came from communities under long-term surveillance or from tuberculosis outbreak settings [38, 39]. Although the specimens in this study came from two spatial clusters of TB notification (hot and cold spots), it is likely that no obvious outbreak occurred during the study period, and the locations of the included participants were scattered. Nevertheless, this study showed that the median SNP distance between strains in hot spots was significantly lower than that in cold spots. The comparison of population genetic structure based on SNPs also showed a significant difference in genetic structure between the two areas, whereas differences within each area were relatively small. We did not find any cluster with members in both hot and cold spots; in fact, the minimal SNP distance between genetically related isolates across the two spots was at least 96. This suggests that the transmission pattern of Mtb in hot spots may differ from that in cold spots. Local transmission over a period of more than three years is more likely in hot spot areas than in cold spot areas. Homologous transmission may occur over a longer period of time. Patients might have acquired the same type of Mycobacterium tuberculosis many years ago, and the strains might have mutated over a long course of latent infection, proliferation and subsequent endogenous reactivation. Thus, the SNP distance between two such strains would become larger.
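The 12-SNP cut-off described above can be sketched as a simple single-linkage grouping of isolates. This is a minimal illustration under stated assumptions, not the pipeline used in this study: it assumes that isolates are supplied as equal-length aligned sequences, that only unambiguous A/C/G/T differences are counted, and that two isolates are linked whenever their pairwise distance is at or below the threshold. The names `snp_distance` and `transmission_clusters` are hypothetical.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Count aligned positions where both bases are A/C/G/T and differ."""
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a in "ACGT" and b in "ACGT"
    )


def transmission_clusters(seqs, threshold=12):
    """Group isolates by single linkage at a pairwise SNP threshold.

    seqs: dict mapping isolate name -> aligned sequence (assumed input).
    Returns a list of sorted name lists, one per group of 2+ isolates.
    """
    parent = {name: name for name in seqs}

    def find(name):
        # Follow parent links to the group root, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), []).append(name)

    # Isolates with no close neighbour are not reported as clusters.
    return [sorted(members) for members in groups.values() if len(members) > 1]
```

With single linkage, an isolate joins a group if it lies within the threshold of any member, so two isolates in the same group may themselves differ by more than 12 SNPs; a stricter rule would require all pairwise distances to satisfy the cut-off.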
Our study also performed contact tracing among the included cases. The results showed that the detection rate of TB among household contacts was very low. Only two household cases were detected in hot spot areas. Therefore, we believe that TB transmission in both cold and hot spots was dominated by community transmission.
Either the spread of Mtb was local, or some socioeconomic factors hindered transmission between the hot and cold spot areas. Our previous studies on the ecology of tuberculosis suggested a negative correlation between average sunshine time and the reported incidence of tuberculosis. Therefore, there may be some interaction between natural factors and strain pathogenicity, which requires further exploration.
Meanwhile, sub-lineage 2.1 (Protobeijing strain), a particular subgroup related to drug resistance, was mainly concentrated in hot spots. Since Beijing genotype strains have been reported to show greater virulence and drug resistance than other strains [41, 34, 42], we infer that the prevalence of this strain in hot spots may be one of the important reasons for their higher TB burden compared with cold spots.
A limitation of this study is that the sputum isolates were collected mainly from passively detected cases in public hospitals. As a result, specimens from some TB cases treated in private practice during the study period were not included, which may have introduced some bias. However, as TB case management strategies become more widely publicized under the national TB control programme, this impact should diminish [43, 44]. The estimated inclusion rate was over 90%.