Members of the genus Mycobacterium cause tuberculosis (TB), which affects approximately 10 million people worldwide each year, according to the Global Tuberculosis Report published by the World Health Organization (WHO). Within this genus, the Mycobacterium tuberculosis complex (MTBC) causes tuberculosis in humans as well as animals. The members of the pathogenic MTBC are known to be host-specific. Mycobacterium tuberculosis var. bovis (M.bovis) infects cattle and many other species, whereas Mycobacterium tuberculosis H37Rv (M.tuberculosis) and its related substrains mainly infect humans [3, 4]. M.bovis is also zoonotic: it can spread from animals to humans, is a recognized global public health problem, and is intrinsically resistant to several drugs [5–8]. M.bovis infection in humans is identical to M.tuberculosis infection, differing only in being non-transmissible among immunocompetent hosts [7, 9]. Bovine tuberculosis (BTB), the infection caused by M.bovis, is recognized as a One Health problem in parts of the world where cattle husbandry is practised. Close contact with livestock and consumption of unpasteurized milk or raw or undercooked meat are the key sources of zoonosis. According to WHO reports, although BTB cases make up only a small portion of the human tuberculosis disease burden, efforts to curb global TB by 2030 appear to be hindered by zoonotic cases. Although TB remains an escalating problem worldwide, it is preventable and curable (except in cases of total drug resistance), provided that efforts to treat it are intensified to reduce mortality and morbidity.
Ongoing efforts to control global TB are hampered mainly by insufficiently sensitive diagnostic tests and a lack of effective vaccines and drugs, along with a rise in multi-drug resistant (MDR), extensively-drug resistant (XDR) and total-drug resistant (TDR) strains of TB.
The standard method of diagnosing BTB is the tuberculin skin test, which cannot distinguish between M.tuberculosis and M.bovis infections; moreover, its results can be affected by cross-reactivity to BCG and environmental mycobacteria. Conventional diagnosis by culture takes a long time to confirm, so diagnosis is considerably delayed [13, 14]. Several non-sequence-based molecular typing techniques [15–17] have been used independently or in combination for genotyping. These methods have variable specificity and are suitable only for clinics with well-equipped microbiological laboratories. Hence, a robust methodology that helps identify M.bovis isolates likely to infect hosts other than cattle would be useful in controlling BTB infection.
The M.bovis AF2122/97 genome is 99.95% similar to that of M.tuberculosis H37Rv [18, 19], differing predominantly in insertion-deletions (indels) and single nucleotide polymorphisms (SNPs) [2, 19], which in turn contribute to their host-specificity. MTBC members are also known for their low mutation rates and limited genomic diversity, which makes their variation profiles important to study as a source of biomarkers. Polymorphisms obtained by analyzing whole genome sequence (WGS) data can differentiate among populations and predict their host-specificity. SNP profiles of M.bovis isolates capable of causing zoonosis may provide clues to their changing host associations. Repositories of SNP data for the TB community have been generated and are growing [24–29], but all predictions to date have been made on individual samples; such predictions have their own limitations and are known to include false positives due to low coverage, short read lengths and sequencing errors. An effort to catalogue the variation profile present within a cohort is still lacking, because predicting SNPs and indels across samples within a cohort during multi-sample variant prediction is computationally challenging.
In light of these facts, improved, targeted and rapid sequence-based diagnostics that accurately identify BTB are desirable for effective control and treatment. With the advent of massively parallel next generation sequencing (NGS) techniques, bacterial populations can be sequenced to study population dynamics using WGS. Although heterogeneity analysis is routine in viral genomics, its application to bacterial genomes is less straightforward. Heterogeneity refers to the genetic differences present among certain isolates of an otherwise genetically similar, homogeneous population. Heterogeneity in the variation profile of prokaryotic populations forms the basis of survival in stressed environments, for example through drug resistance or a change in metabolic requirements, and provides the seed for microevolution [32, 33]. Microevolution can be defined as the gradual acquisition of mutations within a population, giving rise to variations that lead to speciation. The mutations arising in a heterogeneous or homogeneous prokaryotic population need to be studied carefully, keeping in mind gene information, annotation and functionality. In order to identify polymorphisms responsible for causing BTB, a cross-infecting M.bovis population, i.e., a heterogeneous population, needs to be identified and compared against an M.bovis cohort that infects only cattle and has a low divergence ratio among isolates, i.e., a homogeneous population. Clustering approaches capable of distinguishing such features have been implemented in this study to identify individual populations from the global M.bovis isolates. The first approach uses Principal Component Analysis (PCA), which helps identify individuals with similar variation profiles. The second approach clusters isolates using distance-based UPGMA and maximum-likelihood (ML) methods on the SNP data.
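To illustrate the PCA step, the following minimal Python sketch projects isolates onto the first principal component of a binary SNP presence/absence matrix. The isolate names, the toy matrix and the function name `first_principal_component` are hypothetical, and power iteration is used only to keep the example free of external dependencies; it is not necessarily the procedure used in this study.

```python
import math

def first_principal_component(matrix, iterations=100):
    """Return (loading vector, per-row scores) for the first principal component.

    `matrix` is a list of rows (isolates), each a list of 0/1 SNP calls.
    Power iteration on X^T X is used, where X is the column-centred matrix.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    centred = [[row[j] - means[j] for j in range(n_cols)] for row in matrix]

    def project(vec):
        return [sum(r[j] * vec[j] for j in range(n_cols)) for r in centred]

    vec = [1.0 / (j + 1) for j in range(n_cols)]  # deterministic starting vector
    for _ in range(iterations):
        scores = project(vec)
        new = [sum(centred[i][j] * scores[i] for i in range(n_rows))
               for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0:  # no variance in the data
            break
        vec = [x / norm for x in new]
    return vec, project(vec)

# Hypothetical SNP profiles: rows are isolates, columns are SNP sites.
toy_profiles = {
    "A1": [1, 1, 1, 1, 0, 0, 0, 0],
    "A2": [1, 1, 1, 1, 0, 0, 0, 0],
    "A3": [1, 1, 1, 0, 0, 0, 0, 0],
    "B1": [0, 0, 0, 0, 1, 1, 1, 1],
    "B2": [0, 0, 0, 0, 1, 1, 1, 0],
    "B3": [0, 0, 0, 1, 1, 1, 1, 1],
}
names = sorted(toy_profiles)
_, pc1 = first_principal_component([toy_profiles[n] for n in names])
pc1_by_isolate = dict(zip(names, pc1))
for name in names:
    print(name, round(pc1_by_isolate[name], 3))
```

In this toy example, isolates with similar variation profiles receive scores of the same sign on the first component, which is the property used to group them. The sign itself is arbitrary.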
Based on PCA and distance-based clustering, a homogeneous population and a heterogeneous population were identified, and the variant distribution in each was studied using different variant calling approaches to assess their significance for each population type. Studying the variant distribution of the homogeneous and heterogeneous M.bovis populations gives insight into the SNPs under selection and their distribution in each population type.
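The distance-based clustering described above can be sketched as follows. This is a minimal illustration of UPGMA (average linkage) on Hamming distances between binary SNP profiles; the isolate names and profiles are hypothetical, and the Hamming distance is an assumption made for the example rather than the distance measure reported in this study.

```python
from itertools import combinations

def hamming(a, b):
    """Number of SNP sites at which two profiles differ."""
    return sum(x != y for x, y in zip(a, b))

def upgma(profiles):
    """Build a nested-tuple tree by repeatedly merging the two closest clusters.

    The distance between clusters is the unweighted mean of all pairwise
    distances between their members, which is the UPGMA criterion.
    """
    clusters = [(name, [name]) for name in sorted(profiles)]  # (subtree, members)

    def avg_dist(c1, c2):
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        return sum(hamming(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: avg_dist(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]

def leaves(tree):
    """Set of isolate names under a subtree."""
    if isinstance(tree, str):
        return {tree}
    return leaves(tree[0]) | leaves(tree[1])

# Hypothetical SNP presence/absence profiles for six isolates.
snp_profiles = {
    "A1": [1, 1, 1, 1, 0, 0, 0, 0],
    "A2": [1, 1, 1, 1, 0, 0, 0, 0],
    "A3": [1, 1, 1, 0, 0, 0, 0, 0],
    "B1": [0, 0, 0, 0, 1, 1, 1, 1],
    "B2": [0, 0, 0, 0, 1, 1, 1, 0],
    "B3": [0, 0, 0, 1, 1, 1, 1, 1],
}
tree = upgma(snp_profiles)
print(tree)
```

The two subtrees below the root correspond to the two groups of isolates. In a real analysis, the within-group distances would indicate how homogeneous or heterogeneous each group is.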
Approaches such as Joint Variant Calling (JVC), a concept used for the first time on prokaryotes in this study, predict the variants present in a cohort and promise to overcome the shortcomings of single sample variant calling (SVC) methods, because variants are analyzed simultaneously across all samples in a population [31, 37, 38]. JVC can predict variants from low coverage data in cohorts with high sensitivity. The approach works well for both homogeneous and heterogeneous populations, and population-specific biomarkers can be identified for each cohort. In a diverging population, JVC can also help identify SNPs that are under selection pressure and may give rise to phenotypic variation within the population. Hence, for non-model organisms like Mycobacterium, polymorphisms can be detected with more confidence within populations using different variant calling approaches, such as Bayesian or heuristic ones. Variant callers capable of handling cohort data, such as FreeBayes and GATK, use a Bayesian approach to predict variants, whereas VarScan2 and BCFtools use pileup results along with heuristic approaches for variant detection [39–44]. JVC was performed on a heterogeneous and a homogeneous population using a combination of tools that employ different methodologies and can handle prokaryotic population data. Variant annotation and functional distribution analysis were then carried out to identify high-confidence variants and their contribution to gene functionality. The TB community has a catalogue of studies on SNPs and indels in MTBC genomes, but an effort to list them all and map each to its chromosomal position along with its Reference SNP cluster id (rsid) is lacking. Hence, rsids were assigned to the variants, mapping them to their specific chromosomal positions. Other TB researchers can use these polymorphisms, referenced by their unique rsids, to add to the catalogue and for future reference.
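The intuition behind joint calling on low coverage data can be shown with a deliberately simplified Python sketch. It assumes a haploid genome, a fixed per-base error rate, and a single site with (alt reads, total reads) per sample; the cohort allele frequency is estimated by a few expectation-maximization steps and then used as the prior for every sample. All names, values and thresholds are hypothetical, and this toy model is not how FreeBayes, GATK, VarScan2 or BCFtools are implemented.

```python
import math

ERROR_RATE = 0.01  # assumed per-base sequencing error rate

def alt_posterior(alt_reads, total_reads, prior_alt, err=ERROR_RATE):
    """Posterior probability that a haploid sample carries the alt allele."""
    ref_reads = total_reads - alt_reads
    log_alt = (alt_reads * math.log(1 - err) + ref_reads * math.log(err)
               + math.log(prior_alt))
    log_ref = (alt_reads * math.log(err) + ref_reads * math.log(1 - err)
               + math.log(1 - prior_alt))
    top = max(log_alt, log_ref)  # subtract the maximum for numerical stability
    p_alt, p_ref = math.exp(log_alt - top), math.exp(log_ref - top)
    return p_alt / (p_alt + p_ref)

def single_sample_calls(counts, prior_alt=1e-3, threshold=0.99):
    """Call each sample independently with a fixed, low prior for the alt allele."""
    return [alt_posterior(a, n, prior_alt) >= threshold for a, n in counts]

def joint_calls(counts, prior_alt=1e-3, threshold=0.99, iterations=20):
    """Estimate the cohort allele frequency, then call every sample with it as prior."""
    freq = prior_alt
    for _ in range(iterations):
        posts = [alt_posterior(a, n, freq) for a, n in counts]
        freq = min(max(sum(posts) / len(posts), 1e-6), 1 - 1e-6)
    posts = [alt_posterior(a, n, freq) for a, n in counts]
    return [p >= threshold for p in posts], freq

# Hypothetical read counts at one site: (alt reads, total reads) per sample.
# The fourth sample is covered by a single read.
site_counts = [(20, 20), (18, 18), (25, 25), (1, 1), (15, 15)]
single = single_sample_calls(site_counts)
joint, freq = joint_calls(site_counts)
print("single-sample calls:", single)
print("joint calls:", joint)
```

With these toy counts, the sample covered by a single read is not called on its own evidence but is called once the evidence from the rest of the cohort informs the prior. The same mechanism can also introduce errors when the cohort is not representative of a given sample, which is one reason real callers use more elaborate models.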
The current study is the first report of a comparative analysis of the JVC approach versus single isolate variant detection in prokaryotes using homogeneous and heterogeneous cohort data, namely M.bovis isolates from the United Kingdom, henceforth abbreviated as UK [45, 46], and from New Zealand (NZ), respectively. The JVC approach promises to improve consistency with fewer artefacts; hence, more accurate variants were detected for both the homogeneous and the heterogeneous population [37, 48]. These variants have the potential to be used as biomarkers for diagnostics and treatment, apart from improving our understanding of the pathogen in each population. The SNPs identified across the heterogeneous population, as compared to the homogeneous population, also shed light on the differential metabolic capabilities of the isolates, which may explain certain aspects of their zoonosis.
We also aim to catalogue the distinct polymorphisms between M.tuberculosis and M.bovis by performing JVC on the global M.bovis population to detect SNPs that occur across all samples, thereby identifying a set of "core SNPs" of M.bovis. These may help in understanding host tropism in bovine hosts, apart from adding to the existing list of known polymorphisms. These core SNPs, in addition to Regions of Difference (RD), may be used for lineage identification in M.bovis, in turn aiding the identification of specific biomarkers.
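The notion of a core SNP set, i.e. SNPs that occur across all samples, amounts to a set intersection over per-sample variant calls. The short Python sketch below illustrates this under the assumption that each SNP is represented as a (position, reference allele, alternate allele) tuple; the sample names, positions and alleles are hypothetical and do not correspond to real M.bovis variants.

```python
def core_snps(per_sample_snps):
    """Return the SNPs shared by every sample in the cohort.

    `per_sample_snps` maps a sample name to an iterable of
    (position, ref_allele, alt_allele) tuples.
    """
    sample_sets = [set(snps) for snps in per_sample_snps.values()]
    if not sample_sets:
        return set()
    return set.intersection(*sample_sets)

# Hypothetical SNP calls relative to a reference genome.
calls = {
    "isolate_1": [(1234, "C", "T"), (5678, "G", "A"), (9012, "A", "G")],
    "isolate_2": [(1234, "C", "T"), (5678, "G", "A")],
    "isolate_3": [(1234, "C", "T"), (5678, "G", "A"), (3456, "T", "C")],
}
core = core_snps(calls)
print(sorted(core))
```

Matching on the full tuple rather than on position alone means that two samples with different alternate alleles at the same position are not counted as sharing a SNP.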