Insight into the PZA resistance whole genome of Mycobacterium tuberculosis isolates from Khyber Pakhtunkhwa, Pakistan


 Background

Tuberculosis (TB) is a global public health issue, getting worse due to emergence of resistance. Pyrazinamide (PZA) is first-line antimicrobial drugs used against non-replicated Mycobacterium tuberculosis (MTB). Data is scarce about whole genome sequencing of PZA resistance (PZA-R) in Khyber Pakhtunkhwa (KP) province of high burden country, Pakistan. In the current study we aimed to find the most common mutations in PZA-R MTB isolates in association with other candidate genes in a whole genome sequence (WGS). Samples were collected from TB suspects and drug susceptibility testing (DST) was performed according according to the WHO standards. The resistant samples were subjected for whole genome sequencing (WGS). The sequence data was through MTBseq and Total Genotyping Solution for Mycobacterium tuberculosis (TGS-TB). Metabolic model was analyzed, using RAST server.
Results

Among the three whole genome sequences, (NCBI BioProject Accession: PRJNA629298, PRJNA629388) 1997, 1162, and 2053 mutations including indel, was detected. Diverse variability has been detected in the membrane proteins PE and PPE, modulating the host immune response. Nine mutations in coding and promotor region have been detected in pncA with one novel (T-4C) variants. Mutations in the other drug candidate genes, KatG, rpoB have also been detected.
Conclusion

The metabolic model shows a distinct property. Diversity of variants has been detected in majority of MTB essential genes, functions from cell growth to cell signaling. The current study provides useful information, associated with geographic specific strains for biomarkers development and better management of drug resistance isolates.


Conclusion
The metabolic model shows a distinct property. Diversity of variants has been detected in majority of MTB essential genes, functions from cell growth to cell signaling. The current study provides useful information, associated with geographic speci c strains for biomarkers development and better management of drug resistance isolates. Background Tuberculosis (TB) is a global public health issue, getting worse due to emergence of resistance against rst and second-line drugs. Under Directly observed treatment, short-course (DOTS), TB takes 6 to 8 months while the treatment of drug resistance takes a variable length of time upto 28 months depending upon drug resistance pro le of bacilli [1,2].
Although in our recent studies, the prevalence and mechanism of PZA resistance behind mutations in pncA and rpsA have been investigated, [19,[28][29][30][31] however, whole genome sequencing (WGS) is onestop method for epidemiological and molecular study of drug susceptibility testing. The potential clinical value and public health impact in the areas of DST for patient management and tracing of transmission chains for timely public health intervention have also been discussed [32][33][34]. The next-generation sequencing (NGS) is more affordable, fast, screening approach which allows for genetic variation in geographic speci c MTB isolates. The WGS assists in phylogeny of the global distribution and tracing of transmission chains [35][36][37][38].
Numerous studies have highlighted the importance of WGS in the elds to record mutation rate, resistance, novel drugs, and evolution of the Mycobacterium tuberculosis complex (MTBC) [32,34,[39][40][41]. The WGS lets for the examination of diverse mutations in the genome of circulating isolates in the geographic speci c location conferring variations in pathogenicity and resistance. Mutations level as well as their occurrence inside and outside coding regions depicts the overall pathogenic variation among the circulating isolates that could be recognized from entire sequencing of genomes. The nding of novel targets, and their involvement in drug resistance could be accessed through mutations from entire genomes. The genome sequences may help to unveil the genetic variations in coding regions which are not directly confer resistance (Compensatory mutations) but rather compensate for the tness cost of resistance [42]. In previous studies, we have investigated the PZA drug sensitivity testing (DST) followed by sequencing to nd mutations in pncA and rpsA genes associated with PZA-resistance [19,43] (Accession No. MH461111). In the current paper, we aimed to analyze the whole genome of PZA-R MTB to investigate the mutations in pncA, rpsA, panD, and other candidate genes along with the diversity of variations in the circulating isolates for better management of TB in this high burden area.

Results
Analyzing whole-genome sequencing data of MTB isolates enables both comprehensive antibiotic resistance, summarizing, outbreak surveillance, and also the identi cation of recent transmission chains. A total of ve WGS results were analyzed, however, the two samples (ba1, ba2) were not suitable to be analyzed broadly through MTBSeq pipeline. Therefore, three samples (ba4, ba5, ba9) have been investigated comprehensively (NCBI BioProject Accession No: PRJNA629298, PRJNA629388). Large number of variants have been detected in majority of important genes including antituberculosis drug targets (Fig 1). Majority of these variant were single nucleotide polymorphism including synonymous and non-synonymous. However, insertion and deletion were also detected in signi cant numbers. About one over third part of whole genome (about 4000 genes) harbored mutations.

Mutations in drug targets
Mutations in pncA, rpsA and other rst-line drug targets in coding as well as in promotor region have been depicted in table 1, 2, and 3.
The PZA-R samples harbored numerous mutations in the drug target genes. Sample Ba-1 has four promotor region mutation with one novel (T-4C) which has not been reported earlier. NGS strain typing could offer Mtb outbreak and surveillance. The genotype information obtained may help for better management of local TB infections. Mutations in the other antituberculosis drug targets have been detected ( Table 2). The rpoB gene encoding the β-subunit of RNA polymerase, has been associated with rifampicin resistance caused by mutations in the 81-base pair region. In the current study four non-synonymous mutations has been detected in the β-subunit of RNA polymerase. SNPs S450L and D435Y, associated with RIF resistance, were outside RRDR of rpoB gene and rare in the previous studies. Among the isoniazid resistance, katG, S315T was a common mutation in all the three samples while inhA promoter region mutation were also detected. A total of 1996 variants have been detected in the resistance samples (Table 3), among which 188, 336, and 1472 were insertion, deletion, and SNPs respectively. Sample ba9 and ba5 is unique among the other. The length and branch are also unique in pattern in the phylogenetic tree (colored red). Phylogenetically the sequenced MTB strains showed a unique characteristic, where the spoligotypes pattern is also novel in this study as shown in gures 2, 3, 4. The resultant pattern may have more or less severe drug resistance. However, these strains should be investigated through phenotypical DST to nd it minimum inhibitory concentration against rst and second-line drugs.
The subsystem difference has been shown through arrow pointing. Interestingly in majority of subsystems, the MTB isolates circulating in Khyber Pakhtunkhwa geography shows a little high number of protein/genes as compared to reference (H37Rv). The metabolic model in the gure 4 has clearly shown the difference between PZA-R and H37Rv suggesting the distinct metabolic characteristics of isolates circulating in this geography.
Circular diagrams have been generated through CGViewer server is shown ( Figure 5) in comparison with reference genome. The genomic size of the MTB isolates circulating in this geography is more than the reference H37Rv. The CGView generates graphical maps of circular genomes. Sequences can be supplied in raw, FASTA, GenBank or EMBL format. The server uses BLAST to compare the primary sequence to compare genomes or sequence sets. The BLAST results are converted to a graphical map showing the entire sequence. The CGView Server can aid in the identi cation of conserved genome segments and differences in gene copy number [49].

Diversity of variation in ESX
A large number of synonymous and non-synonymous mutations have been detected in ESX secretory system proteins (ESX-SSP) ( Table 4). However, the effect of these mutations on virulency is needed to be investigated. Mutation colored yellow (Table 4) have been reported earlier [50] present in Beijing strains, where they have been characterized. The presence of such diverse type of mutations in drug resistance strains might be important of geographic speci c diversity in ESX secretory system proteins. Six regions have been identi ed where 4 have been shown, represent T7SS and 2 of prophage regions (Fig 5). The component genes have also been shown in the gure, shows a high variability in mutations. Type-VII secretory system (T7SS) The essential genes of T7SS has been shown in gure 6 while mutations detected in these genes has given in the table 4. Mutations may affect host immune response. Studies should be carried to address the effect of these mutations on the immune response for better management of TB.
Diversity of PE/PPE PE/PPE genes are highly polymorphic, present only in pathogenic mycobacterial species (Fig. 6).
Hypervariable genes have been detected in recent sublineages (IV and V) of the PE and even within the sublineages. Sequence comparison of MTB isolates, circulating in this high burden country, shows a high variability in comparison with H37Rv. This sequence diversity has been speci cally observed in PPE55 ( Table 5) that may affect the immune response. The PPE55 gene is speci c to the M. tuberculosis complex and is a strongly immunogenic protein whose expression correlates with active infection in humans with active clinical TB.
Mycobacterium tuberculosis has evolved from environmental mycobacteria, and some PE/PPE family proteins inherited from environmental mycobacteria that MTB utilizes for survival intracellularly now. Examining immune responses to PE/PPE suggests that PE/PPE epitopes evoking host immune responses, bene cial to the pathogen for survival. In the current study highly variable region in PE and PPE have been detected (Table 5).
Sequence variability in PE_PGRS family gene.
In our local strains, the PPE55 when compared with reference strain H37Rv, harbored numerous mutations with 13 non-synonymous, throughout its entire coding region (Table 5). In a previous investigation, PPE55 protein was found highly immunogenic that may be useful for distinguishing amongst latent and subclinical TB [67]. This high variability may contribute a weak immune response, resulting a weak eradication of MTB isolates. However, the complete diversity of mutations among major geographical regions and their affect the immune response is still unclear.
Similarly, PE_PGRS21 and PE_PGRS28 also exhibited a high variability in the local strains (Table 6). A sequence region from 1212336 to 1212362 (CGGCGGTGCCGGCGGTGTCGGCG GTGC) was deleted in PE_PGRS21. While one synonymous and non-synonymous mutation was also detected at amino acid position 268 and 471 (Ala268Ala and Val471Ala). Three non-synonymous mutations Ala438Thr (gcc/Acc), Arg412Gly (cgc/Ggc), and Ser16Leu (tcg/tTg) have also been detected in PE_PGRS28.

Discussion
It is very practical to investigate the drug resistance level in novel genotypes to uncover the reason of high prevalence drug resistance. The frequencies of resistance in Beijing genotype have been reported, higher against rifampin, o oxacin and also multidrug-resistance were signi cantly higher than non-Beijing strains. In Fujian and Guangdong, a novel genotype named "China Southern genotype (CS)" was found [44,45]. Studies in high burden countries including Pakistan are needed to nd a correlation of drug resistance and novel spoligotypes circulating in distinct geographic regions. According to the previous reports [45], drug-resistance and Beijing genotype has signi cant associations which might be responsible for the spread and emergence of MDR-TB. These results have been con rmed in Vietnam, Germany, Cuba and America, indicating high incidence of the Beijing genotype that might be correlated with high transmission and virulence. In a more recent study in Khyber Pakhtunkhwa province of Pakistan [44], the majority of the MTB isolates were found of unknown pattern that were evolutionary linked to L3/CAS strain, where nine isolates (5.4%) of the unknown strains were tentatively named as L3/CAS-KP. Drug resistance in these geographically distinct strains should be investigated for better management of global TB control program 2030. Although the number of samples are very limited in a recent study [46], variation in key genes' suggests some more investigation on the pathogenesis and single-nucleotide polymorphisms to nd any correlation with high transmission and virulence.
In the current investigation, the diversity of variation in ESX protein may suggest a different type of virulency which need to be investigated through murine models. MTB uses early secretory antigenic target (ESAT6) protein family secretion (ESX) systems (T7SS), to export effector proteins that helps the pathogen to resist or evade the host immune response [51,52]. The ESX-1 plays an important function in virulence, linked to secreted effector proteins, EsxA, inducing phagosomal rupture in phagocytes of host.
Interestingly  [68][69][70]. Together with apparent abundance during in vivo infection, suggests that these members play essential roles in pathogenicity [71][72][73]. About 10% of MTB coding genome is comprised of PE and PPE families [74]. Members of the PE and PPE are known by the presence of a Pro-Glu (PE) and Pro-Pro-Glu (PPE) conserved motif at N-terminal [75]. PE and PPE have 110 and 180 amino acids conserved region at N-terminal. however, the C-terminal is highly variable.
The characteristic PE and PPE domains is very crucial for targeting the family members of PE/PPE to the cell wall, eliciting potent B-and T-cell responses. The PGRS domain of PE_PGRS30, a virulence factor, is responsible for targeting the protein to mycobacterial cell poles in M. tuberculosis and M. bovis [76,77]. In PPE17, the PPE domain is crucial for cell surface localization [78]. The PE/PPE family of proteins in uence macrophage signaling and pathogenesis of mycobacteria [79][80][81].
The PE/PPE protein family contributes pathogenicity of MTB that may be targeted for novel drug or immune-modulatory approaches. However, a better understanding of PE/PPE protein function is required before it could be realized as potential. This high diversity and variation in genomic sequence in the PE/PPE proteins in locally circulation strains may affect the immunity against MTB. This variation should be investigated in geographic speci c isolates among high burden countries for better management of TB.
The PE-PGRS subfamily is largest class of the PE family, with 67 members in MTB H37Rv consist of the PE domain followed by a C-terminal extension encoded by polymorphic GC-rich repetitive sequences (PGRS) motif. Currently PE-PGRS proteins contain about 1900 amino acids with glycine up to 50% [79].
The C terminals of subfamilies of PE_PGRS and PPE_MPTR have GC-rich stretches. These stretches are considered as hotspots for recombination events and mutations [82]. This high variability results in antigenic diversity that may help the pathogen to evade host-protective immune responses, [76]. The high variability may help the pathogen in survival to evade host-protective immune responses [76]. Among the PPE family, PE_PGRS47 suppresses autophagy and impairs antigen presentation (Saini et al. 2016). In a more recent study, depletion of PknA, resulted in increased drug susceptibility to key tuberculosis drug, rifampin and β-lactam antibiotics [83].
PknG is a Serine/Threonine protein kinase recognized as a key player in mycobacterial physiology and pathogenesis. Recent studies show that the enzyme glutamine synthetase and the protein FhaA are the two substrates are phosphorylated by PknG in vitro speci c residues in both [84]. In another study the fhaA deletion mutant were created and different classes of antibiotics were screened to check their sensitivity. MTB isolates that were fhaA deletion mutant, have been detected as sensitive to multiple antibiotics showing an overall permeability aw [85]. However, the effect of substitutions has not been evaluated so for, which need to be investigated for future drug development against this target.
A novel deletion (TCAGGCCG) and two nonsynonymous mutations (Pro447Ser and Gly429Ala) in PncB1 at genomic position 1499213-1499220. PncB1 (Rv1330c) and PncB2 (Rv0573c) are phosphoribosyl transferases, playing role in cofactor salvage [86]. Cell death is caused by disruption of cellular redox homeostasis and cofactor starvation and also as electron transport is weakened by limiting NAD. PncB2 plays a more important role in the adaptation to nonreplicating persistence and also known as member of the DosR regulon, important during hypoxic conditions [87,88]. However, the effect of these mutations is needed to be investigated before future drug designing for better management of treatment.
In conclusion the metabolic model shows a diversity of variants in essential genes, functions from cell growth to cell signaling. The physiology and pathogenesis determinants genes have novel type of mutations in this distinct geographic region. The high variability in antigenic diversity that may help the pathogen to evade host-protective immune responses. The diversity of mutations in PE_PGRS47 family may help the pathogen to suppresses autophagy and impairs antigen presentation. The current study provides useful information, associated with geographic speci c strains for biomarkers development and better management of drug resistance isolates.

Ethical approval
A local ethics committee ruled that no formal ethics approval was required in this particular case.

Sample collection
Random samples have been collected from TB suspects with basic information from their guardians.
All the samples were subjected to decontamination and digestion processing.

Sample processing
All received samples were digested and decontaminated using standard N-acetyl-L-cysteine sodium hydroxide (NALC-NaOH) method (GLI, 2014) in a biosafety level-III laboratory (BSL-III) at Provincial TB Reference Laboratory, Peshawar. Brie y one aliquot was inoculated on Lowenstein Jensen medium (LJ) and in a mycobacterium growth indicator tube (MGIT). Positive growth in the tubes was con rmed by Tbc ID device (Ref: 245159, Becton, Dickinson).
Drug susceptibility testing All con rmed mycobacterial isolates were processed for both phenotypic DST and molecular resistance assay. DST was performed using a BD BACTEC MGIT 960 SIRE kit (Ref: 245123, Becton, Dickinson) according to the policy WHO second line drug susceptibility testing [89]. The standard minimum inhibitory concentration (MIC) for o oxacin (OFX), levo oxacin (LEV), and moxi oxacin (MOX) was taken as 2ug/mL, 1ug/mL, and 1ug/mL respectively. One sample aliquot was processed for acid fast bacilli (AFB) microscopy using primostar-LED uorescent microscopy.

Whole genome sequencing
Samples resistance to PZA were subjected for extraction of DNA through CTAB method [40,90] followed by whole-genome sequencing (WGS), a more effective, accessible and reasonable method for deep insight into the genome of MTB isolates, involving single nucleotide variations (SNVs). The extracted DNA from PZA-resistance samples were sequenced and partial analysis has been performed through Illumina-X10 sequencer, based on the Solexa sequencing principle at center of Shanghai Jiao Tong University, China.

Ssequences analysis
To analyze the WGS data, an all-in-one web-based tool, Total Genotyping Solution for TB (TGS-TB) [91], using NGS for spoligotyping and the detection of phylogenies, IS6110 insertion sites, with core genomic SNVs, and 43 customized loci for variable number tandem repeat (VNTR). The tool is a user-friendly, performed all these analysis on a simple click interface. The methodology was implemented with a KvarQ script, predicting MTBC lineages and antimicrobial resistance.

Quality and mutation detection
The quality of the sequence was checked with FASTQC and after trimming the raw reads, the genome was mapped against the reference strain H37Rv using MTBSeq software package [92]. The TGS-TB and MTBseq detect variant positions with known associated with antibiotic resistance and performs also lineage classi cation. When comparing multiple datasets, MTBseq provides a joint list of variants and a FASTA alignment of SNP positions for use in phylogenomic analysis, and identi es groups of related isolates, provides a more accurate strain typing for epidemiological and clinical investigations [91,92]. In brief, the alignment was carried out using BWA-mem algorithm. The resulting output in Sam/Bam format was analyzed using Sam tools package. Statistical analysis, variant and positioning was done in gatk software package. For further annotation, samples in FASTA format have been subjected to RAST server [47], assigning functions to the genes, identifying the protein-encoding, rRNA and tRNA genes, predicts subsystems in the genome.

Whole genome sequencing comparison
Whole genome sequence results in comparison with H37Rv were uploaded to RAST server subsystem technology. The server achieves accuracy and completeness on the use of a subsystems that are manually curated as protein families largely derived from the subsystems. RAST server implements two classes of asserted gene functions: subsystem-based and non-subsystem-based assertions. The earlier is based on recognition of functional variants of subsystems, while the later are using integration of evidence from a number of other tools. The SEED servers inside RAST offer access to data [47,48] which have the ability to annotate prokaryotic genomes and creating metabolic reconstructions and detailed model of metabolism.

Mapping whole genome into circular diagrams
The whole genome sequence was mapped in the form of circular structure using CGView Server [49], generating graphical maps that show sequence features, base composition plots, and sequence similarity plots. The server can visualize the genome features of bacteria, plasmid, chloroplast or mitochondria.
Analysis of type-VII secretory system In mycobacteria, the ESX-1 type VII secretion system (T7SS) is vital for access to the host cytosol [65,93]. In order to study the variability in the T7SS, the member genes were analyzed in the whole genome of MTB through VRpro le server [94], a web server facilitating in fast screening of virulence and antibiotic resistance genes using a backend MobilomeDB database, built on sets gene cluster loci of bacterial type III/IV/VI/VII secretion systems.

Declaration
Ethical approval and consent to participate A local ethics committee ruled that no formal ethics approval is required in this particular case (PTP/PTRL-402/16). Prior to research study, an informed written consent was gained.

Image attribution
The graphical abstract in the current study is our own created image.

Consent for publication
Not applicable Availability of data and materials The datasets in the present study is available in public database under accession No. (NCBI BioProject Accession: PRJNA629298, PRJNA629388).

Competing Interest
All the authors have no con icts of interest  Total number of variants in three WGS of MTB.

Figure 2
A unique pattern of sample ba9. Total Genotyping Solution for TB (TGS-TB) has been used for spoligotyping and the detection of phylogenies, IS6110 insertion sites, with core genomic SNVs, and 43 customized loci for variable number tandem repeat (VNTR).

Figure 3
Spoligotyping pattern of sample ba5. Sample ba5 is unique in pattern in the phylogenetic tree (colored red).

Figure 4
Comparison of WG MTB reference genome and PZA-resistance isolate through RAST. The RAST server provides an environment for browsing the annotated genome and comparing it to the hundreds of genomes maintained within the SEED integration [47,48]. The genome viewer included in RAST supports comparison against existing genomes in detail. It also determines and displaying genomic context around speci c genes. The differences in subsystem of H37Rv and PZA resistance whole genome has been shown through an arrow pointing towards a speci c category and their number of proteins/genes included.