Analyzing whole-genome sequencing data of MTB isolates enables both comprehensive antibiotic resistance, summarizing, outbreak surveillance, and also the identification of recent transmission chains. A total of five WGS results were analyzed, however, the two samples (ba1, ba2) were not suitable to be analyzed broadly through MTBSeq pipeline. Therefore, three samples (ba4, ba5, ba9) have been investigated comprehensively (NCBI BioProject Accession No: PRJNA629298, PRJNA629388). Large number of variants have been detected in majority of important genes including antituberculosis drug targets (Fig 1). Majority of these variant were single nucleotide polymorphism including synonymous and non-synonymous. However, insertion and deletion were also detected in significant numbers. About one over third part of whole genome (about 4000 genes) harbored mutations.
Mutations in drug targets
Mutations in pncA, rpsA and other first-line drug targets in coding as well as in promotor region have been depicted in table 1, 2, and 3.
The PZA-R samples harbored numerous mutations in the drug target genes. Sample Ba-1 has four promotor region mutation with one novel (T-4C) which has not been reported earlier. NGS strain typing could offer Mtb outbreak and surveillance. The genotype information obtained may help for better management of local TB infections.
Table 1. Mutations detected in the pncA, rpsA genes of PZA resistance isolates.
Sample
|
PncA Mutation
|
rpsA Mutation
|
Location
|
Lineage
|
literature
|
|
T-11C
A-7C
A-12T
T-4C
|
Nil
|
promoter
|
4
|
Reported
Reported
Reported
Novel
|
Ba-2
|
|
Indel (s) found
|
|
4
|
|
Ba-4
|
L120R
|
Nil
|
Coding
|
3
|
Reported
|
Ba-5
|
G97D
392-Indel: ACC/ACCCC
|
Nil
|
Coding
|
4
|
Reported
|
Ba-9
|
V130G
|
Nil
|
coding
|
4
|
Reported
|
Mutations in the other antituberculosis drug targets have been detected (Table 2). The rpoB gene encoding the β-subunit of RNA polymerase, has been associated with rifampicin resistance caused by mutations in the 81-base pair region. In the current study four non-synonymous mutations has been detected in the β-subunit of RNA polymerase. SNPs S450L and D435Y, associated with RIF resistance, were outside RRDR of rpoB gene and rare in the previous studies. Among the isoniazid resistance, katG, S315T was a common mutation in all the three samples while inhA promoter region mutation were also detected. A total of 1996 variants have been detected in the resistance samples (Table 3), among which 188, 336, and 1472 were insertion, deletion, and SNPs respectively.
Table 2 Mutations detected in first-line drug targets
Sample
|
Phylo
|
Drug Resistance
|
Gene
|
Mutation
|
ba4
|
Lineage 3
|
Rifampicin
|
SNP764817 TG=rpoC
|
V483G
|
Isoniazid
|
SNP1673425 CT
|
inhA promoter
|
Isoniazid
|
SNP2155168 CG=katG
|
S315T
|
Ethambutol
|
SNP4247429AC=embB
|
M306L
|
Rifampicin
|
(RRDR) 761155CT=rpoB
|
S450L
|
Fluoroquinolones
|
(QRDR) 7582AG=gyrA
|
D94G
|
ba5
|
Lineage 4
|
Rifampicin
|
(RRDR) 761155CT=rpoB
|
S450L
|
Isoniazid
|
SNP2155168CG=katG
|
S315T
|
Ethambutol
|
SNP4247431GA=embB
|
M306I
|
Fluoroquinolones
|
(QRDR) 7570CT=gyrA
|
A90V
|
ba9
|
Lineage 3
|
Rifampicin
|
(RRDR) [761109GT=rpoB
|
D435Y
|
Streptomycin
|
SNP1472359AT=rrs
|
S172C
|
Isoniazid mutation -15
|
SNP1673425CT
|
inhA promoter
|
Isoniazid
|
SNP2155168CG=katG
|
S315T
|
Ethambutol
|
SNP4247431GC=embB
|
M306I
|
Table 3. Bases comparison altered in PZA-R isolate and types of polymorphism
Ref Sequence (H37Rv)
|
PZA-R sequences
|
SNP Type
|
Base
|
Frequency
|
Base
|
Frequency
|
Type
|
Freq
|
A
|
411
|
GAP
|
336
|
Insertion
|
188
|
T
|
413
|
T
|
353
|
Deletion
|
336
|
G
|
555
|
A
|
359
|
SNPs
|
1472
|
C
|
617
|
C
|
471
|
Total
|
1996
|
|
|
G
|
477
|
|
|
|
|
Total mutations
|
1996
|
|
|
Sample ba9 and ba5 is unique among the other. The length and branch are also unique in pattern in the phylogenetic tree (colored red). Phylogenetically the sequenced MTB strains showed a unique characteristic, where the spoligotypes pattern is also novel in this study as shown in figures 2, 3, 4. The resultant pattern may have more or less severe drug resistance. However, these strains should be investigated through phenotypical DST to find it minimum inhibitory concentration against first and second-line drugs.
The subsystem difference has been shown through arrow pointing. Interestingly in majority of subsystems, the MTB isolates circulating in Khyber Pakhtunkhwa geography shows a little high number of protein/genes as compared to reference (H37Rv). The metabolic model in the figure 4 has clearly shown the difference between PZA-R and H37Rv suggesting the distinct metabolic characteristics of isolates circulating in this geography.
Circular diagrams have been generated through CGViewer server is shown (Figure 5) in comparison with reference genome. The genomic size of the MTB isolates circulating in this geography is more than the reference H37Rv. The CGView generates graphical maps of circular genomes. Sequences can be supplied in raw, FASTA, GenBank or EMBL format. The server uses BLAST to compare the primary sequence to compare genomes or sequence sets. The BLAST results are converted to a graphical map showing the entire sequence. The CGView Server can aid in the identification of conserved genome segments and differences in gene copy number [49].
Diversity of variation in ESX
A large number of synonymous and non-synonymous mutations have been detected in ESX secretory system proteins (ESX-SSP) (Table 4). However, the effect of these mutations on virulency is needed to be investigated. Mutation colored yellow (Table 4) have been reported earlier [50] present in Beijing strains, where they have been characterized. The presence of such diverse type of mutations in drug resistance strains might be important of geographic specific diversity in ESX secretory system proteins. Six regions have been identified where 4 have been shown, represent T7SS and 2 of prophage regions (Fig 5). The component genes have also been shown in the figure, shows a high variability in mutations.
Table 4: Diversity of variation in ESX protein of MTB isolates, from Khyber Pakhtunkhwa
Sample
|
Position
|
Ref
|
Alle
|
Substitution
|
Freq
|
Rv
|
Gene*
|
Gene Product
|
ba4, ba9
|
4343784
|
G
|
A
|
Lys157Lys (aag/aaA)
|
2
|
Rv3868
|
eccA1
|
ESX-1 SSP EccA1
|
ba4
|
4366195
|
T
|
C
|
Glu215Gly (gag/gGg)
|
1
|
Rv3884c
|
eccA2
|
ESX-2 SSP EccA
|
ba4, ba5, ba9
|
4366272
|
G
|
C
|
Ala189Ala (gcc/gcG)
|
3
|
Rv3884c
|
eccA2
|
ESX-2 SSP EccA
|
ba4, ba9
|
342146
|
A
|
C
|
Glu6Ala (gaa/gCa)
|
2
|
Rv0282
|
eccA3
|
ESX-3 SSP EccA
|
ba4, ba9
|
342873
|
C
|
T
|
Val248Val (gtc/gtT)
|
2
|
Rv0282
|
eccA3
|
ESX-3 SSP EccA
|
ba4, ba9
|
4345548
|
G
|
A
|
Met170Ile (atg/atA)
|
2
|
Rv3869
|
eccB1
|
ESX-1 SSP EccB
|
ba4, ba9
|
3871246
|
T
|
C
|
Gly417Gly (gga/ggG)
|
2
|
Rv3450c
|
eccB4
|
ESX-4 SSP EccB4
|
ba4, ba5, ba9
|
4379680
|
C
|
G
|
Arg258Pro (cgc/cCc)
|
3
|
Rv3894c
|
eccC2
|
ESX-2 type VII SSP EccC
|
ba5
|
4377447
|
G
|
A
|
Asp1002Asp (gac/gaT)
|
1
|
Rv3894c
|
eccC2
|
ESX-2 type VII SSP EccC
|
ba4, ba5, ba9
|
346275
|
C
|
G
|
Pro214Arg (ccg/cGg)
|
3
|
Rv0284
|
eccC3
|
ESX-3 SSP EccC3
|
ba9
|
347388
|
A
|
T
|
His585Leu (cat/cTt)
|
1
|
Rv0284
|
eccC3
|
ESX-3 SSP EccC3
|
ba4, ba5, ba9
|
3864995
|
T
|
C
|
Ser1082Gly (agc/Ggc)
|
3
|
Rv3447c
|
eccC4
|
ESX-4 SSP EccC4
|
ba4, ba5, ba9
|
2022868
|
T
|
C
|
Ser1204Ser (agt/agC)
|
3
|
Rv1783
|
eccC5
|
ESX-5 type VII SSP EccC5
|
ba5
|
2019942
|
A
|
G
|
Gln229Arg (cag/cGg)
|
1
|
Rv1783
|
eccC5
|
ESX-5 type VII SSP EccC5
|
ba4
|
4355319
|
C
|
G
|
Leu105Val (ctg/Gtg)
|
1
|
Rv3877
|
eccD1
|
ESX-1 SSP EccD1
|
ba4, ba5, ba9
|
4356110
|
G
|
C
|
Leu368Leu (ctg/ctC)
|
3
|
Rv3877
|
eccD1
|
ESX-1 SSP EccD1
|
ba9
|
4371132
|
C
|
T
|
Val185Met (gtg/Atg)
|
1
|
Rv3887c
|
eccD2
|
ESX-2 SSP EccD
|
ba4, ba9
|
353197
|
C
|
T
|
Arg39Cys (cgc/Tgc)
|
2
|
Rv0290
|
eccD3
|
ESX-3 SSP EccD
|
ba4, ba9
|
3869355
|
T
|
C
|
Ile335Thr (atc/aCc)
|
2
|
Rv3448
|
eccD4
|
ESX-4 SSP EccD4
|
ba4, ba5, ba9
|
356528
|
A
|
G
|
Asn217Asp (aat/Gat)
|
3
|
Rv0292
|
eccE3
|
ESX-3 SSP EccE
|
ba4, ba5 ba9
|
4342187
|
A
|
G
|
Asp103Gly (gac/gGc)
|
3
|
Rv3866
|
espG1
|
ESX-1 EspG
|
ba4, ba5 ba10
|
4357597
|
C
|
G
|
Cys729Ser (tgc/tCc)
|
1
|
Rv3879c
|
espK
|
ESX-1 EspK
|
ba4, ba5 ba11
|
4359165
|
G
|
C
|
Thr206Thr (acc/acG)
|
2
|
Rv3879c
|
espK
|
ESX-1 EspK
|
Alle; Allele, SSP; secretory system proteins, *; part of T7SS
At least three of the five ESX systems (ESX-1, ESX-2, ESX-3, ESX-4 and ESX-5) are required for full virulence. The first ESX system (ESX-1) is used as attenuated vaccine strains Mycobacterium bovis bacille Calmette–Guérin (BCG) and Mycobacterium microti [54–59].
Type-VII secretory system (T7SS)
The essential genes of T7SS has been shown in figure 6 while mutations detected in these genes has given in the table 4. Mutations may affect host immune response. Studies should be carried to address the effect of these mutations on the immune response for better management of TB.
Diversity of PE/PPE
PE/PPE genes are highly polymorphic, present only in pathogenic mycobacterial species (Fig. 6). Hypervariable genes have been detected in recent sublineages (IV and V) of the PE and even within the sublineages. Sequence comparison of MTB isolates, circulating in this high burden country, shows a high variability in comparison with H37Rv. This sequence diversity has been specifically observed in PPE55 (Table 5) that may affect the immune response. The PPE55 gene is specific to the M. tuberculosis complex and is a strongly immunogenic protein whose expression correlates with active infection in humans with active clinical TB.
Mycobacterium tuberculosis has evolved from environmental mycobacteria, and some PE/PPE family proteins inherited from environmental mycobacteria that MTB utilizes for survival intracellularly now. Examining immune responses to PE/PPE suggests that PE/PPE epitopes evoking host immune responses, beneficial to the pathogen for survival. In the current study highly variable region in PE and PPE have been detected (Table 5).
Sequence variability in PE_PGRS family gene.
Among the PE‐PGRS subfamily, PE_PGRS49 and PE_PGRS 21 exhibited the highest variability (48, 37) followed by PE_PGRS28 and PE_PGRS42 (Fig. 7) A sequence fragment (genome position: 1212336 -1212362) in PE_PGRS 21 of local strains is seemed, to be deleted (Table 6).
In our local strains, the PPE55 when compared with reference strain H37Rv, harbored numerous mutations with 13 non-synonymous, throughout its entire coding region (Table 5). In a previous investigation, PPE55 protein was found highly immunogenic that may be useful for distinguishing amongst latent and subclinical TB [67]. This high variability may contribute a weak immune response, resulting a weak eradication of MTB isolates. However, the complete diversity of mutations among major geographical regions and their affect the immune response is still unclear.
Similarly, PE_PGRS21 and PE_PGRS28 also exhibited a high variability in the local strains (Table 6). A sequence region from 1212336 to 1212362 (CGGCGGTGCCGGCGGTGTCGGCG GTGC) was deleted in PE_PGRS21. While one synonymous and non-synonymous mutation was also detected at amino acid position 268 and 471 (Ala268Ala and Val471Ala). Three non-synonymous mutations Ala438Thr (gcc/Acc), Arg412Gly (cgc/Ggc), and Ser16Leu (tcg/tTg) have also been detected in PE_PGRS28.
Table: 5. Diversity of mutations detected in PPE 55 of PZA resistance isolates
Sample
|
Position (Genome)
|
Ref
|
Type
|
Allele
|
Amino Acid alteration
|
Rv
|
Gene
|
Freq*
|
ba9, ba4
|
3746409
|
A
|
SNP
|
G
|
Leu2259Pro (ctg/cCg)
|
Rv3347c
|
PPE55
|
2
|
ba9, ba4
|
3749653
|
G
|
SNP
|
A
|
Gln1178_ (caa/Taa)
|
Rv3347c
|
PPE55
|
2
|
ba9, ba4
|
3750177
|
A
|
SNP
|
T
|
Phe1003Tyr (ttc/tAc)
|
Rv3347c
|
PPE55
|
2
|
ba9, ba4
|
3750178
|
A
|
SNP
|
C
|
Phe1003Val (ttc/Gtc)
|
Rv3347c
|
PPE55
|
2
|
ba9, ba4
|
3750185
|
C
|
SNP
|
G
|
Ser1000Ser (tcg/tcC)
|
Rv3347c
|
PPE55
|
2
|
ba9, ba4
|
3750187
|
A
|
SNP
|
T
|
Ser1000Thr (tcg/Acg)
|
Rv3347c
|
PPE55
|
2
|
ba9, ba4
|
3750188
|
C
|
SNP
|
G
|
Met999Ile (atg/atC)
|
Rv3347c
|
PPE55
|
2
|
ba9, ba4
|
3750193
|
G
|
SNP
|
A
|
Leu998Phe (ctc/Ttc)
|
Rv3347c
|
PPE55
|
2
|
ba9, ba4
|
3750205
|
C
|
SNP
|
T
|
Asp994Asn (gac/Aac)
|
Rv3347c
|
PPE55
|
2
|
ba9, ba4
|
3750209
|
A
|
SNP
|
G
|
Asn992Asn (aat/aaC)
|
Rv3347c
|
PPE55
|
2
|
ba9, ba4
|
3750210
|
T
|
SNP
|
G
|
Asn992Thr (aat/aCt)
|
Rv3347c
|
PPE55
|
2
|
ba9, ba4
|
3750407
|
G
|
SNP
|
C
|
Gly926Gly (ggc/ggG)
|
Rv3347c
|
PPE55
|
2
|
ba9, ba4
|
3750417
|
A
|
SNP
|
T
|
Phe923Tyr (ttc/tAc)
|
Rv3347c
|
PPE55
|
2
|
ba9, ba4
|
3750421
|
T
|
SNP
|
C
|
Ser922Gly (agc/Ggc)
|
Rv3347c
|
PPE55
|
2
|
ba9, ba4
|
3750584
|
G
|
SNP
|
A
|
Asn867Asn (aac/aaT)
|
Rv3347c
|
PPE55
|
2
|
ba4
|
3751281
|
G
|
SNP
|
A
|
*Pro635Leu (ccg/cTg)
|
Rv3347c
|
PPE55
|
1
|
ba9
|
3751706
|
G
|
SNP
|
A
|
Ile493Ile (atc/atT)
|
Rv3347c
|
PPE55
|
1
|
ba9, ba4
|
3752003
|
G
|
SNP
|
C
|
Val394Val (gtC/gtG)
|
Rv3347c
|
PPE55
|
2
|
ba9, ba4
|
3752006
|
G
|
SNP
|
A
|
Asn393Asn (aac/aaT)
|
Rv3347c
|
PPE55
|
2
|
ba9, ba4
|
3752007
|
T
|
SNP
|
C
|
Asn393Ser (aac/agC)
|
Rv3347c
|
PPE55
|
2
|
ba9, ba4
|
3752008
|
T
|
SNP
|
C
|
Asn393Asp (aac/Gac)
|
Rv3347c
|
PPE55
|
2
|
ba9, ba4
|
3752012
|
C
|
SNP
|
G
|
Pro391Pro (ccg/ccC)
|
Rv3347c
|
PPE55
|
2
|
ba9, ba4
|
3752207
|
A
|
SNP
|
G
|
Ile326Ile (att/atC)
|
Rv3347c
|
PPE55
|
2
|
ba9, ba4
|
3753116
|
C
|
SNP
|
T
|
Pro23Pro (ccg/ccA)
|
Rv3347c
|
PPE55
|
2
|
ba9, ba4
|
3753164
|
T
|
SNP
|
G
|
Pro7Pro (cca/ccC)
|
Rv3347c
|
PPE55
|
2
|
*Freq; frequency, *Pro635Leu; novel
Table: 6. Mutation in PE_PGRS21 and PE_PGRS28 gene
Position
|
H37Rv (Ref)
|
MutationType
|
Local Allele
|
Substitution/Deletion
|
Rv
|
Gene
|
1212336
|
C
|
Del
|
GAP
|
------------------------
|
Rv1087
|
PE_PGRS21
|
1212337
|
G
|
Del
|
GAP
|
------------------------
|
Rv1087
|
PE_PGRS21
|
1212338
|
G
|
Del
|
GAP
|
------------------------
|
Rv1087
|
PE_PGRS21
|
1212339
|
C
|
Del
|
GAP
|
------------------------
|
Rv1087
|
PE_PGRS21
|
1212340
|
G
|
Del
|
GAP
|
------------------------
|
Rv1087
|
PE_PGRS21
|
1212341
|
G
|
Del
|
GAP
|
------------------------
|
Rv1087
|
PE_PGRS21
|
1212342
|
T
|
Del
|
GAP
|
------------------------
|
Rv1087
|
PE_PGRS21
|
1212343
|
G
|
Del
|
GAP
|
------------------------
|
Rv1087
|
PE_PGRS21
|
1212344
|
C
|
Del
|
GAP
|
------------------------
|
Rv1087
|
PE_PGRS21
|
1212345
|
C
|
Del
|
GAP
|
------------------------
|
Rv1087
|
PE_PGRS21
|
1212346
|
G
|
Del
|
GAP
|
------------------------
|
Rv1087
|
PE_PGRS21
|
1212347
|
G
|
Del
|
GAP
|
------------------------
|
Rv1087
|
PE_PGRS21
|
1212348
|
C
|
Del
|
GAP
|
------------------------
|
Rv1087
|
PE_PGRS21
|
1212349
|
G
|
Del
|
GAP
|
------------------------
|
Rv1087
|
PE_PGRS21
|
1212350
|
G
|
Del
|
GAP
|
------------------------
|
Rv1087
|
PE_PGRS21
|
1212351
|
T
|
Del
|
GAP
|
------------------------
|
Rv1087
|
PE_PGRS21
|
1212352
|
G
|
Del
|
GAP
|
------------------------
|
Rv1087
|
PE_PGRS21
|
1212353
|
T
|
Del
|
GAP
|
------------------------
|
Rv1087
|
PE_PGRS21
|
1212354
|
C
|
Del
|
GAP
|
------------------------
|
Rv1087
|
PE_PGRS21
|
1212355
|
G
|
Del
|
GAP
|
------------------------
|
Rv1087
|
PE_PGRS21
|
1212356
|
G
|
Del
|
GAP
|
------------------------
|
Rv1087
|
PE_PGRS21
|
1212357
|
C
|
Del
|
GAP
|
------------------------
|
Rv1087
|
PE_PGRS21
|
1212358
|
G
|
Del
|
GAP
|
------------------------
|
Rv1087
|
PE_PGRS21
|
1212359
|
G
|
Del
|
GAP
|
------------------------
|
Rv1087
|
PE_PGRS21
|
1212360
|
T
|
Del
|
GAP
|
------------------------
|
Rv1087
|
PE_PGRS21
|
1212361
|
G
|
Del
|
GAP
|
------------------------
|
Rv1087
|
PE_PGRS21
|
1212362
|
C
|
Del
|
GAP
|
------------------------
|
Rv1087
|
PE_PGRS21
|
1212363
|
C
|
SNP
|
T
|
Ala268Ala (gcc/gcT)
|
Rv1087
|
PE_PGRS21
|
1212971
|
T
|
SNP
|
C
|
Val471Ala (gtc/gCc)
|
Rv1087
|
PE_PGRS21
|
1636826
|
C
|
SNP
|
A
|
Gly468Gly (ggg/ggT)
|
Rv1452c
|
PE_PGRS28
|
1636918
|
C
|
SNP
|
T
|
Ala438Thr (gcc/Acc)
|
Rv1452c
|
PE_PGRS28
|
1636991
|
T
|
SNP
|
C
|
Gly413Gly (gga/ggG)
|
Rv1452c
|
PE_PGRS28
|
1636996
|
G
|
SNP
|
C
|
*Arg412Gly (cgc/Ggc)
|
Rv1452c
|
PE_PGRS28
|
1637006
|
G
|
SNP
|
A
|
Val408Val (gtc/gtT)
|
Rv1452c
|
PE_PGRS28
|
1637009
|
G
|
SNP
|
A
|
Gly407Gly (ggc/ggT)
|
Rv1452c
|
PE_PGRS28
|
1638182
|
C
|
SNP
|
T
|
Ser16Ser (tcg/tcA)
|
Rv1452c
|
PE_PGRS28
|
1638183
|
G
|
SNP
|
A
|
Ser16Leu (tcg/tTg)
|
Rv1452c
|
PE_PGRS28
|
1638188
|
C
|
SNP
|
T
|
Ala14Ala (gcg/gcA)
|
Rv1452c
|
PE_PGRS28
|
1638191
|
C
|
SNP
|
G
|
Ala13Ala (gcg/gcC)
|
Rv1452c
|
PE_PGRS28
|
1638194
|
G
|
SNP
|
C
|
Ala12Ala (gcc/gcG)
|
Rv1452c
|
PE_PGRS28
|
*Arg412Gly; novel
Variants in forkhead-associated domains of MTB
In the current study, a novel deletion of sequence, CCGCGTTGCTCGGGGTAA in Forkhead-associated (fhaA) (Rv0020c) domain at genome position 24699-24716 was detected. Further its interacting signaling partner PknG also harbored a novel mutation at location 613 (Arg613Gln (cgg/cAg)). Similarly, mutation (Ser385Arg) in pknA at genomic position 17608 has been observed, reflecting mutations in diverse genes functions from regulation the MTB cell growth to cell signaling under stress conditions.
Variants in pncB1
A novel deletion (TCAGGCCG) and two nonsynonymous mutations (Pro447Ser and Gly429Ala) have been detected in PncB1 at genomic position 1499213- 1499220. PncB1 (Rv1330c) and PncB2 (Rv0573c) are two putative nicotinic acid phosphoribosyl transferases. Inhibitors designed against nicotinamide adenine dinucleotide (NAD) synthetase, may induced bactericidal effect as abrupt starvation of NAD in actively growing and nonreplicating MTB.