The complete pan-genome is defined as the total set of non-redundant genes present in a given clade; it amounts to the clade’s entire genomic repertoire and encodes functions for all possible ecological niches of the strains examined [37, 38]. A pan-genome typically contains core genes, accessory genes and strain-specific genes. In this study, the core genome represented the conserved set of genes present in all 81 genomes. The soft core represented genes present in at least 95% of the genomes (i.e., ≥ 76 of the 81 genomes). Defining the soft-core genes aided in obtaining a robust estimate of the core genes. The core and soft-core clusters together represented a pool of highly conserved genes, which could provide insights into the evolutionary history of the P. ananatis strains used in this study. The cloud cluster included genes that were strain-specific or unique in the pan-genome. Most cloud genes were unique to a single genome; only a few (1,000 of 6,808) were shared, and only between two genomes. The remaining moderately conserved genes were identified as shell genes. Both the shell and cloud clusters represented the flexible (accessory) genome, which reflects the lifestyle, adaptation and evolutionary history of the P. ananatis strains in the environments in which they resided. Earlier pan-genome studies identified 3,875 core genes, 3,876 clusters and 3,750 CDS in seven, eight and 10 P. ananatis strains, respectively [5, 25, 39]. In this study, with a larger set of genomes, we found that the core genome stabilized at 3,153 gene clusters while the pan-genome continued to expand as new gene clusters were added. The openness of a bacterial pan-genome reflects the diversity of the gene pool within strains of the same bacterial species. Adding new genomes can significantly alter the size of an open pan-genome, in contrast to a closed pan-genome.
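The compartment definitions above can be sketched as a small classifier over a presence/absence matrix. This is only an illustration under stated assumptions: the function names, the toy matrix, rounding the 95% cutoff down (which gives 76 of 81), and treating "cloud" as clusters found in at most two genomes are all choices made here for clarity, not details taken from the software used in the study.

```python
import math
from collections import Counter


def classify_cluster(n_present, n_genomes, soft_core_frac=0.95, cloud_max=2):
    """Assign one gene cluster to a pan-genome compartment by genome count."""
    if not 1 <= n_present <= n_genomes:
        raise ValueError("n_present must be between 1 and n_genomes")
    if n_present == n_genomes:
        return "core"
    # Assumption: the soft-core cutoff is rounded down (0.95 * 81 -> 76).
    if n_present >= math.floor(soft_core_frac * n_genomes):
        return "soft_core"
    # Assumption: cloud = clusters found in at most `cloud_max` genomes.
    if n_present <= cloud_max:
        return "cloud"
    return "shell"


def summarize_pan_genome(pav, n_genomes):
    """Count clusters per compartment; `pav` maps cluster name -> set of genomes."""
    return Counter(classify_cluster(len(g), n_genomes) for g in pav.values())


# Hypothetical toy matrix over 81 genomes labeled 0..80.
toy_pav = {
    "cluster_A": set(range(81)),  # present in every genome
    "cluster_B": set(range(78)),  # present in 78 genomes
    "cluster_C": set(range(40)),  # present in 40 genomes
    "cluster_D": {3, 17},         # shared by two genomes
    "cluster_E": {5},             # unique to one genome
}
summary = summarize_pan_genome(toy_pav, n_genomes=81)
print(dict(summary))
```

With these thresholds, a cluster present in 76 to 80 of the 81 genomes falls in the soft core, and everything between three and 75 genomes falls in the shell.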
Another advantage of a pan-genome is that it can reconstruct bacterial phylogeny with greater resolution and reliability than phylogenies based on a single gene or a few genes. The pan-genome provides an overview of the entire gene set (100% of the genomes) of a given population, unlike a 16S rRNA phylogeny, which represents only a tiny fraction of the genome (~ 0.07%), or multi-locus sequence analysis (MLSA) of house-keeping genes (~ 0.2%). Pan-genome analysis, therefore, resolves the phylogeny of the strains better than other methods of comparing bacterial strains, as is evident in this study. MLSA and rep-PCR assays showed limited genetic diversity, despite high phenotypic variation, among 50 strains of P. ananatis. In the same study, Stice et al. demonstrated that PAVs from the pan-genome analysis of 10 strains of P. ananatis separated the pathogenic from the non-pathogenic strains, a separation that was not observed when the core genome was used. Similar to the study by Stice et al., the core and accessory genomes evaluated in this study, when used individually, did not distinguish between pathogenic and non-pathogenic strains; however, the pan-genome PAVs distinguished the pathogenic from the non-pathogenic strains, except for one non-pathogenic strain, PNA_98_11, that clustered with the pathogenic strains. In this pan-genome-based clustering, we did not find any correlation between the clusters and the year of isolation, the source (onion vs. other plant hosts) or the geographic location of the strains, although all the strains originated from the Vidalia region of Georgia. We believe that increasing the number of strains in the pan-genome panel, drawing from across onion-growing regions and from diverse sources of inoculum, may shed some light on these relationships.
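As an illustration of how PAV profiles can separate strains, the sketch below computes pairwise Jaccard distances between gene-presence sets, which could then be passed to any hierarchical clustering routine. The distance metric, the strain labels and the gene labels are assumptions made for this example; this section does not state which metric or clustering method was actually used.

```python
from itertools import combinations


def jaccard_distance(genes_a, genes_b):
    """1 - |A ∩ B| / |A ∪ B| for two sets of gene clusters."""
    union = genes_a | genes_b
    if not union:
        return 0.0  # two empty profiles are treated as identical
    return 1.0 - len(genes_a & genes_b) / len(union)


def pairwise_distances(profiles):
    """Return {(strain_i, strain_j): distance} for every unordered pair."""
    return {
        (s1, s2): jaccard_distance(profiles[s1], profiles[s2])
        for s1, s2 in combinations(sorted(profiles), 2)
    }


# Hypothetical accessory-gene profiles (labels are placeholders).
toy_profiles = {
    "strain_1": {"geneA", "geneB", "geneC"},
    "strain_2": {"geneB", "geneC", "geneD"},
    "strain_3": {"geneX", "geneY"},
}
distances = pairwise_distances(toy_profiles)
for pair, dist in distances.items():
    print(pair, round(dist, 2))
```

Strains that share most accessory genes get a distance near 0, while strains with no accessory genes in common get a distance of 1.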
An ANI of ≥ 95% is a benchmark for classifying organisms as the same species [41, 42], whereas ANI values of 93–94% suggest that the organisms belong to different sub-species. In this study, ANI ranged from 99.0–99.9% and AAI ranged from 96–99%, which not only suggested low core genome diversity among the 81 P. ananatis strains but also confirmed that the strains all belonged to the same species.
Bacterial phenotypes, in general, can be linked to the presence or absence of genes that are inherited through either descent or lateral gene transfer. Previous studies on P. ananatis deployed comparative genomics of 2 to 10 strains to identify pathogenicity-related regions in the genome [5, 24, 25, 39]. We conducted a pan-GWAS analysis to predict genes associated with pathogenicity in onion bulbs. Red-onion scale necrosis has been shown to be an accurate predictor of pathogenicity to onion bulb tissues and, hence, it was utilized as a high-throughput screen to differentiate pathogenic from non-pathogenic strains of P. ananatis. However, it can be questioned whether this assay alone is an adequate predictor of pathogenicity in both onion foliage and bulbs. We found a strong correlation between the pathogenicity phenotypes observed in the foliar assay and in the red-scale necrosis assay. Hence, this assay was also utilized for the pan-GWAS analysis.
We utilized comparative genomics of 81 strains, of which 51 were pathogenic and 30 were non-pathogenic to onion. Furthermore, the pan-genome PAVs were used to identify genes statistically associated with the onion pathogenicity phenotype (determined using a red-onion scale necrosis assay). Of the 42 genes strongly associated with the onion-pathogenic phenotype, 14 were identified as part of the HiVir/PASVIL cluster. These genes were annotated as hvaA, pepM and pavC-N. These associated genes coded for phosphonate metabolism, metabolism of plant-derived aromatic compounds, monooxygenases, a methyltransferase, leucine biosynthesis and an L-amino acid ligase. The association of HiVir/PASVIL genes found with this pan-genome in-silico approach corroborated earlier findings on the roles of these genes in the pathogenicity of P. ananatis on onion [23, 24]. We also used the pan-GWAS on P. ananatis to predict 28 novel genes contributing to pathogenicity in onion. Of the 28 novel genes identified, eight have annotated functions of site-specific tyrosine kinase, N-acetylmuramoyl-L-alanine amidase, TraR/DksA family transcriptional regulator, and HTH-type transcriptional regulator, and the remaining 20 genes are hypothetical. Further functional analysis of these genes will aid in a better understanding of pathogenicity to onion.
Comparative genomic analysis revealed a trend in the presence or absence of the complete HiVir/PASVIL cluster in pathogenic vs. non-pathogenic P. ananatis strains. Among the 51 pathogenic strains evaluated, 50 possessed the complete set of HiVir/PASVIL cluster genes (14 genes; hvaA-pavN). Among the non-pathogenic strains, 73.3% (n = 22) lacked the complete HiVir/PASVIL cluster and 6.7% (n = 2) had only a subset (one or more) of the HiVir/PASVIL cluster genes. Interestingly, 20% (n = 6) of the non-pathogenic strains possessed a conserved, complete HiVir/PASVIL cluster. If the presence of a complete cluster is correlated with pathogenicity to onion, then the complete cluster in these non-pathogenic strains is difficult to explain, unless possessing the complete HiVir/PASVIL cluster is not in itself sufficient for pathogenicity on onion. It is possible that the HiVir/PASVIL genes in the non-pathogenic strains are not expressed or are non-functional, which requires further investigation for confirmation.
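The counts reported above form a 2 × 2 contingency table (complete cluster present or not, by pathogenic or non-pathogenic), and the strength of that association can be illustrated with a two-sided Fisher's exact test. This is a minimal sketch only: Fisher's exact test is assumed here as a common choice for gene-trait association, the function name is hypothetical, and this section does not state which test the pan-GWAS software applied.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(p, 1.0)


# Counts taken from the text: complete HiVir/PASVIL cluster in
# 50 of 51 pathogenic strains and 6 of 30 non-pathogenic strains.
with_cluster_path, without_cluster_path = 50, 1
with_cluster_nonpath, without_cluster_nonpath = 6, 24

p_value = fisher_exact_two_sided(
    with_cluster_path, without_cluster_path,
    with_cluster_nonpath, without_cluster_nonpath,
)
odds_ratio = (with_cluster_path * without_cluster_nonpath) / (
    without_cluster_path * with_cluster_nonpath
)
print(f"odds ratio = {odds_ratio:.0f}, p = {p_value:.2e}")
```

A very small p-value here reflects only the statistical association; it does not explain the six non-pathogenic strains that carry the complete cluster.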
Phosphoenolpyruvate mutase (pepM) is involved in phosphonate biosynthesis [reference?]. Organophosphonates are synthesized as secondary metabolites in certain prokaryotes to function as antibiotics, and can have specialized roles in pathogenesis or signaling. Phosphonate metabolites are derived from phosphonopyruvate, which in turn is formed from phosphoenolpyruvate (PEP) by the action of PEP mutase (PepM). Asselin et al. identified a pepM gene as the first pathogenicity factor associated with the fitness of P. ananatis as well as with symptom development in infected onion leaves and bulbs. Deletion of pepM or inactivation of pavJ resulted in loss of the ability to cause lesions on onion foliage and yellow onion bulbs. Furthermore, growth of the deletion mutant in onion leaves was significantly reduced compared with the wild-type P. ananatis strain. This pan-genome in-silico study corroborated the association of the pepM gene with pathogenicity to onion, using a diverse panel of P. ananatis strains from Georgia, USA. The pepM gene was present in 50 of 51 pathogenic strains, the exception being strain PNA 18-9s. This strain also lacked pavE, pavJ, and pavK. If it is not an assembly artifact, the absence of pepM along with four other genes in the HiVir/PASVIL cluster in this strain could be the reason for its compromised red-scale necrosis phenotype (weak pathogenicity). This observation also indicated the presence of a potential pathogenicity factor other than pepM, which requires further investigation. Among the non-pathogenic strains of P. ananatis, the pepM gene was absent in 23 of 30 strains. Despite the presence of the pepM gene and, in some cases, the entire HiVir/PASVIL cluster (six of 30 strains), the remaining strains displayed a non-pathogenic phenotype in the onion red-scale assay. These observations suggest that these genes may be non-functional in these strains, which warrants further research.
A monooxygenase and a flavin reductase enzyme belonging to the two-component non-heme flavin-diffusible monooxygenase (TC-FDM) family were associated with pathogenicity of P. ananatis strains to onion bulb scales in this study. The associated monooxygenase was nitrilotriacetate monooxygenase, coded by ntaA (similar to pavC in the HiVir/PASVIL cluster of P. ananatis), and the associated reductase was flavin reductase, a flavin:NADH oxidoreductase component of 4-hydroxyphenylacetate (4-HPA) 3-monooxygenase, coded by hpaC (similar to pavL in the HiVir/PASVIL cluster of P. ananatis). Nitrilotriacetate monooxygenase was reported previously in the genomic region referred to as WHOP (woody host of Pseudomonas spp.) in the Pseudomonas syringae complex. This region is associated with strains of P. syringae that infect woody host plants, and is absent in strains infecting herbaceous tissues. This gene, along with other genes present in the WHOP region, is responsible for the fitness and virulence of P. savastanoi pv. savastanoi in woody olive trees, but not in non-woody olives [45, 46]. Nitrilotriacetate monooxygenase is known to catabolize plant-derived aromatic compounds and to help bacteria adapt to woody host tissues. In contrast, center rot of onion is caused by P. ananatis moving into onion bulbs from infected foliage or neck tissue; therefore, it was intriguing to find this gene associated with pathogenicity in onion, an herbaceous plant. The 4-hydroxyphenylacetate (4-HPA) 3-monooxygenase reductase uses NAD(P)H to catalyze the reduction of free flavins, which diffuse to the oxygenase component for oxidation of the substrate (aromatic or non-aromatic compounds) by molecular oxygen.
The HiVir/PASVIL gene pavG in P. ananatis is annotated as a class-I S-adenosyl-L-methionine (SAM)-dependent methyltransferase. We presume that pavG is responsible for the esterification of phosphonates synthesized in P. ananatis via pepM, based on the fact that the di-anionic form of phosphonates interferes with the metabolic intermediates and carboxylates of antibacterial compounds [48–50]. To counteract this problem, microbes either synthesize phosphinites (with a double bond between the C and P instead of a single bond) or carry out esterification of phosphonates. Phosphonate esterification appears to be the likely mechanism operating in P. ananatis because of the presence of pavG in the HiVir/PASVIL cluster. SAM-dependent O-methyltransferase has been shown to methylate a variety of phosphonates (1-hydroxyethylphosphonate, 1,2-dihydroxyethylphosphonate, and acetyl-1-aminoethylphosphonate). Therefore, there is a high possibility that SAM methyltransferase is involved in methylation of the phosphonate produced in P. ananatis. Further studies are required to characterize the type of phosphonate and its methylation in order to understand the roles of SAM methyltransferase and pepM in causing red-scale necrosis. Another possible role of pavG is methylation of other HiVir/PASVIL genes, rendering them inactive despite their presence in the cluster. We hypothesize that the inactivity of HiVir/PASVIL genes in non-pathogenic strains of P. ananatis may be due to methylation of these genes by the SAM-dependent methyltransferase, implying a secondary role of pavG in strains non-pathogenic to onion. Methylation profiling will help evaluate this hypothesis.
The HiVir/PASVIL gene pavI is similar to RizA, an L-amino-acid ligase (LAL) from Bacillus subtilis that participates in the biosynthesis of rhizocticin, a phosphonate oligopeptide antibiotic that possesses L-arginyl-L-2-amino-5-phosphono-3-cis-pentenoic acid. Although the functional role of pavI is yet to be characterized in P. ananatis, it may play a role in the formation of anti-microbial secondary metabolites of ‘phosphonate derivatives’. LAL is a member of the ATP-dependent carboxylate–amine/thiol ligase superfamily, and catalyzes a ligation reaction that involves an aminoacyl-phosphate intermediate in an ATP-dependent manner. LALs contain the ATP-grasp fold, which is composed of three conserved domains referred to as the A-domain (N-terminal domain), the B-domain (central domain) and the C-domain (C-terminal domain). Together, these three domains grasp the ATP molecule and also provide binding sites for the Mg2+ ion and the amino-acid substrate.
The pan-GWAS approach used in this study did not associate the alt cluster of genes with the onion-pathogenic phenotype determined using the red-scale necrosis assay. This may be because the phenotyping was based only on the necrotic scale clearing assay, a phenotype that has been shown to be induced by the HiVir/PASVIL cluster [reference]. However, the alt cluster comes into play after the onset of necrosis, when endogenous antimicrobial sulfur compounds are produced by damaged onion cells. In this scenario, the alt cluster helps P. ananatis survive and colonize onion plants. P. ananatis uses 11 alt cluster genes associated with sulfur metabolism to impart tolerance to the thiosulfinate ‘allicin’ that is produced by damaged onion cells. The presence of the alt cluster in 80% (n = 41) of the onion-pathogenic strains, and its absence in 67% (n = 20) of the non-pathogenic strains, suggests some potential involvement in bacterial virulence. However, the alt cluster alone is not responsible for the onion-pathogenic phenotype, as 33% (n = 10) of the non-pathogenic strains did not exhibit any evidence of pathogenicity in the onion red-scale assay despite possessing a completely conserved alt cluster.