Mutations in the genome have physiological and cellular effects, and selection acts constantly on organisms through these effects [1]. This selection pressure drives the evolution of an organism's features or characteristics along particular trajectories, which may be encoded in one or more genes. Hence, comparing the gene sequences of the organism or strain of interest with those of others could offer a glimpse into the evolutionary relationships between the organisms or strains under study.
One useful approach to determining such evolutionary relationships is the combined use of multiple sequence alignment and phylogenetic tree analysis [2] [3] [4] [5]. Briefly, multiple sequence alignment seeks to detect, at the single-nucleotide level, the differences and similarities in the sequence of the same gene from different species or strains. Once this set of similarities and differences has been determined, phylogenetic tree analysis helps visualize the evolutionary distance between the strains and species. Depending on the structure of the phylogenetic tree obtained, further information on the number of phylogenetic gene clusters can also be obtained.
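As a minimal sketch of how aligned sequences can be turned into evolutionary distances, the Python snippet below computes the proportion of differing sites (p-distance) for every pair of sequences. The strain names and the toy alignment are hypothetical, the choice of p-distance is an assumption made for illustration, and the code is not the MATLAB software used in this work.

```python
from itertools import combinations


def p_distance(a, b, gap="-"):
    """Fraction of differing sites, counting only columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if x != y:
            differing += 1
    return differing / compared if compared else 0.0


def distance_matrix(aligned):
    """Symmetric nested dict of pairwise p-distances for a {name: sequence} alignment."""
    names = list(aligned)
    dist = {n: {n: 0.0} for n in names}
    for m, n in combinations(names, 2):
        d = p_distance(aligned[m], aligned[n])
        dist[m][n] = dist[n][m] = d
    return dist


# Hypothetical toy alignment; not data from this study.
toy_alignment = {
    "strain_A": "ACGTACGTAC",
    "strain_B": "ACGTACGTAC",
    "strain_C": "ACGTTCGTAC",
    "strain_D": "AC-TTCGAAC",
}
distances = distance_matrix(toy_alignment)
```

A distance matrix of this kind is the usual input to distance-based tree-building methods, so highly conserved sequences yield distances close to zero and short branches in the resulting tree.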
The 16S rRNA gene is well established as a good phylogenetic marker for prokaryotes at the species level [6] [7] [8]. This means that the 16S rRNA gene encodes sufficient sequence diversity to chronicle the individual evolutionary trajectories of different species [9] [10] [11]. However, little is known about the ability of this gene to encode strain-level evolutionary relationships. To address this question, it is important to understand the molecular underpinnings that demarcate different strains of the same species at the genome level.
Specifically, strain-level differences in the genome could arise either in phylogenetic markers such as the 16S rRNA gene or in other genes [12] [13] [14]. Depending on the prevailing selection pressure on the strains, these other genes could be metabolic genes or virulence genes, as these hold the most significance for the fitness of the strain. Take, for example, the human gastric pathogen Helicobacter pylori. Because it is a pathogen, virulence genes play an important role in determining its success in colonizing the epithelial lining of the human stomach. Hence, evolutionary pressure would act on the virulence genes to select for fitter strains that are more successful at colonizing the human stomach and causing other complications.
This work seeks to examine the utility of the 16S rRNA gene of H. pylori in chronicling the evolutionary relationships between different strains of the species through the combined approach of multiple sequence alignment and phylogenetic tree analysis. 16S rRNA gene sequences of seven strains of H. pylori were obtained from the SILVA database and the European Nucleotide Archive. These were subjected to multiple sequence alignment and phylogenetic tree analysis using in-house MATLAB software based on built-in MATLAB functions.
Results from multiple sequence alignment are plotted in Figure 1. The data show a high level of sequence conservation in the 16S rRNA gene across the different strains of H. pylori, suggesting that the gene does not encode sufficient sequence diversity to account for strain-level genetic differences among the strains investigated. Hence, the 16S rRNA gene of H. pylori is not a good biomarker for demarcating different strains of the species.
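One way to quantify the sequence conservation described above is to score each alignment column by the frequency of its most common nucleotide. The Python sketch below does this under stated assumptions: the scoring rule is a simple illustrative choice, the toy sequences are hypothetical, and nothing here reproduces the values behind Figure 1.

```python
from collections import Counter


def column_conservation(aligned_seqs, gap="-"):
    """Per-column frequency of the most common residue, ignoring gap characters."""
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("need at least one sequence, all of equal length")
    scores = []
    for column in zip(*aligned_seqs):
        residues = [c for c in column if c != gap]
        if not residues:
            scores.append(0.0)  # all-gap column: scored as 0.0 by convention here
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(residues))
    return scores


def variable_sites(scores):
    """Indices (0-based) of columns that are not perfectly conserved."""
    return [i for i, s in enumerate(scores) if s < 1.0]


# Hypothetical toy alignment; not data from this study.
toy_seqs = ["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC", "AC-TTCGAAC"]
scores = column_conservation(toy_seqs)
variable = variable_sites(scores)
mean_conservation = sum(scores) / len(scores)
```

A mean score close to 1 together with few variable sites would correspond to the high conservation reported here; the threshold at which a marker stops being informative is a judgment call that this sketch does not settle.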
Phylogenetic tree analysis, depicted in Figure 2, reveals a high level of similarity in the 16S rRNA gene of different strains of H. pylori. This similarity reflects the close evolutionary relationships between the strains of the species, with the exception of one outlier strain. Taken together, the results from multiple sequence alignment and phylogenetic tree analysis indicate that the 16S rRNA gene is not a good candidate for describing the phylogeny of different strains of H. pylori.
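To illustrate how a tree with one outlier strain can emerge from pairwise distances, the following Python sketch applies UPGMA (average-linkage clustering) to a small distance matrix and returns the topology as a Newick string without branch lengths. UPGMA is assumed here purely for illustration, since the tree-building method used in this work is not stated, and the strain names and distances are hypothetical.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a UPGMA topology from a symmetric nested-dict distance matrix.

    Returns a Newick string without branch lengths.
    """
    # Each cluster id maps to (newick_label, number_of_leaves).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    d = {frozenset((i, j)): dist[names[i]][names[j]]
         for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        pair = min(d, key=d.get)  # closest pair of clusters
        i, j = sorted(pair)
        (label_i, n_i), (label_j, n_j) = clusters.pop(i), clusters.pop(j)
        # Distance from the merged cluster to each remaining cluster is the
        # leaf-count-weighted average of the two old distances.
        for k in clusters:
            d_ik = d.pop(frozenset((i, k)))
            d_jk = d.pop(frozenset((j, k)))
            d[frozenset((next_id, k))] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        del d[pair]
        clusters[next_id] = (f"({label_i},{label_j})", n_i + n_j)
        next_id += 1
    (label, _), = clusters.values()
    return label + ";"


# Hypothetical distances: three near-identical strains and one outlier.
toy_names = ["strain_A", "strain_B", "strain_C", "outlier"]
toy_dist = {
    "strain_A": {"strain_B": 0.002, "strain_C": 0.006, "outlier": 0.050},
    "strain_B": {"strain_A": 0.002, "strain_C": 0.008, "outlier": 0.052},
    "strain_C": {"strain_A": 0.006, "strain_B": 0.008, "outlier": 0.054},
    "outlier": {"strain_A": 0.050, "strain_B": 0.052, "strain_C": 0.054},
}
tree = upgma(toy_names, toy_dist)
```

In this toy case the three closely related strains cluster first and the outlier joins last, which is the kind of topology described for Figure 2. UPGMA assumes a constant rate of evolution across lineages, so other methods such as neighbor joining may be preferable for real data.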
Overall, evolutionary relationships between strains could be encoded in the 16S rRNA gene or in other genes such as metabolic and virulence genes. In the case of H. pylori, combined multiple sequence alignment and phylogenetic tree analysis shows that the 16S rRNA gene does not encode sufficient sequence diversity to chronicle the evolutionary divergence of different strains of the species. This could be due to relatively weak evolutionary pressure on the 16S rRNA gene, which is tied to the selection pressure on ribosome function. If this is true, then most of the evolutionary pressure on H. pylori does not act on metabolism, protein production or growth rate; rather, the fitness of a strain is determined by the activity of virulence proteins encoded by virulence genes. If this hypothesis holds, strain-level divergence may instead be chronicled by sequence differences in the virulence genes of the species.