We studied 495 bacterial species spanning the bacterial domain. These species occupy very different habitats and differ in cell shape and size (coccus or bacillus), intracellular metabolism (aerobic or anaerobic), and response to the external environment (mesophilic, thermophilic, or psychrophilic) [4]. We considered the following bacterial phyla: Deinococcus-Thermus, Chlorobi, Actinobacteria, Firmicutes, Chlamydiae, Fusobacteria, Spirochaetes, Chloroflexi, Tenericutes, Cyanobacteria, Bacteroidetes, Thermotogae, Acidobacteria, Aquificae, Caldiserica, Chrysiogenetes, Deferribacteres, Elusimicrobia, Fibrobacteres, Gemmatimonadetes, Lentisphaerae, Nitrospirae, Planctomycetes, Thermodesulfobacteria, Verrucomicrobia, and Proteobacteria [5].
pI and molecular weight value distributions of translation protein factors
For the translation factors (Additional file 1), we found a distinctive pattern of pI value distribution (Figure 1A). The initiation factors were basic, except for initiation factor 2 (IF2) in some bacterial species: all four quartiles of initiation factor 1 (IF1) and initiation factor 3 (IF3) lay above pI 7. In contrast, the elongation and release factors were highly acidic, except for the ribosome-recycling factor (RRF): elongation factor Tu (EF-Tu), elongation factor G (EF-G), elongation factor 4 (EF-4), and elongation factor P (EF-P), as well as release factor 1 (RF1), release factor 2 (RF2), and release factor 3 (RF3), had all four quartiles in the acidic range. IF2 and RRF each had more than one quartile in the basic pI range. To complement the pI analysis, we also examined the molecular weight (MW) distributions of these translation factors (Figure 1B). As with its pI values, IF2 showed wide variation in MW (Figure 1B). All the other proteins showed narrow MW distributions. Surprisingly, although the pI values of RRF were highly variable, its MW values were nearly constant.
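The box-plot reading used above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the study: it assumes that "all four quartiles" refers to the full range from minimum to maximum, the function names `box_summary` and `classify_factor` are hypothetical, and the pI values are placeholders rather than data from Additional file 1.

```python
from statistics import quantiles

NEUTRAL_PI = 7.0  # threshold separating acidic from basic pI values


def box_summary(pi_values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of pI values."""
    q1, median, q3 = quantiles(pi_values, n=4, method="inclusive")
    return (min(pi_values), q1, median, q3, max(pi_values))


def classify_factor(pi_values):
    """Label a factor by where its whole range sits relative to pI 7."""
    lowest, _, _, _, highest = box_summary(pi_values)
    if lowest > NEUTRAL_PI:
        return "basic"   # every quartile lies above pI 7
    if highest < NEUTRAL_PI:
        return "acidic"  # every quartile lies below pI 7
    return "mixed"       # the distribution straddles pI 7


# Hypothetical pI values, for illustration only (not taken from the study).
example = {
    "factor_a": [9.1, 9.4, 9.6, 9.8, 9.9],
    "factor_b": [4.8, 5.0, 5.2, 5.5, 5.7],
    "factor_c": [5.1, 5.6, 6.9, 8.4, 9.2],
}
labels = {name: classify_factor(values) for name, values in example.items()}
print(labels)
```

Under this reading, a factor such as IF1 or IF3 would be labeled basic, the elongation and release factors acidic, and IF2 or RRF mixed.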
Statistical analysis of pI values of translation proteins
We further performed asymptotic tests [6] for the 5% and 95% quantiles of the pI values of these translation factors (Table 1). For both IF1 and IF3, the p values for the null hypotheses H0: q05 ≥ 7 and H0: q95 ≤ 9.95 exceeded 0.05, from which we inferred that 90% of the data lie in the basic pI range, i.e., between 7 and 9.95. In contrast, for the elongation factors (EF-Tu, EF-G, EF-4, and EF-P) and release factors (RF1, RF2, and RF3), 90% of the data lie entirely in the acidic pI range, i.e., between 4.635 and 6.225 (the p values for H0: q05 ≥ 4.635 and H0: q95 ≤ 6.225 exceeded 0.05). IF2 and RRF behaved differently: in both cases, 90% of the data span from acidic pI 5.1 to basic pI 9.25 (the p values exceeded 0.05 for H0: q05 ≥ 5.1 and H0: q95 ≤ 9.25).
Table 1. Asymptotic tests for 5% quantile and 95% quantile for translation factors.
Translation factors (IF1 & IF3) | 5% sample quantiles | p-values (H0: q05 ≥ 7) | 95% sample quantiles | p-values (H0: q95 ≤ 9.95)
IF1 | 6.82 | 0.2839 | 9.98 | 0.1684
IF3 | 8.87 | 0.5 | 9.94 | 0.6444
Translation factors (IF2) | 5% sample quantiles | p-values (H0: q05 ≥ 5.1) | 95% sample quantiles | p-values (H0: q95 ≤ 9.25)
IF2 | 5.09 | 0.4173 | 9.31 | 0.1198
Translation factors (elongation and release factors) | 5% sample quantiles | p-values (H0: q05 ≥ 4.635) | 95% sample quantiles | p-values (H0: q95 ≤ 6.225)
EF-Tu | 4.81 | 0.9861 | 5.77 | 0.7839
EF-G | 4.78 | 0.9573 | 5.54 | 0.7272
EF-4 | 4.99 | 0.5337 | 6.36 | 0.0909
EF-P | 4.77 | 0.9978 | 5.75 | 0.5473
RF1 | 4.81 | 0.9746 | 6.06 | 0.8042
RF2 | 4.62 | 0.1005 | 5.47 | 0.5
RF3 | 5.03 | 0.6555 | 6.14 | 0.8347
Translation factors (RRF) | 5% sample quantiles | p-values (H0: q05 ≥ 5.1) | 95% sample quantiles | p-values (H0: q95 ≤ 9.25)
RRF | 5.063 | 0.1999 | 9.028 | 0.9905
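One common way to test a hypothesis about a quantile is a sign-type test with a normal approximation to the binomial count of observations beyond the hypothesized value. The sketch below illustrates that general idea only. It is an assumption that it resembles the asymptotic test of [6], it is not expected to reproduce the p values in Table 1, and the function names and pI values are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def quantile_test_lower(values, q0, p):
    """Approximate p-value for H0: q_p >= q0 against H1: q_p < q0.

    Under H0, at most a fraction p of the data is expected below q0,
    so a large count below q0 is evidence against H0.
    """
    n = len(values)
    below = sum(1 for v in values if v < q0)
    z = (below - n * p) / math.sqrt(n * p * (1.0 - p))
    return 1.0 - normal_cdf(z)


def quantile_test_upper(values, q0, p):
    """Approximate p-value for H0: q_p <= q0 against H1: q_p > q0.

    Under H0, at most a fraction (1 - p) of the data is expected above q0,
    so a large count above q0 is evidence against H0.
    """
    n = len(values)
    above = sum(1 for v in values if v > q0)
    z = (above - n * (1.0 - p)) / math.sqrt(n * p * (1.0 - p))
    return 1.0 - normal_cdf(z)


# Hypothetical pI values, for illustration only (not taken from the study).
pi_values = [6.0] * 5 + [8.5] * 90 + [10.0] * 5
p_low = quantile_test_lower(pi_values, q0=7.0, p=0.05)
p_high = quantile_test_upper(pi_values, q0=9.95, p=0.95)
print(p_low, p_high)
```

A p value above 0.05 from either function means the corresponding null hypothesis is not rejected at the 5% level, which is how the quantile bounds in the text are read.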
Amino acid frequency distributions of elongation and release factors
Interestingly, when we randomly chose 60 amino acid sequences (representing 60 bacterial species) for each of the elongation and release factors and calculated their amino acid frequencies, we found a high frequency of glutamic acid in all of these factors (Figure 2A-G). Schwartz et al. [7] likewise observed in 2001 that acidic cytosolic proteins have a high frequency of glutamic acid.
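An amino acid frequency calculation of this kind can be sketched as follows. This is a minimal illustration under stated assumptions: frequencies are expressed as percentages of the 20 standard residues, per-sequence percentages are averaged across sequences, and the function names and the two short sequences are hypothetical placeholders rather than sequences from the study.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard residues


def amino_acid_frequencies(sequence):
    """Return the percentage of each standard amino acid in one sequence."""
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    counts = Counter(residues)
    total = len(residues)
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


def mean_frequencies(sequences):
    """Average the per-sequence percentages across a set of sequences."""
    per_sequence = [amino_acid_frequencies(s) for s in sequences]
    return {
        aa: sum(freq[aa] for freq in per_sequence) / len(per_sequence)
        for aa in AMINO_ACIDS
    }


# Hypothetical toy sequences, for illustration only.
toy_sequences = ["MKEEL", "MEEAK"]
averaged = mean_frequencies(toy_sequences)
most_frequent = max(averaged, key=averaged.get)
print(most_frequent, averaged[most_frequent])
```

For real data, each factor would contribute its 60 sequences to `mean_frequencies`, and the resulting percentages could be compared across residues.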
Relation of pI values of IF2 and RRF proteins with phylogeny
Since the pI values of IF2 ranged widely from acidic to basic, we performed a phylogenetic analysis of the IF2 proteins (Figure 3A; Additional file 1) to investigate how the pI value distribution relates to phylogeny. Within the phylum Proteobacteria, the classes Gammaproteobacteria (blue) and Betaproteobacteria (verdigris) were acidic, with only a few exceptions. The class Alphaproteobacteria (brown), in contrast, had some genera that were acidic (e.g., Ehrlichia), some that were basic (e.g., Brucella and Bartonella), and others with both acidic and basic pI values (e.g., Rickettsia). Among the other phyla, Chlorobi (cyan), Cyanobacteria (red), Thermotogae (yellow), and Deinococcus-Thermus (light grey) had mostly acidic pI values, whereas Chlamydiae (saffron) and Spirochaetes (light green) had basic pI values. The IF2 proteins of the phyla Firmicutes (pink), Actinobacteria (light blue), and Tenericutes (purple) had both acidic and basic pI values.
The phylogenetic analysis of RRF, which also had a wide pI value distribution, showed that its pI values (Figure 3B; Additional file 1), like those of IF2 (Figure 3A), were linked to phylogeny. Different classes of Proteobacteria had different pI value distributions. Gammaproteobacteria (blue) and Alphaproteobacteria (brown) had acidic pI values, with a few exceptions (e.g., the genus Salmonella of Gammaproteobacteria and the genera Rickettsia and Ehrlichia of Alphaproteobacteria). However, Betaproteobacteria (verdigris) (e.g., Bordetella, acidic; Burkholderia, basic) and Deltaproteobacteria (apple green) (e.g., Desulfococcus, acidic; Geobacter, basic) had both acidic and basic pI values. Among the other phyla, Chlamydiae (saffron), Chlorobi (cyan), and Spirochaetes (light green) had basic pI values. In contrast, the phyla Actinobacteria (light blue) and Firmicutes (pink) had both acidic and basic pI values.