Primary and secondary structures
For this study, the complete genome sequences of 103 Pseudomonas strains, belonging to 62 Pseudomonas species, were collected from GenBank and analysed. The ITS sequence characteristics of Pseudomonas are summarised as follows. A total of 560 rrn operon sequences were collected; each strain carried 3–8 ITSs, and the number of operons differed among the selected Pseudomonas species (Supplemental Table 1). According to the tDNA contained in the ITS, all rrn operons can be divided into two types: (1) type N (ITS without tDNA), with a length of 310 ± 20 bp; and (2) type-IA (ITS containing tDNAIle and tDNAAla), with a length of 500 ± 50 bp. Type-IA appears in all Pseudomonas species and accounts for 91.1% of the 560 ITS sequences collected, whereas type N appears only rarely; the type-IA ITS was therefore used as the research sequence. The Pseudomonas species with N-type ITS sequences were P. putida, P. antarctica, P. entomophila, P. fulva, P. mandelii, P. plecoglossicida and P. psychrophila, and their proportions of ITS types differed.
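The two length ranges reported above can be turned into a rough pre-screen. The sketch below is only an illustration: the study assigns types by tDNA content, not by length, and the helper name `guess_its_type` and the "unclassified" label are hypothetical.

```python
# Length windows taken from the ranges reported in the text (in bp).
TYPE_N_RANGE = (290, 330)    # 310 +/- 20 bp, ITS without tDNA
TYPE_IA_RANGE = (450, 550)   # 500 +/- 50 bp, ITS with tDNA-Ile and tDNA-Ala


def guess_its_type(length_bp: int) -> str:
    """Return a provisional ITS type from sequence length alone.

    Hypothetical helper: length is only a proxy, so a real assignment
    would still need the tDNA genes to be detected in the sequence.
    """
    if TYPE_N_RANGE[0] <= length_bp <= TYPE_N_RANGE[1]:
        return "N"
    if TYPE_IA_RANGE[0] <= length_bp <= TYPE_IA_RANGE[1]:
        return "IA"
    return "unclassified"
```

Sequences that fall outside both windows are deliberately left unclassified rather than forced into one of the two types.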
Type-IA ITS sequences were aligned using the GeneTool Lite software, and the alignment showed a mosaic structure. These sequences were divided into five parts: (1) the sequence upstream of tDNAIle (US), with a length of 90 ± 30 bp; (2) tDNAIle; (3) the linker sequence between tDNAIle and tDNAAla (LS), with a length of 20 ± 10 bp; (4) tDNAAla; and (5) the sequence downstream of tDNAAla (DS), with a length of 240 ± 20 bp. The US, LS, DS and WS all contain C regions and V regions. The N-type sequences of 13 strains were also aligned, and C regions and V regions could likewise be identified in these ITS sequences.
After the secondary structure of the type-IA rrn operon was simulated with the RNAstructure software, the secondary structures of the Pseudomonas species were found to share a common trunk. Taking P. aeruginosa as an example (Fig. 1), the secondary structure contains a variety of stem-loop structures. Three stems are formed with the region upstream of the 16S rRNA gene or downstream of the 23S rRNA gene, constituting reverse complementary sequences called hybrid stems (h-stem), and two stems fold with neighbouring sequences and are called inner stems (i-stem). In addition, each ITS sequence contains three C regions (C1, C2, C3) and three V regions (V1, V2, V3), which corresponds to the results from the GeneTool Lite software. We determined that the sequence diversity was located mostly in the inner stems. The proper folding of RNA is closely related to its function. The ITS sequence participates in the folding of the 16S and 23S rRNA genes, indicating that the ITS is an important and indispensable structure.
Species-specific analyses of ITS sequences
The specificity of the four Pseudomonas substructures (US, LS, DS and WS) was evaluated by BLAST. Gap values for the 62 Pseudomonas species were calculated as the difference between the lowest RS value among the target bacteria and the highest RS value among the non-target bacteria in the BLAST results (Fig. 2).
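The Gap calculation described above can be sketched as follows. This is a minimal illustration that assumes the RS values are already available as percentages for each BLAST hit; the function name `gap_value` and the example numbers are hypothetical, not data from the study.

```python
def gap_value(target_rs, non_target_rs):
    """Gap = lowest RS among target hits minus highest RS among non-target hits.

    A positive Gap means every target hit scores above every non-target hit;
    zero or negative means the two groups touch or overlap.
    """
    if not target_rs or not non_target_rs:
        raise ValueError("both groups need at least one RS value")
    return min(target_rs) - max(non_target_rs)


# Illustrative (made-up) RS values in percent.
separable = gap_value([98.0, 95.5, 92.0], [80.0, 60.0])    # 92.0 - 80.0
overlapping = gap_value([98.0, 70.0], [85.0, 60.0])        # 70.0 - 85.0
```

Under this definition a single low-scoring target sequence is enough to make the Gap negative, which is why a negative Gap alone does not rule out species-specificity.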
Although five red dots representing LS reached a Gap value of 100%, the short length of this sequence meant that BLAST often could not separate target from non-target bacteria, resulting in a Gap value of zero. This occurred for 41 strains, including P. aeruginosa, P. denitrificans, P. entomophila, P. fluorescens, P. granadensis and P. knackmussii. LS is therefore not suitable as a species-specific DNA marker. Apart from the red dots, the figure clearly shows that the grey dots representing DS and the green dots representing WS have higher Gap values, whereas the blue dots representing US have lower Gap values because the US is only about one-third the length of the DS.
A closer comparison of the Gap values showed that the US, DS and WS of 27 species, including P. alcaligenes, P. alcaliphila, P. asturiensis, P. balearica, P. corrugata and P. cremoricolorata, gave positive results. The ITS sequence was therefore an efficient genetic marker in these 27 species. However, the Gap values of the other 35 species were negative, and the three structures (US, DS and WS) performed consistently.
Therefore, these 35 species were further analysed, and frequency graphs were generated by calculating the RS values of target bacteria and non-target bacteria at each stage. According to the results of the frequency analysis, the 35 species can be divided into four types.
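A frequency analysis of this kind can be sketched as a histogram of RS values computed separately for each group. The sketch assumes that RS values lie between 0 and 100 and that each "stage" is a fixed-width RS interval; the 10-point bin width and the name `rs_frequency` are illustrative assumptions, not the study's settings.

```python
from collections import Counter


def rs_frequency(rs_values, bin_width=10):
    """Return the fraction of hits falling in each RS interval.

    Bins are [0, w), [w, 2w), ...; an RS of exactly 100 is placed
    in the last bin. The bin width is an assumption for illustration.
    """
    if not rs_values:
        raise ValueError("need at least one RS value")
    n_bins = 100 // bin_width
    counts = Counter(min(int(rs // bin_width), n_bins - 1) for rs in rs_values)
    total = len(rs_values)
    return [counts.get(i, 0) / total for i in range(n_bins)]


# Made-up example: target hits cluster at high RS, non-target hits lower.
target_freq = rs_frequency([95, 99, 100, 85])
non_target_freq = rs_frequency([42, 55, 58, 91])
```

Plotting the two frequency lists against the bin positions shows whether the target group clusters at higher RS values even when its range overlaps that of the non-target group.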
P. aeruginosa is the species with the largest amount of data in NCBI, and its BLAST results are correspondingly more complex. Its specificity was examined in three ways: (1) a frequency diagram was made based on all BLAST results (Fig. 3a–c); (2) a frequency diagram was made with BLAST results containing only complete-genome data (Fig. 3d–f); and (3) the three P. aeruginosa sequences with low RS values in the BLAST results were used as target bacteria for a second BLAST search (Fig. 3g–i). The ranges of target bacteria and non-target bacteria overlapped, but the RS values of target bacteria clustered on the horizontal axis at significantly higher values than those of non-target bacteria. Therefore, frequency analysis can reveal the species-specificity of the ITS in P. aeruginosa.
Little data was available for P. parafulva in the NCBI. In the BLAST results, one of the ITS sequences showed a low RS value that was exceeded by the RS values of some non-target bacteria, producing a negative Gap value. To address this, the low-RS sequence was used as the target for a further BLAST search, and two sets of frequency diagrams were obtained (Fig. 4a–f).
The BLAST results of P. amygdali showed that its Gap value was affected by P. syringae. Therefore, instead of two groups, three groups were compared: P. amygdali, P. syringae and non-target bacteria. The frequency diagram is shown in Fig. 4g–i. The RS values of P. amygdali, represented by the red line, clustered on the horizontal axis at significantly higher values than those of the other two groups, indicating the species-specificity of the ITS in P. amygdali.
P. syringae can be pathogenic to a variety of organisms and can be divided into P. syringae pv. actinidiae, P. syringae pv. tomato, P. syringae pv. syringae and other pathovars. The frequency analyses demonstrated the specificity of the ITS in P. syringae pv. actinidiae and P. syringae pv. tomato (Fig. 5a–f). However, for P. syringae pv. syringae and P. syringae pv. maculicola, the specificity could not be accurately determined because of interference from other pathovars (Fig. 5g, h). In conclusion, the ITS shows high specificity for the species P. syringae.
Therefore, by applying frequency analysis in different ways to the 35 species without a positive Gap value, all species studied could be accurately distinguished by their frequency charts. These results suggest that the ITS and its subdomains can be used as species-specific DNA markers.
Identification of Pseudomonas by ITS
To verify this conclusion, 200 ITS sequences were selected from the NCBI, comprising ITS sequences of 160 Pseudomonas strains and ITS sequences of 40 non-Pseudomonas strains belonging to Pseudomonadales, and were then randomly scrambled for a double-blind experiment (Supplemental Table 2). In this experiment, neither the experimenter nor the analyst knew which strain each sequence belonged to. The results showed that 66 ITS sequences could be directly identified by Gap, and the remaining 134 ITS sequences were identified successfully by further frequency analyses, giving a success rate of 100% (Table 1).
Table 1
Results of the double-blind experiment
| | Pseudomonas | Pseudomonadales (non-Pseudomonas) |
|---|---|---|
| Total | 160 | 40 |
| GAP | 36 | 30 |
| FRE | 124 | 10 |
| No distinction | 0 | 0 |
a GAP, the number of ITS sequences that could be identified by Gap.
b FRE, the number of ITS sequences that could be identified by further frequency analysis.
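As a quick consistency check, the counts in Table 1 can be tallied to recover the figures quoted in the text (66 sequences identified by Gap, 134 by frequency analysis, 100% overall). The dictionary layout and the function name `summarize` below are illustrative choices; the numbers are those of Table 1.

```python
# Counts copied from Table 1.
TABLE_1 = {
    "Pseudomonas": {"total": 160, "gap": 36, "fre": 124, "no_distinction": 0},
    "non-Pseudomonas": {"total": 40, "gap": 30, "fre": 10, "no_distinction": 0},
}


def summarize(table):
    """Check that each column adds up and return the overall tallies."""
    for group, row in table.items():
        if row["gap"] + row["fre"] + row["no_distinction"] != row["total"]:
            raise ValueError(f"counts for {group} do not add up to the total")
    total = sum(row["total"] for row in table.values())
    by_gap = sum(row["gap"] for row in table.values())
    by_fre = sum(row["fre"] for row in table.values())
    return {
        "total": total,
        "by_gap": by_gap,
        "by_fre": by_fre,
        "success_rate": (by_gap + by_fre) / total,
    }


summary = summarize(TABLE_1)
```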
Identification of Pseudomonas sp. in samples
16S rRNA gene and ITS sequences of the bacteria in samples were amplified, and species-level identification was performed. The ITS sequence analysis revealed the species and proportion of Pseudomonas in the samples. Abundance values of Pseudomonas were obtained from 12 samples. The three bacteria with the highest abundance in each sample and their proportions are listed in Table 2. P. putida, P. monteilii, P. koreensis, P. aeruginosa and P. fluorescens are widely distributed in water. Among them, P. putida is the most widely distributed, with the highest abundance.
Table 2
Results of next generation sequencing
| Sample number | Species 1 | Abundance | Species 2 | Abundance | Species 3 | Abundance |
|---|---|---|---|---|---|---|
| 1 | P. putida | 37% | P. koreensis | 17% | P. fluorescens | 6% |
| 2 | P. putida | 51% | P. monteilii | 9% | P. fluorescens | 5% |
| 3 | P. putida | 32% | P. fluorescens | 12% | P. koreensis | 9% |
| 4 | P. putida | 21% | P. aeruginosa | 7% | P. monteilii | 5% |
| 5 | P. putida | 52% | P. fluorescens | 11% | P. monteilii | 3% |
| 6 | P. putida | 43% | P. fluorescens | 17% | P. aeruginosa | 11% |
| 7 | P. putida | 35% | P. aeruginosa | 9% | P. fluorescens | 7% |
| 8 | P. putida | 39% | P. fluorescens | 23% | P. monteilii | 6% |
| 9 | P. monteilii | 17% | P. putida | 15% | P. fluorescens | 9% |
| 10 | P. putida | 38% | P. fluorescens | 22% | P. monteilii | 14% |
| 11 | P. putida | 51% | P. koreensis | 16% | P. fluorescens | 8% |
| 12 | P. putida | 49% | P. aeruginosa | 11% | P. fluorescens | 6% |
a In each sample, species are arranged in order of abundance from largest to smallest, and the top three species are listed. The percentage after each species is its abundance in that sample.
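The statement that P. putida is the most widely distributed species can be checked directly against Table 2. The short tally below uses the species columns of the table; the variable names are illustrative.

```python
from collections import Counter

# Top three species per sample, in rank order, copied from Table 2.
TOP3 = {
    1: ["P. putida", "P. koreensis", "P. fluorescens"],
    2: ["P. putida", "P. monteilii", "P. fluorescens"],
    3: ["P. putida", "P. fluorescens", "P. koreensis"],
    4: ["P. putida", "P. aeruginosa", "P. monteilii"],
    5: ["P. putida", "P. fluorescens", "P. monteilii"],
    6: ["P. putida", "P. fluorescens", "P. aeruginosa"],
    7: ["P. putida", "P. aeruginosa", "P. fluorescens"],
    8: ["P. putida", "P. fluorescens", "P. monteilii"],
    9: ["P. monteilii", "P. putida", "P. fluorescens"],
    10: ["P. putida", "P. fluorescens", "P. monteilii"],
    11: ["P. putida", "P. koreensis", "P. fluorescens"],
    12: ["P. putida", "P. aeruginosa", "P. fluorescens"],
}

# How often each species ranks first, and how often it appears in the top three.
first_place = Counter(species[0] for species in TOP3.values())
appearances = Counter(name for species in TOP3.values() for name in species)
```

In this tally P. putida ranks first in 11 of the 12 samples and appears in the top three of all 12.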