Brassicaceae preliminary tests
To test our protein biotyping methodology, we set out to complete both protein extraction and biotyping using seeds from four species of the family Brassicaceae. These seeds were used because they were readily available in large amounts at the National Archive of Legal Reference Material at the CFIA Seed Science Unit. This allowed repeated testing to standardize conditions before testing Amaranthus spp., for which material was initially scarcer.
We performed a preliminary test with small modifications to previously established protocols (see materials and methods) on five seed batches corresponding to two varieties of Brassica napus (spring and winter) and one variety of each of three other species (B. rapa, B. juncea and B. carinata). For each batch we tested three seeds, each at two solvent dilutions (1:1 and 1:10; see methods) and with two technical replicates. Out of the 12 potential protein spectral profiles expected for each seed batch, we obtained 12 spectra for B. napus (spring), 11 for B. napus (winter), 12 for B. rapa, 7 for B. carinata and 12 for B. juncea. Both a composite Principal Component Analysis (PCA) (Figure 1) and a dendrogram analysis (Additional file 1A) showed that all seeds belonging to the same species had similar protein spectra and clustered together. There was no specific clustering linked to dilution or technical replicate.
Figure 1. Principal Component Analysis (PCA) of protein spectra corresponding to four Brassica species. B. c (Brassica carinata), B. r (Brassica rapa), B. j (Brassica juncea), B. n (Brassica napus).
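The expected and obtained spectrum counts from the preliminary test can be tallied with a short calculation. This is only an illustrative sketch: the variable names are ours, and the overall recovery rate it derives is not reported in the text but follows directly from the per-batch counts given above.

```python
from itertools import product

# Design of the preliminary test as described in the text:
# 3 seeds x 2 solvent dilutions x 2 technical replicates per batch.
seeds = [1, 2, 3]
dilutions = ["1:1", "1:10"]
replicates = [1, 2]
expected_per_batch = len(list(product(seeds, dilutions, replicates)))

# Spectra actually obtained per seed batch (values from the text).
obtained = {
    "B. napus (spring)": 12,
    "B. napus (winter)": 11,
    "B. rapa": 12,
    "B. carinata": 7,
    "B. juncea": 12,
}

total_expected = expected_per_batch * len(obtained)
total_obtained = sum(obtained.values())
recovery_rate = total_obtained / total_expected  # derived here, not reported in the text

for batch, n in obtained.items():
    print(f"{batch}: {n}/{expected_per_batch}")
print(f"Overall: {total_obtained}/{total_expected} ({recovery_rate:.0%})")
```

Most of the missing spectra come from B. carinata, which is also the only species with failed seeds in the second run.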
To test potential variability among seed batches of the same species from different years, we conducted a second run in which we compared seed batches from three different years for B. napus (spring) and B. rapa, along with the available single-year batches of B. napus (winter), B. juncea and B. carinata. To increase the probability of obtaining successful protein spectra, we improved the protocol by using a more reliable approach to grind the seeds (tissue lyzer) and modified the preliminary protocol so that spotting the samples on the target plate would require less manipulation (compare the preliminary test with the final protocol in materials and methods). Out of 45 total seeds (5 per accession) we obtained protein spectra for 43; the only two failures were B. carinata seeds. Samples clustered correctly by species (Additional File 1B).
Amaranthus species protein biotyping
Protein biotyping is based on differences and similarities between the protein spectra of the major protein fractions of different species. Figure 2 shows the differences and similarities between protein spectra generated from different accessions of A. palmeri and A. tuberculatus (two species of regulatory concern).
Figure 2. Protein spectra of two Amaranthus species. Three different A. palmeri accessions and three different A. tuberculatus accessions are shown. GRIN ID refers to their identification number from: https://npgsweb.ars-grin.gov/gringlobal/search.
We performed protein biotyping with 15 Amaranthus species using five biological replicates (5 seeds) per accession. For some species, such as A. tuberculatus and A. palmeri, we tested more than one accession because these two species are of regulatory concern in Canada or the United States. The protein spectra clustered in three major groups, with sub-clusters within each cluster (Figure 3). Cluster 1 had two sub-clusters grouping the protein profiles corresponding to A. powellii - A. hybridus - A. retroflexus, and A. hypochondriacus - A. caudatus, respectively. However, the subgroups within this first cluster could not be clearly delimited by species. Cluster 2 had six species, each of which could be separated into its own sub-cluster. Importantly, a regulated species in Canada (A. tuberculatus) was separated from all other species in this cluster. In Cluster 3 we found A. palmeri, A. watsonii, A. spinosus and A. arenicola. While A. spinosus samples could be distinguished as a subgroup, the spectra from the two phylogenetically sister species (A. palmeri and A. watsonii) could not be distinguished from each other. A single spectrum from A. palmeri clustered with A. arenicola.
Figure 3. Clustering of protein spectra corresponding to 15 Amaranthus species. Species were clustered into three major clusters, with sub-clusters within each cluster.
Database generation
While spectra clustering is a rapid way of visualizing relationships between the spectra of individual samples, PCA and dendrograms are not based on structured phylogenetic algorithms. Protein biotyping is meant to be used to characterize samples for which a phenotypic identification is not possible, to confirm phenotypes and to classify blind samples, using a spectral database of known samples to which unknown samples can be compared. Enriching this database with multiple accessions per species (e.g., from different geographical regions) provides a way of accounting for potential intra-species variability and increases the accuracy with which test samples are determined. Furthermore, even when a dendrogram or PCA shows different species clustering together, a rich database will likely still match the correct species.
We generated a database with 16 Amaranthus species (seeds obtained from the Germplasm Resources Information Network (GRIN), U.S. National Plant Germplasm System; see methods), including species of regulatory concern, weedy species, and species that show phenotypic plasticity at the seed level and may be easily confused during phenotypic characterization (Additional file 2). To increase the accuracy and resolving power of our database we included at least three different accessions per species when available, and used three seeds per accession (each with 30 spectral readings; see methods) to account for biological and technical variation. We also included a larger number of accessions for the species that pose the largest regulatory concern in Canada (A. tuberculatus and A. palmeri).
We generated spectral information for each seed and produced a consensus Main Spectra (MSP) from at least 20 spectra per sample (Additional file 2). Each newly produced MSP was compared to the full database of MSPs to test whether it matched itself as the highest hit, and whether the secondary hit (second highest similarity) corresponded to another accession of the same species. This analysis showed that when the generated MSPs are used as unknown samples, they match themselves as top hits and match other accessions of the same species as secondary hits (Additional File 3). There were only four exceptions where the second best hit was not the expected species: for A. watsonii (PI633593-1) the second hit was A. spinosus, for A. palmeri (PI667167-2) the second hit was A. spinosus, for A. watsonii (PI633593-RE2) the second hit was A. palmeri, and for A. caudatus (PI553073-1) the second hit was A. hybridus.
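The self-match check described above can be sketched as a small function that inspects a ranked hit list for each MSP. This is a minimal illustration under stated assumptions, not the software workflow used in the study: the function name, the return labels and the hit-list layout are hypothetical, and the accession labels marked as placeholders are invented for the example (the text reports only the species of each unexpected second hit).

```python
def check_self_match(query_id, query_species, ranked_hits):
    """Classify one database self-check.

    ranked_hits is a list of (accession_id, species) tuples sorted from the
    best to the worst match against the MSP database.
    """
    if len(ranked_hits) < 2:
        return "not enough hits"
    top_id, _ = ranked_hits[0]
    _, second_species = ranked_hits[1]
    if top_id != query_id:
        return "top hit is not self"
    if second_species != query_species:
        return "second hit is another species"
    return "ok"


# Expected pattern: self as top hit, same species as second hit.
# Both accession labels here are placeholders.
expected = check_self_match(
    "placeholder-A",
    "A. tuberculatus",
    [("placeholder-A", "A. tuberculatus"), ("placeholder-B", "A. tuberculatus")],
)

# One of the four exceptions reported in the text: A. watsonii (PI633593-1)
# had A. spinosus as its second hit. The second-hit label is a placeholder.
exception = check_self_match(
    "PI633593-1",
    "A. watsonii",
    [("PI633593-1", "A. watsonii"), ("placeholder-spinosus", "A. spinosus")],
)

print(expected)
print(exception)
```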
Blind sample testing
We received blind samples from three different labs working on Amaranthus spp.: 6 blind samples from AAFC Saint-Jean-sur-Richelieu, 9 from AAFC Harrow and 60 from CFIA's Seed Science and Technology Section in Saskatoon. For the 6 samples received from Saint-Jean-sur-Richelieu, using 3 seeds per accession, we obtained 100% correct identification, with average matching scores > 2 in most cases (Table 1, samples MIRL22-A.unknown-07 to 12). In one case, a batch identified as A. tuberculatus (MIRL22-A.unknown-09) had one of its three seeds identified as A. arenicola. However, since two of the three seeds of that accession were correctly identified, the final assigned identification matched the original identification, which the providers disclosed after the protein biotyping analysis was completed. A similar situation occurred with sample MIRL22-A.unknown-11, which had one of three seeds identified as A. rudis instead of A. tuberculatus. While these two were at times historically treated as different species, the latest consensus is that they are the same species, or varieties of the same species (see Waterhemp, Cornell CALS, cornell.edu; and Amaranthus rudis J.D.Sauer, The Plant List), which supports our identification. Accordingly, we classified all samples identified as A. rudis or A. tuberculatus as A. tuberculatus. Two samples originally sent by the provider as A. powellii and A. viridis (see footnotes 3 and 4 of Table 1) were identified by us as A. retroflexus. After the provider grew plants from the seeds of these two accessions, their phenotypic characterization confirmed our protein biotyping identification, showing that our method was effective in correcting initial misidentifications of the provider's seed batches.
In the case of the samples obtained from AAFC Harrow (Table 1, samples MIRL22-A.unknown-14 to 23), all samples were correctly identified according to the identification uncovered by the seed providers after our analyses were complete. We learned that a sample which was originally part of the blind samples (MIRL22-A.unknown-13 – Additional file 4), and initially classified as A. caudatus, corresponds to seeds whose correct taxonomic identification could not be confirmed by the provider (the sample was provided to them by an external collaborator years ago without verification). Therefore, this sample was excluded from our analysis as there was no morphological confirmation. Finally, MIRL22-A.unknown-17, identified as A. hypochondriacus, had technical replicates which diverged from the consensus identification (Additional file 4), but was nevertheless identified correctly by our majority rule.
Overall, the application of our methodology to the identification of blind samples obtained from AAFC centres in Quebec and Ontario was 100% accurate for samples with confirmed morphological identification.
Table 1. Identification of AAFC blind samples. MIRL22-A.unknown-07 to 12 were provided by AAFC Saint-Jean-sur-Richelieu. MIRL22-A.unknown-13 to 23 were provided by AAFC Harrow. MIRL22-A.unknown-13 was not included due to lack of a valid morphological identification, and the ID MIRL22-A.unknown-15 was not assigned to any sample.
| Sample name | Identification call¹ | Average Bruker score² | Original identification |
| MIRL22-A.unknown-07 | Amaranthus retroflexus | 2.62 | A. retroflexus |
| | Amaranthus retroflexus | 2.59 | |
| | Amaranthus retroflexus | 2.64 | |
| MIRL22-A.unknown-08 | Amaranthus retroflexus | 2.69 | A. retroflexus³ |
| | Amaranthus retroflexus | 2.66 | |
| | Amaranthus retroflexus | 2.65 | |
| MIRL22-A.unknown-09 | Amaranthus tuberculatus | 2.21 | Amaranthus tuberculatus |
| | Amaranthus tuberculatus | 2.26 | |
| | Amaranthus arenicola | 1.79 | |
| MIRL22-A.unknown-10 | Amaranthus arenicola | 2.05 | Amaranthus arenicola |
| | Amaranthus arenicola | 1.95 | |
| | Amaranthus arenicola | 2.04 | |
| MIRL22-A.unknown-11 | Amaranthus tuberculatus | 2.28 | Amaranthus tuberculatus/rudis |
| | Amaranthus tuberculatus | 2.03 | |
| | Amaranthus rudis | 2.14 | |
| MIRL22-A.unknown-12 | Amaranthus retroflexus | 2.69 | A. retroflexus⁴ |
| | Amaranthus retroflexus | 2.65 | |
| | Amaranthus retroflexus | 2.57 | |
| MIRL22-A.unknown-14 | Amaranthus hybridus | 2.45 | Amaranthus hybridus |
| | Amaranthus hybridus | 2.31 | |
| | Amaranthus hybridus | 2.34 | |
| MIRL22-A.unknown-16 | Amaranthus palmeri | 2.28 | Amaranthus palmeri |
| | Amaranthus palmeri | 2.12 | |
| | Amaranthus palmeri | 2.25 | |
| MIRL22-A.unknown-17 | Amaranthus hypochondriacus | 1.81 | Amaranthus hypochondriacus |
| | Amaranthus hypochondriacus | 1.80 | |
| | Amaranthus hypochondriacus | 1.88 | |
| MIRL22-A.unknown-18 | Amaranthus spinosus | 2.44 | Amaranthus spinosus |
| | Amaranthus spinosus | 2.40 | |
| | Amaranthus spinosus | 2.39 | |
| MIRL22-A.unknown-19 | Amaranthus tuberculatus | 2.33 | Amaranthus tuberculatus/rudis |
| | Amaranthus tuberculatus | 2.18 | |
| | Amaranthus tuberculatus | 1.95 | |
| MIRL22-A.unknown-20 | Amaranthus retroflexus | 2.43 | Amaranthus retroflexus |
| | Amaranthus retroflexus | 2.43 | |
| | Amaranthus retroflexus | 2.38 | |
| MIRL22-A.unknown-21 | Amaranthus powellii | 2.49 | Amaranthus powellii |
| | Amaranthus powellii | 2.51 | |
| | Amaranthus powellii | 2.47 | |
| MIRL22-A.unknown-22 | Amaranthus albus | 2.27 | Amaranthus albus |
| | Amaranthus albus | 2.30 | |
| | Amaranthus albus | 2.31 | |
| MIRL22-A.unknown-23 | Amaranthus blitoides | 2.57 | Amaranthus blitoides |
| | Amaranthus blitoides | 2.53 | |
| | Amaranthus blitoides | 2.63 | |
1 Each of the three rows per sample corresponds to a single seed from the accession batch sent to us for identification. The identification of each seed used a majority rule over 3 technical replicates per seed (if two technical replicates indicated one species, the sample was catalogued as such). Original data with technical replicates can be found in Additional file 4.
2 Average score from 3 technical replicates on the same seed. When using 2 technical replicates for the majority rule identification, the average was done between the 2 matching reps.
3 This was initially provided to us as A. powellii. After the providers grew the plants they confirmed the plants actually matched our identification (A. retroflexus).
4 This was initially provided to us as A. viridis. After the providers grew the plants they confirmed the plants actually matched our identification (A. retroflexus).
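The majority rule described in the footnotes above, together with the batch-level call and the merging of A. rudis into A. tuberculatus described in the text, can be sketched as follows. This is an illustrative sketch only: the function names are ours, the replicate scores in the first example are made up, and returning no call when fewer than two replicates agree is our assumption, since the text does not say how that case was handled.

```python
from collections import Counter

# A. rudis is treated as A. tuberculatus for final calls, as stated in the text.
SYNONYMS = {"Amaranthus rudis": "Amaranthus tuberculatus"}


def majority(labels, minimum=2):
    """Return the most common label if it occurs at least `minimum` times, else None."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= minimum else None


def call_seed(replicates):
    """Seed-level call from (species, score) technical replicates.

    Replicates without a species (e.g. no spectrum) are ignored. Returns
    (species, mean score of the agreeing replicates), or (None, None) when
    no species is supported by at least two replicates (our assumption).
    """
    species = majority([sp for sp, _ in replicates if sp is not None])
    if species is None:
        return None, None
    scores = [score for sp, score in replicates if sp == species]
    return species, sum(scores) / len(scores)


def call_batch(seed_calls):
    """Batch-level call: majority over the seed calls after merging synonyms."""
    return majority([SYNONYMS.get(sp, sp) for sp in seed_calls])


# Made-up technical replicates for one seed: two agree, one diverges.
seed_call = call_seed([
    ("Amaranthus hypochondriacus", 1.85),
    ("Amaranthus hypochondriacus", 1.75),
    ("Amaranthus caudatus", 1.60),
])

# Seed-level calls taken from Table 1.
batch_09 = call_batch(["Amaranthus tuberculatus", "Amaranthus tuberculatus", "Amaranthus arenicola"])
batch_11 = call_batch(["Amaranthus tuberculatus", "Amaranthus tuberculatus", "Amaranthus rudis"])

print(seed_call, batch_09, batch_11)
```

Averaging only the agreeing replicates mirrors footnote 2, so a divergent replicate does not pull down the score reported for the final call.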
We then examined 60 blind samples sent from the CFIA SSST (Saskatoon Seed Science and Technology) unit. Out of 60 individual seeds, only two failed to produce a protein spectrum in any of the three technical replicates (MIRL22-A.unknown-68 and 70 in Table 2 and Additional file 5), which can most likely be attributed to a technical error during sample processing. Our method was therefore 97% effective in producing protein spectral profiles. Out of the 60 seeds for which a phenotypic identification was performed by seed analysts, our protein biotyping assay correctly predicted the species for 52, an accuracy of 87%. Three samples provided by the SSST but not identified by seed analysts were tested by our protein biotyping assay but excluded from the validation analysis (MIRL22-A.unknown-37, 49 and 71; Additional file 5). Of the 8 samples for which we could not correctly identify the species, two were the samples that yielded no protein spectra. For the 6 samples where our biotyping identification did not match the phenotypic identification, there was no apparent relationship to the Bruker score, with some mismatches scoring below 2 and some scoring above 2 (Table 2). Three of the mismatches corresponded to seeds phenotypically identified as A. palmeri; in two of those cases our identification matched A. watsonii (the sister species of A. palmeri). In two cases a seed classified as A. cruentus matched A. hypochondriacus protein spectra, but our database did not contain A. cruentus protein profiles, so this misidentification is expected. In the remaining case, a seed classified as A. caudatus matched an A. hypochondriacus MSP from our database.
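The percentages reported for the SSST samples follow from the counts given in this paragraph, as the short calculation below shows. The variable names are ours, and the last figure (accuracy restricted to seeds that produced a spectrum) is not reported in the text; it is included only as one possible complementary view.

```python
total_seeds = 60   # seeds with a phenotypic identification by seed analysts
no_spectrum = 2    # MIRL22-A.unknown-68 and 70 (flatlines)
mismatches = 6     # biotyping call differs from the phenotypic identification

with_spectrum = total_seeds - no_spectrum
correct = total_seeds - no_spectrum - mismatches

spectrum_rate = with_spectrum / total_seeds        # reported as 97%
accuracy = correct / total_seeds                   # reported as 87%
accuracy_given_spectrum = correct / with_spectrum  # derived here, not reported in the text

print(f"Spectra obtained: {with_spectrum}/{total_seeds} ({spectrum_rate:.0%})")
print(f"Correct calls: {correct}/{total_seeds} ({accuracy:.0%})")
print(f"Correct calls among seeds with a spectrum: {correct}/{with_spectrum} ({accuracy_given_spectrum:.1%})")
```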
Table 2. Identification of SSST blind samples. MIRL22-A.unknown-24 to 86 were provided by the Saskatoon Seed Science and Technology unit. The samples corresponding to MIRL22-A.unknown-37, 49 and 71 were not included due to lack of valid morphological identification.
| Sample name | Identification call¹ | Average Bruker score² | Original phenotypic identification³ |
| MIRL22-A.unknown-24 | Amaranthus spinosus | 1.30 | Amaranthus palmeri atypical⁵ |
| MIRL22-A.unknown-25 | Amaranthus retroflexus | 2.47 | Amaranthus retroflexus |
| MIRL22-A.unknown-26 | Amaranthus tricolor | 1.92 | Amaranthus tricolor |
| MIRL22-A.unknown-27 | Amaranthus powellii | 2.14 | Amaranthus powellii subsp. powellii |
| MIRL22-A.unknown-28 | Amaranthus caudatus | 2.31 | Amaranthus caudatus |
| MIRL22-A.unknown-29 | Amaranthus palmeri | 2.11 | Amaranthus palmeri |
| MIRL22-A.unknown-30 | Amaranthus arenicola | 1.95 | Amaranthus arenicola |
| MIRL22-A.unknown-31 | Amaranthus albus | 2.04 | Amaranthus albus |
| MIRL22-A.unknown-32 | Amaranthus tuberculatus | 2.16 | Amaranthus tuberculatus atypical |
| MIRL22-A.unknown-33 | Amaranthus californicus | 2.25 | Amaranthus californicus |
| MIRL22-A.unknown-34 | Amaranthus californicus | 2.26 | Amaranthus californicus |
| MIRL22-A.unknown-35 | Amaranthus spinosus | 2.24 | Amaranthus spinosus |
| MIRL22-A.unknown-36 | Amaranthus caudatus | 2.34 | Amaranthus caudatus |
| MIRL22-A.unknown-38 | Amaranthus tuberculatus | 2.09 | Amaranthus tuberculatus |
| MIRL22-A.unknown-39 | Amaranthus albus | 1.92 | Amaranthus albus |
| MIRL22-A.unknown-40 | Amaranthus californicus | 2.30 | Amaranthus californicus |
| MIRL22-A.unknown-41 | Amaranthus powellii | 2.16 | Amaranthus powellii |
| MIRL22-A.unknown-42 | Amaranthus albus | 2.19 | Amaranthus albus atypical |
| MIRL22-A.unknown-43 | Amaranthus hypochondriacus | 1.80 | Amaranthus cruentus⁵ |
| MIRL22-A.unknown-44 | Amaranthus hybridus | 2.42 | Amaranthus hybridus |
| MIRL22-A.unknown-45 | Amaranthus albus | 2.08 | Amaranthus albus |
| MIRL22-A.unknown-46 | Amaranthus retroflexus | 2.05 | Amaranthus retroflexus atypical |
| MIRL22-A.unknown-47 | Amaranthus powellii | 2.41 | Amaranthus powellii subsp. bouchonii |
| MIRL22-A.unknown-48 | Amaranthus powellii | 2.23 | Amaranthus powellii subsp. powellii |
| MIRL22-A.unknown-50 | Amaranthus hypochondriacus | 2.03 | Amaranthus cruentus⁵ |
| MIRL22-A.unknown-51 | Amaranthus tuberculatus | 2.17 | Amaranthus tuberculatus atypical |
| MIRL22-A.unknown-52 | Amaranthus tuberculatus | 2.27 | Amaranthus tuberculatus |
| MIRL22-A.unknown-53 | Amaranthus spinosus | 2.20 | Amaranthus spinosus |
| MIRL22-A.unknown-54 | Amaranthus albus | 1.72 | Amaranthus albus |
| MIRL22-A.unknown-55 | Amaranthus palmeri | 1.49 | Amaranthus palmeri atypical |
| MIRL22-A.unknown-56 | Amaranthus tricolor | 2.22 | Amaranthus tricolor |
| MIRL22-A.unknown-57 | Amaranthus watsonii | 2.00 | Amaranthus palmeri⁵ |
| MIRL22-A.unknown-58 | Amaranthus powellii | 1.92 | Amaranthus powellii subsp. powellii |
| MIRL22-A.unknown-59 | Amaranthus hybridus | 2.21 | Amaranthus hybridus |
| MIRL22-A.unknown-60 | Amaranthus hypochondriacus | 1.97 | Amaranthus caudatus⁵ |
| MIRL22-A.unknown-61 | Amaranthus arenicola | 2.14 | Amaranthus arenicola |
| MIRL22-A.unknown-62 | Amaranthus tricolor | 2.14 | Amaranthus tricolor |
| MIRL22-A.unknown-63 | Amaranthus powellii | 2.32 | Amaranthus powellii subsp. bouchonii |
| MIRL22-A.unknown-64 | Amaranthus retroflexus | 2.35 | Amaranthus retroflexus |
| MIRL22-A.unknown-65 | Amaranthus retroflexus | 1.97 | Amaranthus retroflexus atypical |
| MIRL22-A.unknown-66 | Amaranthus retroflexus | 2.02 | Amaranthus retroflexus |
| MIRL22-A.unknown-67 | Amaranthus retroflexus | 2.40 | Amaranthus retroflexus |
| MIRL22-A.unknown-68 | Flatline⁴ | 0.00 | Amaranthus palmeri atypical |
| MIRL22-A.unknown-69 | Amaranthus arenicola | 2.03 | Amaranthus arenicola |
| MIRL22-A.unknown-70 | Flatline⁴ | 0.00 | Amaranthus tuberculatus atypical |
| MIRL22-A.unknown-72 | Amaranthus palmeri | 2.05 | Amaranthus palmeri |
| MIRL22-A.unknown-73 | Amaranthus watsonii | 2.02 | Amaranthus palmeri⁵ |
| MIRL22-A.unknown-74 | Amaranthus hybridus | 2.43 | Amaranthus hybridus |
| MIRL22-A.unknown-75 | Amaranthus powellii | 1.90 | Amaranthus powellii subsp. bouchonii |
| MIRL22-A.unknown-76 | Amaranthus californicus | 2.14 | Amaranthus californicus |
| MIRL22-A.unknown-77 | Amaranthus retroflexus | 2.30 | Amaranthus retroflexus |
| MIRL22-A.unknown-78 | Amaranthus retroflexus | 2.16 | Amaranthus retroflexus |
| MIRL22-A.unknown-79 | Amaranthus powellii | 2.52 | Amaranthus powellii |
| MIRL22-A.unknown-80 | Amaranthus tricolor | 2.09 | Amaranthus tricolor |
| MIRL22-A.unknown-81 | Amaranthus caudatus | 2.43 | Amaranthus caudatus |
| MIRL22-A.unknown-82 | Amaranthus spinosus | 2.04 | Amaranthus spinosus |
| MIRL22-A.unknown-83 | Amaranthus retroflexus | 2.12 | Amaranthus retroflexus |
| MIRL22-A.unknown-84 | Amaranthus powellii | 2.40 | Amaranthus powellii subsp. bouchonii |
| MIRL22-A.unknown-85 | Amaranthus albus | 1.97 | Amaranthus albus atypical |
| MIRL22-A.unknown-86 | Amaranthus retroflexus | 2.41 | Amaranthus retroflexus |
1 Each row corresponds to a single seed selected from a single source. The biotyping identification call for each seed used a majority rule over 3 technical replicates per seed (if two technical replicates indicated one species, the sample was catalogued as such). Original data with technical replicates can be found in Additional file 5.
2 Average score from 3 technical replicates on the same seed. When using 2 technical replicates for the majority rule identification, the average was done between the 2 matching reps.
3 Samples marked as 'atypical' correspond to seeds that are less mature or outside the range of variation for typical seeds and would result in uncertain identification.
4 Flatlined samples did not produce a protein spectrum in any of the three technical replicates.
5 Mismatches between protein biotyping and phenotypic identification (also highlighted).