Monitoring changes in the race structure of L. maculans populations is important for selecting effective R genes for blackleg resistance breeding and management. In Canada, such monitoring has been conducted in the prairies for almost 30 years, first based on pathogenicity groups and later on race structure according to the L. maculans Avr profile [3, 32, 64]. These isolates provided a unique resource for this study to investigate the relationship between DNA variants and the function of Avr genes. Because it would be unnecessary and uneconomical to re-sequence all of the phenotyped isolates, only representative isolates were selected (Fig. 1, Table S2) to capture the genetic variation in Avr genes and to explore their application in pathogen race profiling.
What first caught our attention was that SNP numbers varied considerably among avirulence genes: AvrLm4-7 led with 38 SNPs, followed by AvrLm3 with 9 SNPs, whereas the other avirulence genes had only 1 to 3 SNPs. Avirulence genes are believed to mutate in response to selection pressure imposed by resistance genes, so intuitively Rlm4 and Rlm7, the cognate resistance genes of AvrLm4 and AvrLm7, would be expected to be the most widely deployed resistance genes. It was found, however, that Rlm1, Rlm2 and Rlm3 were the most frequently used R genes in Canadian B. napus varieties. AvrLm1 was deleted in about 80% of isolates, but all isolates still carried a functional AvrLm3 gene despite the SNPs. Synonymous polymorphisms in humans can affect messenger RNA splicing, stability and structure as well as protein folding; therefore, the SNPs in AvrLm3, whether synonymous or non-synonymous, might change AvrLm3 protein conformation to avoid host recognition. On the other hand, varieties carrying Rlm4 and Rlm7 accounted for only about 2%, yet AvrLm4-7 mutates more frequently than AvrLm2 and AvrLm3. Given that AvrLm4-7 interacts with AvrLm3, it is reasonable to ask whether this interaction serves as an advanced mechanism for L. maculans to overcome the host defense system.
Although some SNPs have been chosen for L. maculans avirulence/virulence diagnostic tests [20, 38], there is still a gap in understanding between SNP patterns and L. maculans pathogenicity. Investigating highly dense DNA variants in pathogen isolates offers a panoramic view of the DNA variant profile, which may help discern the pathogen’s strategy for coping with biological and environmental changes at the DNA level. In the current study, about 21,000 SNP sites were identified among 158 L. maculans isolates. We first examined variant composition and distribution among genomic regions, including AT-blocks, GC-blocks, and coding and non-coding regions. The transition/transversion ratio (Ts/Tv) is considered an important parameter in evolutionary genetics. Theoretically, transversions are much less common than transitions, because generating a transversion during replication requires much greater distortion of the DNA double helix; for this reason, transitions are favored several-fold over transversions, which has been suggested to be a result of selection. Relatively low Ts/Tv ratios, in the ranges of 1.21 to 2.46 and 0.75 to 1.83, were reported for some bacteria and unicellular eukaryotes, respectively, but higher ratios were reported for fungi, for example a ratio of 5 for the rice blast fungus Magnaporthe oryzae; however, all of these ratios were calculated at the whole-genome level. We therefore computed and compared the ratios across different genomic regions. The Ts/Tv of the whole L. maculans genome was found to be 3.1, lower than the value of 4 previously reported. In addition, Ts/Tv differed dramatically between AT-blocks and GC-blocks: Ts/Tv in AT-blocks was 3 times higher than that in GC-blocks and genes, suggesting that transversions occur more frequently in GC-blocks and genes than in AT-blocks. These results suggest that GC-blocks and gene regions contribute more to pathogen evolution, because transversions can have a greater impact on the activity of functional regulatory elements.
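The region-wise Ts/Tv comparison described above can be illustrated with a minimal sketch. It assumes each SNP has already been annotated with a region label and a single-base reference and alternate allele; the function names and the toy records are hypothetical and are not data or code from this study.

```python
from collections import defaultdict

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_by_region(snps):
    """Compute Ts/Tv per region.

    snps: iterable of (region, ref, alt) tuples with single-base alleles.
    Returns {region: ratio}; the ratio is None if a region has no transversions.
    """
    counts = defaultdict(lambda: [0, 0])  # region -> [transitions, transversions]
    for region, ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        if ref == alt:
            continue  # not a substitution
        index = 0 if is_transition(ref, alt) else 1
        counts[region][index] += 1
    return {region: (ts / tv if tv else None) for region, (ts, tv) in counts.items()}


# Toy records for illustration only (hypothetical, not results from this study).
toy_snps = [
    ("AT-block", "A", "G"), ("AT-block", "C", "T"), ("AT-block", "G", "A"),
    ("AT-block", "A", "C"),
    ("GC-block", "C", "T"), ("GC-block", "G", "T"),
]
ratios = ts_tv_by_region(toy_snps)
print(ratios)
```

With real data, the region label would come from the genome annotation (AT-block, GC-block, gene), and a triallelic site would contribute one record per alternate allele.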
As mentioned above, monitoring changes in L. maculans race structure plays a pivotal role in blackleg management. The conventional approach involves multiple steps, including isolate collection, bioassays on differential hosts and disease severity rating. This approach has served its purpose for many years, but it leaves room for improvement in efficiency, accuracy and clarity. SNP genotyping can be more cost-effective, with a high-throughput process amenable to automation. However, the SNP genotyping approach has its limitations. Firstly, it relies on a single SNP to determine Avr gene functionality, which is valid only if the chosen SNP is the sole nucleotide associated with the compatible/incompatible interaction. This might not always be the case, because such a SNP is often selected from a limited number of isolates or populations, and undetected SNPs or variants in other populations might compromise the avirulence gene function. Secondly, pathogens evolve through host-pathogen interactions, and mutations occur frequently in the pathogen for survival under selection pressure. While a SNP can be selected empirically to detect loss of function of an Avr gene, it cannot be ruled out that other SNPs within the gene region disrupt the Avr gene even when the allele that would normally predict avirulence is present. For instance, none of the SNPs we detected in AvrLm3 interferes with the gene function, but the functionality of AvrLm3 is actually dictated by SNPs in AvrLm4-7. Because of the masking effect of AvrLm7 on AvrLm3, AvrLm7 always disrupts AvrLm3 function regardless of the AvrLm3 gene sequence. Thirdly, SNP genotyping assumes that the site of interest is biallelic, so commercial SNP genotyping chemistry is generally designed to interrogate biallelic SNPs. While biallelic SNPs have been reported to make up the majority of polymorphic sites, triallelic SNPs exist in L. maculans and, more importantly, a biallelic site may turn out to be multiallelic when more populations or individuals are tested. We found that approximately 2% of the total SNPs in L. maculans were triallelic in this study. For AvrLm4-7, the alleles at SC12_1374707 are C/G/T (Fig. 5B). Empirically and coincidentally, this SNP site was selected as a biallelic marker (C/G) to determine whether AvrLm4 is dysfunctional [20, 26, 32]. Consequently, a SNP assay based on the biallelic assumption was not able to interrogate the third allele, T, which was present in some isolates such as MT07-35 (Fig. 3B), leading to missing or even false calls. The same issue will arise for other triallelic sites, such as those in AvrLmJ1-5-9 (SNP A/C/G at the 164th nucleotide) and AvrLm2 (SNP A/C/G at the 397th nucleotide). Fourthly, premature stop codons raise an issue for SNP genotyping. Six premature stop codons were identified in AvrLm4-7 in this study. Premature stop codons are usually associated with loss of gene function (Fig. 4). Although these nonsense SNPs in AvrLm4-7 were observed to be linked with the T allele at SC12_1374707, they may co-exist with the C or G allele at the same site, causing the double-virulence genotype avr4avr7 and contradicting SNP genotyping results of Avr4Avr7 or avr4Avr7. Therefore, without prior information, a diagnostic method based only on the site SC12_1374707 could be inaccurate or erroneous. Taken together, a single SNP appears inadequate for genotyping L. maculans Avr genes.
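The third limitation can be made concrete with a small sketch of how a two-probe assay behaves at a triallelic site. The isolate names and base calls below are hypothetical, and `biallelic_assay` is only an illustrative stand-in for a commercial chemistry that carries probes for C and G; it is not a model of any specific platform.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def allele_counts(calls):
    """Count the bases observed at one site across haploid isolates."""
    return Counter(b.upper() for b in calls.values() if b.upper() in VALID_BASES)


def biallelic_assay(base, probes=("C", "G")):
    """Mimic a two-probe assay: return the base if a probe matches, else None."""
    base = base.upper()
    return base if base in probes else None


# Hypothetical base calls at a single site (illustration only).
site_calls = {"isolate_1": "C", "isolate_2": "G", "isolate_3": "T", "isolate_4": "C"}

counts = allele_counts(site_calls)
is_multiallelic = len(counts) > 2
# Isolates whose allele has no matching probe end up as missing calls.
missed = [name for name, base in site_calls.items() if biallelic_assay(base) is None]
print(counts, is_multiallelic, missed)
```

In this toy case the isolate carrying T yields no call, which mirrors the missing or false calls described for the third allele at SC12_1374707.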
Some DNA mutations in Avr genes are shared by L. maculans isolates from different continents. For example, the AvrLm1 deletion was found in France, Australia and this study, and the G/C SNP at SC12_1374707 discovered in this study was previously identified in France. Some mutations, however, have been found on only one continent. For example, a K55T substitution and a premature stop codon (R29X) were previously detected in AvrLmJ1-5-9 in Australia, but have not been reported in Canadian isolates. It is also noteworthy that deletion of AvrLm4-7 was detected in 516 of 845 European isolates, but in none of the Canadian isolates examined in this study. Partial or complete deletion of AvrLm4-7 was also suggested to be responsible for the double-virulence phenotype avr4avr7. In an attempt to investigate the location and size of the InDel in AvrLm4, we could not find the forward primer (TATCGCATACCAAACATTAGGC) in either the masked or the unmasked assembly of scaffolds available for the L. maculans genome. We then used the forward primer sequence as a query in a nucleotide BLAST search of the L. maculans genome deposited in NCBI GenBank, and retrieved the genomic region surrounding AvrLm4-7 (GenBank accession AM998635.1) as a hit. The forward primer sequence was located in SC12 between bases 1,375,938 and 1,375,960, about 1,350 bases downstream of the start codon of AvrLm4-7, which fell within a gap identified between bases 1,375,400 and 1,376,400 in all isolates examined (data not shown); however, the whole DNA sequence of AvrLm4-7 remained complete and intact. Because of this gap, if the primer pair were used to amplify AvrLm4-7 in Canadian prairie isolates, a false genotyping result of deletion could be expected for some isolates. This indicates that L. maculans isolates from different geographical regions mutate in different ways, suggestive of different evolutionary pathways.
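The positional argument about the forward primer can be checked with a few lines of interval arithmetic. The coordinates are those quoted above for SC12; the helper name `interval_within` is hypothetical, and the sketch only tests containment, not amplification itself.

```python
def interval_within(inner, outer):
    """True if the closed interval `inner` lies entirely inside `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


# Coordinates on SC12 as quoted in the text.
primer_site = (1_375_938, 1_375_960)
gap = (1_375_400, 1_376_400)

primer_in_gap = interval_within(primer_site, gap)
print(primer_in_gap)
```

A primer whose binding site lies inside a gap cannot anneal, so a failed PCR in such an isolate would look like a gene deletion even though the coding sequence is intact.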
With the reduced cost of NGS and the broader use of targeted amplicon sequencing for studying microbial genomes, it may be possible to use sequencing and haplotypes as a more reliable metric to characterize the Avr profile and race structure of L. maculans. Statistical analysis based on haplotypes may often be more efficient than analysis of individual markers, as shown by empirical or simulation studies, because haplotype analyses take tightly linked markers into account and thus provide much more information than individual markers do. SNP haplotypes have been applied mostly in identifying genomic polymorphisms and in other genetic studies, such as the work on the honeybee pathogen Nosema ceranae. It is well documented that methods for SNP haplotype inference require family and population information, either for unrelated or related individuals, and are based on exact-likelihood or approximate-likelihood computations, or on rule-based strategies. Although SNP haplotype construction algorithms are intended to identify co-segregating SNPs and then establish a reliable genotype-phenotype connection, they are essentially family-based analyses, and the haplotypes generated from one pedigree might not be extrapolated to other populations. These methods were developed for diploid or polyploid organisms, and none of them is free of limitations. In contrast, this study dealt with haploid L. maculans and aimed at establishing the most accurate association of avirulence/virulence with DNA mutations in Avr genes. Therefore, all discovered SNPs were taken into account in this study, even synonymous SNPs, which could compromise protein function to some extent. This simplified haplotype construction strategy can accommodate any SNPs, whether already identified or newly emerged, in L. maculans isolates collected from any field or population, and alleviates concerns about inaccuracy or the inadvertent omission of SNPs resulting from a complicated haplotype construction algorithm.
In conclusion, genotyping-by-haplotyping has at least three advantages over other methods for analyzing the L. maculans Avr profile or race structure, including SNP genotyping. Firstly, it considers all DNA variants in an Avr gene. Secondly, it can be readily translated into protein isoforms. Thirdly, it is able to capture the emergence of new SNPs in any pathogen population. Therefore, we propose genotyping-by-haplotyping as a new method for large-scale L. maculans Avr profile analysis (Fig. 5). All existing SNP haplotypes with a known connection to Avr gene function can be categorized, indexed and stored in a database for queries. Any new haplotype will be added to the database once its relationship with avirulence/virulence has been determined through the conventional phenotyping process. To this end, we are developing a new strategy for targeted sequencing of L. maculans Avr genes to improve the reliability of L. maculans Avr profiling while reducing the cost of the genotyping-by-haplotyping procedure.
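The proposed workflow (haplotype lookup, translation into a protein isoform, and flagging of new haplotypes for phenotyping) might be sketched as follows. This is a minimal illustration under stated assumptions, not the database or pipeline of this study: the haplotype is taken to be an intron-free coding sequence of a haploid isolate, the toy sequences and phenotype labels are hypothetical, and the names `translate`, `has_premature_stop` and `call_phenotype` are invented for the example.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a coding sequence up to and including the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        protein.append(amino_acid)
        if amino_acid == "*":
            break
    return "".join(protein)


def has_premature_stop(cds):
    """True if a stop codon appears before the last codon of the sequence."""
    protein = translate(cds)
    return protein.endswith("*") and len(protein) < len(cds) // 3


def call_phenotype(cds, database):
    """Look up a haplotype; unknown haplotypes are flagged for phenotyping."""
    return database.get(cds.upper(), "unknown")


# Hypothetical haplotype database (toy sequences and labels, illustration only).
haplotype_db = {
    "ATGAAACGATGA": "avirulent",  # translates to MKR*
    "ATGAAATGATGA": "virulent",   # CGA -> TGA creates a premature stop (MK*)
}

new_haplotype = "ATGAAGCGATGA"  # synonymous AAA -> AAG change, not yet in the database
print(call_phenotype("ATGAAACGATGA", haplotype_db))
print(has_premature_stop("ATGAAATGATGA"))
print(call_phenotype(new_haplotype, haplotype_db))
```

Keying the lookup on the full sequence rather than on one diagnostic site means that a synonymous or newly emerged SNP produces a distinct key, which is returned as unknown until conventional phenotyping establishes its avirulence/virulence status.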