Identification of Genome-Wide DNA Variants and SNP Haplotypes Associated with Avirulence Genes of Leptosphaeria maculans in Western Canada



Background
Blackleg, caused by Leptosphaeria maculans, is a serious disease of canola/oilseed rape in many parts of the world. An integrated approach is needed to control the disease, with genetic resistance as a key component of the management strategy. Towards this goal, a reliable approach to assessing L. maculans race structure is essential for understanding the frequency of avirulence genes and race groups of the pathogen, and for guiding the deployment of resistant canola cultivars.

Results
A total of 162 representative isolates collected in western Canada were selected for genome re-sequencing on an Illumina platform. Assembly of the short reads against the reference genome of L. maculans 'brassicae' isolate v23.1.3 led to the discovery of 21,016 DNA variants, of which 93% were SNPs and 7% were InDels, with a genome-wide transition/transversion (Ts/Tv) ratio of 3.1. InDels occurred mainly in GC-blocks, and the Ts/Tv ratio of SNPs in AT-blocks was more than 2 times higher than that in GC-blocks. The number of variants was positively correlated with supercontig size, GC-block size and gene number. DNA variants in most avirulence genes were SNPs, the exception being a deletion in AvrLm1. The number of SNPs ranged from 1-2 in AvrLm2, AvrLmJ1-5-9, AvrLm6, AvrLm10A, AvrLm10B and AvrLm11 to 8 in AvrLm3 and 38 in AvrLm4-7. This study is the first report of triallelic SNPs in AvrLm2 and AvrLm4-7, and of premature STOP codons in AvrLm4-7. Nine SNP haplotypes were identified in AvrLm4-7, whereas only 2-3 haplotypes occurred in each of the other avirulence genes; in total, 47 haplotype groups were identified among the isolates. The 47 SNP haplotype groups were translated into 44 protein haplotype groups, and the L. maculans isolates collected in western Canada were then classified into 10 races.

Conclusion
In this study, we document the shortcomings of inferring races from SNP genotyping and propose the use of SNP haplotyping for a more reliable and informative analysis of L. maculans race structure.

Background
The fungal pathogen Leptosphaeria maculans, a filamentous ascomycete, is the causal agent of blackleg (phoma stem canker) of canola or oilseed rape (Brassica napus L.), which often leads to economic losses [1,2]. Sustainable canola production requires effective management of blackleg using an integrated approach, including crop rotation, fungicide seed treatment, and the development of resistant cultivars [3][4][5][6]. The use of resistant cultivars has been considered one of the most effective and economical ways to control the disease. However, breeding resistant cultivars is challenging because the pathogen has both sexual and asexual reproduction systems, which enable it to combine the fittest genotypes and quickly increase their frequencies in the population through clonal reproduction [7][8][9]. The sexual reproduction system creates a high level of genetic diversity, enabling the pathogen to adapt to the resistance genes used in crop cultivars, such as Rlm1 in France [10], LepR3 in Australia [11] and Rlm3 in Canada [12]. Therefore, the constant search for new resistance genes from closely related Brassica species and the introgression of these genes into canola is a major objective of blackleg resistance breeding programs [13,14]. Development of resistant cultivars carrying specific resistance genes against L. maculans is considered an ongoing process because of host-pathogen co-evolution [15,16]. To gain the upper hand, the molecular mechanism by which the pathogen responds to selection pressure and subsequently overcomes host resistance should be thoroughly investigated [17,18]. To this end, fundamental studies have been conducted on the genetic diversity of the pathogen [19,20] and on host-pathogen interaction [21,22].
Regular profiling of Avr genes in L. maculans populations provides key information on the deployment of effective specific resistance genes that may be used for blackleg resistance breeding [32]. The Avr profile can be determined by using differential lines carrying known resistance genes, based on the gene-for-gene theory [33,34]. This approach is time- and resource-consuming, and its accuracy can be affected by impurity in the seed stock of differential hosts and by the subjectivity of researchers.
Therefore, alternative methods have been explored to simplify and improve the accuracy of Avr profiling, and molecular markers are potentially more efficient and objective options. Several generations of molecular markers, including random amplified polymorphic DNAs (RAPD), restriction fragment length polymorphisms (RFLP), simple sequence repeats (SSRs) and DNA variants (SNPs and InDels), have been used to differentiate and monitor the pathogen. For example, Goodwin and Annis (1991) [35] used random decamer primers to differentiate three Canadian L. maculans isolate groups. RAPD markers have been used to assess the aggressiveness of L. maculans isolates [36,37]. Polymerase chain reaction (PCR) was also used to detect DNA variation in AvrLm1, AvrLm6 [38] and AvrLm4-7 [20]. PCR-based methods can usually detect polymorphism in terms of presence/absence or size variation at a specific locus, but cannot capture DNA variants that potentially affect gene function. Brown (1996) [39] also emphasized the importance of marker selection in plant pathogen research. Recent advances in DNA sequencing technologies, represented by Next Generation Sequencing (NGS), have significantly reduced the cost of discovering DNA variants, including SNPs. Although NGS has found broad application in fundamental genome and genetic research [40][41][42], there are limited reports on the identification of genome-wide DNA variants in plant pathogens, including L. maculans. A previous study [43] reported 21,814 genome-wide SNPs in L. maculans based on two Australian isolates.
The majority of SNPs are typically biallelic with lower information content as compared to other multiallelic markers [44], and single SNPs alone are inadequate for genetic diagnosis [45,46]. Use of SNP haplotypes was suggested to be more effective shortly after the discovery of high-density SNP markers from studies on human genetic diversity [47] and crop genetics of quantitative traits [48].
Haplotype is a term contracted from 'haploid genotype' and usually refers to a combination of alleles on the same chromosome that are transmitted together, although the concept has been used in different contexts [49][50][51]. SNP haplotypes have been increasingly applied in genetic studies. For example, a SNP haplotype can be constructed based on linkage disequilibrium for genetic analysis of crop genomes [44]. Haplotypes are often inferred using various algorithms, including parsimony [52], maximum likelihood [53] and Bayesian methods [54]. SNP haplotype inference by a statistical algorithm, however, may also produce an incorrect outcome [54]. Because a haplotype carries more information than each individual SNP [46], SNP haplotypes have been used broadly in biomedical research on human diseases [55][56][57], association mapping in plants [45,56,59,60], marker selection for resistance genes in crop species [60] and crop yield improvement [44]. No studies using SNP haplotypes for plant pathogen race profiling have been reported.
Cost-effective genotyping technologies that capture sequence variation at ultra-high resolution are now available and commercial genotyping platforms that can generate thousands or millions of data points per experiment have become almost routinely used for genetic research [61]. In this study, we describe re-sequencing of 162 representative L. maculans isolates, which were collected in western Canada with the Avr profile characterized originally using a set of differential hosts, for the identification of DNA variants. The objectives were to: i) identify and characterize genome-wide DNA variants with a special focus on cloned Avr genes; ii) identify SNP haplotypes in these Avr genes and their protein isoforms; and iii) explore the possibility of L. maculans race structure determination through the analysis of haplotypes. Ultimately, we were interested to know whether SNP haplotypes are more reliable than individual SNPs for genotyping L. maculans isolates, especially for Avr profiling.

Results
Selection of isolates for re-sequencing
Cluster analysis was conducted using the phenotypic data from bioassays on the differential hosts (Table S1) inoculated individually with the 1590 L. maculans isolates from western Canada. Six clusters containing 125, 325, 179, 143, 195 and 623 isolates, respectively, were identified (Fig. 1). A total of 162 isolates (Table S2)
The AT- and GC-blocks accounted for 35.2% and 64.8% of the genome size (Table 2) and contained 28.6% and 71.4% of the variants, respectively; this indicated that variants were proportionally distributed between the two blocks.
Table 1 Variant distribution and density in supercontigs and different genomic regions of L. maculans.
(Ts and Tv) with a Ts/Tv ratio of 3.1. Ts/Tv, however, varied among genomic regions; for instance, it was 7.5 in AT-blocks, much higher than the 2.3 in GC-blocks. Similarly, non-coding regions had a ratio of 3.7, higher than the 2.2 in coding regions, which include small secreted protein (SSP) genes and non-SSP genes.
InDels made up approximately 7% of total genome-wide variants and were unevenly distributed between AT-blocks and GC-blocks. InDels occurred much more frequently in GC-blocks (9.5%) than in AT-blocks (1.2%). Deletions were found in nearly equal proportion to insertions in AT-blocks, and were slightly more frequent than insertions in GC-blocks at the whole-genome level. By comparison, deletions were 3.8 times as frequent as insertions in SSP genes (Table 2).
Table 2 Variants identified in different genomic regions of L. maculans

Characterization of DNA variants in the cloned Avr genes
As mentioned above, AvrLm1, AvrLm2, AvrLm3, AvrLm4-7, AvrLm6, AvrLmJ1-5-9, AvrLm10A, AvrLm10B and AvrLm11 of L. maculans have been cloned. The variants in these genes were categorized using SeqMan Pro software into four groups: non-synonymous, nonsense, frameshift and synonymous. Non-synonymous variants occurred in all of the Avr genes except AvrLm1, AvrLm10A and AvrLm10B.
The majority of the SNPs identified in L. maculans were biallelic; however, there were also 236 triallelic DNA variants detected in this study (Table S3). Of these triallelic variants, 41 were located in genes, including a triallelic SNP (A/G/C) site in AvrLm2 at 1,887,678 bp in SC6 (Fig. 3A) and a SNP (C/G/T) site in AvrLm4-7 at position 1,374,707 bp in SC12 (Fig. 3B). The presence of the triallelic SNP detected by NGS was verified by Sanger-sequencing the AvrLm4-7 gene from 3 isolates following PCR amplification and TOPO-TA cloning (Fig. S3).
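The distinction between biallelic and triallelic sites can be illustrated with a minimal sketch that counts the distinct bases called at each site across isolates. The data layout (a dictionary of per-isolate base calls), the function names and the isolate labels are assumptions made for illustration; this is not the pipeline used in this study.

```python
from collections import Counter

VALID_BASES = {"A", "C", "G", "T"}

def allele_counts(calls):
    """Count the bases observed at one site, ignoring missing calls such as 'N'."""
    return Counter(base for base in calls.values() if base in VALID_BASES)

def classify_sites(site_calls):
    """Label each site by the number of distinct alleles seen across isolates."""
    names = {0: "no call", 1: "monomorphic", 2: "biallelic", 3: "triallelic"}
    labels = {}
    for site, calls in site_calls.items():
        n_alleles = len(allele_counts(calls))
        labels[site] = names.get(n_alleles, "tetra-allelic")
    return labels

# Toy calls: the isolate names and the second site are made up for illustration.
site_calls = {
    "SC12_1374707": {"iso1": "C", "iso2": "G", "iso3": "T", "iso4": "C"},
    "toy_site": {"iso1": "A", "iso2": "A", "iso3": "G", "iso4": "N"},
}
labels = classify_sites(site_calls)
```

In this toy input the first site carries three alleles (C/G/T) and is labelled triallelic, while the second carries two alleles plus one missing call and is labelled biallelic.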
Nonsense mutations might produce truncated proteins, so unlike other nonsynonymous SNPs, most nonsense mutations theoretically result in non-functional proteins. In this study at least 40 nonsense SNPs were identified in 32 genes, six of them located in AvrLm4-7, but none in any of the other Avr genes (Table S4, Fig. 3C). The presence of the nonsense SNP in AvrLm4-7 was also confirmed by Sanger-sequencing (Fig. S3).
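As a minimal sketch of how a nonsense SNP can be recognized, the snippet below substitutes one base in a coding sequence and checks whether the affected codon becomes a stop codon. The function name, the 0-based coordinate convention and the toy sequence are assumptions for illustration; introns and strand orientation, which a real annotation tool would handle, are ignored here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def is_nonsense(cds, pos, alt):
    """Return True if placing `alt` at 0-based CDS position `pos`
    converts a sense codon into a stop codon."""
    start = pos - pos % 3
    ref_codon = cds[start:start + 3].upper()
    if len(ref_codon) < 3:
        return False  # position falls in an incomplete trailing codon
    alt_codon = list(ref_codon)
    alt_codon[pos % 3] = alt.upper()
    alt_codon = "".join(alt_codon)
    return ref_codon not in STOP_CODONS and alt_codon in STOP_CODONS

# Made-up coding sequence: ATG CGA TGG TAA (Met-Arg-Trp-stop).
toy_cds = "ATGCGATGGTAA"
arg_to_stop = is_nonsense(toy_cds, 3, "T")   # CGA -> TGA
trp_to_stop = is_nonsense(toy_cds, 8, "A")   # TGG -> TGA
arg_to_gln = is_nonsense(toy_cds, 4, "A")    # CGA -> CAA, a missense change
```

The first two substitutions introduce a premature stop codon, whereas the third changes the amino acid without truncating the protein.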
AvrLm11 was identified on a dispensable chromosome, and its presence/absence polymorphism was previously reported [31]. However, the full length of AvrLm11 was detected in all isolates sequenced in the current study, despite the presence of three SNPs. Similarly, a deletion was previously reported in AvrLm4-7 when PCR was utilized [62], but no deletion was found in any of the isolates employed in this study.

SNP haplotypes in Avr genes and their corresponding protein isoforms
The number of SNP haplotypes varied among Avr genes. For ease of description, the absence and presence of AvrLm1 were treated as two haplotypes. Two haplotypes were also found for AvrLm6, AvrLmJ1-5-9, AvrLm10A and AvrLm10B, three haplotypes for AvrLm2, AvrLm3 and AvrLm11, and nine SNP haplotypes for AvrLm4-7 ( ... (Table S5). The two most popular haplotype groups, SNP-1:A- ... (Table 4).
Table 4 Prevalence of SNP haplotype groups and races of L. maculans. Columns: SNP haplotype, Frequency, Race structure.
Because a cDNA sequence determines the amino acid sequence of its protein, protein sequence polymorphism was then examined, and the unique set of amino acid changes corresponding to an Avr gene SNP haplotype was referred to as a protein haplotype. For most of the Avr genes, each SNP haplotype translated into a unique protein haplotype, but for both AvrLm10A and AvrLm10B, the two SNPs did not cause any amino acid change, and the SNP haplotypes therefore corresponded to only one protein haplotype (Table 3).
As with SNP haplotype groups, the protein haplotypes of the Avr genes in an isolate were combined to form a protein haplotype group, and a total of 44 protein haplotype groups were identified (Table S6).
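The relationship between SNP haplotypes and protein haplotypes can be sketched as follows: alleles at the SNP positions of a gene are concatenated in positional order, and the SNP-substituted coding sequence is translated so that haplotypes differing only by synonymous SNPs collapse into one protein haplotype. The reference sequence, isolate names, function names and 0-based coding-sequence coordinates are hypothetical; the sketch assumes an intron-free gene on the forward strand and is not the software used in this study.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def snp_haplotype(calls):
    """Concatenate the alleles of one gene in order of SNP position."""
    return "".join(calls[pos] for pos in sorted(calls))

def apply_snps(ref_cds, calls):
    """Return the coding sequence with the isolate's alleles substituted."""
    seq = list(ref_cds)
    for pos, base in calls.items():
        seq[pos] = base
    return "".join(seq)

def translate(cds):
    """Translate complete codons only; '*' marks a stop codon."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))

def group_isolates(ref_cds, isolate_calls):
    """Group isolates by SNP haplotype and by protein haplotype."""
    snp_groups, protein_groups = defaultdict(list), defaultdict(list)
    for isolate, calls in isolate_calls.items():
        snp_groups[snp_haplotype(calls)].append(isolate)
        protein_groups[translate(apply_snps(ref_cds, calls))].append(isolate)
    return dict(snp_groups), dict(protein_groups)

# Toy gene (Met-Ala-Lys) with SNPs at CDS positions 5 and 6.
REF_CDS = "ATGGCTAAA"
ISOLATE_CALLS = {
    "iso1": {5: "T", 6: "A"},  # reference alleles
    "iso2": {5: "C", 6: "A"},  # GCT -> GCC, synonymous
    "iso3": {5: "T", 6: "G"},  # AAA -> GAA, Lys -> Glu
}
snp_groups, protein_groups = group_isolates(REF_CDS, ISOLATE_CALLS)
```

In this toy example three SNP haplotypes (TA, CA and TG) yield only two protein haplotypes, because the change in iso2 is synonymous.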
Avirulence gene frequency and race structure assessment
Because Avr gene SNP haplotypes potentially dictate protein function, we used the haplotypes to assess the frequency of Avr genes in the L. maculans population. was not included in the assessment for lack of host genetic resources to determine whether the two amino acid changes, namely D31T and K34D, affected its protein function. Based on SNP haplotype groups, 10 L. maculans races were differentiated among the isolates, with significantly different incidences (Fig. 4A). The two most prevalent races, AvrLm2,3,5,6,9,10 and AvrLm2,4,5,6,7,10, collectively accounted for about 61% of the isolates. While AvrLm5, AvrLm6 and AvrLm10 were carried by all isolates, the other avirulence genes were present in only some isolates, with frequencies ranging from 17.5% (AvrLm1) to 64.3% (AvrLm7) (Fig. 4B).
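Once the functional avirulence alleles of each isolate are known, race labels and avirulence allele frequencies follow from simple counting. The sketch below assumes each isolate is represented by the set of numbered Avr alleles judged functional; the isolate names, the fourth profile and the function name are hypothetical, and the resulting frequencies are not those of this study.

```python
from collections import Counter

def race_name(avr_alleles):
    """Build a race label such as 'AvrLm2,3,5' from functional allele numbers."""
    return "AvrLm" + ",".join(str(a) for a in sorted(avr_alleles))

# Toy profiles: sets of functional Avr allele numbers per isolate.
isolates = {
    "iso1": {2, 3, 5, 6, 9, 10},
    "iso2": {2, 4, 5, 6, 7, 10},
    "iso3": {2, 3, 5, 6, 9, 10},
    "iso4": {1, 5, 6, 10},  # made-up profile
}

# Race incidence: how many isolates share each race label.
race_counts = Counter(race_name(alleles) for alleles in isolates.values())

# Frequency of each Avr allele across isolates.
all_alleles = sorted(set().union(*isolates.values()))
avr_freq = {
    allele: sum(allele in alleles for alleles in isolates.values()) / len(isolates)
    for allele in all_alleles
}
```

With these toy profiles, alleles 5, 6 and 10 have a frequency of 1.0 because every isolate carries them, whereas allele 1 occurs in one of four isolates.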

Discussion
Monitoring changes in the race structure of L. maculans populations is important for selecting effective R genes for blackleg resistance breeding and management [3]. In Canada, such monitoring has been conducted in the prairies for almost 30 years, first based on pathogenicity groups and later on race structure according to the L. maculans Avr profile [3,32,64]. The isolates collected provided a unique resource for this study to investigate the relationship between DNA variants and the function of Avr genes. Because it would be unnecessary and uneconomical to re-sequence all of the phenotyped isolates, only representative isolates were selected (Fig. 1, Table S2) to capture the genetic variation in Avr genes and explore its application in pathogen race profiling.
What first caught our attention was that the number of SNPs varied considerably among avirulence genes, led by AvrLm4-7 with 38 SNPs and followed by AvrLm3 with 9 SNPs, whereas the other avirulence genes had only 1-3 SNPs. Avirulence genes are believed to mutate in response to the selection pressure imposed by resistance genes, so intuitively Rlm4 and Rlm7, the cognate resistance genes of AvrLm4 and AvrLm7, would be expected to be the most widely deployed resistance genes. It was found, however, that Rlm1, Rlm2 and Rlm3 were the most frequently used R genes in Canadian B. napus varieties [12]. AvrLm1 was deleted in about 80% of isolates, but all isolates still carried functional AvrLm3 genes despite the SNPs. Synonymous polymorphisms in humans can affect messenger RNA splicing, stability and structure, as well as protein folding [65]; therefore, the SNPs in AvrLm3, whether synonymous or non-synonymous, might change the conformation of the AvrLm3 protein to avoid host recognition. On the other hand, varieties carrying Rlm4 and Rlm7 represented only about 2%, yet AvrLm4-7 mutates more frequently than AvrLm2 and AvrLm3. Given that AvrLm4-7 interacts with AvrLm3 [25], it is reasonable to ask whether this interaction serves as an advanced mechanism for L. maculans to overcome the host defense system.
Although some SNPs have been chosen for L. maculans avirulence/virulence diagnostic tests [20,38], there is still a gap in understanding the relationship between SNP patterns and L. maculans pathogenicity. Investigation of high-density DNA variants in pathogen isolates offers a panoramic view of the DNA variant profile, which may help to discern the pathogen's strategy for coping with biological and environmental changes at the DNA level. In the current study, about 21,000 SNP sites were identified among 158 L. maculans isolates. We first examined variant composition and distribution among genomic regions, including AT-blocks, GC-blocks, and coding and non-coding regions. Ts/Tv is considered an important parameter in evolutionary genetics [66]. Theoretically, transversions are much less common than transitions because the generation of a transversion during replication requires much greater distortion of the DNA double helix [67]; for this reason, nucleotide transitions are favored several-fold over transversions, which has been suggested to be a result of selection [65]. Relatively low Ts/Tv ratios, in the ranges of 1.21-2.46 and 0.75-1.83, were reported for some bacteria and unicellular eukaryotes, respectively [68], whereas higher ratios were reported for fungi, for example a ratio of 5 for the rice blast fungus Magnaporthe oryzae [69]; however, all of these ratios were calculated at the whole-genome level. We therefore computed and compared the ratios across different genomic regions. The Ts/Tv ratio of the whole L. maculans genome was found to be 3.1, lower than the value of 4 reported previously [23]. In addition, Ts/Tv differed dramatically between AT-blocks and GC-blocks. Ts/Tv in AT-blocks was 3 times higher than that in GC-blocks and genes, suggesting that transversions occur more frequently in GC-blocks and genes than in AT-blocks. These results suggest that GC-blocks and gene regions contribute more to pathogen evolution, because transversions can have a greater impact on the activity of functional regulatory elements [70].
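The region-specific Ts/Tv comparison can be illustrated with a minimal sketch that classifies each SNP as a transition (purine to purine, or pyrimidine to pyrimidine) or a transversion and then computes the ratio per genomic region. The record layout, the function names and the toy SNPs are assumptions for illustration and do not reproduce the counts of this study.

```python
from collections import defaultdict

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def is_transition(ref, alt):
    """A transition swaps a purine for a purine or a pyrimidine for a pyrimidine."""
    pair = {ref, alt}
    return ref != alt and (pair <= PURINES or pair <= PYRIMIDINES)

def ts_tv_ratio(snps):
    """Return transitions / transversions for a list of (ref, alt) pairs."""
    ts = sum(1 for ref, alt in snps if is_transition(ref, alt))
    tv = sum(1 for ref, alt in snps if ref != alt and not is_transition(ref, alt))
    return ts / tv if tv else float("inf")

def ts_tv_by_region(records):
    """Group (region, ref, alt) records by region and compute Ts/Tv for each."""
    grouped = defaultdict(list)
    for region, ref, alt in records:
        grouped[region].append((ref, alt))
    return {region: ts_tv_ratio(pairs) for region, pairs in grouped.items()}

# Toy SNP records, made up for illustration.
records = [
    ("AT-block", "A", "G"), ("AT-block", "C", "T"),
    ("AT-block", "G", "A"), ("AT-block", "A", "T"),
    ("GC-block", "C", "T"), ("GC-block", "G", "A"),
    ("GC-block", "G", "C"), ("GC-block", "A", "C"),
]
ratios = ts_tv_by_region(records)
```

Here the toy AT-block has three transitions and one transversion (ratio 3.0), and the toy GC-block has two of each (ratio 1.0).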
As mentioned above, monitoring the changes in L. maculans race structure plays a pivotal role in blackleg management. The conventional approach involves multiple steps including isolate collection, bioassays on differential hosts and disease severity rating. The approach has served its purpose for many years, but obviously with room for improvement on efficiency, accuracy and clarity. Use of SNP genotyping can be more cost-effective, with a high-throughput process amenable to automation.
However, the SNP genotyping approach has its limitations. Firstly, it relies on a single SNP to determine ... always disrupts AvrLm3 regardless of the AvrLm3 gene sequence. Thirdly, SNP genotyping assumes that the site of interest is biallelic, so commercial SNP genotyping chemistry is generally designed to interrogate biallelic SNPs. While biallelic SNPs have been reported to make up the majority of polymorphic sites, triallelic SNPs exist in L. maculans and, more importantly, a biallelic site may turn out to be multiallelic when more populations or individuals are tested. We found that approximately 2% of total SNPs in L. maculans were triallelic in this study. For AvrLm4-7, the SNP at SC12_1374707 is C/G/T (Fig. 3B). Coincidentally, this SNP site had been selected empirically as a biallelic marker (C/G) to determine whether AvrLm4 is dysfunctional [20,26,32]. Consequently, a SNP assay based on the biallelic assumption is not able to interrogate the third allele, T, which was present in some isolates such as MT07-35 (Fig. 3B), leading to missing or even false calls. The same issue will arise for other triallelic sites, in AvrLmJ1-5-9 (SNP A/C/G at the 164th nucleotide) [27] and AvrLm2 (SNP A/C/G at the 397th nucleotide) [24]. Fourthly, premature STOP codons raise an issue for SNP genotyping. Six premature STOP codons were identified in AvrLm4-7 in this study. Premature STOP codons are usually associated with gene loss-of-function (Fig. 4). Although these nonsense SNPs in AvrLm4-7 were observed to be linked with the T allele at SC12_1374707, they may also co-exist with the C or G allele at the same site, causing the double virulence of avr4avr7 and contradicting SNP genotyping results of Avr4Avr7 or avr4Avr7. Therefore, without prior information, a diagnostic method based only on the site SC12_1374707 could be inaccurate or erroneous. Taken together, a single SNP appears inadequate for genotyping L. maculans Avr genes.
Some DNA mutations in Avr genes are shared by L. maculans isolates from different continents. For example, the AvrLm1 deletion was found in France [71], Australia [72] and this study, and the G/C SNP at SC12_1374707 discovered in this study was previously identified in France [26]. Some mutations, however, have been found on only one continent. For example, a K55T substitution and a premature STOP codon (R29X) were previously detected in AvrLmJ1-5-9 in Australia [27], but have not been reported in Canadian isolates. It is also noteworthy that the deletion of AvrLm4-7 was detected in 516 of 845 European isolates [62], but in none of the Canadian isolates examined in this study. Partial or complete deletion of AvrLm4-7 was also suggested to be responsible for the double virulence phenotype of avr4avr7 [26]. An attempt to investigate the InDel location and size in AvrLm4 failed to find the forward primer (TATCGCATACCAAACATTAGGC) in either the masked or the unmasked assembly of scaffolds available in the L. maculans genome. We used the forward primer sequence as a query to nucleotide-BLAST the L.
With the reduced cost of NGS and the broader use of targeted amplicon sequencing for studying microbial genomes [73], it may be possible to use sequencing and haplotypes as a more reliable metric to characterize the Avr profile and race structure of L. maculans. Statistical analysis based on haplotypes may often be more efficient than analyses of individual markers through an empirical process [74] or simulation studies [75], because haplotype analyses take tightly linked markers into account and provide much more information than individual markers do [59]. SNP haplotypes have been applied mostly in identifying genomic polymorphism and in other genetic studies, such as the work on the honeybee pathogen Nosema ceranae [76].
It is well documented that methods for SNP haplotype inference require family and population information [77], either for unrelated individuals [54] or for related individuals based on exact-likelihood [78] or approximate-likelihood [79] computations or rule-based strategies [80]. Although SNP haplotype construction algorithms are intended to identify co-segregating SNPs and then establish a reliable genotype-phenotype connection, they are essentially family-based analyses, and the haplotypes generated from one pedigree might not be extrapolated to other populations. These methods were developed for diploid or polyploid organisms, and none of them is free of limitations. Furthermore, this study dealt with the haploid L. maculans and aimed to establish the most accurate association of avirulence/virulence with DNA mutations in Avr genes. Therefore, all discovered SNPs were taken into account in this study, even synonymous SNPs, which could compromise protein function to some extent [65]. This simplified haplotype construction strategy can accommodate any SNPs, whether already identified or newly emerged, in L. maculans isolates collected from any fields or populations, and alleviates concerns about inaccuracy or the inadvertent omission of SNPs resulting from a complicated haplotype construction algorithm.
In conclusion, there are at least three advantages with genotyping-by-haplotyping when compared to other methods of analysis for L. maculans Avr profile or race structure, including SNP genotyping.
Firstly, it considers all DNA variants in an Avr gene. Secondly, it can be readily translated into protein isoforms. Thirdly, it is able to capture the emergence of new SNPs in any pathogen populations.
Therefore, we propose genotyping-by-haplotyping as a new method for large-scale L. maculans Avr profile analysis (Fig. 5). All existing SNP haplotypes with a known connection to Avr gene function can be categorized, indexed and stored in a database for queries. Any new haplotype will be added to the database once its relationship with avirulence/virulence has been determined through the conventional phenotyping process. To this end, we are developing a new strategy for targeted sequencing of L. maculans Avr genes to improve the reliability of L. maculans Avr profiling while reducing the cost of the genotyping-by-haplotyping procedure.
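The database lookup at the heart of the proposed workflow can be sketched as a dictionary keyed by gene and haplotype code. The codes 3:1, 3:2 and 3:3 follow the naming convention used in this study, but the phenotypes attached to them below, the extra code 3:9, and the function and variable names are placeholders for illustration only; they are not results of this study.

```python
# Hypothetical lookup table: (gene, haplotype code) -> phenotype.
# The phenotype assignments are placeholders, not findings of this study.
HAPLOTYPE_DB = {
    ("AvrLm3", "3:1"): "avirulent",
    ("AvrLm3", "3:2"): "avirulent",
    ("AvrLm3", "3:3"): "virulent",
}

def profile_isolate(haplotypes, db):
    """Look up each gene's haplotype code in the database.

    Returns the resolved Avr profile and a list of haplotypes that are not
    yet in the database and would need conventional phenotyping.
    """
    profile, unknown = {}, []
    for gene, code in haplotypes.items():
        status = db.get((gene, code))
        if status is None:
            unknown.append((gene, code))
        else:
            profile[gene] = status
    return profile, unknown

known_profile, known_missing = profile_isolate({"AvrLm3": "3:1"}, HAPLOTYPE_DB)
new_profile, new_missing = profile_isolate({"AvrLm3": "3:9"}, HAPLOTYPE_DB)
```

A haplotype already in the table resolves directly to a phenotype, while an unseen code such as the made-up 3:9 is returned in the second list so that it can be phenotyped and then added to the table.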

Conclusion
In this study, genome-wide DNA variants and SNP haplotypes associated with avirulence genes of Leptosphaeria maculans were identified using 158 isolates selected from 1590 isolates originating from western Canada. A total of 21,016 polymorphic variants were identified in the isolates. Forty-eight SNP haplotype groups were discovered and linked with different avirulence gene functionality. Being more informative and accurate than SNP genotyping, SNP haplotyping is proposed as a more reliable strategy for large-scale surveys of L. maculans race structure.
(Table S1) were used as differential hosts to determine the presence/absence of Avr genes following established methodology [3]. Phenotypic similarity of the 1590 isolates was determined using
AvrLm3, which was previously reported to be absent from the aforementioned reference genome [25], was included as the reference template to detect variants in AvrLm3 and its flanking regions.

Materials And Methods
Variants (SNPs and InDels) relative to the L. maculans reference isolate were determined using QSEq. Variant tables of isolates were exported, filtered (MAF ≥ 5, depth ≥ 10) and combined for further SNP data mining. The SNP distribution in specific genomic regions and the variant composition were investigated. To explore variant density in different blocks, AT- and GC-block segments in supercontigs (SC) were first determined by Excel VBA programming with sliding windows of 120 nucleotides. In this study, any continuous nucleotide stretch with a GC content < 33% [15] and a length > 2000 base pairs was considered an AT-block. All "N"-masked genome regions were excluded from the AT-block assessment.
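The AT-block criteria above (120-nucleotide windows, GC content below 33%, length above 2000 bp, N-masked regions excluded) can be sketched in Python as follows. This is not the Excel VBA program used in this study: the function names are hypothetical, and the use of non-overlapping windows is a simplifying assumption, since the step size of the original sliding window is not stated.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a non-empty sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def find_at_blocks(seq, window=120, max_gc=0.33, min_len=2000):
    """Return half-open (start, end) intervals of AT-blocks.

    Consecutive windows with GC content below `max_gc` and no 'N' are
    merged; merged stretches longer than `min_len` bp are reported.
    Assumption: windows do not overlap (step equals window size).
    """
    blocks, start = [], None
    last_end = (len(seq) // window) * window  # end of the last full window
    for i in range(0, last_end, window):
        win = seq[i:i + window].upper()
        low_gc = "N" not in win and gc_fraction(win) < max_gc
        if low_gc and start is None:
            start = i  # open a new candidate block
        elif not low_gc and start is not None:
            if i - start > min_len:
                blocks.append((start, i))
            start = None  # close the candidate block
    if start is not None and last_end - start > min_len:
        blocks.append((start, last_end))
    return blocks

# Toy supercontig: 1200 bp GC-rich, 2400 bp AT-rich, 1200 bp GC-rich.
toy_sc = "GC" * 600 + "AT" * 1200 + "GC" * 600
at_blocks = find_at_blocks(toy_sc)
# A 1200 bp AT-rich stretch is too short to qualify as an AT-block.
short_blocks = find_at_blocks("GC" * 600 + "AT" * 600 + "GC" * 600)
```

The remainder of each supercontig outside the reported intervals (and outside N-masked regions) would then be treated as GC-block sequence.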
The triallelic SNP at SC12_1374707 was confirmed by Sanger sequencing. Briefly, whole-gene fragments of AvrLm4-7 were amplified using the forward primer CTCACCTCCGTATCTTTAGTCGCA and the reverse primer CAGTTAACAACATGCCACTATCCCT, and cloned into the vector pCR2.1-TOPO using the Invitrogen TOPO™ TA cloning kit. The inserts were Sanger-sequenced using regular M13 forward and reverse primers. Sequence profiles were imported into BioEdit for quality examination and sequence alignment among isolates.

Construction of haplotypes
A SNP haplotype can be defined as a group of SNPs inherited together because of genetic linkage or the haploid nature of the organism. Pycnidiospores and mycelia of L. maculans are haploid fungal propagules, so the SNPs within a gene of an isolate are always inherited together. Considering that synonymous SNPs might affect protein function by altering protein structure [82], all SNPs in each avirulence gene, both synonymous and non-synonymous, were concatenated in the order of their positions on the supercontigs to form haplotypes. Each SNP haplotype was given a name starting with the gene name (AvrLm1, AvrLm2, etc.) followed by "SNP_Haplo" plus a serial number; the amino acid changes resulting from a haplotype were referred to as a protein haplotype [65] and named in a similar fashion, but with "SNP" replaced by "PRO". For ease of description, each SNP haplotype of an individual Avr gene was also given a unique code following the convention 'gene serial number:haplotype serial number'. The gene serial number refers to an Avr gene, and the haplotype serial number to a SNP haplotype, assigned based on phenotype (virulent or avirulent) and prevalence. For example, the three haplotypes of AvrLm3 were assigned the codes 3:1, 3:2 and 3:3. SNP or protein haplotype groups were subsequently constructed by concatenating the SNP or protein haplotype codes. The functionality of a protein with a specific amino acid sequence translated from a SNP-incorporated cDNA was assessed based mainly on our phenotypic data, along with information in the literature [25,31].

Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Competing interests
The authors declare that they have no competing interest.

Fig. 5 Proposed approach for L. maculans Avr profiling using haplotyping