We initially genotyped 382 soybean accessions using the SoySNP180K BeadChips, which were divided into four panels. The chip contained a total of 180,961 SNP markers, of which 58,388 (32.24%) were of the Poly High Resolution type. These markers were then filtered further, as described below. After comparing phenotypic data over three years, 277 accessions with stable results were selected as the candidate population for the GWAS of resistance genes. Finally, 234 accessions with both genotype and phenotype data were retained as the GWAS population. All tested accessions, including the most resistant genotypes, developed lesions to varying degrees, indicating that the infection process was successful. The population comprised seven resistant (R), 119 moderately resistant (MR), 87 susceptible (S), and 21 highly susceptible (HS) accessions (Additional file 1: Table S1).
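A minimal sketch of how disease-index scores could be binned into the four resistance classes and tallied. The cutoffs in `HYPOTHETICAL_CUTOFFS` are placeholders, not the study's criteria, which are not stated here; only the class sizes in `REPORTED_COUNTS` come from the text, and they account for all 234 accessions in the GWAS panel.

```python
from collections import Counter

# HYPOTHETICAL cutoffs on a 0-100 disease index (lower = more resistant).
# The source does not report the thresholds used to assign R/MR/S/HS.
HYPOTHETICAL_CUTOFFS = [(10.0, "R"), (30.0, "MR"), (60.0, "S")]

# Class sizes reported for the 234-accession GWAS panel.
REPORTED_COUNTS = {"R": 7, "MR": 119, "S": 87, "HS": 21}


def classify(disease_index: float) -> str:
    """Map a disease index to a resistance class using the placeholder cutoffs."""
    for upper, label in HYPOTHETICAL_CUTOFFS:
        if disease_index <= upper:
            return label
    return "HS"


def tally(indices: dict) -> Counter:
    """Count accessions per class from {accession_id: disease_index}."""
    return Counter(classify(v) for v in indices.values())


# The four reported classes should cover every GWAS accession.
assert sum(REPORTED_COUNTS.values()) == 234
```

For example, `tally({"acc1": 4.0, "acc2": 45.0})` counts one R and one S accession under these placeholder cutoffs.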
Descriptive analysis was performed on the 58,388 SNP genotypes of the 234 accessions. Of these, 27,302 SNPs were monomorphic in the association population. After filtering, 30,890 SNPs distributed across the 20 soybean chromosomes were retained as the genotype data for the GWAS. Chr. 18 carried the most SNPs (2,051), whereas Chr. 1 and Chr. 12 carried the fewest (1,198) (Fig. 1). On average, each chromosome carried 1,544.5 SNPs, corresponding to a mean marker density of one SNP per 31.301 kb (Table 1).
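The marker filtering and per-chromosome summary described above can be sketched as follows, assuming genotypes coded as 0/1/2 allele dosages with `None` for missing calls. Only the removal of monomorphic SNPs is stated in the text; the minor-allele-frequency and missing-rate thresholds (`MAF_MIN`, `MAX_MISSING`) are hypothetical placeholders.

```python
from collections import Counter

# HYPOTHETICAL thresholds: the text reports removing monomorphic SNPs and
# further "filtering", but not the MAF or missing-rate cutoffs actually used.
MAF_MIN = 0.05
MAX_MISSING = 0.10


def minor_allele_freq(calls):
    """MAF from 0/1/2 allele-dosage calls; None marks a missing call."""
    present = [c for c in calls if c is not None]
    if not present:
        return 0.0
    p = sum(present) / (2 * len(present))
    return min(p, 1.0 - p)


def is_polymorphic(calls):
    """True if at least two distinct non-missing genotype calls are observed."""
    return len({c for c in calls if c is not None}) > 1


def filter_snps(genotypes, maf_min=MAF_MIN, max_missing=MAX_MISSING):
    """Keep SNPs that are polymorphic and pass the missing-rate and MAF cutoffs.

    genotypes: {snp_id: (chromosome, [call per accession])}
    """
    kept = {}
    for snp_id, (chrom, calls) in genotypes.items():
        missing_rate = sum(c is None for c in calls) / len(calls)
        if (is_polymorphic(calls)
                and missing_rate <= max_missing
                and minor_allele_freq(calls) >= maf_min):
            kept[snp_id] = (chrom, calls)
    return kept


def density_summary(kept, genome_kb):
    """Per-chromosome SNP counts, mean SNPs per chromosome, and kb per SNP."""
    per_chrom = Counter(chrom for chrom, _ in kept.values())
    mean_per_chrom = len(kept) / len(per_chrom)
    kb_per_snp = genome_kb / len(kept)
    return per_chrom, mean_per_chrom, kb_per_snp
```

As a check on the reported figures, 30,890 SNPs over 20 chromosomes gives the stated mean of 1,544.5 SNPs per chromosome; the kb-per-SNP value depends on the genome length passed as `genome_kb`, which the text does not state.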
Population structure analysis of 234 accessions
A total of 30,890 SNPs from the 234 accessions were analyzed by principal component analysis (PCA). The first and second principal components explained 6.44% and 4.60% of the genotypic variance, respectively, or 11.04% in total. In a scatter plot of the first two principal components, soybean genotypes collected from different sources clustered closely together, and no subpopulation structure was observed (Fig. 2). UPGMA cluster analysis of the 234 accessions based on the 30,890 SNP marker genotypes likewise revealed no obvious grouping, consistent with the PCA results (Additional file 2: Figure S1).
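The "variance explained" figures above are each principal component's share of the total genotypic variance. A minimal pure-Python sketch of that calculation, assuming 0/1/2 dosage genotypes with no missing calls, runs power iteration on the n × n Gram matrix of the centered data (cheaper than the m × m SNP covariance matrix when SNPs far outnumber accessions). It is an illustration only; a real analysis would use an optimized eigensolver such as `numpy.linalg.eigh` or the GWAS software itself.

```python
import math
import random


def center_columns(X):
    """Subtract each SNP's mean dosage (column mean) from every accession."""
    n, m = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in X]


def gram(Xc):
    """n x n matrix Xc @ Xc.T; shares its nonzero eigenvalues with Xc.T @ Xc."""
    return [[sum(a * b for a, b in zip(r, s)) for s in Xc] for r in Xc]


def matvec(G, v):
    return [sum(g * x for g, x in zip(row, v)) for row in G]


def top_eigenvalues(G, k, iters=500, seed=0):
    """Leading k eigenvalues of a symmetric PSD matrix via power iteration
    with Hotelling deflation (adequate for a sketch, not for production)."""
    G = [row[:] for row in G]
    rng = random.Random(seed)
    values = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(len(G))]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = matvec(G, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, matvec(G, v)))  # Rayleigh quotient
        values.append(lam)
        # Remove the found component so the next pass finds the next eigenvalue.
        G = [[G[i][j] - lam * v[i] * v[j] for j in range(len(G))]
             for i in range(len(G))]
    return values


def variance_explained(X, k=2):
    """Fraction of total genotypic variance captured by each of the top k PCs."""
    G = gram(center_columns(X))
    total = sum(G[i][i] for i in range(len(G)))
    return [lam / total for lam in top_eigenvalues(G, k)]
```

Because each proportion is taken over the same total, the shares of PC1 and PC2 simply add: 6.44% + 4.60% = 11.04%.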
GWAS for resistance to C. sojina
The mixed linear model (MLM) under the "Q+K" framework was used for the GWAS (Additional file 3: Figure S2). The kinship (K) matrix of the 234 accessions was computed in TASSEL 5.0, and the population structure (Q) matrix was constructed from the first three principal components. Six SNPs were significantly associated with FLS resistance (p < 0.001): one on Chr. 2, three on Chr. 5, and two on Chr. 20 (Fig. 3). The phenotypic variation explained by each peak ranged from 6.17% to 9.20%, and the highest peak was for Affx-89062122 on Chr. 5 (Table 2).
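For reference, the Q+K mixed model is commonly written in the generic form below, where y is the vector of phenotypes (here FLS disease indices), Xβ holds other fixed effects (at least an intercept), Sα is the effect of the SNP being tested, Qv captures population structure (here Q would hold the scores of the first three principal components), Zu is the random polygenic background with covariance proportional to the kinship matrix K, and e is the residual. This is the textbook form, not a description of TASSEL 5.0's internal parameterization, which may differ in detail. The second line restates the p < 0.001 cutoff on the −log10 scale typically used for Manhattan plots.

```latex
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{S}\alpha + \mathbf{Q}\mathbf{v} + \mathbf{Z}\mathbf{u} + \mathbf{e},
\qquad
\operatorname{Var}(\mathbf{u}) = 2\mathbf{K}\sigma_g^{2},
\quad
\operatorname{Var}(\mathbf{e}) = \mathbf{I}\sigma_e^{2}

p < 0.001 \iff -\log_{10} p > 3
```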
Haplotype analysis for FLS resistance genes
The linkage disequilibrium (LD) between pairs of SNPs on Chr. 2, Chr. 5, and Chr. 20 was analyzed using Haploview 5.0. SNPs associated with C. sojina resistance formed haplotype blocks with adjacent SNPs. On Chr. 2, seven SNPs were located within a single 37-kb haplotype block (Fig. 4a). The r² values of all SNP pairs in this LD block were close to 1, indicating that these seven SNPs were in strong LD and might tag the same causal site(s) for FLS resistance. On Chr. 5, 27 SNPs were located in one haplotype block forming six haplotypes (Fig. 5a), and on Chr. 20, 35 SNPs were located in one haplotype block that also formed six haplotypes (Fig. 6a).
We compared disease indices among haplotypes using t-tests. On Chr. 2, Hap A was significantly more resistant to FLS than Hap C (haplotype C) (p = 0.016 < 0.05); Hap A is therefore the resistant haplotype and Hap C the susceptible haplotype (Fig. 4b). On Chr. 5, Hap A and Hap D differed significantly in disease index (p = 0.025 < 0.05), with Hap A resistant and Hap D susceptible (Fig. 5b). On Chr. 20, disease index differed significantly between Hap A and Hap C (p = 0.016 < 0.05) and between Hap B and Hap C (p = 0.019 < 0.05); Hap A and Hap B are the resistant haplotypes, and Hap C is the susceptible haplotype (Fig. 6b). Genes located within the three haplotype blocks were extracted, and their chromosomal positions are shown in Fig. 7.
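A minimal sketch of the two statistics used in the haplotype analysis, assuming phased haplotypes coded 0/1. That is a reasonable simplification for a largely homozygous, self-pollinating crop, but Haploview itself estimates haplotype frequencies from genotypes with an EM algorithm. `ld_r2` computes r² = D² / [p_A(1 − p_A) p_B(1 − p_B)] with D = p_AB − p_A p_B, and `welch_t` computes the unequal-variance t statistic and its degrees of freedom. The text does not say which t-test variant was used, so Welch's form is an assumption; p-values additionally require the t distribution, for example via `scipy.stats.ttest_ind(x, y, equal_var=False)`.

```python
import math
from statistics import mean, variance


def ld_r2(haplotypes):
    """r^2 between two biallelic SNPs from phased haplotypes.

    Each haplotype is a pair (a, b) with a, b in {0, 1}: the alleles carried
    at SNP 1 and SNP 2 on one chromosome.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else float("nan")


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx = variance(x) / len(x)  # squared standard error of mean(x)
    vy = variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df
```

In this setting, `x` and `y` would be the disease indices of accessions carrying two haplotypes being compared (for example Hap A and Hap C on Chr. 2).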
Candidate genes for FLS resistance at GWAS loci
A total of 55 genes within the three haplotype blocks on Chr. 2, Chr. 5, and Chr. 20 were annotated using Glyma1.0 against the NR, GO, and KEGG databases (Additional file 4: Table S2). These genes were assigned to 29 GO terms, mainly including mitochondrial outer membrane (GO:0005741), calcium-dependent protein serine/threonine kinase activity (GO:0009931), calcium-dependent protein kinase activity (GO:0010857), MAP kinase activity (GO:0004707), protoxylem development (GO:0090059), and xylan metabolic process (GO:0045491) (Additional file 5: Table S3). The enriched KEGG pathways included plant–pathogen interaction (gmx04626), MAPK signaling pathway–plant (gmx04016), and biosynthesis of secondary metabolites (gmx01110) (Additional file 6: Table S4). Among these genes, Glyma05g28980 encodes mitogen-activated protein kinase 7 (MPK7); Glyma20g31510 and Glyma20g31520 encode calcium-dependent protein kinase (CDPK4) family proteins; and Glyma20g31630 encodes pyruvate dehydrogenase (PDH), which may be involved in plant disease resistance. These four genes were predicted to be candidate resistance genes.
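The gene extraction and enrichment steps can be sketched as follows. This is a minimal illustration, not the authors' pipeline: gene coordinates and block boundaries are supplied by the caller (the Glyma1.0 coordinates are not reproduced here), and the one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) is an assumed choice, since the text does not name its enrichment test. In practice, p-values across many GO terms or KEGG pathways would also be corrected for multiple testing.

```python
from math import comb


def genes_in_blocks(genes, blocks):
    """Return IDs of genes that overlap any haplotype block.

    genes:  iterable of (gene_id, chrom, start, end)
    blocks: iterable of (chrom, start, end); coordinates are inclusive.
    """
    blocks = list(blocks)
    hits = []
    for gene_id, chrom, g_start, g_end in genes:
        for b_chrom, b_start, b_end in blocks:
            # Two closed intervals overlap if each starts before the other ends.
            if chrom == b_chrom and g_start <= b_end and g_end >= b_start:
                hits.append(gene_id)
                break
    return hits


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: annotated background genes, K: background genes in the term,
    n: candidate genes, k: candidate genes annotated to the term.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)
```

For a pathway such as gmx04626, `n` would be the 55 candidate genes and `k` the number of them annotated to that pathway; `N` and `K` come from the background annotation.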