Identification of potential genes for resistance to tomato spotted wilt and leaf spots in peanut (Arachis hypogaea L.) through GWAS analysis

Background Tomato spotted wilt (TSW), early leaf spot (ELS), and late leaf spot (LLS) are three serious peanut diseases in the United States, causing tens of millions of dollars in annual economic losses. However, the genes underlying resistance to these diseases in peanut have not been well studied. We conducted a genome-wide association study (GWAS) for the three diseases using the Affymetrix version 2.0 SNP array with 120 genotypes, mainly from the U.S. peanut mini core collection. Results A total of 87 quantitative trait loci (QTLs) were identified, with phenotypic variation explained (PVE) ranging from 10.2% to 24.1%: 41 QTLs for resistance to TSW, 18 for ELS, and 28 for LLS. Among the 87 QTLs, six, four, and two were major QTLs with PVE higher than 14.9% for resistance to TSW, ELS, and LLS, respectively. Of the 12 major QTLs, 10 were located on the B sub-genome and only 2 on the A sub-genome, suggesting that the B sub-genome harbors more genomic regions significantly associated with resistance than the A sub-genome. In addition, two genomic regions on linkage group B09 were significantly associated with resistance to both ELS and LLS. A total of 22 non-redundant candidate genes were significantly associated with the diseases: 18 for TSW, 3 for both ELS and LLS, and 1 for LLS.

yield loss caused by these diseases. Marker-assisted selection (MAS) has been available in peanut breeding in recent years [12]. However, the molecular mechanisms underlying the three diseases in peanut and the genes responsible for resistance are still unclear.
Various studies have attempted to unravel the genetics of resistance, both in field studies with natural inoculation and in greenhouse studies using artificial inoculation. One major QTL for TSW resistance, 6 major QTLs for ELS resistance, and 5 major QTLs for LLS resistance were identified using a recombinant inbred line (RIL) population derived from Tifrunner × GT-C20 [13]. Khera et al. [1] reported 48 QTLs from the S-population (SunOleic 97R × NC94022) for resistance to TSW, ELS, and LLS, located primarily on linkage groups A01 (TSW), A01 and A03 (ELS), and B03 (LLS). Five major QTLs for LLS resistance with 10.27-67.98% PVE were detected in the RIL-4 population developed by crossing TAG 24 × GPBD 4 [14]. Wang et al. [15] identified 15 QTLs from F2 and 9 QTLs from F5 populations derived from Tifrunner × GT-C20 for resistance to TSW, and 37 QTLs from F2 and 13 QTLs from F5 for resistance to LS. A major QTL with 22.8% PVE related to TSW in peanut was refined to a 0.8 Mb region on chromosome A01 [16]. More recently, 5 QTLs with a total of 36.4% PVE were identified for resistance to TSW by association mapping analysis in greenhouse studies [17]. Han et al. [9] revealed that two major QTLs on A03 and A04 were associated with resistance to ELS and one major QTL on B05 was associated with resistance to LLS.
Most previous studies used marker-based linkage maps to identify QTLs associated with disease resistance. However, no studies have been conducted at the whole-genome level to identify QTLs or their associated genes related to TSW, ELS, and LLS in peanut. With the development of next-generation sequencing, high-throughput genotype data coupled with phenotypic data can be used to identify marker-trait associations via genome-wide association studies (GWAS). GWAS has emerged as a powerful tool to detect markers (SNPs) closely linked to QTLs, based upon the principle of linkage disequilibrium between genetic markers and the QTLs that affect the trait [18]. Based on a high-density SNP array platform, GWAS can potentially offer higher mapping resolution at a lower cost in time and money than linkage mapping [19][20][21]. GWAS has been successfully conducted in many major agronomic crops, such as wheat, soybean, and cotton, to identify genes or markers responsible for various quantitative traits [21][22][23]. In peanut, the first attempt at GWAS was reported by Pandey et al. [24]. In that study, 300 genotypes were tested for 36 traits including biotic and abiotic resistances, seed quality, and yield. More recently, a GWAS of major agronomic traits related to domestication in peanut was conducted by Zhang et al. [25]. However, GWAS studies of disease resistance in peanut remain few.
Therefore, in this study, we performed a GWAS analysis for TSW, ELS, and LLS in 120 genotypes, mainly from the U.S. peanut mini core collection, using the Affymetrix version 2.0 SNP array [26]. Our objectives were to determine the genomic regions involved in resistance to TSW, ELS, and LLS, and to identify candidate genes residing within the identified QTLs, providing insights into the genetic mechanisms of resistance to the three diseases in peanut.

Phenotypical variation
A total of 120 genotypes were phenotyped in the field for TSW and LLS in 2013, 2014, and 2015, and for ELS in 2013 and 2014. For ELS and LLS, the phenotypic data displayed near-normal distributions in each year; for field TSW, however, the frequency distribution was skewed toward resistance in all three years (Fig 1a). In general, the rankings were consistent for each disease across years. The rating scores ranged  Figure S1.
A total of 1,038 peanut plants from the 120 genotypes and 9 control plants of 'Georgia Green' were screened by mechanical inoculation for TSW resistance in a greenhouse study. All 9 'Georgia Green' controls displayed visual symptoms. Of the 1,038 plants, 549 were infected by TSW based on ELISA, an infection rate of about 53% [17]. Not all virus-infected plants showed symptoms, but the correlation coefficient between visual symptoms and ELISA was 0.73, indicating that visual symptoms and ELISA results were largely consistent. For details, see Li et al. [17].

Analysis of linkage disequilibrium (LD) blocks
The LD block was defined as a set of contiguous SNPs with the minimum pairwise r² value exceeding 0.50 [27]. After LD pruning, 1,024 independent SNPs and LD blocks were retained.

Genomic regions for diseases resistance
A total of 87 QTLs related to TSW, ELS, and LLS were identified by GWAS analysis using both field and greenhouse data (Table 1). Among these, six, four, and two were major QTLs with PVE higher than 14.9% for resistance to TSW, ELS, and LLS, respectively (Table 2). The distribution of all 87 QTLs across 18 linkage groups (LGs) revealed that 25 QTLs were distributed across 8 LGs of the A sub-genome, while 62 QTLs were mapped across 10 LGs of the B sub-genome (Additional file 2: Table S1). This indicated that the B sub-genome has more resistance-related genomic regions than the A sub-genome. Of the 12 major QTLs, 10 were located on the B sub-genome and only 2 on the A sub-genome (Table 2). The largest number of QTLs, 21, was identified on LGB08, followed by 15 QTLs on LGB09; no QTL was found on LGA02 or LGA04 (Additional file 2: Table S1).
LGB08 and LGB09 had more significant QTLs than the other linkage groups, with 3 and 4, respectively (Table 2). In addition, two genomic regions (AX-177643393 and AX-177643343) on LGB09 were significantly associated with resistance to both ELS and LLS. On LGB08, there were 3 significant QTLs in a genomic region from 34 Twelve genomic regions on different linkage groups were found to be suggestively associated with TSW (−log10(P value) > 3.01), but were not statistically significant at the genome level. As shown in Additional file 4: linkage groups were also found to be suggestively associated with TSW by the EMMAX method.

Genomic regions and genes associated with ELS
For ELS, phenotypic data were collected in 2013 and 2014 (Fig 1a). surrounding each identified promising SNP were obtained. A total of 15 genes were identified, including 3 genes near significant QTLs and 12 around suggestive QTLs. The corresponding genomic positions and the ELS-related biological processes of these genes are listed in Additional file 5: Table S3 and Additional file 8: Table S6. Of the 15 genes, 5 have known functions in immunity and defense response: MKP1 (including protein-tyrosine-phosphatase), LOC107489483 (NF-kappa-B essential modulator-like), PR1B1 (pathogenesis-related leaf protein 6), LOC107457889 (glucan endo-1,3-beta-glucosidase 2), and PTI5 (pathogenesis-related genes transcriptional activator).

Genomic regions and genes associated with LLS
Disease resistance to LLS was observed in the field in 2013, 2014, and 2015. A genome-wide significant region for LLS was detected on LGB09 (Fig 4). This region harbored two QTLs that were statistically significant at the genome level (−log10(P value) > 4.31). The significant SNPs were located in a genomic region from 143,767,171 to 143,783,013 bp, spanning about 15.84 kb, with PVE ranging from 19.15% to 19.99% (Table 2). These two QTLs were also associated with resistance to ELS, which means that ELS and LLS shared two common significant genomic regions.
In addition to the genome-wide significant region, the GWAS identified 26 suggestive regions associated with LLS (Additional file 9: Table S7). On LGB08, 7 suggestively associated QTLs were located in a genomic region from 39

Discussion
Tomato spotted wilt phenotyping
Natural inoculation and mechanical inoculation are two dramatically different ways to study TSW, since the former depends on disease activity in nature and therefore gives differing results among years and locations. In this study, the correlation coefficients between the visual symptoms in the greenhouse and the field evaluation results were small, ranging from -0.06 to 0.24 (Fig 1b). The results also revealed sizable variability in TSW field incidence, ranging from 0.7% in 2014 to 7.0% in 2015 (Fig 1b).
Greenhouse or laboratory tests were not reliable predictors of field resistance to TSW but are useful for studying the molecular mechanisms of TSW resistance. Phenotypic data displayed near-normal distributions for TSW in the greenhouse test but not in the field tests. TSW is only transmitted by thrips in the field, and tobacco thrips (Frankliniella fusca) is the most common vector of TSW in peanut. The incidence of TSW in the field was much lower than in the greenhouse, which could be caused by a variety of factors. The timing of thrips flights into crop fields in the spring relative to the age and susceptibility of the target crop determines the final incidence of TSW [28,29]. Field transmission is also determined by thrips-plant-virus interactions, whereas in the greenhouse the effects of thrips-virus and thrips-plant interactions were eliminated in order to study only genotype susceptibility to TSW. Therefore, in addition to environmental factors, thrips-plant interactions may also influence field results. Plant factors that might deter thrips feeding and subsequent virus transmission include physiological differences in nutrient contents and morphological traits such as leaflet thickness and wax content [30][31][32]. With mechanical inoculation, the amount of virus received by each plant is relatively uniform. In the field, however, plants acquire the virus from thrips randomly, depending on thrips number and host preference.
Temperature, which is also a factor that might influence virus movement and expression in the plants, is more variable in the field [33]. Plant age can also influence virus transmission and symptom expression [34]. In the greenhouse young peanut plants LGA04 and 10 suggestive QTLs on LGA01, LGA04, LGA08, LGA09, LGB02, LGB04, and LGB10. Compared with our study, the overlapping linkage groups containing significant or suggestive QTLs were LGA01 and LGA09. In addition, Li et al. [17] identified five markers (pPGPseq5D5, GM1135, GM1991, TC23C08, and TC24C06) associated with visual symptoms in the greenhouse by association mapping analysis, but they did not provide the locations of those markers. Since the peanut genome sequence is available, we ran BLAST for those markers to find their positions on the physical map. BLAST returned 4 hits for marker pPGPseq5D5: one on LGB07, one on LGB10, and two on LGA07. For marker TC23C08, two hits were detected, on LGA01 and LGB01. In total, 13 hits were obtained for marker TC24C06, on the following linkage groups: LGA02 (1), LGA03 (1), LGA05 (2), LGA07 (1), LGA08 (2), LGB05 (2), LGB06 (3), and LGB08 (1). Since the linked markers or QTLs were located on different linkage groups and their physical positions were not reported, a direct comparison could not be made.

Leaf spot complex inheritance pattern
Resistance to leaf spots is more complex to study since its inheritance is controlled by multiple genes [9]. Several studies have reported QTLs associated with ELS and LLS; however, the results differed considerably [1,9,13]. Khera et al. [1] identified 6 major QTLs related to ELS distributed on LGA01, LGA03, LGA05, LGB03, and LGB04 and 2 suggestive QTLs on LGB05 and LGA10, while in our study, 3 suggestive QTLs were found on the same linkage groups as reported by Khera et al. [1], namely LGA01, LGA03, and LGB03. For LLS, Khera et al. [1] identified 6 major QTLs distributed on LGB03 and LGB05 and 8 suggestive QTLs on LGA01, LGA05, LGA06, LGA10, LGB04, and LGB06; in our study, no statistically significant QTL was found on LGB03 or LGB05, but suggestive QTLs were identified on LGA05, LGA06, LGA10, and LGB06. Pandey et al. [13] reported 6 major QTLs associated with ELS distributed on LGA03, LGA05, LGA06, and LGB06 and 3 suggestive QTLs on LGA04, LGA07, and LGB01 and they also identified 5 major

GWAS and genes associated with diseases
Since more DNA markers have been developed in peanut, genome-wide association mapping for desirable traits in cultivated peanut is now feasible. SNP coverage and sample size can affect the ability to achieve significance. In our case, the number of SNPs was relatively small, but the use of a large number of accessions from the mini core collection was helpful for QTL detection. While GWAS has higher power to detect associated markers, it can also produce false positive associations (type I errors) [35,36]. The EMMAX model, which uses high-density markers to calculate pairwise relatedness, was used to correct for population stratification, which can lead to biased or spurious results [37]. To exclude false positives produced by the sample structure observed in our study, the EMMAX method was applied and adjusted for the first ten principal components after calculation of the kinship matrix based on pairwise IBS distance [38]. The statistical significance of the suggestive QTLs detected in this study may also be affected by phenotyping errors. Visual ratings were used to evaluate all three diseases, and accurate phenotyping is difficult since disease resistance is a complex trait.
In this study, a total of 12 significant QTLs were identified, including 6 for TSW, 4 for ELS, and 2 for LLS. One interesting finding is that ELS and LLS shared 2 significant QTLs, and the nearby markers were AX-177643393 and AX-177643343. Both markers were on LGB09, at positions 143,783,013 and 143,767,171 bp. AX-177643393 was the most significant marker, explaining 24.11% of the phenotypic variance of ELS. Since the reference peanut genome sequence was available, it was possible for us to determine the genes around the QTL regions. Three genes near significant QTLs were identified for ELS and LLS, which included 205D04_12 (TIR-NBS-LRR disease resistance protein), KK1_048795 within the significant associated region on LGB08 in our study, was reported to participate in plant innate immune response and hormone signaling [42,43]. Glucan endo-1,3-beta-glucosidase 2 (LOC107457889) was thought to be an important plant defense-related product against pathogen attack [44]. The analysis of cDNA clones revealed that LOC107457889 expression was changed after tobacco mosaic virus infection [45]. This resistance. NLR proteins of the plant innate immune system had a role in quantitative disease resistance in addition to the well-characterized dominant gene resistance [46,47]. Signal transducer and activator of transcription A (LOC107484292), which has functions related to the regulation of gene expression, was identified within the genome-wide significant association regions on LGB08; its function is similar to that of STAT1 in mammals (Additional file 8: Table S6). STAT1 plays an important role in the control of fungal and other infections by innate immunity [48,49]. Mice with knockout STAT1 showed significant resistance to calicivirus pathogenicity [48,50]. Several reports have indicated that activation of the PKA/cAMP pathway could cause down-regulation of STAT1 activation [51][52][53].
KK1_048795 and KK1_043666 were reported to be involved in DNA integration and reverse transcriptase activity [46,54]. It seems possible that resistance to ELS and LLS was at least partially controlled by the same genes. If so, it will be helpful for breeders to identify genes that simultaneously impart resistance to both diseases. In addition, those three genes also appeared in the region of significant QTLs for TSW, but on different linkage groups.
LOC107470950 (translation initiation factor eIF-2B subunit delta), identified near a significant QTL for TSW resistance on LGB05, was reported to participate in the regulation of translation. DEF1 (RNA polymerase II degradation factor 1) was an important mediator of the DNA damage stimulus response, in which DEF1 assisted in the degradation of RNA polymerase stalled at DNA damage sites and probably coordinated the repair mechanisms [55,56]. In addition, some genes associated with disease resistance were found in suggestive regions, such as PTPN22 (tyrosine-protein phosphatase non-receptor), LOC107489483 (NF-kappa-B essential modulator-like), RPPL1, AMC1 (Metacaspase-1), CDR1 (aspartic proteinase), CESA3 (cellulose synthase A catalytic subunit 3), RLM1B (disease resistance protein RML1B-like isoform X3), and SR1IP1 (BTB/POZ domain-containing protein) associated with TSW; MKP1, PR1B1, PTI5, and RPM1 associated with ELS; and LECRKS4, EXO70B1, CERK1, LOC107460592, and LOC107461399 associated with LLS. After comparing the candidate genes identified in this study, we found that several genes associated with plant disease had also been reported or even verified in other plants, such as LRR-RLK, RPM1, and PTI5. In Arabidopsis, the RPM1 protein detects the phosphorylation of RIN4 induced by the pathogen effectors AvrB and AvrRpm1, elicits the resistance response, and is degraded at the onset of the hypersensitive response [39]. Plants lack animal-like adaptive immunity mechanisms and have therefore evolved a specific system against pathogens that includes PAMP-triggered immunity and effector-triggered immunity. Activation of FLS2 and EFR triggers the MAPK signaling pathway, which activates defense genes for antimicrobial compounds. In addition, pathogens can manipulate plant hormone signaling pathways to evade host immune responses using the toxin coronatine. Putative pathways involved in disease resistance in peanut are illustrated in a diagram (Additional file 11: Figure S3).

Conclusions
This is the first GWAS using the U.S. peanut mini core collection. The results identified QTLs related to TSW, ELS, and LLS and some novel candidate genes that have not previously been found to affect disease in peanut. Fine mapping of the QTLs for disease resistance will allow application of marker-assisted selection and a better understanding of the underlying molecular mechanisms. Further RNA-Seq, qRT-PCR, or gene knockout experiments will be needed to confirm the candidate genes as disease resistance genes in peanut. These findings may provide a genetic basis for better understanding the molecular mechanisms of peanut disease resistance. In addition, besides gene expression, post-transcriptional and translational processes as well as environment and genotype-environment interactions can also affect the final phenotype.

Methods
Plant material and field evaluation of disease resistance
A total of 120 genotypes, mainly from the U.S. mini core collection [26], were included in the experiment (Additional file 12: Table S9). These accessions included six botanical varieties: fastigiata, hypogaea, peruviana, vulgaris, aequatoriana, and hirsuta.

Greenhouse evaluation of disease resistance
One hundred and twenty genotypes, mainly from the U.S. peanut mini core germplasm collection, were screened for TSW resistance by inoculation and ELISA assay. Plants were grown in the greenhouse at a temperature of 25 to 30 °C and 60 to 90% relative humidity. Nine seeds per genotype were sown in a plastic seedling trays isolation. The genomic DNA was extracted using the modified CTAB method [58]. Purified DNA was dissolved in TE buffer for subsequent analysis. The quantity and quality of the DNA were measured using the ND 2000.
The genotyping was performed using the SNP array (Affymetrix) at GeneSeek (Lincoln, Nebraska, USA). For marker density and heterozygosity, see Additional file 13: Figure S4 and Additional file 14: Figure S5. No samples were excluded due to low quality or low call rate (< 0.95). A total of 13,382 SNP markers were retained after filtering out SNPs with genotyping errors, a call rate < 0.95, or a minor allele frequency < 0.05. To characterize the population structure of the 120 accessions, the best value of K was determined with STRUCTURE 2.2.3 [59].
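The marker filter described above (call rate and minor allele frequency) can be sketched in a few lines. This is a minimal illustration, not the pipeline used in the study: the 0/1/2 genotype coding with NaN for missing calls, the function name `filter_snps`, and the diploid-style frequency calculation are all assumptions made for the example.

```python
import numpy as np

def filter_snps(genotypes, min_call_rate=0.95, min_maf=0.05):
    """Return a boolean mask of SNPs (columns) that pass both filters.

    genotypes: samples x SNPs array coded as 0/1/2 copies of one allele,
    with np.nan marking a missing call (an assumed encoding).
    """
    g = np.asarray(genotypes, dtype=float)
    called = ~np.isnan(g)
    n_called = called.sum(axis=0)
    call_rate = n_called / g.shape[0]

    # Allele frequency among called genotypes (two alleles per call).
    allele_count = np.nansum(g, axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        freq = allele_count / (2.0 * n_called)
    maf = np.minimum(freq, 1.0 - freq)
    maf = np.nan_to_num(maf, nan=0.0)  # SNPs with no calls fail the MAF filter

    return (call_rate >= min_call_rate) & (maf >= min_maf)

# Toy data: 20 samples, 3 SNPs.
toy = np.zeros((20, 3))
toy[10:, 0] = 2          # SNP 0: frequency 0.5, fully called
toy[:2, 1] = np.nan      # SNP 1: call rate 0.90
toy[2:12, 1] = 2
toy[0, 2] = 1            # SNP 2: frequency 1/40 = 0.025
mask = filter_snps(toy)
```

In the toy data only the first SNP passes; the second fails on call rate and the third on minor allele frequency.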

Statistical analysis
Statistical analysis was carried out using the SVS software package (SNP & Variation Suite, version 8.4.4) and GAPIT [60]. The Q-Q plots indicated that the model in SVS fit the data better than the model in GAPIT. The phenotype data, genotype data, and genetic marker map were imported into SVS. Linkage disequilibrium (LD) pruning was conducted with a window size of 50 SNPs, a step of 5 SNPs, and an r² threshold of 0.5, generating 1,024 independent SNPs and LD blocks for this population. Principal component analysis was conducted using these independent SNP markers (Additional file 15: Figure S6).
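To make the pruning parameters concrete, the sketch below shows one common greedy sliding-window scheme with the same settings (window of 50 SNPs, step of 5, r² threshold of 0.5). It is an assumption that this resembles the SVS procedure; the exact algorithm in SVS may differ, and the function name `ld_prune` and the 0/1/2 coding without missing values are hypothetical choices for the example.

```python
import numpy as np

def ld_prune(genotypes, window=50, step=5, r2_max=0.5):
    """Greedy sliding-window LD pruning; returns indices of SNPs kept.

    genotypes: samples x SNPs array coded 0/1/2 with no missing values,
    columns ordered by map position (assumed layout).
    """
    g = np.asarray(genotypes, dtype=float)
    n_snps = g.shape[1]
    keep = np.ones(n_snps, dtype=bool)
    polymorphic = g.std(axis=0) > 0  # r is undefined for monomorphic SNPs

    for start in range(0, n_snps, step):
        stop = min(start + window, n_snps)
        idx = [i for i in range(start, stop) if keep[i] and polymorphic[i]]
        for pos, a in enumerate(idx):
            if not keep[a]:
                continue
            for b in idx[pos + 1:]:
                if not keep[b]:
                    continue
                r = np.corrcoef(g[:, a], g[:, b])[0, 1]
                if r * r > r2_max:
                    keep[b] = False  # drop the later SNP of a linked pair
    return np.flatnonzero(keep)

# Toy data: SNP 1 duplicates SNP 0 (r² = 1); SNP 2 is uncorrelated with SNP 0.
toy = np.array([[0, 0, 0],
                [1, 1, 1],
                [2, 2, 2],
                [0, 0, 2],
                [1, 1, 1],
                [2, 2, 0]])
kept = ld_prune(toy)
```

Here the duplicate SNP is removed and the other two are retained. Which member of a linked pair is dropped is a design choice; this sketch always keeps the earlier one.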
To efficiently correct for population structure in the association test, Efficient Mixed-Model Association eXpedited (EMMAX) analysis [37] in the SVS software package, with the first ten principal components as covariates, was used for the genome-wide association analysis.
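The mixed-model analysis relies on a kinship matrix, which the Discussion describes as based on pairwise identity-by-state (IBS) distance. A minimal sketch of one standard IBS similarity definition follows; it assumes 0/1/2 genotype coding without missing values, and the function name `ibs_kinship` is hypothetical. It is not taken from the SVS implementation.

```python
import numpy as np

def ibs_kinship(genotypes):
    """Pairwise IBS similarity: 1 - mean(|g_i - g_j|) / 2 over all SNPs.

    genotypes: samples x SNPs array coded 0/1/2, no missing values
    (assumed encoding). Returns a symmetric samples x samples matrix.
    """
    g = np.asarray(genotypes, dtype=float)
    n = g.shape[0]
    k = np.empty((n, n))
    for i in range(n):
        # Row-by-row keeps memory use at samples x SNPs.
        k[i] = 1.0 - np.abs(g - g[i]).mean(axis=1) / 2.0
    return k

toy = np.array([[0, 2],
                [0, 2],
                [2, 0]])
kin = ibs_kinship(toy)
```

Identical genotypes give a similarity of 1 and opposite homozygotes at every marker give 0, so the first two toy samples score 1 with each other and 0 with the third.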
The model is as follows:
$$Y = Xb + Zu + e$$
where Y is the vector of phenotypes (disease level); X is the matrix of fixed effects, including the first ten principal components; b is the vector of fixed-effect coefficients; Z is the design matrix of the random effects; u is the vector of random effects, with $\mathrm{Var}(u) = \sigma_g^2 G$, where $\sigma_g^2$ represents the additive genetic variance and G stands for the genomic kinship matrix; and e is the vector of random residuals. The threshold P value for genome-wide significance was calculated by Bonferroni correction using the estimated number of independent SNPs and LD blocks [61]. Manhattan plots were produced using qqman [62].
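As a quick check of the thresholds, the Bonferroni calculation with the 1,024 independent SNPs and LD blocks can be reproduced as below. The significance level of 0.05 is an assumption (it is not stated explicitly in the text), and reading the suggestive threshold as one expected false positive per scan (1/n) is an inference from the reported value of 3.01, not something the paper states.

```python
import math

def neg_log10_threshold(n_independent_tests, alpha=0.05):
    """Bonferroni threshold expressed as -log10(P)."""
    return -math.log10(alpha / n_independent_tests)

n_tests = 1024  # independent SNPs and LD blocks after pruning

genome_wide = neg_log10_threshold(n_tests)             # alpha = 0.05 (assumed)
suggestive = neg_log10_threshold(n_tests, alpha=1.0)   # 1/n (assumed)

print(round(genome_wide, 2), round(suggestive, 2))  # 4.31 3.01
```

Both values agree with the thresholds used in the Results (−log10(P value) > 4.31 for genome-wide significance and > 3.01 for suggestive association).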

Candidate genes
The genes within the genomic regions (~1 Mb) associated with TSW, ELS, and LLS were identified separately. AUGUSTUS [63] and FGENESH+ [64] were used to analyze the peanut genome sequences (https://peanutbase.org/) surrounding the SNPs to identify the upstream and downstream genes. The identified genes were annotated by BLAST against the non-redundant protein database [65]. Gene functions and pathways were collected from UniProt [66] and the Kyoto Encyclopedia of Genes and Genomes (KEGG) database [67].

Availability of data and material
All phenotypic data, GBS data, and the code used to execute the GWAS are available by contacting the corresponding author, Charles Chen, by email at cyc0002@auburn.edu.

Competing interests
The authors declare that they have no competing interests.

Funding
This work was supported in part by funding from The Peanut Foundation for initiating the project and phenotyping the panel in the field and greenhouse; the National Peanut Board for genotyping the population; the Alabama Peanut Producers Association for the PI to oversee the project; and the USDA-NIFA Hatch fund for supporting data analysis and paper publication.

Authors' contributions
HZ conducted the statistical analysis and prepared the manuscript. YC collected GWAS data and revised the manuscript. PD, YT, and JL prepared the samples and phenotypic data. TJ, JPC, PO, CH, MLW, AJ, HC, and AH were involved in revising the manuscript. CC supervised the whole study and provided assistance with manuscript preparation. All authors read and approved the final manuscript.