Identication of novel loci and candidate genes for resistance to cucumber powdery mildew using GWAS

Powdery mildew (PM) is one of the most serious diseases in cucumber and causes huge yield loss. Multiple quantitative trait loci (QTLs) for PM resistance have been detected in previous studies, however, none of these studies used the approach of genome-wide association analysis (GWAS). In this study, a cucumber core germplasm (CG) consisting of 109 resequenced lines was evaluated for PM resistance in three seasons. Results Among these loci, and were the novel ones that have not been reported in previous studies. The loci of pmG2.1, pmG5.2, pmG5.3 showed stronger signal in three seasons, for which three candidate genes Csa2G022790, Csa5G484620 and Csa5G604330 were respectively predicted using pairwise LD correlation and qRT-PCR analysis. Further, Csa4G022270 in pmG4.1, Csa6G022350 and in pmG6.1 might be the candidate genes based on the annotation of homologous genes in Arabidopsis.


Introduction
Cucumber (Cucumis sativus L.) is an important vegetable crop with a long history of cultivation. Powdery mildew (PM) is one of the most important diseases in cucumber and occurs extensively in various countries all over the world [1][2][3]. PM infection in cucumber at owering stage could reduce the yield by 20% ~ 40%. Development of resistant cultivars is the most economic and effective method to control PM in cucumber.
Most of the QTLs showed recessive inheritance, indicating that the resistance is governed by loss-offunction mutant alleles of certain susceptible genes [12,13]. Indeed, a mutant allele of the cucumber susceptibility gene CsaMLO8 (named as CsMLO1 in [14] explains the contribution to PM resistance by the QTL pm5.1 [15]. Except for the pm5.1, causal genes of other mapped QTLs for PM resistance in cucumber have not been discovered yet. Thus, further researches are need to ne-map and clone the QTLs, which may be combined with an approach to search for mutations in susceptibility genes colocalized with mapped QTLs (Schouten et al. 2014).
Genome-wide association analysis (GWAS) is an effective and feasible way for the analysis of disease resistance traits [16]. With the development of sequencing technology and the reduction of sequencing costs, GWAS has been applied in many species, such as corn [17], rice [18][19][20], maize [21], sorghum [22], and foxtail millet [23]. In cucumber, GWAS has been used to map multiple genes. For example, the bitterness gene Bi were detected via GWAS in 115 diverse cucumber lines [24]. Downy mildew (DM) resistance was analyzed in 395 accessions selected from 1234 USDA cucumber lines [25]. Using GWAS in combination with QTL mapping, a candidate gene related to callus regeneration was identi ed (Wang et al. 2018c In this study, 109 core germplasm (CG) lines [11,27] were used for a GWAS on PM resistance. In total, 12 QTLs were detected, of which three are novel and nine were reported in previous studies. Further, potential candidate gens for certain QTLs were studied by evaluating their expression in contrast lines with high resistance or susceptibility.

Results
Genetic diversity of PM resistance in the CG population  (Fig. 2a). The DI correlation among the three experiments was signi cant with a R 2 = 0.7 ~ 0.81 (Fig. 2b). Based on the SNP in the three loci and the DIs of three seasons, we detected the genotypic effect at each of the two seasons ( Supplementary Fig. 1) [28]. Results showed that the three loci were independent in the three seasons, which indicated that these three loci work independently for powdery mildew.
The mean DI per line of the CG population was used for phylogenic analysis to study the PM resistance diversity, resulting in three clusters based on the complete method (1.5) of SAS ( Supplementary Fig. 2). The rst cluster contains 34 lines with a DI < 20. Among these lines, 23 lines showed a DI < 10 and classi ed as highly resistant (HR) lines. Of these HR lines, nine (CG7, CG9, CG32, CG35, CG44, CG63, CG70, CG82 and CG101) showed a consistent DI score in all three experiments. The second cluster contains 33 lines with a DI > 35 and regraded as highly susceptible (HS) lines. Five HS lines, CG3, CG23, CG31, CG55, CG209 showed a comparable DI among the three tests. Interestingly, seven tested Japanese type (CG3, CG4, CG28, CG29, CG31, CG103 and CG104) showed high sensitivity. The third cluster includes 30 sensitive lines with a DI between 20 and 35.
Candidate genes for the PM resistance Aiming at identifying potential candidate genes of the stable loci (pmG2.1, pmG5.2 and pmG5.3) and novel loci (pmG4.1 and pmG6.1), chromosomal region of 50 kb around the peak SNPs were further analyzed.
Candidate gene analysis for novel loci ( pmG4.1 and pmG6.1) Physical distance of the peak SNP in pmG4.1 was at Chr.4:2,436,922 bp. Based on the cucurbit genomics (http://cucurbitgenomics.org/), 14 genes were predicted in the pmG4.1 region (Supplement table S2). In this region, Csa4G022270 encodes NPR1-like protein that regulates salicylic acid (SA)-mediated plant defense [30]. Physical distance of the peak SNP in pmG6.1 was at Chr.6:2,291,66 bp. Ten genes were predicted in the pmG6.1 candidate region (Supplement table S3). Csa6G022350 encodes splicing factor U2AF subunit and its homologous gene in Arabidopsis is involved in pathogen response via the salicylic acid pathway [31]. Csa6G022370 encoded MAC/Perforin domain containing protein and its homologous gene in Arabidopsis is related to response during germinivirus infection [31]. These two genes might be the candidate genes of pmG6.1.
Candidate gene analysis for pmG2.1 For the pmG2.1 locus, SNPs of the chromosome region Chr. 2: 2,710-2,949 kb were analyzed by pairwise LD correlations with a focus on candidate genes in the interval from 2.715 to 2.768 Mb (Fig. 4a). Based on the Cucumber Genome Browser (http://www.icugi.org/cgi-bin/ICuGI/index.cgi), 11 candidate genes were located in this interval. Three genes, Csa2G022770, Csa2G022780 and Csa2G022790 were selected, since they all encode a disease resistance-like protein although the DNA sequence similarity among them is very lower than 25%. The relative expression of the three genes were analyzed by qRT-PCR ( Fig. 4d-f) with gene speci c primers (Supplement table S4) in the selected HR and HS lines. Results showed that Csa2G022790 was signi cantly up-regulated in three of the four HR lines at 12 h after PM inoculation. In one of the three HR lines, CG44, the induction was also seen at 8 d after PM inoculation. A single nucleotide polymorphisms (SNP) (G to T) in the promoter of Csa2G022790 was located on the TCAelement (salicylic acid responsiveness) with the G presents in HR lines (Fig. 4c). Salicylic acid is a key signaling component involved in the activation of certain plant defense responses [32]. In the HS lines the expression of Csa2G022790 was down-regulated in the CG3 line and no difference was found in the other lines (Fig. 4g). These results indicated that Csa2G022790 might be the candidate gene of the pmG2.1 locus.
Candidate gene analysis for pmG5.2 For the pmG5.2 locus, SNPs in the chromosomal region of Chr.5: 16,650 − 17,150 kb were analyzed (Fig. 5a). We focused on the interval from 16,878,456 bp to 16,879,990 bp based on pairwise LD correlations (r 2 ≥ 0.6). Csa5G484620 was identi ed in this region, encoding an NBS resistance protein.
Two SNPs (SNP2431819 and SNP2431821) were found in this gene (Fig. 5c). Twenty nine out of 30 HR lines were G/G and the other one is A/C, whereas 9 of 29 S lines were A/C and the others were G/G.
Based on the qRT-PCR, the Csa5G484620 was signi cantly up-regulated in CG44 (HR) at the 12 h and down-regulated at 24 h after PM inoculation (Fig. 5d). Then, four HR lines and four HS lines were applied to qRT-PCR (Fig. 5e). In the HS lines, the expression of Csa5G484620 was down-regulated in CG23 while up-regulated in CG55, and no difference in the other two lines. Whereas the expression of all the HR lines were signi cantly up-regulated at 12 h after PM inoculation. These results indicated that Csa5G484620 might be the candidate gene of the pmG5.2 locus.
Candidate gene analysis for pmG5.3 For the pmG5.3 locus, SNPs in the chromosomal region of Chr.5: 22,230 − 22,610 kb was analyzed with the focus on the interval from 22,506 kb to 22,518 kb (~ 12 kb) using pairwise LD correlations (r2 ≥ 0.6) (Fig. 6a). Two annotated genes were identi ed; Csa5G604330 encoding alpha-amylase and Csa5G604340 encoding Pectate lyase. The latter is homologous to the Arabidopsis PMR6 gene [13]. SNP_2526314 was on the CDS of Csa5G604340. For this site, 29 of 30 HR lines was A, the other HR lines was G, whereas that 7 of 29 HS lines was G, the other HS lines was A (Fig. 6b). Then, four HR lines and four HS lines were applied to analyze the expression of Csa5G604340 using qRT-PCR. Two of the HS lines (CG23 and CG31) and three of the HR lines showed up-regulated expression at 12 h after PM inoculaton (Fig. 6c). We focus on all the 18 European type lines, all the G genotype in this site was HS lines, and 55.6% of A genotype were HR lines, which showed that the gene Csa5G604340 could be related to the PM resistance of European type (Fig. 6d). Csa5G604340 might be the candidate gene of the pmG5.3 locus.

Discussion
The inheritance of PM resistance is complicated, most studies showed that the PM resistance was a quantitative trait that controlled by multiple genes [4-9, 11, 33-35]. In this study, we applied GWAS on a CG population, which contains 97 lines representing a corn collection of worldwide cucumber germplasm [27] and evaluated three times for PM resistance in different years and seasons. The frequency distributions of DI indicated that PM resistance of cucumber were controlled by multiple loci (Fig. 2), which is consistent with previous studies [7,14].
Many QTLs for PM resistances were identi ed and mapped on all seven chromosomes in different studies [6-10, 29, 34]. In the current study, twelve loci associated with PM were detected by GWAS, of which nine loci were reported in previous studies (Fig. 1) and three loci (pmG2.1, pmG4.1 and pmG6.1) are novel. The pm-QTL (LGIII, 20C) on Chr.7, pm-QTL (LGI, 26C) on Chr.6 [5] and pm1.1 on Chr.1 [6] was not detected by our study, which may be due to the different environments. Our results show that GWAS may be a complementary approach to other QTL studies for PM resistance in cucumber.
Previous studies showed that most of the QTLs for PM resistance in cucumber are recessively inherited [6,7,34], indicating that the resistance is governed by impaired susceptibility (S) genes [12,13] mapped cucumber genes that are homolog to know plant S genes for PM and downy mildew (DM) resistance.
Additional to check the overlap between S genes and QTLs for mildew resistance, we focused also on resistance (R) genes in this study. NBS and leucine-rich repeat (LRR) domains were important features for R gene family [36][37][38][39]. In our study, three genes were predicted for three major loci, pmG2.1, pmG5.2 and pmG5.3, respectively. In pmG2.1, a gene encoding disease resistance-like protein (Csa2G022790) was identi ed. The expression of Csa2G022790 was signi cantly up-regulated in the HR lines and the SNP located on the promoter of the gene is shared by 28 of 29 HS lines. In pmG5.2, the expression of Csa5G484620 was signi cantly up-regulated in the HR lines. Almost all HR lines have the same SNP in this gene. These indicated that the gene may be a resistance gene. For pmG5.2, the candidate gene Csa5G604340 is a homolog of the AtMLO6 gene, thus a S-gene [13]. The identi ed SNP in this gene was shared by all the HS of European type.
PM resistance is a complex trait and it often involves multiple genes and multiple metabolic pathways [40,41]. Few genes, which are signi cantly effect on the traits, will be mapped by one populations. Because different disease resistant materials have different disease resistant genes, it is impossible to clone all the PM genes from a group or a speci c material. However, GWAS can effectively mine genes related to PM in core germplasm due to most of the genetic variation are existed in the core germplasm of cucumber. It can detect disease resistance genes in all genetic resources at the same time and excavate disease resistance genes in the whole genome. GWAS provides a very effective mean for comprehensive research on the mechanism of powdery mildew resistance.

Conclusions
Twelve Loci related to powdery mildew resistance in cucumber were detected by genome-wide association analysis, three of which namely pmG2.1, pmG4.1 and pmG6.1 are novel. Three candidate genes within the genomic region of locus pmG2.1, pmG5.2 and pmG5.3 were identi ed, respectively.

Plant materials
A cucumber CG population consisting of 109 lines was provided by the cucumber research group in the Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, China. This CG population was selected from more than 3,000 germplasms worldwide and resequenced [27]. Sequencing data can be found on the Cucumber Genome website (http://www.icugi.org/cgi-bin/ICuGI/index.cgi). Naturally infection of PM occurred at adult plant stage. PM appeared about four weeks after sowing. From ve weeks after sowing, PM symptoms was scored once a week for three times by ranking the disease of each plant with 0, 1, 3, 5, 7 and 9, where 0: no symptom; 1: ≤1/3th of all leaves with powdery mildew spots; 3: 1/3 − 2/3 of all leaves with powdery mildew spots; 5:≥2/3 of all leaves with powdery mildew spots; 7: PM spots covers the whole leaf; 9: ≥2/3 of the browning leaves [4]. For each line, a disease index (DI) was calculated. DI = 100 × ∑(Number of plants with disease rating × Disease rating)/(Total number of plants × Highest disease rating) [29].

Disease tests and disease index calculation
Genetic diversity of PM resistance in the CG population Using the DI of each line per experiment, a mean DI per line was calculated over the three experiments and used as input to study the genetic diversity of the CG population. A phylogenetic tree was constructed with SAS 9.0 [42].

GWAS
FastLMM was used for association tests of resistance with an estimated relatedness matrix as covariate. GWAS was conducted and the genome-wide lowest P value was recorded. The 5% lowest tail was taken from the 200 recorded minimal P values as the threshold for genome-wide signi cance. The Manhattan map for GWAS was generated using the R package CMplot (https://github.com/YinLiLin/R-CMplot). The SNP data used for the association analysis were downloaded from the cucumber genome website: http://www.icugi.org/cgi-bin/ICuGI/index.cgi [27].

Analysis of novel loci for the PM resistance
The mapped QTLs of this study were compared with published studies to identify the novel ones, for which candidate region was analyzed within 50 kb around peak SNPs. The three strongest candidate region was analyzed by the position (-log10 (p-value) = 5 as the critical). The candidate gene was annotated. The physical distance was based on the cucurbit genomics (http://cucurbitgenomics.org/). For real-time RT-PCR (qRT-PCR) analysis, primers were designed using DNAMAN according to the gene CDS sequences (Supplementary Table S1). First-strand cDNA was synthesized using the PrimeScript™ RT reagent Kit (TAKARA BIO, Inc., Shiga, Japan). qRT-PCR was performed with the SYBR Premix Ex Taq™ Kit (Takara) with the following cycling parameters: 95 °C for 3 min, followed by 40 cycles of 95 °C for 10 s and 55 °C for 30 s, with a nal cycle of 95 °C for 15 s, 55 °C for 60 s and 95 °C for 15 s. Relative transcription levels were analyzed using the 2 −ΔΔCT method [44]. qRT-PCR was performed in a BIO-RAD CFX96 system (Bio-Rad, USA), and the cucumber actin1 gene (ID) was employed as the internal control [45]. Three technical replicates were performed for each gene and relative transcription levels were analyzed using the 2 −ΔΔCT method [44].

Supplemental Material
Supplementary Materials: Figure S1: Genotypic effects of the dmG1.2, dmG2.1, dmG5.2 and dmG7.1 in the dm_2013S and dm_2014A; Figure S2: The phylogenetic tree of phenotype of CG; Table S1: Phenotype of core germplasm. Table S2: genes were predicted in the pmG4.1 region. Table S3: genes were predicted in the pmG6.1 region. Availability of data and materials All data generated or analysed during this study are included in this published article and its supplementary information les.

Competing interests
The authors declare that they have no competing interests. Authors' contributions XL wrote and revised the manuscript. HL, PL and HM investigated disease index data. YB, XG and SZ conceived of the study and critically reviewed the manuscript. All authors read and approved the nal manuscript.     Identi cation of the causal gene for the peak pmG2.1. a: Local Manhattan plot (top) and LD heatmap (bottom) surrounding the peak pmG2.1; b: Dashed lines indicate the candidate region (~ 45 kb) for the peak. Eleven genes were predicted in the pmG2.1 candidate region. c: SNP variation of candidate gene in the core germplasm; d-f: Expression pattern analysis of the three predicted genes among HR and HS lines. **Signifcant diference (P <0.01); g: Expression of Csa2G022790 among the four HR and HS lines, respectively.

Figure 5
Identi cation of the causal gene for the peak pmG5.2. a: Local Manhattan plot (top) and LD heatmap (bottom) surrounding the peak pmG5.2; b: Local LD heatmap on the 16,857kb to 16880kb. c: SNP variation of candidate gene in the core germplasm; d: Expression pattern analysis of the four predicted genes among HR and HS lines; e: Expression of Csa2G022790 among the four HR and HS lines, respectively.

Figure 6
Identi cation of the causal gene for the peak pmG5.3. a: Local Manhattan plot (top) and LD heatmap (bottom) surrounding the peak pmG5.3; b: Dashed lines indicate the candidate region (~ 18 kb) for the peak. Two genes were predicted in the pmG2.1 candidate region. SNP variation of candidate gene in the core germplasm; c: Expression of Csa2G022790 among the four HR and HS lines, respectively. **Signifcant diference (P <0.01); d: Genotype variation in the European type.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.