Extreme bulks for seed weight
A RIL population have been developed by crossing Zhonghua 16 and sd-H1 in our previous study (Wang et al., 2018). The phenotyping data were generated earlier for parents and mapping population across six environments (Supplementary Figure S1). The 100-seed weight (SW) of two parents were significantly different (P < 0.001) between Zhonghua 16 (91.16 g ± 10.19) and sd-H1 (31.95 g ± 3.21) (Figure 1a). Seed weight showed continuous variation in RIL population and display normal frequency distribution, with trait values of RILs ranging between 26.53 g and 104.69 g (Figure 1a). The transgressive segregation of seed weight were observed in many individuals with extreme low or high phenotype in RIL population. Two extreme bulks for SW were prepared and subjected to the QTL-seq pipeline. Twenty of each low and high individuals with extremes for seed weight in RIL population were used to constitute LSB (low seed weight bulk) and HSB (high seed weight bulk), respectively (Figure 1b). The average values of SW in the LSB and HSB were 34.98 g ± 3.89 and 78.82 g ± 12.19, respectively.
QTL-seq predicted candidate genomic regions controlling seed weight
The two parents and two extreme bulks (LSB and HSB) were applied for NGS-based high-throughput whole genome re-sequencing based on Illumina Hiseq 4000 platform. After alignment of clean reads to the reference genome, uniquely mapped reads were used to calculate genome coverage and read depth along chromosomes. A total of 875.5 million and 841.6 million high-quality paired-end (PE) reads (150 bp in length) were generated for Zhonghua16 and sd-H1, respectively, which provided 34.4 × and 27.4 × average read depth, respectively (Table 1). For LSB and HSB, 1.07 billion and 1.26 billion PE reads were generated and achieved 43.0 × and 49.2 × average depth, respectively (Table 1). The sequence reads from parents and extreme bulks covered 84.4−86.1% of the reference genome.
Table 1. Sequencing of parental lines and bulks and mapping of sequence reads.
|
Zhonghua 16
|
sd-H1
|
LSB
|
HSB
|
Raw data (Gb)
|
122.3
|
117.6
|
149.9
|
176.5
|
High-quality reads
|
875,485,280
|
841,560,206
|
1,073,006,646
|
1,263,247,608
|
Mapped reads
|
868,791,396
|
831,141,660
|
1,065,307,644
|
1,253,763,058
|
Uniquely mapped reads
|
601,928,685
|
477,074,337
|
758,275,502
|
866,349,276
|
Genomic coverage
|
85.2%
|
84.4%
|
85.9%
|
86.1%
|
Average depth
|
34.4
|
27.4
|
43.0
|
49.2
|
Identified SNPs
|
1,684,862
|
1,647,238
|
1,835,651
|
1,777,715
|
After quality control and strict filtering, variant calling resulted in 1,684,862, 1,647,238, 1,835,651, 1,777,715 single nucleotide polymorphisms (SNPs) in Zhonghua 16, sd-H1, LSB and HSB, respectively (Table 1). The SNP densities were 0.64−0.71 SNP per kb for two parents and extreme bulks, and the density of SNP was obviously higher in B subgenome, rather than A subgenome (Supplementary Table S1). The heterozygous SNPs and SNPs with low read depth were firstly filtered out. Then, those SNPs with SNP-index value less than 0.3 in both bulks were further discarded. Finally, 541,282 reliable and high-quality SNPs along chromosomes were identified for the following analysis (Supplementary Table 2). In order to confirm the authenticity of the identified SNPs, ten SNPs were randomly selected for performing Sanger sequencing. The results showed that 9 out of 10 randomly selected SNPs have been proved to be consistent (Supplementary Figure S2).
Referring to the methods of Takagi et al. (2013) and Lu et al. (2014), theΔSNP−index values for the two extreme bulks were calculated and showed by sliding window analysis along chromosomes (Figure 2). If ΔSNP−index value of a genomic region was significantly different from 0 at a statistical confidence of P < 0.01, a major QTL harboring candidate genes in this region could be considered. Following this principle of QTL-seq, the regions on chromosome A02 from 0.58 Mb to 0.65 Mb, chromosome A05 from 101.70 Mb to 111.64 Mb, chromosome B02 from 103.90 Mb to 111.75 Mb, chromosome B06 from 0.30 to 50.22 Mb exhibited significant signals (Figure 2), suggesting that these candidate regions were major QTLs associated with seed weight. The same orthologous candidate genomic regions were also identified in two diploid genomes (Supplementary Figure S3), confirmed the good collinearity between diploid and tetraploid genomes. Totally, 45,636 SNPs were found to be located in these candidate regions of tetraploid genome. Among these SNPs, 28,936 were intergenic, 2,866 intronic, 7,452 5’ UTR, 4,381 3’UTR. Importantly, 1,269 SNPs distributed in 839 genes caused affection of missense, stop gain, stop lost. Among these candidate genomic regions from QTL-seq, three regions including 101.70−111.64 Mb on A05, 103.90−111.75 Mb on B02 and 0.30−50.22 Mb on B06 exhibited high ΔSNP-index value of 0.65, 0.53, and 0.70, respectively. Especially for candidate region on B06 (0.30−50.22 Mb), the average SNP-index were 0.16 and 0.86 for two bulks LSB and HSB, respectively, showing a strong signal and a powerful QTL controlling seed weight in the region. Therefore, we further developed SNP and SSR markers in above three candidate regions for narrowing down the localization interval through traditional QTL mapping.
Narrowing down candidate genomic regions of seed weight by traditional QTL mapping
To narrow down the location interval of candidate genomic regions, the traditional QTL analysis was performed by further developing SNP and SSR markers in above three candidate regions on A05, B02 and B06. Totally 55 SNPs within the potential candidate genes were selected for competitive allele-specific PCR (KASP). Among them, 41 KASP markers were successfully developed (Supplementary Table S2) and could distinguish SNP alleles from Zhonghua16 and sd-H1 in the RILs population. In addition, 21 SSR markers showing polymorphic between two parental genotypes were also developed (Supplementary Table S3). Finally, 41 KASP markers together with 21 SSR markers were applied for genotyping 242 individuals of the RIL population.
The CIM mapping were used to identify major QTLs for seed weight in the RIL population. For the above three candidate regions on A05, B02 and B06, nine QTLs were detected to explain 4.92-14.89 % of the phenotypic variance across different environments (Figure 3 and Supplementary Table S4). Among them, qSWA05.1 was stably detected in four environments and qSWB06.3 was detected in six environments. The qSWA05.1 was located in the region of 104.85-105.92 Mb by the nearest flanking markers AD05A20650 and A010637. The qSWB06.3 was located in the region of 14.21-17.65 Mb by the nearest flanking markers A011476 and A011478 (Figure 3). It showed that the application of traditional QTL mapping successfully narrowed down the candidate regions of QTL-seq through newly developed markers. Because the qSWB06.3 exhibited stable expression with large contribution to phenotypic variance across all environments, the candidate genes in the genomic region of qSWB06.3 would be further deduced by combining the data from RNA-seq in the following analysis.
Differently expression analysis of candidate genes based on RNA-seq during seed development
In order to investigate candidate genes related to seed weight in peanut, seeds from Zhonghua16 and sd-H1 at three different developmental stages were collected for RNA-seq. RNA-seq was performed on seeds at 20, 40, and 60 days after flowering (DAF) which corresponds to early, middle and late stages of seed development, respectively (Wan et al., 2017). Three independent biological replicates were set up for each developmental stage and subject to RNA-seq. The sequencing generated ~12 Gb of clean data on average in each sample, and the average ratio of uniquely mapped reads was 79% (Supplementary Table S6). 28,077−32,184 genes were detected as expressed in different stages during seed development and totally 42,106 genes were cumulatively to be detected as expressed. Among them, 9,294 genes and 7,983 genes were constitutively expressed in three developmental stages of Zhonghua16 and sd-H1, respectively (Figure 4a).
We identified differentially expressed genes (DEGs) between Zhonghua16 and sd-H1 at each developmental stage as well as DEGs between different developmental stages in each parent (Figure 4b). A total of 4,539, 2,489 and 3,966 DEGs were identified at DAF 20, DAF 40, and DAF 60 between Zhonghua 16 and sd-H1, respectively. We identified 3,422, 1,712, 2,502 up-regulated genes and 1,117, 777, 1,464 down-regulated genes at DAF 20, DAF 40 and DAF 60 between two parents, respectively. Particularly, there are clearly more DEGs in early stage (4,539), rather than that in middle (2,489) and late stages (3,966). Meanwhile, the number of DEGs (6,443) between DAF 20 and DAF 40 in sd-H1 was significantly more than that (2,243) between DAF 40 and DAF 60. The results suggested that a large number of genes showed differential expression between two parents at the initial stage of seed development, which may ultimately determined the seed weight. Further comparison showed that 1,457 up-regulated and 653 down-regulated DEGs at DAF 20 exhibited stage-specific, whereas 212 up-regulated and 64 down-regulated genes showed differential expression at DAF 40 and DAF 60 (Figures 4c and 4d). In order to verify the accuracy of RNA-seq, we selected 15 DE genes for qRT-PCR experiment. The expression patterns from RNA-seq and qRT-PCR were highly similar and correlated, suggested good reliability of expression quantification based on RNA-seq (Supplementary Figure S4).
The DEGs were associated with cell division and cell expansion during seed development
The GO enrichment analysis showed that DEGs at DAF 20 were enriched in many GO terms relating to cell division such as MCM complex, microtubule binding, motor activity, movement, DNA replication initiation, DNA helicase activity, regulation of cell cycle and mitotic nuclear division (Figure 5a). Moreover, an additional very considerable number of DEGs at DAF 20 were correlated with cell wall organization and regulation of protein serine/threonine kinase activity, protein kinase binding etc. For DEGs at DAF 40 and DAF 60, they were preferentially enriched in metabolisms associated with regulation of cell expansion such as nitrogen metabolism, fatty acid elongation, carbon metabolism, linoleic acid metabolism (Figure 5a). The results showed that a large number of genes involved in cell division differentially expressed at early stage, nevertheless, DEGs at middle and late stages were mainly associated with the regulation of cell expansion.
Previous studies have also shown that several metabolic pathways such as cell division pathway, ubiquitin-proteasome pathway and serine/threonine protein pathway is important in regulating seed weight (Li and Li, 2016; Li et al., 2018). Thus, DEGs in these pathways at three different developing stages were studied. Totally, we identified 27 DEGs in ATP-binding microtubele motor family, 8 DEGs encoding microtubule protein, 7 DEGs encoding tublin, 8 DEGs encoding MCM complex (Figure 5b). It was found that most of these DEGs involved in regulating cell division were significantly differentially expressed at early stage of developing seeds (Figure 5b). In ubiquitin-proteasome pathway and ubiquitin-proteasome pathway and serine/threonine protein pathway, DEGs encoding E3 ubiquitin-protein ligase, ubiquitin, ubiquitin-conjugating enzyme, serine carboxypeptidase, cyclin and serine/threonine protein enzyme demonstrated a complex regulatory pattern, and many up-regulated and down-regulated genes both appeared in different processes (Figures 5c and 5d). This is consistent with previous reports that some genes play positively regulated while other play negatively regulated genes in these pathways (Li and Li, 2016; Li et al., 2018). In addition, previous studies have also shown that the signal transduction of hormones and transcription factors is important in regulating seed weight (Li et al., 2018). We identified 6 SAUR genes, 7 IAA genes, 9 GH3 genes, 7 AUX genes, 8 ARF genes were differentially expressed at the three different developing stages (Figure 5e). Simultaneously, 26 bHLH genes and 6 MYB genes were found to have differences in expression during all three developmental stages (Figure 5f).
The candidate genes in qSWB06.3 deduced from QTL-seq and RNA-seq
The genomic region of qSWB06.3 spanning 2.07 Mb on chromosome B06 had 705 effective SNPs. A total of 211 annotated genes were located in qSWB06.3. Function annotation analysis of the 705 SNPs found that 311 SNPs were intergenic, 29 intronic, 334 in UTR. Through systematically investigating SNP variation, gene expression and functional annotation, several candidate genes related to seed weight in qSWB06.3 were predicted. For these candidate genes, we further analyzed and validated their expression pattern across different developmental stages (DAF 20, DAF 40 and DAF 60) using qRT-PCR (Figure 6).
Among these candidate genes, two genes (AH16G10100 and AH16G09300) showed similar expression pattern and both differentially expressed between two parents across all three stages. Their expression level were highest at early stage (DAF 20) but decreased at middle and late stages (DAF 40 and DAF 60). According to previous study (Kurepa et al., 2009), AH16G10100 was homologous to RPT2A which was known to regulate 26S proteasome particle AAA-ATPase and influence seed size in Arabidopsis. While, AH16G09300 was homologous to GSK3/SHAGGY-like kinase 2 which is a negative regulator of BR signaling involved in GS2-mediated grain size control (Che et al., 2015). Both two genes differentially expressed throughout the whole seed development, suggesting that their potential role in regulating peanut seed weight in qSWB06.3.
Another two genes (AH16G09020 and AH16G08270) showed high expression and differentially expressed only at late stage (DAF 60) in Zhonghua16. AH16G09020 was homologous to PGL1 in rice which is a positive regulator of grain length by controlling cell elongation (Heang and Sassa, 2012a). AH16G08270 have been annotated to encode chaperone DnaJ-domain superfamily protein, but its specific function was not studied. In addition, three candidate genes AH16G08940, AH16G08240 and AH16G08220 were identified which code for phytochrome A, squamosa promoter-binding-like protein 1 and transducin family protein. Their expression patterns were complex, and the differences between two parents were not consistent at different stages. AH16G08940 were highly expressed in middle and late stages but lowly expressed at early stage in Zhonghua16. AH16G08240 and AH16G08220 were specifically and highly expressed in late stage and early stage in Zhonghua16. However, these candidate genes related to seed weight will need to be confirmed by further functional genomics studies.
The validation of KASP markers of qSWB06.3 in natural peanut varieties
To validate the effectiveness of markers in qSWB06.3 for deploying in marker-assisted breeding, the genotype of KASP-SNP markers in qSWB06.3 were detected in diverse peanut varieties. 60 natural peanut varieties including 30 varieties with high seed weight (119.10 g ± 13.41, HSW group) and 30 varieties with low seed weight (39.95 g ± 5.45, LSW group) were selected from diverse lines. Three KASP markers, Ah011475, Ah011476 and Ah011478 in qSWB06.3, were used to detect genotypes among different varieties (Supplementary Table S5). They amplified the AA, TT and TT alleles in the parent Zhonghua16 with high seed weight while the CC, GG and GG in parent sd-H1 with low seed weight, respectively. For marker Ah011475, 20 accessions from LSW group showed CC alleles (Figure 7a). For markers Ah011476 and Ah011478, six accessions from LSW group showed TT and TT alleles, respectively (Figures 7b and 7c). The above natural peanut varieties which have same alleles from parent sdH1 were all came from LSW group, suggesting a correlation between the allele of parent sdH1 in qSWB06.3 and the low seed weight. The natural peanut varieties with AA-TT-TT alleles showed significantly lower seed weight, compared with those with CC-GG-GG alleles (Figure 7d). The results suggested that these natural peanut varieties from HSW group have acquired the same alleles of Zhonghua16 in qSWB06.3 during peanut breeding and improvement. The above results indicated that markers, Ah011475, Ah011476 and Ah011478, could be used as diagnostic markers for selecting breeding lines with high or low seed weight.