QTL Mapping of Quality Related Traits in Peanut Using Whole-Genome Resequencing

Ziqi Sun, Henan Academy of Crops Molecular Breeding; Feiyan Qi, Henan Academy of Crops Molecular Breeding; Hua Liu, Henan Academy of Crops Molecular Breeding; Li Qin, Henan Academy of Crops Molecular Breeding; Jing Xu, Henan Academy of Sciences; Lei Shi, Henan Academy of Crops Molecular Breeding; Zhongxin Zhang, Henan Academy of Crops Molecular Breeding; Lijuan Miao, Henan Academy of Crops Molecular Breeding; Bingyan Huang, Henan Academy of Crops Molecular Breeding; Wenzhao Dong, Henan Academy of Crops Molecular Breeding; Xiao Wang, Henan Academy of Crops Molecular Breeding; Mengdi Tian, Henan Academy of Crops Molecular Breeding; Jingjing Feng, Henan Academy of Crops Molecular Breeding; Ruifang Zhao, Henan Academy of Crops Molecular Breeding; Zheng Zheng, Henan Academy of Crops Molecular Breeding; Xinyou Zhang (haasz@126.com), Henan Academy of Agricultural Sciences

fatty acid compositions, respectively, using two F2 populations and DArT markers [4]. Finally, Liu et al. [1], using SNP markers from ddRAD sequencing, mapped the major and consensus QTL qOCA08.1 to an approximately 0.8-Mb genomic region containing two annotated genes predicted to affect oil biosynthesis.
In this study, whole-genome resequencing (WGRS) was applied to an RIL population segregating for peanut quality traits, which enabled the mapping of QTLs for the contents of oil, protein, and seven fatty acids. Candidate genes involved in the fat biosynthesis pathway were predicted, and SNPs located within these genes were validated using lines with extremely high and low oil contents. The results of this study may help to establish a foundation for further genetic research and for the development of high-oil peanut cultivars.

Phenotypic analysis of the quality related traits
Nine quality-related traits, i.e., the contents of oil, protein, palmitic acid, stearic acid, arachidic acid, behenic acid, oleic acid, linoleic acid, and arachidonic acid, were assessed in four environments (Zhengzhou, 2018 and 2019; Shangqiu and Weifang, 2019) for the male parental line P1 (W1202), the female parental line P2 (Yuhua15), and 329 segregating RILs. ANOVA indicated that genotype significantly affected all the traits (Table 1). P1 exhibited higher contents of oleic, behenic, and arachidonic acids, whereas P2 exhibited higher contents of oil, protein, palmitic acid, stearic acid, linoleic acid, and arachidic acid (Table 1). For all the traits and environments, wide phenotypic variation and transgressive segregation were observed in the RIL population (Table 1 and Fig. 1). The CV ranged from 4.16% for oil content to 20.65% for arachidonic acid content, while broad-sense heritability ranged from 0.74 for linoleic acid content to 0.91 for behenic acid content (Table 1). (Table 2). In addition, negative correlations were observed between protein content and the contents of behenic acid (-0.83), arachidic acid (-0.62), and palmitic acid (-0.33). Among fatty acids, a strong negative correlation was observed between oleic and linoleic acid (-0.91), as well as between stearic and arachidonic acid (-0.87), whereas a strong positive correlation was observed between stearic and arachidic acid (0.86).
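As a minimal illustration of how a pairwise trait correlation such as the oleic/linoleic value of -0.91 can be obtained, the sketch below computes a correlation coefficient from line means. It assumes Pearson's r (the text does not name the coefficient used); the function name `pearson_r` and the toy values are hypothetical and are not data from this study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of trait values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical line means (%), for illustration only.
oleic = [40.0, 45.0, 50.0, 55.0]
linoleic = [38.0, 34.0, 29.0, 25.0]
r = pearson_r(oleic, linoleic)  # strongly negative for these toy values
```

In practice each list would hold one value per RIL (for example, the mean across environments), so that the two lists are aligned line by line.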

SNP And Bin Marker Discovery Through Whole-genome Sequencing
Whole-genome resequencing of the two parental lines and 329 RILs generated approximately 700 Gb of clean data (9.10 billion reads). For each sample, the rate of mapped reads and the rate of uniquely mapped reads were over 96% and 73%, respectively. The effective sequencing depths were 34.42× and 34.58× for P1 and P2, respectively, and ranged from 1.20× to 1.40× for the RIL population (Table S1). The coverage rate was 99.1% for P1 and 98.49% for P2 and ranged from 52.03% to 63.99% for the RIL population (Table S1). All the clean sequence data obtained in this study are available from the NCBI database under Sequence Read Archive (SRA) submission SUB8691701. Following alignment and the application of the GATK protocol, 741,564 SNPs were obtained. Further filtering yielded 213,868 SNPs that were homozygous in, and polymorphic between, the two parents; these SNPs were used to identify bin markers.
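The final filtering step keeps only SNPs at which both parents are homozygous and carry different alleles. A minimal sketch of that test is given below, assuming genotypes are encoded as VCF-style GT strings such as `0/0` or `1/1`; the function name `is_informative` is hypothetical, and the authors' actual filtering criteria may include further thresholds (depth, quality) that are not described here.

```python
def is_informative(p1_gt: str, p2_gt: str) -> bool:
    """True if both parents are homozygous and differ from each other.

    Genotypes are assumed to be VCF-style GT strings ("0/0", "1|1", "./.").
    """
    parsed = []
    for gt in (p1_gt, p2_gt):
        alleles = gt.replace("|", "/").split("/")
        # Reject missing calls, non-diploid calls and heterozygous calls.
        if len(alleles) != 2 or "." in alleles or alleles[0] != alleles[1]:
            return False
        parsed.append(alleles[0])
    return parsed[0] != parsed[1]

# Toy parental genotype pairs (P1, P2), for illustration only.
calls = [("0/0", "1/1"), ("0/1", "1/1"), ("0/0", "0/0"), ("./.", "1/1")]
kept = [pair for pair in calls if is_informative(*pair)]
```

Only the first toy pair passes: the second is heterozygous in P1, the third is monomorphic between the parents, and the fourth has a missing call.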

Construction Of A High-density Genetic Linkage Map
As the RIL population was sequenced at low depth, the SNP dataset was converted into bin markers using a sliding-window approach [23]. In total, 7595 bin markers were detected, and eleven lines with a heterozygosity rate exceeding 10% were removed from further analysis (Table S2). After redundant markers were filtered out, 4565 bin markers were finally used to construct a linkage map. Four bin markers remained unlinked, whereas the remaining 4561 bin markers were assigned to 20 linkage groups (LGs), as reported in Fig. 2 and Table 3. The total map length was 2032.39 cM, giving an average distance of 0.45 cM per marker (Table 3). The number of markers per LG ranged from 173 (LG11) to 323 (LG13), the LG length varied from 77.50 cM (LG20) to 170.15 cM (LG06), and the average distance per marker within an LG ranged between 0.37 cM (LG15) and 0.59 cM (LG06) (Table 3). The maximum marker interval was 13.41 cM on LG06, while more than 90% of marker intervals were below 1 cM (Table 3).
Using a LOD threshold of 3.3, 109 QTLs were identified for the nine quality-related traits under investigation. QTLs were distributed on all the LGs except LG15 and LG19 (Tables 4 and 5 and Table S3).
Twelve QTLs were mapped on LG05 (Table 4). Among these QTLs, qA05.1 covered a region of 0.5 cM and was associated with all traits except linoleic acid content (Table 4). QTL qA05.1 showed a negative additive effect on five traits (oil, palmitic acid, stearic acid, arachidic acid, and behenic acid content), which was identified in all four environments. This QTL also exhibited positive additive effects on the protein, oleic acid, and arachidonic acid contents, which were detected in two or three environments.
Several QTLs were mapped on LG08, LG12, and LG14 (Table 5). On LG08, a region of 2.6 cM, covered by QTLs qA08.4 and qA08.5, was associated with oil, protein, and behenic acid content, with LOD scores of approximately 5.70-14.67 and PVE values of approximately 3.88-12.58%. Associations with oil and protein content were consistent across all four environments tested, whereas the association with behenic acid was confirmed in three environments (Table 5). A large genomic region containing several QTLs with minor phenotypic effects was identified on LG12 (Table 5). In particular, QTLs for oil, protein, and behenic acid content that were consistent across all four environments were detected in regions spanning 18.30 cM, 7.40 cM, and 17.60 cM, respectively. On LG14, QTLs qA14.5 to qA14.9, located in the interval between 40.3 cM and 43.4 cM, were detected in four environments for oil, stearic acid, and arachidic acid content and in three environments for behenic acid content (Table 5).
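The sliding-window conversion of low-depth SNP calls into bin markers mentioned for the map construction can be sketched as follows. This is an illustrative simplification, not the procedure of reference [23] or the authors' pipeline: the window size, the purity threshold, the `A`/`B`/`H` coding, and the function names `call_windows` and `merge_bins` are all assumptions made for the example.

```python
def call_windows(alleles, window=15, purity=0.8):
    """Call a genotype at each SNP from the parental alleles in a window.

    alleles: per-SNP parental origin in physical order, "A" (parent 1),
    "B" (parent 2) or None (missing). Returns "A", "B", "H" or None.
    """
    half = window // 2
    calls = []
    for i in range(len(alleles)):
        chunk = [a for a in alleles[max(0, i - half): i + half + 1]
                 if a in ("A", "B")]
        if not chunk:
            calls.append(None)  # no informative SNP in this window
            continue
        frac_a = chunk.count("A") / len(chunk)
        frac_b = chunk.count("B") / len(chunk)
        if frac_a >= purity:
            calls.append("A")
        elif frac_b >= purity:
            calls.append("B")
        else:
            calls.append("H")  # mixed window: heterozygous or breakpoint
    return calls

def merge_bins(calls):
    """Merge runs of identical calls into (first_index, last_index, call)."""
    bins = []
    for i, call in enumerate(calls):
        if bins and bins[-1][2] == call:
            bins[-1] = (bins[-1][0], i, call)
        else:
            bins.append((i, i, call))
    return bins

# Toy chromosome: 20 parent-1 SNPs (one miscalled) followed by 20 parent-2 SNPs.
toy = ["A"] * 20 + ["B"] * 20
toy[5] = "B"  # isolated sequencing error, absorbed by the window
bins = merge_bins(call_windows(toy, window=5, purity=0.8))
```

The windowing absorbs isolated miscalls, which matters at 1.2-1.4× depth, and the transitions between runs approximate recombination breakpoints; bins that share the same genotype pattern across all lines can then be treated as redundant markers.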
Among the 69 QTLs mapped on LGs different from those mentioned in the previous paragraph, some exhibited pleiotropy on several traits and exhibited consistent effects in more than one environment (

Annotation of genes and validation of the SNPs in the QTL intervals
The genes in the intervals of qA05.1, qA05.9 and qA05.10 on LG05, qA06.3 and qA06.4 on LG06, qA08.4 and qA08.5 on LG08, qA12.1 to qA12.7 on LG12, and qA14.5, qA14.6, qA14.7, and qA14.8 on LG14 were extracted and screened for SNPs polymorphic between the two parents, and a total of 84 polymorphic SNPs in 71 genes were identified (Table S4). Among these SNPs, 17 resulted in missense mutations (Table 6), whereas the remaining on the 17 SNPs associated with missense mutations (Table S5), and the markers were validated using the two parents and 44 lines of the RIL population displaying contrasting oil content. Two SNPs at sites Arahy05:6599714 and Arahy05:6709559 were closely linked with the oil content (Fig. 4). Specifically, the average oil content was 55.40% in RILs carrying G at Arahy05:6709559, whereas RILs carrying A at the same locus had an average oil content of 50.62% (Fig. 5). The two SNPs were located in the genes Arahy.T0P5W2 and Arahy.YR3A5K, encoding a scarecrow-like transcription factor PAT1-like and a galactosyl transferase GMA12/MNN10 family protein, respectively (Table S4).
[24]. The genetic map employed in this study consisted of 4561 bin markers and spanned 2032.39 cM, with an average genetic distance of 0.45 cM between markers. Both the marker number and the marker density were higher than those reported for other recent peanut linkage maps [1, 14, 15, 19, 20]. This finding might be attributable to the large size of the population used in this study and/or to the whole-genome resequencing strategy adopted, which is more appropriate for tetraploids than other sequencing technologies. Similar to the linkage map reported by Liu et al. [15], the marker order in our map was consistent with the physical order, except for two translocations, between LG03 and LG13 and between LG06 and LG16.
For oil content, a total of 27 QTLs were mapped on 12 LGs (Tables 4 and 5 and Table S3). Among these QTLs, those mapped on LG05, LG08, LG12, and LG14 were detected in at least two environments. QTL qA05. (Table 4). The interval of 0.5 cM spanned by qA05.1, corresponding to an approximately 6.3-7.8 Mb physical region of chromosome A05, might be the same as that reported by Pandey et al. [2], flanked by the markers GM1878 and GM1890, which were mapped to the approximately 6.4-10.9-Mb region of A05 [1]. QTL qA08.4 was detected in three environments, whereas the neighboring QTL qA08.5 was detected in the remaining environment ( with the oil content (Fig. 4 and Fig. 5). Compared with the reference genome, the bases of the high-oil-content parent Yuhua15 at the sites Arahy.05:6599714 and Arahy.05:6709559 changed from C to A and from A to G, respectively, and the encoded amino acids changed from proline (P) to threonine (T) and from tyrosine (Y) to cysteine (C) (Table 6).
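To make the link between a single-base change and a missense mutation concrete, the sketch below classifies a substitution using the standard genetic code. The actual codons at the two sites are not given in the text, so `CCT` and `TAT` are hypothetical examples chosen only because a C-to-A change at the first position of a proline codon and an A-to-G change at the second position of a tyrosine codon reproduce the reported P-to-T and Y-to-C changes; the function name `classify_substitution` is likewise an assumption, and the coding strand is taken to match the reported bases.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def classify_substitution(ref_codon, position, alt_base):
    """Return (ref_aa, alt_aa, effect) for a one-base change in a codon.

    position is 0-based within the codon; effect is "synonymous",
    "missense" or "nonsense".
    """
    alt_codon = ref_codon[:position] + alt_base + ref_codon[position + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if alt_aa == ref_aa:
        effect = "synonymous"
    elif alt_aa == "*":
        effect = "nonsense"
    else:
        effect = "missense"
    return ref_aa, alt_aa, effect

# Hypothetical codons consistent with the reported amino acid changes.
pro_to_thr = classify_substitution("CCT", 0, "A")  # ("P", "T", "missense")
tyr_to_cys = classify_substitution("TAT", 1, "G")  # ("Y", "C", "missense")
```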
A total of 2559 genes are involved in the metabolism of fatty acids and lipid storage and are unevenly distributed on the 20 peanut chromosomes [12]. The SNP Arahy.05:6709559 is located in an exon of the gene Arahy.YR3A5K, which encodes a galactosyl transferase GMA12/MNN10 family protein. This gene is involved in transferring glycosyl groups and in xyloglucan metabolic processes in Arabidopsis thaliana [25]. In the high oil-content peanut variety, the average expression level of this gene (1.29) was lower than that (4.74) in the low oil-content peanut variety (from unpublished transcriptome data). The SNP Arahy.05:6599714 is located in the gene Arahy.T0P5W2, which encodes the scarecrow-like transcription factor PAT1-like; this factor is involved in phytochrome A (phyA) signal transduction in Arabidopsis thaliana [26]. This gene may not be involved in the fatty acid biosynthetic pathway, as its expression level did not differ between the high and low oil-content peanut varieties.

Conclusion
A high-density genetic map with 4561 bin markers was constructed, and a major QTL for the contents of oil, protein, and several fatty acids, located on LG05, was consistently detected across four environments. The SNPs Arahy05:6599714 and Arahy05:6709559 were used to design KASP markers, which were validated on lines displaying contrasting oil contents. These two markers may facilitate marker-assisted breeding to develop high oil-content peanut cultivars.

Plant materials
A population of 329 RILs was derived from a cross between Yuhua15 (female parent) and W1202 (male parent), which was made in the corresponding author's lab in 2012. Yuhua15 is an irregular peanut cultivar with high oil content (54.00%) released by the Institute of Industrial Crops, Henan Academy of Agricultural Sciences, in 2001. W1202 is a breeding line with relatively low oil content (52.60%) developed and preserved in the author's lab. F10 RILs were obtained through the single seed descent (SSD) method, which was performed in the Chinese provinces of Hainan and Henan to reduce generation time. Single plants from the parental lines and RILs were used for DNA isolation and sequencing and were propagated by self-pollination for phenotyping.

Field Trials And Phenotyping
The RIL population and the two parental lines were grown in Zhengzhou (Henan Province) in 2018 and at three locations (Zhengzhou and Shangqiu, Henan Province; Weifang, Shandong Province) in 2019. Twenty seeds of each line were sown in a 3 m × 0.5 m plot according to a randomized complete block design with two replicates.
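A block design of this kind can be laid out by randomizing the order of all lines independently within each replicate, so that every line appears exactly once per block. The sketch below is a generic illustration under that assumption; the function name `rcbd_layout`, the line labels, and the seed are hypothetical, and the actual field plans used in the trials are not described in the text.

```python
import random

def rcbd_layout(line_ids, n_reps=2, seed=0):
    """Return (replicate, plot, line) tuples for a block design.

    Each replicate is one block containing every line once,
    in an independently shuffled order.
    """
    rng = random.Random(seed)  # fixed seed makes the plan reproducible
    layout = []
    for rep in range(1, n_reps + 1):
        order = list(line_ids)
        rng.shuffle(order)
        layout.extend((rep, plot, line)
                      for plot, line in enumerate(order, start=1))
    return layout

# Two parents plus 329 RILs, labelled with hypothetical identifiers.
lines = ["P1", "P2"] + [f"RIL{i:03d}" for i in range(1, 330)]
layout = rcbd_layout(lines, n_reps=2, seed=42)
```

With 331 entries and two replicates the plan contains 662 plots per environment.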

Statistical Analysis Of Phenotypic Data
Descriptive statistics for the phenotypic data, i.e., the mean, standard deviation (SD), coefficient of variation (CV), skewness, and kurtosis, were obtained using SAS software. Combined analysis of variance (ANOVA) for each trait and correlation analysis among traits were performed using the AOV module implemented in QTL IciMapping software [27]. Broad-sense heritability on the basis of the mean across replications and environments (or heritability per mean) was estimated by $H^2 = \sigma_g^2 / (\sigma_g^2 + \sigma_{ge}^2/e + \sigma_\varepsilon^2/(er))$, where $\sigma_g^2$, $\sigma_{ge}^2$, and $\sigma_\varepsilon^2$ denote the genotypic, genotype-by-environment interaction, and error variances, e represents the number of environments, and r represents the number of replicates.
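The two summary quantities used most in the results, the CV and the heritability per mean, can be computed as in the sketch below. It assumes the usual variance-component form of heritability per mean, with genotypic variance divided by the sum of genotypic variance, genotype-by-environment variance over e, and error variance over e times r. The function names and the numeric inputs are hypothetical; in the study these statistics were produced by SAS and QTL IciMapping, not by this code.

```python
import statistics

def coefficient_of_variation(values):
    """CV in percent: sample standard deviation relative to the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

def heritability_per_mean(var_g, var_ge, var_err, n_env, n_rep):
    """Broad-sense heritability on a line-mean basis.

    var_g: genotypic variance; var_ge: genotype-by-environment variance;
    var_err: error variance; n_env: environments (e); n_rep: replicates (r).
    """
    return var_g / (var_g + var_ge / n_env + var_err / (n_env * n_rep))

# Hypothetical variance components with e = 4 environments, r = 2 replicates.
h2 = heritability_per_mean(var_g=4.0, var_ge=2.0, var_err=4.0, n_env=4, n_rep=2)
# Hypothetical oil contents (%) for three lines.
cv = coefficient_of_variation([50.0, 52.0, 54.0])
```

Because the interaction and error terms are divided by e and er, adding environments or replicates raises the heritability of line means even when the underlying variance components are unchanged.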

Sequencing And SNP Calling
Genomic DNA was extracted from fresh leaves using the Plant Genomic DNA Kit (TIANGEN). DNA quality, concentration, and integrity were assessed using a NanoDrop-2000 spectrophotometer (Thermo), a Qubit Fluorometer (Thermo), and agarose gel electrophoresis. DNA that passed quality control was randomly sheared by sonication, and fragments of approximately 300 bp were recovered by electrophoresis. DNA fragments with adapters were used to prepare DNA clusters, which were sequenced on the Illumina HiSeq X Ten platform in PE151 mode.

Figure 1 Phenotypic distribution of nine quality-related traits for the RILs in four environments. Figure 1a-1i shows the phenotypic distribution of the contents of oil, protein, palmitic acid, stearic acid, oleic acid, linoleic acid, arachidic acid, behenic acid, and arachidonic acid. ZZ represents Zhengzhou, SQ represents Shangqiu, and WF represents Weifang.