Diversity assessment of major agronomic traits
This study systematically investigated 26 agronomic traits in peppers, classifying them into 12 qualitative and 14 quantitative traits (Table S1-S3). The Shannon-Wiener index for qualitative traits, it ranged from 0.03 to 0.95 (average 0.49), while for quantitative traits, it ranged from 0.95 to 2.07 (mean 1.82). Except for locule number, the Shannon-Wiener index for quantitative traits surpassed that of qualitative traits, signifying a more extensive diversity in quantitative traits. Correlation analysis of the 14 quantitative traits (Fig. 1) revealed significant correlations, mostly highly significant, between quantitative traits of C. frutescens and C. annuum germplasm resources. A total of 57 pairs of traits exhibited highly significant correlations, with 39 pairs positively correlated and 18 pairs negatively correlated. Furthermore, 10 pairs of traits showed significant correlations, with 9 pairs positively correlated and 1 pair negatively correlated. Notably, the highest correlation coefficient observed was 0.93, corresponding to the relationship between weight per fruit and the longitudinal diameter of the commercial fruit (Table S4). The majority of the 26 agronomic traits demonstrated substantial phenotypic variations, affirming the suitability of the population for GWAS analysis.
Whole genome resequencing analysis
A total of 107 feral peppers sample and 75 bell pepper samples underwent re-sequencing, revealing a GC content ranging from 34.01–35.36%. The Q20 proportion fell between 96.14% and 98.6%, while Q30 exceeded 90%. Mapping rates varied from 96.85–99.53%, with an average of 99.02%, and an average sequencing depth of 9.62X. The average 1X coverage (minimum 1 base coverage) was 90.47%, and the average 5X coverage (minimum 5 base coverage) was 76.79% (Table S5). These results attest to the high sequencing and gene mapping quality, establishing a robust foundation for subsequent analyses.
Employing the Genome Analysis Toolkit (GATK)for SNP detection yielded a total of 64,110,473 identified SNPs. The distribution of these variations was visually represented using a circos plot (Fig. 2). Uniform distribution of SNPs and Indels across chromosomes underscired indicating the reliability of the SNP and Indel data.
Population structure and chain imbalance analysis
The population structure analysis yields insights into the subdivision of the population into distinct subgroups. Integrating population structure analysis as a covariate proves effective in mitigating false positives resulting from population structure in GWAS analysis. Assuming the population can be divided into K = 2 ~ 10 subgroups, when K = 10 (Figure S2, Table S6), the lowest cv value is observed, signifying the optimal subdivision of all pepper germplasm resources into 10 groups (Fig. 3d). Conversely, at K = 1, the population remains undivided; at K = 2, clear stratification emerges, indicating the existence of two subgroups within the 182 pepper resources under study. Subgroup 1, denoted in red, comprises 107 materials, all of C.frutescens,while Subgroup, represented in blue, consists of 75 materials, exclusively of C.annuum. This aligns with the results of the phylogenetic tree and principal component analysis.
The construction of a phylogenetic tree utilizing the Neighbor-Joining method visually presents the distinctions within the population and the proximity of genetic relationships. The tree illustrates a division of the 182 individuals into 2 branches (Fig. 3b), consistent with the population structure analysis, predominantly segregating the population into two major groups: group 1 and group 2. A subsequent phylogenetic analysis (Fig. 3c) of 107 Hainan feral peppers (C. frutescens) reveals that all Hainan feral peppers from Baisha, Hainan (group3), Wenchang, Hainan (group7) and Haikou, Hainan (group6) from distinct branches. While feral peppercorns within the same city and county may share transmission routes, leading to minimal genetic diversity evolutionary distinctions persist among feral peppercorns from different cities and counties, undercoring the high genetic diversity of Hainan feral peppercorns in geographic distribution. Principal component analysis (PCA) classifies the 182 pepper germplasms into two major groups, C. frutescens and C. annuum (Fig. 3f). PC1, PC2, and PC3 elucidate 77.39%, 3.02%, and 2.09% of the genetic variation, respectively. C.annuum from external regions clusters densely, indicating a close genetic relationship among the pepper germplasms. Conversely, C.frutescens from diverse location within Hainan displays greater dispersion, reflecting a rich genetic background. Evaluation of genetic relationships based on selected SNP markers (Fig. 3e) reveals genetic relationship values, with meterials exhibiting of -0.2 accounting for 48.45%, those between 0 and 1.2 constituting 29.86%, and those exceeding 1.2 making up 21.68%. These findings affirm the high genetic diversity of the germplasm resources used, conducive to a robust whole-genome association study with minimal interference.
Calculation of the linkage disequilibrium (LD) decay of the 182 pepper materials (Fig. 3a) demonstrates a decline in R2 with increasing physical distance. At an R2 of 0.1, the LD decay distance for group 1 exceeds 1000 kb, while for group 2, it surpasses 500 kb. Notably, group 1 aligns with C.frutescens, and group 2 with C.annuum, indicating a faster LD decay rate for C.annuum, with the smallest LD decay distance observed in the C.frutescens population. This further substantiates the high genetic diversity within the C.frutescens population.
Genome-wide association study of 26 agronomic traits
Utilizing a subset of highly consistent SNPs derived from 182 pepper materials, we conducted a GWAS for 26 agronomic traits (Table 1). To enhance the reliability of GWAS results and mitigate the influence of population structure, we employed two statistical models: the Linear Mixed Model (LMM) and the Efficient Mixed Model Association eXpedited (EMMAX). After comparing their performance performance through Q-Q plots, we identified the most suitable statistical model for each trait. The LMM proved optimal for 9 traits, while EMMAX was optimal for 17 traits;for instance, fruit peduncle length analysis favored the LMM model (Fig. 4, Figure S3). These results underscore the necessity of selecting the most appropriate model for each trait in GWAS analysis, as the optimal model may vary.
Table 1
Screening loci and gene counts for pepper traits identified through GWAS.
Trait | Model | Chromosome | Number of SNPs | gene |
Plant height | EMMAX | NC_029977.1 | 13 | 26 |
NC_029978.1 | 3 |
NC_029980.1 | 1 |
NC_029981.1 | 5 |
NC_029982.1 | 2 |
NC_029984.1 | 1 |
NC_029985.1 | 13 |
NC_029986.1 | 4 |
NC_029987.1 | 1 |
NC_029988.1 | 1 |
NW_015960393.1 | 1 |
NW_015960461.1 | 1 |
NW_015960498.1 | 2 |
NW_015961641.1 | 1 |
Plant breadth | LMM | NC_029978.1 | 2 | 41 |
NC_029979.1 | 24 |
NC_029980.1 | 2 |
NC_029982.1 | 11 |
NC_029985.1 | 6 |
NC_029986.1 | 2 |
NW_015961225.1 | 8 |
Leaf length | EMMAX | NC_029987.1 | 10 | 6 |
NC_029988.1 | 1 |
Leaf width | EMMAX | NC_029980.1 | 10 | 10 |
Petiole length | EMMAX | - | - | - |
First stem node length/cm | LMM | NC_029978.1 | 1 | 20 |
NC_029979.1 | 1 |
NC_029982.1 | 10 |
NC_029988.1 | 10 |
NW_015960959.1 | 1 |
Stem thickness | EMMAX | NC_029977.1 | 3 | 7 |
NC_029980.1 | 3 |
NC_029981.1 | 3 |
NC_029985.1 | 2 |
NC_029986.1 | 1 |
Weight per fruit | LMM | NC_029977.1 | 5 | 46 |
NC_029979.1 | 12 |
NC_029980.1 | 5 |
NC_029981.1 | 1 |
NC_029982.1 | 4 |
NC_029983.1 | 22 |
NC_029987.1 | 3 |
NC_029988.1 | 188 |
NW_015961202.1 | 1 |
NW_015961228.1 | 2 |
Longitudinal diameter of fruit | LMM | NC_029977.1 | 7 | 62 |
NC_029979.1 | 8 |
NC_029982.1 | 5 |
NC_029983.1 | 7 |
NC_029985.1 | 2 |
NC_029987.1 | 6 |
NC_029988.1 | 37 |
NW_015960681.1 | 3 |
NW_015960900.1 | 1 |
Transverse diameter of fruit | LMM | NC_029979.1 | 12 | 28 |
NC_029980.1 | 11 |
NC_029982.1 | 2 |
NC_029988.1 | 3 |
Fruit peduncle length | LMM | NC_029977.1 | 5 | 64 |
NC_029979.1 | 18 |
NC_029980.1 | 27 |
NC_029981.1 | 3 |
NC_029982.1 | 1 |
NC_029983.1 | 10 |
NC_029985.1 | 2 |
NC_029986.1 | 1 |
NC_029987.1 | 18 |
NC_029988.1 | 2 |
NW_015960759.1 | 7 |
Thickness of flesh | EMMAX | - | - | - |
Placenta size | LMM | NC_029979.1 | 7 | 50 |
NC_029987.1 | 114 |
Number of locules | EMMAX | NC_029977.1 | 2 | 31 |
NC_029979.1 | 4 |
NC_029980.1 | 1 |
NC_029981.1 | 4 |
NC_029983.1 | 3 |
NC_029984.1 | 10 |
NC_029985.1 | 2 |
NC_029986.1 | 9 |
NC_029988.1 | 4 |
NW_015960458.1 | 1 |
NW_015961180.1 | 6 |
NW_015961815.1 | 1 |
Branching type | EMMAX | - | - | - |
Leaf surface | EMMAX | - | - | - |
Leaf shape | EMMAX | - | - | - |
Style length | EMMAX | NC_029977.1 | 3 | 26 |
NC_029979.1 | 5 |
NC_029983.1 | 2 |
NC_029984.1 | 15 |
NC_029987.1 | 1 |
NW_015961188.1 | 1 |
Flower color | EMMAX | - | - | - |
Anther color | LMM | NC_029977.1 | 10 | 94 |
NC_029979.1 | 5 |
NC_029980.1 | 8 |
NC_029981.1 | 3 |
NC_029982.1 | 14 |
NC_029983.1 | 11 |
NC_029985.1 | 30 |
NC_029986.1 | 9 |
NC_029987.1 | 2 |
NC_029988.1 | 4 |
NW_015960838.1 | 8 |
NW_015961196.1 | 2 |
NW_015961290.1 | 5 |
NW_015961339.1 | 9 |
Pendage at blossom end | EMMAX | NC_029979.1 | 3 | 8 |
NC_029981.1 | 1 |
NC_029983.1 | 1 |
NC_029984.1 | 5 |
NC_029985.1 | 2 |
NC_029987.1 | 1 |
Color of immature fruit | EMMAX | - | - | - |
Color of mature fruit | LMM | - | - | - |
Persistent calyx at base of fruit | EMMAX | - | - | - |
Fruit shoulder shape | EMMAX | - | - | - |
Fruit shape | EMMAX | - | - | - |
The number of highly significant SNPs and associated genes varied across traits and chromosome. Among the 26 agronomic traits, 15 traits exhibited 929 significantly associated loci, while the remaining 11 traits showed no significant associations, resulting in a total of 519 genes (Table S7).
For instance, plant height was associated with a total of 49 SNPs (26 genes), predominantly distributes across chromosomes NC_029977.1, NC_029978.1, NC_029980.1, NC_029981.1, NC_029982.1, NC_029984.1, NC_029985.1, NC_029986.1, NC_029987.1, NC_029988.1, NW_015960393.1, NW_015960461.1, NW_015960498.1, and NW_015961641.1. Similarly, plant breadth exhibited 55 SNPs (41 genes), primarily located on chromosomes NC_029978.1, NC_029979.1, NC_029980.1, NC_029982.1, NC_029985.1, NC_029986.1, and NW_015961225.1. Stem thicknessness was associated with 12 SNPs(7 genes), mainly distributed on chromosomes NC_029977.1, NC_029980.1, NC_029981.1, NC_029985.1, and NC_029986.1. The first stem node length ,showed association with 23 SNPs (20 genes), primarily distributed on chromosomes NC_029978.1, NC_029979.1, NC_029982.1, NC_029988.1, and NW_015960959.1. Leaf length exhibited association with 11 SNPs (6 genes), primarily located on chromosomes NC_029987.1 and NC_029988.1, while leaf width displayed association with 10 SNPs (10 genes), mainly distributed on chromosome NC_029980.1. The style length was associated with 27 SNPs (26 genes), predominantly kurtosis on chromosomes NC_029977.1, NC_029979.1, NC_029983.1, NC_029984.1, NC_029987.1, and NW_015961188.1. Anther color exhibited association with 120 SNPs (94 genes), mainly distributed on chromosomes NC_029977.1, NC_029979.1, NC_029980.1, NC_029981.1, NC_029982.1, NC_029983.1, NC_029985.1, NC_029986.1, NC_029987.1, NC_029988.1, NW_015960838.1, NW_015961196.1, NW_015961290.1, and NW_015961339.1.
Moreover, SNP analysis identified a total of 243 SNPs (46 genes) associated with weight per fruit, mainly showing kurtosis on chromosomes NC_029977.1, NC_029979.1, NC_029980.1, NC_029981.1, NC_029982.1, NC_029983.1, NC_029987.1, NC_029988.1, NW_015961202.1, and NW_015961228.1. Additionally, 76 SNPs (62 genes) were associated with the longitudinal diameter of fruits, primarily distributed on chromosomes NC_029977.1, NC_029979.1, NC_029982.1, NC_029983.1, NC_029985.1, NC_029987.1, NC_029988.1, NW_015960681.1, and NW_015960900.1. Further, 28 SNPs (28 genes) were associated with the transverse diameter of fruits, predominantly distributed on chromosomes NC_029979.1, NC_029980.1, NC_029982.1, and NC_029988.1. Additionally, 94 SNPs (64 genes) were associated with fruit peduncle length, mainly distributed on chromosomes NC_029977.1, NC_029979.1, NC_029980.1, NC_029981.1, NC_029982.1, NC_029983.1, NC_029985.1, NC_029986.1, NC_029987.1, NC_029988.1, and NW_015960759.1. Furthermore, 121 SNPs (50 genes) were associated with placenta size, primarily distributed on chromosomes NC_029979.1 and NC_029987.1. Additionally, 47 SNPs (31 genes) were associated with the number of locules, mainly distributed on chromosomes NC_029977.1, NC_029979.1, NC_029980.1, NC_029981.1, NC_029983.1, NC_029984.1, NC_029985.1, NC_029986.1, NC_029988.1, NW_015960458.1, NW_015961180.1, and NW_015961815.1. Similarly, 13 SNPs (8 genes) were associated with Pendage at blossom end, mainly distributed on chromosomes NC_029979.1, NC_029981.1, NC_029983.1, NC_029984.1, NC_029985.1, and NC_029987.1.
Gene Function Annotation
In the screened genes, we identified 26, 41, 6, 10, 20, 7, 46, 62, 28, 64, 50, 31, 26, 94, and 8 genes associated with plant height, plant width, leaf length, leaf width, first stem node length, stem thickness, weight per fruit, longitudinal diameter of fruits, transverse diameter of fruits, fruit peduncle length, placenta size, number of locules, style length, anther color, and pendage at blossom end, respectively (Table S8). Subsequently, gene functional annotation was conducted for these identified genes.
Genes associated with stem-related traits were primarily implicated in functions such as signal transduction mechanisms, replication, recombination and repair, transcription, posttranslational modification, protein turnover, chaperones, intracellular trafficking, secretion, and vesicular transport. Among these, 26 genes had an unknown function, and 10 genes lacked functional annotations (Fig. 5a). Regarding leaf-related traits, the annotated genes were predominantly involved in functions such as translation, ribosomal structure and biogenesis, signal transduction mechanisms, and posttranslational modification, protein turnover, chaperones, with 2 genes of unknown function and 3 genes not functionally annotated (Fig. 5b). Genes associated with flower-related traits were predominantly involved in functions such as posttranslational modification, protein turnover, chaperones, replication, recombination and repair, transcription, and signal transduction mechanisms, with 38 genes of unknown function and 12 genes lacking functional annotations (Fig. 5c). Finally, genes associated with fruit-related traits were mainly implicated in functions such as replication, recombination and repair, signal transduction mechanisms, posttranslational modification, protein turnover, chaperones, transcription, and carbohydrate transport and metabolism, with 92 genes of unknown function and 48 genes lacking functional annotations(Fig. 5d).
Candidate gene screening and validation
The boxplots depicting four traits—weight per fruit, longitudinal diameter of fruits, transverse diameter of fruits, and placenta size—are illustrated in Fig. 6a-d. The numerical values of these four traits in C. annuum surpass those in C. frutescens. This, in conjunction with the qRT-PCR results from both C. annuum and C. frutescens, facilitates the analysis functional trends in genes.
Leveraging the outcome of gene functional annotation, we randomly selected 11 candidate genes associated with weight per fruit, longitudinal diameter of fruits, transverse diameter of fruits, and placenta size for qRT-PCR validation experiments shown, as depicted in Fig. 6e-h. These aimed to substantiate the influence of these genes on the growth of various traits. Notably, the expression level of the TPX2 gene in C. frutescens, characterized by smaller weight per fruit, significantly exceeded that in C. annuum, indicating its potential role in inhibiting weight per fruit in both species. Similarly, nsLTPs13 and NUTCRACKER displayed significantly higher expression levels in the variety with smaller weight per fruit, suggesting their potential as genes inhibiting weight per fruit in both C. frutescens and C. annuum. In the context of the longitudinal diameter of commercial fruits, the expression levels of extensin-2-like and COP10 genes were significantly higher in C. annuum compared to C. frutescens, whereas SCPL13 exhibited significantly higher expression in C. frutescens. This implies that these extensin-2-like and COP10 may promote longitudinal diameter of fruits in both species, while SCPL13 may have an inhibitory role. Notably, no significant difference in the expression level of the extensin-1-like gene was observed between the two varieties. Examining the transverse diameter of fruits, the expression of the DDB1 gene in C. annuum with a larger diameter was significantly higher than that in C. frutescens with a smaller diameter, suggesting its potential role in promoting the transverse diameter of fruits in both species Conversely, the gene GAUT1 exhibited opposite expression levels in the two varieties, indicating its potential as a gene inhibiting this trait. No significant differences in the expression levels of the RKD4 gene were observed between the two varieties. For placenta size, SCPL51 displayed significantly higher expression in the smaller seed cavity of C. frutescens compared to the larger seed cavity of C. annuum.In contrast, HDT2 and HDAC9 showed significantly higher expression in C. frutescens, indicating their potential roles as genes inhibiting placenta size in both varieties.