Characteristics of the genome datasets
We performed an initial in-depth characterization of the genomes of the 100 YFCs from 10 different breeds and 10 Huaibei (HB) partridge chickens (used for comparisons) sequenced in this study (Figure 1a; Additional file 1). An average of 86,155,900 clean reads per genome are obtained after quality control protocols, which were then aligned to the reference genome, yielding a mean mapping rate at 87.12% (Additional files 2 and 3, Additional file 4: Fig. S1). The total average base coverage across the genome is 96.35% at a sequencing depth target of 1X, 86.81% at 4X, 41.87% at 10X, and down to 0.36% at 30X (Additional file 4: Fig. S2). The average number of nucleotides in each genome is 11,916,810,290 after filtration, with an average GC content at 44.53% (Additional file 5).
For comparative analyses, we merged the 100 YFC and 10 Huaibei partridge chicken genomes with 104, 10, and 1 previously published Chinese chicken, red junglefowl (Gallus gallus; RJF), and green junglefowl (Gallus varius; GVF) genomes, respectively, retaining a total of 3,065,814 autosomal SNPs.
Genome variants in the Yellow-feathered chickens
After filtration, 16,817,111 single nucleotide polymorphisms (SNPs) and 1,289,024 InDels (insertion or deletion of bases) (≤ 50 bp) were retained. The structural variations (SVs) and the increase or decrease of the copy number of large (>1 kb) genomic fragments were analyzed. All these genomic variants in the newly generated dataset are summarized in Additional file 4: Fig. S3. Briefly, most of the SNPs are located in intergenic followed by intronic genomic regions (Additional file 4: Fig. S4a). Those located within coding sequences are mainly associated with synonymous or nonsynonymous coding attributes (Additional file 4: Fig. S4b). There are more transitions (11,943,736; 71.02%) than transversions (4,873,375; 28.98%) in the dataset. G->A and C->T substitutions are the common transitions at 28% while A->G and T->C substitutions are around 21% (Additional file 4: Fig. S5). Different transversions show a low but relatively uniform distribution rate in the dataset. The total average ratio of transitions to transversions is 2.53 in this dataset (Additional file 6). On average, there are 208,737 and 88,832 novel transitions and transversions per genome, respectively, with respect to the reference dbSNP annotation, yielding a novel transition to transversion ratio at 2.35 (Additional file 6). Analysis of the clean SNPs in the dataset shows that averagely 2,033,275 and 2,259,628 SNPs per genome are homogenous and heterozygous hybrids, respectively. Of these, 63,995 and 233,575 are novel homogenous and heterozygous hybrids, respectively (Additional file 7).
Among the high quality InDels, there are more deletions than insertions (785,806 (60.96%) versus 503,218 (39.04%)). The genomic locations of these InDels are summarized in Additional file 4: Fig. S6a, where most InDels are located in non-coding (i.e. intergenic and intronic) regions. Additionally, the four most common genomic consequences of the InDels include frameshift or non-frameshift insertions and deletions (Additional file 4: Fig. S6b). InDels causing loss or gain stop codons are present in these genomes but in lower proportions.
Besides SNPs and InDels, the SVs, which represent a large range of chromosomal variations encompassing large genomic regions, are evaluated. These include large fragment deletions (DEL), insertions (INS), inversions (INV), translocations, and duplications [13, 14]. A predominance of intrachromosomal translocations (65%) and deletions (26%) is observed in the YFC genomes. Inversions and interchromosomal translocations are also present but in comparatively low proportions (Additional file 4: Fig. S7a). Analysis of copy number variations (CNVs; 95,918 in total), divided into deletions and duplications, reveals an overall higher proportion of deletions (56.5%) than duplications (43.5%) (Additional file 4: Fig. S7b).
Population structure
Principal component Analysis (PCA) was performed for all the 100 YFC genomes, revealing a general separation of YFCs from Henan (Zhengyang, ZY) and Hubei (Jianghan, JH) (Fig. 1b and c) into a northern cluster. The YFCs from Guangxi (Guangxi Yellow, GX), Guangdong (Huaixiang, HX, Huiyang bearded, HY, and Wuhua Yellow, WH), and Hainan (Wenchang, WC) form a southern cluster, while those from Hunan (Huanglang, HL), Jiangxi (Ningdu Yellow, ND), and Fujian (Hetian, HT) group into a central cluster. This finding is supported by ADMIXTURE analysis (Fig. 1d and e). At the lowest cross validation error value, corresponding to K=2, the northern (blue) and southern (red) clusters show a complete separation, whereas the central cluster exhibits a signal of admixture with the northern and southern clusters. These three clusters were verified when K=3, with HL and WC showing admixed ancestries. When K=4, both HT and HY harbor the same ancestry component, which also contribute to WC.
We implemented comparative population genetic analyses the YFCs against other indigenous chickens from China, RJF, and GVF. In the PCA (Fig. 2a), YFCs tend to cluster together and appear to be in close proximity to HB partridge chickens and a few indigenous chickens from Sichuan and Tibet. These patterns imply a close congruity in the total genomic architecture of the YFCs. Yuanbao bantams are grouped in a distant cluster from the YFCs and other chickens, underscoring the genomic effects of differential breeding trajectories [15, 16]. Neighbor joining (NJ) phylogeny (Fig. 2b) and ADMIXTURE (Fig. 2c and d) corroborates the findings of the PCA and further clarifies the northern, central, and southern YFC clustering pattern inferred from figures 1b-d.
Detection of selective sweeps
Genome-wide scans for signals of selection in YFCs revealed several analytical windows within the top 1% of the selection tests, corresponding to 366 and 504 positively selected genes (PSGs) in the Locus-specific branch length (LSBL) and π-ratio tests, respectively. A total of 28 PSGs were concurrently identified in the top 1% by the two selection tests (Fig. 3a). Among the 28 genes are genes that are associated with pigmentation including: RALY heterogeneous nuclear ribonucleoprotein (RALY), leucine rich repeat containing G protein-coupled receptor 4 (LGR4), ryanodine receptor 2 (RYR2), RYR3, solute carrier family 23 member 2 (SLC23A2), and SLC2A14. Functional enrichment assessment showed significant gene ontology (GO) terms including vitamin transport activity (GO:0090482; Fig. 3b), intersecting with SLC23A2 and SLC2A14, which play roles in pigmentation. There are additional genes that had top 1% significant signals in either of the selection tests and are of important for understanding the color trait and other properties of interest like meat quality of the YFCs. These include BCDO2, IL-18, FBXO5, COL1A2, COL4A2, COL6A1, COL6A2 in LSBL; and GDF8, HSPA5, SHISA9, COL4A1, and COL23A1 in π-ratio tests (Fig. 4).
BCDO2 haplotype differentiation
The assessment of the haplotype pattern of BCDO2 gene in all the chicken populations revealed a general commonality across the 10 YFC breeds sequenced in this study (Fig. 5). Interestingly, HB partridge chickens, initially showing an overall population genomic closeness to the northern cluster YFCs, clearly exhibit a BCDO2 haplotype pattern similar to other Chinese indigenous chickens rather than to the YFCs. Overall, the haplotype differentiation pattern of the BCDO2 gene is synonymous with the selection of this gene as a candidate PSG for the yellow pigmentation phenotype among the YFCs.