Animal care and experiments were conducted according to the Regulations for the Administration of Affairs Concerning Experimental Animals (Ministry of Science and Technology, China) and approved by the Animal Care and Use Committee of the South China Agricultural University, Guangzhou, China (approval number: SCAU#2014-10).
Population and phenotyping
A chicken population derived from a yellow-feather dwarf broiler breed that maintained for 25 generations by Wens Nanfang Poultry Breeding Co. Ltd. (Xinxing, P.R. China) was used in this study. This population includes 1,600 birds (800 males, 800 females) and was the 3rd batch of the 25th generation of this chicken population. These birds came from a mixture of full sib and half sib families from mating 30 males and 360 females of the 24th generation. After hatching, all birds were maintained in a closed building under controlled environmental conditions and provided with a standard diet until they were 4 wk of age. The chickens were then randomly assigned to six pens by gender (three pens for males and three pens for females) for growth performance testing from 5 to 13 wk of age. They received food and water ad libitum during all stages. Finally, remaining 1,338 birds were slaughtered at 91 day of age for carcass trait recording. Average daily gain (ADG) and average daily feed intake (ADFI) per individual were calculated for the period from 45 to 84 day. The and residual feed intake (RFI) were calculated by follow formula RFI = ADFI – (b0 + b1 × MMBW+ b2 × ADG), where b0 was the intercept, MMBW is mid-test body weight (MBW raised to the power of 0.75), and the MBW was the predicted body weight on day 21 of the test. b1 and b2 are the partial regression coefficients for MMBW and ADG, respectively. Descriptive statistics of analyzed traits could be found (Table 1). For more details about this population, please refer to Xu et al. (2016) and Zhang et al. (2017).
Genotyping, genotype imputation and quality control
After traits recorded systematically, a total of 644 male birds were randomly selected for genotyping. These birds were 15 male parents and 629 male offspring. Of these 644 birds, 450 birds were genotyped with the 600 K Affymetrix® Axiom® HD genotyping array , and remaining 194 birds were genotyped with the Affy 55K array . The 600 K SNP chip contained 580,961 SNPs probes across 28 autosomes, two linkage groups (LGE64 and LGE22C19W28_E50C23), and two sex chromosomes. The 55 K SNP chip contained 52,184 SNPs probes across 28 autosomes and a sex chromosome (chrZ). After converting genome coordinates to a chicken reference genome (galGal5), 28 autosomes and a sex chromosome (chrZ) were extracted for further analyses. In the process of quality control of genotype, SNPs that minor allele frequency (MAF) smaller than 0.5%, genotyping call rate smaller than 97%, and Hardy–Weinberg equilibrium test P-value smaller than 1×10-6 were removed. Finally, 547,020 and 51,984 SNPs were remaining for 600 K and 55K chip data, respectively. In addition, a total of 23,213 SNPs was shared between 600 K and 55 K SNP chip.
Genotype imputation was performed with two-step approach from 55 to 600 K, and then imputed to WGS. Before Genotype imputation, pre-phasing was executed in Beagle 4.1 with default parameter . Firstly, using 450 birds with 600 K chip data as a reference panel, these 194 birds were imputed from 55 K to 600 K chip data using Beagle 4.0 with pedigree. And then merge 600 K chip data of these 194 birds and 450 birds using VCFtools. Secondly, all of 644 birds with 600 K chip data were imputed to WGS data using a combined reference panel Beagle 4.1 with default parameter. The combined reference panels included 24 key individuals from the yellow-feather dwarf broiler population and 311 birds with WGS data from diverse chicken breeds. These 24 key individuals were selected by maximizing the expected genetic relationships . These 311 birds were downloaded from 10 BioProjects in ENA or NCBI. The combined reference panels contained 36,840,795 SNPs probes across 28 autosomes and a sex chromosome (chrZ). More detail information could be found in our previous study .
After performed genotype imputation, quality control of the imputed WGS data was conducted using PLINK (Purcell et al., 2007) with the criteria of SNP call rate > 95%, individual call rate > 97%, MAF > 0.5%, and Hardy-Weinberg equilibrium P-value > 1.0e-6. In addition, individuals would be excluded who existed Mendelian errors. Finally, the remaining 626 individuals and 11,173,020 SNPs were used for further analysis.
Genetic parameter estimations
The genomic heritability was calculated using the average information restricted maximum likelihood (AI-REML) method implemented in the software DMU v6.0 (Madsen and Jensen, 2013). The statistical model was:
where y is a vector of phenotypic values of all individuals; b is the vector of fixed effects including batch effect; was the vector of the animal additive genetic effect, and ; e is the residual term, and ; and X and Z are incidence matrices relating the fixed effects and the additive genetic values to the phenotypic records; is the genomic relationship matrix calculated using 600 K chip data as follows :
where is a matrix of centered genotypes, m is the number of markers, and is the minor allele frequency of SNP .
Genome-wide association analyses using imputed WGS data
Before performed GWAS, the population structure of this chicken population was calculated by PLINK. And found a slight population stratification (Figure S1), so that we added top five principal components as covariates into the GWAS model to adjust population structure. The univariate tests of association were performed using a mixed model approach implemented in the GEMMA v0.98.1 software . All sequence variants after quality control were tested for associations. The model was:
where y is a vector of phenotypic values of all individuals; X and Z are incidence matrices relating the fixed effects and the additive genetic values to the phenotypic records; b is the vector of fixed effects including batch effect and top five PCs; g is a vector of the genomic breeding values of all individuals; a is the additive effect of the candidate variants to be tested for association; is a vector of the variants’ genotype indicator variable coded as 0, 1, or 2; and e is the residual term, . Genomic breeding values were assumed to be distributed as , where is the standardized relatedness matrix calculated by GEMMA using 600 K chip data. The Wald test was applied to test the alternative hypotheses of each SNP in the univariate models. And the variance contribution to additive genomic variance by a SNP was calculated as follows: , where is the additive genomic variance explained by a SNP, p is the allele frequency, and β is the SNP effect estimated from GWAS results.
The Manhattan plot and quantile-quantile plot (QQ plot) were generated by the qqman package (Turner 2014) in R. The threshold of genome-wide significant P-values was adjusted based on the effective number of independent tests for Bonferroni method. The imputed WGS data was pruned to 1,082,126 independent SNPs using PLINK with the command (--indep-pairwise 25 5 0.2). The effective number of independent tests were set to 200,629, estimated by sampleM. Therefore, the genome-wide suggestive and significant P-values were 4.98 × 10−6(1.00/200,629) and 2.49× 10−7 (0.05/200,629), respectively. For evaluating the extent of false positive signals of GWAS results, a genomic inflation factor (λ) was calculated as the median of the resulting chi-squared test statistics divided by the expected median of the chi-squared distribution with one degree of freedom (i.e., 0.454). Haploview software (Barrett et al. 2005) was used to analyze the linkage disequilibrium around the significant SNPs. For identifying the independent signals precisely, the most significant SNP were added as covariates into the univariate models in step-wise conditional analyses. In addition, the gene information file of chicken was downloaded from Ensembl gene build 94, and candidate genes were annotated using the software SnpEff version 4.3t .
Transcriptomic analysis identifies differentially expressed genes (DEGs) associated with RFI
Raw reads of four samples (sample45561 and sample46307 with high RFI, and sample45012 and sample46732 with low RFI) were downloaded from our previous study . And raw reads were processed to clean reads by filtering low quality reads and adaptor dimers. Clean reads were mapped to the chicken reference genome (galGal5) using HISAT2 v2.0.5 with the default parameter. Then, the alignments were assembled into full and partial transcripts using StringTie, and quantify the transcripts for each sample using the GAL5. Finally, the differential gene expression analysis was made with Ballgown in R environment . In this study, differentially expressed transcripts or genes were identified based on an adjusted p-value less than 0.05 (false discovery rate, FDR of 5 %) and the absolute value of log2-transformed of fold change (FC) more than or equal to one. Function and pathway enrichment were performed by the R language package (clusterProfiler). Using Benjamini-Hochberg method, the P-values of KEGG pathway and GO terms were adjusted for multiple testing . And an adjusted P-value less than 0.05 was set as significant P-values. In addition, the genome annotation information file was downloaded from Ensembl gene build 94.
Validation of candidate genes based on differentially expressed genes (DEGs)
The identification of candidate genes of lead SNPs from GWAS results was performed basing on their corresponding genomic positions. The candidate gene regions were defined as extended 50 kb flanking regions in both upstream and downstream of lead SNP position. If there are no genes in candidate gene regions, selected the nearest genes in both upstream and downstream of lead SNP position as the candidate genes. Compared the overlap genes or regions between the candidate genes from sequence-based GWAS and DEGs, to identify the causal genes or QTLs.
Significant SNPs compared with reported QTLs
To compare results from sequence-based GWAS with reported QTLs, significant and suggestively significant SNPs were selected to compare with the QTLs. These QTLs, all of which affect ADG, ADFI and RFI, were selected from the Animal QTLdb , respectively. QTLs closest to the significant SNPs were extracted.