All chickens were obtained from the fast-growing white-feathered pure line, produced by Xinguang Agricultural and Animal Industrials Co., Ltd. (Mile, China). This line was selected for eight generations for high body weight and feed efficiency traits. RFI testing was conducted on a total of 464 broilers. They were housed in identical individual cages (length × width × height, 30 × 25 × 45 cm) and fed ad libitum. Each day, the amount of fresh feed provided was recorded individually, and residual feed was recorded daily and removed for an intervening period during 28–40 days of age. During this period, the animals were fed a corn-soybean meal diet, and detailed information about the diet is described in Additional file 1: Table S1. The bodyweight of each chicken at 28 and 40 days of age was measured using an electronic scale. The RFI calculation method was described by Li et al. . The descriptive statistics of these phenotypes are summarised in Additional file 2: Table S2. The correlation coefficient between RFI and ADFI was 0.61, and significant correlations were found in coefficients between RFI and FCR (0.79), the ratio of the breast (-0.23), and abdominal fat ratio (0.37) (Additional file 3: Figure S1).
At the age of 41 days, whole blood was collected from each bird from the wing vein using a vacuum blood tube. Furthermore, broilers were sacrificed after 2 hours of the last feed to allow time for the feed to have been digested in the GI-tract. Each bird was then euthanized by cervical dislocation. The abdominal fat tissue, whole breast muscle and thigh on the right side were carefully dissected and weighed promptly with an electronic balance (0.1 g precision). Moreover, the caecal contents (including chyme and mucosa) were collected immediately. All the samples were snap-frozen in liquid nitrogen, transported to the laboratory and stored at − 80°C for subsequent studies.
Genotyping and Quality Control
Genomic DNA was extracted from blood samples with the phenol-chloroform method. In total, 300 broilers were resequenced with 150 bp paired-end reads on an Illumina NovaSeq 6000 platform with an average depth of approximately 10×1 L coverage conducted by Beijing Compass Biotechnology Co., Ltd. (Beijing, China). Variant calling was performed according to a standardised bioinformatics pipeline for all samples [39, 40]. Specifically, clean sequencing data were aligned to the chicken reference genome (GRCg6a/galGal6: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/-000/002/315/GCF_000002315.6_GRCg6a/) with the Burrows-Wheeler Aligner (BWA)-MEM algorithm . Then, PCR duplicates were removed, and local indel realignment and base quality score recalibration were performed with the Genome Analysis Toolkit (GATK version 3.5) . Variant calling was performed via HaplotypeCaller in GVCF mode with joint genotyping on all samples. Finally, SNPs were filtered with the GATK VariantFiltration protocol. The filtering settings were as follows: variant confidence score (QUAL) < 30.0, QualByDepth (QD) < 2.0, ReadPosRankSum < − 8.0, total depth of coverage (DP) < 4.0, and FisherStrand (FS) > 60.0. In addition, quality control of the reference panel was conducted with the criteria of MAF ≥ 0.05, only bi-allelic sites, genotyping missing < 0.2, mean depth value is between 3 and 30, and site quality value is higher than 30. After filtering, 9,540,946 autosome variants remained for the 486 sequenced birds, LD decay was conducted by PopLDdecay , and the average LD level in a 5-kb interval was 0.17 (Additional file 4: Figure S2).
16S rRNA gene sequencing and analysis
Three hundred caecal samples were used to conduct 16S rRNA amplicon sequencing. The total DNA of caecal content was extracted by a QIAamp DNA Stool Mini Kit (QIAGEN, Hilden, Germany). Eight caecal samples were excluded because of DNA extraction failure. Finally, 292 microbial DNA samples were used for 16S rRNA sequencing. Seventy-nine top-ranked and seventy-nine bottom-ranked RFI samples were assigned as high and low groups. The V4 region of the 16S rRNA gene was amplified using the primer pair 515F/806R (5′-GTGCCAGCMGCCGCGGTAA-3′ and 5′-GGACTACHVGGGTWTCTAAT-3′), and the amplicons were purified and quantified using Agencourt AMPure Beads and the PicoGreen dsDNA Assay Kit (Invitrogen, Carlsbad, CA, USA), respectively. After quantification, the barcoded V4 amplicons were pooled and subsequently sequenced using an Illumina MiSeq platform (Illumina, San Diego, CA) to generate 300 bp paired-end reads at Shenzhen BGI Technology Services Co., Ltd. For each sample, there were approximately 50000 clean reads. Amplicon sequencing bioinformatics was performed with EasyAmplicon v1.0 . Paired-end sequence data were merged, quality filtered, and dereplication using VSEARCH v2.15 subcommand --fastq_mergepairs, --fastx_filter, and --derep_fulllength, respectively . Then, the non-redundancy sequences were denoising into amplicon sequence variants (ASVs) with USEARCH v10.0  (via -cluster_otus or unoise3). Chimaera was removed by VSEARCH --uchime_ref against the SILVA database . Feature tables were created by vsearch --usearch_global. The taxonomy of the features was classified by the USEARCH sintax algorithm in SILVA v123. Diversity analysis was carried out using the vegan v2.5-6 package (https://cran.r-project.org/web/packages/vegan/), and visualised by using the ggplot2 v3.3.2 (https://cran.r-project.org/web/packages/qqman/) package in R v4.0.2. LEfSe was conducted with the online platform ImageGP (http://www.ehbio.com/ImageGP-/index.php/Home/Index/LEFSe.html). Functional profile prediction of microbial communities was conducted by PICRUSt , with the green genes as the reference database.
SCFA concentration determination
Three hundred caecal samples, the same as the those used for amplicon sequencing population, were used for SCFA concentration determination. Briefly, samples were thawed on ice, approximately 50 g of sample was added to 400 µl of saturated sodium chloride solution, and 50 µl 3 mmol of saturated sodium chloride solution of hydrochloric acid was added. Ultrasonic oscillation at low temperature was conducted for 20 min. Then, 500 µl ether was added, oscillated sufficiently, and extracted for 10 mins. Next, the supernatant was centrifuged for 10 mins at 12000 r/min and 4℃. Then, 50 mg anhydrous sodium sulfate was added into the supernatant and oscillated for 3 mins. Finally, the mixture was centrifuged at 4500 r/min and 4℃ for 5 mins, and the supernatant was used for analysis. A total of 2 µl of solution was analysed by a TRACE1300-TSQ9000 gas chromatography-mass spectrometry (GC-MS) instrument (Thermo Fisher Scientific, Waltham, MA, USA) at Shenzhen BGI Technology Services Co., Ltd. To determine the absolute SCFA concentration, SCFA standards were prepared at different dilutions with ultrapure water. Then, the protocol described above was conducted to generate standard curves for the seven SCFAs.
Evaluating the effects of host genetics on SCFAs
The GWAS for SCFA concentrations was performed directly using the univariate linear mixed model (LMM) implemented in GEMMA version 0.98.1 software (https://github.com/genetics-statistics/GEMMA/releases) . The concentration of SCFAs was log-2 transformed to make them follow a normal distribution (Additional file 5 and 6: Figure S3 and S4). A previous study described the detailed GWAS model in detail . SNP-based heritability is implemented in GCTA (ver 1.93.3) .
The genome-wide significance was assessed using the GEC method  to infer effective independent tests. A total of 9,540,946 independent tests over all chromosomal SNPs were obtained, and 8,562,703 SNPs were retained. Then, genome-wide significant and suggestive thresholds were set to 5.84e-9 (0.05/8,562,703) and 1.17e-7(1/8,562,703), respectively. Manhattan and Q-Q plots were constructed for each trait by the qqman package (https://cran.r-project.org/web/packages/qqman/) in R (version 4.1.0). SNP positions were updated according to the GRCg6a genome version from NCBI. Identifying the closest genes to genome-wide significant and suggestive variants was performed using NCBI annotation of the GRCg6a genome version (https://www.ncbi.nlm.nih.gov/data-hub/gene/table/taxon/9031/). The variance in SCFAs explained by SNPs from GWAS results was calculated by the formula described by Shim et al .
Identification of the specific microbiota association
The associations between qualified taxa, feed efficiency and SCFA traits were analysed using a two-part model described by Fu et al . This model accounts for both binary (present and absent) and quantitative features and is described as follows:
where y is the RFI value or SCFA concentration, b is a binary feature of a specific microorganism and coded as 0 for absent or 1 for present for each sample, and q is the log10-transformed abundance of a specific microorganism. β1 and β2 are the regression coefficients for the binary and quantitative models, respectively, and e is the intercept. The second part of the quantitative analysis was only for the samples in which the specific microorganism was present. The details of the two-part model are illustrated in Additional file 7: Figure S5. P values were obtained from the two-part model association analysis and adjusted by the BH method. If the adjusted P value from the binary model was less than 0.05, the presence or absence of microorganisms was considered to influence feed efficiency. If the adjusted P value from the quantitative model was less than 0.05, feed efficiency was considered to be associated with the relative abundances of the microorganisms. If the combined P value was less than 0.05, feed efficiency was considered to be associated with both the relative abundances of the microorganisms and the presence or absence of microorganisms.
A Spearman correlation analysis between microbiotas and RFI and FCR was conducted to detect specific microorganisms that significantly influenced feed efficiency. A microorganism was considered to have a significant effect if the adjusted P values from the two-part model association analysis and Spearman correlation were less than 0.05.
Statistics, plotting and others
Welch’s t-test detected the difference in the pathways/taxa relative abundance with FDR correction in STAMP v2.1.3 . Some scripts about data format are from Microbiome helper . All the pipeline, training materials and related scripts were deposited in the EasyAmplicon project in GitHub (https://github.com/YongxinLiu/EasyAmplicon2019). The plots were generated by the ggplot2 package (https://cran.r-project.org/web/packages/ggplot2/) in the R program (version 4.1.0).