Mapping statistics and CNV detection
In this study, we performed whole genome sequencing in nine different lines (Fig. 1) of female chickens using Illumina paired-end libraries. The average number of raw reads were approximately 242.26, 230.56, 127.06, 31.39, 34.50, 42.12, 61.06, 57.68 and 37.72 million for lines 63, 72, F1, and RCS-A, D, J, L, M, X, respectively (Additional file 1: Table S1). After quality control, an average of 30.42 to 226.30 million reads of each chicken line were successfully aligned to the reference genome with the mapping levels ranging from 90.04% to 98.10% for all the individuals. The sequencing effective depth varied from an average of 5.95× for six RCSs to an average of 19.84× for lines 63, 72 and their F1 hybrid, and the average coverage with respect to the reference genome was 88.25%. These high quality alignments were confident for the subsequent analysis with a minimum of false positives.
We then applied the CNVnator software for CNV detection and the average number of CNVs per individual was 1,888, ranging from 1,368 in line RCS-A to 2,476 in line RCS-J that passed our stringent filtering criteria. The size of these CNVs ranges from 1 Kb to 9.56 Mb, with an average of 95.56 Kb. Detailed statistics of CNV calls are shown in Additional file 2: Table S2. A total of 1,649 CNV regions (CNVRs) (Table 1) allowing for CNV overlaps of 1 bp or greater were obtained across all the chicken lines after merging, covering autosomes 1-28, and sex chromosomes Z and W. The chicken CNV map across the genome is shown in Fig. 2. The length of CNVRs ranged from 1 Kb to 18.19 Mb with an average of 0.36 Mb. In total, 1,200 (72.8%) out of all CNVRs had sizes varying from 1 to 200 Kb (Fig. 3A). The count of CNVRs on each chromosome was directly proportional to the chromosome length, and five macrochromosomes (chr1-5) possessed a large proportion (874, 53.0%) of all putative CNVRs. The number of CNVRs in different chicken lines varied greatly, ranging from 536 in RCS-L to 852 in RCS-A. Among all CNVRs, 495 (30.02%) were present in only one chicken line and 90 (5.46%) CNVRs are shared in all the nine chicken lines (Fig. 3B). In addition, the CNVRs belonging to gain and loss account for 47.1% (776 events) and 52.9% (873 events), respectively.
Gene annotation and functional analysis
The genes harbored in the inferred CNVRs were extracted using custom Perl scripts. As a result, a total of 2,588 RefSeq genes within the regions of the 1,649 CNVRs were obtained, where a majority of these genes were involved in immune, tumor and diseases. The identified genes were submitted to DAVID (version 6.8) for GO and pathway enrichment analyses. Using functional annotation clustering, at the highest classification stringency, 145 clusters were formed, where only 9 clusters were chosen after using an enrichment cutoff at > 1.0 (Additional file 3: Table S3). GO terms and KEGG pathways analyses invoked in DAVID yielded 36 significant enriched (28 terms of biological process, 2 terms of cellular component, and 6 terms of molecular function) functional terms (P < 0.05, Fig. 4), and 6 significant enriched pathways (P < 0.05, Table 2), including the JAK/STAT signaling pathway (gga04630, Additional file 4: Figure S1). The detailed information of all the GO terms and pathways are shown in Additional file 5: Table S4.
PCA analysis and cluster
To investigate genetic structure in nine inbred chicken lines, we performed a principal component analysis (PCA) using the CNV segments by custom R scripts. Nine principal components were calculated and the first two PCs were used in the plot (Fig. 5A). The nine lines were clustered to approximate four groups with the similar patterns, as indicated by dashed circles (Fig. 5A), which were consistent with their susceptibility to MD (Fig. 5B). Lines RCS-A, M and 72 were well clustered together with high MD incidence. Lines RCS-D, J, L and X were clustered together with high resistance to MD. Interestingly enough, as expected, F1 individuals with a medium MD incidence were in an intermediary position between line 72 and line 63, which provided the theoretical basis of heterosis and identification of imprinting genes for disease resistance.
Shared versus line-specific CNVRs
To investigate how frequently CNVRs were shared or lineage-specific across different lines, we calculated the CNVR numbers among the nine inbred chicken lines (Table 1). In total, 90 CNVRs were detected across all the individuals, which represented the commonly shared CNVRs. A total of 55, 44, 82, 15, 14, 72, 190, 18, and 5 CNVRs were detected as line-specific CNVRs in line 63, 72, F1, RCS-A, D, J, L, M, and X, respectively, as compared to other lines (Table1, Additional file 6: Table S5). Importantly, the line 63 and 72 lineage-specific CNVRs could potentially offer certain clues to explore the genetic mechanisms of MD resistance or susceptibility. So, a total of 62 and 57 harbored genes were identified in line 63 and 72, respectively, including several immune-, tumor- and disease-related genes, such as interferon regulatory factor 2 (IRF2), which was a line 72 line-specific gene in this study (Additional file 7: Table S6). Interestingly, our lab also found a MD-resistant associated differentially methylated region (DMR, chr4: 38,999,001-39,000,000), which was hypermethylated in line 63 than 72, in the previous DNA methylation study (not published). The harbored region was also IRF2, which is involved in immune response IFN alpha/beta signaling pathway. This gene could be a candidate gene associated with MD susceptibility.
CNVRs validation by qRT-PCR
To confirm the identified CNVRs, 10 CNVRs containing gains (duplications) and losses (deletions) detected here were validated by qRT-PCR using two reference genes (THRSP and β-actin). We found that all of the 10 CNVRs were confirmed in agreement with the CNVnator results (Fig. 6A), further supporting the reliability of the detected CNVRs. We also performed a qRT-PCR validation on two line 72 lineage-specific deletion CNVRs: CNVR6 (chr4: 38,999,001-39,000,200, harbored gene: IRF2) and CNVR7 (chr4: 82,407,001-82,409,800, harbored gene: MAX dimerization protein 4, MXD4). For CNVR6, a total deletion was detected in line 72, while line 63 had a normal status. For CNVR7, line 72 had two third of the normal copy numbers, while line 63 also had a normal status (Fig. 6B). Therefore, the copy numbers of these two loci were found significantly lower in line 72 as compared to line 63, again supporting our CNV calls and suggesting that they are potentially linked to MD susceptibility.
Comparison with other studies on CNV in chickens
Considering that most of the previous CNV detection studies based on the galGal2 and galGal3 genome assembly, coordinates of the CNVRs were converted using the UCSC liftOver tool (http://genome.ucsc.edu/cgi-bin/hgLiftOver). We migrated all chromosomal CNVRs from galGal2 and galGal3 (used in previous studies) to galGal4. We eventually obtained 585 CNVRs in the present study for comparison. Our results were then compared to 12 previous reports on chicken genomic CNV (Table 3). As a result, about 3.7%, 5.2% and 4.1% of the Crooijmans et al.’s [27], Tian et al.’s [28] and Yi et al.’s [25] results can be validated in our study, respectively. Moreover, about 12.1%, 4.0% and 8.8% of the Luo et al.’s [20], Yan et al.’s [21] and Xu et al.’s [22] results that also involved in MD were validated in our study. Taken together, 42.2% of our CNVRs overlapped with these three MD studies. The detailed information of CNVRs identified in this study and previous studies is provided in Additional file 8: Table S7.