The profile of the parental genomes and gene expression in different tissues, sexes of progenies
The two inbred chicken strains, Cornish-Game and White-Leghorn, which exhibit major differences in growth rate, egg production and behavior, were used to generate purebred and reciprocal hybrid F1 progenies (Fig. 1). In order to identify breed-specific variants, we sequenced the four parents of the two reciprocal crosses, recovering on average 100.73 million pair-end reads per sample after quality control. We identified on average 4.74 million SNPs per parental genome which were used to generate simulated parental genomes. From these SNPs, we picked SNPs which were homozygous in each parental bird but different from each other in a same cross (heterozygous in the hybrid progenies), resulting in two heterozygous SNP lists with on average 1.4 million of heterozygous SNPs for the two reciprocal crosses, individually, to identify the allele-specific RNA-seq reads of offspring in the following steps.
For each hybrid cross, we collected RNA-Seq data from brain, liver and muscle of three male and three female F1 progenies 1-day post-hatch. On average, we recovered 29.17 million mappable reads per sample. In order to eliminate the effect of the sex chromosomes, we removed all Z and W genes from our analysis and focused entirely on autosomal loci. We recovered significant differences in gene expression between different tissues, sexes and parents of origin (Fig. 2). Tissue was the most powerful factor for gene expression. The sex played a leading role in brain, and the strain effected gene expression of liver the most, while in muscle, the parent of origin seems more powerful because samples were divided into two parts by mother origin. We therefore retained all three variables in our subsequent analysis. It resulted in 12 treatment groups, comprised of 3 tissues, 2 sexes and 2 reciprocal crosses in this study.
An effective pipeline was applied on the allele-specific expression analysis
To identify the parental origin of offspring’s mRNA, we tried a new pipeline by using a R package of ‘asSeq’ [31]. Briefly, a set of R scripts are available for genotype phasing with the 1.4 million of heterozygous SNPs identified in the previous step. About 2% of these SNPs located on the exon region. The large number of SNPs provide higher chance that an RNA-seq read could overlap with a heterozygous genetic marker, hence is identified as allele-specific read.
In order to validate the accuracy of our ASE pipeline, we generated two artificial hybrid F1 libraries. Specifically, we concatenated two male brain RNA-seq fastq files from cross 1 and cross 4, which had roughly equal read depth. We also concatenated two female liver samples the same way. The two simulated hybrid libraries and four original purebred libraries were handled consistently with other hybrid libraries, using the heterozygous SNP lists of both cross 2 and cross 3. We compared the expression ratio of two simulated alleles (CG/WL) to the real expression ratio of two samples (CG/WL) for each gene. A strong correlation between the two measurements were observed (Fig. S1), indicating that our ASE analysis pipeline was robust. Since our pipeline only counted the local reads containing the heterozygous SNPs, we further assessed the correlation of expression fold change (CG/WL) between the local reads method and the method of counting total reads using edgeR [32-34]. The correlation was also strong (Fig. S2). These results proved the feasibility of our pipeline.
Genes were classified into different categories by the type of regulatory divergency
A total of 24,881 genes from the Ensembl v87 annotation were analyzed. About one fifth of them contained heterozygous SNPs and expressed in our progeny samples (Table S1). For the genes containing heterozygous SNPs, we observed significant expression difference (p-value <0.05, binomial test corrected for multiple comparisons by q-value method) between the purebred females (cross 1 vs. cross 4) for 14.71 % of them in brain, 36.45 % in liver and 38.38 % in muscle (take the heterozygous SNP list of cross 2 for example). In males, we observed that 17.64 % of genes in brain, 41.87 % in liver and 37.84 % in muscle expressed significantly different (Table S1).
Expressive genes were classified into different categories by the type of regulatory divergency [7, 35, 36] (Fig. 3A, B, Table 1, Fig. S3-S5). Most genes showed conserved or ambiguous expression, as expected given the recent divergence time of these two breeds. There were more than 70% genes in brain, more than 40% genes in liver, and around 50% genes in muscle classified as conserved. Nonetheless, we observed substantial cis- and trans- variation in the hybrid crosses. There were a greater proportion of trans-regulated gene expression variations than cis in most tissues and across both sexes, particularly in muscle (Fig. 3C).
Genes regulated by both cis- and trans-regulatory variations were divided into four categories: “cis+trans (same)”, “cis+trans (opposite)”, “cis×trans” and “compensatory”. Genes classified as “cis+trans (same)” show evidence of cis and trans variations acting in the same direction, while genes classified into the other three categories show evidence of cis and trans variations acting in opposite directions, with different expression preference on the two alleles. We observed that genes more often display the latter pattern, and most genes were classified as “compensatory” (Fig. 3C).
The gene proportion of each regulatory category was similar between different tissues and different sexes, except for some differences between the muscle and other two tissues (Fisher’s exact test, Table S2). Unexpectedly, we observed only few loci with consistent cis- or trans-regulatory divergence across different groups (Fig. S6). These stable cis- or trans-regulatory divergence genes appear to play important roles in phenotypic divergence. For example, the gene IGFBP2, TGFBI, PDGFRL and IGF2R all showed significant evidence of expression bias between our two breeds. These genes are related to the chicken growth, which could explain the difference of growth rate between two chicken breeds (Table S3).
Genes regulated by trans-acting variation show greater sequence conservation
We counted the number of variants located within 1 kb upstream of transcription start sites of each gene with the genome data of the four parents. The results showed greater variations at upstream of cis-regulatory divergence genes than trans-acted genes in all samples (Fig. 4A).
The ratio of the number of non-synonymous SNPs to the number of synonymous SNPs (pN/pS) in coding sequence of each gene was calculated in this research. We found that the value of pN/pS in genes regulated by trans-variant were lower than cis-regulatory divergence genes in all samples (Fig. 4B, Fig. S7-S8).