Allele Specific Expression in the Horse Genome
Recent data from the equine Functional Annotation of Animal Genomes (FAANG) initiative were leveraged for this analysis.10 Haplotypes for each horse/tissue sample were identified from Iso-seq data. Across all samples, we identified 87,174 heterozygous loci. We used these loci to differentiate alleles and then quantified the reads at each position using short-read RNA data from the same horse/tissue sample, which allowed 42,900 allele expression events to be compared. After filtering and performing the statistical analyses described in Methods, we compiled these data into an allele expression resource. Using this resource, we identified 635 (1.48%) of the allele expression events as demonstrating ASE. ASE occurred in every tissue analyzed, with the liver containing the highest proportion of ASE occurrences (Table 1). Of the genes showing evidence of ASE, referred to here as allele specific differentially expressed genes (ASDEGs), 80 exhibited ASE in more than one tissue or sample (Figure 1).
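The statistical procedure itself is described in Methods and is not reproduced in this section. As a rough illustration of how allelic imbalance at a single heterozygous locus can be assessed from short-read counts, the sketch below applies an exact two-sided binomial test against a 1:1 expectation. The function name, the read counts, and the choice of test are assumptions made for illustration and may differ from the pipeline actually used.

```python
from math import comb


def allelic_imbalance_p(ref_reads: int, alt_reads: int) -> float:
    """Exact two-sided binomial p-value against a 1:1 allelic ratio.

    Hypothetical helper: under the null, each read is equally likely to
    come from either allele, so the null distribution is symmetric and
    the two-sided p-value sums every outcome at least as far from n/2
    as the observed count.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, no evidence of imbalance
    observed_distance = abs(2 * ref_reads - n)
    # Integer arithmetic keeps the tail sum exact until the final division.
    tail = sum(
        comb(n, k) for k in range(n + 1) if abs(2 * k - n) >= observed_distance
    )
    return tail / 2**n


# Illustrative read counts (not taken from the study).
balanced = allelic_imbalance_p(48, 52)
skewed = allelic_imbalance_p(80, 20)
print(f"balanced locus: p = {balanced:.3f}")
print(f"skewed locus:   p = {skewed:.2e}")
```

When many loci are tested, p-values of this kind would normally be corrected for multiple testing and combined with a read-depth filter; the thresholds applied in this study are those given in Methods.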
Table 1. Distribution of Analyzed Genes Across Tissues
An overview of the alleles examined across tissue types. The "Alleles Compared" column gives the number of allele comparisons made within each tissue type (2 alleles per comparison). The "Significant Allele Imbalance" column gives the subset of these comparisons in which expression of the two alleles departed significantly from the expected balance.
Investigating Heterozygous Loci
A total of 774 heterozygous loci were identified within ASE events. Variant effects were predicted at these loci, and approximately 43% were located within 3' untranslated regions (Table 2). A total of 497 (64.2%) of the identified variants in ASE events fell within histone modified regions. ASE events were most commonly found in association with H3K27ac peaks (n=377, 55.3%), followed by H3K4me3 peaks (n=293, 43.0%), H3K4me1 peaks (n=268, 39.4%), and H3K27me3 peaks (n=170, 24.9%). Of the 497 ASE events with SNPs in histone modification regions, 369 (74.2%) overlapped more than one histone mark. The three most common combinations of overlapping histone modification regions containing identified variants were H3K4me3 with H3K27ac, H3K27ac with H3K4me1, and H3K27ac with both H3K4me1 and H3K4me3 (Figure 2).
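The overlap counts above amount to asking, for each variant, which histone mark peaks contain its position. A minimal sketch of that bookkeeping follows, assuming BED-style half-open intervals; the function names, coordinates, and peak lists are hypothetical and are not drawn from the FAANG data.

```python
from collections import Counter


def marks_at_variant(chrom, pos, peaks):
    """Return the set of histone marks whose peaks contain the variant.

    peaks maps a mark name to a list of (chrom, start, end) tuples using
    0-based, half-open coordinates (an assumption of this sketch).
    """
    return frozenset(
        mark
        for mark, intervals in peaks.items()
        if any(c == chrom and start <= pos < end for c, start, end in intervals)
    )


def summarize_overlaps(variants, peaks):
    """Count variants per mark combination, in any peak, and in 2+ marks."""
    combos = Counter(marks_at_variant(c, p, peaks) for c, p in variants)
    in_any = sum(n for marks, n in combos.items() if marks)
    in_multiple = sum(n for marks, n in combos.items() if len(marks) > 1)
    return combos, in_any, in_multiple


# Toy inputs for illustration only.
peaks = {
    "H3K27ac": [("chr1", 100, 200)],
    "H3K4me3": [("chr1", 150, 250)],
    "H3K4me1": [("chr2", 0, 50)],
}
variants = [("chr1", 160), ("chr1", 120), ("chr2", 10), ("chr3", 5)]

combos, in_any, in_multiple = summarize_overlaps(variants, peaks)
print(f"variants in any peak: {in_any}; in more than one mark: {in_multiple}")
for marks, count in combos.most_common():
    print(sorted(marks) or ["no peak"], count)
```

The linear scan over peaks is adequate for a toy example; genome-scale peak sets would normally call for sorted intervals or an interval tree.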
Table 2. Variant Types Detected in ASE Events
Variant types identified in allele-specific expression events within the study's sample set. Variant predictions were made using the Ensembl Variant Effect Predictor (VEP).25
Differentially Expressed Gene Enrichment Analysis
We identified 168 KEGG pathways containing a significant proportion of ASDEGs, including metabolic pathways, endocytosis, and the Ras signaling pathway. In our study, the liver contained the greatest number of pathways significantly impacted by ASE (Figure 3).
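Pathway over-representation of this kind is commonly assessed with a one-sided hypergeometric test, which asks whether a pathway contains more ASDEGs than expected by chance given the background gene set. The sketch below shows that calculation under the assumption that such a test is appropriate here; the enrichment method actually used is the one given in Methods, and the function name and all counts are hypothetical.

```python
from math import comb


def enrichment_p(background: int, in_pathway: int, selected: int, overlap: int) -> float:
    """Upper-tail hypergeometric p-value, P(X >= overlap).

    background: number of genes tested overall
    in_pathway: background genes annotated to the pathway
    selected:   genes of interest (for example, ASDEGs) in the background
    overlap:    genes of interest that fall in the pathway
    """
    upper = min(in_pathway, selected)
    # Exact integer numerator; a single division at the end.
    numerator = sum(
        comb(in_pathway, i) * comb(background - in_pathway, selected - i)
        for i in range(overlap, upper + 1)
    )
    return numerator / comb(background, selected)


# Illustrative counts only (not from the study).
p = enrichment_p(background=10000, in_pathway=200, selected=300, overlap=15)
print(f"enrichment p-value: {p:.3g}")
```

Testing many pathways in this way again calls for multiple-testing correction before a pathway is reported as significant.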
Validation of Allele Specific Variants
To validate our putative ASE loci in liver tissue, we examined the same loci in a larger liver dataset consisting of 66 samples from horses of various breeds. All 155 heterozygous loci identified in liver tissue from our original FAANG horses were tested in the validation set, resulting in 8,849 heterozygous locus expression comparisons. Of these comparisons, made across all n=66 samples in our validation set, 7,436 (84%) were confirmed to show ASE, and 7,019 (94.4%) of the confirmed comparisons were in the same direction (i.e., allele A demonstrates higher expression and allele B demonstrates lower expression; Figure 4). Of the 155 ASE loci we tested, 96.7% showed ASE in at least one of the samples in our validation set, with 85 (54.8%) showing ASE in at least 90% of the n=66 samples tested (Supplementary Figure 4). There was no specific effect of breed (data not shown).
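The validation summary combines three tallies: the share of comparisons confirmed as ASE, the share of confirmed comparisons matching the discovery direction, and the per-locus fraction of samples showing ASE. A minimal sketch of that tally is given below; the record layout, locus and sample names, and the function itself are assumptions for illustration, and the per-locus fraction here uses the samples tested at each locus as its denominator.

```python
from collections import defaultdict


def summarize_validation(records, discovery_direction, min_fraction=0.9):
    """Tally validation results.

    records: list of (locus, sample, is_ase, higher_allele) tuples, one per
             heterozygous comparison (hypothetical layout).
    discovery_direction: locus -> allele with higher expression in discovery.
    """
    confirmed = [r for r in records if r[2]]
    same_direction = [r for r in confirmed if r[3] == discovery_direction[r[0]]]

    per_locus = defaultdict(lambda: [0, 0])  # locus -> [ASE count, tested count]
    for locus, _sample, is_ase, _allele in records:
        per_locus[locus][0] += int(is_ase)
        per_locus[locus][1] += 1

    fractions = {locus: ase / tested for locus, (ase, tested) in per_locus.items()}
    return {
        "comparisons": len(records),
        "confirmed": len(confirmed),
        "same_direction": len(same_direction),
        "loci_with_any_ase": sum(f > 0 for f in fractions.values()),
        "loci_at_or_above_threshold": sum(f >= min_fraction for f in fractions.values()),
    }


# Toy records for illustration only.
records = [
    ("locus1", "s1", True, "A"),
    ("locus1", "s2", True, "A"),
    ("locus1", "s3", True, "B"),
    ("locus2", "s1", False, None),
    ("locus2", "s2", True, "B"),
]
discovery_direction = {"locus1": "A", "locus2": "B"}

summary = summarize_validation(records, discovery_direction)
print(summary)
```

Applied to the counts reported above, the same ratios are 7,436/8,849 for confirmed comparisons and 7,019/7,436 for direction concordance.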