Read alignments and Principal Component Analysis
We evaluated alignment results of different species within the Bovinae subfamily against the latest cattle reference sequence ARS-UCD1.2[16]. On average, the genome was covered by ~ 5x for banteng, taurine cattle, European bison, gayal, and yak, ~4x for American bison and zebu cattle, and ~ 3x for aurochs. Principle component analysis (PCA) formed clusters and separation of individuals among these nine groups (Fig. 1). Four principal components (PC) explained 36.7%, 24.9%, 20.5%, and 17.7% of the variance for first, second, third, and fourth PC, respectively. Projected by the PC1 and PC2, these Bovinae individuals are clustered together with its closest relatives evidencing genetic relatedness within its sub-species. PC1 explains divergence of cattle (aurochs, zebu, and taurine), from the rest. PC2 gives divergence between cluster containing gagaba from clusters containing yak and bison. Thus, we can group these individuals into four, namely cattle-aurochs cluster, gagaba cluster, bison cluster, and yak cluster. Outlier individuals, i.e. two gayals and the American bison, may indicate individuals carrying introgression from cattle.
Phylogenetic trees
Maximum Likelihood phylogenetic trees were constructed for each chromosome [see Additional file 1]. Inferred trees were all similar with Fig. 2 below displaying the tree from chromosome one. In concordance with the principal component analysis, 13 yak individuals are situated together in the top clade of the tree. European bison and American bison have the same node of ancestor, with American bison perceived to be more ancestral. This is in line with a previous study where sister relationships were indicated between American bison and European bison and also between bison clade and yak [17]. Banteng-gaur-gayal share a clade together, however, variations in the order within these three species exist in trees inferred from different chromosomes [see Additional file 1]. Zebu cattle reside on the same upper node with the taurine cattle group. Each breed of taurine cattle is well clustered together except for several Holstein individuals. Based on all trees, we defined yak as the most distant relative as it is positioned on the furthest node from cattle.
Inferring ancestral allelic states
The main output of this paper is a list of defined ancestral alleles for cattle [see in Additional file 2]. This list is necessary for several tools used for studying selection signature such as iSAFE, iHS, xp-EHH, EHHST, and hapFLK [18–23] which were built for human population genetics study. We provide this dataset as a foundation for future comparisons of selection signatures in various cattle breeds. It is stored in a simple format of .txt and comprised of 6 columns of chromosome, position, number of alleles, defined ancestral allele, frequency, and which groups agree on the defined ancestral allele. AA were determined as alleles that are fixed in two of three outgroup lineages. Using allele frequency over all individuals in outgroup, we defined ~ 32.4 million variants that are fixed across 29 chromosomes as AA corresponding to 1.2% of the total genome. A full list of these alleles with their positions is provided [see Additional file 2]. As in Fig. 1, 3.75 million alleles were defined as ancestral from all three lineages of bison, yak, and gayal-gaur-banteng (gagaba). GC content percentage of ancestral alleles is 58 percent, which is higher than the GC content of the reference genome (~ 42%). Yet, it is worth noting that 22% of these AA are within active transcript regions.
Windows with high ancestral allele counts in taurine and zebu cattle
We counted AA by non-overlapping windows of 10 Kb in taurine and zebu cattle separately. Figures 4 and 5 present the distribution of AA on chromosome 27 for taurine and zebu, respectively (The distribution of AA for all chromosomes can be found in Additional file 3). For taurine cattle, ancestral allele counts arguably tend to decrease towards the end of chromosome, as demonstrated by the fitted red trend lines. In zebu cattle, ancestral counts are relatively flat throughout the chromosome. Yet, the amplitude pattern is stable for taurine, but more variable for zebu cattle (blue trend line). Peaks of high ancestral alleles count regions in contrast with background averages number of ancestral alleles are clearly distinguished in chromosome 1, 4, 5, 7, 10, 12, 13, 14, 15, 18, 27, 29 in taurine cattle and 1, 2, 3, 4, 6, 10, 12, 13, 14, 15, 18, 23, 27 in zebu cattle [see Additional file 3].
Ancestral counts for the top 0.1% are beyond the mean plus three standard deviations. For taurine cattle, the lowest chromosome specific threshold for ancestral count was 122 on chromosome 25 while the highest was 302 on chromosome 14, while for zebu cattle, it was 102 in chromosome 1 while the highest 200 on chromosome 12. The trends for both groups were similar as shown in Fig. 6. Taurine cattle has mostly higher thresholds implying there are more windows with higher counts of AA compared to zebu cattle.
Windows without the occurrence of ancestral alleles
We found 3306 windows without AA in taurine and 2189 windows in zebu. The highest ratio of windows with null AA counts to total windows was 2.9% on chromosome 29 in taurine and the lowest is 0.14% in chromosome 25 of zebu cattle (Fig. 7). Overall, taurine has more windows without AA except for chromosome 1, 8, 10, and 27. Windows without AA could be explained by a lack of defined AA from outgroups, meaning, there were no fixed alleles that can be found in at least two lineages. Another reason could be that derived alleles are now the major alleles on polymorphic sites, therefore we could not find AA within these windows. In taurine cattle, 65% of windows without AA are due to the latter reason, while in zebu it is 46%.
Annotation of scanning windows with high number of ancestral alleles
We annotated each scanning window passing the respective threshold of top 0.1%, corresponding to 255 regions in taurine and 258 regions in zebu across 29 chromosomes. These regions contained 20 genes in taurine and 40 genes in Zebu. Both groups retained genes functioning in arachidonic acid secretion (GO:0050482), phospholipid metabolic process (GO:0006644), and lipid catabolic process (GO:0016042) indicated by LOC100125947 and PLAG2A, as shown in Table 1. These three terms are mainly functioning in primary metabolic process of lipid. Function of defense response to bacterium (GO:0042742) was exclusive to taurine. DEFB genes family in GO:004742 were secreted by leukocytes and epithelial tissues. It is known for its function similar to antimicrobial defense by penetration to microbial’s cell membrane and cause microbial death [24]. While calcium ion imports (GO:0070509), represented by SLC8A1 and CACNA1D, was exclusive to zebu defined as function of maintaining and transporting cellular entity in a specific location.
Table 1
GO terms of genes indicated by high count ancestral alleles
GOTerm
|
Function
|
Count
|
PValue
|
Genes
|
Fold Enrichment
|
Bonferroni
|
Taurine
|
|
|
|
|
|
|
GO:0050482
|
Arachidonic acid secretion
|
3
|
5.0E-04
|
LOC100125947, PLA2G2A
|
84.12
|
0.02
|
GO:0006644
|
Phospholipid metabolic process
|
3
|
7.7E-04
|
LOC100125947, PLA2G2A
|
67.76
|
0.03
|
GO:0016042
|
Lipid catabolic process
|
3
|
2.9E-03
|
LOC100125947, PLA2G2A
|
34.85
|
0.10
|
GO:0042742
|
Defense response to bacterium
|
2
|
9.0E-02
|
DEFB7, DEFB3
|
20.08
|
0.97
|
Zebu
|
|
|
|
|
|
|
GO:0050482
|
Arachidonic acid secretion
|
3
|
8.7E-04
|
LOC100125947, PLA2G2A
|
65.00
|
0.06
|
GO:0006644
|
Phospholipid metabolic process
|
3
|
1.3E-03
|
LOC100125947, PLA2G2A
|
52.36
|
0.10
|
GO:0016042
|
Lipid catabolic process
|
3
|
5.0E-03
|
LOC100125947, PLA2G2A
|
26.93
|
0.32
|
GO:0070509
|
Calcium ion import
|
2
|
2.4E-02
|
SLC8A1, CACNA1D
|
78.55
|
0.85
|
Annotation of scanning windows without ancestral alleles
There were 713 windows in taurine with protein coding genes, while in zebu 121 windows were found. GO terms of regions within scanning windows without AA are attached [see Additional file 4]. There are 42 GO terms defined for taurine and 7 GO terms for zebu. Among those, three terms were found in both, i.e. two antigen processing terms (GO: GO:0002474 and GO:0019882) and negative regulation of endopeptidase activity (GO:0010951).
In taurine cattle, apart from terms related to immune system process and cellular function, there are GO terms exclusive to taurine cattle that are related to production traits. For example, GO:0008654, GO:0043410, GO:0045725, GO:0060048, GO:0008016, are related to metabolic process of phospholipid, protein, glycogen, and regulation of muscle and heart contraction. GO:0007613 and GO:0035176 are related to mental information processing systems and is part of learning or memory abilities which can affect cognition and behavior as indicated by CRTC1, TH, ITPR3, DBH, SORCS3 genes. ITPR3 is known as well for process of sensory perception of taste. CRTC1 gene in human has highest transcript expression in brain compared to other tissues and is known for affecting eating behavior [25].
GO:0009611, GO:0071364, GO:0071560 and GO:0008286 are related to response of stimulus such as stress from wounding and transforming growth factor. GO:0048469, GO:0010976, GO:0060425, GO:0002062, are terms related to development of cell, neuron, lung morphogenesis and chondrocyte differentiation in cartilage outgrowth as part of skeletal system and animal organ development as pointed by PTH1R, COL2A1, COL11A2, WNT7A, RUNX3, SOX10, GATA2, PTH1R, and SOX18 genes.
Regions without AA in zebu were mainly related to 5 GO terms in domain of immune response and one term related to cellular process of transmembrane transport. Figure 8 represented distribution of terms found in regions without AA. It is dominated by metabolism terms in taurine and immune response in zebu.