Whole genome sequencing, assembly and variant identification
Genomic DNA from the cattle breeds Gudali, White Fulani, Red Fulani, Namchi (Doayo) and Kapsiki were sequenced with the Illumina HiSeq4000 sequencing platform and libraries were sequenced using 150-bp paired-end reads. This generated a total of 835 Gb of raw reads with an average of 167 Gb per sample which provides, to the best of our knowledge, the first comprehensive set of full genome variant data of these breeds. The average genome-wide sequence coverage from the mapped reads ranged from 22.8× for Namchi (Doayo) up to 30.8× for Red Fulani (Table 1). This lies in the range of other cattle re-sequencing studies published [14, 19, 20] whereas the depth of coverage is fairly high in comparison to 10.8 and 15.8-fold coverage obtained by Kim et al. [15] and Kawahara-Miki et al. [19], respectively. Taylor et al. [21] have suggested that about 95% of the total variants within the genome of cattle are discovered at an average sequence depth of 23.3x which implies that the data obtained in this study is sufficient to detect SNPs and Indels variants with high confidence.
The chosen approach of high depth sequencing yielded approximately 109 reads per sample (Table 1) which allowed us to obtain a high coverage per animal tested. However, it also resulted in a relatively low percentage of uniquely mapped reads when aligned to the reference genome (Hereford breed UDM3.1) that was subsequently used for variant calling (ranging from 63% to 65% mapped reads for the Cameroonian cattle breeds, Table 1). This result is consistent with the 60% of uniquely mapped reads by Kawahara-Miki et al. on Japanese Kuchinoshima-Ushi bulls [19]. However, while using the same UMD3.1 cattle reference genome, our mapping rates were markedly lower than the 98.5% reported by Kim et al. [15] from other indigenous East African cattle breeds (Ankole, Boran and Ogaden). Our rather low mapping rate could be explained either by the PCR-free preparation of sequencing libraries in our case which implies that bovine DNA and non-bovine DNA such as blood microbes and parasites could have been sequenced at similar rates or that the African cattle breed samples chosen are evolutionarily more distant compared to the reference genome and therefore contain sequences of genomics regions not present in the UMD3.1 cattle reference genome.
Variant calling results
A total of 50 million (M) SNPs were identified in this study of which 2.68 M (18.7 %) were not found in dbSNP and considered as novels variants (Table 1; Fig. 1A, Supplemental file Fig. S2). Similar results were obtained by Stafuzza et al. [22] on Gyr, Girolando, Gruzerat and Holstein cattle breeds from Brazil. The ratio of the number of heterozygous to homozygous SNP variants were different across the cattle breeds. Brahman and Namchi (Doayo) had the lowest rate, whereas Kapsiki had the highest (Table 1, Supplemental file Fig. S1). The low ratio of heterozygous to homozygous in Brahman and Namchi (Doayo) cattle could mean that they experience admixture, as reported by Freemann et al. [23] in African taurines from Cameroon. On average, 0.58 M (8%) of the detected variants had small insertions and deletions (Indels, Table 1, Fig. 1B).
De novo Assembly and analysis of unmapped genomic sequence reads
In order to better understand the low mapping rate, unmapped reads were assembled into contigs using the de novo sequence assembler ABySS and compared against the NCBI Blastn database. These results did not support the hypothesis of microbial and parasitic DNA contamination that could be sequenced at similar rates as the host DNA using the PCR free library preparation protocol. Rather, it supported the idea that the breeds analyzed here are evolutionary more distant compared to the reference genome. Bos mutus was found as a best scoring Blast results in 65% of the unmapped Blastn alignments in all samples, followed by Ovis canadensis with 17% of the Blastn alignments [Supplemental file 3 Figure S 3, Supplement file 5 Table Supplemental S1]. These findings indicate that the most common sequences of the unmapped read contigs were those of other Bovidae. The mean sequence identity for the Bos mutus Blastn hits was at 98% with an average coverage of 700bp, and 92% sequence identity with an average coverage of 650bp for Ovis canadensis indicating that these reads are derived from Bovidae but have not been found in the reference genome used for read mapping. Bos taurus and Bos indicus were only found in ~3% and 1% of the Blastn hits of the unmapped reads, respectively, which demonstrates that most of the reads originating from Bos taurus and Bos indicus were correctly mapped. We postulate that this high percentage of reads deriving from other Bovidae, might arise due to the evolutionary divergence of Cameroonian cattle breeds to the other investigated breeds. There were no obvious differences in Blastn results found when comparing African Zebu cattle with Namchi (Doayo) and Kapsiki [Supplemental file 3 Figure S3, Table Supplemental S1] although it seems conceivable to expect Namchi (Doayo) and Kapsiki breeds rather distinct to the reference genome when compared to the Zebu cattle. Further investigation using tools that can measure levels of hybridization is needed in order to solve this in the future. Furthermore, the construction of an African breed reference genome or an African pan-genome might help to optimize genome research on African cattle breeds.
Among the species that cover at least 0.5% of the total scoring Blast results, most were of vertebrate origin. Exceptions of the invertebrate kingdom were Trichogramma pretiosum in the Brahman control sample, and the bacteria Lelliottia nimipressuralis and Enterobacter spec in the White Fulani sample (see Supplemental Table S1), albeit all at very low levels (2.4%, 0.8% and 0.6% of the total Blast Scoring results. At even lower rates also Babesia spp., a blood-borne parasite known to cause Texas fever in cattle and Theileria spp., a cosmopolitan blood parasite of cattle and blood-invading bacteria of the Anaplasma genus were also detected in Namchi (Doayo) [see Supplement file 5 Table Supplemental S1]. Although these finding are only supported by a very low number of alignments of assembled contigs to the blastn database, this data is still in line with a recent epizootiological survey in the same indigenous Cameroonian cattle breeds which revealed that nearly 90% of animals were infected with tick-borne bacterial, piroplasmid and protozoan pathogens [24, 25].
Genetic variability and similarity across breeds
The largest number of SNPs was found in Zebu breeds Brahman, Red Fulani, Gudali and White Fulani, respectively. When looking at the SNP distribution across the taurine breeds the lowest numbers were found in Holstein and N’Dama as compared to Kapsiki and Namchi (Doayo) cattle (Table 1). A total of 1,013,395 SNPs were common across all breeds, and 121,776 SNPs were Zebu-specific, distributed between Brahman, Red Fulani, White Fulani and Gudali cattle breeds. More surprisingly, there were no SNPs exclusively shared between the European taurine Holstein and WAS taurine (N’Dama, Kapsiki and Namchi (Doayo), Fig. 2), apart from 73,366 SNPs which were shared between N’Dama and Kapsiki only. Furthermore, 85,307 SNPs were common between all tested cattle breeds except Brahman cattle. The highest proportion of breed-specific (bs) SNPs were found in Bos indicus: Brahman (759,804), Red Fulani (473,688), Gudali (461,043) and White Fulani (420,114), respectively, and the lowest breed-specific SNPs were found on taurine breeds N’dama (220,302), Holstein (328,560), Kapsiki (370,074) and Namchi (Doayo) (402,114), respectively (bs SNPs are color labelled in Fig. 2). This apparently lower genetic diversity in African taurine breeds was already earlier argued by Kim et al. [15], who linked it to the low effective population size and/or population bottlenecks following fatal disease outbreaks such as the Rinderpest. In contrast, indicine Zebu cattle and composites with larger effective population size exhibit a higher level of nucleotide diversity. Furthermore, the higher nucleotide diversity of taurine Namchi (Doayo) and Kapsiki as compared to N’Dama and Holstein may be due to the long history of Bos indicus introgression [23].
The density of variants per chromosome was proportional to the chromosome length, except for the X chromosomes which showed a lower number of variants identified (Supplemental file Fig. S2). These findings were expected because the DNA of X chromosomes is subject to an increased natural selection, which leads to less genetic diversity.
Breed clustering and relationships
The cluster relationship between breeds was analyzed by a principal component analysis (PCA) using all autosomal SNPs (Fig. 3A). The first two principal components explained 22% and 16% of the total variance, respectively. Except for Namchi (Doayo), the other WAS breeds N’Dama, and Kapsiki form a separate cluster from WAZ breeds. The WAS breeds N’dama, and Kapsiki are also closer to European taurine Holstein than WAZ breeds and both, WAS and WAZ are clearly separated from Zebu Brahman. This indicates the possibility of admixture events between the West African cattle breeds. To further understand the genetic network among those breeds a phylogenetic tree analysis (Fig. 3B) was carried out with the same autosomal SNPs data as for PCA analysis by using Randomized Accelerated Maximum Likelihood models (RAxML). Again, except for Namchi (Doayo), the Bos taurus breeds Kapsiki, N’Dama and Holstein cluster together while the B. indicus breeds White Fulani, Gudali, Brahman clustered on a separate clade. The WAS Kapsiki and Namchi (Doayo) cattle are closer to WAZ cattle as compared to European taurine Holstein. In addition, the WAZ are evolutionary distant to Indian Zebu Brahman. This observation concords with previous studies of WAS indicating they possess admixture with indicine ancestry between 22.7% and 74.1% in Central Africa [26, 27]. Gudali are more closely related to Indian Brahman cattle than White Fulani and Red Fulani (Fig. 3B). The Indian Zebu genes introgression into African Zebu breeds has been reported based on autosomal microsatellite markers, between 55 and 83% [3, 27]. The PCA and RAxML findings presented here show that the evolution of Cameroonian cattle breeds is distant both to Indian Zebu Brahman and European taurine Holstein. The higher number of heterozygous to homozygous variant ratio in Kapsiki (2.5) than in Namchi (Doayo) (1.5) was unexpected, because Kapsiki has been regarded as an indigenous taurine population with highest Zebu gene introgression over the last three decades based on microsatellite data [11, 23]. Namchi (Doayo) and Kapsiki have been classified by Freeman et al. [23] as hybrids rather than pure breeds. The phylogenetic position of Namchi (Doyao) more closely related to Red Fulani than WAS indicated recent Zebu introgression into the genome of Namchi (Doayo). Although the selected Namchi (Doayo) was not different in appearance to the other animals in the region, we cannot exclude whether it has been a product of a recent cross-hybridization with another cattle breed, and thus not representing the pure-breed genome. It is reported that there are still some isolated herds of purebred Namchi (Doayo) cattle in the Poli area, but the present study did not have the tools to screen hybridization levels in the selected animal for whole genome data generation. Such screening would be necessary in the present context where traditional husbandry systems face numerous challenges towards maintaining purely taurine breeds due to rampant cross breeding.
SNPs and Indels functional annotations
The SNPs and Indels were annotated in order to identify the location of the variant in terms of genomic features using snpEFF [28]. In general, all the eight breeds exhibited similar distributions of SNPs and Indels in various genomic annotation categories. Most annotated variants were located in intergenic regions (62%) and introns (27%). The remaining SNPs (11%) were found on downstream genes (4.4%), upstream genes (4.7%), untranslated regions (UTR) (0.5%), missense (0.6%), frameshift (0.02%) and other areas (0.7%) (Fig. 4A).
Breed-specific variants with high impact such as frameshift, splice acceptor, splice donor, start lost and stop gained that may putatively change amino-acids codons are located in and/or close to genes that may lead to functional changes were examined in each chromosome. Overall, 607 genes were identified; 98, 90, 85, 73 and 62 in Red Fulani, Gudali, White Fulani, Kapsiki and Namchi (Doayo), respectively (Additional file 6, Table S1-11). The majority of these genes were widely involved in olfactory receptors, carbohydrate metabolism, transcription regulation, ion binding, nucleotide binding, protein transport, fatty acid metabolism, stress response, regulatory elements, proteolysis and immune responses (Additional file 6, Table S8).
Gene ontology and pathway enrichment analysis of high and moderate impact breed-specific SNPs and Indels variants
Based on the Gene ontology (GO) enrichment analysis of bs-ns SNPs with high and moderated impact, we identified 162 significantly enriched GO terms. The majority of the enriched GO terms were associated with biological processes (n=90, Fig. 4B). “Serine-type endopeptidase inhibitor activity, GO:0004867” terms were shared across all N'Dama, White Fulani and Namchi (Doayo) cattle breeds. “Negative regulation of coagulation, GO:0050819” was shared between the two taurines, Kapsiki and Holstein.
The analysis of GO enrichment from bs-Indels from different cattle breeds identified 50 significantly enriched terms (Fig. 4C), and 41 GO enrichments were associated with biological processes.
The GO terms related to adaptation to the high-altitude environment and heat tolerance were enriched in Namchi (Doayo) and Kapsiki. Also, the GO terms “response to decreased oxygen levels, GO:0036293”, “response to hypoxia, GO:0001666”, “localization of cell, GO:0051674” were enriched in Kapsiki whereas in Namchi (Doayo) the GO terms “cellular response to peptide hormone stimulus, GO:0071375”, “cellular response to peptide, GO:1901653”, “cellular response to stress, GO:0033554” and “cellular response to hormone stimulus, GO:0032870” were evident.
In the African Zebu cattle, GO terms associated with the adaptation to infectious diseases were enriched on immune responses. In Gudali these were the terms “antigen processing and presentation, GO:0019882” and “plasma membrane protein complex, GO:0098797”. In Red Fulani these were the GO terms “acute inflammatory response, GO:0002526”, “inflammatory response, GO:0006954” whereas in White Fulani the GO term “antigen processing and presentation of peptide antigen, GO:0048002” was enriched.
The KEGG pathway analysis identified 31 pathways with at least one SNP in the gene that may explain individual attributes per breed (Fig. 5). Two pathways which carry at least 10 SNPs were discovered in Namchi (Doayo): phagosome and antigen processing and presentation. In Kapsiki: cell adhesion molecules (CAMs) and vascular smooth muscle contraction were found.
Adaptation to tropical climate and high altitude
Adaptation to local environment is multifactorial involving several genes located on different chromosomes and selection [1-3]. To cope with heat, poor feed and high altitude, African cattle have developed behavioral, cellular and physiological mechanism involved in the intensive responses to the mechanical stress, oxygen, food deprivation and homeostasis [29]. During the evolution of Zebu cattle, they have acquired genes for heat-tolerance at the physiological and cellular levels [30]. The superior ability for regulation of body temperature during heat stress is the result of lower metabolic rates as well as increased capacity of heat tolerance. Heat stress also leads to lightening of the coat, because light colored hair coats have a sleek and shiny reflection [30]. However, the lower metabolic rates under heat stress condition are related to reduction in feed intake, milk yield, thyroid hormone secretion, and growth. This finding may explain the lower performance of meat growth in African Zebu cattle as compared to taurine breeds of European descent. Four heat shock factor (HSF) genes (HSF1, HSF2, HSF3, and HSF4) have been isolated in vertebrates, and HSF1, located on chromosome 14, is a master regulator of Heat Shock Protein (HSP70) expression during stress, including heat shock [31]. European taurine Holstein, WAS, WAZ and Indian Zebu Brahman cattle possess distinct patterns of homozygosity and heterozygosity for the SNPs alleles of HSF1 (n= 37 SNPs), HSPA1A (n=22 SNPs), HSPA12B (n= 32 SNPs) and HSPA13 (n=54 SNPs). The heterozygosity alleles in these genes were over represented in WAS and WAZ as compared to Brahman and Holstein. The increased heterozygosity among the African cattle breeds (WAS and WAZ) indicates the combined effects of genetic isolation and long selection history.
Adaptation to tropical pathogens
Stress response, olfactory receptors and immune responses play a critical role in adaptation to the tropical environment and diseases [15, 16]. Mammalian olfactory receptors (ORs) are encoded by the largest mammalian multigene family with more than 1000 genes organized in clusters on 26 cattle chromosomes [32]. They are essential for avoiding danger, food search, reproduction, and behavior [32].
Chemokines play a role in the inflammation that enables the phagocytic leukocytes of the immune system to be the first line of defense against infectious agents like protozoa and helminth parasites [33]. The tolerance of Namchi (Doayo) cattle against trypanosomiasis (trypanotolerance) caused by the protozoan parasites Trypanosoma congolense, T. vivax and T. brucei is actively driven by the innate immune response. IL-12, INF-γ and TNF-α that are primarily produced by cells of the innate immune system would trigger phagocytic cell activation and inflammation, thus contributing to the control of parasites growth [34]. Furthermore, SIGLEC-1 and BOLA are key molecules involved in regulations of the chemokines and cells of innate and adaptive immune responses. Genetic polymorphisms have been linked to resistance and susceptibility to various pathogens. For instance, polymorphisms in BOLA-DRB3 stands for resistance to bovine virus bacteria and parasites infections [35, 36, 40]. Two novel frameshift variants in BoLA-DQB were identified in Namchi (Doayo) and Gudali [Additional file 4, Figure S4]. Such polymorphisms in BoLA class II genes have been associated with viral, bacterial and parasites resistance [35-37]. We found twenty alleles located in BoLA-DQB1 (n=8), BoLA-DQB2 (n=8), BoLA-DQB3 (n=2), BoLA-DQB5 (n=2). Three genotypes were observed as two homozygous (reference and alternative) and one heterozygous. The Namchi (Doayo) cattle carried the highest alternative homozygous alleles in the BoLA-BQB region whereas the Kapsiki possessed the highest heterozygous alleles. IRAK1BP1, sialic acid binding immunoglobuline-like lectin (Siglec), MYO1H and Heat Shock Protein family genes were found carrying mutational SNPs. MYO1H plays roles in cell motility, phagocytosis, and vesicle transport [38] and Siglecs are expressed on various white blood cells of the immune system and are involved in the regulation of innate and adaptive immunity [39]. Studies have shown that many coated sialylated viruses, bacteria and parasites are capable to mimic self-recognition and thus dampen or evade an immune response [39].
We found also one frameshift variant on SIGLEC-1, SIGLEC-11 and SIGLEC-14 genes on Kapsiki, Namchi (Doayo) and White Fulani cattle breeds, respectively.
Breed specific variants with high impact were associated with the quantitative trait loci database CattleQTLdb (http://www.animalgenome.org/cgi-bin/QTLdb/BT/index). In Namchi (Doayo) cattle we found four frameshift (rs448373338, rs721512537, rs724126999, rs518575055) and one stop codon gained (rs208021401) variants, on chr 1 regions associated with “Bovine tuberculosis susceptibility QTL (96157)”, one variant in chr 10 (rs524374275) located in the QTLs region associated with Tick resistance QTL (101167) and two variants (rs716221069 and rs458413320) in the regions associated with Longissimus muscle area QTL (56136) and Marbling score QTL (10919). In contrast one new splice donor and one splice acceptor variant (rs523455261) on chr 5 were found in Gudali cattle, and one New start loss on chr 8 associated with Carcass weight QTL (chr 5:11314779-11314819, chr 8:85937078-85937118) and on chr 11 one variant (rs516544521) was located in the QTLs region associated with ovine tuberculosis susceptibility QTL (96344).
Taken together these findings indicate that both, Gudali and Namchi (Doayo) cattle possess genotypes and phenotypes associated with disease susceptibility/resistance, and meat and carcass production. This is in line with previous findings and therefore the high impact variants found in this study are potential markers for genome-wide association studies (GWAS) and should be further investigated.