Whole genome sequencing analysis
Genomic DNA from the cattle breeds Gudali, White Fulani, Red Fulani, Namchi and Kapsiki were sequenced with the Illumina HiSeq4000 sequencing platform and libraries were sequenced using 150-bp paired-end reads. This generated a total of ~840 Gb of raw reads for all analyzed five breed samples together, averaging to ~167 Gb per sample which provides, to the best of our knowledge, the first comprehensive set of high depth, whole genome variant data of these breeds.
The chosen approach of high depth sequencing yielded approximately 109 reads per sample (Table 1A & B) which allowed us to obtain a high coverage per animal tested. However, it also resulted in a relatively low mapping rate for the African cattle breeds ranging from 63% to 65% when aligned to the reference genome UMD3.1 (Table 1A). This low mapping rate could be explained by 1) the PCR-free preparation of sequencing libraries, which implies that bovine DNA and non-bovine DNA such as blood microbes and parasites could have been sequenced at similar rates, or 2) the reference genome is incomplete, or 3) the African cattle breed samples chosen are evolutionarily more distant compared to the reference genome and therefore contain sequences of genomic regions not present in the UMD3.1 cattle reference genome. In order to better understand this, unmapped reads were assembled into contigs using the de novo sequence assembler ABySS and compared against the NCBI Blastn database [Figure 1, Additional file 1 Table S1]. The results obtained from this analysis did not support the hypothesis of microbial or parasitic DNA contamination. Species such as Trichogramma pretiosum in the Brahman control sample, the bacteria Lelliottianimi pressuralis and Enterobacter spp. in White Fulani, Babesia spp. and Theileria spp. cosmopolitan blood parasites of ruminants which are known to inflict diseases were detected in Namchi, but only supported by a very low number of Blastn alignments [Additional file 1 Table S1]. Still, the presence of such organisms in our samples is in line with a recent epizootiological survey in the same indigenous Cameroonian cattle breeds that revealed nearly 90% of animals carried tick-borne bacterial, piroplasmid and protozoan pathogens [18, 19].
Rather, the mapping results indicated that the analyzed breeds are evolutionary more distant compared to the reference genome UMD3.1, or that this genome is not complete. This assumption is supported that Bos mutus was the best scoring result in 65% of the Blastn alignments with a mean sequence identity of 98% across all samples, indicating that most unmapped read contigs were of Bovidae origin, but have not been found in the reference genome UMD3.1. In contrast, Bos taurus and Bos indicus reads were only found in ~3% and ~1% of the Blastn hits, respectively, demonstrating that most of the reads originating from those species were correctly mapped. There were no obvious differences in Blastn results when comparing African Zebu cattle with Namchi and Kapsiki [Figure 1, Additional file 1 Table S1], although it seems conceivable to expect Namchi and Kapsiki taurine breeds rather distinct to the reference genome in comparison to the Zebu cattle. The recently published reference genome assembly ARS-UCD 1.2 (NCBI RefSeq accession GCF_002263795.1), based on the same original animal (Hereford breed UMD3.1) was created by applying a combination of long and short reads for a de novo assembly strategy, and showed a >200-fold improvement in continuity, as well as 10-fold improvement in accuracy and completeness than the previous cattle reference genome [20]. Therefore, this optimized genome was also used as reference to map the reads of the Cameroonian cattle breeds. Interestingly, a very high proportion of raw reads was mapped ranging from 98.9 % for Namchi up to 99.6 % for Red Fulani (Table 1B). Our mapping rates were even higher than reported by Kim et al. [17] from other indigenous East African cattle breeds (Ankole, Boran and Ogaden) and of other cattle re-sequencing studies published [16, 17, 21-23]. Further, the depth of coverage, ranging from 103-fold for Bos taurus Namchi to 140-fold for Bos indicus Zebu Gudali is also considerably high in comparison to 10.8- and 15.8-fold coverage obtained by Kim et al. [17] and Kawahara-Miki et al. [21], respectively. Taylor et al. [22] suggested that about 95 % of the total variants within the genome of cattle are discovered at an average sequence depth of 23.3-fold which implies the data obtained in this study is sufficient to detect SNPs and InDels variants with high confidence.
Variant calling results
A total of ~100 million (M) SNPs were identified in this study of which 7.7 M (~8%) were not found in the 1000 Bulls Genomes Project and considered as novel variants (Table 1B; Figure 2A). On average for each breed, 1.4 M (12%) of the detected variants had small insertions and deletions (InDels, Table 1B). The SNP variants results from Cameroonian cattle were much higher as compared to the 27 M SNPs obtained by Stafuzza et al. [23] on Bos indicus Gyr, Girolando, Gruzerat and Bos taurus Holstein cattle breeds from Brazil, whereas our obtained SNP variants were markedly lower as compared to those reported by Kim et al. [17] on East African zebu (Boran, Ogaden, Kenana) and Sanga (Ankole, taurine/zebu crossbreeds). The ratio of the number of heterozygous to homozygous SNP variants were different across the cattle breeds. Brahman, Holstein and Namchi had the lowest rate, whereas Kapsiki had the highest (Table 1B). The low ratio of heterozygous to homozygous SNPs in Brahman and Namchi cattle could mean that they experience admixture, as reported by Freemann et al. [24] in African taurines from Cameroon.
Genetic variability and similarity across breeds
A total of 1,649,795 SNPs were common across all breeds, and 302,546 SNPs were Zebu-specific, distributed between Brahman, Red Fulani, White Fulani and Gudali cattle breeds (Figure 3). More surprisingly, there were 27,443 SNPs exclusively shared between the European taurine Holstein and WAS taurine N’Dama, Kapsiki and Namchi, apart from 162,940 SNPs which were shared between N’Dama and Kapsiki only. 151,865 SNPs and 163,784 SNPs were shared between Cameroonian taurine (Kapsiki and Namchi) and Zebu (Red Fulani, Gudali and White Fulani), respectively. Furthermore, 170,672 SNPs were common between all tested cattle breeds except Brahman cattle.
In general, we observed a lower genetic diversity in African taurine cattle breeds than in the Cameroonian Bos indicus breeds (Table 1B). The highest proportion of breed-specific (bs) SNPs were found in Bos indicus: Brahman , Red Fulani, Gudali and White Fulani , respectively, and the lowest breed-specific SNPs were found on taurine breeds N’dama, Holstein, Kapsiki and Namchi, respectively (bs-SNPs are color labelled in Figure 3). This apparently lower genetic diversity in African taurine breeds was already earlier argued by Kim et al. [17] who linked it to the low effective population size and/or population bottlenecks following fatal disease outbreaks such as Rinderpest. In contrast, indicine Zebu cattle and composites with larger effective population size exhibit a higher level of nucleotide diversity. Furthermore, the higher nucleotide diversity of taurine Namchi and Kapsiki as compared to N’Dama and Holstein may be due to the long history of Bos indicus introgression [24, 26].
The density of variants per chromosome was proportional to the chromosome length, except for the X chromosomes which showed a lower number of variants identified (Additional file 2 Figure S2). These findings were expected because the DNA of X chromosomes is subject to an increased natural selection, which leads to less genetic diversity [23].
Breed clustering and relationships
The cluster relationship between breeds was analyzed by a principal component analysis (PCA) using all autosomal SNPs (Figure 4A). The first two principal components explain 22 % and 16 % of the total variance, respectively. Except for Namchi, the other WAS breeds N’Dama, and Kapsiki form a separate cluster from WAZ breeds. The WAS breeds N’dama, and Kapsiki are also closer to European taurine Holstein than WAZ breeds, and both WAS and WAZ are clearly separated from Zebu Brahman. This indicates the possibility of admixture events between the West African cattle breeds. To further understand the genetic network among those breeds, a phylogenetic tree analysis (Figure 4B) was carried out with the same autosomal SNPs data as for PCA analysis by using Randomized Accelerated Maximum Likelihood models (RAxML). Again, except for Namchi , the Bos taurus breeds Kapsiki, N’Dama and Holstein cluster together while the B. indicus breeds White Fulani, Gudali, Brahman clustered on a separate clade. The WAS Kapsiki and Namchi cattle are closer to WAZ cattle as compared to European taurine Holstein. In addition, the WAZ are evolutionary distant to Indian Zebu Brahman. This observation concords with previous studies of WAS indicating they possess admixture with indicine ancestry between 22.7 and 74.1 % in Central Africa [27, 28]. Gudali are more closely related to Indian Brahman cattle than White Fulani and Red Fulani (Figure 4B). The Indian Zebu genes introgression into African Zebu breeds has been reported based on autosomal microsatellite markers between 55 and 83 % [3, 28]. The PCA and RaxML findings presented here illustrate the evolution of Cameroonian cattle breeds is distant both to Indian Zebu Brahman and European taurine Holstein. The higher number of heterozygous to homozygous variants ratio in Kapsiki (2.5) than in Namchi (1.5) (Table 1A & B) was unexpected, because Kapsiki has been regarded as an indigenous taurine population with highest Zebu gene introgression over the last three decades based on microsatellite data [11, 24, 26]. Namchi and Kapsiki have been classified by Freeman et al. [24] as hybrids rather than pure breeds. The phylogenetic position of Namchi more closely related to Red Fulani than WAS indicated recent Zebu introgression into the genome of Namchi. Although the selected Namchi was not different in appearance to other animals in the region, we cannot exclude whether it has been a product of a recent cross-hybridization with another cattle breed, and thus not representing the pure-breed genome. It is reported that there are still some isolated herds of purebred Namchi cattle in the Poli area, but the present study did not have the tools to screen hybridization levels in the selected animal for whole genome data generation. Such screening would be necessary in the present context where traditional farming systems face numerous challenges towards maintaining purely taurine breeds due to rampant cross breeding.
Functional annotation and Gene ontology analysis of high and moderate impact breed-specific SNPs and InDels
The SNPs and InDels were annotated in order to identify the location of the variant in terms of genomic features using the tool snpEFF [25]. In general, all the eight breeds exhibited similar distributions of SNPs and InDels in various genomic annotation categories. Most annotated variants were located in intergenic regions (60 %) and introns (30 %). The remaining SNPs were found on downstream genes (4 %), upstream genes (5 %), untranslated regions (UTR) (0.5 %), missense (0.6%), frameshift (0.02%) and other areas (0.7 %) (Figure 2B).
Breed-specific variants with high and moderate impact such as frameshift, missense, splice acceptor, splice donor, start lost and stop gained, that may putatively change amino-acids codons are located in and/or close to genes that may lead to functional changes were examined in each chromosome. Overall, 4349 genes with such mutations were identified: Most of them were from the breeds White Fulani and Kapsiki (both > 2000 genes) and the remaining genes were separated on the other breeds as follows: in Red Fulani 11 genes, in Gudali 8 genes, in Namchi 6 genes, and in the three control samples Brahman 27 genes, Holstein 11 genes, and in N’Dama 10 genes, respectively (Additional file 3 Table S3-S10).
These genes were then used to perform a Gene ontology (GO) enrichment analysis for each breed separately. Fifty-two significantly enriched GO terms were identified (Figure 5) with most significantly enriched terms derived from Kapsiki and White Fulani. Those breeds showed the highest number of genes with bs-SNPs and bs-InDels of high and moderate impact. Interestingly, we found enrichment of GO terms related to adaption to high-altitude environment and heat tolerance in Kapsiki, namely the GO terms “peptidase activity” and “scavenger receptor activity”. In Namchi, although such GO terms were not significantly enriched, genes such as ADAMTSL1, an ADAMTS like gene, OR9G1, an olfactory receptor (OR) and the surfactant protein SFTPD were identified carrying either missense variants or frameshift variants (Additional file 3 Table S3). These genes are often reported in the context of heat stress via their interaction with heat shock proteins, but have been also often reported in the context of wound healing [38-39]. This would imply some genes that contribute to important resistance traits are passed onto hybrid offspring in the Namchi, and might therefore be interesting candidates for further research on increasing trypanotolerance in WAS. In contrast, several enriched GO terms were found in the trypano-susceptible African zebu cattle White and Red Fulani which were absent in the taurine breeds Kapsiki and Namchi. Those GO terms were linked to a negative regulation of wounding or wound healing (Figure 5, for White Fulani) and a positive regulation of the mitochondria and reactive oxygen species (ROS) metabolic processes (Figure 5, for Red Fulani). Mitochondria are an important source of ROS within most mammalian cells. They are also generated at wound sites, and act as long-range signals in wound healing. Hence, controlling genes associated with these GO terms might play a vital role in the adaptation to infectious diseases in Zebu cattle breeds. In Gudali, another trypanosusceptible African zebu cattle breed, no enriched GO terms were found for SNPs and InDels (Figure 5). However, several missense variants of high impact were found on an Interferon-inducible GTPase gene (Additional file 3 Table S7). These GTPases provide host resistance to a variety of viral, bacterial and protozoan pathogens through the sequestration of microbial proteins, manipulation of vesicle trafficking, regulation of antimicrobial autophagy [38, 39], which are all congruent for a significant role in the adaptation to infectious diseases.
Taken together, the functional annotation and Gene Ontology analysis identified breed-specific high and moderate impact variants as genetic traits which could help explaining cattle-breed specific phenotypes, such as heat tolerance in Kapsiki and trypanotolerance in Namchi.
Adaptation to tropical climate and high altitude
Adaptation to local environment is multifactorial involving several genes [1-3]. To cope with heat, poor food and high altitude, African cattle have developed behavioral, cellular and physiological responses to mechanical stress, oxygen, food deprivation and homeostasis [29]. During the evolution of Zebu cattle, they have acquired genes for heat-tolerance at the physiological and cellular levels [30]. The superior ability for regulation of body temperature during heat stress is the result of lower metabolic rates as well as increased capacity of heat tolerance. Heat stress also leads to lightening of the coat, because light colored hair coats have a sleek and shiny reflection [30]. However, the lower metabolic rates under heat stress condition are related to reduction in feed intake, milk yield, thyroid hormone secretion, and growth. This finding may explain the lower performance of meat growth in African Zebu cattle as compared to taurine breeds of European descent. Among many other genes involved in heat stress, four heat shock factor (HSF) genes (HSF1, HSF2, HSF3, and HSF4) have been isolated in vertebrates. HSF1, which is located on chromosome 14, is a master regulator of Heat Shock Protein (HSP70) expression during heat shock [31]. Its interaction with the heat shock proteins HSPA1A/HSP70 or DNAJB1 result in the inhibition of heat shock- and HSF1-induced transcriptional activity during the attenuation and recovery phase from heat shock [32, 33]. European taurine Holstein, WAS, WAZ and Indian Zebu Brahman cattle possess distinct patterns of homozygosity and heterozygosity for the SNP alleles of HSF1. The heterozygosity alleles in these genes were over-represented in WAS and WAZ as compared to Brahman and Holstein. The increased heterozygosity among the African cattle breeds (WAS and WAZ) indicates the combined effects of genetic isolation and long selection history. In addition, when looking at high and moderate impact bs-SNPs only, in the trypanosusceptible breeds Kapsiki (WAS) and White Fulani (WAZ), but not in the trypanotolerant Namchi and N’Dama, mutations in heat shock proteins were found. The observation that some trypanosusceptible Zebu breeds such as White Fulani carry many mutations in heat shock protein encoding genes [see Additional file 3 Table S6] while other trypanosusceptible Zebu breeds such as Red Fulani and Gudali do not carry any mutations in heat shock proteins, could be further investigated in future genomics research of African cattle breeds towards improving heat stress resistance of those cattle breeds.
Adaptation to pathogens
Stress response, olfactory receptors and immune responses play a critical role in adaptation to the tropical environment and diseases [17, 26]. Mammalian olfactory receptors (Ors) are encoded by the largest mammalian multigene family with more than 1000 genes organized in clusters on 26 cattle chromosomes [34]. They are essential for avoiding danger, food search, reproduction, and behaviour [34]. Ors have been linked to heat stress but were also reported to accelerate wound healing [35-38], where chemokines play a critical role by enabling the phagocytic leukocytes of the immune system to be the first line of defense against infectious agents, such as protozoa and helminth parasites [39]. The tolerance of Namchi cattle against trypanosomiasis (trypanotolerance) caused by the protozoan parasites Trypanosoma congolense, T. vivax and T. brucei is actively driven by the innate immune response. IL-12, INF-γ and TNF-α are primarily produced by cells of the innate immune system and trigger phagocytic cell activation and inflammation, thus contributing to the control of parasite growth [40]. Sialic acid binding immunoglobulin-like lectin (SIGLEC) and the major histocompatibility complex (MHC) gene family, also known in cattle as bovine leukocyte antigens (BoLA) genes, are key players involved in the regulation of chemokines and cells of innate and adaptive immune responses. SIGLECs are expressed on various white blood cells of the immune system and are involved in the regulation of innate and adaptive immunity [41]. In contrast, studies have also shown that many coated sialylated viruses, bacteria and parasites are capable to mimic self-recognition and thus dampen or evade an immune response [41].
Genetic polymorphisms in the mentioned genes have been often linked to wounding processes and pathogen resistance. For instance, polymorphisms in BOLA-DRB3 and other BoLA genes were linked to resistance against viral, bacterial and other parasite infections [42 - 45]. In this study, 6 high and moderate impact variants such as frameshift or missense variants were detected in BoLA genes in Kapsiki, and 9 were found in White Fulani. Furthermore, high and moderate impact variants were found in Namchi, located in Ors such as the previously mentioned OR9G1 gene and in the SIGLEC-1 gene in Kapsiki (Additional file 3, Table S3 and S5). In addition, a different level of polymorphisms in the genomic region of the BoLA genes on chromosome 23 of the cattle reference genome was observed for all breeds analyzed (Additional file 4, Figure S11). The trypanosusceptible cattle Kapsiki carried the highest number of homozygous and heterozygous alleles in this region while the other studied trypanotolerant cattle breeds such as Namchi and N’Dama showed a lower level of polymorphism, especially in the BOLA-DYB and BOLA-DOB regions (Additional file 4, Figure S11). However, the results obtained here for Namchi could also be due to previously mentioned possible hybrid status of the selected animal with a recent introgression of Zebu genes into its genome.
The findings provide the first comprehensive set of genome-wide high quality sequencing and variant data of the most important Cameroonian cattle breeds. Although this study was conducted on single samples per breed only, which does not allow us to correctly separate within from across breed variation, we think that the obtained high quality genomic data shall constitute a foundation for breed amelioration whilst exploiting the heritable traits and support conservation efforts for the endangered local cattle breeds.