Whole genome characterization of autochthonous Bos taurus brachyceros and introduced Bos indicus indicus cattle breeds in Cameroon regarding their adaptive phenotypic traits and pathogen resistance

DOI: https://doi.org/10.21203/rs.2.20033/v1



West African indigenous taurine cattle display unique adaptive traits shaped by husbandry management, regional climate and exposure to endemic pathogens. They are less productive with respect to milk and meat production which has been associated with a number of factors, amongst others small size, traditional beliefs and husbandry practices. This resulted in the severe dwindling of their populations size rendering them vulnerable to extinction.

The Namchi (Doayo) taurine cattle breed has documented resistance traits against trypanosome infection and exposure to tick infestation. Nonetheless, the historically later introduced Zebu cattle are the main cattle breeds in Africa today, even though they suffer more from locally prevailing pathogens.

By using a reference-based whole genome sequencing approach, we sequenced for the first time the genomes of five cattle breeds from Cameroon: the Namchi (Doayo), an endangered trypanotolerant taurine breed, the Kapsiki, an indigenous trypanosusceptible taurine breed, and three Zebu (Bos indicus indicus) breeds: Ngaoundere Gudali, White Fulani and Red Fulani.


Approximately 167 Giga bases of raw sequencing data were generated and mapped to the cattle reference genome UMD3.1. The coverage was 22 to 30-fold. The single nucleotide polymorphisms (SNPs) were compared with reference genomes of European Bos taurus Holstein and of Asian Bos indicus Brahman and the African trypanotolerant N’Dama breeds.

Of a total of 50 million SNPs identified, 3.43 million were breed-specific ranging from 0.37 to 0.47 million SNPs in the domestic Cameroonian breeds and approximately 0.58 million constituted of small insertions and deletions. We identified breed specific-non-synonymous variants as genetic traits that could explain certain cattle-breed specific phenotypes such as increased tolerance against trypanosome parasites in the Namchi (Doayo) breed, heat tolerance in the Kapsiki breed, and growth, metabolism and meat quality in the Gudali breeds. Phylogenetic comparison grouped Namchi (Doayo) to the African Zebu clade indicating a hybrid status of the selected animal with a Zebu breed, albeit it showed the Namchi breed’s phenotype.


The findings provide the first comprehensive set of full genome variant data of the most important Cameroonian cattle breeds. The genomic data shall constitute a foundation for breed amelioration whilst exploiting the heritable traits and support conservation efforts for the endangered local cattle breeds.


More than 150 cattle breeds or distinct populations have been recorded in Africa [1, 2]. Their phenotypes cluster into the humpless taurine, the humped Zebu, and the anciently fixed taurine-Zebu crossbreeds known as Sanga in East Africa [3].

In Sub-Saharan Africa, trypanosomiasis (Nagana), dermatophilosis, tick-borne diseases and gastrointestinal helminthiasis are the major endemic diseases affecting cattle productivity [4, 5]. Indigenous local taurine breeds like Doayo (also known as under the Fulani word Namchi) are more resistant or tolerant to most endemic diseases than Zebu cattle [5]. They originated from ancestral aurochs populations Bos primigenius primigenius and B. primigenius opisthonomus from two centers of domestication, namely the Middle East and North Africa, respectively [6, 7].

Today Namchi (Doayo) and Kapsiki are geographically restricted to endemic areas of human and animal trypanosomiasis in Northern Cameroon. Whereas N'dama and Kuri cattle are grouped as residual longhorn Bos taurus longifrons introduced already 10,000 years ago [5, 9], Baoulé, Namchi (Doayo) and Kapsiki belong to the West African Shorthorn (WAS) Bos taurus brachyceros domesticated on the continent some 6,500 years ago [6, 7].

The Kapsiki cattle form a population of approximately 5000 animals that are found mainly in the Mayo Tsanaga (Rhumsiki) area of the Far North region [8]. In contrast, the Namchi (Doayo) cattle have a population size of only 1000 to 2000 heads in the Poli mountains, which are up to1,900-meter-high and surrounding savannah low lands in the Faro division of Cameroon’s North region [9, 10]. They are well adapted to the local environment including endemic parasites like trypanosomes and ticks [9, 11], but of small size and weight, thus economically not interesting for milk and meat production. The usually small herds of 5 to 50 animals which are kept semi-wild, are neither milked nor exploited commercially. They rather play an important role in the traditional culture of local tribes, like dowries, special feasts and rituals. Decades of uncontrolled crossbreeding with Zebu cattle have severely dwindled the gene pool of this taurine cattle population [9]. In 1992, these breeds have been classified by the Food and Agricultural Organization (FAO) as being at risk of becoming extinct [10], hence the conservation of their genetic resources has been highly prioritized. The continuous influx of Zebu genes into the WAS breeds stands to threaten the innate characteristics of trypanotolerance and other disease resistance [3].

Bos indicus Zebu cattle in Africa fall into two distinct groups, the West African Zebu (WAZ) and East African Zebu (EAZ). In Cameroon, 99% of the estimated population of six million cattle are WAZ breeds. They consist of two major sub-types of the Sokoto and Adamawa Gudali [17]. In Central Africa, they have the highest potential for beef and dairy production in comparison to other regional WAZ breeds, like White Fulani and Red Fulani. These Fulani cattle are long-horned and long-legged Zebu cattle and are mainly kept by the nomadic Bororo people [18]. All Zebu breeds were introduced through the Nile-valley and the Horn of Africa around 2,000 years ago. They started to become more widespread about 700 years ago with hamitic migrations in North and East Africa [7, 13] and throughout the Sahel zone south of the Sahara. They arrived in Northern Cameroon, coming from the Bornu (Nigeria today) some 200 years ago. This relatively short time span for evolutionary adaption is reflected by a higher susceptibility to locally endemic diseases and vectors making reliance on veterinary drug interventions essential for their survival.

Better knowledge of unique adaptive traits for locally prevailing pathogens is needed not only for breed conservation, but also for future genetic amelioration of cattle breeds to mitigate food insecurity problems in Africa. Long-term selection pressure has operated on the genomic architecture and on regions that control traits for adaptive fitness [1]. For example, autosomal and Y-chromosomal microsatellites indicate a high level of genetic diversity in African cattle breeds as a consequence of repetitive introgression of Zebu genes into autochthonous taurine genome across the continent. Genome research initiatives, like Bovine Genome Sequencing, HapMap and 1000 Bulls have fostered our understanding of bovine evolution and the complex formation of genetic variants [14-16]. The free availability of cattle reference genomes facilitates whole genome re-sequencing approaches, which are steadily expanding [14-16].

In this study, we report and characterize for the first time the complete genomes of five cattle breeds in Cameroon, namely the endangered taurine trypanotolerant Namchi (Doayo), the trypanosusceptible Kapsiki taurine, and the three Zebu breeds Gudali, White Fulani and Red Fulani, which are all trypanosusceptible. Using the genomic data, 50 million (M) SNPs were identified in this study of which 2.68 M (18.7%) were considered as novels variants. Lower genetic diversity was also found in African taurine cattle breeds than in the Cameroonian Bos indicus breeds. Furthermore, specific-non-synonymous variants were detected such as trypanotolerance in Namchi (Doayo), heat tolerance in Kapsiki, and growth, metabolism and meat quality in Gudali.

Results And Discussion

Whole genome sequencing, assembly and variant identification

Genomic DNA from the cattle breeds Gudali, White Fulani, Red Fulani, Namchi (Doayo) and Kapsiki were sequenced with the Illumina HiSeq4000 sequencing platform and libraries were sequenced using 150-bp paired-end reads. This generated a total of 835 Gb of raw reads with an average of 167 Gb per sample which provides, to the best of our knowledge, the first comprehensive set of full genome variant data of these breeds. The average genome-wide sequence coverage from the mapped reads ranged from 22.8× for Namchi (Doayo) up to 30.8× for Red Fulani (Table 1). This lies in the range of other cattle re-sequencing studies published [14, 19, 20] whereas the depth of coverage is fairly high in comparison to 10.8 and 15.8-fold coverage obtained by Kim et al. [15] and Kawahara-Miki et al. [19], respectively. Taylor et al. [21] have suggested that about 95% of the total variants within the genome of cattle are discovered at an average sequence depth of 23.3x which implies that the data obtained in this study is sufficient to detect SNPs and Indels variants with high confidence.

The chosen approach of high depth sequencing yielded approximately 109 reads per sample (Table 1) which allowed us to obtain a high coverage per animal tested. However, it also resulted in a relatively low percentage of uniquely mapped reads when aligned to the reference genome (Hereford breed UDM3.1) that was subsequently used for variant calling (ranging from 63% to 65% mapped reads for the Cameroonian cattle breeds, Table 1). This result is consistent with the 60% of uniquely mapped reads by Kawahara-Miki et al. on Japanese Kuchinoshima-Ushi bulls [19]. However, while using the same UMD3.1 cattle reference genome, our mapping rates were markedly lower than the 98.5% reported by Kim et al. [15] from other indigenous East African cattle breeds (Ankole, Boran and Ogaden). Our rather low mapping rate could be explained either by the PCR-free preparation of sequencing libraries in our case which implies that bovine DNA and non-bovine DNA such as blood microbes and parasites could have been sequenced at similar rates or that the African cattle breed samples chosen are evolutionarily more distant compared to the reference genome and therefore contain sequences of genomics regions not present in the UMD3.1 cattle reference genome.

Variant calling results

A total of 50 million (M) SNPs were identified in this study of which 2.68 M (18.7 %) were not found in dbSNP and considered as novels variants (Table 1; Fig. 1A, Supplemental file Fig. S2). Similar results were obtained by Stafuzza et al. [22] on Gyr, Girolando, Gruzerat and Holstein cattle breeds from Brazil. The ratio of the number of heterozygous to homozygous SNP variants were different across the cattle breeds. Brahman and Namchi (Doayo) had the lowest rate, whereas Kapsiki had the highest (Table 1, Supplemental file Fig. S1). The low ratio of heterozygous to homozygous in Brahman and Namchi (Doayo) cattle could mean that they experience admixture, as reported by Freemann et al. [23] in African taurines from Cameroon. On average, 0.58 M (8%) of the detected variants had small insertions and deletions (Indels, Table 1, Fig. 1B).

De novo Assembly and analysis of unmapped genomic sequence reads

In order to better understand the low mapping rate, unmapped reads were assembled into contigs using the de novo sequence assembler ABySS and compared against the NCBI Blastn database. These results did not support the hypothesis of microbial and parasitic DNA contamination that could be sequenced at similar rates as the host DNA using the PCR free library preparation protocol. Rather, it supported the idea that the breeds analyzed here are evolutionary more distant compared to the reference genome. Bos mutus was found as a best scoring Blast results in 65% of the unmapped Blastn alignments in all samples, followed by Ovis canadensis with 17% of the Blastn alignments [Supplemental file 3 Figure S 3, Supplement file 5 Table Supplemental S1]. These findings indicate that the most common sequences of the unmapped read contigs were those of other Bovidae. The mean sequence identity for the Bos mutus Blastn hits was at 98% with an average coverage of 700bp, and 92% sequence identity with an average coverage of 650bp for Ovis canadensis indicating that these reads are derived from Bovidae but have not been found in the reference genome used for read mapping. Bos taurus and Bos indicus were only found in ~3% and 1% of the Blastn hits of the unmapped reads, respectively, which demonstrates that most of the reads originating from Bos taurus and Bos indicus were correctly mapped. We postulate that this high percentage of reads deriving from other Bovidae, might arise due to the evolutionary divergence of Cameroonian cattle breeds to the other investigated breeds. There were no obvious differences in Blastn results found when comparing African Zebu cattle with Namchi (Doayo) and Kapsiki [Supplemental file 3 Figure S3, Table Supplemental S1] although it seems conceivable to expect Namchi (Doayo) and Kapsiki breeds rather distinct to the reference genome when compared to the Zebu cattle. Further investigation using tools that can measure levels of hybridization is needed in order to solve this in the future. Furthermore, the construction of an African breed reference genome or an African pan-genome might help to optimize genome research on African cattle breeds.

Among the species that cover at least 0.5% of the total scoring Blast results, most were of vertebrate origin. Exceptions of the invertebrate kingdom were Trichogramma pretiosum in the Brahman control sample, and the bacteria Lelliottia nimipressuralis and Enterobacter spec in the White Fulani sample (see Supplemental Table S1), albeit all at very low levels (2.4%, 0.8% and 0.6% of the total Blast Scoring results. At even lower rates also Babesia spp., a blood-borne parasite known to cause Texas fever in cattle and Theileria spp., a cosmopolitan blood parasite of cattle and blood-invading bacteria of the Anaplasma genus were also detected in Namchi (Doayo) [see Supplement file 5 Table Supplemental S1]. Although these finding are only supported by a very low number of alignments of assembled contigs to the blastn database, this data is still in line with a recent epizootiological survey in the same indigenous Cameroonian cattle breeds which revealed that nearly 90% of animals were infected with tick-borne bacterial, piroplasmid and protozoan pathogens [24, 25].

Genetic variability and similarity across breeds

The largest number of SNPs was found in Zebu breeds Brahman, Red Fulani, Gudali and White Fulani, respectively. When looking at the SNP distribution across the taurine breeds the lowest numbers were found in Holstein and N’Dama as compared to Kapsiki and Namchi (Doayo) cattle (Table 1). A total of 1,013,395 SNPs were common across all breeds, and 121,776 SNPs were Zebu-specific, distributed between Brahman, Red Fulani, White Fulani and Gudali cattle breeds. More surprisingly, there were no SNPs exclusively shared between the European taurine Holstein and WAS taurine (N’Dama, Kapsiki and Namchi (Doayo), Fig. 2), apart from 73,366 SNPs which were shared between N’Dama and Kapsiki only. Furthermore, 85,307 SNPs were common between all tested cattle breeds except Brahman cattle. The highest proportion of breed-specific (bs) SNPs were found in Bos indicus: Brahman (759,804), Red Fulani (473,688), Gudali (461,043) and White Fulani (420,114), respectively, and the lowest breed-specific SNPs were found on taurine breeds N’dama (220,302), Holstein (328,560), Kapsiki (370,074) and Namchi (Doayo) (402,114), respectively (bs SNPs are color labelled in Fig. 2). This apparently lower genetic diversity in African taurine breeds was already earlier argued by Kim et al. [15], who linked it to the low effective population size and/or population bottlenecks following fatal disease outbreaks such as the Rinderpest. In contrast, indicine Zebu cattle and composites with larger effective population size exhibit a higher level of nucleotide diversity. Furthermore, the higher nucleotide diversity of taurine Namchi (Doayo) and Kapsiki as compared to N’Dama and Holstein may be due to the long history of Bos indicus introgression [23].

The density of variants per chromosome was proportional to the chromosome length, except for the X chromosomes which showed a lower number of variants identified (Supplemental file Fig. S2). These findings were expected because the DNA of X chromosomes is subject to an increased natural selection, which leads to less genetic diversity.

Breed clustering and relationships

The cluster relationship between breeds was analyzed by a principal component analysis (PCA) using all autosomal SNPs (Fig. 3A). The first two principal components explained 22% and 16% of the total variance, respectively. Except for Namchi (Doayo), the other WAS breeds N’Dama, and Kapsiki form a separate cluster from WAZ breeds. The WAS breeds N’dama, and Kapsiki are also closer to European taurine Holstein than WAZ breeds and both, WAS and WAZ are clearly separated from Zebu Brahman. This indicates the possibility of admixture events between the West African cattle breeds. To further understand the genetic network among those breeds a phylogenetic tree analysis (Fig. 3B) was carried out with the same autosomal SNPs data as for PCA analysis by using Randomized Accelerated Maximum Likelihood models (RAxML). Again, except for Namchi (Doayo), the Bos taurus breeds Kapsiki, N’Dama and Holstein cluster together while the B. indicus breeds White Fulani, Gudali, Brahman clustered on a separate clade. The WAS Kapsiki and Namchi (Doayo) cattle are closer to WAZ cattle as compared to European taurine Holstein. In addition, the WAZ are evolutionary distant to Indian Zebu Brahman. This observation concords with previous studies of WAS indicating they possess admixture with indicine ancestry between 22.7% and 74.1% in Central Africa [26, 27]. Gudali are more closely related to Indian Brahman cattle than White Fulani and Red Fulani (Fig. 3B). The Indian Zebu genes introgression into African Zebu breeds has been reported based on autosomal microsatellite markers, between 55 and 83% [3, 27]. The PCA and RAxML findings presented here show that the evolution of Cameroonian cattle breeds is distant both to Indian Zebu Brahman and European taurine Holstein. The higher number of heterozygous to homozygous variant ratio in Kapsiki (2.5) than in Namchi (Doayo) (1.5) was unexpected, because Kapsiki has been regarded as an indigenous taurine population with highest Zebu gene introgression over the last three decades based on microsatellite data [11, 23]. Namchi (Doayo) and Kapsiki have been classified by Freeman et al. [23] as hybrids rather than pure breeds. The phylogenetic position of Namchi (Doyao) more closely related to Red Fulani than WAS indicated recent Zebu introgression into the genome of Namchi (Doayo). Although the selected Namchi (Doayo) was not different in appearance to the other animals in the region, we cannot exclude whether it has been a product of a recent cross-hybridization with another cattle breed, and thus not representing the pure-breed genome. It is reported that there are still some isolated herds of purebred Namchi (Doayo) cattle in the Poli area, but the present study did not have the tools to screen hybridization levels in the selected animal for whole genome data generation. Such screening would be necessary in the present context where traditional husbandry systems face numerous challenges towards maintaining purely taurine breeds due to rampant cross breeding.

SNPs and Indels functional annotations

The SNPs and Indels were annotated in order to identify the location of the variant in terms of genomic features using snpEFF [28]. In general, all the eight breeds exhibited similar distributions of SNPs and Indels in various genomic annotation categories. Most annotated variants were located in intergenic regions (62%) and introns (27%). The remaining SNPs (11%) were found on downstream genes (4.4%), upstream genes (4.7%), untranslated regions (UTR) (0.5%), missense (0.6%), frameshift (0.02%) and other areas (0.7%) (Fig. 4A).

Breed-specific variants with high impact such as frameshift, splice acceptor, splice donor, start lost and stop gained that may putatively change amino-acids codons are located in and/or close to genes that may lead to functional changes were examined in each chromosome. Overall, 607 genes were identified; 98, 90, 85, 73 and 62 in Red Fulani, Gudali, White Fulani, Kapsiki and Namchi (Doayo), respectively (Additional file 6, Table S1-11). The majority of these genes were widely involved in olfactory receptors, carbohydrate metabolism, transcription regulation, ion binding, nucleotide binding, protein transport, fatty acid metabolism, stress response, regulatory elements, proteolysis and immune responses (Additional file 6, Table S8).

Gene ontology and pathway enrichment analysis of high and moderate impact breed-specific SNPs and Indels variants

Based on the Gene ontology (GO) enrichment analysis of bs-ns SNPs with high and moderated impact, we identified 162 significantly enriched GO terms. The majority of the enriched GO terms were associated with biological processes (n=90, Fig. 4B). “Serine-type endopeptidase inhibitor activity, GO:0004867” terms were shared across all N'Dama, White Fulani and Namchi (Doayo) cattle breeds. “Negative regulation of coagulation, GO:0050819” was shared between the two taurines, Kapsiki and Holstein.

The analysis of GO enrichment from bs-Indels from different cattle breeds identified 50 significantly enriched terms (Fig. 4C), and 41 GO enrichments were associated with biological processes.

The GO terms related to adaptation to the high-altitude environment and heat tolerance were enriched in Namchi (Doayo) and Kapsiki. Also, the GO termsresponse to decreased oxygen levels, GO:0036293”, “response to hypoxia, GO:0001666”, “localization of cell, GO:0051674” were enriched in Kapsiki whereas in Namchi (Doayo) the GO termscellular response to peptide hormone stimulus, GO:0071375”, “cellular response to peptide, GO:1901653”, “cellular response to stress, GO:0033554” and “cellular response to hormone stimulus, GO:0032870” were evident.

In the African Zebu cattle, GO terms associated with the adaptation to infectious diseases were enriched on immune responses. In Gudali these were the terms “antigen processing and presentation, GO:0019882” and “plasma membrane protein complex, GO:0098797”. In Red Fulani these were the GO terms “acute inflammatory response, GO:0002526”, “inflammatory response, GO:0006954” whereas in White Fulani the GO term “antigen processing and presentation of peptide antigen, GO:0048002” was enriched.

The KEGG pathway analysis identified 31 pathways with at least one SNP in the gene that may explain individual attributes per breed (Fig. 5). Two pathways which carry at least 10 SNPs were discovered in Namchi (Doayo): phagosome and antigen processing and presentation. In Kapsiki: cell adhesion molecules (CAMs) and vascular smooth muscle contraction were found.

Adaptation to tropical climate and high altitude

Adaptation to local environment is multifactorial involving several genes located on different chromosomes and selection [1-3]. To cope with heat, poor feed and high altitude, African cattle have developed behavioral, cellular and physiological mechanism involved in the intensive responses to the mechanical stress, oxygen, food deprivation and homeostasis [29]. During the evolution of Zebu cattle, they have acquired genes for heat-tolerance at the physiological and cellular levels [30]. The superior ability for regulation of body temperature during heat stress is the result of lower metabolic rates as well as increased capacity of heat tolerance. Heat stress also leads to lightening of the coat, because light colored hair coats have a sleek and shiny reflection [30]. However, the lower metabolic rates under heat stress condition are related to reduction in feed intake, milk yield, thyroid hormone secretion, and growth. This finding may explain the lower performance of meat growth in African Zebu cattle as compared to taurine breeds of European descent. Four heat shock factor (HSF) genes (HSF1, HSF2, HSF3, and HSF4) have been isolated in vertebrates, and HSF1, located on chromosome 14, is a master regulator of Heat Shock Protein (HSP70) expression during stress, including heat shock [31]. European taurine Holstein, WAS, WAZ and Indian Zebu Brahman cattle possess distinct patterns of homozygosity and heterozygosity for the SNPs alleles of HSF1 (n= 37 SNPs), HSPA1A (n=22 SNPs), HSPA12B (n= 32 SNPs) and HSPA13 (n=54 SNPs). The heterozygosity alleles in these genes were over represented in WAS and WAZ as compared to Brahman and Holstein. The increased heterozygosity among the African cattle breeds (WAS and WAZ) indicates the combined effects of genetic isolation and long selection history.

Adaptation to tropical pathogens

Stress response, olfactory receptors and immune responses play a critical role in adaptation to the tropical environment and diseases [15, 16]. Mammalian olfactory receptors (ORs) are encoded by the largest mammalian multigene family with more than 1000 genes organized in clusters on 26 cattle chromosomes [32]. They are essential for avoiding danger, food search, reproduction, and behavior [32].

Chemokines play a role in the inflammation that enables the phagocytic leukocytes of the immune system to be the first line of defense against infectious agents like protozoa and helminth parasites [33]. The tolerance of Namchi (Doayo) cattle against trypanosomiasis (trypanotolerance) caused by the protozoan parasites Trypanosoma congolense, T. vivax and T. brucei is actively driven by the innate immune response. IL-12, INF-γ and TNF-α that are primarily produced by cells of the innate immune system would trigger phagocytic cell activation and inflammation, thus contributing to the control of parasites growth [34]. Furthermore, SIGLEC-1 and BOLA are key molecules involved in regulations of the chemokines and cells of innate and adaptive immune responses. Genetic polymorphisms have been linked to resistance and susceptibility to various pathogens. For instance, polymorphisms in BOLA-DRB3 stands for resistance to bovine virus bacteria and parasites infections [35, 36, 40]. Two novel frameshift variants in BoLA-DQB were identified in Namchi (Doayo) and Gudali [Additional file 4, Figure S4]. Such polymorphisms in BoLA class II genes have been associated with viral, bacterial and parasites resistance [35-37]. We found twenty alleles located in BoLA-DQB1 (n=8), BoLA-DQB2 (n=8), BoLA-DQB3 (n=2), BoLA-DQB5 (n=2). Three genotypes were observed as two homozygous (reference and alternative) and one heterozygous. The Namchi (Doayo) cattle carried the highest alternative homozygous alleles in the BoLA-BQB region whereas the Kapsiki possessed the highest heterozygous alleles. IRAK1BP1, sialic acid binding immunoglobuline-like lectin (Siglec), MYO1H and Heat Shock Protein family genes were found carrying mutational SNPs. MYO1H plays roles in cell motility, phagocytosis, and vesicle transport [38] and Siglecs are expressed on various white blood cells of the immune system and are involved in the regulation of innate and adaptive immunity [39]. Studies have shown that many coated sialylated viruses, bacteria and parasites are capable to mimic self-recognition and thus dampen or evade an immune response [39].

We found also one frameshift variant on SIGLEC-1, SIGLEC-11 and SIGLEC-14 genes on Kapsiki, Namchi (Doayo) and White Fulani cattle breeds, respectively.

Breed specific variants with high impact were associated with the quantitative trait loci database CattleQTLdb (http://www.animalgenome.org/cgi-bin/QTLdb/BT/index). In Namchi (Doayo) cattle we found four frameshift (rs448373338, rs721512537, rs724126999, rs518575055) and one stop codon gained (rs208021401) variants, on chr 1 regions associated with “Bovine tuberculosis susceptibility QTL (96157)”, one variant in chr 10 (rs524374275) located in the QTLs region associated with Tick resistance QTL (101167) and two variants (rs716221069 and rs458413320) in the regions associated with Longissimus muscle area QTL (56136) and Marbling score QTL (10919). In contrast one new splice donor and one splice acceptor variant (rs523455261) on chr 5 were found in Gudali cattle, and one New start loss on chr 8 associated with Carcass weight QTL (chr 5:11314779-11314819, chr 8:85937078-85937118) and on chr 11 one variant (rs516544521) was located in the QTLs region associated with ovine tuberculosis susceptibility QTL (96344).

Taken together these findings indicate that both, Gudali and Namchi (Doayo) cattle possess genotypes and phenotypes associated with disease susceptibility/resistance, and meat and carcass production. This is in line with previous findings and therefore the high impact variants found in this study are potential markers for genome-wide association studies (GWAS) and should be further investigated.


The whole genome of five indigenous Cameroonian cattle Namchi (Doayo), Kapsiki, Gudali, White and Red Fulani was re-sequenced and analyzed for the first time, and compared to the reference genomes of European Bos taurus Holstein, African Bos taurus N’Dama and one Asian Zebu Bos indicus Brahman. A number of gene pathways were identified as potential candidates for improved adaptation to drought and growth. Moreover, the co-identification of growth-related Gene Ontology terms in Gudali and Holstein is of economic importance, which may indicate the potential of Gudali cattle for genetic improvement for milk and fertility traits. This will need several decades of selection experiments using the purebred Gudali. Heat tolerance and trypanotolerance traits are complex mechanism involving several gene pathways located on different chromosomes. In the trypanotolerant breeds Namchi (Doayo) we have identified eight potential gene ontology terms, including glucose-related genes involved in the control of trypanosome proliferation in the bloodstream. All these candidate genes constitute a valuable resource for development and genetic amelioration of tropical cattle breeds particular in Africa. Furthermore, the full sequence data widens our knowledge on the value of native breeds as genetic resources for future cattle breeding, and the power of selection signature analyses.


Sampling, library construction and sequencing

The data used for this paper was obtained from the project “Pathogen detection in African cattle Breeds” Abanda et al., [24] and Paguem et al., [25].

One representative individual of each of the five different cattle breeds was selected (Table 2).

Blood samples of 5 ml volume per animal were collected in ethylene diamine tetra acetic acid (EDTA)-coated vacutainers during the routine examination. The blood was centrifuged at 3000 rpm for 15 minutes. Then, genomic DNA was extracted from the buffy coat (cellular layer including leucocytes, erythrocytes and blood-dwelling parasites like Anaplasma bacteria, piroplasmids, microfilariae of Setaria, trypanosomes and Borrelia spp.(see Additional file 7 Table S1 for trypanosome, Anaplasma bacteria, piroplasmids, Onchocerca filarial and gastro intestinal parasites detected on those animals) using the Wizard Genomic DNA Purification Kit (Promega, Germany) according to the manufacturer’s instructions. DNA isolation and concentration was verified by fluorescent methods using Picogreen (Life Technologies). Libraries were generated from 2 µg of genomic DNA per specimen using the Illumina TruSeq DNA PCR-Free Library Prep Kit (Illumina, San Diego, CA, USA) following the manufacturer’s protocol. 2x 150bp paired-end libraries sequencing was conducted on the Illumina HiSeq4000 platform with the manufacturer’s proprietary TruSeq SBS Kit V3-HS.

Short read mapping, variant calling and annotation

The quality of the generated raw Illumina reads was determined using FastQC software (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/). Adaptor read sequences were removed using SeqPurge from ngs-bits4 (https://github.com/imgag/ngs-bits, version 0.1-4-gaed0c94). For comparison with other cattle breeds, whole genome raw sequencing data from NCBI Sequence Read Archive SRA was extracted for the breeds Holstein (SRR934414), N’Dama (SRR3693376) and Brahman (SRR6649996). Paired-end reads from the five samples along with these three controls from the SRA archive were mapped against the reference Bos taurus Hereford breed genome UMD3.1 using BWA-MEM version 0.7.10-r789 [41]. Reads that mapped to a single location in the genome (uniquely mapped reads) were selected, and those with multiple region mapping were excluded using the MarkDuplicates tool of Picard5 v.1.137 (http://broadinstitute.github.io/picard). After sequence alignment, the resulting SAM files format were converted to BAM files using Samtools v.1.3 [41]. Then BAM files were sorted and local realignment of reads was performed to correct misalignment due to the presence of small Indels using Genome Analysis Tool Kit 3.1 (GTAK). SNPs and Indels calling were performed using Freebayes v.0.9.21-19-gc003c1e [42]. SNPs and Indels were annotated using snpEFF [28] and Bcftools [41]. To have many of these processes parallelized and automated, a workflow written in the workflow language Snakemake from QBiC was used which is freely available at Github (https://github.com/qbicsoftware/exomseq).

The variants that were identified in only one cattle breed and have no corresponding entries in the dbSNP database were classified as breed-specific novel variants. The average ratios of homozygous versus heterozygous SNPs were calculated for each breed. This ratio is expected to be 1:2 in a freely mating population; therefore, any departure from this condition such as the presence of admixture in the population will be manifested by an increase in the homozygous/heterozygous ratio [43].

Unmapped read analysis

Reads that were not mapped to the reference genome UMD3.1 were extracted from alignment BAM files and sorted by name using Samtools. The sorted BAM files were given as input to ABySS (version 2.1.5) and assembled using the parameter “k=25” indicating k-mer size = 25 in standard de Bruijn graph mode. Resulting contigs.fa files were subdivided into contigs with a length > 500bp. Then the remaining contigs were searched against Blastn database using Nucleotide-Nucleotide BLAST (version 2.8.1+) with the parameters “-num_alignments 1”and “-num_descriptions 1” to show alignments and descriptions for the top 1 matching database match only. The BLAST output was then parsed using the R language (version 3.4.0) to determine for each sample the species of the BLAST hit, the percent identity, length of match and query, and BLAST e-value. Mean values of these statistics were calculated for each species in each sample.

Gene enrichment and functional analysis

Breed-specific non-synonymous (bs-ns) SNPs, Indels with moderate and high impact in the genome and new variants not found in any publicly available database were extracted from WAS and WAZ using the data repositories Ensembl release 76, dbSNP138, Entrez Gene, NCBI and Uniprot. Gene pathway networks analysis was performed using the R (v3.5.2) package clusterProfiler and the Kyoto Encyclopaedia of Genes and Genome (KEGG) database [44]. The variant carrying genes were functionally characterized based on different gene ontology (GO) terms using clusterProfiler (v3.12) R package(v3.5.2) package. To investigate whether bs-ns SNPs and Indels genes were associated with economic traits, the quantitative trait loci database CattleQTLdb (http://www.animalgenome.org/cgi-bin/QTLdb/BT/index) was screened using an integrated data warehouse of the bovine genome database web server BovineMine v1.4.

Phylogeny of bovine-related species

To understand the genetic relationships between indigenous cattle breeds and other subfamilies of Bovidae, a principal component analysis (PCA) was performed with EIGENSTRAT. For the phylogenetic tree reconstruction, the variant files were converted to FASTA format with Vcf-kit8 (https://vcf-kit.readthedocs.io/en/latest/). Multiple sequence alignment (MSA) was generated using Muscle with default options [45]. Prottest3 [46] was used to find the best substitution model for the MSA, and Raxml was used to generate the Maximum Likelihood (ML) tree with Blossum62 as best substitution model along with Gamma distribution for rate heterogeneity, estimation for proportion of invariable sites and 100 non-parametric bootstrap replicates using Brahman as out group [47]. Visualization of the tree was generated using ape (v5.3) R-package [48].


KEGG: Kyoto Encyclopaedia of Genes and Genome GO: gene ontology PCA: principal component analysis WAS: West African Shorthorn WAZ: West African Zebu bs-ns: Breed-specific non-synonymous HSPA: Heat Shock 70 KDa protein HSF: heat shock factor ORs: olfactory receptors BoLA: Bovine leucocyte antigen SNPs: single nucleotide polymorphism variants InDels: Insertions and Deletions variants Gb: Giga base pairs


Ethics approval and consent to participate

Permission for the study and ethical approval were obtained from the Scientific Directorate of the Institute of Agricultural Research for Development (IRAD) in Cameroon, which is the country’s government research institution for animal health and livestock husbandry improvement. Furthermore, verbal consent was given by the cattle owners and herdsmen.

Consent for publication

Not applicable

Availability of data and materials

All data generated or analyzed during this study are included in this published article and its supplementary information files are available from the corresponding author on reasonable request. The five newly sequenced African cattle genomes in this study are publicly available from GenBank with the Bio project accession number (will be uploaded after acceptance of the manuscript).

Competing interests

The authors declare that they have no competing interests.


Research grants from the Otto Bayer Foundation (F-2013BS522), International Foundation for Science (IFS); Stockholm, Sweden (B/5864-1) and German Research Foundation (DFG, grant no. RE 1536/2) funded the fieldwork data used for this paper from the project “Pathogen detection in African cattle breeds” [Paguem et al., 2019; Abanda et al.,2019], whereas the full genome sequences of five cattle breeds and bioinformatics analysis was funded by the joint RiSC program of the State Ministry of Science, Research and Arts Baden-Württemberg and the University of Tübingen (PSP-no. 4041002616).

Authors’ Contributions

Conceptualization: AE, MDA, AR, SC. Formal analysis: AP, PB, SC. Investigation: AP, BA, MDA. Project administration: AE, Resources: AE, AP, AR. Supervision: AR, AE, MDA.

Writing, review and editing: AP, BA, MDA, AR, PB, SC, AE. All authors read and approved the final manuscript.


The authors are indebted to Drs. Madi Palou Aboubakar and Manchang Kingsley from the Wakwa Centre of the Institute of Agricultural Research for Development, and the research staff of the Programme Onchocercoses field station of the University of Tübingen in Ngaoundéré for logistical support and assistance during the fieldwork, and Dr. Fernanda Ruiz-Fadel from the University of Tübingen for proofreading the manuscript.


  1. Mwai O, Hanotte O, Kwon YJ, Cho S: African Indigenous Cattle: Unique Genetic Resources in a Rapidly Changing World. Asian-Australas J Anim Sci 2015, 28(7):911-921.
  2. Rege J, Hanotte O, Mamo Y, Asrat B, Dessie T: Domestic Animal Genetic Resources Information System (DAGRIS) Addis Ababa: International Livestock Research Institute; 2007.
  3. Hanotte O, Bradley DG, Ochieng JW, Verjee Y, Hill EW, Rege JE: African pastoralism: genetic imprints of origins and migrations. Science 2002, 296(5566):336-339.
  4. Achukwi MD, Tanya VN, Messine O, Njongmeta LM: Etude comparative de l’infestation des bovins Namchi (Bos taurus) et Goudali de Ngaoundere (Bos indicus) par des tiques adultes Amblyomma variegatum. Rev Elev Méd Vét Pays Trop 2001, 54 (1):37-41.
  5. Awa DN, Achukwi MD: Livestock pathology in the central African region: some epidemiological considerations and control strategies. Anim Health Res Rev 2010, 11(2):235-244.
  6. Epstein H: The Origin of the Domestic Animals of Africa. New York, NY; London; Munich: Africana Publishing Corporation; 1971.
  7. Loftus RT, Ertugrul O, Harba AH, El-Barody MA, MacHugh DE, Park SD, Bradley DG: A microsatellite survey of cattle from a centre of origin: the Near East. Mol Ecol 1999, 8(12):2015-2022.
  8. Dineur B, Thys E: Les Kapsiki: race taurine de l'extrême nord camerounais. I. Introduction et barymétrie. Rev Elev Méd Vét Pays Trop 1986, 39(3-4):435-442.
  9. Achukwi MD, Ibeagha-Awemu EM, Musongong GA, Erhardt G: Doayo (Namchi) Bos taurus cattle with low Zebu attributes are trypanotolerant under natural vector challenge, Online J Vet Res 2009, 13 (1):94-105.
  10. FAO: The management of global animal genetic resources. Rome: Anim Prod Health; 1992.
  11. Achukwi MD, Tanya VN, Hill EW, Bradley DG, Meghen C, Sauveroche B, Banser JT, Ndoki JN: Susceptibility of the Namchi and Kapsiki cattle of Cameroon to trypanosome infection. Trop Anim Health Prod 1997, 29(4):219-226.
  12. Fréchou H: L'élevage et le commerce du bétail dans le nord du Cameroun. In Série Sciences Humaines. Edited by ORSTOM Cdl. Cahiers d’outre-mer. Bordeaux: Ateliers de l'Union Française d'Impression; 1966: 319-320.
  13. Bradley DG, MacHugh DE, Cunningham P, Loftus RT: Mitochondrial diversity and the origins of African and European cattle. Proc Natl Acad Sci USA 1996, 93(10):5131-5135.
  14. Taye M, Lee W, Caetano-Anolles K, Dessie T, Hanotte O, Mwai OA, Kemp S, Cho S, Oh SJ, Lee HK et al: Whole genome detection of signature of positive selection in African cattle reveals selection for thermotolerance. Anim Sci J 2017, 88(12):1889-1901.
  15. Kim J, Hanotte O, Mwai OA, Dessie T, Bashir S, Diallo B, Agaba M, Kim K, Kwak W, Sung S et al: The genome landscape of indigenous African cattle. Genome Biol 2017, 18(1):34.
  16. Bahbahani H, Tijjani A, Mukasa C, Wragg D, Almathen F, Nash O, Akpa GN, Mbole-Kariuki M, Malla S, Woolhouse M et al: Signatures of Selection for Environmental Adaptation and Zebu x Taurine Hybrid Fitness in East African Shorthorn Zebu. Front Genet 2017, 8(68):68.
  17. Tawah CL, Rege JEO: Gudali cattle of West and Central Africa. In: Animal Genetic Resources Information. Rome, Italy: FAO; 1994.
  18. Tawah CL, Rege J: White Fulani cattle of West and Central Africa. Anim Gen Res Info 1996, 17:127-145.
  19. Kawahara-Miki R, Tsuda K, Shiwa Y, Arai-Kichise Y, Matsumoto T, Kanesaki Y, Oda S, Ebihara S, Yajima S, Yoshikawa H et al: Whole-genome resequencing shows numerous genes with nonsynonymous SNPs in the Japanese native cattle Kuchinoshima-Ushi. BMC Genomics 2011, 12(103):103.
  20. Das A, Panitz F, Gregersen VR, Bendixen C, Holm LE: Deep sequencing of Danish Holstein dairy cattle for variant detection and insight into potential loss-of-function variants in protein coding genes. BMC Genomics 2015, 16:1043.
  21. Taylor JF, Whitacre LK, Hoff JL, Tizioto PC, Kim J, Decker JE, Schnabel RD: Lessons for livestock genomics from genome and transcriptome sequencing in cattle and other mammals. Genet Sel Evol 2016, 48(1):59.
  22. Stafuzza NB, Zerlotini A, Lobo FP, Yamagishi ME, Chud TC, Caetano AR, Munari DP, Garrick DJ, Machado MA, Martins MF et al: Single nucleotide variants and InDels identified from whole-genome re-sequencing of Guzerat, Gyr, Girolando and Holstein cattle breeds. PLoS One 2017, 12(3): e0173954.
  23. Freeman AR, Meghen CM, MacHugh DE, Loftus RT, Achukwi MD, Bado A, Sauveroche B, Bradley DG: Admixture and diversity in West African cattle populations. Mol Ecol 2004, 13(11):3477-3487.
  24. Abanda B, Paguem A, Mamoudou A, Manchang TK, Renz A, Eisenbarth A: Molecular identification and prevalence of tick-borne pathogens in Zebu and taurine cattle in North Cameroon. Parasites Vectors 2019.12:448.
  25. Paguem A, Abanda B, Ndjonka D, Weber JS, Ngomtcho SCH, Manchang TK, Mamoudou A, Eisenbarth A, Renz A, Kelm S, Achukwi MD: Widespread co-endemic occurrence of Trypanosoma species infecting cattle in the Sahel and Guinea Savannah zones of Cameroon. BMC Vet Research 2019.15:344.
  26. Decker JE, McKay SD, Rolf MM, Kim J, Molina Alcala A, Sonstegard TS, Hanotte O, Gotherstrom A, Seabury CM, Praharani L et al: Worldwide patterns of ancestry, divergence, and admixture in domesticated cattle. PLoS Genet 2014, 10(3):e1004254.
  27. MacHugh DE, Shriver MD, Loftus RT, Cunningham P, Bradley DG: Microsatellite DNA variation and the evolution, domestication and phylogeography of taurine and Zebu cattle (Bos taurus and Bos indicus). Genetics 1997, 146(3):1071-1086.
  28. Cingolani P, Platts A, Wang le L, Coon M, Nguyen T, Wang L, Land SJ, Lu X, Ruden DM: A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin) 2012, 6(2):80-92.
  29. Erb L, Weisman GA: Coupling of P2Y receptors to G proteins and other signaling pathways. Wiley Interdiscip Rev Membr Transp Signal 2012, 1(6):789-803.
  30. Hansen PJ: Physiological and cellular adaptations of Zebu cattle to thermal stress. Anim Reprod Sci 2004, 82-83:349-360.
  31. Fujimoto M, Nakai A: The heat shock factor family and adaptation to proteotoxic stress. FEBS J 2010, 277(20):4112-4125.
  32. Lee K, Nguyen DT, Choi M, Cha SY, Kim JH, Dadi H, Seo HG, Seo K, Chun T, Park C: Analysis of cattle olfactory subgenome: the first detail study on the characteristics of the complete olfactory receptor repertoire of a ruminant. BMC Genomics 2013, 14:596.
  33. Murdoch C, Finn A: Chemokine receptors and their role in inflammation and infectious diseases. Blood 2000, 95(10):3032-3043.
  34. Abrahamsohn IA: Cytokines in innate and acquired immunity to Trypanosoma cruzi infection. Braz J Med Biol Res 1998, 31(1):117-121.
  35. Takeshima S, Aida Y: Structure, function and disease susceptibility of the bovine major histocompatibility complex. Anim Science J 2006, 77(2):138-150.
  36. Takeshima S, Matsumoto Y, Chen J, Yoshida T, Mukoyama H, Aida Y: Evidence for cattle major histocompatibility complex (BoLA) class II DQA1 gene heterozygote advantage against clinical mastitis caused by Streptococci and Escherichia species. Tissue Antigens 2008, 72(6):525-531.
  37. Untalan PM, Pruett JH, Steelman CD: Association of the bovine leukocyte antigen major histocompatibility complex class II DRB3*4401 allele with host resistance to the Lone Star tick, Amblyomma americanum. Vet Parasitol 2007, 145(1-2):190-195.
  38. Tassopoulou-Fishell M, Deeley K, Harvey EM, Sciote J, Vieira AR: Genetic variation in myosin 1H contributes to mandibular prognathism. Am J Orthod Dentofacial Orthop 2012, 141(1):51-59.
  39. Paulson JC, Macauley MS, Kawasaki N: Siglecs as sensors of self in innate and adaptive immune responses. Ann N Y Acad Sci 2012, 1253:37-48.
  40. Xu A, Van Eijk M, Park C, Lewin HA: Polymorphism in BoLADRB3 exon 2 correlates with resistance to persistent lymphocytosis caused by bovine leukemia virus. J Immunol 1993, 151(12):6977–6985.
  41. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, Genome Project Data Processing S: The Sequence Alignment/Map format and SAMtools. Bioinformatics 2009, 25(16):2078-2079.
  42. Garrison E, Marth G: Haplotype-based variant detection from short-read sequencing. arXiv Preprint; 2012. Available from: http://arxiv.org/abs/1207.3907.
  43. Scally A, Yngvadottir B, Xue Y, Ayub Q, Durbin R, Tyler-Smith C: A genome-wide survey of genetic variation in gorillas using reduced representation sequencing. PLoS One 2013, 8(6):e65066.
  44. Yu G, Wang LG, Han Y, He QY: clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS 2012, 16(5):284-287.
  45. Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 2004, 32(5):1792-1797.
  46. Darriba D, Taboada GL, Doallo R, Posada D: ProtTest 3: fast selection of best-fit models of protein evolution. Bioinformatics 2011, 27(8):1164-1165.
  47. Stamatakis A: RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 2014, 30(9):1312-1313.
  48. Paradis E, Claude J, Strimmer K: APE: Analyses of Phylogenetics and Evolution in R language. Bioinformatics 2004, 20(2):289-290.


Table 1. Summary of sequencing results of the genomes of five Cameroonian cattle breeds including the number of total reads and variants called in million (M) reads.


Mapped Reads

Total Reads 

Mapping rate (%)

Coverage [x]



Bs- SNPs



Het/ Hom























W. Fulani











R. Fulani























































The reference genome breed was Hereford (UMD3.1). Whole genome data of the breeds N’Dama, Brahman and Holstein were retrieved from the NCBI archive SRA [Holstein (SRR934414), N’Dama (SRR3693376) and Brahman (SRR6649996)]Hom = homozygous, Het= heterozygous, Het/ Hom = heterozygous to homozygousratio, W. Fulani = White Fulani; R. Fulani = Red Fulani.Bs-SNPs= breeds specific SNPs


Table 2. Information of the selected animals of Cameroonian cattle breeds for whole genome re-sequencing.


Age [years]


Sampling sites

GPS Coodinates


LW [kg] 


















Bos taurus brachyceros











Bos taurus brachyceros

W. Fulani









Bos indicus indicus

R. Fulani









Bos indicus indicus 










Bos indicus indicus 

W. Fulani = White Fulani; R. Fulani = Red Fulani. LW: Live weight