DOI: https://doi.org/10.21203/rs.2.20033/v1
Background
West African indigenous taurine cattle display unique adaptive traits shaped by husbandry management, regional climate and exposure to endemic pathogens. They are less productive with respect to milk and meat production which has been associated with a number of factors, amongst others small size, traditional beliefs and husbandry practices. This resulted in the severe dwindling of their populations size rendering them vulnerable to extinction.
The Namchi (Doayo) taurine cattle breed has documented resistance traits against trypanosome infection and exposure to tick infestation. Nonetheless, the historically later introduced Zebu cattle are the main cattle breeds in Africa today, even though they suffer more from locally prevailing pathogens.
By using a reference-based whole genome sequencing approach, we sequenced for the first time the genomes of five cattle breeds from Cameroon: the Namchi (Doayo), an endangered trypanotolerant taurine breed, the Kapsiki, an indigenous trypanosusceptible taurine breed, and three Zebu (Bos indicus indicus) breeds: Ngaoundere Gudali, White Fulani and Red Fulani.
Results
Approximately 167 Giga bases of raw sequencing data were generated and mapped to the cattle reference genome UMD3.1. The coverage was 22 to 30-fold. The single nucleotide polymorphisms (SNPs) were compared with reference genomes of European Bos taurus Holstein and of Asian Bos indicus Brahman and the African trypanotolerant N’Dama breeds.
Of a total of 50 million SNPs identified, 3.43 million were breed-specific ranging from 0.37 to 0.47 million SNPs in the domestic Cameroonian breeds and approximately 0.58 million constituted of small insertions and deletions. We identified breed specific-non-synonymous variants as genetic traits that could explain certain cattle-breed specific phenotypes such as increased tolerance against trypanosome parasites in the Namchi (Doayo) breed, heat tolerance in the Kapsiki breed, and growth, metabolism and meat quality in the Gudali breeds. Phylogenetic comparison grouped Namchi (Doayo) to the African Zebu clade indicating a hybrid status of the selected animal with a Zebu breed, albeit it showed the Namchi breed’s phenotype.
Conclusions
The findings provide the first comprehensive set of full genome variant data of the most important Cameroonian cattle breeds. The genomic data shall constitute a foundation for breed amelioration whilst exploiting the heritable traits and support conservation efforts for the endangered local cattle breeds.
More than 150 cattle breeds or distinct populations have been recorded in Africa [1, 2]. Their phenotypes cluster into the humpless taurine, the humped Zebu, and the anciently fixed taurine-Zebu crossbreeds known as Sanga in East Africa [3].
In Sub-Saharan Africa, trypanosomiasis (Nagana), dermatophilosis, tick-borne diseases and gastrointestinal helminthiasis are the major endemic diseases affecting cattle productivity [4, 5]. Indigenous local taurine breeds like Doayo (also known as under the Fulani word Namchi) are more resistant or tolerant to most endemic diseases than Zebu cattle [5]. They originated from ancestral aurochs populations Bos primigenius primigenius and B. primigenius opisthonomus from two centers of domestication, namely the Middle East and North Africa, respectively [6, 7].
Today Namchi (Doayo) and Kapsiki are geographically restricted to endemic areas of human and animal trypanosomiasis in Northern Cameroon. Whereas N'dama and Kuri cattle are grouped as residual longhorn Bos taurus longifrons introduced already 10,000 years ago [5, 9], Baoulé, Namchi (Doayo) and Kapsiki belong to the West African Shorthorn (WAS) Bos taurus brachyceros domesticated on the continent some 6,500 years ago [6, 7].
The Kapsiki cattle form a population of approximately 5000 animals that are found mainly in the Mayo Tsanaga (Rhumsiki) area of the Far North region [8]. In contrast, the Namchi (Doayo) cattle have a population size of only 1000 to 2000 heads in the Poli mountains, which are up to1,900-meter-high and surrounding savannah low lands in the Faro division of Cameroon’s North region [9, 10]. They are well adapted to the local environment including endemic parasites like trypanosomes and ticks [9, 11], but of small size and weight, thus economically not interesting for milk and meat production. The usually small herds of 5 to 50 animals which are kept semi-wild, are neither milked nor exploited commercially. They rather play an important role in the traditional culture of local tribes, like dowries, special feasts and rituals. Decades of uncontrolled crossbreeding with Zebu cattle have severely dwindled the gene pool of this taurine cattle population [9]. In 1992, these breeds have been classified by the Food and Agricultural Organization (FAO) as being at risk of becoming extinct [10], hence the conservation of their genetic resources has been highly prioritized. The continuous influx of Zebu genes into the WAS breeds stands to threaten the innate characteristics of trypanotolerance and other disease resistance [3].
Bos indicus Zebu cattle in Africa fall into two distinct groups, the West African Zebu (WAZ) and East African Zebu (EAZ). In Cameroon, 99% of the estimated population of six million cattle are WAZ breeds. They consist of two major sub-types of the Sokoto and Adamawa Gudali [17]. In Central Africa, they have the highest potential for beef and dairy production in comparison to other regional WAZ breeds, like White Fulani and Red Fulani. These Fulani cattle are long-horned and long-legged Zebu cattle and are mainly kept by the nomadic Bororo people [18]. All Zebu breeds were introduced through the Nile-valley and the Horn of Africa around 2,000 years ago. They started to become more widespread about 700 years ago with hamitic migrations in North and East Africa [7, 13] and throughout the Sahel zone south of the Sahara. They arrived in Northern Cameroon, coming from the Bornu (Nigeria today) some 200 years ago. This relatively short time span for evolutionary adaption is reflected by a higher susceptibility to locally endemic diseases and vectors making reliance on veterinary drug interventions essential for their survival.
Better knowledge of unique adaptive traits for locally prevailing pathogens is needed not only for breed conservation, but also for future genetic amelioration of cattle breeds to mitigate food insecurity problems in Africa. Long-term selection pressure has operated on the genomic architecture and on regions that control traits for adaptive fitness [1]. For example, autosomal and Y-chromosomal microsatellites indicate a high level of genetic diversity in African cattle breeds as a consequence of repetitive introgression of Zebu genes into autochthonous taurine genome across the continent. Genome research initiatives, like Bovine Genome Sequencing, HapMap and 1000 Bulls have fostered our understanding of bovine evolution and the complex formation of genetic variants [14-16]. The free availability of cattle reference genomes facilitates whole genome re-sequencing approaches, which are steadily expanding [14-16].
In this study, we report and characterize for the first time the complete genomes of five cattle breeds in Cameroon, namely the endangered taurine trypanotolerant Namchi (Doayo), the trypanosusceptible Kapsiki taurine, and the three Zebu breeds Gudali, White Fulani and Red Fulani, which are all trypanosusceptible. Using the genomic data, 50 million (M) SNPs were identified in this study of which 2.68 M (18.7%) were considered as novels variants. Lower genetic diversity was also found in African taurine cattle breeds than in the Cameroonian Bos indicus breeds. Furthermore, specific-non-synonymous variants were detected such as trypanotolerance in Namchi (Doayo), heat tolerance in Kapsiki, and growth, metabolism and meat quality in Gudali.
Whole genome sequencing, assembly and variant identification
Genomic DNA from the cattle breeds Gudali, White Fulani, Red Fulani, Namchi (Doayo) and Kapsiki were sequenced with the Illumina HiSeq4000 sequencing platform and libraries were sequenced using 150-bp paired-end reads. This generated a total of 835 Gb of raw reads with an average of 167 Gb per sample which provides, to the best of our knowledge, the first comprehensive set of full genome variant data of these breeds. The average genome-wide sequence coverage from the mapped reads ranged from 22.8× for Namchi (Doayo) up to 30.8× for Red Fulani (Table 1). This lies in the range of other cattle re-sequencing studies published [14, 19, 20] whereas the depth of coverage is fairly high in comparison to 10.8 and 15.8-fold coverage obtained by Kim et al. [15] and Kawahara-Miki et al. [19], respectively. Taylor et al. [21] have suggested that about 95% of the total variants within the genome of cattle are discovered at an average sequence depth of 23.3x which implies that the data obtained in this study is sufficient to detect SNPs and Indels variants with high confidence.
The chosen approach of high depth sequencing yielded approximately 109 reads per sample (Table 1) which allowed us to obtain a high coverage per animal tested. However, it also resulted in a relatively low percentage of uniquely mapped reads when aligned to the reference genome (Hereford breed UDM3.1) that was subsequently used for variant calling (ranging from 63% to 65% mapped reads for the Cameroonian cattle breeds, Table 1). This result is consistent with the 60% of uniquely mapped reads by Kawahara-Miki et al. on Japanese Kuchinoshima-Ushi bulls [19]. However, while using the same UMD3.1 cattle reference genome, our mapping rates were markedly lower than the 98.5% reported by Kim et al. [15] from other indigenous East African cattle breeds (Ankole, Boran and Ogaden). Our rather low mapping rate could be explained either by the PCR-free preparation of sequencing libraries in our case which implies that bovine DNA and non-bovine DNA such as blood microbes and parasites could have been sequenced at similar rates or that the African cattle breed samples chosen are evolutionarily more distant compared to the reference genome and therefore contain sequences of genomics regions not present in the UMD3.1 cattle reference genome.
Variant calling results
A total of 50 million (M) SNPs were identified in this study of which 2.68 M (18.7 %) were not found in dbSNP and considered as novels variants (Table 1; Fig. 1A, Supplemental file Fig. S2). Similar results were obtained by Stafuzza et al. [22] on Gyr, Girolando, Gruzerat and Holstein cattle breeds from Brazil. The ratio of the number of heterozygous to homozygous SNP variants were different across the cattle breeds. Brahman and Namchi (Doayo) had the lowest rate, whereas Kapsiki had the highest (Table 1, Supplemental file Fig. S1). The low ratio of heterozygous to homozygous in Brahman and Namchi (Doayo) cattle could mean that they experience admixture, as reported by Freemann et al. [23] in African taurines from Cameroon. On average, 0.58 M (8%) of the detected variants had small insertions and deletions (Indels, Table 1, Fig. 1B).
De novo Assembly and analysis of unmapped genomic sequence reads
In order to better understand the low mapping rate, unmapped reads were assembled into contigs using the de novo sequence assembler ABySS and compared against the NCBI Blastn database. These results did not support the hypothesis of microbial and parasitic DNA contamination that could be sequenced at similar rates as the host DNA using the PCR free library preparation protocol. Rather, it supported the idea that the breeds analyzed here are evolutionary more distant compared to the reference genome. Bos mutus was found as a best scoring Blast results in 65% of the unmapped Blastn alignments in all samples, followed by Ovis canadensis with 17% of the Blastn alignments [Supplemental file 3 Figure S 3, Supplement file 5 Table Supplemental S1]. These findings indicate that the most common sequences of the unmapped read contigs were those of other Bovidae. The mean sequence identity for the Bos mutus Blastn hits was at 98% with an average coverage of 700bp, and 92% sequence identity with an average coverage of 650bp for Ovis canadensis indicating that these reads are derived from Bovidae but have not been found in the reference genome used for read mapping. Bos taurus and Bos indicus were only found in ~3% and 1% of the Blastn hits of the unmapped reads, respectively, which demonstrates that most of the reads originating from Bos taurus and Bos indicus were correctly mapped. We postulate that this high percentage of reads deriving from other Bovidae, might arise due to the evolutionary divergence of Cameroonian cattle breeds to the other investigated breeds. There were no obvious differences in Blastn results found when comparing African Zebu cattle with Namchi (Doayo) and Kapsiki [Supplemental file 3 Figure S3, Table Supplemental S1] although it seems conceivable to expect Namchi (Doayo) and Kapsiki breeds rather distinct to the reference genome when compared to the Zebu cattle. Further investigation using tools that can measure levels of hybridization is needed in order to solve this in the future. Furthermore, the construction of an African breed reference genome or an African pan-genome might help to optimize genome research on African cattle breeds.
Among the species that cover at least 0.5% of the total scoring Blast results, most were of vertebrate origin. Exceptions of the invertebrate kingdom were Trichogramma pretiosum in the Brahman control sample, and the bacteria Lelliottia nimipressuralis and Enterobacter spec in the White Fulani sample (see Supplemental Table S1), albeit all at very low levels (2.4%, 0.8% and 0.6% of the total Blast Scoring results. At even lower rates also Babesia spp., a blood-borne parasite known to cause Texas fever in cattle and Theileria spp., a cosmopolitan blood parasite of cattle and blood-invading bacteria of the Anaplasma genus were also detected in Namchi (Doayo) [see Supplement file 5 Table Supplemental S1]. Although these finding are only supported by a very low number of alignments of assembled contigs to the blastn database, this data is still in line with a recent epizootiological survey in the same indigenous Cameroonian cattle breeds which revealed that nearly 90% of animals were infected with tick-borne bacterial, piroplasmid and protozoan pathogens [24, 25].
Genetic variability and similarity across breeds
The largest number of SNPs was found in Zebu breeds Brahman, Red Fulani, Gudali and White Fulani, respectively. When looking at the SNP distribution across the taurine breeds the lowest numbers were found in Holstein and N’Dama as compared to Kapsiki and Namchi (Doayo) cattle (Table 1). A total of 1,013,395 SNPs were common across all breeds, and 121,776 SNPs were Zebu-specific, distributed between Brahman, Red Fulani, White Fulani and Gudali cattle breeds. More surprisingly, there were no SNPs exclusively shared between the European taurine Holstein and WAS taurine (N’Dama, Kapsiki and Namchi (Doayo), Fig. 2), apart from 73,366 SNPs which were shared between N’Dama and Kapsiki only. Furthermore, 85,307 SNPs were common between all tested cattle breeds except Brahman cattle. The highest proportion of breed-specific (bs) SNPs were found in Bos indicus: Brahman (759,804), Red Fulani (473,688), Gudali (461,043) and White Fulani (420,114), respectively, and the lowest breed-specific SNPs were found on taurine breeds N’dama (220,302), Holstein (328,560), Kapsiki (370,074) and Namchi (Doayo) (402,114), respectively (bs SNPs are color labelled in Fig. 2). This apparently lower genetic diversity in African taurine breeds was already earlier argued by Kim et al. [15], who linked it to the low effective population size and/or population bottlenecks following fatal disease outbreaks such as the Rinderpest. In contrast, indicine Zebu cattle and composites with larger effective population size exhibit a higher level of nucleotide diversity. Furthermore, the higher nucleotide diversity of taurine Namchi (Doayo) and Kapsiki as compared to N’Dama and Holstein may be due to the long history of Bos indicus introgression [23].
The density of variants per chromosome was proportional to the chromosome length, except for the X chromosomes which showed a lower number of variants identified (Supplemental file Fig. S2). These findings were expected because the DNA of X chromosomes is subject to an increased natural selection, which leads to less genetic diversity.
Breed clustering and relationships
The cluster relationship between breeds was analyzed by a principal component analysis (PCA) using all autosomal SNPs (Fig. 3A). The first two principal components explained 22% and 16% of the total variance, respectively. Except for Namchi (Doayo), the other WAS breeds N’Dama, and Kapsiki form a separate cluster from WAZ breeds. The WAS breeds N’dama, and Kapsiki are also closer to European taurine Holstein than WAZ breeds and both, WAS and WAZ are clearly separated from Zebu Brahman. This indicates the possibility of admixture events between the West African cattle breeds. To further understand the genetic network among those breeds a phylogenetic tree analysis (Fig. 3B) was carried out with the same autosomal SNPs data as for PCA analysis by using Randomized Accelerated Maximum Likelihood models (RAxML). Again, except for Namchi (Doayo), the Bos taurus breeds Kapsiki, N’Dama and Holstein cluster together while the B. indicus breeds White Fulani, Gudali, Brahman clustered on a separate clade. The WAS Kapsiki and Namchi (Doayo) cattle are closer to WAZ cattle as compared to European taurine Holstein. In addition, the WAZ are evolutionary distant to Indian Zebu Brahman. This observation concords with previous studies of WAS indicating they possess admixture with indicine ancestry between 22.7% and 74.1% in Central Africa [26, 27]. Gudali are more closely related to Indian Brahman cattle than White Fulani and Red Fulani (Fig. 3B). The Indian Zebu genes introgression into African Zebu breeds has been reported based on autosomal microsatellite markers, between 55 and 83% [3, 27]. The PCA and RAxML findings presented here show that the evolution of Cameroonian cattle breeds is distant both to Indian Zebu Brahman and European taurine Holstein. The higher number of heterozygous to homozygous variant ratio in Kapsiki (2.5) than in Namchi (Doayo) (1.5) was unexpected, because Kapsiki has been regarded as an indigenous taurine population with highest Zebu gene introgression over the last three decades based on microsatellite data [11, 23]. Namchi (Doayo) and Kapsiki have been classified by Freeman et al. [23] as hybrids rather than pure breeds. The phylogenetic position of Namchi (Doyao) more closely related to Red Fulani than WAS indicated recent Zebu introgression into the genome of Namchi (Doayo). Although the selected Namchi (Doayo) was not different in appearance to the other animals in the region, we cannot exclude whether it has been a product of a recent cross-hybridization with another cattle breed, and thus not representing the pure-breed genome. It is reported that there are still some isolated herds of purebred Namchi (Doayo) cattle in the Poli area, but the present study did not have the tools to screen hybridization levels in the selected animal for whole genome data generation. Such screening would be necessary in the present context where traditional husbandry systems face numerous challenges towards maintaining purely taurine breeds due to rampant cross breeding.
SNPs and Indels functional annotations
The SNPs and Indels were annotated in order to identify the location of the variant in terms of genomic features using snpEFF [28]. In general, all the eight breeds exhibited similar distributions of SNPs and Indels in various genomic annotation categories. Most annotated variants were located in intergenic regions (62%) and introns (27%). The remaining SNPs (11%) were found on downstream genes (4.4%), upstream genes (4.7%), untranslated regions (UTR) (0.5%), missense (0.6%), frameshift (0.02%) and other areas (0.7%) (Fig. 4A).
Breed-specific variants with high impact such as frameshift, splice acceptor, splice donor, start lost and stop gained that may putatively change amino-acids codons are located in and/or close to genes that may lead to functional changes were examined in each chromosome. Overall, 607 genes were identified; 98, 90, 85, 73 and 62 in Red Fulani, Gudali, White Fulani, Kapsiki and Namchi (Doayo), respectively (Additional file 6, Table S1-11). The majority of these genes were widely involved in olfactory receptors, carbohydrate metabolism, transcription regulation, ion binding, nucleotide binding, protein transport, fatty acid metabolism, stress response, regulatory elements, proteolysis and immune responses (Additional file 6, Table S8).
Gene ontology and pathway enrichment analysis of high and moderate impact breed-specific SNPs and Indels variants
Based on the Gene ontology (GO) enrichment analysis of bs-ns SNPs with high and moderated impact, we identified 162 significantly enriched GO terms. The majority of the enriched GO terms were associated with biological processes (n=90, Fig. 4B). “Serine-type endopeptidase inhibitor activity, GO:0004867” terms were shared across all N'Dama, White Fulani and Namchi (Doayo) cattle breeds. “Negative regulation of coagulation, GO:0050819” was shared between the two taurines, Kapsiki and Holstein.
The analysis of GO enrichment from bs-Indels from different cattle breeds identified 50 significantly enriched terms (Fig. 4C), and 41 GO enrichments were associated with biological processes.
The GO terms related to adaptation to the high-altitude environment and heat tolerance were enriched in Namchi (Doayo) and Kapsiki. Also, the GO terms “response to decreased oxygen levels, GO:0036293”, “response to hypoxia, GO:0001666”, “localization of cell, GO:0051674” were enriched in Kapsiki whereas in Namchi (Doayo) the GO terms “cellular response to peptide hormone stimulus, GO:0071375”, “cellular response to peptide, GO:1901653”, “cellular response to stress, GO:0033554” and “cellular response to hormone stimulus, GO:0032870” were evident.
In the African Zebu cattle, GO terms associated with the adaptation to infectious diseases were enriched on immune responses. In Gudali these were the terms “antigen processing and presentation, GO:0019882” and “plasma membrane protein complex, GO:0098797”. In Red Fulani these were the GO terms “acute inflammatory response, GO:0002526”, “inflammatory response, GO:0006954” whereas in White Fulani the GO term “antigen processing and presentation of peptide antigen, GO:0048002” was enriched.
The KEGG pathway analysis identified 31 pathways with at least one SNP in the gene that may explain individual attributes per breed (Fig. 5). Two pathways which carry at least 10 SNPs were discovered in Namchi (Doayo): phagosome and antigen processing and presentation. In Kapsiki: cell adhesion molecules (CAMs) and vascular smooth muscle contraction were found.
Adaptation to tropical climate and high altitude
Adaptation to local environment is multifactorial involving several genes located on different chromosomes and selection [1-3]. To cope with heat, poor feed and high altitude, African cattle have developed behavioral, cellular and physiological mechanism involved in the intensive responses to the mechanical stress, oxygen, food deprivation and homeostasis [29]. During the evolution of Zebu cattle, they have acquired genes for heat-tolerance at the physiological and cellular levels [30]. The superior ability for regulation of body temperature during heat stress is the result of lower metabolic rates as well as increased capacity of heat tolerance. Heat stress also leads to lightening of the coat, because light colored hair coats have a sleek and shiny reflection [30]. However, the lower metabolic rates under heat stress condition are related to reduction in feed intake, milk yield, thyroid hormone secretion, and growth. This finding may explain the lower performance of meat growth in African Zebu cattle as compared to taurine breeds of European descent. Four heat shock factor (HSF) genes (HSF1, HSF2, HSF3, and HSF4) have been isolated in vertebrates, and HSF1, located on chromosome 14, is a master regulator of Heat Shock Protein (HSP70) expression during stress, including heat shock [31]. European taurine Holstein, WAS, WAZ and Indian Zebu Brahman cattle possess distinct patterns of homozygosity and heterozygosity for the SNPs alleles of HSF1 (n= 37 SNPs), HSPA1A (n=22 SNPs), HSPA12B (n= 32 SNPs) and HSPA13 (n=54 SNPs). The heterozygosity alleles in these genes were over represented in WAS and WAZ as compared to Brahman and Holstein. The increased heterozygosity among the African cattle breeds (WAS and WAZ) indicates the combined effects of genetic isolation and long selection history.
Adaptation to tropical pathogens
Stress response, olfactory receptors and immune responses play a critical role in adaptation to the tropical environment and diseases [15, 16]. Mammalian olfactory receptors (ORs) are encoded by the largest mammalian multigene family with more than 1000 genes organized in clusters on 26 cattle chromosomes [32]. They are essential for avoiding danger, food search, reproduction, and behavior [32].
Chemokines play a role in the inflammation that enables the phagocytic leukocytes of the immune system to be the first line of defense against infectious agents like protozoa and helminth parasites [33]. The tolerance of Namchi (Doayo) cattle against trypanosomiasis (trypanotolerance) caused by the protozoan parasites Trypanosoma congolense, T. vivax and T. brucei is actively driven by the innate immune response. IL-12, INF-γ and TNF-α that are primarily produced by cells of the innate immune system would trigger phagocytic cell activation and inflammation, thus contributing to the control of parasites growth [34]. Furthermore, SIGLEC-1 and BOLA are key molecules involved in regulations of the chemokines and cells of innate and adaptive immune responses. Genetic polymorphisms have been linked to resistance and susceptibility to various pathogens. For instance, polymorphisms in BOLA-DRB3 stands for resistance to bovine virus bacteria and parasites infections [35, 36, 40]. Two novel frameshift variants in BoLA-DQB were identified in Namchi (Doayo) and Gudali [Additional file 4, Figure S4]. Such polymorphisms in BoLA class II genes have been associated with viral, bacterial and parasites resistance [35-37]. We found twenty alleles located in BoLA-DQB1 (n=8), BoLA-DQB2 (n=8), BoLA-DQB3 (n=2), BoLA-DQB5 (n=2). Three genotypes were observed as two homozygous (reference and alternative) and one heterozygous. The Namchi (Doayo) cattle carried the highest alternative homozygous alleles in the BoLA-BQB region whereas the Kapsiki possessed the highest heterozygous alleles. IRAK1BP1, sialic acid binding immunoglobuline-like lectin (Siglec), MYO1H and Heat Shock Protein family genes were found carrying mutational SNPs. MYO1H plays roles in cell motility, phagocytosis, and vesicle transport [38] and Siglecs are expressed on various white blood cells of the immune system and are involved in the regulation of innate and adaptive immunity [39]. Studies have shown that many coated sialylated viruses, bacteria and parasites are capable to mimic self-recognition and thus dampen or evade an immune response [39].
We found also one frameshift variant on SIGLEC-1, SIGLEC-11 and SIGLEC-14 genes on Kapsiki, Namchi (Doayo) and White Fulani cattle breeds, respectively.
Breed specific variants with high impact were associated with the quantitative trait loci database CattleQTLdb (http://www.animalgenome.org/cgi-bin/QTLdb/BT/index). In Namchi (Doayo) cattle we found four frameshift (rs448373338, rs721512537, rs724126999, rs518575055) and one stop codon gained (rs208021401) variants, on chr 1 regions associated with “Bovine tuberculosis susceptibility QTL (96157)”, one variant in chr 10 (rs524374275) located in the QTLs region associated with Tick resistance QTL (101167) and two variants (rs716221069 and rs458413320) in the regions associated with Longissimus muscle area QTL (56136) and Marbling score QTL (10919). In contrast one new splice donor and one splice acceptor variant (rs523455261) on chr 5 were found in Gudali cattle, and one New start loss on chr 8 associated with Carcass weight QTL (chr 5:11314779-11314819, chr 8:85937078-85937118) and on chr 11 one variant (rs516544521) was located in the QTLs region associated with ovine tuberculosis susceptibility QTL (96344).
Taken together these findings indicate that both, Gudali and Namchi (Doayo) cattle possess genotypes and phenotypes associated with disease susceptibility/resistance, and meat and carcass production. This is in line with previous findings and therefore the high impact variants found in this study are potential markers for genome-wide association studies (GWAS) and should be further investigated.
The whole genome of five indigenous Cameroonian cattle Namchi (Doayo), Kapsiki, Gudali, White and Red Fulani was re-sequenced and analyzed for the first time, and compared to the reference genomes of European Bos taurus Holstein, African Bos taurus N’Dama and one Asian Zebu Bos indicus Brahman. A number of gene pathways were identified as potential candidates for improved adaptation to drought and growth. Moreover, the co-identification of growth-related Gene Ontology terms in Gudali and Holstein is of economic importance, which may indicate the potential of Gudali cattle for genetic improvement for milk and fertility traits. This will need several decades of selection experiments using the purebred Gudali. Heat tolerance and trypanotolerance traits are complex mechanism involving several gene pathways located on different chromosomes. In the trypanotolerant breeds Namchi (Doayo) we have identified eight potential gene ontology terms, including glucose-related genes involved in the control of trypanosome proliferation in the bloodstream. All these candidate genes constitute a valuable resource for development and genetic amelioration of tropical cattle breeds particular in Africa. Furthermore, the full sequence data widens our knowledge on the value of native breeds as genetic resources for future cattle breeding, and the power of selection signature analyses.
Sampling, library construction and sequencing
The data used for this paper was obtained from the project “Pathogen detection in African cattle Breeds” Abanda et al., [24] and Paguem et al., [25].
One representative individual of each of the five different cattle breeds was selected (Table 2).
Blood samples of 5 ml volume per animal were collected in ethylene diamine tetra acetic acid (EDTA)-coated vacutainers during the routine examination. The blood was centrifuged at 3000 rpm for 15 minutes. Then, genomic DNA was extracted from the buffy coat (cellular layer including leucocytes, erythrocytes and blood-dwelling parasites like Anaplasma bacteria, piroplasmids, microfilariae of Setaria, trypanosomes and Borrelia spp.(see Additional file 7 Table S1 for trypanosome, Anaplasma bacteria, piroplasmids, Onchocerca filarial and gastro intestinal parasites detected on those animals) using the Wizard Genomic DNA Purification Kit (Promega, Germany) according to the manufacturer’s instructions. DNA isolation and concentration was verified by fluorescent methods using Picogreen (Life Technologies). Libraries were generated from 2 µg of genomic DNA per specimen using the Illumina TruSeq DNA PCR-Free Library Prep Kit (Illumina, San Diego, CA, USA) following the manufacturer’s protocol. 2x 150bp paired-end libraries sequencing was conducted on the Illumina HiSeq4000 platform with the manufacturer’s proprietary TruSeq SBS Kit V3-HS.
Short read mapping, variant calling and annotation
The quality of the generated raw Illumina reads was determined using FastQC software (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/). Adaptor read sequences were removed using SeqPurge from ngs-bits4 (https://github.com/imgag/ngs-bits, version 0.1-4-gaed0c94). For comparison with other cattle breeds, whole genome raw sequencing data from NCBI Sequence Read Archive SRA was extracted for the breeds Holstein (SRR934414), N’Dama (SRR3693376) and Brahman (SRR6649996). Paired-end reads from the five samples along with these three controls from the SRA archive were mapped against the reference Bos taurus Hereford breed genome UMD3.1 using BWA-MEM version 0.7.10-r789 [41]. Reads that mapped to a single location in the genome (uniquely mapped reads) were selected, and those with multiple region mapping were excluded using the MarkDuplicates tool of Picard5 v.1.137 (http://broadinstitute.github.io/picard). After sequence alignment, the resulting SAM files format were converted to BAM files using Samtools v.1.3 [41]. Then BAM files were sorted and local realignment of reads was performed to correct misalignment due to the presence of small Indels using Genome Analysis Tool Kit 3.1 (GTAK). SNPs and Indels calling were performed using Freebayes v.0.9.21-19-gc003c1e [42]. SNPs and Indels were annotated using snpEFF [28] and Bcftools [41]. To have many of these processes parallelized and automated, a workflow written in the workflow language Snakemake from QBiC was used which is freely available at Github (https://github.com/qbicsoftware/exomseq).
The variants that were identified in only one cattle breed and have no corresponding entries in the dbSNP database were classified as breed-specific novel variants. The average ratios of homozygous versus heterozygous SNPs were calculated for each breed. This ratio is expected to be 1:2 in a freely mating population; therefore, any departure from this condition such as the presence of admixture in the population will be manifested by an increase in the homozygous/heterozygous ratio [43].
Unmapped read analysis
Reads that were not mapped to the reference genome UMD3.1 were extracted from alignment BAM files and sorted by name using Samtools. The sorted BAM files were given as input to ABySS (version 2.1.5) and assembled using the parameter “k=25” indicating k-mer size = 25 in standard de Bruijn graph mode. Resulting contigs.fa files were subdivided into contigs with a length > 500bp. Then the remaining contigs were searched against Blastn database using Nucleotide-Nucleotide BLAST (version 2.8.1+) with the parameters “-num_alignments 1”and “-num_descriptions 1” to show alignments and descriptions for the top 1 matching database match only. The BLAST output was then parsed using the R language (version 3.4.0) to determine for each sample the species of the BLAST hit, the percent identity, length of match and query, and BLAST e-value. Mean values of these statistics were calculated for each species in each sample.
Gene enrichment and functional analysis
Breed-specific non-synonymous (bs-ns) SNPs, Indels with moderate and high impact in the genome and new variants not found in any publicly available database were extracted from WAS and WAZ using the data repositories Ensembl release 76, dbSNP138, Entrez Gene, NCBI and Uniprot. Gene pathway networks analysis was performed using the R (v3.5.2) package clusterProfiler and the Kyoto Encyclopaedia of Genes and Genome (KEGG) database [44]. The variant carrying genes were functionally characterized based on different gene ontology (GO) terms using clusterProfiler (v3.12) R package(v3.5.2) package. To investigate whether bs-ns SNPs and Indels genes were associated with economic traits, the quantitative trait loci database CattleQTLdb (http://www.animalgenome.org/cgi-bin/QTLdb/BT/index) was screened using an integrated data warehouse of the bovine genome database web server BovineMine v1.4.
Phylogeny of bovine-related species
To understand the genetic relationships between indigenous cattle breeds and other subfamilies of Bovidae, a principal component analysis (PCA) was performed with EIGENSTRAT. For the phylogenetic tree reconstruction, the variant files were converted to FASTA format with Vcf-kit8 (https://vcf-kit.readthedocs.io/en/latest/). Multiple sequence alignment (MSA) was generated using Muscle with default options [45]. Prottest3 [46] was used to find the best substitution model for the MSA, and Raxml was used to generate the Maximum Likelihood (ML) tree with Blossum62 as best substitution model along with Gamma distribution for rate heterogeneity, estimation for proportion of invariable sites and 100 non-parametric bootstrap replicates using Brahman as out group [47]. Visualization of the tree was generated using ape (v5.3) R-package [48].
KEGG: Kyoto Encyclopaedia of Genes and Genome GO: gene ontology PCA: principal component analysis WAS: West African Shorthorn WAZ: West African Zebu bs-ns: Breed-specific non-synonymous HSPA: Heat Shock 70 KDa protein HSF: heat shock factor ORs: olfactory receptors BoLA: Bovine leucocyte antigen SNPs: single nucleotide polymorphism variants InDels: Insertions and Deletions variants Gb: Giga base pairs
Ethics approval and consent to participate
Permission for the study and ethical approval were obtained from the Scientific Directorate of the Institute of Agricultural Research for Development (IRAD) in Cameroon, which is the country’s government research institution for animal health and livestock husbandry improvement. Furthermore, verbal consent was given by the cattle owners and herdsmen.
Consent for publication
Not applicable
Availability of data and materials
All data generated or analyzed during this study are included in this published article and its supplementary information files are available from the corresponding author on reasonable request. The five newly sequenced African cattle genomes in this study are publicly available from GenBank with the Bio project accession number (will be uploaded after acceptance of the manuscript).
Competing interests
The authors declare that they have no competing interests.
Funding
Research grants from the Otto Bayer Foundation (F-2013BS522), International Foundation for Science (IFS); Stockholm, Sweden (B/5864-1) and German Research Foundation (DFG, grant no. RE 1536/2) funded the fieldwork data used for this paper from the project “Pathogen detection in African cattle breeds” [Paguem et al., 2019; Abanda et al.,2019], whereas the full genome sequences of five cattle breeds and bioinformatics analysis was funded by the joint RiSC program of the State Ministry of Science, Research and Arts Baden-Württemberg and the University of Tübingen (PSP-no. 4041002616).
Authors’ Contributions
Conceptualization: AE, MDA, AR, SC. Formal analysis: AP, PB, SC. Investigation: AP, BA, MDA. Project administration: AE, Resources: AE, AP, AR. Supervision: AR, AE, MDA.
Writing, review and editing: AP, BA, MDA, AR, PB, SC, AE. All authors read and approved the final manuscript.
Acknowledgements
The authors are indebted to Drs. Madi Palou Aboubakar and Manchang Kingsley from the Wakwa Centre of the Institute of Agricultural Research for Development, and the research staff of the Programme Onchocercoses field station of the University of Tübingen in Ngaoundéré for logistical support and assistance during the fieldwork, and Dr. Fernanda Ruiz-Fadel from the University of Tübingen for proofreading the manuscript.
Table 1. Summary of sequencing results of the genomes of five Cameroonian cattle breeds including the number of total reads and variants called in million (M) reads.
Breeds |
Mapped Reads |
Total Reads |
Mapping rate (%) |
Coverage [x] |
SNPs |
Indels |
Bs- SNPs |
Hom |
Het |
Het/ Hom |
Namchi |
596.3 |
935.3 |
63.7 |
22.8 |
6.31 |
0.53 |
0.40 |
2.51 |
3.80 |
1.5 |
Kapsiki |
743.7 |
1160.6 |
64.1 |
28.6 |
5.40 |
0.47 |
0.37 |
1.55 |
3.85 |
2.5 |
W. Fulani |
707.6 |
1103.1 |
64.1 |
27.2 |
6.42 |
0.55 |
0.42 |
2.29 |
4.13 |
1.8 |
R. Fulani |
716.3 |
1102.2 |
65.0 |
27.6 |
6.70 |
0.57 |
0.47 |
2.15 |
4.55 |
2.1 |
Gudali |
804.9 |
1271.1 |
63.3 |
30.8 |
6.65 |
0.57 |
0.46 |
2.17 |
4.49 |
2.1 |
N’Dama |
154.5 |
282.1 |
54.8 |
4.7 |
4.26 |
0.35 |
0.22 |
1.53 |
2.73 |
1.8 |
Brahman |
146.4 |
177.0 |
82.7 |
5.1 |
7.31 |
0.60 |
0.76 |
2.96 |
4.36 |
1.5 |
Holstein |
255.7 |
460.6 |
55.5 |
7.6 |
3.05 |
0.26 |
0.33 |
1.19 |
1.87 |
1.6 |
The reference genome breed was Hereford (UMD3.1). Whole genome data of the breeds N’Dama, Brahman and Holstein were retrieved from the NCBI archive SRA [Holstein (SRR934414), N’Dama (SRR3693376) and Brahman (SRR6649996)]. Hom = homozygous, Het= heterozygous, Het/ Hom = heterozygous to homozygousratio, W. Fulani = White Fulani; R. Fulani = Red Fulani.Bs-SNPs= breeds specific SNPs
Table 2. Information of the selected animals of Cameroonian cattle breeds for whole genome re-sequencing.
Breed |
Age [years] |
Sex |
Sampling sites |
GPS Coodinates |
|
LW [kg] |
Subspecies |
|
||
|
||||||||||
Region |
Village |
N |
E |
Altitude |
|
|
|
|||
Namchi |
6 |
male |
Faro |
Herko |
8°30'05.1'' |
13°08'28.7'' |
520m |
252 |
Bos taurus brachyceros |
|
(Doayo) |
|
|||||||||
Kapsiki |
5 |
female |
Mayo-Tsanaga |
Rhumsiki/Kila |
10°27'45.5'' |
13°38'22.9'' |
956m |
252 |
Bos taurus brachyceros |
|
W. Fulani |
5 |
female |
Mayo-Rey |
Bini |
07°37'29.6'' |
14°32'10.1'' |
780m |
240 |
Bos indicus indicus |
|
R. Fulani |
5 |
female |
Mayo-Rey |
Bini |
07°37'29.6'' |
14°32'10.1'' |
780m |
313 |
Bos indicus indicus |
|
Gudali |
7 |
female |
Vina |
Galim |
07°12'2.39'' |
13°34'49.70'' |
1050m |
400 |
Bos indicus indicus |
|
W. Fulani = White Fulani; R. Fulani = Red Fulani. LW: Live weight