Identification of the members in the DGAT family
To identify the DGAT family members , we used 21 verified DGAT amino acid sequences from bovine (Bos taurus, 3), human (Homo sapiens, 9), mouse (Mus musculus, 7) and rat (Rattus norvegicus, 2) as the query for genome-wide detection of the homologous sequences (Additional file 1). As a result, 24 non-redundant protein sequences encoded by eight DGAT genes (DGATs), including DGAT1, DGAT2, DGAT2L1/MOGAT1, DGAT2L3/AWAT1, DGAT2L4/AWAT2, DGAT2L5/MOGAT2, DGAT2L7/MOGAT3, and DGAT2L6 were identified in Bubalus bubalis (Table 1). In parallel, 15 DGAT protein homologous sequences of these eight DGAT genes, were recognized in Bos taurus (Additional file 2).
The length of amino acid sequences of 24 buffalo DGAT protein isoforms ranged from 282 (DGAT2L4) to 600 (DGAT2L6), and their molecular weight was 32.14 – 68.21kDa that correlated well with the protein length. The value of isoelectric points was higher than 8.0, which indicated that buffalo DGAT proteins containing more basic amino acids than acidic amino acids. Moreover, we detected eight DGAT1 protein isoforms (DGAT1 – DGAT1.7) all contained the Membrane-Bound O-acyl Transferase (MBOAT) conserved domain (Additional file 3). The other seven DGATs produced amino acid sequences mainly harbored the Diacylglycerol Acyltransferase (DAGAT) or Lysophosphatidic Acyltransferase (LPLAT) conserved domain. Results of subcellular localization prediction showed buffalo DGAT proteins all located in the endoplasmic reticulum membrane. Since triacylglycerol biosynthesis occurs mainly at the endoplasmic reticulum, DGAT enzymes in the membrane are conducive to the synthesis of catalytic lipids [23].
Structural features of buffalo DGAT family members
In order to explore the structural characteristics of buffalo DGAT proteins and genes, the conserved motifs and gene structures were projected based on their phylogenetic relationships (Fig. 1). As the results have shown, the buffalo DGAT family members initially categorized into two main clades: DGAT1 and DGAT2 subfamily. Among eight buffalo DGAT genes, DGAT1 gene belongs to DGAT1 subfamily, and the other seven DGATs all assigned to DGAT2 subfamily. In DGAT1 subfamily, five conserved domains including Motif 2, 1, 3, 9, 5, 4, and 6 were shared among the majority protein isoforms of DGAT1, DGAT1.1, DGAT1.2, DGAT1.3, DGAT1.4 and DGAT1.6. The prediction of their gene structures was highly similar in the coding areas, which contained 17 coding sequences (CDSs) and 18 introns. While the length and layout of 3’ untranslated region (UTR) and 5’ UTR were various in the non-coding areas. On the other side, for DGAT2 subfamily members, there were seven conserved motifs (Motif 8, 3, 7, 9, 1, 10, and 5) in almost all of their amino acid sequences. Gene structural analysis discovered, although the introns and UTRs structure varied greatly, the import coding sequences were consistent among the nucleotide sequences of all DGAT2 subfamily genes.
The same conserved patterns among DGAT1 and DGAT2 subfamily proteins were Motif 1, Motif 3, Motif 5 and Motif 9, which composited of 29, 29, 41 and 29 amino acids, respectively (Additional file 4). Meanwhile, Motif 2, 4, and 6 were three unique conserved amino acid sequences of DGAT1 subfamily, and DGAT2 subfamily-specific conserved motifs included in Motif 7, 8, and 10. subsequently, to better discern the structural features of buffalo DGATs, we predicted the transmembrane helixes for eight DGAT enzymes (Additional file 5). The topological studies displayed that DGAT1 subfamily protein contained ten transmembrane domains, and DGAT2 subfamily proteins generally have 3 – 5 transmembrane structures. Besides, DGAT1 protein has N-terminus oriented towards the cytosol with the C-terminal region, which accounts for approximately 50% of the protein, and is present in the endoplasmic reticulum lumen.
Phylogenetic relationship of DGAT proteins in different organisms
To assess evolutionary relationships of DGAT proteins in buffalo and other organisms, we conducted a broad phylogenetic analysis of animals (Bubalus bubalis, Bos taurus, Homo sapiens, Mus musculus, Rattus norvegicus, Capra hircus, Ovis aries, Equus caballus, Chlorocebus aethiops, Danio rerio, Xenopus laevis, Xenopus tropicalis), plants (Arabidopsis thaliana, Oryza sativa subsp. Japonica, Glycine max, Corylus americana) and microbes (Dictyostelium discoideum, Umbelopsis ramanniana, Acinetobacter baylyi). Accordingly, 85 amino acid sequences containing DGAT proteins from different organisms and all protein isoforms identified in buffalo and cattle, were aligned to generate nonrooted Neighbor-Joining (NJ) tree (Fig. 2) and confirmed by Maximum Likelihood (ML) tree (Additional file 6). Both phylogenetic analyses revealed similar topologies and evolutionary partitioning of DGAT family proteins into two major clades: DGAT1 and DGAT2. While the DGAT3 in Arabidopsis thaliana and WAX-DGAT in Acinetobacter baylyi constructed an independent branch.
As shown, DGAT1 clade covered DGAT1 in different organisms, but DGAT2 clade consisted of several clusters, including DGAT2, DGAT2L1, DGAT2L3, DGAT2L4, DGAT2L5, DGAT2L6, and DGAT2L7. Moreover, DGAT1 proteins in animals (buffalo, cattle, goat, sheep, horse, mouse, rat, monkey, and human) clustered separately from that of plants (Arabidopsis, rice, soya bean, and Corylus). Similarly, DGAT2 proteins from animals, plants, and microbes bunched, respectively. Except for DGAT2, other seven DGAT2 family members only existed vertebrate taxa, rather than in an invertebrate. Comparing each member either in DGAT1 or DGAT2 subfamily, the evolutionary relationship between buffalo and cattle was particular closer than buffalo with any other organisms.
Chromosomal distribution and collinearity analysis of DGAT genes
Based on genes mapping information of buffalo chromosome (BBU) and cattle chromosome (BTA), eight buffalo DGAT genes distribute on five chromosomes including BBU2 (1), BBU15 (1), BBU16 (2), BBU24 (1) and BBUX (3), and cattle DGATs located on BTA2 (1), BTA14 (1), BTA15 (2), BTA25 (1) and BTAX (3) (Fig. 3a). Among them, buffalo DGAT2L1 has a similar position with cattle DGAT2L1 on the second chromosome at 163.89 – 163.93 Mb and 110.84 – 110.88 Mb. Whereas the location of DGAT1 was different between two species, at the end region (81.68 – 81.69 Mb) of BBU15 in buffalo, but at the start region (0.60 – 0.61 Mb) of BTA14 in cattle. DGAT2 and DGAT2L5 were tandem genes in buffalo at the location of 29.66 – 29.69 Mb and 29.72 – 29.92 Mb on BBU16. In contrast, positions of these two genes exchanged in cattle on BTA15, that DGAT2L5 and DGAT2 located at 54.98 – 55.12 Mb and 55.16 – 55.19 Mb. What is more, DGAT2L3, DGAT2L4, and DGAT2L6 were three X-chromosome linear genes, which located within 80.20 – 80.40 Mb region for buffalo, and at the region of 61.78 – 61.98 Mb for cattle.
Collinearity analysis of the genome between buffalo and cattle resulted in the identification of 45,021 pairs of collinear genes (Fig. 3b). The syntenic genes blocks almost covered all the chromosomes between buffalo and cattle. Although the number of chromosomes is different between river buffalo (2N = 50) and cattle (2N = 60), a large chromosome homologous existed between two species. BBU1 appears to be a fusion of BTA1 and BTA27, BBU2 equals to BTA2 and BTA 23, BBU3 equals BTA8 and BTA 19, BBU4 equals BTA5 and BTA 28, and BBU5 equals BTA16 and BTA 29 references to the state of collinear banding. Furthermore, the chromosomes where DGAT genes located in, BBU2, BBU15, BBU16, BBU24, and BBUX have one to one match to BBU2, BTA14, BTA15, BTA25, and BTAX, respectively. Among the DGAT family genes, DGAT2L1 was collinear between buffalo and cattle, the other seven pairs of DGAT genes were syntenic with each other.
Haplotype association analyses for buffalo milk production traits
To determine whether there are any DGAT family genes are genetically associated with milk production traits in buffalo, we employed the genotypic and phenotypic datasets of 489 buffalos with 1,424 lactation records reported in our previous study [4]. Based on the genotyping data after quality control, single nucleotide polymorphisms (SNPs) in the 0.5 Mb genomic region upstream and downstream of each DGAT were filtered and used in the present study [24]. As the results, there were 20, 23, 19, and 23 SNPs identified within the DGAT1, DGAT2 and DGAT2L5, DGAT2L7, DGAT2L3, DGAT2L6, and DGAT2L4 (DGAT2Ln) genomic windows, respectively. The linkage disequilibrium (LD) relationships among SNPs were determined to construct haplotype blocks (Figure 4). Therefore, two haplotype blocks (D1.1 and D1.2) recognized in DGAT1 genomic region. The first one spanning 13 Kb consisted of 2 SNPs and the second block with 6 SNPs spanned 152 Kb. In the tandem region of DGAT2 and DGAT2L5, we detected two blocks (D2.1 and D2.2) with a length of 191 Kb (4 SNPs) and 170 Kb (8 SNPs) as well. Within the DGAT2L7 genomic region window, only one haplotype block (D3.1) constructed among three SNPs (45 Kb). For the three X-chromosome linear genes DGAT2L3, DGAT2L6, and DGAT2L4, three haplotype blocks including D4.1, D4.2 and D4.3 built in the combined genomic region. Interestingly, the second block (D4.2) constituent SNP (Affx-79571165) embraced inside of DGAT2L6.
For each haplotype block detected in the DGATs extended regions, we performed haplotype association analyses with six buffalo milk production traits including 270-day adjusted peak milk yield (PM270), total milk yield (MY270), fat yield (FY270), fat percentage (FP270), protein yield (PY270), and protein percentage (PP270) (Table 2). Thus, the D1.2 haplotype block in DGAT1 genomic region displayed to have significant effects on the FP270 and PP270, and the least square mean (LSM) of FP270 (8.02 ± 0.1%) and PP270 (4.51 ± 0.03 %) for diplotype H2H2 were lowerst among all haplotype combinations. The D2.1 haplotype block next to DGAT2 and DGAT2L5 were identified to influence the peak milk yield, total milk yield, protein yield, and protein percentage. For this block, buffalos with H2H3 diplotype showed the lower LSM of milk yield (2705.0 ± 66.4 Kg) and protein yield (128.0 ± 2.9 Kg) but had the higher protein percentage (8.46 ± 0.12%) compared to that of some other diplotypes. For D2.2 block, buffalo individuals had H2H2 diplotypes with the high total milk yield (3080.0 ± 68.3 Kg), fat yield (256.0 ± 6.3 Kg) and protein yield (144.0 ± 3.0 Kg). Within the genomic window of DGAT2L7, the D3.1 haplotype was associated with all of the six milk production traits in buffalo. Moreover, the H3H3 haplotype combination could be the favorable diplotype in the dairy buffalo breeding program, which showed the high peak milk yield (16.6 ± 0.64 Kg), total yield (3248.0 ± 124.0 Kg), fat yield (300.0 ± 11.5 Kg) and protein yield (157.0 ± 5.5 Kg) and had the high milk fat percentage (9.25 ± 0.22%) and protein percentage (4.84 ± 0.06%). Three adjacent haplotype blocks on chromosome X were detected to influence different milk production traits: D4.1 mainly had effect on MY270, FY270 and PP270, and its H1H1 diplotype showed high total milk yield (3138.0 ± 146.0 Kg) and fat yield (260.0 ± 13.2 Kg); D4.2 had effect on FY270, FP270, PY270 and PP270, but its H1H1 diplotype showed low fat yield (220.0 ± 8.8 Kg), fat percentage (7.94 ± 0.17%), protein yield (128.0 ± 4.3 Kg), protein percentage (4.58 ± 0.05%); D4.3 only had some effects on PP270.
Candidate makers affecting buffalo milk production traits
With the single marker and single trait association between SNPs and buffalo milk production traits, we identified the most significant SNP in each objective region (0.5 Mb upstream and downstream of each DGATs) (Fig. 5). Accordingly, a total of 20 SNPs was identified to be associated with different milk production traits at the level of P-value < 10-10. As the most significant SNP in the DGAT1 genomic region, Affx-79549398 presented to have impacts on buffalo milk protein percentage and fat percentage. The other three SNPs, Affx-79540124 (DGAT2 or DGAT2L5 genomic region), Affx-79591356 (DGAT2L7 genomic region) and Affx-79564544 (DGAT2Ln genomic region), all were associated with milk fat yield of buffalo.
For each of the four SNPs, we further calculated the least square mean of the three genotypes affecting the trait, in order to investigate their genetic contribution. At the locus of Affx-79549398 affecting PP270, buffalo individuals with AA (4.90 ± 0.05%) genotype have the significant higher protein percentage than that of AG (4.76 ± 0.04%) and GG (4.67 ± 0.04%) genotypes. Hence, buffalo with AA genotype can be selected to improve the protein percentage in milk effectively. In the studied buffalo population, individuals with AA genotyped only occupied 5% of the total population. The low frequency of the favorable genotype provides more opportunity for improvement in the whole buffalo population. For the Affx-79540124, the 270-day milk fat yield produced by buffalos with TT (250 ± 2.41 Kg) and TG (228 ± 3.30 Kg) were both significantly higher than that of GG genotype (247 ± 1.98 Kg). At the Affx-79591356 locus, individuals with CC genotype (252 ± 2.87 Kg) had a significant higher fat yield in buffalo milk compared with that of TC genotypes (241 ± 2.16 Kg). Conclusively, AX-85063131 affecting FY270 showed that the individuals with TT homozygous genotype showed the highest milk fat yield (255 ± 3.02 Kg) among all three genotypes. These results indicated that G allele at Affx-79540124 locus and T allele at Affx-79564544 seem to be the favorable alleles used to improve the milk fat yield in the buffalo breeding program.