Preliminary analysis of single-clone D-loop sequences
Sanger sequencing was performed on 10 randomly selected single clones of the D-loop PCR product. Sequences were aligned with ClustalW in MEGA 6.0. Unexpectedly, the clone sequences were not 100% identical to one another. Inspection of the sequencing chromatograms in Chromas showed clean, well-resolved peaks in every sequence, with no apparent heterozygous peaks (Fig. 1). This suggests that the variation in D-loop sequences among the clones is unlikely to have arisen during the sequencing procedure.
Sequencing of a low-copy gene and of D-loop PCR products
Because a series of experimental manipulations, such as DNA extraction, PCR amplification, ligation and transformation, was carried out before sequencing, the clone sequence mutations might have been introduced artificially during the experimental procedure. Two additional control experiments were therefore carried out: clones of the nuclear single-copy gene α1-globin from yak were sequenced, and the D-loop PCR fragments were sequenced directly. The yak α1-globin clones were prepared following the same procedure as those of the D-loop, and PCR was conducted following Yuan et al. [26]. Sixteen clones were successfully sequenced, with a target length of 706 bp. Alignment showed that 12 clones shared one haplotype, while the remaining 4 shared a second haplotype. The first haplotype was 100% identical to the α1-globin sequence described by Yuan [26], and a BLAST (Basic Local Alignment Search Tool) search against NCBI GenBank yielded 100% query coverage and identity with the chromosome 5 sequence of wild yak (CP027093.1). The second haplotype differed from the first at three sites: the variants at 254 bp and 316 bp were consistent with the α2-globin sites identified by Yuan [26], while the variant at 263 bp is a synonymous mutation. The D-loop PCR products were then sequenced directly. Although base calling produced D-loop sequences, the chromatograms showed high noise. In particular, distinct heterozygous peaks were detected along the sequences, strongly suggesting heterogeneity in the D-loop PCR product. In the randomly selected 125-255 bp region, the chromatogram revealed three apparent heterozygous peaks, at 133 bp (C/T), 146 bp (G/A) and 171 bp (G/A) (Fig. 2). Alignment of the clone sequences revealed about 10 mutations in the same 125-255 bp region.
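The heterozygous-peak check described above can be sketched in Python. This is a minimal illustration that assumes per-position peak heights for the four bases have already been extracted from the trace files; the function name `flag_double_peaks` and the 0.3 secondary-to-primary peak ratio are hypothetical choices, not parameters used in this study.

```python
def flag_double_peaks(peaks, min_ratio=0.3):
    """Flag positions whose second-highest trace peak is at least
    `min_ratio` times the highest one (a possible heterozygous peak).

    peaks: {position: {"A": height, "C": height, "G": height, "T": height}}
    min_ratio: hypothetical cut-off, not a value taken from the study.
    Returns a list of (position, primary_base, secondary_base).
    """
    flagged = []
    for pos in sorted(peaks):
        # Rank the four bases by peak height, highest first.
        ranked = sorted(peaks[pos].items(), key=lambda kv: kv[1], reverse=True)
        (base1, h1), (base2, h2) = ranked[0], ranked[1]
        if h1 > 0 and h2 / h1 >= min_ratio:
            flagged.append((pos, base1, base2))
    return flagged


# Toy peak heights, invented for illustration only.
example = {
    133: {"A": 20, "C": 900, "G": 15, "T": 600},
    146: {"A": 500, "C": 10, "G": 800, "T": 30},
    150: {"A": 1000, "C": 40, "G": 30, "T": 20},
}
print(flag_double_peaks(example))  # [(133, 'C', 'T'), (146, 'G', 'A')]
```

A ratio-based rule is only a screening aid; any flagged site would still need visual confirmation in the chromatogram, as was done here.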
Most mutations were detected in only one sequenced clone, except those at 133 bp and 171 bp, whose alternative bases were detected in more than one third of the clones. Notably, the alternative bases at 133 bp and 171 bp in the clone sequences were the same as the two heterozygous bases at 133 bp and 171 bp in the D-loop PCR product sequence (Fig. 1). Although a heterozygous peak was apparent at 146 bp in the PCR product sequence, no mutation at this site was detected in any clone. This indicates that most high-frequency mutations in the clone sequences can also be detected as heterozygous peaks in the PCR product sequences. The stable sequences of the single-copy gene, together with the high-frequency mutations detected in both the clone and PCR product sequences, strongly suggest that the highly variable D-loop clone sequences were obtained by a reliable procedure rather than arising as experimental artifacts.
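The comparison between high-frequency clone mutations and PCR-product double peaks can be illustrated with a short sketch. It assumes the clone sequences are already aligned to equal length and that double-peak sites have been recorded as sets of two bases; the function names are hypothetical, and the one-third threshold mirrors the frequency mentioned above rather than a formal cut-off.

```python
from collections import Counter


def frequent_clone_variants(clones, min_freq=1 / 3):
    """Return {position: (major_base, alternative_base, alt_frequency)} for
    alignment columns whose most common alternative base occurs in more than
    `min_freq` of the clones. Positions are 1-based alignment columns."""
    n = len(clones)
    result = {}
    for i, column in enumerate(zip(*clones), start=1):
        counts = Counter(column).most_common()
        if len(counts) < 2:
            continue  # invariant column
        (major, _), (alt, k) = counts[0], counts[1]
        if k / n > min_freq:
            result[i] = (major, alt, k / n)
    return result


def compare_with_pcr(clone_sites, pcr_sites):
    """pcr_sites: {position: {base1, base2}} read from double peaks.
    Returns (positions supported by both, positions seen only as double peaks)."""
    both = sorted(
        p for p, (major, alt, _) in clone_sites.items()
        if pcr_sites.get(p) == {major, alt}
    )
    pcr_only = sorted(p for p in pcr_sites if p not in clone_sites)
    return both, pcr_only


# Toy data: five aligned clones and two PCR double-peak sites (invented).
clones = ["ACGGT", "ACGGT", "ACGGT", "ATGGT", "ATAGT"]
sites = frequent_clone_variants(clones)
print(sites)  # {2: ('C', 'T', 0.4)}
print(compare_with_pcr(sites, {2: {"C", "T"}, 4: {"G", "A"}}))  # ([2], [4])
```

In this toy case position 2 behaves like 133 bp and 171 bp (seen in both data types), while position 4 behaves like 146 bp (a double peak with no matching clone mutation).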
Analysis of the D-loop sequences of clones from individual yaks
In total, 25 clones from an individual yak (Y1) were successfully sequenced. To ensure sequence accuracy, the middle region of the D-loop (about 607 bp) was extracted for further analysis. The extracted clone sequences ranged from 605 bp to 607 bp in length (mean 606.4 bp), with T, C, A and G contents of 31.2%, 15.8%, 27.9% and 25.0%, respectively. The alignment contained 33 parsimony-informative sites and 30 singleton sites. The mean distance between sequences was 99.99%. The 25 clone sequences were grouped into 22 variants by sequence identity, designated Y1-V1 to Y1-V22. A BLAST search against NCBI GenBank showed 98.5% to 100% identity with putative yak D-loop sequences. One variant (Y1-V9), shared by 3 clones, was 100% identical to GenBank accession DQ139035.1. Neighbor-joining phylogenetic analysis clearly divided all 22 variants into two clusters with a bootstrap value of 100. Cluster 1 included 13 variants from 15 clones, and Cluster 2 included 9 variants from 10 clones (Fig. 3).
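The base composition and site counts reported above follow standard definitions: a parsimony-informative site has at least two different bases that each occur in at least two sequences, whereas a singleton site is variable but has at most one base occurring more than once. The sketch below illustrates these definitions. It assumes a pre-aligned set of sequences, skips any column containing a gap or ambiguous base (similar to a complete-deletion option), and pools bases across sequences for composition, so its numbers may differ slightly from MEGA's output; the function names are hypothetical.

```python
from collections import Counter


def base_composition(seqs):
    """Percentage of T, C, A and G, pooled over all sequences (gaps ignored)."""
    counts = Counter(b for s in seqs for b in s.upper() if b in "TCAG")
    total = sum(counts.values())
    return {b: round(100 * counts[b] / total, 1) for b in "TCAG"}


def classify_sites(aligned):
    """Return (parsimony_informative, singleton) site counts.

    Columns containing a gap or any non-ACGT character are skipped."""
    informative = singletons = 0
    for column in zip(*(s.upper() for s in aligned)):
        if any(b not in "ACGT" for b in column):
            continue
        counts = Counter(column)
        if len(counts) < 2:
            continue  # invariant site
        # Informative: at least two bases that each occur two or more times.
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            informative += 1
        else:
            singletons += 1
    return informative, singletons


toy = ["ACGTA", "ACGTT", "ATGCA", "ATGCA"]
print(classify_sites(toy))  # (2, 1)
print(base_composition(["TTCA", "GGAT"]))  # {'T': 37.5, 'C': 12.5, 'A': 25.0, 'G': 25.0}
```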
To validate the D-loop variation, we sequenced D-loop clones from a second individual yak (Y2) and obtained 38 clone sequences in total. All sequences had base compositions similar to those of Y1. The variable sites comprised 26 parsimony-informative sites and 45 singleton sites. The 38 clones were grouped into 27 variants by sequence identity, designated Y2-V1 to Y2-V27. One variant (Y2-V14), shared by 11 clones, showed 100% query coverage and identity with 15 D-loop accessions in NCBI GenBank. Phylogenetic analysis revealed two distinct clusters: Cluster 1 contained 9 variants from 9 clones, and Cluster 2 contained 18 variants from 28 clones (Fig. 4).
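Grouping clones into variants by sequence identity amounts to collapsing identical aligned sequences. A minimal sketch follows; the function `collapse_to_variants` is hypothetical, and numbering variants in order of first appearance is an assumption, since the numbering scheme used for Y1 and Y2 is not stated.

```python
def collapse_to_variants(clones, prefix):
    """Group clones with identical aligned sequences into variants.

    clones: {clone_id: sequence}, in sequencing order.
    Returns (members, sequences): {variant: [clone_ids]} and {variant: sequence}.
    Variants are numbered by first appearance (an assumed convention)."""
    by_seq = {}
    for clone_id, seq in clones.items():
        by_seq.setdefault(seq.upper(), []).append(clone_id)
    members, sequences = {}, {}
    for i, (seq, ids) in enumerate(by_seq.items(), start=1):
        name = f"{prefix}-V{i}"
        members[name] = ids
        sequences[name] = seq
    return members, sequences


clones = {"c1": "ACGT", "c2": "ACGA", "c3": "ACGT", "c4": "TCGT"}
members, seqs = collapse_to_variants(clones, "Y2")
print(members)  # {'Y2-V1': ['c1', 'c3'], 'Y2-V2': ['c2'], 'Y2-V3': ['c4']}
# Most widely shared variant (cf. Y2-V14, shared by 11 clones):
print(max(members, key=lambda v: len(members[v])))  # Y2-V1
```

Exact string equality is deliberately strict: a single base difference anywhere in the extracted region yields a separate variant.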
Analysis of D-loop sequences from GenBank
Seventeen D-loop sequences from domesticated yaks and 10 from wild yaks were randomly downloaded from NCBI GenBank. The same region as that extracted from the clone sequences of Y1 and Y2 was used for phylogenetic analysis. The 17 sequences from domesticated yaks comprised 8 D-loop variants, which fell into two distinct clusters of 6 and 2 variants (Fig. 5). The 10 sequences from wild yaks also comprised 8 variants, which likewise fell into two distinct clusters of 3 and 5 variants (Fig. 6).
Analysis of the combined D-loop sequences
The downloaded D-loop sequences originated from different yak individuals and were reported to have been obtained by direct sequencing of PCR products. Such distinct D-loop sequences are conventionally regarded as different mitochondrial haplotypes. Both the D-loop clone variants and the downloaded D-loop haplotypes clustered distinctly into two clades. We expected that each haplotype obtained by PCR product sequencing would represent a single D-loop variant, owing to biased amplification. Clustering was then performed on the combined D-loop sequences, including both clone variants and haplotypes. All sequences were distinctly divided into two clusters. Cluster 1 included 13 variants of Y1, 9 variants of Y2, 6 haplotypes of domesticated yaks and 3 haplotypes of wild yaks; Cluster 2 included 9 variants of Y1, 18 variants of Y2, 2 haplotypes of domesticated yaks and 5 haplotypes of wild yaks (Fig. 7). As expected, each variant and haplotype fell into the same cluster as in the separate analyses (Figs. 3, 4, 5 and 6). The clustering also revealed one identical sequence shared by variants Y1-V9 and Y2-V24 and haplotypes KY807492.1 and FJ548841.1.
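Neighbor-joining, the method used to build the phylogenetic trees, can be sketched as follows. This illustration assumes uncorrected p-distances (gapped columns skipped pairwise) rather than whatever substitution model was selected in MEGA, omits branch lengths and bootstrap resampling, and breaks ties alphabetically; all function names are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(named_seqs):
    """{name: sequence} -> symmetric {name: {other: p_distance}}."""
    return {
        a: {b: p_distance(sa, sb) for b, sb in named_seqs.items() if b != a}
        for a, sa in named_seqs.items()
    }


def neighbor_joining(dist):
    """Return an unrooted NJ topology as a Newick string (no branch lengths)."""
    d = {a: dict(row) for a, row in dist.items()}  # work on a copy
    while len(d) > 3:
        n = len(d)
        r = {a: sum(d[a].values()) for a in d}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        i, j = min(
            combinations(sorted(d), 2),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )
        new = f"({i},{j})"
        # Distance from the new internal node to every remaining node.
        row = {k: (d[i][k] + d[j][k] - d[i][j]) / 2 for k in d if k not in (i, j)}
        del d[i], d[j]
        for k in d:
            del d[k][i], d[k][j]
            d[k][new] = row[k]
        d[new] = row
    # The last three nodes meet at a single central node.
    return "(" + ",".join(sorted(d)) + ");"


seqs = {"A": "AAAAAAAA", "B": "AAAAAAAT", "C": "TTTTAAAA", "D": "TTTTAAAT"}
print(neighbor_joining(distance_matrix(seqs)))  # ((A,B),C,D);
```

The output is an unrooted topology, so `((A,B),C,D);` describes the same tree as `((A,B),(C,D))`: two clusters, as with the D-loop variants.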
Analysis of consistent differential sites between Clusters 1 and 2 of the D-loop sequences
Consistent differential sites were identified by aligning, in MEGA 6, the 22 Cluster 1 variants with the 27 Cluster 2 variants from the clone sequences of individuals Y1 and Y2. Thirteen sites differed consistently between Clusters 1 and 2 (Fig. 8), all located within the 280-423 bp region (Table 1). In the downloaded sequences from domesticated yaks, these 13 sites were highly conserved in both clusters, with single nucleotide polymorphisms (SNPs) at 281 bp and 423 bp in Cluster 1, and an insertion/deletion (indel) polymorphism at 338 bp and a distinct substitution at 422 bp in Cluster 2 (Table 1). In the wild yaks, the 13 differential sites in Cluster 1 were completely identical to those in the clone sequences, and most differential sites were conserved in Cluster 2, with indels at 280 bp, 316 bp, 338 bp and 422 bp and G instead of A at 363 bp (Table 1).
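Consistent differential sites, as used here, are alignment columns that are invariant within each cluster but carry different bases in the two clusters. A minimal sketch follows, assuming both clusters are drawn from the same alignment (equal lengths, 1-based column numbering); the function names are hypothetical. The second helper summarises the states observed in a reference set in the "X/Y" style of Table 1, listing a gap last.

```python
def consistent_differential_sites(cluster1, cluster2):
    """Return [(position, base_in_cluster1, base_in_cluster2)] for alignment
    columns that are invariant within each cluster but differ between them.
    A gap ('-') is treated like a base. Positions are 1-based."""
    sites = []
    columns1 = zip(*cluster1)
    columns2 = zip(*cluster2)
    for pos, (col1, col2) in enumerate(zip(columns1, columns2), start=1):
        bases1, bases2 = set(col1), set(col2)
        if len(bases1) == 1 and len(bases2) == 1 and bases1 != bases2:
            sites.append((pos, bases1.pop(), bases2.pop()))
    return sites


def states_at(seqs, positions):
    """Observed states at each 1-based position, e.g. {281: 'C/T'}; gaps sort last."""
    return {
        p: "/".join(sorted({s[p - 1] for s in seqs}, key=lambda b: (b == "-", b)))
        for p in positions
    }


cluster1 = ["ACGTA", "ACGTT"]
cluster2 = ["GCATA", "GCATT"]
print(consistent_differential_sites(cluster1, cluster2))  # [(1, 'A', 'G'), (3, 'G', 'A')]
print(states_at(["ACT", "GC-"], [1, 3]))  # {1: 'A/G', 3: 'T/-'}
```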
Table 1 Consistent differential sites between Cluster 1 and Cluster 2 of the D-loop sequences.
| Position (bp) | Cluster 1, clones | Cluster 1, domesticated yaks (GenBank) | Cluster 1, wild yaks (GenBank) | Cluster 2, clones | Cluster 2, domesticated yaks (GenBank) | Cluster 2, wild yaks (GenBank) |
|---|---|---|---|---|---|---|
| 280 | C | C | C | T | T | T/- |
| 281 | C | C/T | C | T | T | T |
| 316 | G | G | G | A | A | A/- |
| 338 | G | G | G | A | A/- | A/- |
| 346 | A | A | A | G | G | G |
| 355 | T | T | T | C | C | C |
| 362 | A | A | A | G | G | G |
| 363 | G | G | G | A | A | G |
| 379 | T | T | T | C | C | C |
| 396 | T | T | T | C | C | C |
| 416 | C | C | C | T | T | T |
| 422 | G | G | G | A | G | A/- |
| 423 | A | A/G | A | G | G | G |

X/Y, both states observed among the sequences; -, deletion.
|
Prediction of recombination between D-loop variants
Recombination between D-loop variants was predicted with RDP4 [27]. Two recombination events were identified among the D-loop variants of individual Y1, involving Y1-V10 and Y1-V11 in Event 1 and Y1-V15 and Y1-V16 in Event 2. Y1-V10 was predicted by two methods to be a recombinant between Y1-V6 and an unknown parent, and Y1-V16 was predicted by three methods to be a recombinant between Y1-V18 and Y1-V6. One recombination event, involving Y2-V15, Y2-V16 and Y2-V17, was identified in individual Y2, in which Y2-V17 was predicted by three methods to be a recombinant between Y2-V1 and an unknown parent. Combined analysis of all variants from both individuals revealed two recombination events, involving Y1-V15, Y1-V16 and Y1-V18 in Event 1 and Y1-V10, Y1-V11, Y2-V19, Y2-V20 and Y2-V21 in Event 2. Y1-V16 was predicted by five methods to be a recombinant between Y2-V18 and an unknown parent, whereas Y1-V10 was predicted by four methods to be a recombinant between Y1-V6 and Y2-V18 (Table 2).
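RDP4 combines several statistical detection methods that are not reproduced here. As a much simpler illustration of the underlying idea, the sketch below scans a putative recombinant against two putative parents in sliding windows and reports which parent each window resembles more closely; a switch along the sequence hints at a breakpoint. The window size, step and function names are hypothetical, and this similarity scan gives no significance test, so it is not a substitute for the RDP4 methods listed in Table 2.

```python
def window_identity(query, parent, start, end):
    """Fraction of identical bases in query[start:end] vs parent[start:end],
    skipping gapped columns; NaN if the window has no comparable columns."""
    pairs = [
        (q, p) for q, p in zip(query[start:end], parent[start:end])
        if q != "-" and p != "-"
    ]
    if not pairs:
        return float("nan")
    return sum(q == p for q, p in pairs) / len(pairs)


def similarity_scan(query, parent_a, parent_b, window=100, step=20):
    """Slide a window along aligned sequences and report, for each window,
    (1-based start, identity to A, identity to B, closer parent)."""
    rows = []
    for start in range(0, len(query) - window + 1, step):
        ia = window_identity(query, parent_a, start, start + window)
        ib = window_identity(query, parent_b, start, start + window)
        closer = "A" if ia > ib else "B" if ib > ia else "tie"
        rows.append((start + 1, ia, ib, closer))
    return rows


def candidate_breakpoints(rows):
    """Window starts where the closer parent switches between A and B."""
    return [
        rows[k][0]
        for k in range(1, len(rows))
        if {rows[k - 1][3], rows[k][3]} == {"A", "B"}
    ]


# Toy example: the query matches parent A in its first half and B in its second.
parent_a = "A" * 20
parent_b = "C" * 20
query = "A" * 10 + "C" * 10
rows = similarity_scan(query, parent_a, parent_b, window=5, step=5)
print([r[3] for r in rows])  # ['A', 'A', 'B', 'B']
print(candidate_breakpoints(rows))  # [11]
```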
In total, 11 D-loop variants were involved in recombination events detected in the separate or combined analyses of Y1 and Y2. Re-examining the positions of these variants in the phylogenetic trees showed that most recombination events involved variants that are clearly divergent from the others in the same clade (Figs. 3, 4 and 7). This suggests that D-loop sequence differentiation may be promoted by recombination between variants, particularly between variants belonging to different clusters.
Table 2 Prediction of recombination between variants of clones in Y1 and Y2.
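| Analysis | Y1 | Y1 | Y2 | Y1+Y2 | Y1+Y2 |
|---|---|---|---|---|---|
| Event number | 1 | 2 | 1 | 1 | 2 |
| Found in (number of methods) | 2 | 3 | 3 | 5 | 4 |
| Recombinant | Y1-V10 | Y1-V16 | Y2-V17 | Y1-V16 | Y1-V10 |
| Major parent | Y1-V6 | Y1-V18 | Y2-V1 | Y2-V18 | Y1-V6 |
| Minor parent | unknown | Y1-V6 | unknown | unknown | Y2-V18 |
| Detection method R | - | - | - | - | - |
| Detection method G | - | - | - | + | + |
| Detection method B | - | - | - | + | - |
| Detection method M | + | + | + | + | + |
| Detection method C | - | + | + | + | + |
| Detection method S | - | + | - | - | - |
| Detection method T | + | - | + | + | + |

Detection methods in RDP4: R, RDP; G, GENECONV; B, BootScan; M, MaxChi; C, Chimaera; S, SiScan; T, 3Seq. +, event detected by the method; -, not detected.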