Alterations in the genome sequences of the newly synthesized autotetraploids
Four individual autotetraploid C. nankingense plants were used in the present study. Ten AFLP primer combinations amplified 422 fragments from diploid C. nankingense. Autopolyploidization was expected to have no effect on the AFLP genotype of the autotetraploid compared with the diploid; however, the AFLP profiles revealed clear fragment variation in the autotetraploid lines. For example, the AFLP profile of autotetraploid line 1 (T1) consisted of 415 fragments, of which 6 were novel, while 13 fragments present in the diploid were not detected following autopolyploidization (Fig. 1). Similarly, the profiles of autotetraploid lines 2, 3 and 4 consisted of 418, 415 and 416 fragments, of which 3, 5 and 3 fragments were absent from the diploid, while 7, 12 and 9 diploid fragments were lost, respectively.
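The fragment counts reported above are internally consistent, which a short tally can confirm. The sketch below is illustrative only: the data layout and function names are assumptions, while the numbers are taken from the text.

```python
# Fragment counts as reported in the text; the data layout is an illustrative assumption.
DIPLOID_FRAGMENTS = 422

# line -> (fragments scored, novel fragments, diploid fragments lost)
AFLP_COUNTS = {
    "T1": (415, 6, 13),
    "T2": (418, 3, 7),
    "T3": (415, 5, 12),
    "T4": (416, 3, 9),
}


def expected_total(diploid: int, novel: int, lost: int) -> int:
    """Fragments expected in a tetraploid profile: diploid set minus losses plus gains."""
    return diploid - lost + novel


def check_counts() -> dict:
    """Return, per line, whether the reported total equals diploid - lost + novel."""
    return {
        line: expected_total(DIPLOID_FRAGMENTS, novel, lost) == total
        for line, (total, novel, lost) in AFLP_COUNTS.items()
    }


# Novel fragments summed over the four autotetraploid lines.
total_novel = sum(novel for _, novel, _ in AFLP_COUNTS.values())

if __name__ == "__main__":
    print(check_counts())
    print("novel fragments across all lines:", total_novel)
```

Each reported total equals 422 minus the lost fragments plus the novel fragments for that line.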
To obtain additional information, the 17 novel fragments specific to the autotetraploid profiles were recovered and sequenced. Seven of these fragments yielded poor sequence quality, and only ten were successfully sequenced. We compared these ten sequences with known genes in the NCBI database for general similarity and the presence of specific motifs. The BLAST search revealed that only one fragment, from autotetraploid line T1, showed high similarity to a known rice gene (E-value = 6e-34) annotated as an LTR retrotransposon (Ty1-copia type).
Cloning and analysis of the RT region of LTR retrotransposons
In LTR retrotransposons, the reverse transcriptase (RT) domain, which encodes the enzyme responsible for generating a DNA copy from the genomic RNA template, is the major regulatory gene of the transposition cycle. Consequently, the replication of many transposons was first characterized through PCR amplification of the RT domain, and these fragments were subsequently shown to transpose. Reverse transcription is error-prone, which gives the RT domain a vast potential for variability; this variability has also played a role in the evolution of the control of retrotransposon activity (e.g., transcription termination). Using two pairs of degenerate primers, the conserved fragments of the RT domains of Ty1-copia and Ty3-gypsy elements were PCR-amplified. After differential screening, sequencing and removal of redundant sequences, a total of 76 Ty1-copia RT-domain sequences (abbreviated as Ty1RT) and 50 Ty3-gypsy RT-domain sequences (abbreviated as Ty3RT) were obtained from 232 positive clones.
The length of the Ty1RT sequences ranged from 216 to 401 bp. Sequences 263 bp in length were the most common, accounting for 48.7% of the observed sequences, followed by sequences 266 bp in length, which were about one sixth as frequent (8.0%). The length of the Ty3RT sequences ranged from 421 to 433 bp, and the length polymorphism was markedly lower than in Ty1RT, with the vast majority of sequences (42/50, 84.0%) being 432 bp in length, without considering nucleotide polymorphisms. This analysis also revealed that all nucleotide sequences were AT-rich, with AT/GC ratios of 1.43 and 1.46 for Ty1RT and Ty3RT, respectively.
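An AT/GC ratio of the kind quoted above can be computed directly from the nucleotide sequences. The following is a minimal sketch; the text does not state whether the ratio was calculated per sequence or pooled over all sequences, so both variants are shown, and the function names and example sequences are hypothetical.

```python
def at_gc_ratio(seq: str) -> float:
    """Ratio of A+T to G+C bases in one nucleotide sequence."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if gc == 0:
        raise ValueError("sequence contains no G or C bases")
    return at / gc


def pooled_at_gc_ratio(seqs: list) -> float:
    """AT/GC ratio over all bases of several sequences (pooling is an assumption)."""
    return at_gc_ratio("".join(seqs))


if __name__ == "__main__":
    # Hypothetical toy sequences, not data from the study.
    print(at_gc_ratio("ATATGC"))
    print(pooled_at_gc_ratio(["AATT", "GC"]))
```

A ratio above 1 indicates an AT-rich sequence, as reported for both Ty1RT and Ty3RT.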
When the sequences were translated into amino acid sequences in silico, 40.8% (31/76) of the Ty1RT and 28% (14/50) of the Ty3RT sequences contained frame-shifts that introduced stop codons and caused premature termination. Further analysis showed that among the thirty-seven 263-bp, seven 266-bp and one 255-bp Ty1RT sequences examined, only one 263-bp sequence (2.7%) showed early termination. Apart from a single 276-bp sequence, all other Ty1RT sequences showed premature termination (Table 1). For Ty3RT, all eight non-432-bp sequences harboured one or more stop codons, while the proportion among the 432-bp sequences was 16.7% (7/42).
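Premature termination of this kind can be detected by scanning a reading frame for stop codons. The sketch below assumes the standard genetic code and a known reading frame; it is not the tool used in the study, and the function name and example sequences are hypothetical.

```python
# Stop codons of the standard genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_inframe_stop(seq: str, frame: int = 0):
    """Return the 0-based position of the first in-frame stop codon, or None.

    `frame` (0, 1 or 2) is the offset of the first codon; which frame is the
    correct one for a given RT fragment is an assumption left to the caller.
    """
    seq = seq.upper()
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


if __name__ == "__main__":
    # Hypothetical toy sequences, not data from the study.
    print(first_inframe_stop("ATGGCCTAAGGG"))   # stop codon TAA in frame 0
    print(first_inframe_stop("ATGGCCAAAGGG"))   # no in-frame stop
```

A sequence counts as prematurely terminated when a stop codon appears in the expected frame before the end of the amplified RT fragment.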
Table 1
Premature transcription termination in Ty1RT sequences by fragment length
Fragment length (bp) | Sequences with premature termination / total sequences | Percentage |
263 | 1/37 | 2.7% |
216 | 4/4 | 100% |
221 | 1/1 | 100% |
225 | 1/1 | 100% |
226 | 2/2 | 100% |
231 | 1/1 | 100% |
232 | 4/4 | 100% |
250 | 1/1 | 100% |
254 | 2/2 | 100% |
255 | 0/1 | 0% |
261 | 1/1 | 100% |
264 | 1/1 | 100% |
265 | 1/1 | 100% |
266 | 0/7 | 0% |
274 | 1/1 | 100% |
276 | 0/1 | 0% |
302 | 1/1 | 100% |
317 | 3/3 | 100% |
332 | 1/1 | 100% |
386 | 4/4 | 100% |
401 | 1/1 | 100% |
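The per-length counts in Table 1 can be summed to recover the overall figures quoted in the text. The sketch below transcribes the table rows; the variable and function names are illustrative assumptions.

```python
# Rows of Table 1: (fragment length in bp, sequences with premature termination, total)
TABLE_1 = [
    (263, 1, 37), (216, 4, 4), (221, 1, 1), (225, 1, 1), (226, 2, 2),
    (231, 1, 1), (232, 4, 4), (250, 1, 1), (254, 2, 2), (255, 0, 1),
    (261, 1, 1), (264, 1, 1), (265, 1, 1), (266, 0, 7), (274, 1, 1),
    (276, 0, 1), (302, 1, 1), (317, 3, 3), (332, 1, 1), (386, 4, 4),
    (401, 1, 1),
]


def summarize(rows):
    """Return (prematurely terminated, total, percentage) summed over all rows."""
    terminated = sum(n for _, n, _ in rows)
    total = sum(t for _, _, t in rows)
    return terminated, total, round(100 * terminated / total, 1)


if __name__ == "__main__":
    terminated, total, pct = summarize(TABLE_1)
    print(f"{terminated}/{total} sequences ({pct}%) show premature termination")
    print("sequences without premature termination:", total - terminated)
```

The totals agree with the 31/76 (40.8%) reported for Ty1RT, leaving 45 sequences without premature termination.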
The remaining nucleotide sequences could be translated into amino acid sequences without difficulty. For Ty1RT, the 45 sequences had an amino acid similarity of 80.1%, and all contained the conserved regions of the Ty1-copia RT domain, including "TAFLHG" at the 5' end, "SLYGLKQ" in the middle and "YVDDM" at the 3' end. The amino acid sequences of Ty1RT-20, Ty1RT-21, Ty1RT-23, Ty1RT-35 and Ty1RT-7 were 100% identical, suggesting that "TAFLHGQLKETVYVSQPDGFVDPEFPNHVYKLNKALYGLKQAPRAWYDKLSSFLIANNFTKGSVDPTLFIQYHGAHILIVQIYVDDM" might be the original CnTy1RT sequence. The sequences of Ty1RT-39, Ty1RT-41 and Ty1RT-42 were also similar, indicating that these sequences were likewise in a primitive state in the C. nankingense genome (Figure S1). The sequence similarity of Ty3RT (approximately 90.0%) was higher than that of Ty1RT, and these sequences contained "RMCVDY" in the 5' region and "VMPFGL" in the middle region as conserved areas of the Ty3-gypsy RT domain. Although the length variation of Ty3RT was small, the amino acid variation was higher than that in Ty1RT, and only Ty3RT-14 and Ty3RT-26 were identical at the amino acid level (Figure S2). Codon substitution and maximum likelihood models of dN (nonsynonymous) and dS (synonymous) substitution rates were used to detect positive selection at individual amino acid sites. No Ty1RT site showed significant positive selection, suggesting that gene function might be maintained by a reduction in nonsynonymous mutations during the independent evolution of Ty1RT (Table S1). In Ty3RT, sites 2, 6, 72, 87, 138, 139 and 142 showed signs of positive selection, which might reflect selection pressure during evolution (Table S2).
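The presence and position of the conserved Ty1-copia RT motifs can be checked in the putative original CnTy1RT sequence quoted above. This is a minimal sketch with hypothetical function names. It searches for the core "YGLKQ" rather than the full middle motif, because the residue preceding "LYGLKQ" in the quoted sequence is "A".

```python
# Putative original CnTy1RT amino acid sequence, as quoted in the text.
CN_TY1RT = (
    "TAFLHGQLKETVYVSQPDGFVDPEFPNHVYKLNKALYGLKQAPRAWYDKLSSFLIANNFTKGSVDPTLFIQ"
    "YHGAHILIVQIYVDDM"
)

# Conserved motifs of the Ty1-copia RT domain (core of the middle motif only).
TY1_MOTIFS = ("TAFLHG", "YGLKQ", "YVDDM")


def motif_positions(protein: str, motifs=TY1_MOTIFS) -> dict:
    """Map each motif to its 0-based start position, or -1 if it is absent."""
    return {motif: protein.find(motif) for motif in motifs}


if __name__ == "__main__":
    for motif, pos in motif_positions(CN_TY1RT).items():
        print(motif, "absent" if pos < 0 else f"starts at residue {pos + 1}")
```

The same function can be applied to Ty3RT sequences by passing the motifs "RMCVDY" and "VMPFGL".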
A WebLogo plot showed the amino acid frequencies at conserved regions within C. nankingense. Although most of these loci were highly conserved and specific, some sites showed similar stack heights, indicating a similar frequency of occurrence of each amino acid at that position. Comparatively fewer amino acid mutations were observed in CnTy1RT, and the stack height of each site was constant, with only limited thin stacks observed at sites 11 and 18, while in CnTy3RT, thin stacks were more common, for example at sites 14, 36, 51, 70, 113, 114, 130, 131, 142 and 143 (Fig. 2a). However, the opposite pattern was observed for the related Ty1RT and Ty3RT sequences from other species in NCBI: thin stacks were more apparent in Ty1RT than in Ty3RT (Fig. 2b). A phylogenetic analysis of different species showed that all Ty1RT sequences could be divided into four clades, while all Ty3RT sequences could be divided into five clades (Fig. 2c). Within the four Ty1RT clades, the sequences from C. nankingense fell into two groups (Ty1RT-42, -39, -41, -43, -44, -38, -40 and -22 vs. the others). These two groups were most closely related to the Ty1RT sequences from Solanum lycopersicum, A. thaliana and Prunus mume, indicating that the Ty1-copia elements of these species might have undergone horizontal transmission (Fig. 2c). A similar pattern was observed in Ty3RT, whose sequences could be further divided into three groups (Ty3RT-31 vs. Ty3RT-29, -5, -36, -7, -34 and -8 vs. the others) (Fig. 2c). These distinct clades indicated the proliferation of the corresponding LTR retrotransposons during genome differentiation. A random analysis of the sequence homology of each group showed that the sequence similarities were ~63% for Ty1RT and 79% for Ty3RT. These findings were consistent with the WebLogo plot results.
Transcriptional activation of LTR retrotransposons following autopolyploidization
Many plant LTR retrotransposons are transcriptionally activated by various biotic and abiotic stress factors. Polyploidization is itself a strong stimulus for plants, causing genome-wide genetic perturbations, particularly in allopolyploids. In the present study, we used diploid plants (tissue-cultured and non-tissue-cultured) as controls to investigate the transcriptional activation of the RT domain of C. nankingense following autopolyploidization. The PCR amplicons were separated by polyacrylamide gel electrophoresis. Bands that were nearly invisible in tissue-cultured and non-tissue-cultured diploid C. nankingense were clearly visible in all four individual autotetraploid plants, even though LTR retrotransposon-like AFLP fragments were not detected in autotetraploid lines 2, 3 and 4. This suggests that tissue culture did not induce the expression of Ty1RT and Ty3RT elements, whereas the transcriptional activation of LTR retrotransposons might result from autopolyploidization (Fig. 3a).
To further verify transpositional activity following autopolyploidization, the transcriptome of autotetraploid C. nankingense line 1 (T1) was sequenced using high-throughput sequencing. This line was chosen because its AFLP genotype contained a polymorphic fragment (absent from the diploid) with high similarity to known rice Ty1-copia elements. Compared with the diploid, the autotetraploid line (T1) showed marked transcriptional regulation among approximately 60,000 detected genes, one third of which showed more than two-fold up- or down-regulation at the transcriptional level (Fig. 3b); these genes were involved in metabolism, cellular processes, enzyme activities, protein binding and other processes or functions (Figure S3). Further analysis revealed that a total of 285 retrotransposon fragments were detected in the diploid or tetraploid line, including gag protein, gag-pol protein, pol polyprotein, int protein, the RT domain and other proteins with a retrotransposon annotation. These annotations indicate the existence of retrotransposons but not their type. The sequencing of the rice genome has been completed, and this information is valuable for understanding the composition and structural features of plant genomes. Therefore, we re-annotated the transcriptome sequences using rice genome databases and the expressed sequence tag [40] database. The 285 retrotransposon fragments were classified as 139 Ty1-copia-like, 118 Ty3-gypsy-like and 21 non-LTR type (SINE) elements, and 7 non-specific sequences (Fig. 3c).
A total of 104 Ty1-copia-like, 92 Ty3-gypsy-like and 13 non-LTR type (SINE) elements were observed in the diploid line, and most of these elements had low reads per kilobase per million mapped reads (RPKM) values, > 100-fold lower than those of housekeeping genes such as EF1α and GAPDH. Among the 104 Ty1-copia-like elements, the average RPKM value was only 1.4; following autopolyploidization, the average RPKM value increased to 4.6, with 49% gene activation and 25% gene silencing. In addition, 35 new Ty1-copia-like elements were first observed in the autotetraploid. Gene activation was 3.3-fold more frequent than gene silencing. A similar result was observed for the Ty3-gypsy-like and non-LTR type (SINE) elements, for which gene activation was approximately 3-fold more frequent than gene silencing (75/23 and 12/4, respectively). The average RPKM value of the Ty3-gypsy-like elements in the autotetraploid (T1) increased to 2.2, a value 1.6-fold higher than that observed in the diploid (Fig. 3c).
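RPKM normalizes a read count by transcript length and sequencing depth, and the activation-to-silencing ratios quoted above follow from simple division. The sketch below uses the standard RPKM definition; the function names and the example read counts are hypothetical, while the ratios use the numbers given in the text.

```python
def rpkm(read_count: int, length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)


def ratio(numerator: float, denominator: float) -> float:
    """Fold difference between two counts or two expression values."""
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical element: 50 reads, 1,000 bp, 10 million mapped reads.
    print(rpkm(50, 1_000, 10_000_000))
    # Activated vs. silenced elements, as reported in the text.
    print(round(ratio(75, 23), 1))   # Ty3-gypsy-like
    print(round(ratio(12, 4), 1))    # non-LTR type (SINE)
    # Average RPKM of Ty1-copia-like elements, tetraploid vs. diploid.
    print(round(ratio(4.6, 1.4), 1))
```

Because RPKM divides by both length and depth, values remain comparable between elements of different sizes and between the diploid and autotetraploid libraries.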
LTR region analysis of Ty1-copia-like elements
Based on the transcriptome database, we selected three Ty1-copia-like elements that were strongly up-regulated in the autotetraploid (T1) and showed high homology with known retrotransposons as target genes for cloning and sequence analysis. Only two of these target genes were successfully amplified, reflecting the high polymorphism of this subtype of retrotransposon. The resulting 4,598-bp and 4,616-bp sequences were named CnMp1 and CnMp2, respectively.
Sequence analysis revealed that the CnMp1 and CnMp2 sequences contained conserved poly A sites in the LTR region. The boundary sequence (CA) of R-U5 was also detected among the 40 amino acid sequences of the LTR region. The PBS region harboured the characteristic base sequence "TGGT" and lay 3 and 0 bp from the LTR region, respectively. In the coding regions, the GAG-PR-INT-RT-RNaseH genes were closely linked, and the GAG region contained the conserved nucleic acid-binding sequence "C-C-H-C". In the PR region that followed, CnMp1 contained the conserved "D (T/S) G" sequence, while the last amino acid of this motif in CnMp2 was "A", which was not conserved. The INT regions of the two sequences had a conserved zinc finger domain (H-H-C-C). Three motifs, "AFLHG", "YGLKQ" and "YVDDM", in the RT region and two motifs, "KHID" and "KHID", in the RNaseH region were all relatively conserved (Fig. 4). The analysis showed that the cloned sequences were Ty1-copia-type retrotransposons.
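Conserved protein motifs such as the protease "D (T/S) G" signature and the GAG "C-C-H-C" zinc knuckle can be located with regular expressions. The sketch below is illustrative only: the text does not give the residue spacing of the C-C-H-C motif, so the commonly described C-X2-C-X4-H-X4-C spacing is assumed, and the peptides are hypothetical rather than CnMp1 or CnMp2 sequences.

```python
import re

# Protease active-site signature: Asp, then Thr or Ser, then Gly.
PROTEASE_MOTIF = re.compile(r"D[TS]G")

# CCHC zinc knuckle; the C-X2-C-X4-H-X4-C spacing is an assumption.
ZINC_KNUCKLE = re.compile(r"C.{2}C.{4}H.{4}C")


def has_motif(pattern, protein: str) -> bool:
    """True if the compiled pattern occurs anywhere in the protein sequence."""
    return pattern.search(protein.upper()) is not None


if __name__ == "__main__":
    # Hypothetical peptides, not sequences from the study.
    print(has_motif(PROTEASE_MOTIF, "LLDTGAS"))            # conserved D-T-G
    print(has_motif(PROTEASE_MOTIF, "LLDTAAS"))            # last residue is A
    print(has_motif(ZINC_KNUCKLE, "KCYNCGKEGHIARDCR"))     # C-C-H-C present
```

A variant such as the one described for CnMp2, in which the final residue of the protease motif is "A", would fail the first pattern.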
SSAP analysis of the insertion sites in CnMp1 and CnMp2
The LTR region is sequence-specific for a given retrotransposon. Unlike DNA transposons, retrotransposons do not generate sequence divergence upon insertion into the plant genome; instead, their insertions are stable and can be exploited using recently developed marker systems. Retrotransposon-based sequence-specific amplification polymorphism (SSAP) is a molecular marker technique based on insertion polymorphisms, detected through the specificity of an oligonucleotide primer anchored on the terminal sequences of a retrotransposon, typically in the LTR region [41].
In the present study, the primers were derived from the LTR regions of CnMp1 and CnMp2, so that the profiles reflected the actual loci of CnMp1 and CnMp2 in the autotetraploid (T1) genome. A total of fourteen primer combinations (seven for each element) produced multi-fragment SSAP profiles. Most of the diploid fragments were inherited by the autotetraploid (T1) line; however, five of the CnMp1 primer combinations generated at least one polymorphic fragment, while only two CnMp2 primer combinations showed polymorphisms. In total, 12 polymorphic fragments were generated, of which 8 (seven for CnMp1 and one for CnMp2) represented insertions at new loci and 4 (three for CnMp1 and one for CnMp2) represented the loss of original loci, suggesting the activation and insertion of the retrotransposons following autopolyploidization (Fig. 5a and 5b). Notably, although the transcript abundance of CnMp1 and CnMp2 in the autotetraploid (T1) was similar (RPKM 6.9 vs. 7.6), and CnMp2 was also transcribed in the diploid (RPKM = 3.0), both the total number and the polymorphic number of SSAP fragments were higher for CnMp1 than for CnMp2. This suggests that CnMp1 might be a relatively young retrotransposon and that a mechanism inhibiting CnMp2 insertion might exist in C. nankingense.
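Scoring SSAP polymorphisms amounts to comparing the band sets of the diploid and the autotetraploid: bands found only in the tetraploid indicate new insertions, and bands found only in the diploid indicate lost loci. The sketch below illustrates this with hypothetical band sizes and function names, then tallies the counts reported in the text.

```python
def score_bands(diploid: set, tetraploid: set) -> dict:
    """Classify bands as shared, gained (new loci) or lost (original loci)."""
    return {
        "shared": diploid & tetraploid,
        "gained": tetraploid - diploid,
        "lost": diploid - tetraploid,
    }


# Polymorphic SSAP fragments per element, as reported in the text.
GAINED = {"CnMp1": 7, "CnMp2": 1}
LOST = {"CnMp1": 3, "CnMp2": 1}

if __name__ == "__main__":
    # Hypothetical band sizes (bp) for one primer combination.
    scored = score_bands({120, 185, 240, 310}, {120, 185, 275, 310, 402})
    print({kind: sorted(bands) for kind, bands in scored.items()})
    print("polymorphic fragments in total:",
          sum(GAINED.values()) + sum(LOST.values()))
```

With the reported counts, CnMp1 accounts for 10 of the 12 polymorphic fragments and CnMp2 for the remaining 2.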