PacBio sequencing improves short-read assemblies of T. fuciformis
With the rapid development of sequencing technologies and a sharp decline in the cost of whole-genome sequencing, more fungal genomes have been sequenced and annotated. As a by-product of whole-genome sequencing, fungal mitochondrial genomes can be assembled and identified from the raw sequence data [6, 9, 31, 32] on the basis of their special characteristics, such as high copy number and a set of highly conserved genes, and then completed into intact molecules by PCR-based approaches. However, the presence of repetitive or non-unique DNA within fungal mitochondrial genomes may hinder their successful de novo assembly from short reads [33]. To assess the quality of the assemblies obtained from Illumina sequencing data, we generated complete mtDNAs using PacBio sequencing and aligned the mitochondrial sequences from both sequencing methods for T. fuciformis TF13 and TF15. The two mtDNA sequences of TF13 differed by nine singleton indels (~0.022 % disagreement), and those of TF15 by one singleton indel (~0.0025 % disagreement). All indels occurred within homopolymer regions. Indels between mitochondrial genomes assembled from different datasets (PacBio and Illumina) of the same isolate have also been reported in Saccharomyces cerevisiae [8]. Sanger sequencing of these indel regions indicated that the indels resulted from sequencing and/or assembly errors in the PacBio data. Thus, Illumina sequencing with 125 bp paired-end reads appeared to yield higher-quality intact mitochondrial genomes for T. fuciformis, even though the read lengths were much shorter.
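The comparison described above can be illustrated with a minimal sketch. It assumes the two assemblies have already been aligned pairwise into gapped strings of equal length; the function names, the `min_run` threshold, and the toy sequences are hypothetical and are not taken from the study.

```python
def homopolymer_indels(aln_a, aln_b, min_run=3):
    """List single-column indels between two aligned sequences.

    Returns tuples (column, base, run_length, in_homopolymer), where
    run_length is the length of the run of identical bases around the
    extra base in the sequence that carries it.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    results = []
    for i, (a, b) in enumerate(zip(aln_a, aln_b)):
        if (a == "-") == (b == "-"):
            continue  # both bases present (or both gaps): not an indel column
        seq = aln_b if a == "-" else aln_a  # sequence carrying the extra base
        base = seq[i]
        left = i
        while left > 0 and seq[left - 1] == base:
            left -= 1
        right = i
        while right < len(seq) - 1 and seq[right + 1] == base:
            right += 1
        run = right - left + 1
        results.append((i, base, run, run >= min_run))
    return results


def disagreement_percent(aln_a, aln_b):
    """Indel columns as a percentage of the alignment length."""
    return 100.0 * len(homopolymer_indels(aln_a, aln_b)) / len(aln_a)


# Hypothetical toy alignment: one assembly lacks an A in a poly-A run.
toy_a = "ACGTAAAAAGCT"
toy_b = "ACGTAAAA-GCT"
toy_hits = homopolymer_indels(toy_a, toy_b)
```

On real data the aligned strings would come from an alignment tool of choice, and the threshold that defines a homopolymer run is a judgment call.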
High frequency of mitochondrial intron gain/loss in T. fuciformis
None of the 24 introns was present in all of the tested isolates. This indicates that at least one gain or loss event has taken place for each intron since the speciation of T. fuciformis. In particular, three pairs of introns (rnl-i3 versus rnl-i4, rnl-i5 versus rnl-i6, and rnl-i8 versus rnl-i9) each shared the same insertion site but showed low sequence similarity between the two members of the pair. This means that two different introns were located at the same insertion site. For each pair, at least two gain/loss events have therefore taken place since speciation, regardless of whether the introns were inserted at the same site. Both lines of evidence suggest a high frequency of mitochondrial intron movement in the T. fuciformis population.
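The reasoning above (an intron that is present in some isolates but not all implies at least one gain or loss event) can be sketched as a check on a presence/absence table. The isolate and intron labels below are hypothetical placeholders, not the distribution reported in this study.

```python
def classify_introns(presence, isolates):
    """Split introns into those found in every isolate and those that vary.

    presence maps an intron name to the set of isolates that carry it.
    Introns found in no listed isolate are ignored.
    """
    all_isolates = set(isolates)
    universal, variable = [], []
    for intron, carriers in sorted(presence.items()):
        carriers = set(carriers) & all_isolates
        if not carriers:
            continue
        if carriers == all_isolates:
            universal.append(intron)
        else:
            variable.append(intron)  # implies at least one gain/loss event
    return universal, variable


# Hypothetical toy data for illustration only.
toy_isolates = ["iso1", "iso2", "iso3"]
toy_presence = {
    "intronA": {"iso1", "iso2"},
    "intronB": {"iso3"},
    "intronC": {"iso1", "iso2", "iso3"},
}
toy_universal, toy_variable = classify_introns(toy_presence, toy_isolates)
```

In the data set described here, every one of the 24 introns would fall into the variable group.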
For spliceosomal introns in nuclear genomes, losses are much more frequent than gains [34]. Unlike most nuclear introns, typical mitochondrial introns are mobile genetic elements that form self-splicing RNA molecules. Mitochondrial introns are divided into Group I and Group II according to their secondary structures and splicing mechanisms [18]. Depending on the splicing mechanism, introns can move from one place to another, or even from one organism to another [18]. Taking into account the distribution pattern of the introns in combination with the phylogenetic tree (Figure 1), eight introns (cox1-i1, trnL-i1, cox2-i2, trnI-i1, cob-i1, cob-i2, nad4-i2, and trnP-i1) were likely gained during the population evolution of T. fuciformis. At least one intron-gain event occurred at each of the insertion sites of rnl-i3/rnl-i4, rnl-i5/rnl-i6, and rnl-i8/rnl-i9. However, there is no evidence that intron loss is more frequent than intron gain in mitochondria.
A proposed model of gene fragment exchange through gain or loss of intron with N-terminal duplication
Six introns containing an N-terminal duplication were predicted from the mtDNAs of 16 T. fuciformis isolates. The duplications shared high similarity with exons. Each predicted intron was hypothesized to be a transposable element (TE) carrying an N-terminal homolog of the host gene, which was then inserted into the mtDNA of T. fuciformis to become an intron.
Homing reactions require three components: 1) laterally transferred genetic elements, 2) a homing endonuclease (HE) protein, and 3) a target site [20]. Homing endonucleases with high sequence identity share homogeneous target sites [20]. We suggest that the homing reaction of the TEs (mobile introns) is performed by the HE proteins they encode or, for TEs that do not carry HE genes, by HE proteins encoded elsewhere. These HE genes also determine the insertion position of the TEs. Speculatively, the N-terminal homologs, which are merely by-products of the introns, may not affect the efficiency of the homing reaction.
After insertion, TEs carrying a host-gene N-terminal homolog become introns of their target gene. However, PCR results from cDNAs revealed that the predicted introns no longer functioned as introns in cox1-D, nad3-D, nad4-1-D, nad5-D, or cob-D. The predicted nad4-i1, nad5-i1, and cob-i2 had become part of the cDNAs, whereas cox1-i2 and nad3-i1, together with the predicted exons following them, had separated from the genes and become downstream sequences (Figure 3). These results indicate that different parts of D-type genes may change their roles during evolution.
A possible model (Figure 4) was proposed to account for the discrepancy between the predicted and experimental results. It involves the following steps: 1) a TE carrying an exogenous N-terminal gene homolog inserts into a conserved protein-coding gene in the mtDNA of T. fuciformis and becomes an intron of that gene, transforming the N-type gene (cox1-N2, nad4-2-N, nad4-1-N, nad3-N, nad5-N, cob-N) into a D-type gene (nad4-2-D); 2) the intron becomes part of the exon (nad4-1-D, nad5-D, cob-D); 3) the transposon components, together with the predicted exon, separate from the gene and become a downstream TE (cox1-D, nad3-D); 4) the downstream TE is lost from the mitochondrial genome, transforming the D-type gene into an N-type gene (cox1-N1). Together, these steps of intron insertion and differential intron loss result in sequence substitution in the host gene.
Host gene fragment exchange via intron mobility is a new mechanism of gene evolution
Lateral gene transfer refers to the exchange and stable integration of genetic material from a donor into a different strain or species [35]. Previous studies on lateral gene transfer in fungi revealed that the genetic material may be individual genes, such as ToxA [36] and Mpk1 [37], gene clusters [38-40], or chromosomes [41, 42]. Transfer of genetic material imports new genes, or new copies of genes, into the host strain, which can have a profound effect on disease emergence, niche specification, or shifts in metabolic capabilities [43]. However, as far as we know, there is no reported evidence that fungal mitochondrial genes have evolved by partial fragment exchange via lateral transfer. The model above proposes partial lateral gene transfer through the gain or loss of an intron carrying a truncated host gene precursor, which results in the native T. fuciformis N-terminal region of the conserved gene being replaced by an exogenous one.
It has been assumed that the phylogenetic signals of all mtDNA genes are identical or highly similar because the genes are physically located on the same molecule [6]. However, mtDNA analyses have revealed divergence in the phylogenetic signal strength of mt genes among and within species [44]. For example, topologies inferred from concatenated rnl and cox1 sequences showed significant concordance to topologies inferred from nad4L and cob among 16 isolates of Rhizophagus or Glomus species [6]. The divergence often occurs in the N-terminal region rather than across whole genes [6]. The model above might explain this dissimilarity: an N-terminal exchange imports an exogenous gene fragment into a gene and greatly alters its phylogenetic signal in a single event, and multiple such transfers during evolution would result in divergent phylogenetic signals where similarity is expected.
Duplication of truncated conserved genes may be induced by introns with N-terminal homolog of host gene through horizontal gene transfer
Duplicated copies of conserved genes have often been found in fungal mitochondrial genomes. Large segments (more than 6 kb) were hypothesized to have been duplicated in inverted orientation in the mtDNA of both Phlebia radiata [45] and Candida albicans [46], resulting in the duplication of the atp6 and cox3 genes, respectively. Both inverted duplications were hypothesized to have occurred by replication-directing recombination [45, 46]. In the Agrocybe aegerita mitochondrial genome, two large inverted repeats, each containing an identical copy of the nad4 gene, were separated by a single-copy region of 5834 bp [47]. Duplicated sets of tRNA genes were reported in the mtDNA of Agaricus bisporus [16]. The duplications of the nad4 gene in A. aegerita and of the tRNA genes in A. bisporus arose by plasmid integration [16, 47]. Furthermore, an extra truncated atp9 gene was found in the mtDNA of Phialocephala subalpina [48] and Sclerotinia borealis [31], and truncated atp6 genes were detected in Botryotinia fuckeliana [31].
Six introns investigated in this study were found to harbor a fragment at their 5' end that was a duplication of the truncated host gene and showed high similarity to the products of their subsequent exons. The length of the duplication depended on the intron insertion site: if the insertion site was near the 5' end of a gene, the duplication was long; if it was near the 3' end, the duplication was short. Introns carrying an N-terminal homolog of the host gene may also contain fragments of other conserved genes. An extra truncated copy of the nad2 gene was found in cox3-i1, located downstream of a truncated copy of the nad3 gene. The extra truncated copies of the nad2 and nad3 genes were always present or absent together across isolates. We therefore suppose that both truncated genes in cox3-i1 were acquired in the same way. All extra truncated genes investigated in this study were found in introns, with their coding sequences sharing high similarity with the downstream exon of the host gene. These results imply that gene duplication through intron insertion is a common feature of T. fuciformis mitochondrial genomes.
Although occasional, duplication of host-gene exons in mtDNA tends to occur more frequently near the N-terminus than near the C-terminus. A possible explanation is that deletion of an intron together with a host-gene fragment from the donor mtDNA results in loss of that fragment, and host genes could be more tolerant of fragment loss near the N-terminus than near the C-terminus. If the lost fragment is located near the C-terminus, the host gene could no longer be transcribed, leading to a loss of gene function. If the lost fragment is located near the N-terminus, however, the host gene could still be transcribed, completely or partially. Consequently, the host-gene segments carried by introns were observed to be concentrated near the N-terminus, and insertion of these introns into recipient mtDNAs leads to host-gene duplications near the N-terminus.
Annotation errors without intra-specific comparisons
Conserved protein-coding genes and rRNAs in fungal mitochondrial genomes have been annotated with the MFannot [11, 49] or BLAST [50] programs; their intron-exon boundaries have been identified with Clustal W by comparison with intron-free homologous genes of closely related species [50]; and tRNAs have been identified with MFannot [49], tRNAscan-SE [51], RNAweasel, and/or Rfam [32]. However, because the intergenic regions of mitochondrial genomes differ greatly between species, some annotation errors might have occurred in these alignments. These errors were reflected mainly in the annotation of introns carrying an N-terminal homolog of the host gene and of introns within tRNAs. In this study, six introns carrying a truncated host gene precursor were not detected by MFannot, but were detected by alignment with the corresponding intron-free genes of intra-specific isolates. The common feature of these introns was that they contained a fragment at the 5' end that was a 'duplication' of the following exon. As a result, the software could not identify the real exons. RNAweasel, MFannot, and tRNAscan-SE were used to identify tRNAs in T. fuciformis, and no intron-containing tRNA was found. However, three tRNAs with introns were identified among the 16 mitochondrial genomes by intra-specific comparison: trnL in TF06, TF07, and TF09; trnI in TF11, TF14, and TF15; and trnP in TF05 and TF06. The short length of tRNAs made it difficult for the programs to annotate the introns they carried. The high sequence similarity among intra-specific mtDNAs, not only in conserved genes but also in intergenic regions, made intron-insertion boundaries clearer. Intra-specific comparison of mitochondrial genomes thus improved the quality of their gene annotation.
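The intra-specific comparison described above can be sketched as a search for long insertions relative to an intron-free copy of the same gene from another isolate. This is a simplified illustration, not the procedure used in the study: it assumes a pairwise alignment is already available as two gapped strings, and the function name and the `min_len` threshold are hypothetical.

```python
import re


def candidate_introns(aln_intron_free, aln_query, min_len=200):
    """Report long insertions in the query relative to an intron-free gene.

    Returns tuples (site, length), where site is the insertion position in
    ungapped coordinates of the intron-free sequence and length is the size
    of the inserted block in the query.
    """
    if len(aln_intron_free) != len(aln_query):
        raise ValueError("aligned sequences must have equal length")
    hits = []
    for match in re.finditer(r"-+", aln_intron_free):
        start, end = match.span()
        inserted = aln_query[start:end]
        # Keep only blocks where the query has continuous sequence.
        if end - start >= min_len and "-" not in inserted:
            site = len(aln_intron_free[:start].replace("-", ""))
            hits.append((site, end - start))
    return hits


# Hypothetical toy alignment: a 10-base insertion after position 5.
toy_ref = "ATGGC" + "-" * 10 + "TTAA"
toy_query = "ATGGC" + "G" * 10 + "TTAA"
toy_candidates = candidate_introns(toy_ref, toy_query, min_len=5)
```

Because intra-specific sequences are nearly identical outside the insertion, the flanks align cleanly and the boundaries of the inserted block are well defined, which is the advantage over inter-specific comparison noted above.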