Genome sequencing and assembly
Of the 601,168 reads (5,005,308,071 bp) in the raw whole-genome sequencing output for Pleurocordyceps sinensis, 23,757 reads (171,039,810 bp) were identified as mitochondrial. The putative mitochondrial reads ranged in length from 276 bp to 45,245 bp, with an average of 7,200 bp, giving a coverage depth of 5,371× over the mt genome of the species. The mitochondrial reads were processed with BLASR and assembled with Celera Assembler and Quiver, yielding a circular DNA molecule of 31,841 bp (Fig. 1).
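The read statistics above can be re-derived from the reported totals. The following is a minimal sketch, assuming coverage depth is simply the number of mitochondrial bases divided by the assembled genome length; the variable names are illustrative and not part of the original analysis pipeline.

```python
# Totals reported in the text above.
mito_reads = 23_757
mito_bases = 171_039_810
total_reads = 601_168
total_bases = 5_005_308_071
genome_length = 31_841  # assembled circular mt genome, in bp

# Assumption: depth = mitochondrial bases / genome length (no filtering applied).
mean_read_length = mito_bases / mito_reads
coverage_depth = mito_bases / genome_length

# Share of the raw output that was assigned to the mitochondrion.
read_fraction = mito_reads / total_reads
base_fraction = mito_bases / total_bases

print(f"mean read length: {mean_read_length:.1f} bp")
print(f"coverage depth:   {coverage_depth:.1f}x")
print(f"mitochondrial share: {read_fraction:.2%} of reads, {base_fraction:.2%} of bases")
```

Under this assumption the reported 5,371× corresponds to the truncated (not rounded) quotient.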
Conserved protein genes and non-conserved open reading frames (ncORFs)
The mt genome of Pleurocordyceps sinensis had a low GC content of 25.46% and encoded 15 protein genes conserved within the order Hypocreales (Li et al. 2015): seven subunits of the electron transport complex I (nad1, nad2, nad3, nad4, nad4L, nad5 and nad6), cytochrome b (cob), three subunits of complex IV (cox1, cox2 and cox3), three F0 subunits of the ATP-synthase complex (atp6, atp8 and atp9) and the rps3 gene, which encodes 40S ribosomal protein S3 (Table 1, Fig. 1). In addition to those genes, 10 ncORFs (7,194 bp in total) were predicted, two of which (ncORF3 and ncORF9) were found to encode homing endonucleases with the motif patterns GIY-YIG and LAGLIDADG, respectively (Table 1).
Table 1
Mitochondrial genome annotation of Pleurocordyceps sinensis
Genes | Strands | Positions | Lengths (bp) | Introns | Start/stop codons | Anticodons |
tRNA-Pro [P] | +/CW | 53–125 | 73 | | | TGG |
rnl | +/CW | 155–4876 | 4722 | IA (1643), 2583–4225 | | |
ncORF1 | –/CW | 818–519 | 300 | | ATG/TAA | |
ncORF2 | –/CW | 1581–1252 | 330 | | ATG/TAA | |
rps3 | +/CW | 2816–4132 | 1317 | | ATG/TAA | |
tRNA-Thr [T] | +/CW | 4793–4863 | 71 | | | TGT |
tRNA-Glu [E] | +/CW | 4869–4941 | 73 | | | TTC |
tRNA-Met [M1] | +/CW | 4942–5012 | 71 | | | CAT |
tRNA-Met [M2] | +/CW | 5019–5091 | 73 | | | CAT |
tRNA-Leu [L] | +/CW | 5177–5258 | 82 | | | TAA |
ncORF3 (GIY-YIG) | +/CW | 5295–5942 | 648 | | ATG/TAA | |
tRNA-Ala [A] | +/CW | 5933–6005 | 73 | | | CGC |
ncORF4 | +/CW | 6307–7539 | 1233 | | ATG/TAA | |
ncORF5 | +/CW | 7629–8987 | 1359 | | ATG/TAA | |
ncORF6 | +/CW | 9018–9470 | 453 | | ATG/TAA | |
ncORF7 | +/CW | 9619–10149 | 531 | | ATG/TAA | |
ncORF8 | +/CW | 10473–10958 | 486 | | ATG/TAG | |
tRNA-Phe [F] | +/CW | 11156–11228 | 73 | | | GAA |
tRNA-Lys [K] | +/CW | 11230–11302 | 73 | | | TTT |
tRNA-Leu [L2] | +/CW | 11354–11435 | 82 | | | TAG |
tRNA-Gln [Q] | +/CW | 11764–11836 | 73 | | | TTG |
tRNA-His [H] | +/CW | 11841–11914 | 74 | | | GTG |
ncORF9 (LAGLIDADG) | +/CW | 11968–12888 | 921 | | ATG/TAA | |
tRNA-Met [M3] | +/CW | 12947–13019 | 73 | | | CAT |
nad2 | +/CW | 13061–14737 | 1677 | | ATG/TAA | |
nad3 | +/CW | 14738–15151 | 414 | | ATG/TAA | |
atp9 | +/CW | 15260–15484 | 225 | | ATG/TAA | |
cox2 | +/CW | 15598–16344 | 747 | | ATG/TAA | |
tRNA-Arg [R1] | +/CW | 16391–16461 | 71 | | | ACG |
nad4L | +/CW | 16526–16795 | 270 | | ATG/TAA | |
nad5 | +/CW | 16795–18792 | 1998 | | ATG/TAA | |
cob | +/CW | 18951–20120 | 1170 | | ATG/TAA | |
tRNA-Cys [C] | +/CW | 20176–20247 | 72 | | | GCA |
cox1 | +/CW | 20597–23225 | 2629 | IB (1036), 21336–22372 | ATA/TAA | |
ncORF10 | +/CW | 21335–22285 | 933 | | ATA/TAA | |
tRNA-Arg [R2] | +/CW | 23276–23346 | 71 | | | TCT |
nad1 | +/CW | 23495–24616 | 1122 | | ATG/TAA | |
nad4 | +/CW | 24699–26156 | 1458 | | ATG/TAA | |
atp8 | +/CW | 26228–26374 | 147 | | ATG/TAA | |
atp6 | +/CW | 26450–27235 | 786 | | ATG/TAA | |
rns | +/CW | 27513–29036 | 1524 | | | |
tRNA-Tyr [Y] | +/CW | 29189–29272 | 84 | | | GTA |
tRNA-Asp [D] | +/CW | 29284–29357 | 74 | | | GTC |
tRNA-Ser [S1] | +/CW | 29370–29452 | 83 | | | GCT |
tRNA-Asn [N] | +/CW | 29619–29690 | 72 | | | GTT |
cox3 | +/CW | 29733–30542 | 810 | | ATG/TAA | |
tRNA-Gly [G] | +/CW | 30578–30648 | 71 | | | TCC |
nad6 | +/CW | 30732–31418 | 687 | | ATG/TAA | |
tRNA-Val [V] | +/CW | 31452–31524 | 73 | | | TAC |
tRNA-Ile [I] | +/CW | 31580–31651 | 72 | | | GAT |
tRNA-Ser [S2] | +/CW | 31656–31742 | 87 | | | TGA |
tRNA-Trp [W] | +/CW | 31755–31826 | 72 | | | TCA |
Note: +, genes encoded on the positive strand; –, genes encoded on the negative strand; CW, genes oriented clockwise.
All conserved protein-coding genes and ncORFs were found on the positive strand and oriented clockwise, except for ncORF1 and ncORF2, which were on the negative strand and oriented anticlockwise. As found in Pleurotus ostreatus (Wang et al. 2008), Rhizoctonia solani (Losada et al. 2014) and Ophiocordyceps sinensis (Li et al. 2015), the nad2/nad3 genes were joined and the nad4L/nad5 genes were fused: the initial codon of nad3 (ATG) immediately followed the terminal codon of nad2 (TAA), and the terminal codon of nad4L (TAA) shared its final nucleotide A with the initial codon (ATG) of nad5 (Fig. 1, Table 1). Other protein-coding genes and ncORFs were separated by either long or short intergenic regions (Fig. 1).
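The joined and fused gene pairs can be read directly off the coordinates in Table 1. The sketch below assumes 1-based, inclusive positions on the same strand; the helper name `junction` is hypothetical and introduced only for illustration.

```python
def junction(upstream_end: int, downstream_start: int) -> str:
    """Classify the junction between two same-strand genes (1-based, inclusive)."""
    gap = downstream_start - upstream_end - 1
    if gap == 0:
        return "contiguous"
    if gap < 0:
        return f"overlap of {-gap} nt"
    return f"intergenic spacer of {gap} nt"


# (start, end) positions copied from Table 1.
genes = {
    "nad2": (13061, 14737),
    "nad3": (14738, 15151),
    "atp9": (15260, 15484),
    "nad4L": (16526, 16795),
    "nad5": (16795, 18792),
}

for up, down in [("nad2", "nad3"), ("nad4L", "nad5"), ("nad3", "atp9")]:
    print(f"{up}/{down}: {junction(genes[up][1], genes[down][0])}")
```

The nad3/atp9 pair is included only as a contrast, to show a junction with an ordinary intergenic spacer.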
Of the 15 protein-coding genes and 10 predicted ncORFs, all employed the standard fungal mitochondrial start codon ATG except cox1 and ncORF10, which were initiated by ATA. Twenty-four of these 25 genes used TAA as the stop codon; the exception was ncORF8, which used TAG (Table 1).
Noncoding RNAs
In addition to the 15 protein-coding genes, genes for a large and a small ribosomal RNA (rnl and rns) and 25 tRNA genes corresponding to 20 amino acids were identified (Table 1). The tRNA genes ranged in size from 71 to 87 bp. Most amino acids were served by a single tRNA gene; however, serine (Ser), arginine (Arg), methionine (Met) and leucine (Leu) had 2, 2, 3 and 2 tRNA genes, respectively (Table 1). All noncoding RNA genes (tRNA and rRNA) were found on the positive strand and oriented clockwise.
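The tRNA tally can be checked against Table 1 with a short count. This is a minimal sketch; the list below is transcribed by hand from the table in genome order, using three-letter amino acid codes.

```python
from collections import Counter

# Amino acids served by the tRNA genes in Table 1, in genome order.
trna_genes = [
    "Pro", "Thr", "Glu", "Met", "Met", "Leu", "Ala", "Phe", "Lys", "Leu",
    "Gln", "His", "Met", "Arg", "Cys", "Arg", "Tyr", "Asp", "Ser", "Asn",
    "Gly", "Val", "Ile", "Ser", "Trp",
]

counts = Counter(trna_genes)
# Amino acids served by more than one tRNA gene.
multi_copy = {aa: n for aa, n in counts.items() if n > 1}

print(f"{len(trna_genes)} tRNA genes for {len(counts)} amino acids")
print("multi-copy:", multi_copy)
```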
Intronic and intergenic regions
Exons of protein-coding genes, rRNA genes and tRNA genes had a total length of 20,873 bp, accounting for 65.55% of the mt genome. The ten ncORFs (7,194 bp) accounted for 22.59% of the mt genome. Two group I introns were predicted, one classified into subgroup IA (1,643 bp) in rnl and one classified into subgroup IB (1,036 bp) in cox1, together making up 8.4% of the mt genome. The intergenic sequences had a total length of 1,070 bp, covering 3.4% of the genome.
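The proportions above follow from dividing each category's length by the genome length of 31,841 bp. The following sketch recomputes them from the reported lengths; the dictionary keys are descriptive labels chosen here, not terms from an annotation tool.

```python
genome_length = 31_841  # bp

# Lengths in bp as reported in the text; the intron entry sums IA and IB.
components = {
    "exons (protein-coding, rRNA, tRNA)": 20_873,
    "ncORFs": 7_194,
    "group I introns": 1_643 + 1_036,
    "intergenic regions": 1_070,
}

# Percentage of the mt genome occupied by each category.
shares = {name: 100 * bp / genome_length for name, bp in components.items()}

for name, pct in shares.items():
    print(f"{name:36s} {components[name]:>6,} bp  {pct:5.2f}%")
```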
Gene component and synteny
Although different numbers of ncORFs (hypothetical proteins) were predicted for hypocrealean fungi, the content and synteny of the 15 protein-coding genes (rps3, cox1, cox2, cox3, cob, atp6, atp8, atp9, nad1, nad2, nad3, nad4, nad5, nad4L and nad6) remained largely conserved in this order, except in a few species. As predicted, the location of the cox2 gene was shifted in three species (Acremonium chrysogenum, A. fuci and Clonostachys rosea) compared to other Hypocreales (Table S1). An additional copy of the rps3 gene and of the atp9 gene was found in Beauveria malawiensis and Fusarium solani IISc-1 (CM023198), respectively. In F. oxysporum UASWS AC1 (KR952337), an extra copy was found for nad1 and nad4, while in F. oxysporum f. sp. matthiolae (CM019668), the locations of these two genes were found to be reversed. The mt genome of F. oxysporum f. sp. fragariae GL1381 (CM029251) had lost the cox3 and nad6 genes and possessed an extra reversed copy of the cob, cox1, nad1, nad4, atp8 and atp6 genes. An extreme case was found in Sarocladium implicatum, in which three genes (cob, cox3 and nad6) were lost and the nad4 gene had shifted from the nad1-atp8 junction to a position between rps3 and nad2 (Table S1).
Phylogenetic analyses
Eighty-one complete mt genomes representing 63 distinct species from the order Hypocreales were included in the phylogenetic analyses. After excluding ambiguously aligned regions, a total of 4,345 amino acid positions from 14 conserved proteins were retained and used. All species of Hypocreales formed a well-supported clade (BP = 100%) in the ML analysis. Within the clade, four family-level subclades were recognized with very strong support (BP = 100%), i.e. Nectriaceae, Bionectriaceae, Hypocreaceae and Clavicipitaceae (Fig. 2). Species in the family Ophiocordycipitaceae were clustered into two subclades. One subclade, consisting of four Tolypocladium species, showed a sister-group relationship with the Clavicipitaceae clade with low bootstrap support (BP = 75%); the other, highly supported (BP = 100%) subclade comprised four Hirsutella species (H. minnesotensis, H. rhossiliensis, H. thompsonii and H. vermicola) and Ophiocordyceps sinensis (Fig. 2). Interestingly, Pleurocordyceps sinensis clustered with the Clavicipitaceae clade with 100% bootstrap support, rather than with either of the two subclades of Ophiocordycipitaceae.