Annotation of C. rotundus organellar genomes
The C. rotundus mitogenome consisted of four circular DNA molecules, namely mt1 (mitochondrial chromosome 1), mt2, mt3, and mt4, with mt1 being the main molecule. Their lengths were 706,916, 399,727, 332,552, and 52,163 bp, and their GC contents were 40.66%, 40.55%, 40.71%, and 41.02%, respectively (Table S1). The functional categories and physical locations of the annotated genes are shown in Fig. 1 and Table 1. The mt1, mt2, and mt3 molecules contained 19 (14 core and 5 variable), 11 (9 core and 2 variable), and 10 (8 core and 2 variable) protein-coding genes (PCGs), respectively, together with three, four, and one rRNA genes and nine, ten, and seven tRNA genes. In contrast, mt4 contained only one rRNA gene (rrn18).
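The GC contents above are the share of G and C among a molecule's bases. A minimal Python sketch of that calculation follows; the function name and the choice to leave ambiguous bases (such as N) out of the denominator are assumptions, not the authors' pipeline.

```python
def gc_content(seq: str) -> float:
    """Return percent G+C among the unambiguous bases (A, C, G, T) of seq.

    Assumption: ambiguous symbols such as N are left out of the denominator;
    tools differ here, so results can deviate slightly from published values.
    """
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


print(gc_content("ATGCGCNNAT"))  # 50.0 (the two N positions are ignored)
```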
Table 1
Gene annotation information of C. rotundus mitogenome
Group of genes | Gene function | mt1 | mt2 | mt3 | mt4 |
Core genes | ATP synthase | atp6 | atp1, atp8, atp9 | atp4, atp8 | |
Core genes | NADH dehydrogenase | nad1 (x2), nad2, nad4, nad5 (x2), nad6, nad9 | nad1, nad2, nad3, nad7 | nad1, nad2, nad4L, nad5 | |
Core genes | Ubiquinol cytochrome c reductase | cob | | | |
Core genes | Cytochrome c biogenesis | ccmFC, ccmFN | ccmC | ccmB, ccmC | |
Core genes | Maturase | matR | | | |
Core genes | Transport membrane protein | mttB | | | |
Core genes | Cytochrome c oxidase | cox2, cox3 | cox1 | | |
Variable genes | Ribosomal proteins (large subunit) | rpl5 | | rpl16 | |
Variable genes | Ribosomal proteins (small subunit) | rps1, rps4, rps7, rps13 | rps4, rps12 | rps3 | |
rRNA genes | Ribosomal RNA | rrn18 (x2), rrn26 | rrn5 (x3), rrn26 | rrn26 | rrn18 |
tRNA genes | Transfer RNA | trnE-TTC, trnK-TTT, trnM-CAT (x3), trnR-GCG, trnS-GCT, trnS-TGA, trnY-GTA | trnC-GCA, trnD-GTC (x2), trnE-TTC, trnH-GTG, trnM-CAT, trnN-GTT, trnQ-TTG, trnS-GGA, trnW-CCA | trnC-GCA, trnD-GTC, trnF-GAA, trnM-CAT, trnP-TGG, trnQ-TTG, trnS-GCT | |
Note: Numbers in parentheses indicate gene copy numbers.
The C. rotundus cpgenome was 186,119 bp long and consisted of a large single-copy (LSC, 100,961 bp) region, a small single-copy (SSC, 10,315 bp) region, and two inverted repeat regions (IRs, 74,843 bp in total) (Fig. S1). Its overall GC content was 33.19% (Table S1), and the GC contents of the LSC, SSC, and IR regions were 29.85%, 18.6%, and 46.5%, respectively. The cpgenome was annotated with 121 genes, including 69 PCGs, 8 rRNA genes, and 44 tRNA genes (Table S2).
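As a quick consistency check on the region sizes, the reported lengths add up to the full cpgenome only when 74,843 bp is read as the combined length of both IR copies, which implies roughly 37.4 kb per copy:

$$
L_{\mathrm{cp}} = L_{\mathrm{LSC}} + L_{\mathrm{SSC}} + L_{\mathrm{IRs}} = 100{,}961 + 10{,}315 + 74{,}843 = 186{,}119~\text{bp}
$$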
Repetitive sequences of C. rotundus organellar genomes
Microsatellites, also known as simple sequence repeats (SSRs), are tandem repeats of short motifs (usually 1–6 nucleotides) with lengths of 50–100 bp and are common in eukaryotic genomes [28]. The mitogenome and cpgenome of C. rotundus contained 350 and 88 SSRs, respectively (Fig. 2A–B and Tables S3 and S4). In the mitogenome, mononucleotide and dinucleotide SSRs together accounted for 55.14% of all SSRs. Of the 97 mononucleotide SSRs, A and T repeats accounted for 44.33% and 55.67%, respectively, and AT was the most frequent dinucleotide motif, accounting for 55.21% of all dinucleotide SSRs. Tetranucleotide SSRs were the most common single class, accounting for 30.57% of all SSRs. In contrast, mononucleotide SSRs were the most abundant class in the cpgenome, accounting for 64.77% of all SSRs; of these 57 SSRs, A and T repeats accounted for 45.61% and 54.39%, respectively. These SSRs could serve as candidate molecular markers for identifying C. rotundus.
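SSR detection of this kind scans a sequence for perfect runs of a short motif that reach a minimum copy number for each motif length. The Python sketch below illustrates the idea; the thresholds (10, 5, 4, 3, 3, 3 copies for 1–6 nt motifs) and all function names are assumptions for illustration, not the settings or tool behind Tables S3 and S4.

```python
from collections import Counter

# Assumed minimum copy numbers for motifs of 1-6 nt (illustrative only).
MIN_COPIES = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    return motif not in (motif + motif)[1:-1]


def count_copies(seq, start, motif):
    """Number of consecutive exact copies of motif beginning at start."""
    copies = 0
    while seq.startswith(motif, start + copies * len(motif)):
        copies += 1
    return copies


def ssr_at(seq, i, min_copies):
    """Return (motif, copies) for the shortest qualifying SSR at i, else None."""
    for k in sorted(min_copies):
        motif = seq[i:i + k]
        if len(motif) < k:
            return None
        if is_primitive(motif):
            copies = count_copies(seq, i, motif)
            if copies >= min_copies[k]:
                return motif, copies
    return None


def find_ssrs(seq, min_copies=MIN_COPIES):
    """Scan left to right and report (start, motif, copies) for perfect SSRs."""
    seq = seq.upper()
    hits, i = [], 0
    while i < len(seq):
        hit = ssr_at(seq, i, min_copies)
        if hit:
            motif, copies = hit
            hits.append((i, motif, copies))
            i += len(motif) * copies  # jump past the whole repeat
        else:
            i += 1
    return hits


def motif_length_summary(hits):
    """Percentage of SSRs by motif length (1 = mononucleotide, 2 = di-, ...)."""
    counts = Counter(len(motif) for _, motif, _ in hits)
    total = sum(counts.values())
    return {k: 100 * n / total for k, n in sorted(counts.items())}


hits = find_ssrs("GG" + "A" * 12 + "C" + "AT" * 6 + "G")
print(hits)  # [(2, 'A', 12), (15, 'AT', 6)]
```

Compound and imperfect SSRs are not handled; dedicated tools also merge nearby repeats into compound SSRs.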
Tandem repeats consist of multiple adjacent copies of a repeat unit of ≥ 7 nucleotides [29]. The mitogenome contained 144 tandem repeats, with lengths ranging from 2 to 70 bp (Fig. 2A, Fig. 2C, and Table S5), and the cpgenome contained 123 tandem repeats, with lengths ranging from 2 to 73 bp (Fig. 2A, Fig. 2C, and Table S6). Their potential as molecular markers will be evaluated in future studies.
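The definition above (adjacent copies of a unit of at least 7 nt) can be illustrated with a naive exact-match scan. This is only a sketch under that assumption: dedicated tools such as Tandem Repeats Finder also tolerate mismatches and indels, and the parameter names and defaults here are hypothetical.

```python
def is_primitive(unit):
    """True if unit is not a repeat of a shorter unit (e.g. 'ACGACG')."""
    return unit not in (unit + unit)[1:-1]


def find_tandem_repeats(seq, min_unit=7, max_unit=100, min_copies=2):
    """Report (start, unit_length, copies) for exact adjacent copies of a unit.

    Naive sketch: exact matches only, one pass per unit length, so a repeat
    may be reported again under a different unit length or phase.
    """
    seq = seq.upper()
    hits = []
    for k in range(min_unit, max_unit + 1):
        i = 0
        while i + 2 * k <= len(seq):
            unit = seq[i:i + k]
            copies = 1
            # Slices past the end are shorter than k, so the loop stops there.
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            if copies >= min_copies and is_primitive(unit):
                hits.append((i, k, copies))
                i += copies * k  # skip past the repeat
            else:
                i += 1
    return hits


print(find_tandem_repeats("GGG" + "ACGTTGA" * 3 + "CCC"))
```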
Interspersed repeats play important roles in genetic diversity and plant genome evolution [30]. In total, 686 interspersed repeats were identified in the mitogenome (Table S7), comprising 312 palindromic, 371 forward, 1 reverse, and 2 complementary repeats (Fig. 2A and Fig. 2C). In the cpgenome, 1,210 interspersed repeats were identified (Table S8), comprising 394 palindromic, 797 forward, 10 reverse, and 9 complementary repeats (Fig. 2A and Fig. 2C). Forward repeats were the most abundant type in both genomes, accounting for 54.08% of the interspersed repeats in the mitogenome and 65.87% in the cpgenome.
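The four repeat types differ only in how the second copy relates to the first. The sketch below assumes the REPuter-style definitions commonly used in such analyses (forward = direct copy, reverse = reversed, complement = base-complemented, palindromic = reverse complement) and recomputes the forward-repeat proportions from the counts reported above; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def classify_pair(first, second):
    """Classify two equal-length repeat copies (assumed REPuter-style types)."""
    first, second = first.upper(), second.upper()
    complement = first.translate(COMPLEMENT)
    if second == first:
        return "forward"
    if second == complement[::-1]:
        return "palindromic"  # reverse complement
    if second == first[::-1]:
        return "reverse"
    if second == complement:
        return "complement"
    return None


def proportions(counts):
    """Percentage of each repeat type, rounded to two decimals."""
    total = sum(counts.values())
    return {kind: round(100 * n / total, 2) for kind, n in counts.items()}


# Counts as reported in the text above.
mito = {"palindromic": 312, "forward": 371, "reverse": 1, "complement": 2}
cp = {"palindromic": 394, "forward": 797, "reverse": 10, "complement": 9}
print(proportions(mito)["forward"], proportions(cp)["forward"])  # 54.08 65.87
```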
Homologous fragment analysis of C. rotundus organellar genomes
Eleven homologous fragments were shared between the annotated mitogenome and cpgenome of C. rotundus (Fig. 3 and Table S9). These fragments totaled 3,901 bp, accounting for 0.26% of the mitogenome and 2.10% of the cpgenome, and ranged from 60 to 741 bp in length. Six fragments had migrated from the cp DNA to mt1, containing one complete gene (trnM-CAT) and four partial genes (rrn16, rrn18, rrn26, and petA). Two fragments had migrated to mt2, containing four partial genes (rrn5, rrn16, rps15, and ndhH); two to mt3, containing one complete gene (trnM-CAT) and one partial gene (rpl16); and one to mt4, containing two partial genes (rrn16 and rrn18). Altogether, the fragments contained one complete gene (trnM-CAT) and eight partial genes (rrn16, rrn18, rrn5, rrn26, rps15, rpl16, petA, and ndhH). The eight partial genes were truncated to varying degrees, suggesting that these transferred copies may be non-functional.
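The percentages above follow from the molecule lengths given earlier in this section: the mitogenome size is the sum of its four molecules. A short worked check in Python (variable names are illustrative):

```python
# Lengths (bp) and counts as reported in this section.
mt_lengths = {"mt1": 706_916, "mt2": 399_727, "mt3": 332_552, "mt4": 52_163}
cp_length = 186_119
shared_bp = 3_901
fragments_per_molecule = {"mt1": 6, "mt2": 2, "mt3": 2, "mt4": 1}

mt_total = sum(mt_lengths.values())
print(mt_total)                                  # 1491358
print(round(100 * shared_bp / mt_total, 2))      # 0.26 (% of the mitogenome)
print(round(100 * shared_bp / cp_length, 2))     # 2.1 (% of the cpgenome)
print(sum(fragments_per_molecule.values()))      # 11 fragments
```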
Codon usage in the C. rotundus organellar genomes was analyzed using a custom Perl script. In the mitogenome, the PCGs of mt1, mt2, mt3, and mt4 contained 13,492, 7,560, 6,352, and 1,004 codons, respectively (Table 2). The overall GC content (GC all) and the GC contents at the first, second, and third codon positions (GC1, GC2, and GC3) were all below 41.50%. The effective number of codons (ENC) ranged from 56.237 to 56.439, indicating weak codon usage bias in the mitogenome. Each of the four chromosomes had 30 codons with a relative synonymous codon usage (RSCU) value > 1 (Fig. 4), indicating that these codons were used more often than their synonymous alternatives. The C. rotundus cpgenome contained 3,761 codons, with GC all, GC1, GC2, and GC3 all below 33.70% (Table 2), and its ENC of 52.86 likewise indicated weak codon usage bias. In the cpgenome, 29, 2, and 33 codons had RSCU values > 1, = 1, and < 1, respectively (Fig. 4). As in the mitogenome, AGA had the highest RSCU value (> 2.10), making it the most strongly preferred codon in the C. rotundus organellar genomes.
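RSCU compares each codon's count with the count expected if all codons for the same amino acid were used equally: RSCU = observed count / (family total / family size). A minimal Python sketch, assuming the standard genetic code and in-frame CDS input; it illustrates the formula and is not the authors' Perl script.

```python
from collections import Counter

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, ..., GGG ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def count_codons(cds_list):
    """Count in-frame codons across coding sequences, skipping ambiguous ones."""
    counts = Counter()
    for cds in cds_list:
        seq = cds.upper()
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1
    return counts


def rscu(counts):
    """RSCU = observed count / mean count over the synonymous codon family.

    Stop codons are treated as one family; single-codon amino acids
    (Met, Trp) always get RSCU = 1 when present.
    """
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU undefined
        expected = total / len(codons)
        for c in codons:
            values[c] = counts.get(c, 0) / expected
    return values


values = rscu(count_codons(["ATGAGAAGAAGACGTTAA"]))
print(round(values["AGA"], 2), values["ATG"])  # 4.5 1.0
```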
Table 2
Overall features of codon usage in the C. rotundus organellar genomes
Genome | Codon number | GC1 (%) | GC2 (%) | GC3 (%) | GC all (%) | ENC |
mt1 | 13,492 | 40.49 | 40.91 | 40.60 | 40.67 | 56.260 |
mt2 | 7,560 | 40.66 | 40.44 | 40.54 | 40.55 | 56.324 |
mt3 | 6,352 | 40.31 | 40.99 | 40.81 | 40.70 | 56.237 |
mt4 | 1,004 | 41.41 | 40.40 | 41.23 | 41.01 | 56.439 |
cp | 3,761 | 33.62 | 32.54 | 33.43 | 33.19 | 52.860 |
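The positional GC values and ENC in Table 2 can be sketched as follows. ENC here follows Wright's (1990) formulation, ENC = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, where each F is the mean homozygosity F = (n·Σp² − 1)/(n − 1) over amino acids with 2-, 3-, 4-, or 6-fold degenerate codons; values range from 20 (one codon per amino acid) to 61 (no bias). This is an illustrative Python sketch assuming the standard genetic code, not the authors' Perl script.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Number of amino acids in each degeneracy class of the standard code.
CLASS_SIZES = {2: 9, 3: 1, 4: 5, 6: 3}


def gc_by_position(cds_list):
    """Return (GC1, GC2, GC3, GC all) in percent over in-frame codons."""
    gc, total = [0, 0, 0], 0
    for cds in cds_list:
        seq = cds.upper()
        for i in range(0, len(seq) - 2, 3):
            total += 1
            for pos in range(3):
                if seq[i + pos] in "GC":
                    gc[pos] += 1
    gc1, gc2, gc3 = (100 * g / total for g in gc)
    return gc1, gc2, gc3, (gc1 + gc2 + gc3) / 3


def enc(counts):
    """Wright's effective number of codons from a codon -> count mapping."""
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)
    f_values = {}
    for codons in families.values():
        k = len(codons)
        n = sum(counts.get(c, 0) for c in codons)
        if k == 1 or n < 2:
            continue  # Met/Trp carry no information; F needs n >= 2
        sum_p2 = sum((counts.get(c, 0) / n) ** 2 for c in codons)
        f_values.setdefault(k, []).append((n * sum_p2 - 1) / (n - 1))
    mean_f = {k: sum(v) / len(v) for k, v in f_values.items()}
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2  # Wright's fix when Ile is absent
    if any(mean_f.get(k, 0) <= 0 for k in CLASS_SIZES):
        raise ValueError("too few codons to estimate ENC")
    value = 2 + sum(size / mean_f[k] for k, size in CLASS_SIZES.items())
    return min(value, 61.0)


uniform = {c: 100 for c, aa in CODON_TABLE.items() if aa != "*"}
print(enc(uniform))  # 61.0 (no bias; the estimate is capped at 61)
```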
RNA editing, the insertion, deletion, or conversion of nucleotides in the coding regions of transcribed RNA, is common in organellar genomes [31]. In this study, RNA editing events in the C. rotundus organellar genomes were examined. Nine PCGs in the mitogenome (Fig. 5A) and nine in the cpgenome (Fig. 5B) underwent RNA editing, with 13 and 23 editing events in total, respectively (Tables S10 and S11). Among the mitochondrial PCGs, nad1 had the most editing sites (3), followed by atp1 and ccmB (2 each); among the chloroplast PCGs, ndhB had the most (7), followed by rpoC2 (5) and ndhC (3). The mitogenome contained five types of RNA editing and the cpgenome six (Fig. 5C), with C-to-T editing predominating in both. Editing efficiency exceeded 80% at 7 of the 13 mitochondrial editing sites (53.85%) and at 17 of the 23 chloroplast editing sites (73.91%) (Fig. 5D). Of the mitochondrial editing events, 15.38% kept a hydrophilic amino acid hydrophilic, 61.54% kept a hydrophobic amino acid hydrophobic, and 23.08% converted a hydrophilic amino acid into a hydrophobic one. Of the chloroplast editing events, 26.09% kept a hydrophobic amino acid hydrophobic, 13.04% kept a hydrophilic amino acid hydrophilic, 4.35% converted a hydrophobic amino acid into a hydrophilic one, and 56.52% converted a hydrophilic amino acid into a hydrophobic one.
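Because C-to-T editing dominates in both genomes, the amino acid consequence of a site can be read off by translating the codon before and after the edit; for example, a C-to-T change at the second position of TCA turns Ser into Leu. The sketch below assumes the standard genetic code, and its hydrophobic set is a hypothetical grouping for illustration only, since published groupings vary.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Hypothetical hydrophobic set (assumption); other groupings are also used.
HYDROPHOBIC = set("AVLIMFWP")


def polarity(aa):
    """Label an amino acid (or stop) under the assumed grouping."""
    if aa == "*":
        return "stop"
    return "hydrophobic" if aa in HYDROPHOBIC else "hydrophilic"


def c_to_t_effect(codon, position):
    """Translate a codon before and after a C-to-T (C-to-U in RNA) edit.

    position is 0-based within the codon (0 = first codon position).
    """
    codon = codon.upper()
    if codon[position] != "C":
        raise ValueError("the edited position must hold a C")
    edited = codon[:position] + "T" + codon[position + 1:]
    before, after = CODE[codon], CODE[edited]
    return before, after, f"{polarity(before)} -> {polarity(after)}"


print(c_to_t_effect("TCA", 1))  # ('S', 'L', 'hydrophilic -> hydrophobic')
```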
To determine the phylogenetic position of C. rotundus, phylogenetic trees were constructed from 17 mitochondrial and 64 chloroplast PCGs shared by 10 related plant species, with Arabidopsis thaliana and Brassica napus as outgroups. The 17 mitochondrial genes shared by all 10 species were nad1, nad3, nad4, nad4L, nad5, nad6, nad7, cob, cox1, cox3, atp1, atp6, atp9, rps3, rps4, rps12, and ccmC. The 64 chloroplast genes were the C. rotundus chloroplast PCGs excluding six genes (atpE, psaJ, rpl22, rps16, ycf3, and ycf4) that were missing from some of the 10 cpgenomes. The trees based on mitochondrial and chloroplast genes had identical topologies, and both showed that C. rotundus was most closely related to C. esculentus (Fig. 6).
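A common way to build a tree from many PCGs is to keep only the genes present in every taxon and concatenate their per-gene alignments into a supermatrix. This section does not state how the matrix was assembled, so the Python sketch below is an illustration under that assumption; the species names, sequences, and function names are hypothetical, and per-gene alignments are assumed to exist already.

```python
def shared_genes(alignments):
    """Genes present in every species; alignments maps species -> {gene: seq}."""
    gene_sets = [set(genes) for genes in alignments.values()]
    return sorted(set.intersection(*gene_sets))


def concatenate(alignments, genes):
    """Join per-gene alignments in a fixed gene order (a supermatrix)."""
    for gene in genes:
        lengths = {len(alignments[sp][gene]) for sp in alignments}
        if len(lengths) != 1:
            raise ValueError(f"{gene} is not aligned to a common length")
    return {sp: "".join(alignments[sp][g] for g in genes) for sp in alignments}


# Hypothetical toy alignments: ycf4 is missing from species_B, so it is dropped.
toy = {
    "species_A": {"nad1": "ATG-CC", "cox1": "ATGAAA", "ycf4": "ATG"},
    "species_B": {"nad1": "ATGACC", "cox1": "ATGAAG"},
}
genes = shared_genes(toy)          # ['cox1', 'nad1']
matrix = concatenate(toy, genes)   # species_A -> 'ATGAAAATG-CC'
```

The resulting matrix would then be passed to a tree-building program.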
Nonsynonymous (Ka) and synonymous (Ks) substitution rates were calculated for 17 PCGs of the C. rotundus mitogenome. As shown in Fig. 7A, nad6 had a Ka/Ks value > 1, suggesting that it has undergone positive selection, whereas the other genes had Ka/Ks values < 1, indicating purifying (negative) selection. In particular, atp9, cox1, nad4L, and rps12 had low Ka/Ks values with the smallest variation, suggesting that they are highly conserved and functionally important in the mitogenome. Figure 7B shows the pairwise Ka/Ks values among the ten species; apart from C. rotundus nad6, Ka/Ks values differed little among these species.
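This section does not say which method or software was used to estimate Ka and Ks. As an illustration only, the Python sketch below implements the classic Nei–Gojobori (1986) counting method with a Jukes–Cantor correction for two aligned, gap-free coding sequences, assuming the standard genetic code; function names are hypothetical.

```python
import itertools
import math

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def sites(codon):
    """(synonymous, nonsynonymous) sites of one codon.

    Changes to stop codons count as nonsynonymous here; implementations differ.
    """
    syn = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    syn += 1 / 3
    return syn, 3 - syn


def differences(c1, c2):
    """Synonymous/nonsynonymous differences averaged over mutational paths."""
    positions = [p for p in range(3) if c1[p] != c2[p]]
    paths = []
    for order in itertools.permutations(positions):
        cur, sd, nd, via_stop = c1, 0, 0, False
        for p in order:
            nxt = cur[:p] + c2[p] + cur[p + 1:]
            if CODE[nxt] == "*":
                via_stop = True
            if CODE[nxt] == CODE[cur]:
                sd += 1
            else:
                nd += 1
            cur = nxt
        paths.append((sd, nd, via_stop))
    usable = [p for p in paths if not p[2]] or paths  # drop paths through stops
    return (sum(p[0] for p in usable) / len(usable),
            sum(p[1] for p in usable) / len(usable))


def jukes_cantor(p):
    """Jukes-Cantor correction; infinite when p >= 0.75 (saturation)."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(seq1, seq2):
    """Return (Ka, Ks) for two aligned, gap-free coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must have equal length divisible by 3")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue  # skip ambiguous and stop codons
        s1, n1 = sites(c1)
        s2, n2 = sites(c2)
        S += (s1 + s2) / 2
        N += (n1 + n2) / 2
        sd, nd = differences(c1, c2)
        Sd += sd
        Nd += nd
    if S == 0 or N == 0:
        raise ValueError("no usable synonymous or nonsynonymous sites")
    return jukes_cantor(Nd / N), jukes_cantor(Sd / S)


ka, ks = ka_ks("CTTCTTCTTCTT", "CTTCTTCTTCTC")  # one synonymous change
```

Ka/Ks is then ka / ks, which is undefined when ks is 0. For real analyses, established implementations such as KaKs_Calculator or the yn00 program in PAML are preferable.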