Mitogenome organization and composition
The filtered sequencing dataset for the mt genome of Contracaecum sp. comprised nearly 2 Gb, with a total of 8,677,194 × 2 clean reads used for assembly. The assembled circular mt genome of Contracaecum sp. (GenBank accession: WM056322) was 14,082 bp in size, shorter than the sequence published by Zhang et al. (2021), and contained 12 PCGs, 22 tRNAs, 2 rRNAs, and two non-coding regions (NCRs) (Table 1, Fig. 2). All 36 genes were transcribed in the forward direction, and the gene arrangement corresponded to the GA3 pattern typical of roundworms (Liu et al. 2013a). Consistent with previous reports, the genome showed a strong A + T bias (72.2%), with T being the most frequent base (48.7%) (Table 3). Ten intergenic regions, ranging from 1 bp to 16 bp, were found across the complete mt genome (Table 1). A short non-coding region (122 bp) was located between nad4 and cox1, and a long non-coding region (691 bp) between tRNA-Ser2 and tRNA-Asn. AT-skew values were negative, from −0.475 (nad6) to −0.111 (NCR), whereas GC-skew values were positive, from 0.226 (nad4) to 0.674 (nad3), indicating that T was used more frequently than A and G more frequently than C.
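The skew values above follow the standard definitions AT-skew = (A − T)/(A + T) and GC-skew = (G − C)/(G + C). The Python sketch below computes base composition and both skews for a single sequence. The function name is hypothetical, ignoring non-ACGT characters is an assumption, and this is not presented as the software used in this study.

```python
from collections import Counter


def composition_and_skews(seq: str) -> dict:
    """Return base composition (%), A+T content (%), AT-skew and GC-skew.

    AT-skew = (A - T) / (A + T); GC-skew = (G - C) / (G + C).
    Characters other than A, C, G and T (gaps, N, ambiguity codes) are ignored.
    """
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    a, t, g, c = (counts[base] for base in "ATGC")
    return {
        "A%": 100 * a / total,
        "G%": 100 * g / total,
        "T%": 100 * t / total,
        "C%": 100 * c / total,
        "A+T%": 100 * (a + t) / total,
        # Negative AT-skew: T outnumbers A. Positive GC-skew: G outnumbers C.
        "AT_skew": (a - t) / (a + t) if a + t else 0.0,
        "GC_skew": (g - c) / (g + c) if g + c else 0.0,
    }


print(composition_and_skews("ATTTGGGC"))
```

As a quick consistency check, the rounded genome-wide percentages in Table 3 (A = 23.5%, T = 48.7%) give (23.5 − 48.7)/(23.5 + 48.7) ≈ −0.349, which matches the tabulated −0.350 to within rounding.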
Table 1
The organization of the complete mt genome of Contracaecum sp. from Beijing, China.
Gene/Region | Strand | Positions | Size (bp) | Number of aa (a) | Ini/Ter codons | Anticodon | In
tRNA-Asn (N) | H | 1-60 | 60 | | | GTT | 0
tRNA-Tyr (Y) | H | 61-116 | 56 | | | GTA | 0
nad1 | H | 117-989 | 873 | 290 | TTG/TAG | | 0
atp6 | H | 993-1591 | 599 | 199 | ATT/TA | | +3
tRNA-Lys (K) | H | 1592-1653 | 62 | | | TTT | 0
tRNA-Leu2 (L2) | H | 1654-1708 | 55 | | | TAA | 0
tRNA-Ser1 (S1) | H | 1709-1759 | 51 | | | TCT | 0
nad2 | H | 1760-2605 | 846 | 281 | TTG/TAA | | 0
tRNA-Ile (I) | H | 2619-2678 | 60 | | | GAT | +13
tRNA-Arg (R) | H | 2679-2732 | 54 | | | GCG | 0
tRNA-Gln (Q) | H | 2733-2787 | 55 | | | TTG | 0
tRNA-Phe (F) | H | 2788-2846 | 59 | | | GAA | 0
cytb | H | 2847-3953 | 1107 | 368 | TTG/TAA | | 0
tRNA-Leu1 (L1) | H | 3961-4017 | 57 | | | TAG | +7
cox3 | H | 4018-4782 | 766 | 255 | TTG/T | | 0
tRNA-Thr (T) | H | 4783-4843 | 60 | | | TGT | 0
nad4 | H | 4844-6073 | 1230 | 409 | TTG/TAA | | 0
Intergenic region | H | 6074-6195 | 122 | | | | 0
cox1 | H | 6196-7771 | 1576 | 525 | TTG/T | | 0
tRNA-Cys (C) | H | 7772-7829 | 58 | | | GCA | 0
tRNA-Met (M) | H | 7831-7890 | 60 | | | CAT | +1
tRNA-Asp (D) | H | 7907-7963 | 57 | | | GTC | +16
tRNA-Gly (G) | H | 7965-8021 | 57 | | | TCC | +1
cox2 | H | 8022-8713 | 692 | 230 | TTG/TA | | 0
tRNA-His (H) | H | 8714-8778 | 65 | | | GTG | 0
rrnL | H | 8779-9737 | 959 | | | | 0
nad3 | H | 9738-10073 | 336 | 111 | TTG/TAG | | 0
nad5 | H | 10077-11659 | 1583 | 527 | ATT/TA | | +3
tRNA-Ala (A) | H | 11660-11716 | 57 | | | TGC | 0
tRNA-Pro (P) | H | 11724-11780 | 57 | | | TGG | +7
tRNA-Val (V) | H | 11781-11837 | 57 | | | TAC | 0
nad6 | H | 11838-12272 | 435 | 144 | TTG/TAA | | 0
nad4L | H | 12275-12505 | 231 | 76 | ATT/TAA | | +2
tRNA-Trp (W) | H | 12506-12563 | 58 | | | TCA | 0
tRNA-Glu (E) | H | 12565-12624 | 60 | | | TTC | +1
rrnS | H | 12625-13335 | 711 | | | | 0
tRNA-Ser2 (S2) | H | 13336-13391 | 56 | | | TGA | 0
Non-coding region | H | 13392-14082 | 691 | | | | 0
In: intergenic nucleotides (number of nucleotides between the preceding gene and this one).
(a) Inferred length of the amino acid (aa) sequence of the 12 protein-coding genes; Ini/Ter codons: initiation and termination codons.
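The "In" column of Table 1 follows directly from the coordinates: with 1-based, inclusive positions, a gene spans end − start + 1 bp, and the spacer before it is start − (previous end) − 1 nucleotides. The sketch below applies these two rules to a few rows of Table 1; the names `TABLE1_SUBSET`, `gene_size`, and `size_and_spacer` are hypothetical and chosen for illustration only.

```python
# Illustrative subset of Table 1: (gene, start, end), 1-based inclusive coordinates.
TABLE1_SUBSET = [
    ("tRNA-Tyr", 61, 116),
    ("nad1", 117, 989),
    ("atp6", 993, 1591),
    ("tRNA-Lys", 1592, 1653),
]


def gene_size(start: int, end: int) -> int:
    """Length of a feature with 1-based, inclusive coordinates."""
    return end - start + 1


def size_and_spacer(rows):
    """Return (gene, size, nucleotides between the previous gene and this one).

    A negative spacer would indicate overlapping genes. The first row has no
    predecessor in the list, so its spacer is reported as 0.
    """
    results = []
    prev_end = None
    for name, start, end in rows:
        spacer = 0 if prev_end is None else start - prev_end - 1
        results.append((name, gene_size(start, end), spacer))
        prev_end = end
    return results


for row in size_and_spacer(TABLE1_SUBSET):
    print(row)
```

For example, nad2 ends at position 2605 and tRNA-Ile starts at 2619, giving 2619 − 2605 − 1 = 13 intergenic nucleotides, the +13 listed in Table 1.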
Protein-coding genes
TTG was the most common start codon in this study, followed by ATT. TTG was used as the start codon by nine genes (cox1-3, cytb, nad1-4, and nad6), and ATT by the remaining three PCGs (atp6, nad4L, and nad5) (Table 1). TAA and TAG are the common stop codons in metazoan mt genomes (Hu et al. 2004). Here, TAA was the most frequent stop codon, used by nad2, nad4, nad4L, nad6, and cytb, while nad1 and nad3 used TAG. The remaining genes ended with incomplete stop codons, either T (cox1 and cox3) or TA (atp6, cox2, and nad5).
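Because incomplete stop codons occupy only one or two nucleotides, the amino acid counts in Table 1 can be reproduced from each gene's size and stop codon, assuming (as the tabulated values indicate) that the listed size includes the complete or partial stop codon: nad1 gives (873 − 3)/3 = 290, atp6 gives (599 − 2)/3 = 199, and cox1 gives (1576 − 1)/3 = 525. A minimal Python sketch of this arithmetic, with a hypothetical function name:

```python
def inferred_aa_length(gene_len_bp: int, stop_codon: str) -> int:
    """Number of amino acids encoded by a protein-coding gene.

    Assumes gene_len_bp includes the stop codon, whether complete (TAA, TAG)
    or incomplete (TA, T); the stop itself is not translated.
    """
    if stop_codon not in {"TAA", "TAG", "TA", "T"}:
        raise ValueError(f"unexpected stop codon: {stop_codon!r}")
    coding_bp = gene_len_bp - len(stop_codon)
    if coding_bp % 3 != 0:
        raise ValueError("length is not consistent with an intact reading frame")
    return coding_bp // 3


# Gene sizes and stop codons taken from Table 1.
for gene, length, stop in [("nad1", 873, "TAG"), ("atp6", 599, "TA"), ("cox1", 1576, "T")]:
    print(gene, inferred_aa_length(length, stop))
```

Rejecting lengths that leave a partial codon is a deliberate design choice: it flags annotation errors rather than silently truncating.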
The 12 PCGs comprised 3,422 codons, corresponding to 3,415 amino acids plus seven complete stop codons (Table 2). TTT (Phe) was the most frequently used codon (480), followed by GTT (Val, 219), TTG (Leu, 216), and ATT (Ile, 214) (Table 2). Leu (519) and Phe (499) were the most frequently encoded amino acids, while Arg (34) was the rarest. Relative synonymous codon usage (RSCU) values showed that, among synonymous codons for the same amino acid, codons rich in T and G tended to be preferred (Table 2). The A + T content of the 12 PCGs ranged from 66.7% (cox1) to 78.9% (nad6) (Table 3). T and G were clearly favored: all AT-skew values were negative, ranging from −0.475 (nad6) to −0.373 (cox2), and all GC-skew values were positive, ranging from 0.226 (nad4) to 0.674 (nad3). The strongest T bias was found in nad6 (−0.475), followed by nad3 (−0.464) and nad2 (−0.451); likewise, the strongest G bias was found in nad3 (0.674), followed by nad4L (0.569) and atp6 (0.526).
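RSCU is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally, i.e. count / (family total / number of codons in the family). The Python sketch below applies this to a few Table 2 entries; the function name is hypothetical. Note that under the invertebrate mitochondrial code (NCBI translation table 5), ATA encodes Met and TGA encodes Trp, which is why these appear as two-codon families in Table 2.

```python
from collections import defaultdict


def rscu(codon_counts: dict, codon_to_aa: dict) -> dict:
    """Relative synonymous codon usage for every codon in codon_to_aa.

    RSCU = observed count / (family total / number of codons in the family).
    codon_to_aa must list every codon of each family, including unobserved ones,
    otherwise the family size (and therefore RSCU) is wrong.
    """
    family_total = defaultdict(int)
    family_size = defaultdict(int)
    for codon, aa in codon_to_aa.items():
        family_total[aa] += codon_counts.get(codon, 0)
        family_size[aa] += 1
    values = {}
    for codon, aa in codon_to_aa.items():
        expected = family_total[aa] / family_size[aa]
        values[codon] = codon_counts.get(codon, 0) / expected if expected else 0.0
    return values


# Counts from Table 2; ATA = Met and TGA = Trp under translation table 5.
counts = {"TTT": 480, "TTC": 19, "ATA": 76, "ATG": 101, "TGA": 21, "TGG": 53}
code = {"TTT": "Phe", "TTC": "Phe", "ATA": "Met", "ATG": "Met", "TGA": "Trp", "TGG": "Trp"}
print({codon: round(v, 2) for codon, v in rscu(counts, code).items()})
```

Rounded to two decimals, these values match the corresponding RSCU entries in Table 2 (for example, TTT = 1.92 and TTC = 0.08).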
Table 2
Amino acids frequency of Contracaecum sp. mitochondrial protein-coding genes.
Amino acid | Codon | Number | RSCU | Amino acid | Codon | Number | RSCU
Phe | TTT | 480 | 1.92 | Tyr | TAT | 154 | 1.84
Phe | TTC | 19 | 0.08 | Tyr | TAC | 13 | 0.16
Leu | TTA | 199 | 2.3 | Stop | TAA | 5 | 1.43
Leu | TTG | 216 | 2.5 | Stop | TAG | 2 | 0.57
Leu | CTT | 76 | 0.88 | His | CAT | 54 | 1.86
Leu | CTC | 2 | 0.02 | His | CAC | 4 | 0.14
Leu | CTA | 10 | 0.12 | Gln | CAA | 20 | 0.98
Leu | CTG | 16 | 0.18 | Gln | CAG | 21 | 1.02
Ile | ATT | 214 | 1.92 | Asn | AAT | 100 | 1.79
Ile | ATC | 9 | 0.08 | Asn | AAC | 12 | 0.21
Met | ATA | 76 | 0.86 | Lys | AAA | 35 | 0.71
Met | ATG | 101 | 1.14 | Lys | AAG | 63 | 1.29
Val | GTT | 219 | 2.61 | Asp | GAT | 62 | 1.65
Val | GTC | 13 | 0.16 | Asp | GAC | 13 | 0.35
Val | GTA | 49 | 0.59 | Glu | GAA | 32 | 0.84
Val | GTG | 54 | 0.64 | Glu | GAG | 44 | 1.16
Ser | TCT | 139 | 3.08 | Cys | TGT | 53 | 1.96
Ser | TCC | 6 | 0.13 | Cys | TGC | 1 | 0.04
Ser | TCA | 14 | 0.31 | Trp | TGA | 21 | 0.57
Ser | TCG | 5 | 0.11 | Trp | TGG | 53 | 1.43
Pro | CCT | 66 | 3.11 | Arg | CGT | 33 | 3.88
Pro | CCC | 7 | 0.33 | Arg | CGC | 1 | 0.12
Pro | CCA | 9 | 0.42 | Arg | CGA | 0 | 0
Pro | CCG | 3 | 0.14 | Arg | CGG | 0 | 0
Thr | ACT | 89 | 3.24 | Ser | AGT | 121 | 2.68
Thr | ACC | 6 | 0.22 | Ser | AGC | 2 | 0.04
Thr | ACA | 9 | 0.33 | Ser | AGA | 36 | 0.8
Thr | ACG | 6 | 0.22 | Ser | AGG | 38 | 0.84
Ala | GCT | 72 | 2.5 | Gly | GGT | 112 | 2.22
Ala | GCC | 24 | 0.83 | Gly | GGC | 21 | 0.42
Ala | GCA | 11 | 0.38 | Gly | GGA | 23 | 0.46
Ala | GCG | 8 | 0.28 | Gly | GGG | 46 | 0.91
Counts exclude incomplete stop codons (TA and T).
Stop = stop codon; RSCU = relative synonymous codon usage.
Table 3
Nucleotide composition and skews of Contracaecum sp. mitochondrial genome.
Gene | A (%) | G (%) | T (%) | C (%) | A + T (%) | AT-skew | GC-skew
atp6 | 22.0 | 22.0 | 49.1 | 6.9 | 71.1 | -0.380 | 0.526
cox1 | 19.5 | 21.8 | 47.2 | 11.5 | 66.7 | -0.416 | 0.307
cox2 | 21.2 | 22.1 | 46.5 | 10.1 | 67.7 | -0.373 | 0.372
cox3 | 18.9 | 20.9 | 49.8 | 10.4 | 68.7 | -0.449 | 0.333
cytb | 19.7 | 22.0 | 47.6 | 10.7 | 67.3 | -0.415 | 0.343
nad1 | 19.5 | 20.5 | 50.5 | 9.5 | 70.0 | -0.444 | 0.364
nad2 | 20.7 | 18.2 | 54.6 | 6.5 | 75.3 | -0.451 | 0.474
nad3 | 20.0 | 21.4 | 54.4 | 4.2 | 74.4 | -0.464 | 0.674
nad4 | 21.4 | 17.0 | 50.9 | 10.7 | 72.3 | -0.408 | 0.226
nad4L | 22.9 | 17.3 | 55.0 | 4.8 | 77.9 | -0.411 | 0.569
nad5 | 21.2 | 18.8 | 51.9 | 8.1 | 73.1 | -0.420 | 0.398
nad6 | 20.7 | 13.3 | 58.2 | 7.8 | 78.9 | -0.475 | 0.261
rrnS | 30.2 | 19.7 | 40.4 | 9.7 | 70.6 | -0.143 | 0.340
rrnL | 27.3 | 17.5 | 48.3 | 6.9 | 75.6 | -0.277 | 0.436
22 tRNAs | 31.5 | 18.7 | 40.8 | 9.0 | 72.3 | -0.129 | 0.352
NCR | 37.4 | 10.3 | 46.7 | 5.6 | 84.1 | -0.111 | 0.290
Total | 23.5 | 19.0 | 48.7 | 8.9 | 72.2 | -0.350 | 0.364
|
Transfer RNA genes, ribosomal RNA genes and non-coding regions
The 22 tRNAs ranged in length from 51 bp (tRNA-Ser1) to 65 bp (tRNA-His). A typical tRNA, one of the most conserved and abundant classes of RNA, folds into a cloverleaf secondary structure consisting of an acceptor stem, a dihydrouridine loop (D-loop), an anticodon loop, a TΨC loop, and the arms that support them (Su et al. 2020). In nematodes, however, most tRNAs differ from those of other metazoans. In this study, 16 of the 20 tRNAs other than tRNA-Ser1 and tRNA-Ser2 lacked a TΨC loop, which was replaced by several nucleotides forming a TV-replacement loop (Hu et al. 2004). tRNA-His, tRNA-Ile, and tRNA-Met had a relatively standard cloverleaf structure with a TΨC loop, although tRNA-Ile and tRNA-Met lacked the DHU stem. As in previous reports, tRNA-Ser1 and tRNA-Ser2 had a TΨC loop but lacked the D-loop (Su et al. 2020), while tRNA-Lys had a TΨC arm with a short TΨC loop.
The positions of the ribosomal RNA genes of Contracaecum sp. conformed to the GA3 pattern. The rrnL gene (959 bp) was located between tRNA-His and nad3, and the rrnS gene (711 bp) between tRNA-Glu and tRNA-Ser2 (Table 1). The A + T contents of rrnL and rrnS were 75.6% and 70.6%, respectively. The mt genome of Contracaecum sp. contained two NCRs: a short region (122 bp) between nad4 and cox1, and a long region (691 bp) between tRNA-Ser2 and tRNA-Asn.
Nucleotide variation in the genus Contracaecum
Nucleotide diversity (Pi) was calculated by sliding-window analysis (window size 300 bp, default step size 25 bp) of the aligned nucleotide sequences of C. osculatum, C. rudolphii, C. ogmorhini, and Contracaecum sp. (Fig. 3). Gene-level Pi values ranged from 0.124 (cox1) to 0.181 (nad2). The most variable genes were nad2 (0.181), nad4 (0.179), cytb (0.178), and nad6 (0.172), and the most conserved were cox1 (0.124) and cox2 (0.130) (Fig. 3). With the least variation, cox1 and cox2 appear to be the most stable protein-coding genes in Contracaecum nematodes and could serve as molecular markers for identifying species within the genus. The results also suggest that nad2 and nad4 could serve as alternative markers for nematodes isolated from different environments.
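Pi is the average number of nucleotide differences per site over all pairs of aligned sequences; the sliding-window variant recomputes it in overlapping windows (here 300 bp wide, moved 25 bp at a time). The Python sketch below illustrates the calculation. It is an illustrative reimplementation, not the tool used in this study; the function names are hypothetical, and dropping alignment columns that contain a gap or N in any sequence (complete deletion) is an assumption, since programs differ in how they treat gaps.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Pi: average number of nucleotide differences per site over all sequence pairs.

    Alignment columns with a gap ('-') or N in any sequence are dropped
    (complete deletion), so every pair is compared over the same sites.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two aligned sequences")
    columns = [col for col in zip(*(s.upper() for s in seqs))
               if not any(base in "-N" for base in col)]
    if not columns:
        return 0.0
    n_pairs = len(seqs) * (len(seqs) - 1) // 2
    # Count pairwise mismatches column by column.
    differences = sum(1 for col in columns for x, y in combinations(col, 2) if x != y)
    return differences / (n_pairs * len(columns))


def sliding_window_pi(seqs, window=300, step=25):
    """Pi in windows of `window` alignment columns, advanced by `step` columns.

    Returns (first_column, last_column, pi) with 1-based column numbers.
    A trailing stretch shorter than `window` is not reported.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (start + 1, start + window,
         nucleotide_diversity([s[start:start + window] for s in seqs]))
        for start in range(0, length - window + 1, step)
    ]


toy = ["AAAA", "AAAT", "AATT"]  # hypothetical toy alignment
print(nucleotide_diversity(toy))
print(sliding_window_pi(toy, window=2, step=1))
```

With real data, `seqs` would hold the four aligned Contracaecum mitogenomes and the defaults reproduce the window and step sizes stated above; gene-level values can be obtained by passing each gene's aligned region to `nucleotide_diversity` directly.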
Phylogenetic analyses
The phylogenetic trees were based on the 12 PCGs of 41 available mt sequences from the superfamilies Ascaridoidea and Heterakoidea (Table S2). The BI and ML trees had similar topologies, except for species within the superfamily Heterakoidea, and both were highly similar to those of previous studies (Liu et al. 2016; Zhang et al. 2021; Zhao et al. 2021). The present sample clustered with the other Contracaecum nematodes with strong support (Fig. 4), confirming its close relationship with members of the genus Contracaecum, but it was clearly distinct from the previously reported species. The tree topologies supported previous reports that the superfamilies Ascaridoidea and Heterakoidea are monophyletic and that the families within Ascaridoidea (including Ascarididae, Anisakidae, Heterocheiidae, Toxocaridae, and Cucullanidae) are all monophyletic (Li et al. 2018; Zhao et al. 2021).
The ML and BI trees had identical topologies within the superfamily Ascaridoidea. Within the family Ascarididae, the genera Ascaris, Baylisascaris, Toxascaris, and Parascaris were more closely related to one another than to Ophidascaris, as reported by Zhou et al. (2021). On the basis of morphology, Ophidascaris has been classified as a genus of the superfamily Ascaridoidea (Pinto et al. 2010), and the present phylogenetic analyses grouped it with the family Ascarididae, although it was separated from the other genera of the family by a relatively long evolutionary distance. A previous study found the family Ascarididae to be more closely related to Toxocaridae (Zhou et al. 2021); in this study, however, Ascarididae was closer to Anisakidae than to Toxocaridae (Fig. 4). In addition, all five families and all eleven genera within the superfamily Ascaridoidea were monophyletic with strong support (Bpp = 1, Bf > 70; Fig. 4), consistent with earlier studies (Liu et al. 2016; Zhao et al. 2018).
In the ML analysis, Ascaridia galli and Heterakis species were sister taxa with strong statistical support (Bf = 100), as in a previous study (Liu et al. 2016), and, as reported by Liu et al. (2013b), Ascaridia columbae was more closely related to Ascaridia sp. than to A. galli. In the BI tree, however, A. galli formed a separate lineage that was sister to the genera Heterakis and Ascaridia with strong support (Bpp = 1). These results suggest that Heterakidae is closely related to Ascaridiidae and that the family Ascaridiidae is paraphyletic within Heterakoidea. Because molecular data for Heterakoidea remain limited, additional mt sequences are needed to verify the relationships within this superfamily.
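The monophyly statements above reduce to a simple topological test on a rooted tree: a group is monophyletic only if some node has exactly that group as its descendants. The Python sketch below shows the test, using nested tuples as a hypothetical tree representation and a toy topology with placeholder taxon names rather than the trees in Fig. 4. It checks topology only and ignores support values (Bpp, Bf); a False result does not distinguish paraphyly from polyphyly, and for unrooted ML or BI trees the answer depends on where the tree is rooted (for example, by an outgroup).

```python
def clades(tree):
    """Return (leaf set of `tree`, list of leaf sets of all its subtrees).

    A tree is a taxon name (str) or a tuple of subtrees; it is treated as rooted.
    """
    if isinstance(tree, str):
        leaf = frozenset([tree])
        return leaf, [leaf]
    leaves = frozenset()
    all_clades = []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        all_clades.extend(child_clades)
    all_clades.append(leaves)  # the node itself defines a clade
    return leaves, all_clades


def is_monophyletic(tree, taxa):
    """True if `taxa` is exactly the set of descendants of one node in the tree."""
    _, all_clades = clades(tree)
    return frozenset(taxa) in all_clades


# Hypothetical toy topology (not the tree in Fig. 4).
toy_tree = ((("GenusX_sp1", "GenusX_sp2"), "GenusY_sp1"), ("GenusY_sp2", "GenusZ_sp1"))
print(is_monophyletic(toy_tree, {"GenusX_sp1", "GenusX_sp2"}))  # → True
print(is_monophyletic(toy_tree, {"GenusY_sp1", "GenusY_sp2"}))  # → False
```

In the toy tree, GenusY is split across two separate clades, which is the kind of pattern that underlies a conclusion of non-monophyly such as the one proposed for Ascaridiidae.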