A trade-off between genome quality and sample size
In the NCBI genome database [46], nuclear genome assemblies are classified into four levels: complete genome, chromosome, scaffold, and contig. To balance genome quality against sample size, we evaluated the effect of assembly level on the detected abundances of NUMTs/NUPTs using a phylogenetic comparative method, phylogenetic generalized least squares (PGLS). We first coded the contig level as 1 and the other three levels (scaffold, chromosome, and complete genome) as 2. The PGLS regression revealed that assembly level was significantly correlated with the number and total length of NUPTs (p < 0.05 in both cases) and with the number of NUMTs (p = 0.037), but not with the total length of NUMTs (Table 1). These results indicate that the abundances of NUMTs/NUPTs might be overestimated or underestimated when the analysed nuclear genomes are assembled only to the contig level. We then repeated the PGLS regression after coding the scaffold level as 3 and the other two levels (chromosome and complete genome) as 4 (Table 1). The number of NUPTs was again significantly correlated with assembly level (p = 0.044), whereas no significant correlations were detected between assembly level and the number of NUMTs or the total length of NUMTs or NUPTs (Table 1). Scaffold-level assemblies might therefore also distort the estimated abundance of NUPTs, but this effect is much weaker than that of contig-level assemblies. Balancing sample size against estimation accuracy, we retained all nuclear genomes assembled at the scaffold, chromosome, and complete genome levels.
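To make the coding of assembly levels and the regression concrete, the following Python sketch fits a generalized least squares (GLS) line y = b0 + b1·x, where x is the numeric code assigned to each genome's assembly level and the error covariance matrix V encodes shared evolutionary history. It assumes V is already known (for example, derived from a Brownian-motion model on the species tree); the dictionary names, helper functions, and the plain Gauss-Jordan solver are illustrative choices, not the authors' pipeline. Dedicated PGLS software (for example the R package caper) additionally estimates λ by maximum likelihood and reports R² and p values, which this sketch omits.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]

# Illustrative numeric codes for NCBI assembly levels, mirroring the two runs.
CONTIG_VS_REST: Dict[str, int] = {
    "contig": 1, "scaffold": 2, "chromosome": 2, "complete genome": 2,
}
SCAFFOLD_VS_REST: Dict[str, int] = {
    "scaffold": 3, "chromosome": 4, "complete genome": 4,
}


def dot(p: Sequence[float], q: Sequence[float]) -> float:
    return sum(pi * qi for pi, qi in zip(p, q))


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def gls_fit(x: Sequence[float], y: Sequence[float], v: Matrix) -> List[float]:
    """Return [intercept, slope] minimising (y - Xb)' V^-1 (y - Xb), X = [1, x]."""
    n = len(y)
    w1 = solve(v, [1.0] * n)  # V^-1 1
    wx = solve(v, x)          # V^-1 x
    wy = solve(v, y)          # V^-1 y
    # Normal equations (X' V^-1 X) b = X' V^-1 y for the two-column design.
    xtvx = [[sum(w1), sum(wx)], [dot(x, w1), dot(x, wx)]]
    xtvy = [sum(wy), dot(x, wy)]
    return solve(xtvx, xtvy)
```

With V equal to the identity matrix the fit reduces to ordinary least squares; positive off-diagonal entries down-weight closely related species, whose values are not statistically independent.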
Table 1 Correlations between nuclear genome assembly level and the abundance of NUMTs/NUPTs.

| Variable | Level / statistic | NUMT | NUPT |
|---|---|---|---|
| Contig (1) vs. scaffold/chromosome/complete genome (2) | | | |
| Number of species | Contig (1) | 16 | 8 |
| | Scaffold/chromosome/complete genome (2) | 216 | 106 |
| Number of NUMTs/NUPTs | λ | 0.983 | 0.401 |
| | R² | 0.019 | 0.150 |
| | Slope | -0.713 | 1.393 |
| | p | 0.037 | 2.1 × 10⁻⁵ |
| Total length of NUMTs/NUPTs | λ | 0.942 | 0.718 |
| | R² | 0.008 | 0.046 |
| | Slope | -0.468 | 0.807 |
| | p | 0.185 | 0.022 |
| Scaffold (3) vs. chromosome/complete genome (4) | | | |
| Number of species | Scaffold (3) | 151 | 56 |
| | Chromosome/complete genome (4) | 65 | 50 |
| Number of NUMTs/NUPTs | λ | 0.985 | 0.507 |
| | R² | 4 × 10⁻⁴ | 0.038 |
| | Slope | 0.050 | 0.346 |
| | p | 0.769 | 0.044 |
| Total length of NUMTs/NUPTs | λ | 0.935 | 0.745 |
| | R² | 0.004 | 0.015 |
| | Slope | -0.187 | 0.239 |
| | p | 0.346 | 0.218 |
For each plant for which both NUMT and NUPT data were available, we treated the plant as two samples, one NUMT sample and one NUPT sample, rather than combining them into a single norgDNA sample. The correlations were analysed by phylogenetic generalized least squares (PGLS) regression. The numbers in parentheses after the genome assembly levels are the values assigned to those levels in the PGLS regression. λ is the phylogenetic signal; values close to one indicate strong phylogenetic signal, which necessitates the use of phylogenetic comparative methods such as PGLS.
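The λ in Table 1 is presumably Pagel's λ, the usual measure of phylogenetic signal fitted in PGLS. A minimal sketch of how λ enters the model, assuming a Brownian-motion covariance matrix V in which each off-diagonal entry is the branch length two species share from the root: λ multiplies the off-diagonal entries while leaving the diagonal unchanged. The three-species matrix is a hypothetical example for the ultrametric tree ((A:1,B:1):1,C:2), and the function name is illustrative.

```python
from typing import List

Matrix = List[List[float]]

# Hypothetical Brownian-motion covariance for the tree ((A:1,B:1):1,C:2):
# diagonal = root-to-tip length (2); A and B share the branch of length 1.
EXAMPLE_V: Matrix = [
    [2.0, 1.0, 0.0],
    [1.0, 2.0, 0.0],
    [0.0, 0.0, 2.0],
]


def pagel_lambda(v: Matrix, lam: float) -> Matrix:
    """Scale off-diagonal (shared-history) covariances by lambda.

    lam = 1 keeps the Brownian-motion covariance (strong phylogenetic signal);
    lam = 0 gives a diagonal matrix, i.e. species treated as independent.
    This sketch restricts lambda to [0, 1].
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda must lie in [0, 1] in this sketch")
    n = len(v)
    return [[v[i][j] if i == j else lam * v[i][j] for j in range(n)]
            for i in range(n)]
```

The transformed matrix can be passed as the covariance in a GLS fit; a λ near one, as for NUMTs in Table 1, means the phylogenetic covariance is kept almost intact.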
The most desiccation-tolerant organisms do not have excessive NUMTs/NUPTs
Water deficiency is one of the most common abiotic stresses for organisms living on land, and terrestrial organisms have evolved two strategies to cope with environmental drying [47]. The first is to conserve water and avoid severe loss of body water; examples include the waxy coatings on plant shoots and the protective cocoon of the African lungfish, Protopterus annectens [48]. The second is to tolerate the loss of body water. The term anhydrobiosis is often used for the almost completely dehydrated but viable state of an organism experiencing extreme desiccation [49, 50]. Desiccation tolerance varies along a scale: the most desiccation-tolerant organisms are those that can enter anhydrobiosis and thus survive desiccation at any stage of their life cycle. Commonly studied organisms in this group include bdelloid rotifers, tardigrades, and resurrection plants [51-53].
DSBs induced by prolonged desiccation have been suggested to open a "gateway to genetic exchange" and to account for the high frequency of HGT in bdelloid rotifers [26-28, 30, 31, 35]. However, prolonged desiccation in tardigrades does not result in an elevated level of HGT [44]. Using EDT as a proxy for HDT, we tested whether prolonged desiccation enhances HDT in these desiccation-tolerant invertebrates. We detected 22 and 4 NUMTs, with total lengths of 32 kb and 14 kb, in the genomes of the desiccation-tolerant bdelloid rotifers A. vaga and A. ricciae, respectively. Similarly, we found 53 NUMTs, with a total length of 33 kb, in the genome of the desiccation-tolerant tardigrade R. varieornatus. If the exceptionally high level of HGT in bdelloid rotifers were due to a high accessibility of their nuclear genomes to exogenous DNA, their NUMT content should also be exceptionally high compared with that of other invertebrates. Among invertebrates, the honeybee Apis mellifera is considered to have an "exceptionally high" NUMT density [54]. To minimize methodological artefacts, we re-surveyed the honeybee genome with our own parameters and detected 1791 NUMTs, with a total length of 724 kb. We also surveyed the NUMT abundances of other invertebrates and confirmed that the NUMT contents of bdelloid rotifers and tardigrades are not exceptionally high: they are similar to the median NUMT content of other invertebrates, but are 13- to 185-fold lower than the mean number of NUMTs and 2- to 6-fold lower than the mean total NUMT length in other invertebrates (Table 2). Thus, the high level of HGT in desiccation-tolerant bdelloid rotifers is not accompanied by a high frequency of EDT.
Table 2 The norgDNA contents of invertebrates and plants

| Species | Type | Number of species | Number of norgDNAs | Total length of norgDNAs (kb) |
|---|---|---|---|---|
| Adineta vaga | NUMT | | 22 | 32 |
| A. ricciae | NUMT | | 4 | 14 |
| Ramazzottius varieornatus | NUMT | | 53 | 33 |
| Other invertebrates | NUMT | 193 | 742 ± 7810 (32) | 97 ± 781 (15) |
| Dorcoceras hygrometricum | NUMT | | 3696 | 1232 |
| Dorcoceras hygrometricum | NUPT | | 1610 | 467 |
| Selaginella tamariscina | NUPT | | 637 | 520 |
| Other plants | NUMT | 67 | 9309 ± 26177 (4182) | 2735 ± 9044 (895) |
| Other plants | NUPT | 118 | 3015 ± 3907 (2014) | 940 ± 1280 (509) |

Values for other invertebrates and other plants are given as mean ± SD (median).
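Table 2 reports mean ± SD with the median in parentheses because NUMT/NUPT contents are strongly right-skewed: for other invertebrates the SD of the NUMT number (7810) is roughly ten times the mean (742), which in turn is far above the median (32). The hypothetical helper below reproduces that summary format with Python's statistics module; it assumes the sample standard deviation (n - 1 denominator), which the table does not specify.

```python
import statistics
from typing import Sequence


def summarize(values: Sequence[float]) -> str:
    """Format values as 'mean ± SD (median)', rounded to whole numbers."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1); an assumption here
    median = statistics.median(values)
    return f"{mean:.0f} ± {sd:.0f} ({median:.0f})"
```

For the skewed toy sample [1, 2, 3, 4, 100] it returns "22 ± 44 (3)", showing how a single outlier pulls the mean well above the median.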
Resurrection plants can survive extreme desiccation and remain in a quiescent state for months to years [55]. To examine the relationship between prolonged desiccation and the frequency of DNA transfer, we also surveyed the abundance of NUMTs/NUPTs in these plants. In the genome of the flowering plant Dorcoceras hygrometricum, we detected 3696 NUMTs with a total length of 1232 kb and 1610 NUPTs with a total length of 467 kb. In the genome of the spike moss Selaginella tamariscina, we detected 637 NUPTs with a total length of 520 kb. We could not obtain the mitochondrial genome sequence of S. tamariscina and therefore have no estimate of its NUMT content. The NUMT content of the resurrection plant D. hygrometricum was much higher than those of the desiccation-tolerant bdelloid rotifers and tardigrades (Table 2). However, previous studies, together with our own survey, have shown that plant genomes generally contain far more NUMTs than invertebrate genomes [20, 21]. The norgDNA contents of the resurrection plants were lower than the mean values for other plants and, with two exceptions, also lower than the median values: the total length of NUPTs in S. tamariscina was slightly greater than the median NUPT length in other plants (520 vs. 509 kb), and the total length of NUMTs in D. hygrometricum exceeded the median NUMT length in other plants (1232 vs. 895 kb) (Table 2).
Pairwise comparison of organisms differing in desiccation tolerance
In the second grade of desiccation tolerance, anhydrobiosis is restricted to particular developmental stages, such as dormant eggs of the water flea Daphnia, cysts of primitive crustaceans such as the brine shrimp Artemia salina, and orthodox seeds of most angiosperms [56, 57]. In the last grade, all stages of the life cycle are desiccation sensitive, including the eggs of animals, recalcitrant seeds, and the embryos of viviparous plants. From this perspective, a viviparous plant or a plant with recalcitrant seeds is more sensitive to desiccation than a plant with orthodox seeds.
To reach a general conclusion about the relationship between desiccation and the frequency of EDT, we compared 24 pairs of lineages, each pair differing in desiccation tolerance (Table 3). The closest relatives used as controls for desiccation-tolerant organisms were selected from widely used phylogenetic databases, including TimeTree, NCBI Taxonomy, and the Angiosperm Phylogeny Website [58-60]. Where one branch contained two desiccation-tolerant species or two control species, we used the mean NUMT/NUPT content of the two species. For example, the mean value of the desiccation-tolerant bdelloid rotifers A. vaga and A. ricciae was compared with the mean value of the desiccation-sensitive bdelloid rotifers R. magnacalcarata and R. macrura. Nonparametric pairwise comparison revealed no significant differences between desiccation-tolerant lineages and their controls in either the number of NUMTs/NUPTs (Wilcoxon signed-rank test, P = 0.269, Fig. 1a) or their total length (Wilcoxon signed-rank test, P = 0.881, Fig. 1b). Because a larger nuclear genome offers more potential integration sites and is therefore likely to contain more NUMTs/NUPTs, we also compared the density of NUMTs/NUPTs in the nuclear genomes to control for genome size, and again found no significant differences between desiccation-tolerant lineages and their controls (Fig. 1c, d). Separate pairwise comparisons within invertebrates and within plants likewise revealed no significant differences (Wilcoxon signed-rank test, P > 0.10 in all cases). Finally, we divided the 24 pairs into two groups according to the mean nuclear genome size of each pair; no significant differences were observed in either the group with large nuclear genomes or the group with small nuclear genomes (Wilcoxon signed-rank test, P > 0.10 in all cases).
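The paired design described above can be sketched in Python: the contents of the two species on one branch are averaged, optionally divided by nuclear genome size to give a density, and the tolerant-minus-control differences are tested with an exact two-sided Wilcoxon signed-rank test. The helper names and the per-Mb density unit are assumptions for illustration. The exact null distribution is built by counting sign assignments of the (average) ranks; for real analyses a tested implementation such as scipy.stats.wilcoxon would normally be preferred.

```python
from typing import Dict, List, Sequence, Tuple


def branch_mean(values: Sequence[float]) -> float:
    """Mean NUMT/NUPT content of the one or two species on a branch."""
    return sum(values) / len(values)


def density_per_mb(count: float, genome_size_mb: float) -> float:
    """NUMT/NUPT count (or length) per Mb of nuclear genome."""
    return count / genome_size_mb


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Exact two-sided Wilcoxon signed-rank test on pairs; returns (W+, p).

    Zero differences are discarded and tied |differences| get average ranks.
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    ranks = average_ranks([abs(v) for v in d])
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    # Under H0 each rank is positive with probability 1/2. Count the sign
    # assignments giving each rank sum (doubled so half-ranks are integers).
    counts: Dict[int, int] = {0: 1}
    for r in ranks:
        step = round(2 * r)
        updated = dict(counts)
        for s, c in counts.items():
            updated[s + step] = updated.get(s + step, 0) + c
        counts = updated
    total = 2 ** len(d)
    obs = round(2 * w_plus)
    lower = sum(c for s, c in counts.items() if s <= obs) / total
    upper = sum(c for s, c in counts.items() if s >= obs) / total
    return w_plus, min(1.0, 2 * min(lower, upper))
```

For instance, branch_mean([22, 4]) gives 13.0, the averaged NUMT count of A. vaga and A. ricciae in Table 2, which would then be paired with the corresponding average for the two Rotaria species.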
Because insertion of norgDNA into coding regions would disrupt gene function, a compact nuclear genome is expected to be a less accessible recipient than a nuclear genome with a high proportion of noncoding sequence. It was therefore necessary to check whether the paired genomes had similar levels of compactness. We obtained annotation files of coding regions for six pairs of species from the NCBI genome database [46], the Ensembl Genomes database (Release 41) [61], and the Ensembl database (Release 94) [62]. Pairwise comparison revealed no significant difference in genome compactness (Wilcoxon signed-rank test, P = 0.293). Because the NUMTs and NUPTs of Sorghum bicolor and Zea mays were treated as independent samples, these six pairs of species yielded seven pairs of NUMT/NUPT data, and a Wilcoxon signed-rank test on these seven pairs revealed no significant differences in NUMT/NUPT abundance (P > 0.10).
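The text does not state how compactness was measured; one common proxy, assumed here, is the fraction of the assembly covered by annotated coding regions. The sketch below merges overlapping coding intervals (so that bases shared by alternative transcripts are counted once) before dividing by genome size. Intervals are taken as 0-based, half-open pairs, so GFF records (1-based, inclusive) would need their start reduced by one; the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end) on one sequence


def merged_length(intervals: Iterable[Interval]) -> int:
    """Total bases covered by possibly overlapping intervals on one sequence."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start  # close the previous block
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # overlapping or adjacent: extend
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def coding_fraction(cds_by_seq: Dict[str, List[Interval]], genome_size: int) -> float:
    """Fraction of the assembly covered by coding regions (compactness proxy)."""
    covered = sum(merged_length(ivs) for ivs in cds_by_seq.values())
    return covered / genome_size
```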
Although most EDT events occur through direct transfer of organellar DNA to the nucleus, there is also evidence for EDT mediated by RNA molecules [63]. We therefore also compared the NUMTs/NUPTs derived from the transcribed regions of organellar genomes, i.e., protein-coding genes, tRNA genes, and rRNA genes. As in the analyses above, no significant difference was detected between desiccation-tolerant species and their controls (Wilcoxon signed-rank test, P > 0.10 in all cases, Additional file 1: Table S1).
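A minimal sketch of how NUMTs/NUPTs could be assigned to transcribed organellar regions, assuming each insertion is described by the coordinates of its source region on the organellar genome and that a single overlapping base with a CDS, tRNA, or rRNA feature is enough to count it. Both the overlap threshold and the feature-type names are assumptions, the function names are hypothetical, and wrap-around intervals on circular organellar genomes are not handled.

```python
from typing import Iterable, List, Tuple

Feature = Tuple[str, int, int]  # (feature type, start, end), 1-based inclusive
TRANSCRIBED = {"CDS", "tRNA", "rRNA"}


def overlap_bp(a: Tuple[int, int], b: Tuple[int, int]) -> int:
    """Number of shared bases between two 1-based inclusive intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def from_transcribed_region(source: Tuple[int, int], features: Iterable[Feature]) -> bool:
    """True if the organellar source region overlaps any transcribed feature."""
    lo, hi = sorted(source)  # reverse-strand hits may be reported with start > end
    return any(ftype in TRANSCRIBED and overlap_bp((lo, hi), (s, e)) > 0
               for ftype, s, e in features)


def transcribed_subset(sources: List[Tuple[int, int]],
                       features: List[Feature]) -> List[Tuple[int, int]]:
    """Keep only insertions whose source overlaps a transcribed feature."""
    return [src for src in sources if from_transcribed_region(src, features)]
```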
Table 3 Desiccation-tolerant organisms and their controls used in this study

| Lineage | Desiccation-tolerant species | Control |
|---|---|---|
| Invertebrates | | |
| Bdelloidea | Adineta ricciae and A. vaga [29] | Rotaria macrura and R. magnacalcarata [29] |
| Hypsibioidea | Ramazzottius varieornatus [44] | Hypsibius dujardini [44, 102] |
| Culicidae | Aedes aegypti [105, 106] | Culex quinquefasciatus [107] |
| Anopheles | Anopheles culicifacies [108] | Anopheles funestus [109] |
| Dermatophagoides | Dermatophagoides farinae [110] | Dermatophagoides pteronyssinus [110] |
| Meloidogyne | Meloidogyne javanica [111] | Meloidogyne incognita [111] |
| Polypedilum | Polypedilum vanderplanki [112] | Polypedilum nubifer [112] |
| Steinernema | Steinernema carpocapsae [113] | Steinernema glaseri [113] |
| Plants | | |
| Pooideae | Aegilops tauschii [114] | Lolium perenne [115] |
| Fagales | Betula nana [101] | Juglans regia [116] |
| Malvaceae | Corchorus capsularis [101] | Theobroma cacao [117] |
| Pentapetalae | Dorcoceras hygrometricum [118] | Fagopyrum esculentum [119] |
| Malvoideae | Hibiscus syriacus [120] | Gossypium hirsutum [121] |
| Crotonoideae | Manihot esculenta [122] | Hevea brasiliensis [123] |
| Oryzeae | Oryza sativa Japonica Group [101] | Zizania latifolia [124] |
| Ericales | Primula veris [125] | Actinidia chinensis [126] |
| Selaginella | Selaginella tamariscina [127] | Selaginella moellendorffii [127] |
| Caryophyllales | Silene latifolia [128] | Spinacia oleracea [129] |
| Solanum | Solanum commersonii [130] | Solanum tuberosum [131] |
| Solanum | Solanum pennellii [132] | Solanum melongena [133] |
| Andropogonodae | Sorghum bicolor [134] | Zea mays [135] |
| Fungi | | |
| Onygenales | Coccidioides immitis [136] | Paracoccidioides brasiliensis [137] |
| Hypocreales | Fusarium graminearum [138] | Ophiocordyceps sinensis [139] |
| Pseudogymnoascus | Pseudogymnoascus pannorum [140] | Pseudogymnoascus destructans [140] |
The statements supporting the desiccation tolerance or sensitivity of the above species were extracted from the cited references and are listed in Additional file 1: Table S3.