A hot spring Asgard archaeon sheds light on the origin of eukaryotic endosomal system

Recent metagenomic studies have identified a novel archaeal superphylum, Asgard, whose members are characterized by an enrichment of eukaryotic-specific proteins. In this study, we screened unclassified archaeal genomes in public databases and obtained a high-quality metagenome-assembled genome that can be assigned to a novel family-level Asgard member, Odinarchaeceae Tengchong. Metabolic analysis indicates an autotrophic lifestyle for this hot spring archaeon, with a complete tetrahydromethanopterin Wood-Ljungdahl pathway for carbon dioxide reduction and an arsenic efflux detoxification pathway. Examination of public databases found that Odinarchaeceae Tengchong may so far be the only prokaryote that encodes the C-terminal domain of Vps28, a critical connector of multiple components of the endosomal sorting complex required for transport (ESCRT). Therefore, the identification of this archaeon provides valuable evidence for the archaeal origin of eukaryotic ESCRT. We posit that all the key components of the eukaryotic endosomal system might have evolved from a common ancestor of Asgard archaea and eukaryotes.


Introduction
Asgard archaea comprise at least five phyla: Heimdallarchaeota, Lokiarchaeota, Odinarchaeota, Thorarchaeota and Helarchaeota (Seitz et al. 2019; Zaremba-Niedzwiedzka et al. 2017). They are found in a variety of environments, including lake sediments, mangrove sediments, estuarine sediments and mud volcanoes (Cai et al. 2018). Recently, a Lokiarchaeota-related archaeon (Candidatus Prometheoarchaeum syntrophicum strain MK-D1) has been reported to grow in co-culture with a methanogen (Imachi et al. 2019). Metagenomic analysis indicated that Asgard archaea have divergent metabolic capabilities and potentially live a mixotrophic lifestyle (Cai et al. 2018). For example, Lokiarchaeota and Thorarchaeota use both the tetrahydrofolate Wood-Ljungdahl (THF/H4F-WL) and the tetrahydromethanopterin Wood-Ljungdahl (THMPT/H4MPT-WL) pathways for fixing CO2 and performing acetogenesis. They also have the capability to degrade organic matter (Cai et al. 2018; Spang et al. 2019). Heimdallarchaeota and Odinarchaeota can use only the THF-WL or the THMPT-WL pathway, respectively (Cai et al. 2018). In particular, Heimdallarchaeota is the only group of Asgard archaea that possesses the complete set of genes for the forward and reverse tricarboxylic acid (TCA) cycle (Cai et al. 2018).
Asgard archaea are also notable for their enrichment in eukaryotic signature proteins (ESPs) (Zaremba-Niedzwiedzka et al. 2017), which are considered to be ubiquitous in eukaryotes but have few homologues in bacteria and other archaea. For instance, metagenome-assembled genomes (MAGs) of Thorarchaeota encode several eukaryotic membrane-trafficking components, and an odinarchaeal MAG possesses a bona fide eukaryotic tubulin (Zaremba-Niedzwiedzka et al. 2017). These ESPs shed light on the origin of eukaryotic cellular complexity. However, Asgard archaea encode only some of the components of several key eukaryotic-specific processes. Archaeal homologues of other essential membrane-trafficking components, including coat protein and adaptor protein complexes, are yet to be discovered (Dacks and Robinson 2017).
The endosomal sorting complex required for transport (ESCRT) is an important part of the eukaryotic endomembrane system and a substantial feature that distinguishes eukaryotes from prokaryotes. ESCRT is involved in cytokinetic abscission and phagophore formation. Moreover, it mediates the sorting of ubiquitylated membrane proteins into multivesicular bodies and plays an important role in vesicular trafficking processes (Henne et al. 2011). Since the development of vesicle trafficking is pivotal in the formation of a more complex endomembrane system, the discovery of ESCRT components in archaeal genomes is of great significance. However, only some of the ESCRT components have been found in archaea so far. Among all ESCRT components, three are conserved across the eukaryotic lineages: ESCRT-I, -II and -III (Leung et al. 2008). ESCRT-III has been found in many archaeal lineages (Obita et al. 2007; Samson et al. 2008), while ESCRT-I and ESCRT-II are enriched only in the Asgard group (Spang et al. 2015; Zaremba-Niedzwiedzka et al. 2017). ESCRT-II is encoded by all Asgard archaea, but ESCRT-I has been reported only in Heimdallarchaeota LC_3 and Lokiarchaeum sp. GC14_75 (Zaremba-Niedzwiedzka et al. 2017). The incompleteness of ESCRT in Asgard archaea (Spang et al. 2015) limits our understanding of the origin of this system.
The lack of high-quality genomes in each phylum of Asgard has created confusion and contradiction in Asgard phylogeny and evolution (Betts et al. 2018; Da Cunha et al. 2017; Rokas et al. 2018; Spang et al. 2018; Xiao et al. 2019; Zaremba-Niedzwiedzka et al. 2017), especially for the phylum Odinarchaeota. Here we screened unclassified archaeal MAGs in public databases and identified a novel Asgard archaeon, which was originally sampled from hot spring sediments. This MAG belongs to the phylum Odinarchaeota and is 98.13% complete with 2.8% contamination. It is to date the only discovered Asgard archaeon that contains a complete and conserved ESCRT system, which may shed light on the origin of the eukaryotic endosomal system.

Materials And Methods
Phylogenetic identification of novel archaeal MAGs. The unclassified archaeal MAGs were downloaded from the IMG and NCBI databases. Their information is provided in Table S1. The methods used for the assembly and binning of Odinarchaeceae Tengchong are described in the analysis project in the JGI database (https://gold.jgi.doe.gov/analysis_projects?id=Ga0181714). The JGI study project Gs0127627 was recently published by Hua et al. (2019). Specifically, metagenomic assembly was performed with SPAdes v.3.9.0 (Bankevich et al. 2012), and binning was performed with Metabat (Kang et al. 2015) and ESOM (Ultsch and Mörchen 2005). Protein sequences of the studied MAGs were downloaded from the IMG database. Specifically, protein sequences of Odinarchaeceae Tengchong were predicted with Prodigal (v2.6.3) (Hyatt et al. 2010) with the "-p single" option, as described by Hua et al. (2019).
Identification of ESPs. Protein domains were identified with IPRscan using default parameters (Jones et al. 2014). By definition (Hartman and Fedorov 2002), ESPs are present in most eukaryotes but absent from almost all archaea and bacteria. Accordingly, we selected eukaryotic-specific IPR domains by requiring that each be detected in more than 500 eukaryotes but in fewer than 20 archaea and 20 bacteria. Proteins with eukaryotic-specific IPR domains were identified as ESPs. Note that both COG5491 of the CDD database and PF03357 of the Pfam database are annotated as the Snf7 family. However, the contents of COG5491 and PF03357 differ in two ways. First, the seed alignment of PF03357 contains 32 eukaryotic sequences (https://pfam.xfam.org/family/PF03357#tabview=tab5), while that of COG5491 contains six archaeal sequences and six yeast sequences (https://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?uid=227778). Second, the sequences in PF03357 are annotated as functioning in eukaryotic-specific endosome-mediated trafficking, while those in COG5491 are annotated with general cellular functions such as cell cycle control, cell division and chromosome partitioning. Therefore, COG5491 and PF03357 represent two functionally distinguishable groups of proteins, although all of these proteins belong to the Snf7 family. In this study, we consider PF03357 more suitable for annotating eukaryotic-specific proteins in archaeal genomes. Furthermore, ESPs of ESCRT components in Asgard were also searched using HHpred (Soding et al. 2005) and PSI-BLAST in the HHpred server (Zimmermann et al. 2018) with the default E-value cutoff (< 1e-3). The standard databases used in the HHpred and PSI-BLAST analyses were PDB_mmCIF70_4_Feb (default) and uniprot_tembl_6_Dec, respectively. Protein domains were strictly identified as ESPs of ESCRT only when they were detected by all three methods: IPRscan, HHpred and PSI-BLAST.
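The selection rule and the strict three-method consensus described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function names, the domain labels and the protein identifiers are hypothetical, and the per-domain genome counts are assumed to have been tabulated beforehand.

```python
from typing import Dict, Set, Tuple

# Thresholds as stated in the text: a domain is treated as eukaryotic-specific
# when it occurs in more than 500 eukaryotes but in fewer than 20 archaea
# and fewer than 20 bacteria.
MIN_EUKARYOTES = 500
MAX_ARCHAEA = 20
MAX_BACTERIA = 20


def is_eukaryotic_specific(n_eukaryotes: int, n_archaea: int, n_bacteria: int) -> bool:
    """Apply the occurrence thresholds to one domain."""
    return (
        n_eukaryotes > MIN_EUKARYOTES
        and n_archaea < MAX_ARCHAEA
        and n_bacteria < MAX_BACTERIA
    )


def select_eukaryotic_specific(
    domain_counts: Dict[str, Tuple[int, int, int]]
) -> Set[str]:
    """Return the domains whose (eukaryote, archaea, bacteria) counts pass."""
    return {
        domain
        for domain, (euk, arc, bac) in domain_counts.items()
        if is_eukaryotic_specific(euk, arc, bac)
    }


def strict_escrt_hits(
    iprscan_hits: Set[str], hhpred_hits: Set[str], psiblast_hits: Set[str]
) -> Set[str]:
    """Keep only the proteins reported by all three search methods."""
    return set(iprscan_hits) & set(hhpred_hits) & set(psiblast_hits)


# Hypothetical example data (labels and counts are invented for illustration).
counts = {"domain_a": (812, 3, 1), "domain_b": (950, 45, 2), "domain_c": (120, 0, 0)}
print(select_eukaryotic_specific(counts))
print(strict_escrt_hits({"prot1", "prot2"}, {"prot2", "prot3"}, {"prot2"}))
```

Taking the intersection of the three hit sets makes the ESCRT calls conservative: a domain missed by any one method is dropped, which trades sensitivity for a lower risk of false positives.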
Phylogenetic analyses were performed on selected ESPs. Sequences of the selected ESPs were aligned using MAFFT-L-INS-I (Katoh and Standley 2013) and trimmed using trimAl (Capella-Gutierrez et al. 2009) with the gappyout option. Maximum likelihood inference was performed on the trimmed alignments using RAxML (Stamatakis 2014) with the PROTGAMMALG model (10 independent trees were generated for optimization) and 100 nonparametric bootstrap replicates. Protein structures of ESPs were predicted by homology modelling using the Phyre2 tool (Kelley et al. 2015). Chimera (Pettersen et al. 2004)

Taxonomic classification
Phylogenetic analysis of 51 unclassified archaeal MAGs based on 15 concatenated ribosomal proteins revealed a novel Asgard archaeon (IMG Taxon ID 2721755898) that is closely related to Odinarchaeceae LCB4 (Fig. 1). The identity of the 16S rRNA gene sequences between the novel archaeon and Odinarchaeceae LCB4 is 85.8%, which is below the family threshold of 86.5% but above the order threshold of 83.6% (Yarza et al. 2014). Therefore, this novel archaeon is a family-level novel Asgard member. We named it Odinarchaeceae Tengchong, after the place where it was discovered: Tengchong, Yunnan, China (Hua et al. 2019). The taxonomic classification of Odinarchaeceae Tengchong was further confirmed by in-depth phylogenetic analyses of 55 concatenated ribosomal proteins using a broader set of qualified Asgard genomes (Fig. 2). The MAG of Odinarchaeceae Tengchong is 2.21 Mbp in size with a G+C content of 47.6%. It is 98.13% complete and contains 2.8% contaminating sequences as assessed by CheckM (Parks et al. 2015), and it is 91.36% complete as predicted by Anvi'o (Eren et al. 2015).
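The threshold reasoning applied to the 16S rRNA identity can be written out as a short sketch. It assumes only the two cutoffs quoted in the text (86.5% for family and 83.6% for order); the function name and the wording of the returned labels are hypothetical, and a real classification would also weigh the phylogenomic evidence.

```python
# Cutoffs as quoted in the text (attributed there to Yarza et al. 2014).
FAMILY_THRESHOLD = 86.5  # identity below this suggests a different family
ORDER_THRESHOLD = 83.6   # identity below this suggests a different order


def novelty_rank(identity_percent: float) -> str:
    """Interpret a pairwise 16S rRNA identity against the two cutoffs."""
    if not 0.0 <= identity_percent <= 100.0:
        raise ValueError("identity must be a percentage between 0 and 100")
    if identity_percent >= FAMILY_THRESHOLD:
        return "same family"
    if identity_percent >= ORDER_THRESHOLD:
        return "novel family within the same order"
    return "novel order or higher"


# The identity reported between the novel archaeon and Odinarchaeceae LCB4.
print(novelty_rank(85.8))
```

With the reported identity of 85.8%, the value falls between the two cutoffs, which is the basis for the family-level assignment.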

Metabolism of Odinarchaeceae Tengchong
Metabolic reconstruction indicated that Odinarchaeceae Tengchong encodes the complete reductive acetyl-CoA WL pathway, suggesting that it is able to assimilate CO2 (Fig. 3) (Borrel et al. 2016). However, Odinarchaeceae Tengchong lacks the gene encoding methyl-CoM reductase, a key enzyme required for methane production.
The MAG of Odinarchaeceae Tengchong contains a nearly complete glycolytic pathway, lacking only two key enzymes, glucose-6-phosphate isomerase (EC 5.3.1.9) and pyruvate kinase (EC 2.7.1.40) (Fig. 3). This result is consistent with previous studies of Odinarchaeceae LCB4 (Cai et al. 2018; Spang et al. 2019), indicating that archaea in the phylum Odinarchaeota may not have a functional glycolysis pathway. In addition, most of the key enzymes of the TCA cycle are missing in Odinarchaeceae Tengchong, suggesting that this archaeon does not rely on this pathway for energy generation.

Distribution of ESPs in Asgard phyla
Over 1,860 eukaryotic-specific InterPro (IPR) domains (Jones et al. 2014) were identified in this study (Table S4). Among Asgard archaea, two types of ESP distribution pattern are noticeable. In Odinarchaeota, Lokiarchaeota and Heimdallarchaeota, the ESCRT and ubiquitin modifier systems are enriched but the trafficking machinery is lacking, whereas in Thorarchaeota the opposite is true (Fig. 4). We named them the 'ESCRT-ubiquitin modifier-enriched pattern' and the 'trafficking machinery-enriched pattern', respectively.
Previous studies have questioned the purity of the MAGs of Asgard archaea (Da Cunha et al. 2017; Garg et al. 2019). While the MAG of Odinarchaeceae Tengchong appears to be of high quality based on single-copy gene assessment, we cannot rule out the possibility that some of the ESPs predicted in this MAG are exogenous. To further ensure that the ESPs discovered are encoded by Odinarchaeceae Tengchong, we manually examined the G+C content distribution in the MAG, especially around the regions encoding the ESPs (Table S5), and detected no evidence of marked differences from the remaining regions, indicating that the Odinarchaeceae Tengchong MAG suffers little from assembly or binning errors.

Similarity of the ESCRT proteins between Odinarchaeceae Tengchong, other archaea and eukaryotes
All conserved components of the ESCRT system were identified in the MAG of Odinarchaeceae Tengchong, including an accessory component, the vacuolar fusion protein Mon1 (Table 1, Table S6 and Fig. 4). Notably, Odinarchaeceae Tengchong is unique among prokaryotes, including other Asgard archaea, in encoding the C-terminal domain of Vps28 (Vps28 CTD) in ESCRT-I (Fig. 5). The Vps28 CTD of Odinarchaeceae Tengchong is located precisely at the C-terminus of Vps28, as in most eukaryotic counterparts (Fig. 5). Homology modelling shows that the C-terminal sequence of Vps28 in Odinarchaeceae Tengchong is more similar to its eukaryotic template (c2j9wB_ from Xenopus laevis), with 26% identity over the aligned amino acids 119-128, than are those of other Asgard archaea.
Structural models of ESCRT components from archaea and their eukaryotic counterparts were generated and compared based on the PDB database (Fig. 6 and Table 2). In both Odinarchaeceae Tengchong and Caenorhabditis briggsae, the Vps28-like proteins of ESCRT-I have alpha helices in their C-terminal domains (Fig. 6a-6b). Their steadiness box proteins of ESCRT-I have both alpha helices and beta strands (Fig. 6e-6f), while the homologues in Lokiarchaeum sp. GC14_75 and Thorarchaeota AB25 each have only one type of secondary structure (Fig. 6g-6h). Furthermore, the Snf7 structure of C. briggsae resembles those of Odinarchaeceae Tengchong and Heimdallarchaeota LC3, but is vastly different from the structure of the remote homologue in Nitrososphaera viennensis EN76, a species of Thaumarchaeota (Fig. 6l-6o). Compared with other Asgard and TACK archaea (the TACK superphylum comprises the Thaumarchaeota, Crenarchaeota and Korarchaeota, as described by Guy and Ettema 2011), all ESCRT components of Odinarchaeceae Tengchong showed high model confidence, alignment coverage and identity to the eukaryotic template proteins (Table 2).
To further exclude the possibility that the ESCRT proteins of Odinarchaeceae Tengchong derive from contaminating eukaryotic sequences, we conducted a genomic G+C content-based examination. The G+C contents of the DNA sequences encoding the ESPs range from 39.0% to 53.4%, which is similar to that of the rest of the MAG (i.e. 47.6%).
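The G+C comparison above can be illustrated with a small Python sketch. It is not the procedure used in the study, which relied on manual examination; the function names are hypothetical, ambiguous bases are simply ignored, and the bracketing check is one simple way, chosen here for illustration, to express that the gene-level values are similar to the genome-wide value.

```python
from typing import Dict, Tuple


def gc_content(sequence: str) -> float:
    """Return the G+C percentage of a DNA sequence, ignoring non-ACGT symbols."""
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


def gc_range(sequences: Dict[str, str]) -> Tuple[float, float]:
    """Return the (minimum, maximum) G+C percentage over a set of genes."""
    values = [gc_content(seq) for seq in sequences.values()]
    return min(values), max(values)


def brackets_genome_gc(low: float, high: float, genome_gc: float) -> bool:
    """Check whether the genome-wide G+C value lies inside the gene-level range."""
    return low <= genome_gc <= high


# Values reported in the text: ESP genes span 39.0% to 53.4%, MAG average 47.6%.
print(brackets_genome_gc(39.0, 53.4, 47.6))

# Toy sequences (invented) to show the per-gene calculation.
print(gc_range({"gene_a": "ATGCGC", "gene_b": "ATATGC"}))
```

A gene whose G+C content lay far outside the range typical of the rest of the MAG would be a candidate for closer inspection as possible contamination.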

Odinarchaeceae Tengchong may be a thermophilic autotroph
Odinarchaeceae Tengchong has a complete WL pathway, indicating that it has the potential for autotrophic metabolism. The WL pathway is usually linked to methanogenesis and acetogenesis in Archaea (Ljungdahl 2009). Methanogenesis is considered to be one of the oldest metabolic pathways in Archaea (Borrel et al. 2016) and plays an important role in the global carbon cycle (Lyu et al. 2018). Genomic analysis showed that Odinarchaeceae Tengchong has the potential to produce acetate but lacks some genes involved in methane production. These results are consistent with the metabolic characteristics of Asgard archaea and some Bathyarchaeota (Cai et al. 2018; Zhou et al. 2018). On the other hand, the absence of most of the enzymes of the TCA cycle is consistent with previous studies of Odinarchaeceae LCB4 (Cai et al. 2018; Spang et al. 2019), indicating that Odinarchaeota may be unable to conduct heterotrophic metabolism. Thus, we posit that Odinarchaeceae Tengchong is an anaerobic thermophilic autotroph living in hot spring sediments, in which the oxygen content is relatively low and organic matter may be limited.

It is intriguing to find the complete pathway for arsenic efflux detoxification in Odinarchaeceae Tengchong. Arsenic is a toxic element and, in response to its toxicity, prokaryotes have evolved a variety of coping strategies (Paez-Espino et al. 2009; Guo et al. 2017). Other studies have also reported genes related to arsenic metabolism in Odinarchaeceae LCB4 (Cai et al. 2018; Spang et al. 2019) and Thorarchaeota. Thus, the arsenic efflux detoxification pathway may reflect adaptation to arsenic toxicity by diverse species of archaea.

Odinarchaeceae Tengchong shares highly similar characteristics of ESCRT proteins with eukaryotes
With an independently folded four-helical bundle, the Vps28 CTD functions as a critical connector of multiple ESCRT components (Pineda-Molina et al. 2006). The interactions between the Vps28 CTD and other ESCRT components are essential for the sorting function (Bowers et al. 2004; Gill et al. 2007; Teo et al. 2006). Deletion of the Vps28 CTD leads to the accumulation of cargoes and of large aberrant late multivesicular endosomes (MVE), termed the "class E compartment", and ultimately blocks the sorting function of the endosome (Kostelansky et al. 2006). According to our analysis, all conserved ESCRT components (ESCRT-I, -II and -III) and a vesicle trafficking protein related to endosomal sorting (vacuolar fusion protein Mon1) are encoded by Odinarchaeceae Tengchong. Specifically, Odinarchaeceae Tengchong is to date the only known prokaryote that encodes the Vps28 CTD (Fig. 4).
The Vps28 CTD of Odinarchaeceae Tengchong is located precisely at the C-terminus of Vps28, as it is in eukaryotes (Fig. 5). Therefore, the MAG of Odinarchaeceae Tengchong may encode the most complete archaeal set of ESCRT proteins currently known. In addition, homology modelling shows that the ESCRT proteins of Odinarchaeceae Tengchong score highly in similarity to their eukaryotic counterparts in terms of model confidence, alignment coverage and identity (Table 2). The ESCRT-I proteins of Odinarchaeceae Tengchong and eukaryotes share more secondary structures, including the alpha helices of the Vps28 CTD and the alpha helices and beta strands of the steadiness box proteins (Fig. 6a-6h). These results suggest that the ESCRT proteins of Odinarchaeceae Tengchong are among those most closely related to their eukaryotic counterparts.

Heterogeneous evolutionary histories of eukaryotic ESCRT components and membrane trafficking proteins
The respective origins of the eukaryotic ESCRT components remain unclear. Endosomal systems comprise four ESCRT components, ESCRT-0, -I, -II and -III, plus accessory components (such as vesicle trafficking proteins). Among them, ESCRT-I, -II and -III are conserved across the eukaryotic lineages (Leung et al. 2008). Phylogenetic analysis in this study revealed different evolutionary histories for the ESCRT components: ESCRT-III might have emerged earlier than ESCRT-I and ESCRT-II.
Specifically, ESCRT-III likely originated in the common ancestor of TACK archaea, Asgard archaea and eukaryotes (Fig. S3e, Fig. S4e). A functional divergence of ESCRT-III possibly occurred in the common ancestor of Thaumarchaeota, Asgard archaea and eukaryotes, as supported by the divergence in Snf7 protein structure between Asgard archaea and N. viennensis EN76 (Fig. 6l-6o).
In contrast, the ESCRT-I and ESCRT-II components seem to have originated later, in the common ancestor of Asgard archaea and eukaryotes (Fig. S3a-S3d, Fig. S4a-S4d and Fig. 7). During the divergence of the Asgard lineages, Odinarchaeceae Tengchong preserved all the ESCRT components, while other Asgard species may have gradually lost certain ESCRT-I components. Furthermore, the clear 'ESCRT-ubiquitin modifier-enriched pattern' and 'trafficking machinery-enriched pattern' of ESPs in Asgard archaea suggest functional divergence of these proteins during the evolutionary history of this superphylum.
In this study, we analysed unclassified archaeal MAGs across public databases. A novel family-level Asgard archaeon, Odinarchaeceae Tengchong, was identified; it is distinguishable from other prokaryotes in that it encodes all conserved components of the ESCRT machinery, including the Vps28 CTD. These ESCRT components share highly similar characteristics with their eukaryotic counterparts. The complete and conserved ESCRT machinery in Odinarchaeceae Tengchong indicates that the ESCRT-I and -II systems possibly originated in the common ancestor of Asgard archaea and eukaryotes. Future work is encouraged to explore the details of the archaeal origin of the eukaryotic ESCRT machinery.

Ethics approval and Consent to participate
Not applicable.

Consent for publication
Not applicable.

Availability of data and material
Not applicable.

Competing interests
The authors have declared that no competing interests exist.

Funding
This work was supported by the National Natural Science Foundation of China No. 81774152 (RZ) and No. 91951120 (LF), the Shanghai Committee of Science and Technology 16ZR1449800 (RZ), and the Shenzhen Science and Technology Innovation Commission JCYJ20180305123458107 (LF). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Authors' contributions
RZ and LF conceived and designed the project. Each author has contributed significantly to data analysis. WL and LF drafted the manuscript. JX, SF, YX, RZ, RQ and RZ revised the manuscript. All authors read and approved the final manuscript.

family protein of ESCRT-III. In each predicted protein structure, N-terminus to C-terminus are indicated by rainbow colours (red to purple). Helix shows alpha helix structure and arrow refers to beta strand structure.

Figure 7
Hypothetical origination of eukaryotic ESCRT components.