Genome quality was paramount and was facilitated by using domesticated sporophyte samples maintained in culture with relatively minimal microbial contamination. Genomic contaminants are a recurrent concern when creating any draft genome for an organism [50]. Microbes are a primary source of genomic contaminants in marine ecosystems, as they represent about 90% of the biomass in the ocean [51] and can establish intimate associations with multicellular hosts [52]. Seaweeds are classic examples of marine holobionts, as they host dense communities of microbes that play key roles in host health and defense [10, 53]. In this regard, a complete separation between a seaweed genome and that of its microbes is often unfeasible, even in culture. However, minimizing microbial contamination, as well as acknowledging microbial genomic signatures, should be considered prior to confirming a draft genome. Our microscopy images revealed extensive biofouling in wild samples of A. taxiformis (Fig. 1A,B), in comparison with the same species kept in culture (Fig. 1C,D). The epiphytic aggregations on wild A. taxiformis morphologically resembled microalgae and prokaryotes, which are common epiphytes across seaweed groups [54].
The rich microbial community associated with wild A. taxiformis was reinforced by the genomic analysis, as there was a significantly higher abundance of microbial OTUs associated with wild individuals in comparison with cultured Asparagopsis (Table 1, Fig. S2). Both microscopy and genomic analysis indicated that the microbial community was less abundant on cultured A. taxiformis and, therefore, cultured individuals were selected for establishing the A. taxiformis genome. Details of the microbiomes associated with different life history stages of this seaweed, maintained under natural and cultured conditions, are of interest for future research.
Table 1
One-way ANOVA results for microbial OTU richness associated with wild versus cultured Asparagopsis taxiformis individuals. SS = sum of squares, df = degrees of freedom, MS = mean square.
Source of Variation | SS | df | MS | F | P-value | F crit |
Between Groups | 1484185.34 | 1 | 1484185.34 | 40.5938681 | 0.00012957 | 5.11735503 |
Within Groups | 329056.3 | 9 | 36561.8111 | | | |
Total | 1813241.64 | 10 | | | | |
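The derived columns of Table 1 follow directly from the sums of squares and degrees of freedom. The short sketch below recomputes them as a consistency check; it uses only the values printed in the table, and the variable names are illustrative rather than taken from the original analysis.

```python
# Recompute the derived columns of the one-way ANOVA in Table 1
# from the reported sums of squares (SS) and degrees of freedom (df).
ss_between, df_between = 1484185.34, 1
ss_within, df_within = 329056.3, 9

# Mean square = SS / df for each source of variation.
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# F statistic = between-group mean square / within-group mean square.
f_stat = ms_between / ms_within

# Totals are simple sums of the two sources.
ss_total = ss_between + ss_within
df_total = df_between + df_within

print(f"MS between = {ms_between:.2f}")
print(f"MS within  = {ms_within:.4f}")
print(f"F          = {f_stat:.4f}")
print(f"SS total   = {ss_total:.2f}, df total = {df_total}")
```

The recomputed F statistic agrees with the tabulated value and exceeds the listed critical value, consistent with the reported P-value.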
Phylogeny of Asparagopsis and sample used for genomic analysis
Our phylogenetic analysis of the cox2-cox3 spacer, commonly used in intra-specific diversity studies in Asparagopsis, indicated that the sporophyte sample used in the genome sequencing (2019; submitted as NCBI GenBank accession number OP779373) is a member of lineage 6 of A. taxiformis [22] (Fig. S3). This lineage is common in tropical Australia, Queensland and Western Australia. Our sample is part of a strain designated the Great Barrier Reef strain (GBR) [22] and used in feeding trials in Australia [41–44]. The genetic diversity of Asparagopsis, both A. taxiformis and A. armata, indicates that multiple distinct lineages are found in both species, suggesting there may be many cryptic species within these morphological forms, although formal taxonomic changes have not been made [22, 23]. While this tropical Australian lineage is the best studied so far, the monophyly of all the other taxonomically related ‘A. taxiformis’ lineages, including A. armata, suggests that the genomic information presented here will serve as an important resource and reference for future comparative studies on these other lineages.
Genomic features of Asparagopsis taxiformis, Lineage 6 (L6)
We present the first draft genome for the red seaweed A. taxiformis (L6) at 142 Mb in size. This high-quality nuclear genome for Asparagopsis, with its large total assembly size (total base), average contig length and contig N50, provided the basis for the following multi-omics exploration of this seaweed. As shown in Table 2, the assembled genome of A. taxiformis (L6) contained 3,308 contigs with an N50 of 55,006 bp. By comparison to the only other reported A. taxiformis assemblies [55], this assembly had a larger total assembly size (total base), higher average contig length and higher N50. Based on the 255 Eukaryotic BUSCO groups, we found 208 complete BUSCOs (81.6% complete genes), among which there were 183 complete single-copy BUSCOs. In addition, there were only 11 fragmented BUSCOs and 36 missing BUSCOs. Therefore, the A. taxiformis (L6) assembly was better overall than those of A. taxiformis Guam (GCA_018397955.1) and A. taxiformis California (GCA_018397975.1) (File S1b).
Table 2
Overall statistics for A. taxiformis Sunshine Coast genome assembly.
Assembly statistics | |
Total base | 142,472,235 |
Number of contigs | 3,308 |
Average length | 43,069 |
Largest contig length | 523,412 |
N50 | 55,006 |
Gaps | 0 |
BUSCO overview | |
Overall coverage (C/Total) | 81.60% |
Complete BUSCOs (C) | 208 |
Complete and single-copy BUSCOs (S) | 183 |
Complete and duplicated BUSCOs (D) | 25 |
Fragmented BUSCOs (F) | 11 |
Missing BUSCOs (M) | 36 |
Total BUSCO groups searched | 255 |
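For readers less familiar with the statistics in Table 2, the sketch below shows how an N50 and a BUSCO completeness percentage are conventionally derived. N50 is a length-weighted statistic, not a plain median of contig lengths. The contig lengths used here are a hypothetical toy list, since the individual contig lengths are not given in the text; only the BUSCO counts come from Table 2.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of the total assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("contig list is empty")


def busco_summary(single, duplicated, fragmented, missing):
    """Combine BUSCO category counts into complete/total/percent values."""
    complete = single + duplicated
    total = complete + fragmented + missing
    return {
        "complete": complete,
        "total": total,
        "percent_complete": 100.0 * complete / total,
    }


# Hypothetical contig lengths (bp), used only to illustrate the N50 definition.
toy_contigs = [100, 80, 60, 40, 20]
toy_n50 = n50(toy_contigs)

# BUSCO counts as reported in Table 2 (S, D, F, M).
busco = busco_summary(single=183, duplicated=25, fragmented=11, missing=36)

print("toy N50:", toy_n50)
print("BUSCO complete:", busco["complete"], "of", busco["total"])
print("BUSCO completeness: %.1f%%" % busco["percent_complete"])
```

With the Table 2 counts, the complete and total BUSCO numbers and the rounded completeness reproduce the values listed in the table.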
Approximately 70.67% of the A. taxiformis (L6) assembly consisted of interspersed repeats, including 20.71% retroelements, 33.48% DNA transposons, 0.21% rolling circles and 16.47% unclassified repeats. The conserved gene structures collected from BUSCO were used to train AUGUSTUS, which predicted 10,867 putative genes in the A. taxiformis (L6) genome (File S1c). In addition, there are full-length genomes for the mitochondrion (26,034 bp) and plastid (176,936 bp). By using sequence-based searching against our mitochondrial and plastid genomes, we predicted 50 mitochondrial genes and 227 plastid genes, respectively (File S1d). Among the 50 mitochondrial genes, we found two ribosomal RNA genes and 22 different tRNA genes.
We compared nuclear genome size and the number of protein-coding genes across 8 red seaweed whole genomes (Fig. 2A). A. taxiformis (L6) had the largest genome at 142.472 Mb, yet ranked third in terms of number of protein-coding genes. By comparison, Porphyra umbilicalis has a predicted 13,125 genes and a genome size of 87.889 Mb [1]. In total, the A. taxiformis (L6) genome contained 6,858 monoexonic genes (63.13%). The ratio between average and median intergenic gene distances (3.78) indicated that gene organization was not as compact as for C. crispus (8.625), whose genes had exceptionally low gene density and were highly clustered [40].
Metagenomic analysis of the genome strongly supported our microscopic analysis, in that A. taxiformis (L6) sporophytes cultured in semi-sterile conditions harbored a reduced microbial community compared to wild sporophytes. Despite this, a total of 258 reads were taxonomically assigned to prokaryotic or viral groups (File S1e), which corresponded to 0.7% of the A. taxiformis (L6) sequences. A previous study using Kraken screening to investigate genomic contaminants in eukaryotic draft genomes demonstrated that an average of 10% of their pseudo-reads (reads reconstructed from assembled genomes) were identified as foreign species, mostly matching bacteria [50]. Thus, microbial sequences were present in low proportions in the A. taxiformis (L6) genome, primarily consisting of Alphaproteobacteria groups, in which Rhodobacterales was the most abundant order, followed by Rhizobiales. The dominance of these groups in the A. taxiformis (L6) genome is in accordance with other red seaweed microbial profiles [56]. Members of these orders are noted for often establishing mutualistic and symbiotic relationships with marine hosts, playing key roles in biogeochemical cycling and metabolic pathways [57–59].
Comparative and functional features of the A. taxiformis (L6) genome
To provide a well-annotated genomics resource, we utilized a BLAST2GO pipeline, which is the most efficient approach to associate genes with proteins of known function. Of the 10,867 genes, there were 8,856 genes (81.52%) with homologous hits against the NCBI non-redundant (NR) database based on an E-value threshold of 0.00001. The similarity for each BLAST hit was defined as the number of exact matches divided by the length of the hit. The majority of matching hits had over 60% similarity (Fig. S4) and the species distribution showed that the highest identity was with Chondrus crispus (2,583 genes) followed by Gracilariopsis chorda (2,182 genes) (Fig. 2B), reflecting their evolutionary history [60]. Among the genes with NR homologs, 6,786 genes (62.47%) were further mapped to gene ontology (GO) terms. By examining the top 10 GO terms, we found that the most abundant categories consisted of cytosol and membrane genes (cellular components), peptide transport and protein phosphorylation (biological processes) and protein and ATP binding genes (molecular function) (Fig. S5).
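The similarity measure and E-value cut-off described above can be expressed compactly. The following is a minimal sketch, not the BLAST2GO implementation: the function names are hypothetical, the example hit values are invented for illustration, and treating the threshold as inclusive is an assumption.

```python
def hit_similarity(num_matches, hit_length):
    """Percent similarity of a BLAST hit, defined as the number of exact
    matches divided by the length of the hit."""
    if hit_length <= 0:
        raise ValueError("hit length must be positive")
    return 100.0 * num_matches / hit_length


def passes_evalue(evalue, threshold=1e-5):
    """Keep a hit if its E-value is at or below the threshold.
    Whether the original cut-off was inclusive is an assumption."""
    return evalue <= threshold


# Invented example: a hit with 150 exact matches over a 200-residue hit.
example_similarity = hit_similarity(150, 200)
print("similarity: %.1f%%" % example_similarity)
print("kept at E=1e-10:", passes_evalue(1e-10))
print("kept at E=0.01:", passes_evalue(0.01))
```

Under this definition, a hit counts toward the "over 60% similarity" group when at least six in every ten aligned positions are exact matches.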
All 10,867 A. taxiformis (L6) genes contained a protein functional site and domain (assigned by InterPro annotations), while 6,300 genes (57.99%) had Pfam protein family/domain mapping. By comparing the Pfam annotation among 7 other red algae genomes (same species as Fig. 2A), we found that A. taxiformis shared 2,471 Pfam domains with Chondrus crispus (supporting our BLAST analysis) and 2,938 with other red algae species (Fig. 2C). Of interest, 381 A. taxiformis unique domains could be placed into categories of functional domains, protein families and repeats, based on InterPro annotations (Fig. S6). The most prominent of these (186 genes) were associated with the WD40 repeat, which encompasses members of the beta-propeller domain family that are known to form interaction scaffolds in protein complexes [61]. Mapping A. taxiformis genes to the KEGG database showed that 5,368 genes (49.42%) had KEGG orthologs, of which 1,790 genes were categorized into “metabolism”. Of the pathways with 50 or more genes, purine metabolism and pyruvate metabolism contained over 80 genes in the A. taxiformis genome (Fig. 2D).
As key regulators of gene expression, transcription factors (TFs) are critical to understanding the potential developmental processes of organisms. To explore the presence of TF families in the A. taxiformis genome compared to other red algae, we identified 145 TFs in the genome (File S1f) and performed a comparative genome-wide analysis of TFs in seaweeds (Fig. 2E). Similar to other seaweeds, TFs belonging to the C2H2, bZIP, MYB-related and bHLH families have the highest representation in A. taxiformis. Some TFs are located close to each other in the genome; for example, two MYB-related TFs are gene neighbours. MYB TFs are generally associated with plant development and with biotic and abiotic stress responses in plants [62]. A single Trihelix TF (TTF) was identified in A. taxiformis (L6), while no TTF was identified in other red seaweeds. The TTF has a three-helix structure (helix-loop-helix-loop-helix) and is sensitive to light [63].
Ortholog analysis was performed on 8 red seaweed genome-derived proteome datasets (72,277 proteins) that had each been subjected to all-against-all BLASTp (E-value of 0.00001). We found that 56,813 proteins (78.6%) were clustered into 9,841 orthologous groups, which were used to construct a species tree (Fig. 3A,B). Among the orthologous groups, 2,430 (11,578 proteins) were species-specific. The average orthologous group size was 5.8 genes, while there were 604 single-copy orthologous groups (4,832 proteins). In A. taxiformis (L6), 8,898 proteins (81.9%) were mapped into 5,047 orthologous groups. Interestingly, 15,148 putative gene duplication events were identified among the eight genomes, all of which are traced back to the node of the species tree on which these duplications occurred. For A. taxiformis (L6), there were 1,359 non-terminal duplications, which may provide clues to the phylogenetic relationships of gene families. These were mapped to 1,233 Pfam domains and the number of genes in each protein family from each species was aligned with the species tree.
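The summary figures in the ortholog analysis are related by simple arithmetic, which the sketch below reproduces from the counts stated in the text. It is only a consistency check on the reported numbers, not a reimplementation of the clustering itself, and the variable names are illustrative.

```python
# Counts reported in the ortholog analysis.
total_proteins = 72277        # proteins across the 8 proteomes
assigned_proteins = 56813     # proteins placed in orthologous groups
orthogroups = 9841
single_copy_groups = 604
num_species = 8
ataxi_genes = 10867           # predicted A. taxiformis (L6) genes
ataxi_assigned = 8898         # A. taxiformis (L6) proteins in orthogroups

# Share of all proteins assigned to an orthologous group.
percent_assigned = 100.0 * assigned_proteins / total_proteins

# Mean number of proteins per orthologous group.
mean_group_size = assigned_proteins / orthogroups

# A single-copy group has exactly one protein from each species.
single_copy_proteins = single_copy_groups * num_species

# Share of A. taxiformis (L6) proteins assigned to an orthologous group.
ataxi_percent = 100.0 * ataxi_assigned / ataxi_genes

print("assigned: %.1f%%" % percent_assigned)
print("mean group size: %.1f" % mean_group_size)
print("single-copy proteins:", single_copy_proteins)
print("A. taxiformis assigned: %.1f%%" % ataxi_percent)
```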
To better understand lineage-specific gene duplications, we mapped 4,829 A. taxiformis-specific duplicated genes to Pfam annotations, which could provide evidence of genome-wide duplications (e.g., polyploidy). We then prioritized the Pfam domains by the number of duplicated genes (Fig. 3C). Gene duplications generate new copies that may be under relatively relaxed selection pressure. These duplicated genes may therefore diverge and act as one of the major driving forces in genome evolution [64]. For instance, a relatively abundant number of WD40 repeat proteins were identified, supporting the previously observed expansion of these protein domains. Potentially the most striking feature was the 66 genes associated with an N6_N4_Mtase domain in A. taxiformis (L6) compared to P. umbilicalis (5 genes), G. chorda (4 genes) and C. crispus (6 genes). In terms of metabolic enzyme duplication, we identified 25 glucose/sorbosone dehydrogenases (GSDHs) in the A. taxiformis (L6) genome, an expansion relative to the GSDHs in P. purpureum (1 gene), P. umbilicalis (5 genes), G. chorda (6 genes) and C. crispus (5 genes). In plants, GSDHs are important for ascorbic acid metabolism, generating ascorbate precursors [65]. The phosphatidylserine decarboxylases (EC:4.1.1.65, PS_Dcarbxylase), which are known to catalyse the reaction phosphatidyl-L-serine <=> phosphatidylethanolamine + CO2 [66], are prominent in A. taxiformis (L6) (15 genes), compared to the other seaweeds, which have a combined 9 phosphatidylserine decarboxylase genes. In summary, these duplicated enzymes may provide clues to explore gene innovations for the biosynthesis of ascorbate precursors and aminophospholipids in A. taxiformis (L6).
A. taxiformis (L6) cultured and wild sporophyte: an integrative omics analysis
The availability of a high-quality draft genome provided a first opportunity for an integrative Asparagopsis -omics investigation, to explore how environmental conditions alter sporophyte bromoform content and gene expression. Cultured A. taxiformis (L6) sporophytes were maintained in relatively sterile water at constant temperature, salinity and light cycles, whereas wild sporophytes are exposed to changing environmental conditions (e.g. temperature, salinity) and various predators (e.g. rabbitfish [67]), so genes associated with defence and stress-resistance are likely of significant importance. Also, cultured sporophytes (under the conditions described in this study) have never transitioned into reproductive sporophytes, suggesting a lack of specific cues. Control of reproduction might enable potential trade-offs between reproduction and growth to be managed [68].
Differences in gene expression were investigated between cultured and wild sporophytes of A. taxiformis. Overall, cultured sporophyte total gene expression exhibited stronger clustering based on principal component analysis, highlighting greater variation in gene transcripts of wild samples (Fig. S7). Of the genes expressed, 399 and 257 genes were identified as significantly differentially expressed (FDR P < 0.05, >±4 log2 fold-change) in cultured and wild sporophytes, respectively (Fig. 4A). In cultured sporophytes, there was a relative enrichment of genes that annotated to enzymes (e.g., kinases, serine carboxypeptidase, cytochrome P450) and regulatory factors (e.g., zinc fingers, translation initiation factors), as well as those with an unknown function (e.g. unknown, unnamed, hypothetical; Fig. 4B). In addition, cyclin (AtaL6 9753), which encodes a protein well known for complexing with cyclin-dependent kinase (CDK-cyclin complexes) to regulate progression of the cell cycle [69], was notably absent in cultured sporophytes, yet highly expressed in wild sporophytes (File S1g). This could help explain the observed suppression of cultured sporophyte progression through to the reproductive stage. We expect that more in-depth molecular comparative analysis of non-reproductive and reproductive (with sporangia) sporophytes, including changes in CDK-cyclin complex coupling, could provide the necessary knowledge to manipulate sporophyte transitions in seaweeds, including A. taxiformis (L6).
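The thresholding step used to call differentially expressed genes can be illustrated as follows. This is a minimal sketch under stated assumptions: the record layout, gene names and values are hypothetical, a positive log2 fold-change is assumed to mean higher expression in cultured sporophytes, and the cut-offs are exposed as parameters because the exact thresholding applied in the original analysis is not fully specified in the text.

```python
def split_differential_genes(records, fdr_cutoff=0.05, log2fc_cutoff=4.0):
    """Split (gene_id, log2_fold_change, fdr) records into genes that are
    significantly higher in cultured or in wild samples.

    Assumes positive log2 fold-change means higher in cultured samples."""
    cultured_up, wild_up = [], []
    for gene_id, log2fc, fdr in records:
        if fdr >= fdr_cutoff:
            continue  # not significant after multiple-testing correction
        if log2fc > log2fc_cutoff:
            cultured_up.append(gene_id)
        elif log2fc < -log2fc_cutoff:
            wild_up.append(gene_id)
    return cultured_up, wild_up


# Hypothetical records, for illustration only.
toy_records = [
    ("gene_A", 5.2, 0.001),   # significant, higher in cultured
    ("gene_B", -6.1, 0.010),  # significant, higher in wild
    ("gene_C", 4.5, 0.200),   # large change but not significant
    ("gene_D", 1.0, 0.001),   # significant but small change
]

cultured_up, wild_up = split_differential_genes(toy_records)
print("higher in cultured:", cultured_up)
print("higher in wild:", wild_up)
```

Requiring both a significance cut-off and a fold-change cut-off keeps only genes whose change is statistically supported and large in magnitude.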
In wild sporophytes, there was a relative enrichment of genes that annotated to haloperoxidases, animal heme peroxidase-like homologs and heat shock proteins (HSPs), which fit within designated categories (based on C. crispus gene annotations [40]) related to defence and stress-resistance genes (Table 3 and Fig. 4C), and therefore, their relative abundance in wild sporophytes was of interest. The HSP family (including, but not limited to, HSP20, HSP70, HSP100) are well known for their role in facilitating protein folding, assembly, translocation and degradation during normal cellular homeostasis [70]. However, they have been of most interest due to their additional ability to stabilize proteins and assist refolding under stress conditions [71]. If not just by virtue of their name, the HSPs will remain targets for the development of stress resilience in organisms, including seaweeds. We identified that several stress HSPs were upregulated in the wild sporophyte, and specifically those within the HSP20 family (molecular weights ~ 20 kDa, so also known as small HSPs). This aligns with research in the red seaweed Pyropia yezoensis, where small HSPs were upregulated during high temperature, oxidative stress or copper exposure [72].
Table 3
Summary of A. taxiformis (L6) genes involved in defence and stress responses. The number of corresponding genes significantly upregulated (> 4-fold) in cultured or wild sporophytes is shown. For more details, see File S2.
Category | Number | Examples | Cultured | Wild |
Halogen metabolism | 49 | PAP2/vanadium haloperoxidases, non-animal heme peroxidases | 1 | 5 |
Potential pathogen receptors | 391 | WD40-repeat containing proteins, LRR and TRR domain proteins, Sel1 repeat containing proteins | 8 | 6 |
Potential defence effectors | 8 | SGT1 ortholog, apoptosis inducing factor, exportin | 0 | 0 |
Stress genes ROS scavenging | 27 | Ascorbate peroxidase, Cu/Zn superoxide dismutase, glutathione reductase, peroxiredoxin | 4 | 1 |
Stress genes heat shock proteins | 59 | HSP20, HSP40, HSP90, HSP100, HSP70 binding protein, heat shock transcription factor | 4 | 7 |
Stress related genes | 59 | T-complex alpha/beta/gamma, tubulin binding complex B/C/D, peptidyl-prolyl cis-trans isomerase | 4 | 5 |
Cytochrome P450 (CYP450) | 13 | CYP51G, CYP97G, CYP80B | 3 | 0 |
Halogenating peroxidases, or haloperoxidases (HPOs), are enzymes that oxidatively activate halides to the corresponding hypohalites at the expense of peroxides. They belong to two groups, heme-dependent or vanadium-dependent, which differ with respect to the prosthetic group and consequently in their catalytic mechanisms [73]. In seaweeds, numerous vanadium haloperoxidases (VHPOs) exist, while vanadium-dependent bromoperoxidases (VBPOs) have been of most interest due to their production of brominated compounds (e.g. bromoform), which are assumed to be used in chemical defences [7]. Knowledge of how brominated compounds are biosynthesized could be used to improve their supply. We found that A. taxiformis (L6) sporophytes maintained in culture produced higher levels of bromoform (18.02 ± 2.3 mg/g dried algae) when compared to those in the wild (4.4 ± 0.96 mg/g dried algae) (Fig. 5A). Twenty-one genes identified in the A. taxiformis (L6) genome demonstrated significant (E-value < 0.001) similarities with PAP2/VHPOs. Phylogenetic analysis supported 3 predominant clusters, and 15 of the genes had at least some expression in the cultured or wild sporophytes (Fig. 5B). Only the VHPO gene AtaL6 10688 showed a significant difference, being higher in cultured sporophytes than in wild sporophytes. AtaL6 9003 demonstrated the highest overall expression, with comparable expression in both cultured and wild experimental samples. AtaL6 9003 corresponds to a VBPO previously described as mbb 1 of the mbb locus [55]. The mbb locus, known to consist of 3 VBPOs and 1 NADPH-oxidase (NOX), was identified in A. taxiformis (L6), with VBPO AtaL6 8990 identified in close proximity (Fig. 5C). A fifth gene within the mbb locus, denoted as mbb2.5 (AtaL6 9001), was predicted and further confirmed to be present in the California and Guam genomes of A. taxiformis (GenBank MN966723 and MN893468, respectively) (Fig. 5D) [55]. Comparison of mbb locus genes between the known A. taxiformis genomes indicated significant intra-specific variability for these genes (e.g., mbb1: 96.56–97.74%). This variability could arise because neither the California- nor the Guam-derived genome was from Lineage 6 of Asparagopsis taxiformis, although this cannot be ascertained from the data presented in [6]. Lineage 6 of A. taxiformis has not been confirmed from California or Guam. In comparison, sequence analysis of the 3 VBPOs demonstrated that these proteins are moderately conserved and have similar lengths, between 581 and 590 amino acids long. The AtaL6 VBPOs all featured a PAP2 domain, which is situated from the early-300 to the mid-500 amino acid range and covered short regions of both high and low conservation. In addition, all 3 VBPOs contained 8 highly conserved PAP2 active site residues, including a histidine (H) suggested to covalently bond to the vanadate cofactor [74].
An in silico secretome analysis of the A. taxiformis (L6) genome-derived proteins, using conventional protein secretion pathway predictions [75], found that 345 proteins were likely secreted from cells (File S1h). This included an expanded family of proteins annotated as ‘collagen-alpha-like’ proteins (~ 43; 12.5% of total proteins secreted), several of which were enriched (Fig. 6) and demonstrated significantly higher expression in the wild sporophyte (Fig. 6A). These proteins all shared the features of a signal peptide and at least one von Willebrand factor type A (VWA) region with spatially conserved cysteine residues; structure models indicated sequence identity (16.29%) to a proximal thread matrix protein-1 (PDB # 4cn9.2.A) found in the mussel byssus (Fig. 6B; Fig. S8) [76]. The proximal thread matrix protein-1 can bind collagen, thereby influencing the thread's mechanical characteristics (e.g., flexibility) [77]. VWA-containing proteins are often associated with protein-protein interactions [78]. BLASTp similarity searches demonstrated some conservation (up to 62%) with predicted proteins from the red seaweeds G. chorda and C. crispus (Fig. 6C). Based on this evidence, we suggest that these secreted proteins in A. taxiformis (L6), herein named rhodophyte collagen-alpha-like proteins (RCAPs), may help regulate the seaweed's flexibility and/or substrate attachment; cultured sporophytes used in this study were not attached to a substrate. A secreted liquid mucilage is critical for seaweed bioadhesion, allowing for cell wall adhesion to a substrate before irreversibly hardening [79].
Asparagopsis proteomics
The comparative gene expression analysis provided protein targets for further investigation, and their isolation (as native proteins) from seaweeds would enable further analysis, including structural and functional characterisation. For example, the A. taxiformis (L6) PAP2/VHPOs, animal heme peroxidase-like proteins and RCAPs were of interest, so their isolation would support future functional characterisation and a fuller understanding of their role in brominated compound biosynthesis and defence. The A. taxiformis (L6) genome provided an excellent opportunity for proteomic investigation of this seaweed, with specific interest in isolating the secretome proteins reported above. Of the proteins predicted from the genome, 421 proteins were identified following sporophyte extraction (with Tris and urea) and proteomic analysis (Fig. 7 and File S3), of which 25 were predicted to be secreted, including a VHPO (encoded by AtaL6 1934), an animal heme peroxidase-like protein (AtaL6 3164) and an RCAP (AtaL6 10422). Besides putative secreted proteins, 45% of the extracted proteins consisted of enzymes (e.g., lipoxygenase, fructose-1,6-bisphosphate aldolase, peroxiredoxin), 4% were phycoproteins (e.g., allophycocyanin beta 18 subunit, phycoerythrin gamma 31 kDa subunit, phycobilisome linker polypeptides) and 44% were either hypothetical or unknown.
This proteomic investigation of A. taxiformis (L6) will complement other experimental approaches to protein-based production in Asparagopsis (and other seaweeds), such as production of recombinant proteins (e.g., [55]) and gene/protein knockdowns. In addition, proteins annotated as enzymes were highly represented in the proteome; seaweed enzymes show prebiotic and nutraceutical potential, primarily through applications that develop bioactive compounds from seaweeds, as well as assisting in the extraction and hydrolysis of macromolecules [80]. Also, phycobiliproteins (water-soluble pigments) are highly sought after as natural pigments for the food, cosmetics, dye and other industries [81], so their purification and identification via a genomic resource database will assist the development of approaches for large-scale purification.