Improvement of genome assemblies and gene predictions for Montipora and Astreopora
Assembly errors, including retention of allelic contigs in haploid assemblies, are problematic for downstream analyses, mainly because they introduce redundant genome sequences (alleles from the same genetic locus). We curated scaffold sequences of M. cactus and M. efflorescens by removing scaffolds with unusually high or low coverage and those that may have originated from one of two allelic copies in heterozygous regions. The number of scaffolds was substantially reduced relative to the previous version, from 4,925 to 3,521 in M. cactus and from 5,162 to 3,589 in M. efflorescens (Table 1). For Astreopora, possible allelic scaffold sequences were removed from the genome assembly during the previous study [23].
The previous versions of gene models for M. cactus, M. efflorescens, and Astreopora were predicted using AUGUSTUS, based solely on a training set built for Acropora or on protein homology with gene models of other corals [23]. Thus, it was highly likely that lineage-specific genes were missed in the previous version. In this study, we performed gene prediction for M. cactus, M. efflorescens, and Astreopora myriophthalma using a combination of ab initio and RNA-seq evidence-based prediction. We predicted 29,158 protein-coding genes for M. cactus, 29,424 for M. efflorescens and 25,406 for Astreopora myriophthalma (Table 1). Benchmarking universal single-copy orthologs (BUSCO) completeness scores were 93.3% (of which 0.8% were duplicated) for M. cactus, 91.2% (of which 1.4% were duplicated) for M. efflorescens and 94.5% (of which 1.3% were duplicated) for Astreopora myriophthalma, all considerably better than the scores of the previous version (Table 1). Among other published Montipora gene models, those reported by Shumaker et al. [28] may have contained a higher fraction of diploid copies (93.4% complete BUSCO, with 18.3% duplicated; Table 1). Completeness of gene models reported by Helmkampf et al. [27] was lower than that reported by Shumaker et al. [28] (64.2%, with 0.5% duplicated; Table 1). Thus, the gene models reported by Shumaker et al. [28] contained many duplicates, whereas those reported by Helmkampf et al. [27] lacked many genes. In contrast, BUSCO completeness scores of M. cactus, M. efflorescens and Astreopora myriophthalma reported in this study were comparable to those of published gene models of other coral species, including A. millepora, predicted using the NCBI annotation pipeline (Table 1). These improvements to the Montipora and Astreopora genomes enabled more accurate comparative genomics among acroporids.
Comparison of gene families within the Acroporidae
Identifying orthologous relationships between sequences is fundamental for comparative genomic analyses. To obtain orthologous relationships among acroporid genomes, we used three Acropora species (A. digitifera, A. millepora, and A. tenuis), for which BUSCO completeness scores are high (Table 1), two Montipora species (M. cactus and M. efflorescens), and Astreopora myriophthalma, representing the basal clade of the Acroporidae [29]. We obtained 12,769 gene families for Montipora, 11,007 for Acropora and 11,309 for Astreopora (Figure 2). We then categorized each gene family into one of seven groups: (1) common to all three genera (9,690 gene families), (2) common to Montipora and Acropora (743 gene families), (3) common to Montipora and Astreopora (665 gene families), (4) common to Acropora and Astreopora (257 gene families), (5) restricted to Montipora (1,670 gene families), (6) restricted to Astreopora (696 gene families) and (7) restricted to Acropora (316 gene families) (Figure 2). Of these gene families, 75.8% (9,690/12,769) in Montipora, 88.0% (9,690/11,007) in Acropora, and 85.7% (9,690/11,309) in Astreopora were shared among all three genera (Figure 2), indicating that a large proportion (~80-90%) of gene families is shared throughout the Acroporidae; these are likely to be the core gene families of the Acroporidae.
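The grouping and the shared-family percentages above can be illustrated with a minimal Python sketch. The reported totals and the count of 9,690 shared families are taken from the text; the toy family IDs and the function names `tally_regions` and `core_share` are hypothetical and are not part of the original analysis. Computed percentages may differ from those in the text in the last digit because of rounding.

```python
from collections import Counter

# Per-genus gene-family totals and the all-genera count, as reported in the text.
REPORTED_TOTALS = {"Montipora": 12769, "Acropora": 11007, "Astreopora": 11309}
CORE_FAMILIES = 9690


def tally_regions(family_to_genera):
    """Count gene families per Venn region, keyed by the set of genera present."""
    return Counter(frozenset(genera) for genera in family_to_genera.values())


def core_share(core_count, genus_total):
    """Percentage of a genus's gene families that are shared by all genera."""
    return 100.0 * core_count / genus_total


# Hypothetical toy input: family ID -> genera with at least one member gene.
toy_families = {
    "HOG_A": {"Montipora", "Acropora", "Astreopora"},
    "HOG_B": {"Montipora"},
    "HOG_C": {"Montipora", "Acropora"},
    "HOG_D": {"Montipora"},
}
regions = tally_regions(toy_families)

# Shares computed from the counts reported in the text.
shares = {g: core_share(CORE_FAMILIES, total) for g, total in REPORTED_TOTALS.items()}
for genus, share in shares.items():
    print(f"{genus}: {share:.1f}% of gene families shared by all three genera")
```

Keying each region by the set of genera present keeps the seven groups explicit without hard-coding their order.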
The two major clades of reef-building corals possess different metabolic pathways [30]. Across the six species, we compared 303 functional modules comprising ten categories in the Kyoto Encyclopedia of Genes and Genomes (KEGG) metabolic pathways and found that metabolic pathways were largely conserved among the three genera (Supplementary Table S1). An enzyme involved in cysteine biosynthesis (KEGG module ID: M00338) and methionine degradation (KEGG module ID: M00035) was not detected in any of the six species (Supplementary Table S1), as reported in Shinzato et al. [23, 24]. Although one gene (KEGG entry ID: K04486) involved in the histidine biosynthetic pathway (KEGG module ID: M00026) was detected in the acroporid corals used in this study, the remaining genes required to complete the pathway were not detected (Supplementary Table S1), as reported in Ying et al. [30]. Taken together, gene families involved in common features, such as amino acid synthesis, are widely conserved among the three genera.
While we identified 696 lineage-specific gene families in Astreopora and 316 in Acropora, we identified 1,670 gene families restricted to Montipora (2,307 genes in M. cactus and 2,303 in M. efflorescens) (Figure 2). The proportion of lineage-specific gene families in Montipora (13.07%) was significantly larger than that in Acropora (2.87%) or Astreopora (6.15%) (Pairwise proportion test: p < 0.05). In addition, when we annotated gene families using BLAST searches against the SwissProt database (BLASTP, e-value cutoff: 1e-5), the proportion of Montipora-specific gene families with SwissProt annotation was significantly lower than in Acropora and Astreopora (Pairwise proportion test: p < 0.05 for Montipora versus Acropora, p < 0.05 for Montipora versus Astreopora, and p = 0.59 for Acropora versus Astreopora; Figure 2). This indicates that functions of gene families restricted to Montipora are largely unknown.
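The comparison of lineage-specific proportions can be sketched as follows. The text does not state which implementation of the pairwise proportion test was used, so this sketch swaps in a pooled two-proportion z-test with no continuity correction and no multiple-testing adjustment; it is an illustration only. The counts are those reported above, and the function name `two_proportion_z_test` is hypothetical.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided pooled z-test for equality of two proportions; returns (z, p)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# (lineage-specific families, total families) per genus, as reported in the text.
counts = {
    "Montipora": (1670, 12769),
    "Acropora": (316, 11007),
    "Astreopora": (696, 11309),
}

results = {}
for other in ("Acropora", "Astreopora"):
    z, p = two_proportion_z_test(*counts["Montipora"], *counts[other])
    results[other] = (z, p)
    print(f"Montipora vs {other}: z = {z:.1f}, p < 0.05: {p < 0.05}")
```

Because this is an approximation without adjustment for multiple comparisons, its p-values are not expected to match those of the original analysis exactly.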
Gene expansions in Montipora and comparisons among acroporids
Gene duplication has contributed to acquisition of new gene functions during evolution [31, 32]. To explore gene families that underwent expansions, we first compared gene numbers in the 9,690 gene families common to the three genera and the 743 gene families common to Montipora and Acropora (Figure 2). In these two groups, genes in families that underwent expansion in either Montipora or Acropora might have been duplicated after Montipora and Acropora diverged from their common ancestor. Three gene families, similar to dimethylsulfoniopropionate (DMSP) lyase (Alma; HOG0000829), Endonuclease-reverse transcriptase (GP1; HOG0000531), and Spondin (Spon1; HOG0001590), and three non-annotated gene families (NA; HOG0000965, HOG0001135, and HOG0001312), were significantly expanded in Acropora (Fisher’s exact test: p < 0.05; Figure 3a and 3b). DMSP lyase was recently reported to be the most expanded gene family in Acropora [28]; our result is consistent with that report, supporting the accuracy of this analysis. We found that three gene families, transient receptor potential protein (TRPC; HOG0002487), collagen alpha-1 (VII) chain (COL7A1; HOG0003259) and a non-annotated gene family (NA; HOG0001797), were significantly expanded in Montipora compared with Acropora (Fisher’s exact test: p < 0.05; Figure 3a and 3b).
Next, we compared gene numbers of 665 gene families common to Montipora and Astreopora (Figure 2), in which gene duplication may have occurred after divergence of Montipora or Astreopora. These genes may have been lost in Acropora. Two gene families (HOG0003949 and HOG0004557) lacking SwissProt annotation were significantly expanded in Astreopora (Fisher’s exact test: p < 0.05; Figure 3c), whereas one other gene family, tetratricopeptide repeat protein 28 (TTC28; HOG0000387), which is involved in the cell cycle in humans [33], was significantly expanded in Montipora compared with Astreopora (Fisher’s exact test: p < 0.05; Figure 3c).
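A test for expansion of a single gene family can be sketched with a two-sided Fisher's exact test on a 2x2 table. The table layout used here (genes inside versus outside the family, for each of two genomes) is an assumption, since the text does not describe how the contingency tables were built, and all counts in the usage example are hypothetical. The function `fisher_exact_two_sided` is written from scratch with the standard library for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum over all tables that are no more probable than the observed one;
    # the small relative tolerance guards against floating-point ties.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical example: a family with 12 genes in genome A (29,000 genes in
# total) and 2 genes in genome B (28,000 genes in total).
in_a, total_a = 12, 29000
in_b, total_b = 2, 28000
p_value = fisher_exact_two_sided(in_a, total_a - in_a, in_b, total_b - in_b)
print(f"p = {p_value:.4f}; expanded at p < 0.05: {p_value < 0.05}")
```

In a genome-wide screen the test would be repeated for every shared gene family, in which case a correction for multiple testing would normally be considered.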
Estimation of evolutionary rate in each Montipora gene family group
The ratio of nonsynonymous (Ka) to synonymous (Ks) substitution rates reflects the strength of selective pressure on protein sequences [34]. When Ka is less than Ks (Ka/Ks < 1), selection has acted to eliminate mutations that change the protein sequence (negative or purifying selection). In contrast, when Ka is larger than Ks (Ka/Ks > 1), selection has favored changes to the protein sequence (positive selection). To evaluate the strength of selective pressure acting on protein sequences in each Montipora gene family, we calculated pairwise Ka/Ks ratios between Montipora single-copy orthologous gene pairs (M. cactus versus M. efflorescens) for each of four groups: (1) gene families common to the three Acroporidae genera, (2) gene families common to Montipora and Acropora, (3) gene families common to Montipora and Astreopora, and (4) gene families restricted to Montipora (Figure 4). When we compared Ka/Ks ratios between groups, gene families restricted to Montipora showed the highest Ka/Ks ratios (Wilcoxon rank sum test: p < 0.05; Figure 4), indicating that this gene family group has undergone a relaxation of negative selection, and that functional constraints on this group are relaxed. This could explain why the deduced functions of gene families restricted to Montipora are largely unknown.
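The interpretation of Ka/Ks and the between-group comparison can be sketched as below. Estimation of Ka and Ks from codon alignments is not shown, and all Ka/Ks values in the example are hypothetical. The rank-sum function uses the normal approximation without tie correction, which is a simplification and may differ from the implementation used in the original analysis; the names `selection_regime` and `rank_sum_test` are illustrative.

```python
import math


def selection_regime(ka, ks):
    """Classify a gene pair by its Ka/Ks ratio."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form a ratio")
    ratio = ka / ks
    if ratio > 1:
        return "positive"
    if ratio < 1:
        return "purifying"
    return "neutral"


def rank_sum_test(xs, ys):
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie correction).

    Returns (U statistic for xs, p-value).
    """
    pooled = sorted([(v, 0) for v in xs] + [(v, 1) for v in ys])
    n = len(pooled)
    rank_sum_x = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(xs), len(ys)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical Ka/Ks ratios for two gene-family groups.
restricted = [0.6, 0.7, 0.8, 0.9, 1.1, 1.2]
common = [0.10, 0.15, 0.20, 0.25, 0.30, 0.35]
u_stat, p_value = rank_sum_test(restricted, common)
print(selection_regime(0.02, 0.10), selection_regime(0.30, 0.25), p_value < 0.05)
```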
Positive selection specific to Montipora
To identify genes with fast evolutionary rates that may be associated with adaptive evolution in Montipora, we focused on gene families exhibiting Ka/Ks > 1. We found evidence of positive selection in 40 gene families (rapidly evolving gene families) (Table 2). Of those, 10 families are common to the three genera or common to Montipora and Acropora, while the remaining 30 families are restricted to Montipora (Table 2), suggesting that these 30 gene families arose specifically in that lineage and likely contribute to biological traits unique to Montipora. Although 28 of the 30 gene families restricted to Montipora lacked annotation, their possible subcellular localizations, ranging from membrane to organelle, were predicted by DeepLoc, a deep-learning neural network model (Table 2).
Gene expression unique to early life stages of Montipora
The presence of maternally inherited algal symbionts at early life stages is the most obvious difference between vertical and horizontal transmitters (Figure 1). To identify gene families specifically involved in symbiosis in vertical transmitters, we compared the repertoire of genes expressed in early life stages of Montipora with that of Acropora. In this analysis, a gene family was considered expressed if at least one gene in that family was expressed (transcripts per million (TPM) > 1). We confirmed that 11,930 and 10,838 gene families were expressed at early life stages of Montipora and Acropora, respectively (Figure 5a). Of these, 10,051 gene families (84% in Montipora and 93% in Acropora) were expressed in both genera at early life stages (Figure 5a), suggesting that these are essential for early development of acroporid corals; thus, we did not focus on these in the present study. We identified 1,879 gene families that were exclusively expressed in Montipora (Figure 5b). Among those, 60% (1,132 gene families) were expressed in all three stages examined (planula larvae, metamorphosed larvae and recruits) (Figure 5b), suggesting that these genes may be related to maintenance of algal symbionts in Montipora. Interestingly, 97% of the gene families expressed throughout the three life stages ((753 + 344) / 1,132, Figure 5b) were specific to Montipora or shared with Astreopora (Supplementary Table S2). In contrast, the remaining 3% of gene families ((22 + 13) / 1,132, Figure 5b) have orthologs in Acropora that were not expressed in Acropora, although the families were expressed throughout early life stages of Montipora (Supplementary Table S3). Among the gene families expanded in Montipora described above, two (HOG0001797 and HOG0000387) were exclusively expressed in at least one early life stage in Montipora, and one of them (HOG0000387) was expressed throughout all three early life stages (Supplementary Table S2).
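The family-level expression rule described above (a family counts as expressed if at least one member gene has TPM > 1) can be sketched as follows. The threshold comes from the text; the family IDs, gene IDs, TPM values, and the function name `expressed_families` are hypothetical.

```python
TPM_THRESHOLD = 1.0  # expression cutoff stated in the text (TPM > 1)


def expressed_families(family_to_gene_tpm):
    """Return the families in which at least one member gene exceeds the cutoff."""
    return {
        family
        for family, gene_tpm in family_to_gene_tpm.items()
        if any(tpm > TPM_THRESHOLD for tpm in gene_tpm.values())
    }


# Hypothetical toy data: family ID -> {gene ID: TPM} for each genus.
montipora = {
    "HOG_a": {"mc_g1": 5.2, "mc_g2": 0.0},
    "HOG_b": {"mc_g3": 0.4},
    "HOG_c": {"mc_g4": 1.0},  # exactly 1.0 does not pass a strict "> 1" cutoff
    "HOG_d": {"mc_g5": 12.0},
}
acropora = {
    "HOG_a": {"ad_g1": 3.1},
    "HOG_d": {"ad_g2": 0.2},
}

mon_expr = expressed_families(montipora)
acr_expr = expressed_families(acropora)
shared = mon_expr & acr_expr
montipora_only = mon_expr - acr_expr
print(sorted(shared), sorted(montipora_only))

# Consistency check on the counts reported in the text.
reported_montipora, reported_shared = 11930, 10051
montipora_exclusive = reported_montipora - reported_shared
```

Restricting the same set operations to families expressed at every stage (the intersection of the per-stage sets) would yield the subset expressed throughout all three early life stages.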
Among the 30 rapidly evolving gene families restricted to Montipora, we detected expression of 90% (27 of 30). Nine of these families were expressed in at least one early life stage of Montipora, and the remaining 18 were continuously expressed throughout all three early life stages (Table 2).