Comparative transcriptome profiling of heat stress response of the mangrove crab Scylla serrata across different sites

Background: The fishery and aquaculture of the widely distributed mangrove crab Scylla serrata is a steadily growing, high-value, global industry. Climate change poses a risk to this industry as temperature elevations are expected to threaten the mangrove crab habitat and the supply of mangrove crab seeds from the wild. It is therefore important to understand the genomic and molecular basis of how mangrove crab populations from sites with different climate profiles respond to heat stress. Towards this, we performed RNA-seq on the gill tissue of S. serrata individuals sampled from 3 sites (Cagayan, Bicol, and Bataan) in the Philippines, under normal and heat-stressed conditions. To compare the transcriptome expression profiles, we designed a 2-factor generalized linear model containing interaction terms, which allowed us to simultaneously analyze within-site response to heat-stress and across-site differences in the response. Results: We present the first ever transcriptome assembly of S. serrata obtained from a massive data set containing 66 Gbases of cleaned RNA-seq reads. With lowly-expressed and short contigs excluded, the assembly contains roughly 17,000 genes with an N50 length of 2,366 bp. Based on sequence comparison to the fruitfly and shrimp proteomes, our assembly contains several thousands of almost full-length transcripts. Differential expression analysis found population-specific differences in heat-stress response. Within-site analysis of heat response showed 177, 755, and 221 differentially expressed (DE) genes in the Cagayan, Bataan, and Bicol group, respectively. Across-site analysis of difference in heat response showed that between Cagayan and Bataan, there were 389 differently differentially expressed (DDE) genes associated with 48 signalling and stress-response pathways; and between Cagayan and Bicol, there were 101 DDE genes affecting 8 pathways. Conclusion: In light of previous work on climate profiling and on population genetics of marine species in the Philippines, our findings suggest that the variation in thermal response among populations might be derived from acclimatory plasticity due to pre-exposure to extreme temperature variations or from population structure shaped by connectivity which leads to adaptive genetic differences among populations.


Background
Scylla serrata is a commercially important, portunid mangrove crab. Of the four species that currently constitute the genus Scylla [1,2], S. serrata is the most widely distributed -occurring throughout coastal tropical and subtropical Indo-Pacific [3,4]. It is also the preferred species for farming due to its fast growth rate and large size [5]. The fishery and aqua-culture of mangrove crabs is a steadily growing, highvalue, global industry [4]. In the Philippines alone, its production increased from 17,000 metric tons in 2013 to almost 22,000 metric tons in 2018 [6,7].
Fluctuations in climate, particularly that of elevation in temperature, pose a threat to the mangrove crab industry. Climate change is expected to threaten the environment in which the mangrove crabs thrive and the supply of mangrove crab seeds from the wild. Adverse effects of high temperature to the development and survival of ectothermic crustaceans are well documented [8,9,10]. While little can be done to control temperature fluctuations, one viable mitigation option is to use crabs that are better adapted to temperature changes. In this regard, it is important to understand the differences in the molecular mechanisms by which mangrove crab populations from sites with different climate profiles respond to heat stress. This information would be useful, for example, in developing a base population for breeding programs.
Past studies on stress response in mangrove crabs have been limited to using qPCR to evaluate the expression level variations of a limited number of genes, such as the heat-shock proteins [11]. Similar studies in other crustaceans such as amphipods, clams, and shrimps have also been reported [12,13,14]. However, the qPCR approach is unable to detect responses that are multi-genic in nature or that involve new genes not yet known to be associated with stress response. Furthermore, previous studies have not looked into biogeographic region as a factor in stress response. While it is known that there exists a high degree of intra-and inter-species genetic diversity among mangrove crabs from different regions [15,16], the connection between population differentiation and resilience to temperature stress has not been adequately explored.
In this study, we used RNA-seq to compare the transcriptome expression profiles of the gill tissue of 29 S. serrata samples from 3 different populations in the Philippines, under normal and heat-stressed conditions. Cagayan, Bicol, and Bataan are located in North, South, and Central Luzon, respectively. Cagayan and Bicol lie on the eastern coast facing the Philippine Sea, while Bataan faces the South China Sea to the west of the Philippines (Figure 1a). They are major sites where crabs are caught, farmed, and fattened. The three sites fall in different climate zones and witness different weather profiles [17]. Of particular relevance to this study is sea-surface temperature. We show in Figure 1, the distribution of historical seasurface temperatures observed shown. Cagayan experiences a more extreme weather profile, with both a wider range of annual sea surface temperature (Figure 1b) and more extreme sea surface temperature anomalies (Figure 1c). Of the three, Bicol witnesses the least amount of variation.
To compare the transcriptome expression profiles of the 6 groups (3 sites × 2 conditions), we designed a 2-factor generalized linear model containing interaction terms, which allowed us to simultaneously analyze within-site response to heat-stress and across-site differences in the response. In the within-site analysis, we identified 177, 755, and 221 differentially expressed (DE) genes in the Cagayan, Bataan, and Bicol group, respectively. In the across-site analysis, we identified 389 differently differentially expressed (DDE) genes associated with 48 signalling and stressresponse pathways between Cagayan and Bataan, and 101 DDE genes affecting 8 pathway between Cagayan and Bicol. Through homology search and gene ontology analysis, we discovered that many of these genes are known to be associated with stress response pathways. Well-annotated reference transcriptome or genome resources are unavailable for mangrove crabs and scant even for crustaceans in general [18]. As part of the RNA-seq data analysis procedure, we computed the first ever transcriptome assembly of any tissue of S. serrata containing roughly 17,000 genes and N50 length of 2,366 bp. The assembly contains several thousands of transcripts that have long, high-confident alignments against sequences from the fruit fly proteome, Uniprot Swiss-prot, and the shrimp proteome.

Study design and data
Our raw data consists of a total of 725 million 100 bplong paired-end RNA-seq reads, obtained from the gill tissue of 29 adult S. serrata individuals. These individuals come from 6 groups -a combination of 3 sites (Cagayan, Bicol, and Bataan) and 2 treatments (Control and Heat-stressed). Individuals were sequenced at an average depth of 25 million reads. After removing low-quality read segments and potential rRNA-derived reads, there were a total of 663 million reads remaining.

Transcriptome assembly
From the cleaned reads, we obtained a de-novo transcriptome assembly of size 402 million bases containing 505,246 contigs grouped into 349,812 'genes'. The N50 statistic was 1,467 bp when accounting for all contigs, and 678 bp when selecting only one longest isoform per gene. Applying CD-HIT to cluster sequences with 95% similarity, reduced the number of transcripts to 435,969 and genes to 345,516.
An overwhelming majority of the estimated genes were lowly-expressed and have short transcripts. This can be seen by considering the subset S x of genes containing only the top highly expressed ones which account for x percent of the total expression -where gene expression is the sum of TMM value [19] across all isoforms and all samples. For x = 90, S x contains 17,162 out of the initial 349,812 genes. Further, as shown in Figure 2, the ExN50 statistic -which is the N50 value for S x with the length of a gene defined as the average length of its isoforms weighed by their expression -peaks at N50 length of 2,366 bp around x = 90.

Assembly quality assessment
We evaluated the quality and contiguity of the assembly using several metrics.

Perc. of protein length aligned
No. of fruitfly proteins No. of shrimp  proteins  91-100  2167  4015  81-100  3049  5229  71-100  3602  6118  61-100  4099  6940  51-100  4536  7794  Table 1 Presence of almost full-length transcripts in the assembly. The second (third) column shows the number of fruitfly (shrimp) proteins that align to a contig in the assembly such that the alignment covers the percentage in the first column of the protein length.
First, to check if sequences from paired-end reads appear as consistent pairs in the assembly, we used Bowtie2 [20] to align the RNA-seq reads to the assembly. An average of 86% of the read-pairs, across all samples, aligned concordantly, indicating that the read pairs are well-represented in the assembly.
Next, to test the presence of contigs that are likely full-length transcripts, we used Blastx to align the contigs to two reference proteomes from Uniprot. We chose the proteomes of the fruit fly (D. melanogaster, proteome ID: UP000000803) since it is a model organism for arthropods, and that of the whiteleg shrimp (P. vannamei , proteome ID: UP000283509 ), since it has one of the largest set of protein sequences in Uniprot for crustaceans. There are 13,790 genes in the former, and 25,399 in the latter. Table 1 summarizes the alignment results. Almost 15% of both the fruit fly genes and shrimp genes are represented in almost full length (91-100%) in the assembly. Roughly 20% of both gene sets have > 80% of the protein sequence locally aligned to a contig in the assembly, and this number rise to almost 30% at the >50% length coverage threshold (last row of table).
Finally, the contigs were subjected to a BUSCO analysis with the arthropoda_odb10 database, which contains 1013 BUSCO groups. We found 94.4% complete BUSCOs in the assembly, with only 4.3% missing and 1.3% fragmented. Of the complete BUSCOs, 38.3% were reported as single-copy and 56.1% duplicated, which is due to the presence of many isoforms per gene in the assembly.

Functional annotation of de-novo assembly
We used the Dammit pipeline [21] which uses LAST [22] to perform a conditional reciprocal besthit homology search [23] of our contigs against the Swiss-Prot database, the fruit fly proteome, and the proteome of the whiteleg shrimp. Of the alignments produced by Dammit, we considered as a hit only those that have an E-value at most 10 −10 and those that cover at least 50% of the protein length. Figure 3 summarizes the results of the database hits. There were 6,356 transcripts that had a hit in each of the three databases. At the gene-level, this number reduced to 4,232.
Based on the fruitfly annotations, gene ontology and pathway annotation of was performed using Omics-Box and Panther [24], respectively. A total of 4,299 transcripts were assigned to one of the three GO domains: biological processes (BP), molecular function (MF) and cellular component (CC). The top gene ontology annotations (BP, MF and CC) are shown in Figure 4. In the BP category, there was a high number of genes associated with oxidation-reduction process and protein transport and ubiquitination. These genes play a significant role in redox homeostasis, protein synthesis as well as maintaining protein structure. In the MF category, most genes were predominantly assigned to ATP binding, protein binding, RNA binding, GTP binding and calcium ion binding. These molecular activities are integral in regulation of calcium ion concentration, protein synthesis, as well as energy generation processes. In CC category, genes associated with the integral component of membrane, cytoplasm, nucleus, cytosol were found to be abundant in the transcriptome assembly.
Pathway analysis results are shown in Figure 5. We found that the highest number of genes were associated with Parkinson and Huntington disease, which are model diseases for neurodegenerative disorders. Genes associated with vital signaling pathways such as Wnt signaling pathway, EGF receptor signaling pathway, Integrin signaling pathway, PDGF signaling pathway were found to be abundant in the transcriptome. We also found a high number of genes associated with ubiquitin proteasome pathway. These mostrepresented pathways show an overview of the cellular processes which are involved in proper biological functioning as well as the processes which may respond in reaction to heat stress.

Differential gene expression analysis Site-specific heat-stress response (main effect)
We first consider the genes which were found to be differentially expressed between the control and heatstressed groups of a particular site. In the statistical model, these are the genes for which one of the main effect null hypotheses in Table 2 was rejected. We will henceforth refer to these site-specific differentially expressed genes as DE genes.
There were 177, 755, and 221 DE genes in the Cagayan, Bataan, and Bicol group, respectively. As shown in Figure 6, the number of up-regulated DE genes were 60, 394, 135 and down-regulated DE genes were 117, 361, and 86, respectively. The scatter plots in Figure 6 show for each site, the average mean expression (as mean normalized read counts) versus the magnitude of differential expression (as log 2 fold change). The DE genes are represented as non-grey dots. We call a gene to be stress-response related if its reciprocalbest-hit fruit fly gene is associated with a GO term that is a descendant of the GO term "response to stress" in the Gene Ontology graph. These genes can be loosely understood to be genes whose putative homologs in fruitfly have been known to be related to stress response. There were 24, 117, 39 stress-related DE genes in the Cagayan, Bataan, and Bicol group respectively, which are shown as red dots in the

Across-site difference in heat-stress response (interaction effect)
We next consider the genes for which there was an effect of site in how the expression levels varied between control and heat-stressed conditions. In other words, these are genes whose differential expression was different -either in magnitude (log2 fold-change) or the direction of regulation -between a site pair. In the statistical model, these are the genes for which an interaction effect null hypothesis in Table 2 was rejected. We will refer to them henceforth as differently differentially expressed (DDE) genes.
In the Cagayan-Bataan comparison, there were 389 DDE genes; and in the Cagayan-Bicol comparison, there were 101 DDE genes. The heatmaps of Figure 7 show the expression profiles of the DDE genes. Each row of the heatmap corresponds to a DDE gene, the colors represent deviation from row average of the variance-stabilized read counts -red for lower than average and green for higher than row average. The intensity of color represents the z-score. The horizontal black bars to the right indicate genes that are stressresponse related. The entire list of DDE genes along with the interaction plots of the top 10 (by value of estimated interaction coefficient) are provided in the Supplementary Material. GO analysis of DDE genes were performed using Blast2GO. Overall, most DDE genes were assigned to GO terms "cellular process", "metabolic process" and "localization" in the biological process category. Next we restricted focus on the stress-related DDE genes. In the Cagayan-Bataan comparison, we observed that DDE genes associated with GO terms "activation of innate immune response", "response to oxidative stress", "long-chain fatty acid metabolic transprot" were upregulated in Cagayan and downregulated in Bataan; whereas DDE genes responsible for "regulation of reactive oxygen species" and "cellular response to hypoxia" were downregulated in Cagayan and upregulated in Bicol. In the Cagayan-Bicol comparison, DDE genes associated with "regulation of reactive oxygen species" and "cellular response to hypoxia" were down-regulated in Cagayan and upregulated in Bicol; whereas DDE genes associated with "response to oxidative stress", "activation of innate immune response" and "synaptic target recognition" were upregulated in Cagayan and downregulated in Bicol.
Using Panther, the DDE genes were assigned to 48 and 8 reactome pathways for the Cagayan-Bataan and Cagayan-Bicol comparison, respectively. A majority of DDE genes were mapped to pathways related to stress signal transduction, metabolic processes, immune response, response to oxidative stress and cell death. When restricting focus on the stressrelated DDE genes, the enriched pathways were apoptotic pathways, oxidative stress-related pathways, and signal transduction pathways such that the Integrin signaling pathway, EGF receptor signaling pathway, Interferon-gamma signaling pathway, TGF-beta signaling pathway, Wnt signaling, and p53 pathway.
The list of top 20 DDE genes (by size of coefficient) for the two site-pairs and the corresponding interaction plots of gene expression is shown in the Supplementary Information.

Discussion
The first S. serrata transcriptome assembly We present the first transcriptome assembly of any tissue of S. serrata. Several quality assessment metric attest to the reliability of the assembly. The source reads are well represented in the assembly, with an average of 86% of the read pairs aligning concordantly to the assembly. Table 1 shows that more than 4000 protein sequences of the proteome of the closely related whiteleg shrimp P. vannamei, align almost fully to distinct contigs in the assembly. The soundness of the assembly is further supported by BUSCO analysis, which found almost all of the 1013 arthropod BUSCOs in the assembly.
Pathway analysis results showed that our assembly is enriched in genes associated with several vital signalling pathways such as the Wnt signaling pathway, EGF receptor signaling pathway, Integrin signaling pathway, and PDGF signaling. These signaling pathways are mostly evolutionarily conserved and are responsible for proper biological functioning and development. Interestingly, these pathways are also known to function in response to heat-stress [25,26,27,28]. High number of genes associated with ubiquitin proteasome pathway may indicate activated mechanism for protein degradation, whereas apoptosis signaling pathway may indicate potential severe protein damage which leads to programmed cell death. Surprisingly, we also found that a high number of genes were associated with Parkinson and Huntington disease, which are model diseases for neurodegenerative disorders. This may be indicative of potential neural function damage in which several cellular processes are involved.
A few transcriptome assemblies of other Scylla spp. have been previously reported. These include the assemblies of the testis of S. olivacea [29], and testis, ovary, and infected hemocytes of S. paramamosain, and megalopa stage of S. paramamosain [30,31,32]. Compared to previous assemblies, our assembly was computed using the largest amount of input data so far -roughly 66 Gbases collected from 29 individuals. Functional analysis of the assembly revealed enrichment in pathways that have not been reported in any of the previous mangrove crab assemblies. This hints at a vast transcriptomic complexity in mangrove crabs; and our S. serrata transcriptome assembly adds to the scant but growing genomic database resources of mangrove crabs and crustaceans, in general, allowing for future investigations to shed light into the functional genomics of Scylla spp.
Within-site main-effect of heat stress suggests differences in response between East and West Philippine mangrove crab populations The number of DE genes can be indicative of the degree of modifications in cellular processes that the organisms has to make in response to heat stress. In this regard, Figure 6 shows the stark differences in how the three sites responded differently to heat stress. The remarkably high number of DE genes in the Bataan population, may be a reflection of the drastic adjustments in metabolic processes to cope with the stress or it could be a manifestation of the damaging effects of heat stress.
In contrast, the Cagayan group had a 4.5-fold lower number of DE genes and the magnitude of expression level differences, measured as log 2 fold change, were also overall lower compared to Bataan. As illustrated in Figure 1, Cagayan experiences a highly variable temperature profile. It is possible that the constant exposure to highly variable temperature prompted an increase in heat tolerance in this group which resulted in a lesser number of DE genes observed. It has been hypothesized that response to acute stress may be dependent on the magnitude of stress organisms experience throughout their lifetime, with organisms that have existed in an environment with high temperature variations having broad thermal tolerance and high capacity for acclimation [33,34] In addition to this, the fewer number of DE genes coupled with higher percentage of downregulated genes observed in Cagayan population is analogous to the observations in a study done by Poong et al. [35], wherein it is considered a strategy for 'energy and resources saving' of heatacclimated cells. Owing to the broad heat tolerance of the Cagayan population due to natural exposure to highly variable temperature, transcription and translation of non-essential proteins has become unnecessary which consequently saves cell energy and resources.
Interestingly, the Bicol group, whose population does not seem to have natural exposure to extreme temperature, demonstrated a similar response as that of Cagayan, in terms of the number of DE genes. We postulate that this similarity may be a result of population connectivity due to the established East-West separation in the Philippines. Studies on population structure of marine fishes and invertebrates in the Philippines have revealed a strong East-West separationfor example in rabbitfish [36], damselfish [37], and seahorse [38]. A study on population structure of wild S. serrata in the Philippines using microsatellite and mitochondrial markers also shows distinct East-West groupings: east coast populations of Cagayan and Quezon, and west coast populations of Pangasinan and Bataan [39]. For the highly dispersive S. serrata, the limitations on gene flow between the east and west coasts due to sea-surface currents and geographic barriers may play a significant role in shaping the adaptive genetic differences among populations in response to thermal stress.
Analysis of interaction effect shows differences in biological pathways and genes responsible for difference in heat stress response While the site-specific main effect analysis provides a global quantitative view of the differences in heat response, the DDE genes provide insights into the cellular processes that differentiate the heat-stress response across sites. We found that there were 48 pathways that differentially activated between Cagayan and Bataan, and 8 pathways between Cagayan and Bicol. Of particular interest are the oxidative stress and cell apoptosis pathways.
Under normal temperature conditions, the mild production of reactive oxygen species (ROS) controlled by the antioxidant defense mechanisms is helpful for proper cell function [40]. However, various stress conditions such as elevated temperature could trigger the overproduction of ROS which in turn results in cell oxidative stress [41]. Oxidative stress pathway was observed in the Cagayan-Bataan comparison with genes associated in this pathway being upregulated in Bataan but downregulated in Cagayan. In a study on human umbilical vein endothelial cells, it was shown that heat stress increases the ROS production and induces mitochondrial p53 translocation and Ca2+ accumulation [42]. In our study, the gene associated with the p53 pathway was upregulated in Bataan and downregulated in Cagayan, and genes responding to calcium ions were expressed at various levels in 3 sites. Moreover, several studies have linked ROS-induced oxidative stress to cell death in heat-exposed cells. Different pathways are responsible for cell apoptosis which include extrinsic apoptotic pathway via death receptors and intrinsic apoptotic pathway involving the mitochondria and endoplasmic reticulum [42]. Cell apoptosis pathways were observed to differently activated in both Cagayan-Bataan and Cagayan-Bicol comparisons. Genes associated in this pathway were downregulated in Cagayan and upregulated in Bataan and Cagayan. Downregulation of genes associated with oxidative stress and cell apoptosis in the Cagayan population may be attributed to the thermotolerance of the population which indicates that such changes are part of the acclimation process rather than an onset of cell death. The downregulation of these genes may be a strategy to save energy and resources assuming that the population is already acclimated to elevated temperature.
Downregulation of genes associated with oxidative stress and cell apoptosis in the Cagayan population may be attributed to the thermotolerance of the population which indicates that such changes are part of the acclimation process rather than an onset of cell death. The downregulation of these genes may be a strategy to save energy and resources assuming that the population is already acclimated to elevated temperature.
Implications for fisheries and aquaculture management of S. serrata populations Aquaculture production of S. serrata in the Philippines reached up to 22,000 metric tons valued at Php 9 billion in 2018. As S. serrata is a preferred species for trading and farming, this species is subject to overexploitation for commercial purposes. Moreover, frequent exposure to environmental stressors could lead to population decline for this economically valuable crustacean. A study on domestication of mangrove crabs in the Philippines by Quinitio et al. [43] highlights the need for management of S. serrata populations due to a documented decrease in size, yield and catch per unit effort (CPUE) in key collection areas which are attributed to overexploitation and environmental factors.
Most mangrove crab farms in the Philippines practice a long-term grow-out approach [44] which is highly dependent on crab seeds collected from the wild. Current hatchery and nursery production of S. serrata juveniles are not enough to support the needs of the mangrove crab industry in view of the species' vulnerability to variations in environmental conditions (i.e. salinity, temperature, dissolved oxygen) especially for the early larval stages. With this, knowledge on the impact of environmental conditions particularly temperature variation to the populations of S. serrata is critical to support their population and maintain sustainable mangrove crab aquaculture.
The findings of this study show that populationspecific response of S. serrata to temperature variation is due to the physiological adaptation to their respective thermal optima which is derived from the local environment conditions. While our results indicate that S. serrata populations are capable of acclimatory plasticity, it may be metabolically expensive for the organisms to survive in a new environment with a new thermal regime. Hence in screening for potential sources of mangrove crabs for broodstock development either for aquaculture or for supportive breeding in stock management, it has been shown that one can explore the use of stocks that have broad heat tolerance and high capacity for acclimation, which may be hallmarks of success in adaptation to changing thermal environments [45].

Conclusion
In this study, we attempted to understand the genomic and molecular mechanisms that differentiate the heat-stress response of mangrove crab populations from sites with varying climate profiles. As part of the bioinformatic analysis, we constructed a first de-novo transcriptome assembly of S. serrata. Several quality assessment metric point to a sound and informative assembly, which adds to the growing genomic database resources of mangrove crabs. Results from the differential expression analysis showed genes and pathways responsible for population-specific response to heatstress in the 3 populations. Based on previous work population structure of marine species, our findings suggest that population-specific heat-stress response might be attributed to acclimatory plasticity due to pre-exposure to extreme temperature variations or might be due to the physiological adaptation to their respective thermal optima which is derived from the local environment conditions. Our results may serve as a basis for hypothesis testing regarding the impact of population connectivity to the adaptive genetic differences among populations. Future work may focus on genetic analysis to confirm genetic signatures for thermal adaptation among populations.

Methods
Sample collection and thermal stress exposure Adult S. serrata (90-120 mm carapace width) were collected from three sites of varying temperature profiles: Bataan, Cagayan and Bicol, Philippines. All crab samples were color-tagged according to collection sites and were subjected to formalin bath (100 ppm formalin solution) in one large tank for 1 hour. A total of six groups (3 sites × 2 conditions) were considered in the experiment, with 9 individuals per group. Prior to thermal stress exposure, all crab samples were acclimatized at an ambient temperature of 26±2°C for 48 hours. The control condition was set at 26±2°C for the whole duration of the study, whereas the heat stress condition was increased at a rate of 2°C/24 hr until the maximum experimental temperature of 32°C was reached and kept for 72 hours. All crab samples were fed with the same diet and exposed to a salinity of 30±2ppt. After the heat-stress experiment, gill tissue samples were harvested, placed in RNALater (Ambion) and stored at -80°C until ready for processing.
RNA isolation, library construction, and RNA Sequencing Tissue samples from each site and experimental condition were randomly selected and sent to an external service provider (Macrogen, Inc., South Korea) for RNA extraction, library construction, and sequencing. Total RNA was extracted from gill tissue of 30 crab samples (5 individuals from each group) following the TRIzol (Invitrogen) protocol. Quality of the extracted RNA was measured through RNA Integrity Number using 2200 TapeStation (Agilent). A total of 29 cDNA libraries (omitting one low-quality sample from Bataan-Experimental group) were constructed from poly(A)-captured mRNAs using TruSeq RNA Sample Prep Kit (Illumina, Inc.). A non-stranded RNA sequencing was performed for each library using the Illumina Hi-Seq 2000 in paired-end mode and 101 bp length.
Trimmomatic [47] (version 0.39) was used to remove adapters and filter or trim low-quality using the following parameters: LEADING:3 TRAILING:3 AVGQUAL:20 MINLEN:35. Only those read pairs for which both members of the pair survive the filtering were kept for further analysis.
Trinity [48] (version 2.8.5), in its default settings, was applied to the entire pool of filtered reads from all samples to obtain a transcriptome assembly.
To reduce computation time of the downstream annotation pipeline, we applied CD-HIT to cluster contigs with a sequence similarity of at least 95%.

Transcript abundance estimation
The align_and_estimate_abundance.pl script provided by Trinity were used to invoke the combination of RSEM [49] (version 1.3.3) and Bowtie2 [20] (version 2.3.5) to compute transcript abundance for each sample. The abundance_estimates_to_matrix.pl script was executed to integrate raw counts at the gene-level across samples, and also to compute crosssample normalized TMM counts. The former was used for differential gene expression analysis, while the latter for computing ExN50 statistics and for exploratory analysis prior to differential gene expression analysis.

Annotation
The Dammit annotation pipeline was used to perform conditional reciprocal best-hit search using LAST in OrthoDB, Swiss-Prot, and the Uniprot reference proteomes of the fruitfly D. melanogaster and the whiteleg shrimp P. vannamei.
Differential gene expression analysis Differential gene expression analysis was performed on the subset of genes which had a reciprocal-best-hit alignment to a protein in the fruitfly proteome, such that the alignment had an E-value of at most 10 −10 and it covered at least half of the protein length. There were 6,898 such genes in the assembly. While this is a conservative choice, it allows for a downstream functional analysis using the rich pathway and GO database of the fruit fly leading to more meaningful functional interpretation of the differential expressed genes.
We used DeSeq2 [50] (version 1.28.1) to perform differential gene expression analysis on the set containing Trinity-genes which found a hit in the D. melanogaster proteome with an E-value of at most 10 −10 and the alignment covering at least 50 percent of the protein length. Transcript-level count data produced by RSEM were aggregated at the gene-level using tximport [51] (version v1.16.1). Exploratory data analysis and quality control procedures taken prior to differential gene expression analysis are described in the Supplementary Information.
For each gene, we set the DeSeq2 negative binomial regression model to a 2-factor model with interaction terms: ∼ site + treatment + site : treatment, where site has 3 levels (CAG (Cagayan), BAT (Bataan), BIC (Bicol)) and treatment has 2 levels (Control (C) and Heat-stressed (E)). For the treatment factor, we chose Control as the reference level. For the site factor, we chose Cagayan as the reference level, since given its temperature extremes it faces, our prior expectation was for the Cagayan group to be the most thermally resilient. Our results suggest that this indeed seems to be the case.
With CAG and C set as reference levels for site and treatment, respectively, the regression model can be expressed as: where q i is the mean of negative binomial distribution for sample i normalized for sequencing depth, Xs are dummy variables that take values of 0 or 1 depending on the group membership of sample i, and βs are the coefficients of the model. The hypotheses tests performed on the model coefficients are shown in Table 2 Dispersion estimates were shrunk towards the dispersion of similarly expression genes using DESeq2's default parametric method for shrinking the dispersion. Likewise, for visualization and ranking purposes, estimated log fold change (LFC) was shrunk towards the more likely LFC estimate by pooling information from all genes [52]. To account for multiple testing due to the large number of genes as well as multiple comparisons per gene, we applied the Benjamini-Hochberg procedure [53] with the false discovery rate threshold at 0.1.

Consent for publication
Not applicable.

Availability of data and materials
The datasets generated during the current study are available in the DDBJ Sequence Read Archive under the accession number DRA010977. The computational pipeline used for data analysis is available at the following URL: https://bitbucket.org/s_serrata/dge/src/master/

Competing interests
The authors declare that they have no competing interests.         Each row corresponds to a DDE gene, each column to a site-treatment combination, and the color of each cell represents expression level -green for higher than row average, red for lower than row average, and the intensity of color represents the z-score of the normalized, variance-stabilized read counts.

Additional Files
Additional file 1 - Supplementary Information  Figure 1 (a) The 3 regions in the Philippines, Bataan (BAT), Bicol (BIC), and Cagayan (CAG), from where the S. serrata individuals were sampled. The map was produced using the R package sp [54] based on data from the Data of Global Administrative Areas, GADM version 3.6. [55]. (b) Distribution of monthly average temperature between 1900 -2019 and (c) Distribution of monthly average sea-surface temperature anomaly. Sea-surface temperature data was obtained from ERSST version 5 [56].

Figure 2
ExN50, which is the N50 length when considering only the top expressing genes that account for x percentage of expression, shown here for different values of x.

Figure 3
Venn diagram illustrating the number of hits found in Swiss-Prot, the fruitfly D. melanogaster proteome, and the shrimp P. vannamei proteome, for transcripts (right) and for genes (left), where a gene is said to have a hit if at least one of its isoforms in the assembly has a hit.    Each row corresponds to a DDE gene, each column to a site-treatment combination, and the color of each cell represents expression level -green for higher than row average, red for lower than row average, and the intensity of color represents the z-score of the normalized, variance-stabilized read counts.