In this study, using RNA-Seq data, we identified a list of putative lncRNA transcripts involved in MC-LR induced liver injury in whitefish, a non-model species without a reference genome. Further qPCR validation of selected putative lncRNA candidates confirmed the participation of these transcripts in MC-LR induced liver injury in whitefish. We showed that the altered expression profiles of lncRNAs could serve as a potential biomarkers of liver injury in whitefish.
The lack of standardized methodologies for discovery of lncRNAs poses a challenge for the analysis and interpretation of RNA sequencing data. The outcome of pipelines designed to discover lncRNAs in RNA-Seq data strongly depends on factors which precede in silico analysis, such as RNA isolation or the method of preparing sequencing libraries [46]. Therefore, methods designed for enriching polyadenylated protein-coding mRNAs may not be optimal for recovering lncRNAs that are present at low levels. On the other hand, designing large-scale RNA-Seq experiment with a large set of samples is demanding and usually some trade-offs must be made [47]. Because the main aim of our RNA-Seq experiment was to analyze the profiles of protein-coding genes, Illumina’s TruSeq Stranded mRNA protocol was chosen (Brzuzan et al., in preparation). However, this protocol can also be used for lncRNA discovery [46]. In fact, the majority of biologically functional lncRNAs reported to date are polyadenylated [48], thus approaches based on enriching polyadenylated transcripts to discover functional lncRNAs are common in pipelines based on annotated genomes, as well as those based on de novo assembled transcriptomes. For example 54,503 putative lncRNAs were discovered in rainbow trout [19] and 122,969 putative lncRNAs were reported in Rhinella arenarum [49]. Our pipeline based on the de novo assembly of the liver transcriptome allowed us to obtain 209,270 non-coding transcripts longer than 200 nt of which 84,974 were labeled as putative novel lncRNAs (see further discussion).
Difficulties in comparing results of RNA-Seq in silico analysis based on de novo assembled transcriptomes may be attributed to multiple factors. For example, poor reproducibility of de novo based analyses [50] could come from the source of the reference genome (i.e. selection of tissues), in addition to the methodologies used in preparation of the sequencing libraries. Last but not least, to obtain reliable and comparable results in discovery of lncRNAs, sequencing depth should be considered. It is estimated that, in human samples, >200 million paired-end reads are required to detect the full range of transcripts, including all possible isoforms [51]. However, this number can be much lower for differential expression analyses. For example, if the expectation is that the expression of abundant transcripts changes across conditions, 36 million reads per sample may be sufficient [47]. Because it was expected that (i) MC-LR will drastically change expression profiles of transcripts [16] and (ii) polyadenylated lncRNAs are expressed at higher abundances than non-polyadenylated lncRNAs [52], we sequenced our liver samples with 50 million reads per sample, which was sufficient to identify evolutionary-conserved and novel transcripts.
In organisms without a conclusively proven reference genome, good quality de novo assembled transcripts are a prerequisite for obtaining meaningful results in downstream analysis, such as discovery of novel transcripts [53]. We assembled whitefish liver transcriptome from 52 liver samples that originated from different experimental groups, including some that were part of another study, which extended the scope of available transcripts used for the assembly (Brzuzan et al., in preparation, Additional File 3). BUSCO analysis, which estimates assembly quality based on evolutionary-informed expectations of gene content from orthologues selected from OrthoDB, showed 74.9% completeness of Actinopterygian core genes (OrthoDB v10). For comparison, the current best assembly of a whitefish full-length transcriptome based on a whole fish homogenate showed only slightly higher completeness (76.6%, OrthoDB v10, Table 1) [31]. Importantly, BUSCO recovery estimates tend to be higher in full organism assemblies than in those assembled from a set of separate tissues. For example, a whitefish tissue-based full-length transcriptome deposited in the PhyloFish database showed only 26% completeness [31]. Because our assembly of a whitefish liver transcriptome showed completeness similar to the current best whitefish whole transcriptome assemblies, we believe that it not only provided a solid foundation for our analysis, but it could also extend the completeness of current and future assemblies of whitefish transcriptomes.
Designing an accurate step-by-step pipeline for discovery of lncRNAs is an essential step for producing high-quality results. This is even more important for non-model species without a reference genome, as redundancy tends to be higher in those analyses. As a consensus state-of-the-art integrated pipeline does not yet exist, we decided to base our pipeline on a previously described pipeline for identification of lncRNAs in non-model species [26]. The core aim of this pipeline was to remove known PCTs and ncRNAs in a sequence of filtration steps. As a result, the pipeline predicts novel lncRNAs, then uses software that employs Support Vector Machine in an attempt to validate the novel transcripts by assesing protein-coding potential. Unfortunately, as both the pipeline and the validation process use BLAST results with varying levels of confidence, this validation is in fact only pseudo-independent, and thus the presented transcripts are predictions at best, and only experimental evidence can validate their true function [26]. However, a lack of an assessment of the sensitivity or the specificity of a pipeline is common among current studies aiming to classify novel lncRNA transcripts, particularly in non-model species without well-annotated genomes. Here, we show that the putative lncRNAs differ substantially form the PCTs in terms of minimum length-corrected free energy, GC content and distribution of transcripts lengths (Fig. 6). These factors are considered crucial for RNA secondary-structure stability, and our results are in line with previous reports from studies of fish, in which lncRNAs differed from PCTs in the same manner [19]. Additionally, to validate the results of our pipeline, we performed a qPCR validation of selected putative lncRNA transcripts (Additional File 7).
Based on the similarity of our transcripts to the non-coding sequences deposited in the Rfam database, we identified putative whitefish liver lncRNAs whose expression was changed after MC-LR administration (known differentially-expressed lncRNAs). We found that MC-LR altered expression of potent regulators of genes, such as HOTAIR, HOTTIP, HULC or MALAT1 (Fig. 2E-H). Generally, lncRNAs are poorly conserved among species [8], but some of them, for example MALAT1, are conserved in mammals [54] as well as in other vertebrates, such as zebrafish [55], suggesting that they have important conserved biological roles. Moreover, MALAT1 is one of the most abundantly expressed lncRNAs in normal tissues [56, 57], even similar in this regard to many protein-coding genes, such as GAPDH [58]. MALAT1 expression has been shown to be either upregulated (lung cancer, hepatocellular carcinoma) or downregulated (colorectal cancer, breast cancer) [57], indicating its role is either cancer-promoting or tumor-suppressing. At the functional level, MALAT1 has been shown to bind various miRNAs, which promoted [59] or decreased cancer progression [60]. For example, knocking down MALAT1 in melanoma cells significantly upregulated the expression of tumor suppressing MiR34a [61]. Interestingly, our previous results demonstrated that MiR34a was upregulated in whitefish liver after MC-LR exposure [62].
Because our RNA-Seq and qPCR results showed that MALAT1 was present at a high level in normal whitefish liver, and most likely its expression was downregulated at 6 and 9d after MC-LR exposure, this could indicate that decreased abundance of MALAT1 may also be linked with upregulated levels of MiR34a in whitefish liver after MC-LR exposure. Although the qPCR data on MALAT1 was not statistically conclusive, it was consistent with the RNA-Seq data, in that both techniques suggested or indicated, respectively, downregulation of this transcript 6 and 9d after MC-LR exposure. Although it is too soon to speculate on whether and how this effect could potentially underlie the molecular mechanisms of MC-LR–induced liver injury, this finding adds to the little that is known about possible lncRNA and miRNA interactions in liver cells.
Previous research on MALAT1 variability has demonstrated that the majority of its transcripts are not polyadenylated [63]. However, it has been revealed that, in mammals, MALAT1 not only has a highly abundant short isoform that lacks a poly(A) tail, but also has a long isoform that is present at a much lower level and has a genomically encoded poly(A) tract. The longer isoform is further processed by RNAse P and RNAse Z to the shorter isoform. In this study, using a library preparation method that enriched only polyadenylated transcripts, we might have detected mainly polyadenylated MALAT1 transcripts, thus confirming the presence of that isoform in fish. To better study the biological function of these lncRNA, further research is required to investigate both the structural diversity and gene regulation of MALAT1.
After removal of PCTs and known NCTs, a large group of the remaining NCTs still mapped to the mRNAs deposited in the Reference Sequence database (RefSeq). However, after closer examination of particular BLAST hits, we noticed that the vast majority of our remaining NCTs mapped not to the coding parts of matched mRNA sequences, but to their non-coding 3’UTR regions. The presence of autonomous 3′UTR transcripts separated from their associated mRNAs has been documented in studies on mouse and human cells [64–66]. Because 3’UTR regions that were considered to be a part of the canonical transcripts are in fact biologically significant autonomous units participating in post-transcriptional regulation [66], we analyzed how expression of our putative autonomous 3’UTR transcripts corresponded to that of the PCT from the same mRNA. We found that, in the normal (unchallenged) condition, almost the same number of mRNAs had a significantly higher number of 3’UTR transcripts (48% of differentially expressed (DE) mRNAs) as those which had a higher number of PCTs (52% of DE mRNAs). This may indicate that, in normal whitefish liver, expression of those transcripts remains in a stable relationship. Moreover, we showed that exposure to MC-LR increased the abundance of PCTs while decreasing that of 3’UTR transcripts form the same mRNA (Fig. 3B). In pairs in which expression was changed after MC-LR exposure, 60% of the pairs had more PCTs than 3’UTR transcripts. In contrast, the same DE pairs showed the opposite pattern of expression in the control samples (i.e. 60% of the same pairs had more 3’UTR transcripts), indicating that MC-LR changed PCT/3’UTR ratio in about 20% of DE mRNAs (depending on the duration of exposure). This also indicated that, after MC-LR exposure, expression of our putative 3’UTR transcripts changed independently of PCTs expression.
In the majority of cases, if a putative autonomous 3’UTR was DE after MC-LR exposure, the associated PCT was also DE in the same direction. Our results show that, out of all DE PCT/3’UTR pairs after 1d of MC-LR exposure, over 82% of the paired transcripts were upregulated (Fig. 3A). In contrast, at after 6 and 9d of exposure, there was no longer a majority of PCT/3’UTR pairs that were DE in the same direction. Moreover, gene ontology terms analysis showed that, after 1d of exposure, the co-upregulated pairs were enriched in transcription regulators, suggesting that these transcripts play roles in regulating the response to severe liver damage (Fig. 5). Additionally, our results show that DE PCT/3’UTR pairs at 6 and 9d of the exposure are similar in terms of function and direction of expression changes (Additional File 6). The observed changes in the liver transcriptome between the 1st and the 6th or 9th day of exposure may support our previous finding that challenging whitefish with MC-LR results in severe liver damage, which is followed by resilience to further exposures to the toxin, allowing for regeneration of the damaged organ [14]. It should be investigated whether this apparent shift in the transcriptome profile reflects remodeling of liver cell processes for repair of the tissue.
Moreover, we also found that, in response to MC-LR exposure, some putative autonomous 3’UTRs and PCTs from the same mRNA were differently expressed in opposite directions, i.e. one was upregulated while the other was downregulated (Fig. 3A green and brown bars). We were particularly interested in pairs where the 3’UTR transcripts were upregulated and the PCTs were downregulated, as this could be attributed to a recently discovered mechanism in which the 3’UTR is cleaved and shortened [66]. The discoverers of that mechanism hypothesized that it serves as a global regulatory tool that works by increasing the effectiveness of miRNA binding sites upstream of the cleavage site. This would cause levels of 3’UTRs to rise after cleavage, while levels of the corresponding PCTs would drop as a result of more effective miRNA binding. Our previous study showed that miRNAs that play roles in transcription regulation are also aberrantly expressed after exposure to MC-LR [16]. Because there is a possible crosstalk between various types of NCTs participating in post-transcriptional regulation of PCTs in MC-LR–induced liver injury, our results suggest the necessity of adopting a wider perspective when investigating the effects of MC-LR–induced liver damage. Researchers looking to discover all aspects of transcriptome regulation by ncRNAs in MC-LR toxicity should focus on decoding the crosstalk between NCTs and PCTs, i.e. competition between endogenous RNAs [67]. It has been shown recently that this strategy can be applied to successfully investigate mechanisms of MC-LR toxicity in other tissues. For example, Meng et. al showed a transcriptomic regulatory network of miRNAs, piRNAs, circular RNAs, lncRNAs and mRNAs that were simultaneously involved in the cytotoxicity of MC-LR in testicular tissues in mice [5].
Although we showed that the putative 3’UTR contigs were expressed independently of PCTs, some of the autonomous 3’UTRs paired with PCTs from the same mRNA could in fact be fragmented or incomplete mRNAs which were not assembled properly. However, even if a de novo transcriptome assembly approach for detecting various types of non-coding RNAs is not without flaws, it can be used as an extension to analyses based on annotated genomes. Because the detection of autonomous 3’UTR transcripts using an annotated genome needs additional enrichment of 3’UTR transcripts in the RNA isolation or library preparation steps, it requires additional sequencing runs. Alternatively, pipelines designed to analyze autonomous 3’UTRs based on the de novo assembled transcriptome may be used in a preliminary research, eventually leading to more focused sequencing runs. Furthermore, the huge collection of deposited transcriptomic data may be reused for additional de novo analysis by teams seeking direction for their studies on non-coding RNAs.
The aforementioned group of 84,974 putative novel lncRNAs contained transcripts longer than 200 nucleotides with no homology to any tested database. We showed that MC-LR upregulated expression levels of 1739 putative novel lncRNAs and downregulated 2690 (Fig. 7). We found that MC-LR–induced change in expression of putative novel lncRNAs was also reflected in change in expression of PCTs (Fig. 4). Because, as for now, co-expression of lncRNAs with PCTs is the most common approach for identifying potential target genes of lncRNAs [12], this will allow to identify potential target genes of lncRNAs, and to investigate their biological function in MC-LR–induced liver injury in whitefish. Moreover, putative novel lncRNAs might also serve as potential biomarkers for early detection of severe liver injury in whitefish (1d), as well as the fish’s recovery form exposure to the toxin, including regeneration of liver tissues (6d, 9d). As demonstrated in certain types of cancers [68], lncRNAs have highly specific expression patterns and relatively stable secondary structures, and are efficiently detected in blood, plasma and urine. With these properties they have the potential to serve as novel noninvasive biomarkers for drug-induced liver injury. For example, our previous results suggests that the non-coding MiRNA-122 can be a non-invasive biomarker for detecting liver damage in fish, and promising alternative to current gold-standard hepatotoxicity markers [69].
The adverse effects of MC-LR in whitefish are not limited only to the liver. For example, we previously showed that MC-LR caused brain injury in whitefish [22]. Although plasma levels of brain-specific MiR124-3p were not altered, it is possible that some brain-specific lncRNAs could serve as biomarkers of brain injury. Interestingly, MALAT1 shows relatively high expression in the brain, where it is involved in regulating synaptogenesis [70]. MALAT1 was downregulated in glioma [71] and is able to regulate levels of MiR124 in various diseases, including Parkinson’s disease [72]. Future studies will advance understanding of the roles of lncRNAs in MC-LR toxicity and will likely reveal novel biomarkers and targets for treatment. It will also be important to determine whether the aberrantly expressed lncRNAs detected in this study can be detected in noninvasively collected samples.