2.1. Identification of the mouse retinal lncRNAs
In this work, we performed lncRNA identification using a hybrid strategy (Figure 1). First, strand-specific libraries were constructed and sequenced for mouse retinas from fifteen developmental stages starting from embryonic day 12.5 (E12.5) to postnatal day 28 (P28), respectively. The alignment of these short reads to the reference genome and the subsequent transcriptome reconstruction gave rise to 326,239 transcripts transcribed from 260,539 gene loci. After filtering out the single-exon encoded transcripts and transcripts that overlapped the exons of annotated protein-coding genes or showed detectable coding potential, we acquired 2,514 multiple-exon encoded noncoding transcripts from 1,925 loci (hereafter referred to as Short Reads derived lncRNAs, SR-lncRNAs).
In addition, we re-analyzed the full-length transcripts generated from iso-seq of the cDNA library of mixed retina tissues to identify the lncRNA candidates. The transcripts of known coding genes or detected coding potential were filtered out, resulting in 2,981 multiple-exon encoded noncoding transcripts from about 2,495 loci (hereafter referred to as Long Reads derived lncRNAs, LR-lncRNAs).
To generate an annotation of retinal lncRNA genes as much intact as possible, we merged the SR- and LR-lncRNAs and the lncRNAs from the genome annotation (VM22-lncRNAs), which generated 22,695 lncRNAs transcribed from 16,298 gene loci. After that, we removed the genes that were lowly-expressed (FPKM < 1) in all the investigated stages. Finally, we acquired a collection of 4,523 lncRNA genes that were considered the “Mouse Retinal LncRNA Genes” (MRLGs) for further analyses (Table S1).
The contribution of three independent data sources, i.e., SR-lncRNAs, LR-lncRNAs, and VM22-lncRNAs, to the final MRLG annotation was subsequently assessed (Figure 2). From gene identification, we found that VM22-lncRNAs contributed the largest number of MRLGs (2,889), including 2,332 specific gene loci. Mainly, this was ascribed to single-exon encoded lncRNA transcripts (SE-lncRNAs) being exclusively collected from VM22-lncRNAs in our strategy, i.e., 1,382 out of these 2,332 loci only contained SE-lncRNAs. In contrast, SR-lncRNAs and LR-lncRNAs identified 2,191 lncRNA gene loci, including 1,634 not reported in the genome annotation. Furthermore, long reads contributed more than short reads in gene identification, i.e., 1,200 MRLGs were specific to LR-lncRNAs, while only 365 to SR-lncRNAs. Similarly, for the 7,300 transcripts transcribed from the MRLG loci, most were descended from genome annotation, while 1,973 and 1,087 were evidenced by the full-length transcripts derived from iso-seq and the reconstructed transcripts derived from lncRNA-seq, respectively.
Notably, MRLG annotation found many novel isoforms for known lncRNA genes. For 430 VM22-lncRNA corresponding loci in the final MRLG annotation, iso-seq and lncRNA-seq respectively found 408 and 433 novel isoforms. Taking the genes that were most significantly varied in isoform numbers for example, we detected nearly all known isoforms (10/11) and 21 novel isoforms of the gene Six3os1 (XLOC_006735), as well as four out of ten known isoforms and 11 novel isoforms of Firre (XLOC_015623).
2.2. Genomic and expression features of the MRLGs
The MRLGs were distributed scattering through the whole genome, with chr5 (430) and chrY (6) containing the largest and smallest numbers, respectively (Figure 3A). Even if isoforms collected from the genome annotation, which might be not expressed in the retina, were included, fewer than three isoforms were found for most MRLGs (Figure 3B). In addition, the average length of the MRLG transcripts (ca. 2,900 nt) was much longer than that of the previously known mouse lncRNAs (ca. 1,380 nt), which probably benefited from the integration of long-read sequencing data in this work (Figure 3C). 1,563 MRLGs only contained SE-transcripts, while the others contained at least one ME-transcripts with an average exon number of 3.2 (ranging from 2 to 33) (Figure 3D). According to the relative position to protein-coding genes, MRLGs were classified into five groups, i.e., antisense (1,319), divergent (330), convergent (229), intronic (2,089), and strict lincRNAs (556).
Based on the 60-way multiple species alignment of vertebrate genomes, we found that the conservation of MRLGs exons was lower than coding exons but slightly higher than the random genomic regions (Figure S1). In addition, a primary sequence-based homology search revealed that 1,423 MRLGs (31.5%) were conserved in evolution, including 385 in more than two vertebrates, such as the well-known genes Miat, Malat1, Tug1, and Meg3. Furthermore, rat and human share 1,093 and 402 MRLGs with the mouse, respectively, in contrast to less than 100 in several other species, such as zebrafish, chicken, and platypus (Figure S2).
Derived from the lncRNA-seq data analyses, we obtained an FPKM matrix consisting of the expression levels of the protein-coding genes and MRLGs in developing mouse retinas. It appeared that MRLGs were expressed at a lower level than coding genes (Figure 4), with their median FPKM values ranging from 0.8-1.2 and 4.0-7.3 in the investigated samples, respectively. Notably, the content of the highest expressed MRLGs in individual samples were much the same, e.g., the expression of Rmrp, Gm28960, Gm37750, and Gm43940 were the top ten in all stages. In addition, the MRLGs showed higher stage specificity than the protein-coding genes (Figure S3) (t-test, p-value < 0.001). For example, 2.1% of the MRLGs (94/4523) appeared to be stage-specific (τ ≥ 0.8), which was significantly more so than protein-coding genes (1.3%, 168/13187).
We further compared the expression of MRLGs of different classes and found that the intronic MRLGs were most varied from the others (Figure S4). In contrast, expression levels between convergent and other MRLGs (i.e., antisense, divergent and intergenic lncRNAs), as well as between divergent and intergenic MRLGs, was very similar in most stages. Interestingly, in some stages such as E18.5, P0, P3, and P28, expression variation between classes was rarely observed.
2.3. Identification and function analyses of the cis- and trans-targets
Regulatory roles of lncRNAs could perform through cis- and trans-acting mechanisms. The cis-targets of lncRNAs usually refer to the nearest coding genes in the same chromosome. In contrast, the trans-regulation is not dependent on the relative position of lncRNA and targets in chromosomes. In this work, we identified 894 nearest protein-coding genes within 100 kb genomic regions up or downstream of the MRLGs. To increase the specificity of cis-targets identification, only 74 protein-coding genes that showed strong expression correlation (cor ≥ 0.9 and p-value ≤ 0.05) to their nearby lncRNAs were herein classified as the cis-targets. Functional analyses revealed the ‘retinal ganglion cell axon guidance’ (GO:0031290) related genes were enriched in the cis-targets (c2-test, p-value ≤ 0.05), including Pou4f2 (Brn3b), Efna5 and Zic2. Besides, the cis-targets also enriched genes related to cell differentiation and anatomical structure development, such as the homeobox transcription factors Tgif2, Lmx1b and Meis2 (Table S2).
On the other hand, we characterized the trans-targets of MRLGs through co-expression analyses. The protein-coding gene was determined as a trans-target of one MRLG if they were strongly correlated in expression and were clustered into the same WGCNA module. As a result, 133,371 trans-regulations between 1,223 MRLGs and 6,976 protein-coding genes were determined. Subsequently, the biological roles of 796 MRLGs were predicted based on the trans-targets. It revealed that more than 10% of them were involved in biological processes such as vesicle-mediated transport in the synapse, visual perception, autophagy, and detection of light stimulus, indicating their close association with the physiological roles of retina tissue (Figure S5).
2.4. Temporary expression of MRLGs in developing mouse retinas
Based on the expression profiles, pairwise Pearson correlation coefficients between stages were calculated, which clustered the developmental stages into two main groups. According to the majority, they were hereafter referred to as the embryonic group (E12.5-E17.5) and neonatal group (E18.5-P28) (Figure 5A). Generally, the MRLG expression profiles in adjacent periods showed a higher correlation. Interestingly, the P1 profile was more similar to the early days of retinal neurogenesis (E12.5 and E13.5) but not the adjacent period P0 or P3. Additional principal component analyses (PCA) also revealed this pattern according to PCA1 (Figure 5B).
To assess the variation of lncRNA content in developing retinas, we categorized the MRLGs into ‘expressed’ or ‘not expressed’ in individual samples with an empirically strict cutoff (FPKM ≥ 1 in more than one stage). As a result, about 15% MRLGs were detected in all the stages, while a similar number of MRLGs (704) were specifically expressed (Figure S6). For individual stages, the number of expressed MRLGs varied greatly, i.e., from about 1,800 in E18.5, P0, and P14 to 2,774 in P1. Meanwhile, for any two stages, the number of shared lncRNAs ranged from 41% (between P1 and P14) to 95% (between E14.5 and E15.5) (Table S3). The E12.5 and E13.5 retinas shared the most MRLGs (2,169 genes), accounting for about 90% of their lncRNA contents. In contrast, the E12.5 and P14 retinas shared the fewest MRLGs (1,004), with only about 42% and 55% of the lncRNAs expressed in these two stages, respectively.
Furthermore, we found that about 100 additional MRLGs were detected in each stage after E12.5 according to the chronological order of development (Figure 6). Exceptionally, 282 and 314 additional MRLGs were identified at E13.5 and P1, respectively. Interestingly, the detected MRLGs were much more in P1 (2,774) than in P0 (1,823) retinas, but only 314 additional lncRNA genes were identified from the period P0 to P1. An inspection of the shared MRLG contents between P1 retinas and others showed that E13.5 and E16.5 retinas shared the largest number of lncRNAs with P1. Still, in E17.5 and E18.5 retinas, the shared MRLG contents were largely decreased, suggesting that many MRLGs were down-regulated at or before P0 (e.g., E17.5 or E18.5) and up-regulated at P1.
2.5. Modulation of MRLGs’ roles during retina development
As mentioned above, MRLG content was altered a lot during retina development. However, due to the function ambiguity of most MRLGs, their association with the modulation of tissue physiological features during development remains unclear. In this work, through the co-expression analyses, we found about 560 to 880 of trans-acting lncRNAs were expressed in individual retina samples, representing about 26-39% of the total MRLGs. Thus, herein we tried to describe the general roles of MRLGs in developing retinas through these trans-acting lncRNAs.
First, functional enrichment analyses of the targets (coding genes) were performed for each trans-acting lncRNAs having three or more targets. Then, the functions of these lncRNAs were assigned by the top five enriched GO terms. We subsequently summarized the presented GO terms and their abundance in a matrix. Further clustering analyses based on the matrix revealed a close association between developmental stage and lncRNA function. It appeared that the main roles of lncRNAs continuously altered along with retina development. Consistent with gene expression profiles, more similar functions were also observed for lncRNAs expressed in adjacent periods. In contrast, significant difference was observed there between the early embryonic (E12.5/E13.5) and neonatal retinas (P14-P28) (Figure S7).
To further describe the variation of lncRNAs’ roles during retina development, we ranked the GO terms in each stage by their abundance (Table S4). It showed that the MRLGs involved in ‘Golgi vesicle transport’ (GO:0048193) and autophagy related biological processes (GO:0006914 and GO:0061919) were abundant in both embryonic and neonatal retinas, although dropped in the rankings from P5. In contrast, the rankings of some other GO terms rise along with the retina development, e.g., ‘sensory perception of light stimulus’ (GO:0050953) moved up from 24th at E12.5 to 1st after P5, as well as ‘synaptic vesicle cycle’ (GO:0099504) and ‘vesicle-mediated transport in synapse’ (GO:0099003) were about 30th at E12.5 and came in top ten around the day of birth. On the contrary, the processes associated with microtubule-based movement and transport were significant in the early embryonic days but their proportion were strikingly decreased around P3-P5. These facts revealed that the dynamic expression of MRLGs was associated with the process of establishing the functional retinas and the MRLGs might be involved in various processes including neuron cell differentiation, development, migration and function.
2.6. Tissue specificity of MRLG expression
Expression of lncRNAs usually shows extensive tissue specificity, which is also significantly associated with the tissues’ physiological roles. Herein, we analyzed the MRLG expression in 15 mouse tissues and found that 2,004 MRLGs were expressed in at least one of these tissues, including 73 expressed in all (e.g., the well-known lncRNAs Gas5, Tug1, and Malat1). The nervous tissue cerebrum expressed the largest number of MRLGs (1,003), followed by the thymus (797), testis (764) and mammary gland (744). In contrast, the digestive tissues, such as the stomach, liver and intestines, expressed the fewest number of MRLGs (< 400) (Figure S8).
498 MRLGs were only expressed in retina and cerebrum, which were enriched in biological processes such as excitatory synapse assembly and its regulation (GO:1904861 and GO:1904889), postsynaptic density organization (GO:0097106), and eye morphogenesis (GO:0048592) (c2-test, p-value ≤ 0.05). Likewise, according to our predicted cis-regulations, six of these MRLGs possibly cis-regulated their adjacent coding genes, i.e., XLOC_000612-Rgs20, XLOC_001325-Marcks, XLOC_006138-Olig2, XLOC_006289-Glo1, XLOC_007869-Lrrc55 and XLOC_015511-Zxdb. Notably, the involvement of the cis-targets Marcks and Olig2 in retina development has been previously reported. For example, disruption of Marcks would result in defects in retinal lamination in mice [20], and Olig2 is one of the markers for retinal progenitor cells and some differentiated retinal subpopulations [21].
In addition, 2,519 MRLGs (55.7%) were lowly expressed in tissues other than retina and were considered retina-specific. Comparative analyses revealed that they were expressed at a lower level than those non-specific ones on the whole (t-test, p-value: 5.6E-17). When checking their expression in individual stages, we found this variation was particularly evident from the day of E18.5 (Figure S9). Furthermore, from the aspect of functions, retina-specific MRLGs were over-represented in the biological process of ‘negative regulation of fibroblast cell migration’ (GO:0010764) (c2-test, p-value ≤ 0.05).
2.7. Differential expression of MRLGs induced by function loss of some transcription factors
To further understand the biological roles of the MRLGs, we analyzed their expression in the retina tissues of several gene knockout mutants. Due to the collected RNA-seq data are not derived from sequencing of strand-specific libraries, only the strict lincRNAs were investigated here to minimize the errors arising from the overlap of lncRNAs and mRNAs. As a result, we identified 97 intergenic MRLGs that were differentially expressed after function loss of Math5, Isl1, Brn3b, NRL, Onecut1 (Oc1) and Onecut2 (Oc2) (Table S5).
We found two and three intergenic MRLGs were varied in expression in Brn3b and Isl1 knockout mutants (E14.5), respectively, in contrast to 18 in the Math5-null mutant (E14.5). Notably, XLOC_013893 (Gm28511) was down-regulated after the function loss of all three TFs. According to the dynamic expression profiles, expression of XLOC_013893 peaked around E14.5 and was undetectable in the retinas after birth. Moreover, we found that XLOC_013893 is about 64 kb away from its potential cis-target Irx3, which two showed strong expression correlation during retina development (cor=0.74, p-value: 3.67E-06). The Irx3 gene was also significantly down-regulated in the retinas of Math5-, Isl1-, and Brn3b-null mutants (log2foldchange about -3.0 to -2.3). Comparative genomics analyses showed that XLOC_013893 is conserved in the human genome and that the full-length transcript sequence could be mapped to chr16 with an identity of 93.5%. The corresponding genomic region in the human genome encoded an RNA gene ENSG00000287885, which is also close to the Irx3 gene (about 83 Kb away), suggesting the XLOC_013893-Irx3 regulation is conserved between human and mouse.
Oc1 and Oc2 are involved in the fate determination of early retinal cell types, such as horizontal cells (HCs), RGCs, cones and amacrine cells [22]. Seven and one intergenic MRLGs were differentially expressed in E14.5 retinas after function loss of Oc1 and Oc2, respectively, while six were found after double knockout of Oc1 and Oc2 (Oc1/Oc2 DKO). Interestingly, rare DEGs were shared by Oc1-null, Oc2-null and DKO mutants. The biological function of these differentially expressed lncRNAs was not determined. However, while checking their potential trans-targets, we found that two were possibly associated with the visual perception (XLOC_010064, p-value: 1.14E-19) and vesicle-mediated transport in the synapse (XLOC_011622, p-value: 3.04E-08).
In addition, we assessed the expression of intergenic MRLGs in adult NRL-null mutants (P21). The neural retina leucine zipper protein NRL is required for rod photoreceptor development. Its function loss resulted in down- and up-regulation of 44 and 25 intergenic MRLGs, respectively. Inferred from the functions of trans-lncRNAs, we found that these 69 differentially expressed MRLGs were closely associated with muscle system process (GO:0003012), regulation of membrane potential (GO:0042391), and sensory perception of light stimulus (GO:0050953). We also found three cis-targets for DEGs, i.e., XLOC_007186-8030462N17Rik, XLOC_011105-Lcorl, and XLOC_012109-Strip2, but the involvement of these lncRNAs and their cis-targets in retina development remains still unclear.
Some transcription factors are redundant in biological functions, such as Brn3b and Isl1 in RGC differentiation and survival, as well as Oc1 and Oc2 in the fate determination of several early retinal cell types. However, few intergenic MRLGs were regulated in the same manner in their function loss mutants, i.e., we only found that XLOC_013893 were down-regulated in both Brn3b- and Isl1-null mutants, XLOC_005930 were up-regulated in Oc1- and Oc2-null mutants, as well as XLOC_002920 was up-regulated in Oc1-null and Oc1/Oc2 DKO mutants. Notably, XLOC_010659 was down-regulated in both Math5- and Oc1-null mutants, which might be consistent with the role of Math5 and Oc1 in regulating the differentiation of early cell types. Likewise, we also found some lncRNAs were differentially regulated among mutants, e.g., XLOC_011005 was down-regulated in Oc1- and Math5-null mutants but was up-regulated in NRL-null mice, XLOC_011568 was up-regulated in Oc1-null mutant but down-regulated in Math5-null mice, and XLOC_011622 was up-regulated in Oc1/Oc2 DKO but down-regulated in NRL-null mutants.