General features of circRNAs in both healthy and DCM human hearts
We performed a de novo analysis of public RNA-seq data sets that were generated from left ventricular tissues obtained from 5 healthy individuals and 5 DCM patients [10]. These data sets were selected because they were generated using a ribosome-depletion library, which is the gold standard method to identify back-spliced junctions for circRNAs detection [44]. We first removed low-quality reads and then utilized Bowtie2 [32] to align clean reads against the human reference genome. Reads continually aligned to the genome were used for linear gene expression analysis and the remaining un-mapped reads were further used as input for find_circ [36] to detect both linear and back-spliced junctions (Additional file 1: Figure S1). We first performed a quality control validation of this data by performing principal component analysis (PCA) based on the expression of linear genes with raw count > 10 in at least 5 samples. PCA analysis confirmed that samples from each group were clustered respectively as expected (Additional file 4: Figure S2A). We further examined the expression of a marker of HF, atrial natriuretic peptide α (NPPA). Consistent with the reported HF phenotype [45], our analysis revealed that the expression of NPPA is significantly increased in DCM specimens (Additional file 4: Figure S2B). Taken together, these initial analyses confirm the quality of this RNA-seq dataset and its suitability for the further analysis of circRNAs.
With a threshold cutoff of ³ 2 unique back-spliced reads present in at least one sample, we identified a total of 20,214 circRNAs in the 10 samples (Figure 1A and Additional file 5: Table S3). The identified circRNAs originated primarily from the coding-sequence (CDS) regions of exons of single linear genes (67.2%), followed by exons spanning coding and untranslated regions (UTR) (UTR-CDS) (11.6%) (Figure 1B). Notably, a total of 252 circRNAs (1.2%) were assigned to the exons of two adjacent genes on the same strand (Additional file 6: Table S4), likely representing a novel class of circRNAs termed as rt-circRNA [23]. In particular, the genes MYH6 and MYH7 produced the largest number (n = 22) of such circRNAs, which were also observed in human, mouse and rat hearts as reported by two earlier studies (Additional file 7: Figure S3A-C) [27, 29]. In addition, these MYH6/MYH7-derived circRNAs were highly abundant and conserved in hearts of human, mouse and rat (Additional file 7: Figure S3D). However, close inspection of the sequence between the exons where they are back-spliced suggests that the back-spliced junctions of these circRNAs are probably artifactual due to read misalignment caused by the high sequence homology between MYH6 and MYH7 (Additional file 7: Figure S3E). This result prompted us to employ more stringent criteria to filter circRNAs originating from two neighboring genes with highly homologous sequences. Using this revised filtering strategy, we identified 110 putative rt-circRNAs. Among them, SCAF8 and TIAM2 contain the largest number of rt-circRNAs (n = 4), all of which were also previously identified in the human heart (Figure 1C).
We next examined whether there was a correlative relationship between the expression of circRNAs and their parent linear genes. We found that there was a significantly higher correlation between circRNAs and their linear parent genes compared to that between circRNAs and randomly chosen linear genes (Figure 1D), suggesting that the generally high-abundance circRNAs tend to have high-abundance linear counterparts. We further found that the length of the majority of identified circRNAs was around 300-400 bp (Figure 1E). In accordance with previous studies [46-48], the most common splice accepting circularized exon was exon 2 (Figure 1F). Furthermore, we found that the majority of circRNA-producing genes generated 1 circRNA (41.7%, 2,231/5,350) and only 6.0% (n = 319) produced more than 10 circRNAs (Figure 1G). The top two genes producing the largest number of circRNAs were TTN (Titin) and RYR2 (Ryanodine receptor 2), which generated 197 and 173 circRNAs, respectively (Additional file 8: Figure S4A-B). Because the TTN and RYR2 genes have the largest number of exons in human genome, we hypothesize that the number of circRNAs produced is positively correlated to the number of exons within their parent genes. Indeed, a proportional increase in the number of circRNAs was observed to correlate with the number of exons per gene (Additional file 8: Figure S4C). Furthermore, we found 17,733 identified circRNAs, in total, contain exons. Among these, single exonic circRNAs were found in 8.4% (1,494/17,733) while the rest of circRNAs were encoded by more than two exons (Additional file 8: Figure S4D).
Next we compared the circRNAs that we identified with others that were previously identified in human hearts [27-29] and enlisted in circRNA databases [40, 41, 43, 49]. We found that 16,304 circRNAs (80.7%) were previously reported by either human heart circRNA studies or enlisted in circRNA databases (Figure 1H). In addition, 3,831 and 2,636 circRNAs have orthologs in the mouse and rat, respectively. Importantly, 1,757 were conserved in all three species by integrating mouse and rat circRNAs reported by previous studies and databases [26, 27, 29, 40, 41, 43, 49] (Figure 1I). We further found that 7,585 and 6,211 circRNAs were specifically identified in normal and DCM hearts, respectively, and that 6,418 were identified in both (Figure 1J). Further analysis revealed that only 729 circRNAs were present in all the 10 samples (Figure 1K), suggesting that many circRNAs exhibit a high degree of variation in expression among human individuals. Additionally, the majority of circRNAs (n=12,099) were observed in only one sample (Figure 1K), which likely represent false positives or low-abundance circRNAs. To clarify this, we compared the circRNAs observed once in this study with circRNAs reported by 3 previous human heart circRNA studies [27-29] and 4 circRNA databases [40, 41, 43, 49]. This analysis revealed that 8,666 (71.6%) among them were previously detected (Additional file 9: Figure S5), suggesting these circRNAs observed in a single sample are probably authentic and thus should be retained in studies aiming for circRNA discovery. Collectively, these results indicate that the expression of circRNAs, including rt-circRNAs, is widespread in the human heart with a certain degree of conservation between species.
Identification of the most abundant circRNAs in healthy human heart
Given the observation that circRNA abundance was not universally distributed across samples (Figure 1K), we restricted our analysis to 2,461 circRNAs which were considered to be high-confidence circRNAs as they had ³ 2 reads spanning the back-spliced junction in ³ 4 samples in at least one group (Additional file 10: Table S5). All these high-confidence circRNAs were overlapped with previously identified circRNAs (Additional file 11: Figure S6A) and 1,121 (45.6%) showed conservation in either mouse or rat (Additional file 11: Figure S6B), indicating the reliability of these circRNAs we identified.
CircRNAs that are abundantly expressed in normal heart likely contribute to the maintenance of cardiac homeostasis. We therefore focused on the top 30 circRNAs with the highest average expression level across the 5 healthy heart samples (Figure 2A). The top 2 most abundant circRNAs (circSLC8A1_11 and circSLC8A1_12) originate from the SLC8A1 (sodium/calcium exchanger 1) gene. While most of the top circRNAs were also identified in the studies of Tan et al. [27] and Werfel et al. [29], which found high expression levels of circRNAs such as circSLC8A1_12, circHIPK3_1 and circALPK2_2 in healthy hearts, some exceptions were observed. These include circSLC8A1_11 and circYY1AP1_3 that were not detected in study of Tan et al. [27], as well as circZNF91_2 and circPCMTD1_6 whose expression was identified to be abundant in the heart in only one of the previous studies (Figure 2A). Comparison of the results obtained from different species revealed that half of these highly abundant circRNAs in normal human hearts were conserved in mouse or rat hearts, but only 2 of them including circSLC8A1_11 and circHIPK3_1 exhibited high levels of expression in all species, and 2 of them including circQKI_4 and circTULP4_1 were abundantly expressed in human and mouse hearts, but not in rat. The expression levels of the remaining conserved circRNAs were moderate or low in rodent hearts. We then calculated the correlation between the abundance of top circRNAs and their host genes and found a highly positive correlation for most of these circRNAs (Additional file 12: Figure S7A), including the top 3 most abundant ones (i.e., circSLC8A1_11, circSLC8A1_12 and circHIPK3_1) (Figure 2B). In contrast, circSETD3_3, circMB_1 and circCORO1C_1, together with some others (Additional file 12: Figure S7B), appeared to be independently transcribed with their linear parent genes (Figure 2C). Taken together, by integrating results from multiple studies, our analyses define the most abundant circRNAs in the healthy human heart.
Identification of dysregulated circRNAs in DCM
We next performed differential analysis to identify circRNAs that are dysregulated in DCM. Using a threshold of p < 0.05 and FC > 2, a total of 392 circRNAs, including 101 up-regulated and 291 down-regulated circRNAs, were identified to be differentially expressed in DCM hearts as compared to normal hearts (Figure 3A and Additional file 10: Table S5). These differentially expressed circRNAs were derived from 290 unique host genes. Pathway analysis revealed that these host genes are significantly over-represented in pathways associated with diseases of the heart, including arrhythmogenic right ventricular cardiomyopathy (ARVC), hypertrophic cardiomyopathy, and DCM (Figure 3B). Interestingly, 2 rt-circRNAs originating from SCAF8 and TIAM2, including SCAF8_e4:TIAM2_e1 and SCAF8_e4:TIAM2_e2, were remarkably downregulated in DCM (Figure 3C). Many of the significantly down- and up-regulated circRNAs were positively correlated with the expression of their host genes such as circFBLN1_5 with FBLN1 and circNLGN1_1 with NLGN1, respectively (Figure 3D and E). However, the expression of some circRNAs in DCM was independent of changes in their host gene expression (Figure 3F). Examples of this discordant relationship included circABCC1_9 and circHERC4_11, which was significantly up- and down-regulated in DCM, respectively, while the expression of their host gene mRNAs had no obvious change in DCM (Figure 3F). Together, these results indicate that dysregulation of circRNAs in DCM tend to originate from heart disease-related gene loci, which include the rt-circRNAs from SCAF8 and TIAM2.
Screening circRNAs for potential as miRNA sponges
To screen for cardiac circRNAs with the potential to act as miRNA sponges, we analyzed the sequences of each high-confidence circRNAs for the presence of putative binding sites for 50 miRNAs that have established roles in heart disease (Additional file 2: Table S1). In total, 142 circRNAs were predicted to harbor at least 5 predicted binding sites for at least one of 34 heart disease-related miRNAs (Additional file 13: Table S6). Additionally, a total of 16,472 putative targeted genes of these miRNAs were identified, including 598 and 276 up- and down-regulated genes in DCM hearts compared to normal hearts (Additional file 14: Table S7). A circRNA-miRNA-mRNA regulatory network was further constructed based the predicted interactions between circRNAs and miRNAs, as well miRNAs and mRNAs (Figure 4A). The top 32 circRNAs that contained more than 10 putative binding sites for at least one miRNA are listed in Figure 4B. Notably, 23 of these are derived from one gene, TTN (Figure 4B). In particular, circALMS1_6 had the strongest potential to function as miRNA sponge, containing 28 and 20 putative binding sites for miR-133a-5p and miR-133a-3p, respectively (Figure 4C). Another notable finding is that the rt-circRNA, circSCAF8_e4:TIAM2_e2, was predicted to be capable of sponging multiple miRNAs including miR-150-5p and miR-17 that have been reported to be induced in the rat heart of myocardial ischemia reperfusion injury [50] and in the mouse heart of chronic hypoxia-induced pulmonary hypertension mouse model [51] (Figure 4D). Interestingly, some of the miR-17 downstream targets, such as SCN2B [52, 53], F2R [54], KCNB1 [55] and EGLN3 [56] that have been implicated in heart dysfunction, were down-regulated in the DCM hearts as revealed by expression analysis of linear RNAs, likely through the unleashed miR-17 that resulted from the down-regulation of circSCAF8_e4:TIAM2_e2 in DCM (Figure 4E). Collectively, these analyses provide a systematic evaluation of the potential of cardiac circRNAs to sponge miRNAs and provide a resource for studying cardiac miRNA inhibitors.
Experimental validation of identified circRNAs
Due to lack of availability of human samples, we next sought to validate the conserved circRNAs with mouse orthologs that we identified above using mouse heart samples. We designed divergent primers for the murine orthologs of 6 identified human cardiac circRNAs (Additional file 3: Table S2). These circRNAs were chosen to represent 2 of the most abundant cardiac circRNAs (circEXOC6B_13, circALPK2_2 and circSLC8A_11), 2 of the dysregulated circRNAs (circENAH_4 and circLRP6_4), and 1 non-changed circRNAs between DMC and normal hearts (circN4BP2L2_13). Importantly, these circRNAs have not been experimentally verified by previous studies, with exception of circSLC8A_11 that served as a positive control for circRNA amplification in this study. We performed PCR amplification using mouse heart cDNA as the template, with mouse heart genomic DNA used as a negative control and mouse Yap1 gene primers used as an internal control for the PCR amplification of genomic DNA. All 6 of the selected circRNAs were successfully amplified in heart cDNA but not in genomic DNA (Figure 5A). Sanger sequencing of the PCR products verified back-spliced junctions (Figure 5B-G). Taken together, all the circRNAs examined were confirmed to be bona fide, suggesting the reliability of our in silico circRNA identification.