General features of circRNAs in both healthy and DCM human hearts
A de novo analysis of public RNA-seq data sets generated using left ventricular tissue obtained from 5 healthy individuals and 5 DCM patients  was performed. These data sets were selected because they were generated using a ribosome-depletion library, which is the gold standard method to identify back-spliced junctions for circRNAs detection . We first removed low-quality reads and then utilized Bowtie2  to align clean reads with the human reference genome. Reads continually aligned to the genome were used for linear gene expression analysis and the remaining un-mapped reads were further used as input for find_circ  to detect both linear and back-spliced junctions (Additional file 1: Figure S1). We first performed a quality control validation of this data by performing principal component analysis (PCA) based on the expression of linear genes with raw count > 10 in at least 5 samples. PCA analysis confirmed that samples from each group were clustered respectively as expected (Additional file 2: Figure S2A). We further examined the expression of a marker of HF, atrial natriuretic peptide α (NPPA). Consistent with the reported HF phenotype , our analysis revealed that the expression of NPPA was significantly increased in DCM specimens (Additional file 2: Figure S2B). Taken together, these initial analyses confirm the quality of this RNA-seq dataset and its suitability for the further analysis of circRNAs.
With a threshold cutoff of ³ 2 unique back-spliced reads present in at least one sample, we identified a total of 20,214 circRNAs in the 10 samples (Figure 1A and Additional file 3: Table S1). The identified circRNAs originated primarily from the coding-sequence (CDS) regions of exons of single linear genes (67.2%), followed by exons spanning coding and untranslated regions (UTR) (UTR-CDS) (11.6%) (Figure 1B). Notably, a total of 252 circRNAs (1.2%) were assigned to the exons of two adjacent genes on the same strand (Additional file 4: Table S2), likely representing a novel class of circRNAs termed as rt-circRNA . In particular, the genes MYH6 and MYH7 produced the largest number (n = 22) of circRNAs, which were also observed in human, mouse and rat hearts as reported by two earlier studies (Additional file 5: Figure S3A-C) [26, 27]. In addition, these MYH6/MYH7-derived circRNAs were highly abundant and conserved in hearts of human, mouse and rat (Additional file 5: Figure S3D). However, close inspection of the sequence between the exons where they are back-spliced suggests that the back-spliced junctions of these circRNAs are probably artifactual due to read misalignment caused by the high sequence homology between MYH6 and MYH7 (Additional file 5: Figure S3E). This result prompted us to employ more stringent criteria to filter circRNAs originating from two neighboring genes with highly homological sequences. Using this revised filtering strategy, we identified 110 putative rt-circRNAs. Among them, SCAF8 and TIAM2 contain the largest number of rt-circRNAs (n = 4), all of which were also previously identified in the human heart (Figure 1C).
We next examined whether a correlative relationship between the expression of circRNAs and their parent linear genes. We found that there was a significantly higher correlation between circRNAs and their linear parent genes compared to that between circRNAs and randomly chosen linear genes (Figure 1D), suggesting that the generally high-abundance circRNAs tend to have high-abundance linear counterparts. We further found that the length of the majority of identified circRNAs was around 300-400 bp (Figure 1E). In accordance with previous studies [35-37], the most common splice accepting circularized exon was exon 2 (Figure 1F). Furthermore, we found that the majority of circRNA-producing genes generated 1 circRNA (41.7%, 2,231/5,350) and only 6.0% (n = 319) produced more than 10 circRNAs (Figure 1G). The top two genes producing the largest number of circRNAs were TTN (Titin) and RYR2 (Ryanodine receptor 2), which generated 197 and 173 circRNAs, respectively (Additional file 6: Figure S4A-B). Because the TTN and RYR2 genes have the largest number of exons in human genome, we hypothesize that the number of circRNAs produced is positively correlated to the number of exons within their parent genes. Indeed, a proportional increase in the number of circRNAs was observed to correlate with the number of exons per gene (Additional file 6: Figure S4C). Furthermore, we found 17,733 identified circRNAs, in total, contain exons. Among these, single exonic circRNAs were found in 8.4% (1,494/17,733) while the rest of circRNAs were encoded by more than two exons (Additional file 6: Figure S4D).
Next we compared the circRNAs that we identified with others that were previously identified in human hearts [26, 27]. We found that 10,990 circRNAs (54.4%) were also detected by at least one of the two studies, and 6,225 (30.8%) were detected by all three studies (Figure 1H). In addition, 2,135 and 2,297 circRNAs have orthologs in the mouse and rat, respectively, and 1,203 were conserved in all three species (Figure 1I). We found that 7,585 and 6,211 circRNAs were specifically identified in normal and DCM hearts, respectively, and that 6,418 were identified in both (Figure 1J). Further analysis revealed that the majority of circRNAs were only detected in one human sample, while only 729 circRNAs were present in all the 10 samples (Figure 1K), suggesting that many circRNAs exhibit a high degree of variation in expression among human individuals, likely due to low fidelity and/or the low expression levels of most circRNAs. Collectively, these results indicate that the expression of circRNAs, including rt-circRNAs, is widespread in the human heart with a certain degree of conservation between species.
Identification of the most abundant circRNAs in healthy human heart
CircRNAs that are abundantly expressed in normal heart likely contribute to the maintenance of cardiac homeostasis. We therefore focused on the top 30 circRNAs with the highest average expression level across the 5 healthy heart samples (Figure 2A). The top 2 most abundant circRNAs (circSLC8A1_11 and circSLC8A1_12) originate from the SLC8A1 (sodium/calcium exchanger 1) gene. While most of the top circRNAs were also identified in the studies of Tan et al.  and Werfel et al. , which found high expression levels of circRNAs such as circSLC8A1_12, circHIPK3_1 and circALPK2_2 in healthy hearts, some exceptions were observed. These include circSLC8A1_11 and circYY1AP1_3 that were not detected in study of Tan et al. , as well as circZNF91_2 and circPCMTD1_6 whose expression was identified to be abundant in the heart in only one of the previous studies (Figure 2A). Comparison of the results obtained from different species revealed that half of these highly abundant circRNAs in normal human hearts were conserved in mouse or rat hearts, but only 2 of them including circSLC8A1_11 and circHIPK3_1 exhibited high levels of expression in all species, and 2 of them including circQKI_4 and circTULP4_1 were abundantly expressed in human and mouse hearts, but not rat. The expression levels of the remaining conserved circRNAs were moderate or low in rodent hearts. We then calculated the correlation between the abundance of top circRNAs and their host genes and found a highly positive correlation for most of these circRNAs (Additional file 7: Figure S5A), including the top 3 most abundant ones (i.e., circSLC8A1_11, circSLC8A1_12 and circHIPK3_1) (Figure 2B). In contrast, circSETD3_3, circMB_1 and circCORO1C_1, together with some others (Additional file 7: Figure S5B), appeared to be independently transcribed with their linear parent genes (Figure 2C). By integrating results from multiple studies, our analyses define the most abundant circRNAs in the healthy human heart.
Identification of dysregulated circRNAs in DCM
We next performed differential analysis to identify circRNAs that are dysregulated in DCM. Given the observation that circRNA abundance was not universally distributed across samples (Figure 1K), we restricted our analysis to 2,461 circRNAs which were considered to be high-confidence circRNAs as they had ³ 2 reads spanning the back-spliced junction in ³ 4 samples in at least one group (Additional file 8: Table S3). Using a threshold of p < 0.05 and fold change (FC) > 2, a total of 392 circRNAs, including 101 up-regulated and 291 down-regulated circRNAs, were identified to be differentially expressed in DCM hearts as compared to normal hearts (Figure 3A and Additional file 8: Table S3). These differentially expressed circRNAs were derived from 290 unique host genes. Pathway analysis revealed that these host genes are significantly over-represented in pathways associated with diseases of the heart, including arrhythmogenic right ventricular cardiomyopathy (ARVC), hypertrophic cardiomyopathy, and DCM (Figure 3B). Interestingly, 2 rt-circRNAs originating from SCAF8 and TIAM2, including SCAF8_e4:TIAM2_e1 and SCAF8_e4:TIAM2_e2, were remarkably downregulated in DCM (Figure 3C). Many of the significantly down- and up-regulated circRNAs were positively correlated with the expression of their host genes such as circFBLN1_5 with FBLN1 and circNLGN1_1 with NLGN1 (Figure 3D and E). However, the expression of some circRNAs in DCM was independent of changes in their host gene expression (Figure 3F). Examples of this discordant relationship included circABCC1_9, which was significantly up-regulated in DCM, while the expression of its host gene mRNA had no obvious change in DCM. Together, these results indicate that dysregulation of circRNAs in DCM tend to originate from heart disease-related gene loci, which include the rt-circRNAs from SCAF8 and TIAM2.
Screening circRNAs for potential as miRNA sponges
To screen for cardiac circRNAs with the potential to act as miRNA sponges, we analyzed the sequences of each high-confidence circRNAs for the presence of putative binding sites for 50 miRNAs that have established roles in heart disease (Additional file 9: Table S4). In total, 142 circRNAs were predicted to harbor at least 5 predicted binding sites for at least one heart disease-related miRNA (Additional file 10: Table S5). The top 32 circRNAs that contained more than 10 putative binding sites for at least one miRNA are listed in Figure 4A. Notably, 23 of these are derived from one gene, TTN (Figure 4A). In particular, circALMS1_6 had the strongest potential to function as miRNA sponge, containing 28 and 20 putative binding sites for miR-133a-5p and miR-133a-3p, respectively (Figure 4B). Another notable finding is that the rt-circRNA, circSCAF8_e4:TIAM2_e2, was predicted to be capable of sponging multiple miRNAs including miR-150-5p and miR-17 that has been reported to be induced in the rat heart of myocardial ischemia reperfusion injury  and in the mouse heart of chronic hypoxia-induced pulmonary hypertension mouse model . (Figure 4C). Interestingly, some of the miR-17 downstream targets, such as SCN2B [40, 41], F2R , KCNB1  and EGLN3  that have been implicated in heart dysfunction, were down-regulated in the DCM hearts revealed by expression analysis of linear RNAs, likely through the unleashed miR-17 that resulted from the down-regulation of circSCAF8_e4:TIAM2_e2 in DCM (Figure 4C). Collectively, these analyses provide a systematic evaluation of the potential of cardiac circRNAs to sponge miRNAs and provide a resource for studying cardiac miRNA inhibitors.
Experimental validation of identified circRNAs
Due to lack of availability of human samples, we next sought to validate the conserved circRNAs with mouse orthologs that we identified above using mouse heart samples. We designed divergent primers for the murine orthologs of 6 identified human cardiac circRNAs (Additional file 11: Table S6) and performed PCR amplification using mouse heart cDNA as template. Mouse heart genomic DNA was used as a negative control and mouse Yap1 gene primers were used as an internal control for the PCR amplification of genomic DNA. All 6 of the selected circRNAs were successfully amplified in heart cDNA but not in genomic DNA (Figure 5A). Sanger sequencing of the PCR products verified back-spliced junctions (Figure 5B-G). Taken together, all the circRNAs examined were confirmed to be bona fide, suggesting the reliability of our in silico circRNA identification.