Our initial study found that the intron/exon ratio was significantly higher in CLL cells than in NBCs (Supplementary Figure-1). Because this observation was intriguing, we decided to test it in a larger cohort of CLL and NBC samples. To this end, we analyzed RNA-Seq data obtained from EGA (26).
Classification of RNA-Seq data
The mutational status of the immunoglobulin heavy chain variable region (IGHV) genes was available for 97 (99%) CLL samples. Forty-one had CLL cells expressing unmutated IGHV (HR) with 98% homology to the germline IGHV, whereas fifty-six had CLL cells expressing mutated IGHV genes (LR). One sample without known IGHV mutational status was excluded from the analysis (Supplementary Table-1). Eighty-five patients had CLL cells harboring Wt-SF3B1, whereas nine patients had CLL cells carrying the Mut-SF3B1 gene. The SF3B1 mutational status was not available for four samples, which were therefore excluded from the analysis (Supplementary Table-2).
Identification of IR events in CLL vs. NBC samples
The transcriptome-wide data were analyzed using TPM values. Transcripts expressed in at least 60% of both NBC and CLL samples were subsequently analyzed for IR. The global distribution of IR for 14811 transcripts is shown in the volcano plot of Figure 2, and the lists of transcripts are provided in Supplementary Table-3. The overall pattern of the intron/exon ratio for all CLL cells (regardless of subtype) vs. NBCs is shown in Supplementary Figure-2A. In this scatter plot, 20906 transcripts (61%) lay above the regression line and 13894 (39%) below it, indicating that a majority of transcripts had a CLL/NBC intron/exon ratio >1, consistent with our previously published study on CLL (18).
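The expression filter described above can be illustrated with a minimal sketch. It assumes that a transcript counts as "expressed" in a sample when its TPM exceeds a threshold (here 0, an assumption, since the cutoff is not stated in this section); the function names and the `min_tpm` parameter are hypothetical and are not taken from the study's pipeline.

```python
def fraction_expressed(tpms, min_tpm=0.0):
    """Fraction of samples in which the transcript's TPM exceeds min_tpm."""
    return sum(1 for value in tpms if value > min_tpm) / len(tpms)


def passes_expression_filter(cll_tpms, nbc_tpms, min_fraction=0.6, min_tpm=0.0):
    """Keep a transcript only if it is expressed in at least min_fraction
    of the CLL samples AND at least min_fraction of the NBC samples.

    min_tpm is an assumed cutoff for calling a transcript 'expressed'.
    """
    return (
        fraction_expressed(cll_tpms, min_tpm) >= min_fraction
        and fraction_expressed(nbc_tpms, min_tpm) >= min_fraction
    )
```

Requiring the threshold in both groups separately, rather than in the pooled samples, prevents the much larger CLL cohort from masking transcripts that are rarely detected in the small NBC group.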
The volcano plot, created to visualize the differential IR between the two cell types, gives a global view of expression level and IR (Supplementary Figure 2B). The x-axis shows the “log2 of CLL cells (intron/exon) / NBC (intron/exon)” and the y-axis shows the “log10 (CLL cells (transcript expression) / NBC (transcript expression))”. The upper right quadrant represents positive values of both quantities.
The isoforms plotted in this quadrant were interpreted as having more expression and more IR in CLL cells. The lower right quadrant contains positive values of the “log2 of CLL cells (intron/exon) / NBC (intron/exon)” and negative values of the “log10 (CLL cells (transcript expression) / NBC (transcript expression))”; it shows isoforms with more IR in CLL cells but lower expression. The upper left quadrant contains negative values of the “log2 of CLL cells (intron/exon) / NBC (intron/exon)” and positive values of the “log10 (CLL cells (transcript expression) / NBC (transcript expression))”; it shows isoforms with more IR in NBC but more expression in CLL cells. Finally, the lower left quadrant is the opposite of the upper right quadrant: it shows isoforms with more expression and more IR in NBC than in CLL cells (Supplementary Figure 2B). The volcano plots comparing CLL cells harboring Mut-SF3B1 with NBC (Figure 2B), CLL cells harboring Wt-SF3B1 with NBC (Figure 2C), CLL cells harboring Mut-SF3B1 with CLL cells harboring Wt-SF3B1 (Figure 2D), and finally M-CLL cells with U-CLL cells (Figure 2E) were interpreted in the same way.
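The quadrant interpretation above can be written down as a small sketch. It assumes the x-axis carries the log2 intron/exon ratio (CLL over NBC) and the y-axis the log10 expression ratio (CLL over NBC), as implied by the quadrant descriptions; the function name, argument names, and returned labels are hypothetical.

```python
import math


def classify_quadrant(ie_cll, ie_nbc, expr_cll, expr_nbc):
    """Assign an isoform to a volcano-plot quadrant.

    ie_*   : intron/exon ratio in each cell type (must be > 0)
    expr_* : transcript expression in each cell type (must be > 0)
    """
    x = math.log2(ie_cll / ie_nbc)        # > 0 means more IR in CLL
    y = math.log10(expr_cll / expr_nbc)   # > 0 means more expression in CLL
    if x > 0 and y > 0:
        return "upper_right"   # more IR and more expression in CLL
    if x > 0 and y < 0:
        return "lower_right"   # more IR in CLL, lower expression in CLL
    if x < 0 and y > 0:
        return "upper_left"    # more IR in NBC, more expression in CLL
    if x < 0 and y < 0:
        return "lower_left"    # more IR and more expression in NBC
    return "on_axis"           # at least one ratio equals exactly 1
```

The same function applies unchanged to the other pairwise comparisons (for example Mut-SF3B1 vs. Wt-SF3B1) by substituting the two groups being compared.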
Further, the scatter plot in Figure 2F compares, for all transcripts in CLL cells vs. NBCs, the IR ratio on the x-axis (log2 CLL intron / NBC intron) with the exon ratio on the y-axis (log2 CLL exon / NBC exon).
Association between intron retention and transcript expression in CLL cells
To assess the global association between IR and transcript expression, we first took the complete set of 25276 reference transcripts. We classified them based on the levels of IR and transcript expression in CLL cells and NBC using a one-sided Wilcoxon test (see Methods). Using this approach, we found six distinct sets of transcripts: set-IA, set-IB, and set-IC correspond to transcripts with high IR in CLL that show upregulation, downregulation, and non-differential expression in CLL, respectively. Similarly, set-IIA, set-IIB, and set-IIC correspond to the three analogous sets with high IR in NBC. We found a total of 16725 differentially intron-retaining transcripts between CLL cells and NBC; among those, 11969 (set-I = 71%) had significantly higher IR in CLL cells, and 4756 (set-II = 29%) had significantly higher IR in NBC (p-value < 0.05, FDR = 5%). Within the 11969 set-I transcripts, set-IA contains 10436 (87%), set-IB contains 188 (2%), and set-IC contains 1345 (11%). We constructed a 2x3 contingency table (Figure 2A) and performed a Chi-square test, which was highly significant (p-value < 1.0e-300). The data presented in the bar graph (Figure 3A) show overexpression of high and low intron-retaining transcripts in CLL and NBCs. We plotted set-IA, set-IB, set-IIA, and set-IIB, performed Fisher’s exact test, and obtained a p-value < 1.0e-300 with an odds ratio of 863. These results suggest that in CLL cells IR contributes positively to the up-regulation of the corresponding transcripts.
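As a minimal sketch of the contingency analysis described above, the following computes the Pearson Chi-square statistic for a 2x3 table and the cross-product odds ratio for a 2x2 table. The set-I row uses the counts reported here; the set-II row is a hypothetical placeholder, because the split of the 4756 set-II transcripts is not given in this section, so the output will not reproduce the reported values. The p-value uses the exact identity that a Chi-square variable with 2 degrees of freedom has survival function exp(-x/2). The odds ratio shown is the simple sample estimate and is not a Fisher exact p-value.

```python
import math


def chi_square_2x3(table):
    """Pearson Chi-square test for a 2x3 table of counts.

    Returns (statistic, p_value). With (2-1)*(3-1) = 2 degrees of freedom
    the survival function is exactly exp(-statistic / 2).
    """
    if len(table) != 2 or any(len(row) != 3 for row in table):
        raise ValueError("expected a 2x3 table")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic, math.exp(-statistic / 2.0)


def odds_ratio(a, b, c, d):
    """Cross-product odds ratio (a*d)/(b*c) for the 2x2 table [[a, b], [c, d]]."""
    if b * c == 0:
        return math.inf
    return (a * d) / (b * c)


SET_I = [10436, 188, 1345]              # set-IA, set-IB, set-IC (from the text)
HYPOTHETICAL_SET_II = [500, 3000, 1256]  # placeholder split of 4756, NOT study data

stat, p = chi_square_2x3([SET_I, HYPOTHETICAL_SET_II])
or_estimate = odds_ratio(SET_I[0], SET_I[1],
                         HYPOTHETICAL_SET_II[0], HYPOTHETICAL_SET_II[1])
```

For very large statistics the exponential underflows to 0.0 in double precision, which is why such results are conventionally reported as an upper bound (for example p < 1.0e-300) rather than as an exact value.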
To study the sample-wise association between IR and transcript expression, we ranked the set-I transcripts (11969) by p-value in ascending order (Figure 3A) and selected the top 200 (see Methods and Supplementary Information). For these 200 transcripts, we recorded the IR and transcript expression values observed in all CLL and NBC samples. We segregated the CLL samples by IGHV mutational status in order to understand the association with disease prognosis. Overall, sample-wise IR in CLL cells was positively associated with the sample-wise expression of the corresponding transcripts, as compared with IR and transcript expression in NBCs (Figure 3B & 3C). However, the association between IR and transcript expression was significantly stronger (p<0.05) when comparing unmutated-IGHV CLL cases (HR) with NBCs than when comparing mutated-IGHV CLL cases (LR) with NBCs. Further, the degree of association was increased in IGHV-unmutated (poor prognosis) cases.
Analysis of Biological Pathways and Molecular Functions Altered Due to IR between CLL vs. NBC in transcripts with maximum IR
Next, we analyzed the top 25% of transcripts from Figure-1A with high IR in CLL compared with normal B-cells (Supplemental Table 4). We observed several different pathways affected by IR (Figure 4A), including extracellular matrix organization (including collagen organization), inflammation mediated by chemokines and cytokines, hemostasis, antigen processing (T-cell activation), immunoregulatory interactions between lymphoid and non-lymphoid cells, and cellular response to stress. However, RNA processing, splicing, and gene expression emerged as the major pathway, accounting for 30% of the input transcripts. In summary, the IR-containing transcripts were enriched for biologically important pathways in CLL cells.
We also analyzed the top 25% of transcripts for molecular functions (MFs) to assess the effect of IR on molecular function in CLL cells. We observed several different MFs affected by high IR in CLL vs. NBC (Figure-4B). Among the top MFs were protein binding, ion binding, nucleic acid binding, hydrolase activity, and transferase activity.
We analyzed RNA-Seq data obtained from 98 CLL patients and 9 NBCs using the Reactome Functional Interaction (FI) network database, a highly reliable, manually curated, pathway-based protein functional interaction network; this analysis allowed us to obtain protein interaction data for 9821 genes. Protein interaction networks were used to assign sets of genes to discrete subnetworks.
Cytoscape v3.4 enabled us to identify the first neighbors of SF3B1 and to visualize the subnetwork centered on SF3B1. In this subnetwork, node color represents the log2 ratio of IR in CLL cells versus NBCs (Figure-4C).
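The first-neighbor selection performed in Cytoscape can be sketched outside the GUI as follows. This is only an illustration of the idea on an undirected edge list: the function names are hypothetical, the partner names in the usage lines are placeholders rather than actual Reactome FI interactions, and it does not call any Cytoscape API.

```python
def first_neighbors(edges, center):
    """Return the set of nodes directly connected to `center`.

    `edges` is an iterable of (node_a, node_b) pairs, treated as undirected.
    """
    neighbors = set()
    for a, b in edges:
        if a == center and b != center:
            neighbors.add(b)
        elif b == center and a != center:
            neighbors.add(a)
    return neighbors


def centered_subnetwork(edges, center):
    """Edges among `center` and its first neighbors (the induced subnetwork)."""
    nodes = first_neighbors(edges, center) | {center}
    return [(a, b) for a, b in edges if a in nodes and b in nodes]


# Placeholder edge list; GENE_A..GENE_D are not real interaction partners.
example_edges = [
    ("SF3B1", "GENE_A"),
    ("GENE_B", "SF3B1"),
    ("GENE_A", "GENE_B"),
    ("GENE_B", "GENE_C"),
    ("GENE_C", "GENE_D"),
]
sf3b1_partners = first_neighbors(example_edges, "SF3B1")
sf3b1_subnetwork = centered_subnetwork(example_edges, "SF3B1")
```

Keeping the edges between neighbors, and not only the edges touching the center, matches the usual meaning of a subnetwork centered on a node; per-node values such as the log2 IR ratio could then be attached as a dictionary keyed by gene name for coloring.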
Furthermore, the network analysis provided insight into the pathways that emerged due to high IR in NBC: the immune system, metabolism of RNA, DNA replication, cell cycle, and metabolism of proteins (Supplementary Figure-3). The corresponding analysis for high IR in CLL cells highlighted the immune system, homeostasis, and signal transduction (Supplementary Figure-4).
Validation of candidate transcripts for IR in CLL cells and NBC
To validate our findings from the high-throughput RNA-Seq data, we chose six transcripts with high IR in CLL cells and, for each, selected a region showing the event. The region including the intron was amplified, and we found that the intron was present in all CLL cells but not in NBCs.
The six transcripts included novel transcripts as well as known transcripts that have been described previously in CLL cells. For one of the transcripts of the CTLA4 gene, the intronic region is shown to be retained in CLL cells but not in NBC (Figure 5A). The six transcripts validated using RT-PCR were: cytotoxic T-lymphocyte associated protein 4 (CTLA4), androgen-dependent TFPI regulating protein (ADTRP), fibromodulin (FMOD), heparan sulfate-glucosamine 3-sulfotransferase 1 (HS3ST1), guanylate cyclase 2C (GUCY2C), and ribosomal protein L39 like (RPL39L) (Figure 5B). The IR ratio for CTLA4 was 29.5 in CLL cells as compared to NBC. We were able to amplify the intronic region in all CLL cells, whereas it was not detected in NBC.
We validated C6orf105, which was later designated as the ADTRP gene (Figure 5B). The IR ratio between CLL cells and NBC was 25.6, indicating a significantly higher IR in CLL cells compared with NBCs. Another known gene was FMOD, for which we found a significant increase in IR in CLL cells compared with NBC. FMOD was previously reported to be overexpressed in CLL compared with NBC (44, 45). The ratio of IR in CLL cells to NBC was 23.4, and this was also validated in CLL cells (Figure 5B). Overall, we observed aberrant splicing, and in particular high IR, in CLL cells compared to NBC in ~73% of the transcripts.
Up-regulation of total SF3B1 and pSF3B1 in CLL Cells
A large number of studies in solid tumors and hematological malignancies have reported aberrant regulation of the spliceosome complex. Because SF3B1 is one of the key spliceosome complex proteins associated with leukemia, we chose to study it (46). We found that the expression of SF3B1 and pSF3B1 was significantly higher in CLL cells compared with NBCs (p<0.0001 and p<0.01, respectively) (Figure 6A). However, no significant difference was observed between SF3B1 and pSF3B1 expression levels in NBCs.
Effect of Anti-IgM on SF3B1 and pSF3B1 expression in CLL cells and NBC
We tested whether SF3B1 and pSF3B1 expression could be modulated by anti-IgM stimulation. After in vitro stimulation of CLL cells with anti-IgM for 15 minutes, no difference in SF3B1 expression was observed between stimulated and unstimulated cells; the same was observed for stimulated and unstimulated NBCs. Also, no difference in SF3B1 expression was observed when we compared Mut-SF3B1 and Wt-SF3B1 CLL cells (Figure 6B). However, when we compared both groups of CLL cells (Mut-SF3B1 and Wt-SF3B1) with NBC, higher expression of total SF3B1 was evident in CLL (p-value<0.05).
When pSF3B1 levels were compared between Wt-SF3B1 and Mut-SF3B1 CLL samples, no differences were observed; however, in both cases, there was a significant rise in protein expression after IgM stimulation. The pSF3B1 levels in both Wt-SF3B1 and Mut-SF3B1 CLL cells were markedly higher than in NBC under both conditions, with or without IgM stimulation.
Modulation of Expression of SF3B1 and effect on post-translational modification upon macrolide probe treatment in CLL cells
Since we observed IR in more than 70% of the transcripts, we hypothesized that the CLL spliceosome machinery could have a functional defect that prevents it from completing intron removal as it should. Because there is no ideal method to measure spliceosome activity in cells, we used a previously reported macrolide molecule, PLAD-B, to modulate the splicing factor subunit SF3B1 (a driver protein for spliceosome activity) and to determine its phosphorylation. In particular, we interrogated phosphorylation at threonine 313 (Thr313), which is reported to indicate an active spliceosome (47).
We treated CLL cells with 100 nM PLAD-B for 15, 60, 180, 360, or 960 minutes and measured the expression of total SF3B1 and pSF3B1. We found no significant change in total SF3B1 expression between untreated and treated CLL cells over time. However, the level of pSF3B1 decreased significantly after 15 minutes of PLAD-B treatment. Because the protein level of total SF3B1 did not change, this suggests that PLAD-B could control the phosphorylated form of SF3B1 and, in turn, modulate the IR program of the CLL transcriptome (Figure 6C). In contrast, no change in the expression of SF3B1 or pSF3B1 was noticed upon treatment with 10 µM fludarabine (F-ara-A), a conventional chemotherapy agent used in the clinic to treat CLL patients.