RNA Sequencing to Explore Dominant Isoform Switch During CCR6+ Memory T Cell Activation

Alternative splicing (AS) is an essential, but under-investigated component of T-cell function during immune responses. Recent developments in RNA sequencing (RNA-seq) technologies, combined with the advent of computational tools, have enabled transcriptome-wide studies of AS at an unprecedented scale and resolution. In this paper, we analysed AS in an RNA-seq dataset previously generated to investigate the expression changes during T-cell maturation and antigen stimulation. Eight genes were identied with their most dominant isoforms switched during T cell activation. Of those, seven genes either directly control cell cycle progression or are oncogenes. We selected CDKN2C, FBXO5, NT5E and NET1 for discussion of the functional importance of AS of these genes. Our case study demonstrates that combining AS and gene expression analyses derives greater biological information and deeper insights from RNA-seq datasets than gene expression analysis alone.


Introduction
Alternative splicing (AS) rapidly converts the product of a single gene from one isoform to another 1 . Thus, AS is crucial for generating immediate biological responses, and allows one gene to encode instructions for making multiple proteins with distinct functions 2 . CD45 is a good example. Its transcript undergoes AS ( Supplementary Fig. S1) to generate proteins with disparate functions 3 . For example, T and B cells express separate isoforms which change as the cells undergo activation and differentiation 4 . The large CD45 isoforms expressed by naïve T cells are a stronger break on activation than the shorter CD45 isoform, CD45RO, which is expressed by memory T cells. AS in uences the risk of several diseases, including neurodegenerative disorders, cancer, immune and infectious diseases, cardiovascular and metabolic diseases 2,5−7 . Studying AS holds promise for the development of clinical biomarkers and novel therapies in this era of precision medicine 8 .
AS isoforms were historically characterized by reverse transcription polymerase chain reaction (RT-PCR) and expressed sequence tags (ESTs). Experimental approaches graduated to the genome-wide scale with the development of AS microarrays. These successfully identi ed AS across tissues, cellular states, and species 9,10 . However, these technologies have low throughput (RT-PCR and ESTs), high noise (ESTs and AS microarrays), or only capture known splicing events (RT-PCR and splicing microarray). More recently, RNA sequencing (RNA-seq) has improved the study of AS in several ways 11 . Compared with microarraybased transcriptome pro ling, RNA-seq has a wider dynamic range and avoids some of the technical limitations such as varying probe performance and cross-hybridization 12 , and provides novel insights into AS 13 . Computational tools such as MISO 14 , MAJIQ 15 , rMATs 16 and LeafCut 17 have been developed to detect both known and novel splicing events from RNA-seq data. Other tools such as RSEM 18 , Kallisto 19 , and Salmon 20 can be applied to analyze and quantify known or annotated transcript isoforms.
Vitting-Seerup et al characterized isoform switching from > 5,500 cancer patients' RNA-seq data covering 12 solid cancer types and identi ed many isoform switches as powerful biomarkers: 31 switches were highly predictive of patient survival independent of cancer types 21 . Their study indicates that isoform switches with predicted functional consequences are common and important in dysfunctional cells, which in turn means that expression change should be analyzed at the isoform level. Our previous study also demonstrated additional mechanistic insights can be gained through interrogation of AS in addition to conventional gene-level analysis of RNA-seq data 22  Our understanding of immune response regulation by AS is scant compared to oncology 24 . AS microarray pro ling identi ed extensive novel AS changes in activated T cells suggesting a key role for AS in regulating the mammalian immune response 25 . The types of genes controlled by AS during T-cell activation are different from those governed by changes in transcript levels; AS is associated with cellcycle regulation, whereas alterations in transcript abundance dictate changes in immune defense and cytoskeletal architecture 25 . Previously, we compared the transcriptomes of activated CCR6 + memory Tcells by RNA-seq and microarray. A plethora of differentially expressed genes were identi ed by both platforms, but RNA-seq was superior in differentiating biologically critical isoforms 12 . Prior research de ned transcriptional changes that regulate protein expression during T-cell maturation and antigen stimulation. Here, we build upon this work by using the same RNA-seq dataset to globally analyze AS during T-cell stimulation.

Materials And Methods
RNA-seq dataset.
The raw RNA-seq data from this study was downloaded from the NCBI sequence read archive under the accession number SRP026389, and described in detail elsewhere 12 . Brie y, human PBMCs was puri ed from a healthy donor, and then CD4+ memory T cells were puri ed from PBMCs through negative selection using the memory CD4+ T cell isolation kit (Miltenyi) followed by positive selection with anti-CCR6/biotin conjugates and anti-biotin magnetic beads (Miltenyi). Puri ed CCR6+ T cells were stimulated with anti-CD3 and anti-CD28 coated beads (Miltenyi). RNA was prepared from resting and stimulated T cells at different time points over a time course of 3 days. There was a total of six time points, with two biological replicates per time point (Fig. 1A).
Isoform quanti cation and switch.
RNA-seq based transcriptome pro ling was performed by the Illumina HiSeq™ 2000 platform. Raw sequencing reads were mapped by Salmon 20 0.12.0 to the human genome GRCh38 and Gencode Release 29, and then a counting of matrix of 200,000 (transcripts) x 12 (samples) was generated as inputs for detecting isoform switch. All transcripts that did not express across RNA samples were ltered out rst. Then those protein-coding genes with two to ve expressed isoforms were kept for further analysis. The protocol for isoform switch is depicted in Fig. 1B and 1C. First, the relative abundance ( Fig 1C) of different isoforms was calculated from their corresponding expression levels ( Fig 1B) at each time point. Then the most dominant isoform was identi ed at each time point if there exists a one. All isoforms are sorted in a descending order by their relative abundance, and the top isoform is the dominant one if the relative abundance difference between the top two isoforms is greater than 0.3. Otherwise the top one is not a dominant isoform. In Fig. 1C, all dominant isoforms are colored in red circle, and the size of circles represents relative abundance of individual isoform at each time point. It is noted that at 6 and 24 hours, there are no dominant isoforms because the expression levels for the top two isoforms are too close. The reason for us to check the difference between the top two isoforms is to ensure the top one is indeed dominant. Isoform switch occurs if the dominant isoforms differ across time points. The list of candidate genes with potential isoform switch will be further checked using the Omicsoft genome browser, and only those isoform switches with strong evidence (i.e. raw sequence read coverage) are reported.
Protein sequence analysis and comparison.

Results
The isoform quanti cation was performed by using the computational algorithm Salmon 20 . By applying the isoform switch protocol described in the Methods Section, we identi ed eight genes with the most dominant isoform switching during T cell activation (Table 1 and Supplementary Fig. S3). We wanted to determine the functional relevance of the isoform switches in Table 1. Consistent with previously published data linking AS to cell cycle regulation 25 , seven of these eight genes either directly control cell cycle progression or are oncogenes. CDKN2C inhibits the activity of the cyclin D-CDK6 complex thus blocking the G1-to-S phase transition 28 . CENPM is a necessary component of the centromere, and conditional deletion halts cell division leading to cell death 29,30 . FBXO5 controls multiple cell cycle transitions as well as homologous DNA repair [31][32][33] . Methylation of piRNAs by HENMT1 is required for transposable element repression during germ cell division 34 . NET1 directly sequesters the phosphatase Cdc14 to allow cyclin-dependent kinase activity throughout the cell cycle [35][36][37] . SHLD1 is a component of the shieldin complex responsible for non-homologous end joining in the TP53 DNA damage repair response 38,39 . UNG is the major nuclear glycosylase responsible for removing mutagenic uracil from DNA during base excision repair and is necessary for class-switch recombination in B cells 40,41 . We selected CDKN2C, FBXO5, NT5E and NET1 for further exploration. CDKN2C inhibits T cell proliferation in response to TCR stimulation by binding to CDK6. This block is overcome through CD28 costimulation 42,43 . In resting T cells, the isoform CDKN2C-202 is dominant.
Within two hours of activation through both the CD3 and CD28 pathways, levels of this isoform rapidly diminish. Then at 72 hours post-activation expression of the CDKN2C-203 isoform increases to predominate ( Fig. 2A). Based on these data we hypothesized that the CDKN2C-202 isoform lacked the domains necessary to inhibit CDK6 thus leaving non-terminal effector T cells primed for proliferation in response to TCR stimulation.
CDKN2C encodes ve ankyrin repeats 44 . The second and third form the inhibitory interface with CDK6 45 .
Alignment of the splice variants demonstrated that the CDKN2C-203 isoform has a single amino acid truncation of the fourth ankyrin domain and completely lacks the fth ankyrin domain (Fig. 2B). Thus, both CDKN2C isoforms possess the second and third ankyrin domains, which are necessary for CDK6 binding. Another hypothesis we explored was that the CDKN2C-203 variant encodes an unstable CDKN2C isoform. Truncation mutants of CDKN2C lacking the fth ankyrin domain appear unstable as this deletion construct produces little protein 44,46 . This CDKN2C isoform switch is likely important to CDK6 regulation and the generation of adaptive immunity while preventing lymphomas and in ammatory diseases 42,43 .
FBXO5 is encoded at the minus strand, and has two expressed protein coding isoforms, i.e. FBXO5-201 and FBXO5-202 (Fig. 3A). FBXO5-202 is the dominant isoform at early time points, but FBXO5-201 becomes the dominant isoform at 72hr (Fig. 3A). The high expression level of FBXO5-201 at 72hr is evident from the sequence read coverage pro le (Fig. 3B). Note FBXO5 also has a non-coding isoform FBXO5-203 that is barely expressed. After protein translation, FBXO5-201 is 46 amino acids longer than FBXO5-202 in the N-terminus. These 46 amino acids encode three key features: a putative signal peptide (Fig. 3C) and two potential Cdk phosphorylation sites. As a member of the F-box protein family, FBXO5 has several protein-protein interactions that could be affected by this isoform switch [31][32][33] . Despite a large body of work on FBXO5, the functional consequences of these three features are unknown.
FBXO5 is involved in the osteogenic differentiation of mesenchymal stem cells (MSCs) 47 . The expression of FBXO5 was upregulated after osteogenic induction in human periodontal ligament stem cells (hPDLSCs). FBXO5 knockdown attenuated migration, inhibited alkaline phosphatase (ALP) activity and mineralization, and decreased RUNX2, OSX, and OCN expression, while the overexpression of two transcript isoforms signi cantly accelerated migration, enhanced ALP activity and mineralization, and increased RUNX2, OSX, and OCN expression in hPDLSCs. It was concluded that both FBXO5-201 and FBXO5-202 promoted the migration and osteogenic differentiation potential of hPDLSCs, which identi ed a potential target for improving periodontal tissue regeneration. However, whether the two isoforms have different biological roles, especially during T cell activation, remains unclear.
The protein encoded by this gene is a plasma membrane enzyme that catalyzes the conversion of extracellular AMP to adenosine. The encoded protein is used as a determinant of lymphocyte differentiation. Defects in this gene can lead to the calci cation of joints and arteries. The two CCDSvalidated transcripts of NT5E are NT5E-201 and NT5E-203, which differ with respect to the presence of exon 7 in NT5E-201 (Fig. 4). NT5E-201 encodes canonical CD73, denoted as CD73L, while the NT5E-203 transcript is predicted to encode a shorter protein CD73S. Human CD73S lacks amino acids 404-453, encoded by the missing exon 7. The dominant isoform is NT5E-203 at 2 and 4hr, but it is NT5E-201 at 0 and 72hr (Fig. 4). CD73S was expressed at low abundance in normal human tissues but was signi cantly up-regulated in cirrhosis and hepatocellular carcinoma (HCC) 48 . These two human isoforms exhibited functional differences, such that ectopic expression of canonical CD73L in human HepG2 cells was associated with decreased expression of the proliferation marker Ki67, whereas CD73S expression did not have an effect on Ki67 expression 48 . CD73S was glycosylated, catalytically inactive, unable to dimerize, and complexed intracellularly with the endoplasmic reticulum chaperone calnexin. Furthermore, CD73S negatively regulates CD73L activity and protein expression in a proteasome-dependent manner 48 .
It remains unclear the roles of CD73L and CD73S in T cell activation, though new data suggest that CD8 + CD73 + T cells may be especially important mediators of immunosuppression in human head and neck cancer 49 .
The gene NET1 is part of the family of Rho guanine nucleotide exchange factors. Members of this family activate Rho proteins by catalyzing the exchange of GDP for GTP. The protein encoded by this gene interacts with RhoA within the cell nucleus and may play a role in repairing DNA damage after ionizing radiation. Alternative splicing results in multiple transcript variants that encode different protein isoforms. Compared with NET1-202, the expression for NET1-201 is low at early time points but increases signi cantly and become the dominant isoform at 24 and 72hr. NET1-201 and NET1-202 display distinct exon usage in their 5' ends (Fig. 5A). We performed Clustal Omega alignment of the predicted protein sequences of NET1-201 and − 202, and found that NET1-202 completely lacks the rst nuclear localization sequences (NLS) in its N-terminus and that there is poor conservation of the second NLS (Fig. 5B). These NLSs are functionally important as oncogenic NET1 lacks the N-terminal 145 amino acids encoding the NLS, and deletion of the two NLS redistributes NET1 from the nucleus to the cytoplasm 50,51 . The dominance of the NLS-containing NET1-201 splice variant at 24-and 72-hours poststimulation suggests that nuclear sequestration of Cdc14 by NET1 is important for TCR-driven T cell proliferation and maximally effective adaptive immune responses.
The regulation of the two isoforms of NET1 by transforming growth factor-β (TGF-β) in keratinocytes has been studied, and the results emphasize the importance of NET1-202 in the short-and long-term TGF-βmediated regulation of epithelial-to-mesenchymal transition (EMT) 52 . It was found that short-term TGF-β treatment selectively induced NET1-202 (also termed as Net1A) but not NET1-201. Interestingly, long-term TGF-β treatment resulted in Net1A protein degradation by the proteasome. Silencing of Net1A resulted in disruption of E-cadherin-and zonula occludens-1 (ZO-1)-mediated junctions, as well as expression of the transcriptional repressor of E-cadherin, Slug and the mesenchymal markers N-cadherin, plasminogen activator inhibitor-1 (PAI-1) and bronectin, indicating that late TGF-β-induced downregulation of Net1A is involved in EMT. In conclusion, this study provides new evidence for the differential regulation of the two isoforms of the RhoA-speci c GEF NET1 by TGF-β. It points out differential regulatory effects of TGF-β on the NET1A isoform, depending on the duration of the signal 52 .

Discussion
Manual check to detect reliable isoform switch.
The accuracy of isoform quanti cation is in uenced by the complexity of gene structures and caution must be taken when interpreting quanti cation results for short and complex isofroms 53 . It was also discovered that both sequencing depth and the relative abundance of different isoforms affect quanti cation accuracy. Considering the inaccuracy and uncertainty in isoform quanti cation, we manually check all reported isoform switches and lter out those false positives. SFNX3 (sidero exin 3) has seven isoforms and SFXN3-201 and SFXN3-202 are the two mainly expressed forms ( Supplementary  Fig. S4B). According to Supplementary Fig. S4A there is an evident isoform switch, but this switch is questionable considering the two isoforms SFXN3-201 and SFXN3-202 are nearly identical. As a matter of fact, SFXN3-202 is only 5 bp longer than SFXN3-201 at the 5' end of UTR (Exon #1), and virtually indistinguishable. Therefore, the reported isoform switch is most likely a false positive due to unreliable isoform quanti cation. Another example of a false isoform switch is shown in Supplementary Fig. S5. RAB43-201 is the dominant isoform at early time points, while RAB43-208 becomes the dominant isoform at 24 and 72 hr. Unfortunately, the high expression level of RAB43-208 at 24 and 72 hours is not supported by the read coverage pro les in Fig. S5B since no sequence reads are mapped to the unique exon #3 of the isoform RAB43-208.
Isoform switching in human T cells.
There Coordinated regulation of T cell proliferation through alternative splicing.
Proliferation is a crucial part of antigen-speci c adaptive immune responses 56,57 , however T cell proliferation must be tightly controlled to allow optimal protective immunity while preventing excessive in ammatory destruction and leukemia/lymphoma. Consistent with a prior publication 25 , seven of the eight genes that demonstrated stimulation-driven isoform switching through alternative splicing in this study regulate proliferation. Regulation of proliferation by alternative splicing and isoform switching is consistent with at least two alternative, but non-exclusive, hypotheses. First, one isoform contributes to cell cycle regulation while the other isoform is associated with normal functions of the cell. This appears to be true for NET1, where the switch from NET1-202 to NET1-201 at 24-and 72-hours post-stimulation ts with a functional refocusing of the T cell from migrating and scanning antigen-presenting cells to proliferation. Without NLS, NET1-202 is likely split evenly between the nucleus and cytoplasm where it interacts with RhoA to control cytoskeletal reorganization [58][59][60] . Combined signaling through the TCR and CD28 initiates T cell proliferation which is facilitated by Cdc14 sequestration of NET1-201 in the nucleus 35-37, 50,51 . So at least some of these alternative splicing events are likely due to the need to rapidly switch between functionally distinct protein isoforms encoded by the same gene during T cell activation.
A second hypothesis is that alternative splicing allows faster protein production than de novo transcription. This may be the case for CDKN2C. If the CDKN2C-202 isoform is unstable it may be a relatively weak Cdk6 inhibitor thus allowing earlier cell cycle progression or proliferation at lower TCR/costimulatory signaling thresholds. As activated T cells progress through the cell cycle, Cdk6 inhibition by CDKN2C must be removed hence the low levels of both isoforms from 2 to 24 hours-post stimulation. The isoform switch to the putatively more stable CDKN2C-201 at 72 hours post-stimulation may facilitate the transition from T cell proliferation to differentiation or be part of terminal effector T cell differentiation.
Rapid isoform switching through alternative splicing of CD73 is likely necessary for T cell activation and effective adaptive immune responses.
Adenosine generation by CD73 plays both autocrine and paracrine roles in T cell activation 61 . The A2a, an inhibitory adenosine receptor, is the predominant form expressed by T cells. Signaling through the A2a adenosine receptor inhibits T cell proliferation at least partly by limiting IL-2 production 62,63 . Dendritic cells (DC) also express A2A and A2B adenosine receptors, and signaling through the A2B receptor blocks DC maturation and co-stimulation of T cells 64 . Thus, the switch in alternative splicing of CD73 during T cell activation ts with the immediate need for proliferation and differentiation. Upon recognition of cognate antigen, T cells cease migrating and form an immunological synapse with the presenting DC 65,66 . Increased expression of a catalytically inactive CD73 that complexes with and promotes degradation of CD73L through isoform switching would provide a rapid method of clearing this inhibitor of proliferation and differentiation 48 . As T cell activation is a stepwise dialogue between DC and T cell, quickly preventing suppressive adenosine accumulation in the microenvironment around DC-T cell pairs is likely critical to costimulation by CD86, IL-2 production and effective adaptive immune responses. Support for this hypothesis comes from cancer, where adenosine is a key component of suppressing the anti-tumor immune response 66,67 .

Declarations Authors Contributions
SZ conceived and designed this study. SZ performed the RNA-seq data analysis and drafted the manuscript. AMSB and KD participated in biological interpretation of isoform switching genes and in writing the manuscript. All authors approved the nal manuscript.