Structural analysis of the ACTR8 gene in various primates
We conducted a structural analysis of the ACTR8 gene in nine primates, including humans, using the NCBI genome database; this revealed that AluSz6 is located in the 7th intron in antisense orientation (Fig. 1A). The ACTR8 gene is composed of 13 exons. The length of the untranslated region (UTR) differs between species, but the open reading frame (ORF) is highly conserved and encodes 624 amino acids. Remarkably, the squirrel monkey ACTR8 gene has 12 exons and encodes a shorter protein of 616 amino acids. Transcripts containing AluSz6 were not found in all primates. Next, we performed genomic PCR on genomic DNA samples from the nine primates to determine the integration time of AluSz6. Amplicons containing AluSz6 were detected in all the primates studied, including hominoids (human, chimpanzee, and gorilla), Old World monkeys (rhesus monkey, crab-eating monkey, and African green monkey), New World monkeys (marmoset and squirrel monkey), and a prosimian (ring-tailed lemur) (Fig. 1B). These findings indicated that AluSz6 integrated into the primate genome before the divergence of the simian and prosimian lineages.
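The inference from the genomic PCR result can be sketched as a simple parsimony argument: the element is placed in the most recent common ancestor of all species that carry it. The following Python sketch is illustrative only and is not the analysis performed in this study. It assumes a single integration event with no subsequent loss, and the simplified clade paths and the names CLADE_PATH, SPECIES_GROUP, and deepest_shared_clade are hypothetical choices made for this example.

```python
from typing import Dict, List

# Simplified root-to-group paths (an assumption of this sketch, not a full phylogeny).
CLADE_PATH: Dict[str, List[str]] = {
    "hominoids": ["primates", "simians", "catarrhines", "hominoids"],
    "Old World monkeys": ["primates", "simians", "catarrhines", "Old World monkeys"],
    "New World monkeys": ["primates", "simians", "New World monkeys"],
    "prosimians": ["primates", "prosimians"],
}

# Species and groups as listed in the text (Fig. 1B).
SPECIES_GROUP: Dict[str, str] = {
    "human": "hominoids",
    "chimpanzee": "hominoids",
    "gorilla": "hominoids",
    "rhesus monkey": "Old World monkeys",
    "crab-eating monkey": "Old World monkeys",
    "African green monkey": "Old World monkeys",
    "marmoset": "New World monkeys",
    "squirrel monkey": "New World monkeys",
    "ring-tailed lemur": "prosimians",
}


def deepest_shared_clade(amplicon_detected: Dict[str, bool]) -> str:
    """Return the most specific clade shared by every species carrying the element.

    Assumes one insertion and no loss, so the element must have been present
    in the common ancestor of all carriers.
    """
    paths = [
        CLADE_PATH[SPECIES_GROUP[species]]
        for species, present in amplicon_detected.items()
        if present
    ]
    if not paths:
        raise ValueError("no species carries the element")
    shared: List[str] = []
    for level in zip(*paths):  # walk down from the root, one rank at a time
        if len(set(level)) != 1:
            break
        shared.append(level[0])
    return shared[-1]


# Amplicons were detected in all nine primates (Fig. 1B).
observed = {species: True for species in SPECIES_GROUP}
print(deepest_shared_clade(observed))  # primates
```

A result of "primates" corresponds to integration before the simian and prosimian lineages diverged; had the amplicon been restricted to hominoids and Old World monkeys, the same function would return "catarrhines".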
Alternative transcripts containing the Alu-derived exon of the ACTR8 gene
To confirm the occurrence of AluSz6-derived exonization of the ACTR8 gene in the crab-eating monkey, reverse transcription (RT-)PCR was performed using two validation primer pairs (V1 and V2) (Fig. 2A and Additional file 1:Table 1). The V1 primers were designed to identify transcript variants, and the V2 primers were used to detect transcripts containing the Alu-derived exon. In total, seven transcripts were identified in the crab-eating monkey; the V1 primers yielded five transcripts and the V2 primers yielded two (Fig. 2B). Sequence analysis of the transcripts revealed that the variants originated from multiple AS events, including exon skipping, alternative 3′ SS and 5′ SS, intron retention, mutual exclusion, and Alu-exonization (Fig. 2C) [5, 6]. The TV1 transcript skips exon 8 and carries exon 9a, which arises through an alternative 3′ SS and is 19 bp longer than exon 9. The TV2 transcript has exon 7a and an AluSz6-derived exon, which were generated by mutual exclusion and Alu-exonization, respectively. TV3 and TV4 have the same AluSz6-derived exon, but carry exon 9 and exon 9a, respectively, through differential alternative 3′ SSs. TV5 is generated by simultaneous AluSz6 exonization and intron retention. TV6 has a longer AluSz6-derived exon due to a differential alternative 5′ SS.
Generally, Alu-derived exonization transcripts exhibit tissue-specific expression patterns [28]. Therefore, we profiled ACTR8 gene expression in various tissues of the crab-eating monkey, including cerebellum, cerebrum, heart, kidney, lung, pancreas, spleen, and testis. Specific RT-PCR primers for each transcript variant were designed based on the splice junctions (Fig. 2C). RT-PCR analysis did not reveal tissue-specific ACTR8 gene expression; the original transcript was ubiquitously expressed in all tissues evaluated, whereas the variants TV1-TV6 showed low or no expression (Fig. 2D). We further investigated ACTR8 gene expression in the cerebellum of other primates (human, rhesus monkey, African green monkey, marmoset, and squirrel monkey) using RT-PCR with transcript variant-specific primers (Fig. 2E). In humans and Old World monkeys, all transcript variants showed expression patterns similar to those in the crab-eating monkey. Remarkably, in New World monkeys, only the original transcript was expressed.
Thus, the original transcript was ubiquitously expressed in all species studied, whereas transcript variants showed lineage-specific expression.
Lineage-specific ACTR8 gene transcript expression
Comparative sequence analysis of the AluSz6-derived exon in nine primates was conducted by multiple sequence alignment (Fig. 3 and Additional file 1:Fig. S1). A novel ‘G’ duplication was found at the 5′ SS of the AluSz6-derived exon in Old World monkeys and apes, providing a new canonical 5′ SS. In New World monkeys, this duplication was not present and hence the canonical 5′ SS was not created.
In the squirrel monkey, exons 2 and 3 were found to be longer than in the other primates (Fig. 1A). Therefore, we examined the splice sites of each exon. Interestingly, the squirrel monkey has a specific ‘TA’ acceptor splice site, whereas Old World monkeys and apes have no 5′ SS in this region (Additional file 1:Fig. S2). Marmoset and lemur have the ‘TA’ acceptor splice site like the squirrel monkey, but we did not experimentally confirm whether marmoset and lemur have longer exons 2 and 3 (Additional file 1:Fig. S2).
The TV2 transcript carrying exon 7a showed lineage-specific expression. The splice sites (donor and acceptor) were well conserved in all primates evaluated (Additional file 1:Fig. S1). Because the lineage specificity might therefore be caused by differences in the surrounding sequences, the branch point was analysed using SVM-BP finder. Multiple candidate branch points, including “TTATAAGAT”, were identified. This sequence was located 21 bp upstream of the 3′ SS of exon 7a (Additional file 1:Fig. S1). Old World monkeys and apes, but not New World monkeys and prosimians, acquired this branch point. The lineage-specific mutually exclusive exon, exon 7a, may therefore have been spliced in due to this branch point difference (Additional file 1:Fig. S1).
Protein structure analysis of various ACTR8 gene transcripts
To assess how the seven transcript variants identified in this study affect translation and protein function, in silico analysis was performed using an ORF finder (http://www.ncbi.nlm.nih.gov/projects/gorf/) and Pfam (https://pfam.xfam.org/). The original transcript encodes the full-length protein of 624 amino acids, including an ATP-binding site (amino acids (aa) 55-56, 288, and 290) and a nucleotide-binding site (aa 283-286) [29, 30]. The TV1 transcript encodes isoform 1 with 579 amino acids, TV2 encodes isoform 2 with 341 amino acids, and TV3-TV6 encode isoform 3 with 304 amino acids (Fig. 4). The truncated isoforms 2 and 3 lack the C-terminus due to exon 7a and a premature termination codon created by AluSz6, respectively (Additional file 1:Fig. S3). Notably, the functional domains (ATP- and nucleotide-binding sites), which are the most crucial for ACTR8 function, are well preserved in TV2-TV6. According to previous studies, the N-terminal region of ACTR8 is critical for functional activity and N-terminal deletions have deleterious effects on the expressed protein, whereas deletions in the C-terminal region do not have such effects [29-31]. Based on our findings and previously reported experimental results [32, 33], we suggest that the ACTR8 gene can produce a lineage-specific protein through AluSz6 integration and subsequent splicing events.
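The statement that the binding sites are preserved in the truncated isoforms can be checked against the reported residue positions and isoform lengths. The Python sketch below is illustrative and is not the ORF finder or Pfam analysis itself. It assumes that isoforms 2 and 3 are collinear with the full-length protein from the N-terminus through residue 290, so that full-length residue numbering applies. The names BINDING_SITES, TRUNCATED_ISOFORMS, sites_within, and retention_report are hypothetical. Isoform 1 is left out because internal exon skipping may shift residue numbering, which a simple coordinate check cannot handle.

```python
from typing import Dict, Iterable, List

# Residue positions and isoform lengths as reported in the text.
BINDING_SITES: Dict[str, List[int]] = {
    "ATP-binding": [55, 56, 288, 290],
    "nucleotide-binding": [283, 284, 285, 286],
}
TRUNCATED_ISOFORMS: Dict[str, int] = {
    "isoform 2 (TV2)": 341,
    "isoform 3 (TV3-TV6)": 304,
}


def sites_within(length_aa: int, positions: Iterable[int]) -> bool:
    """True if every listed residue lies inside a protein of the given length.

    Assumes the truncated protein shares N-terminal numbering with the
    full-length protein (an assumption of this sketch).
    """
    return all(1 <= pos <= length_aa for pos in positions)


def retention_report() -> Dict[str, Dict[str, bool]]:
    """Map each truncated isoform to whether each binding site is retained."""
    return {
        isoform: {
            site: sites_within(length_aa, positions)
            for site, positions in BINDING_SITES.items()
        }
        for isoform, length_aa in TRUNCATED_ISOFORMS.items()
    }


if __name__ == "__main__":
    for isoform, sites in retention_report().items():
        for site, retained in sites.items():
            print(f"{isoform}: {site} site retained = {retained}")
```

Under this assumption, all listed positions (up to aa 290) fall within both the 341- and 304-residue isoforms, which agrees with the preservation of the ATP- and nucleotide-binding sites described above.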