Phylogenetic comparison and splice site conservation of the animal SMNDC1 gene family

doi:10.21203/rs.3.rs-3896856/v1

Article

This work is licensed under a CC BY 4.0 License

Version 1

Alternative splicing is the process by which multiple mRNAs are generated from a single pre-mRNA under the action of the spliceosome and other splicing factors. SMNDC1 (survival motor neuron domain containing 1) has been identified as a constituent of the spliceosome complex. Previous studies indicated that SMNDC1 is required for splicing catalysis in vitro and regulates intron retention in cancer. However, the phylogenetic relationships and expression profiles of SMNDC1 have not been systematically studied in the animal kingdom. To this end, we performed a broad phylogenetic analysis of SMNDC1 genes across the animal kingdom. In total, 72 SMNDC1 genes were identified from 66 animal species. Bioinformatics analysis showed that the gene structure and function of SMNDC1 proteins are relatively conserved, and that only a few species carry two copies. In particular, the human SMNDC1 gene is highly expressed in multiple cancer types, including breast cancer, colon cancer and rectal cancer, indicating that SMNDC1 may play an essential role in cancer development and may be a valuable diagnostic or therapeutic protein target in clinical treatment. In summary, our findings provide a comprehensive overview of the animal SMNDC1 gene family, together with basic data and potential clues for further study of the molecular functions of SMNDC1.

Biological sciences/Genetics

Biological sciences/Genetics/RNA splicing

Alternative splicing

phylogenetics

splicing factor

splice site selection

SMNDC1

Most genes in higher eukaryotes are composed of exon and intron intervals. Gene splicing is the process of removing introns and joining exons of genes to generate mature mRNA. Alternative splicing (AS) is the process of selecting different combinations of splice sites, which leads to the generation of multiple mRNAs from one pre-mRNA [1]. AS greatly enriches proteomic structural and functional diversity by producing multiple proteins from a single gene. Meanwhile, these protein isoforms may differ in properties such as enzymatic activity, subcellular localization, and ligand binding. Previous studies have shown that more than 90% of human genes experience AS events [2–5]. Increasing evidence has shown that AS is essential for normal biological processes, such as hematopoiesis [6], brain development [7], and muscle function [8]. Furthermore, it also plays an important role in the occurrence and development of various diseases, including Duchenne muscular dystrophy, spinal muscular atrophy, beta-thalassemia, myotonic dystrophy, isolated growth hormone deficiency type II and Frasier syndrome [9–15].

AS is performed by the spliceosome, which contains five kinds of snRNPs (small nuclear ribonucleoprotein particles: U1, U2, U4, U5 and U6) and a variety of non-snRNP splicing factors [16, 17]. For instance, SMN (survival motor neuron) is part of the small nuclear ribonucleoprotein (snRNP) complex in the cytoplasm and is responsible for pre-mRNA splicing [18]. SMN not only has affinity for the Sm ribonucleoproteins that form a ring involved in the splicing process [19], but is also implicated in binding methylated arginines [20]. Two highly homologous SMN genes, SMN1 and SMN2, have been identified in the human genome, and mutation of SMN1 is the main cause of spinal muscular atrophy (SMA) [21].

SMNDC1 (survival motor neuron domain containing 1) is a paralogue of the SMN1 gene. It is implicated in pre-mRNA splicing and has been shown to play a critical role in spliceosome assembly in the nucleus [20, 22–24]. Importantly, SMNDC1 can also regulate splicing efficiency. For instance, low SMNDC1 poison exon inclusion was associated with widespread reductions in intron retention, and the SMNDC1 poison exon controls SMNDC1 expression to modulate pan-cancer intron retention [25]. SMNDC1 has also been reported to play an important transcriptional role in skeletal muscle, the adult brain and the spinal cord [26]. Additionally, SMNDC1 mRNA has been identified as a potential target of fragile X mental retardation protein (FMRP), whose loss of expression leads to fragile X syndrome [27]. Together, these studies indicate that SMNDC1 plays important roles in human disease. SMNDC1 is also known as survival of motor neuron-related splicing factor 30 (SPF30), and the evolution and alternative splicing profile of SPF30 have been analyzed bioinformatically in plant species [28].

To this end, in the present work we identified SMNDC1 family genes in different animal species and analyzed their phylogenetic relationships. Subsequently, the gene structures, protein domains and conserved splicing patterns were elucidated, and the expression patterns in different tissues and diseases were examined. This study explores the potential functions of SMNDC1s to provide theoretical support for further functional studies.

Identification of SMNDC1 genes in animals and construction of a phylogenetic tree

To explore the functional differentiation of SMNDC1 family genes, 72 SMNDC1 protein sequences from 66 animal species were subjected to protein domain alignment analysis with the online software SMART. Among these species, 23 primates, 19 rodents and lagomorphs, 9 fish, 2 birds and reptiles (anole lizard and Chinese softshell turtle), and one other animal (lamprey) were identified. Of the 66 species, 60, including humans and all fish, have only one SMNDC1 gene, whereas 6 species (Angola colobus, mouse lemur, pig-tailed macaque, rabbit, golden hamster and pig) have two copies. None of the animal species has three or more copies of the SMNDC1 gene, which differs from the situation in plants. For example, Kalanchoe laxiflora has four SPF30 (SMNDC1) genes and Triticum aestivum (wheat) possesses three copies [28].

Furthermore, to understand the evolutionary history and phylogenetic relationships among the identified SMNDC1 genes, a phylogenetic tree was constructed using the Bayesian method based on the amino acid sequences of the 72 SMNDC1 members from 66 animal species (Fig. 1). For genes with multiple transcript isoforms, the isoform with the longest protein-coding sequence was selected as the representative. Bootstrap values are presented as a color gradient at the branches, and species from different taxonomic groups are marked with different colors. The tree grouped into four major clades: primates, rodents and lagomorphs (purple), birds and reptiles (pink), fish (light blue) and other mammals (green). Not surprisingly, genes from phylogenetically related animal species tend to cluster together in the tree. For example, the SMNDC1 genes from primate species, including Homo sapiens and its close relatives, form a single monophyletic group (Fig. 1). Taken together, the four main clades reflect general animal phylogeny. Furthermore, the branch lengths indicate evolutionary distances between organisms, while the clear topology supports the validity of the phylogenetic reconstruction of the SMNDC1 gene family in animals. The phylogenetic tree constructed in the current study provides the basis for the subsequent bioinformatics analysis.

Analysis of protein domain/motif

To further study the conservation of animal SMNDC1 genes, a detailed analysis of their protein domains and conserved motifs was performed. The SMNDC1 proteins of the 66 representative animal species were aligned and used to construct a phylogenetic tree (Figs. 1, 2). The identified SMNDC1 proteins range from 178 to 288 amino acids in length, and most are approximately 238 amino acids long (Table S2). All SMNDC1 proteins have a characteristic central SMN domain, whose size is strictly conserved at 60 amino acids. Conserved motifs of the animal SMNDC1 proteins were predicted with the MEME online tool. The top ten conserved motifs are illustrated as colored boxes and cover most of the protein (Fig. 2, right panel). The vast majority of animal SMNDC1 sequences, including the human sequence, contain all 10 conserved motifs, and the SMN domain is mainly located in the middle 5 motifs (Fig. 2, right panel). Interestingly, in species with two SMNDC1 copies, the copies differ in their motifs: one has 10 motifs, while the other has fewer than 10 or a different motif composition. For instance, in Angola colobus, ENSCANT00000039560.1 has 10 motifs, whereas ENSCANT00000045349.1 has only 8, which implies potential functional diversification.

Interaction Networks of SMNDC1

The crystal structure of human SMNDC1 is presented in Fig. 3. The aromatic cage in the Tudor domain of SMNDC1 mediates dimethylarginine recognition through cation-π interactions involving five important residues (Trp83, Tyr90, Phe108, Tyr111 and Asn113). In detail, Tyr90 and Asn113 were highly conserved at ConSurf Grade 9, and Trp83 was conserved at ConSurf Grade 8. Phe108 (98.571%) and Tyr111 (97.143%) were conserved at ConSurf Grades 6 and 4, respectively.

The protein interaction network of SMNDC1 may further reveal its involvement in various biological processes. To investigate the functional relationships between SMNDC1 and other proteins, the web tool STRING was used to construct protein interaction networks of animal SMNDC1. Based on experiments and databases, three representative SMNDC1 protein sequences, from human, mouse and Schizosaccharomyces pombe (yeast), were selected to generate interaction networks (Fig. 4). The resulting human, mouse and yeast networks contained 10, 10 and 5 functional partners, respectively. The interacting proteins of human SMNDC1 can be divided into three categories: small nuclear ribonucleoproteins (SNRNP200 and SNRPB), splicing factors (SF3A3, SF3A2, SF3B2, SF3B4, SF3B5 and SF3B6) and pre-mRNA processing factors (PRPF6 and PRPF3). In addition to these three categories, mouse SMNDC1 also interacts with U6 small nuclear RNA and mRNA degradation-associated proteins (Lsm5 and Lsm6) and an RNA-binding motif protein (Rbmx). Interestingly, the interacting proteins of yeast SPF30 are quite different from those of human and mouse, and mainly include prp1 (U4/U6 x U5 tri-snRNP complex subunit Prp1), sap62 (zinc finger protein Sap62), itr2 (MFS myo-inositol transporter), dis3 (putative 3'-5' exoribonuclease subunit Dis3) and swi6 (chromodomain protein Swi6). In addition, many interacting proteins of mammalian SMNDC1 have no apparent homologue in Schizosaccharomyces pombe, for example the splicing factors (SF3A3, SF3A2, SF3B2, SF3B4, SF3B5 and SF3B6) and pre-mRNA processing factors (PRPF6 and PRPF3). Taken together, specific interaction studies and further functional verification of SMNDC1 may reveal its involvement in various biological processes.

Analysis of gene structure and conserved motifs

To further explore the conservation of gene structure and motif composition at the genome level, the longest coding sequence (CDS) transcript of each SMNDC1 gene was chosen for analysis (Fig. 5). Different genomic structures were observed, with the total number of exons ranging between two and seven. In most primates the number of exons is five; however, the white partridge ENSMLET00000055620.1 has seven exons, ENSMNET00000044298.1 and ENSMICT00000046893.2 have three exons, and ENSCANT00000045349.1 has only two exons. In fish, the number of exons is stable at five or six. In summary, SMNDC1 genes with five exons in the CDS account for approximately 89% of the total (Fig. 5 and Table S2), including the SMNDC1 genes from representative species such as human, rodents and rabbit. Among the 72 SMNDC1 family genes, 43 had a 5 exon-4 intron gene structure, accounting for 59.7% of the members, and 22 had a 6 exon-5 intron gene structure, accounting for 30.6% of the members. Additionally, ENSOCUT00000012273.2 and ENSMLET00000055620.1 possess the most exons (seven), while ENSCANT00000045349.1 and ENSSSCT00000039370.2 have the fewest (two). Furthermore, ENSMNET00000044298.1, ENSMICT00000046893.2 and ENSMAUT00000006691.1 have three exons. Among the members with 6 exons and 5 introns, all except one SMNDC1 from armadillo (ENSDNOT00000016385.2) have an extra exon that is non-coding. Notably, in the 6 species with two SMNDC1 sequences, the two sequences have different gene structures; for example, of the two sequences from Sus scrofa, one has 2 exons (ENSSSCT00000039370.2) and the other contains 5 exons (ENSSHAP00000001057.1).
Collectively, the differences in the exon-intron distribution patterns of SMNDC1 among the above animal species indicate that structural changes of genes may be involved in the evolution of the gene family across animal phylogeny. Furthermore, SMNDC1 genes in the same branch have obvious similarities in gene structure, indicating a close evolutionary relationship. Given the differences in gene structure between SMNDC1 genes, we further used MEME to determine whether their cDNA sequences differ in motif composition. The 10 most conserved motifs were identified from the SMNDC1 cDNA sequences (supplementary Fig. 6, right panel). Overall, over half of the SMNDC1 sequences contained all 10 conserved motifs. The position and number of motifs showed little difference among primates, rodents and lagomorphs, other mammals (purple) and other vertebrates (pink). Interestingly, there were few differences between the motifs of the two SMNDC1 sequences with different gene structures within one species. For example, of the two sequences from Oryctolagus cuniculus, one contains 9 motifs (ENSMICT000000041763.2) and the other contains 8 motifs (ENSMICT000000046893.2). In conclusion, comparison of the conserved motifs at the RNA/cDNA and protein levels showed that the codon usage, number and similarity of these homologues do not differ. The locations of these motifs indicate that animal SMNDC1 is conserved at both the protein and cDNA levels. In addition, the cDNA comparison showed that no conserved motif was found in the untranslated regions; these regions are enriched with regulatory elements, which provides additional information on conserved regulatory mechanisms among these SMNDC1s.
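
The proportions quoted for the two dominant exon layouts can be recomputed from the raw counts. The snippet below is only an illustrative check: the counts (43 and 22 of 72 genes) come from the text, while the variable and function names are hypothetical.

```python
# Exon-layout counts quoted in the text; all names here are hypothetical.
TOTAL_GENES = 72
layout_counts = {"5 exons / 4 introns": 43, "6 exons / 5 introns": 22}


def percent_of_total(count, total=TOTAL_GENES):
    """Share of the gene set represented by `count`, rounded to one decimal."""
    return round(100 * count / total, 1)


for layout, count in layout_counts.items():
    print(f"{layout}: {count}/{TOTAL_GENES} = {percent_of_total(count)}%")
```

Keeping the raw counts next to the percentages makes it easy to re-derive the shares if the gene set changes.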

Transcript Isoforms and Conserved Splice Site Analysis

To investigate the splicing patterns and conserved splice sites of the animal SMNDC1 family genes, we performed an AS analysis. A total of 36 transcript isoforms from 15 animal SMNDC1 genes were collected from the Ensembl database and linked to the phylogenetic relationships among the selected species (Fig. 6). SMNDC1 in Rattus norvegicus and Mus musculus has the most isoforms, with five transcript isoforms each, while SMNDC1 in each of the other 13 animals has two transcripts. In addition, conserved protein motifs were identified in the potential protein products of these transcript isoforms using MEME (Fig. 6, right panel). Splicing mainly occurs within the region encoding the SMN protein domain. The primary transcript encodes the longest peptide sequence with the most conserved motifs, while the alternatively spliced transcripts encode shorter proteins with fewer motifs. The AS types of SMNDC1 in the 15 animal species are mainly alternative 3′ splice site and alternative 5′ splice site selection; exon skipping was also detected in Rattus norvegicus and Mus musculus. Furthermore, conserved splice sites and conserved sequences were identified. Flanking sequences (31 bp in total) of animal SMNDC1 genes were analyzed by WebLogo and multiple alignment to show their consensus, and five representative splice sites were identified (Fig. 7A, B).

Expression profile analysis of animal SMNDC1s

To further investigate the potential functions of animal SMNDC1 in response to developmental cues or in disease, we analyzed the expression patterns of SMNDC1 genes from Homo sapiens and Mus musculus. We reconstructed the expression profiles of SMNDC1 across developmental stages, tissues, cell types and disease conditions using the BAR HeatMapper Plus tool (Supplementary Figures S1–S6). The Homo sapiens disease proteomics data showed that SMNDC1 protein was highly abundant in multiple cancer types, including breast cancer (breast tumor luminal, HER2 positive breast carcinoma and triple-negative breast cancer), colon cancer (colon adenocarcinoma and colon mucinous adenocarcinoma) and rectal cancer (rectal cell carcinoma and rectal mucinous adenocarcinoma) (Figure S3). Moreover, the human SMNDC1 transcript is widely expressed throughout the body, including in skeletal muscle, adult brain, spinal cord, testis, liver, ovary and lung (Fig. 8, Figure S2), while mouse SMNDC1 is highly expressed in brain tissue (Fig. 8, Figure S2). Cell type expression analysis showed that human SMNDC1 was highly expressed in granulocyte monocyte progenitor cells, hematopoietic multipotent progenitor cells and hematopoietic stem cells, while mouse SMNDC1 was expressed in naive thymus-derived CD4-positive, alpha-beta T cells and embryonic stem cells, and accumulated in induced T-regulatory cells and T-helper 17 cells (Fig. 8, Figure S4). In addition, human SMNDC1 was highly expressed in the fetal period and downregulated in the juvenile period (Figure S1), while mouse SMNDC1 was highly expressed in the embryonic period but did not accumulate abundantly in the fetus (Fig. 8, Figure S4).

We paid particular attention to the expression of the SMNDC1 gene in cancer and other diseases. Transcriptome data revealed that human SMNDC1 was expressed at higher levels in cancer tissues than in normal paracarcinoma tissue and normal tissues (Figure S3). Human SMNDC1 had the highest expression level in ovarian adenocarcinoma, followed by esophageal adenocarcinoma (Figure S3), while the protein was enriched in breast, colon and rectal cancer. In summary, SMNDC1 is highly expressed in ovarian adenocarcinoma and digestive system diseases, and may be a valuable diagnostic or therapeutic protein target in clinical treatment.

AS is the main mechanism for maintaining protein diversity [29], and abnormalities in AS can lead to many diseases [30, 31]. Specifically, abnormal AS promotes all stages of tumorigenesis, including cell proliferation [32], apoptosis [33], epithelial-mesenchymal transition [34], and tumor invasion and metastasis [35, 36]. A growing number of studies have shown that alternative splicing may serve as a new biomarker in oncology and provide many new targets for drug development, which is of great value for improving the prognosis of cancer patients [37].

SMNDC1 is a key component of the spliceosome, and the in-depth comparison and phylogenetic analysis of the animal SMNDC1 family conducted here provide a more comprehensive understanding of the function of SMNDC1 in animals.

Assessment of phylogenetic relationships and putative functions in animal SMNDC1s

SMNDC1 has been identified as an essential component of the spliceosome complex [20]. In the present work, we identified 72 SMNDC1 genes from 66 animal species and reconstructed their phylogenetic relationships. SMNDC1 proteins can be broadly divided into four groups (primates, rodents and lagomorphs, other mammals, and other vertebrates), which closely follow the evolution of animal lineages. Moreover, only six species contained two copies of the SMNDC1 gene (Supplementary Table S1), and analysis of the gene structures and protein domains revealed that this gene family maintains conserved functions (Figs. 2, 5). In addition, the conserved splicing pattern of animal SMNDC1s indicates that most transcript isoforms of animal SMNDC1 tend to encode N-terminally truncated proteins (Fig. 6). Transcript isoforms of SMNDC1 in animals share similar gene structures, suggesting that they may have similar functions in regulating gene expression and protein interactions. On the other hand, a previous study showed that different splice isoforms can have different biological functions. For instance, two splicing isoforms of ZNF148 have different effects on the proliferation, invasion and migration of human colorectal cancer cells, and act antagonistically [38]. In addition, a previous study reported that SMNDC1 is critical for regulating ovarian cancer tumor growth and metastasis [39]. Different splice isoforms of SMNDC1 may therefore be potential targets for the treatment of ovarian cancer; however, the isoform functions of SMNDC1 still need further study. Our work showed that the SMNDC1 proteins of animals have an SMN (Tudor) domain (Fig. 2), which has affinity for Sm ribonucleoproteins and is involved in the splicing process [19]. Studies have reported that SMN plays a key role in the assembly of uridine-rich small nuclear ribonucleoprotein complexes [40–42] and in pre-mRNA splicing [22, 24].

In our protein interaction analysis, we found that human SMNDC1 can interact with SNRNP200 (small nuclear ribonucleoprotein U5 subunit 200). SNRNP200 is closely related to the splicing of precursor mRNA, and can regulate the expression of related genes by affecting splicing, thereby affecting cell proliferation [43]. In addition, SNRNP200 plays an important role in the pathogenesis of hereditary retinitis pigmentosa [44] and acute myeloid leukemia [45]. These results suggest that the interaction between SMNDC1 and SNRNP200 may play an important role in the function of SNRNP200 and its impact on diseases. Furthermore, SMNDC1 in mammals can interact with multiple splicing factors and pre-mRNA processing factors, suggesting that it plays an important role in AS regulation. In yeast, the SMNDC1 protein has no SMN domain and only one Tudor-3 domain (Figure S7), which may prevent it from interacting with most alternative splicing factors.

Functional diversity of animal SMNDC1s based on their differential expression patterns

SMNDC1 is a survival motor neuron domain-containing protein that is required for spliceosome assembly [20]. A previous proteomic analysis revealed that SMNDC1 is critical for regulating ovarian cancer tumor growth and metastasis [39], consistent with its high expression level in ovarian adenocarcinoma (Figure S3); this may provide a new target and direction for anticancer drug development. Splicing factors are frequently overexpressed in cancer [46]. Based on available proteomics datasets, our findings showed high expression of SMNDC1 in breast, colon and rectal cancers, implying a potential functional role in cancer development in these organs. In addition, because SMNDC1 is required for splicing catalysis in vitro [24], researchers have speculated that its poison exon may influence the extensive intron retention characteristic of most cancers [47, 48]. Analysis of RNA-seq data from 512 lung adenocarcinoma samples showed that low SMNDC1 poison exon inclusion was associated with widespread reductions in intron retention, and further experimental validation showed that the SMNDC1 poison exon controls SMNDC1 expression to regulate intron retention [25]. This discovery may provide a new perspective for developing cancer treatments.

Spinal muscular atrophy (SMA) is a degenerative neuromuscular disease characterized by muscle weakness and atrophy, caused by deletion or mutation of the SMN1 gene. Its incidence in neonates is estimated to be 1:6,000 to 1:10,000 [49, 50]. SMNDC1 is a paralogue of the SMN1 gene and may share a similar cellular function. However, because SMNDC1 has been overshadowed by SMN in spinal muscular atrophy research, its biological function has not been fully elucidated [27]. Interestingly, SMNDC1 is highly expressed in the spinal cord and skeletal muscle tissue of Homo sapiens, which is consistent with the results of previous studies [20]. In addition, SMNDC1 is mainly expressed in fetal skeletal muscle tissue and not in adult tissue. These results suggest that deletion or mutation of SMNDC1 may be another key factor in SMA. Furthermore, in this work we obtained comprehensive information on SMNDC1 AS in multiple animals, but biological experiments are still needed to validate these new predictions. The above findings may help to identify disease biomarkers and therapeutic targets. For example, SMNDC1 mRNA is a target of FMRP, and this result could complement the current understanding of the etiology of FXS [27].

AS has become one of the hotspots in the era of functional genomics. AS forms can be found by comparing transcripts with genomes; however, with the increasing number of research samples and volume of data analysis, the development of high-throughput experimental technology is particularly important [51]. The isoform level of SMNDC1 has not been thoroughly studied. Hence, it is necessary to further study the expression profile of animal SMNDC1 isoforms through SWATH-MS (sequential window acquisition of all theoretical mass spectra)-based quantitative approaches [52]. Our identification of the biological functions of the splicing-related protein SRP in plants provides a reference for further study of the specific function of each SMNDC1 transcript isoform [53].

Comparison of SMNDC1 in animals, yeast and plants

Although the splicing machinery is fairly conserved among eukaryotic species, the splicing mechanisms of humans, yeast and Arabidopsis are not identical. We therefore analyzed and compared the genome structure and splice site patterns of SMNDC1s from humans, yeast and Arabidopsis. We found that the SMN domain is retained in humans, Arabidopsis and rice, while a Tudor-3 domain is present in yeast (Figure S7). Interestingly, the three exons encoding the domain of SMNDC1 were identical between the three species.

Sequence identification and collection of the animal SMNDC1 proteins

The SMNDC1 protein sequence (ENST00000369592.1) of Homo sapiens was used as the query for a BLASTp search, with an E-value cutoff of 1e-10, against all available animal genome sequences in the Ensembl database (http://asia.ensembl.org/index.html), as described previously. Protein regions in the retrieved sequences were predicted with the online software HMMER (https://www.ebi.ac.uk/Tools/hmmer/search/phmmer). The phylogenetic tree was then constructed using the Bayesian method based on the amino acid sequences of 72 SMNDC1 members from 66 animal species.
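
As an illustration of the E-value filtering step, the following minimal sketch assumes that the BLASTp hits were exported in the standard 12-column BLAST+ tabular format (`-outfmt 6`); the source does not state which output format was used. The function name, the example rows and the subject identifiers are hypothetical placeholders, and treating the cutoff as inclusive (`<=`) is also an assumption.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_hits(handle, max_evalue=1e-10):
    """Yield hits (as dicts) whose E-value is at or below `max_evalue`."""
    reader = csv.DictReader(handle, fieldnames=BLAST_COLUMNS, delimiter="\t")
    for row in reader:
        if float(row["evalue"]) <= max_evalue:
            yield row


# Made-up example rows; the subject IDs are placeholders, not real hits.
example = (
    "query\tsubject_A\t98.3\t238\t4\t0\t1\t238\t1\t238\t1e-150\t520\n"
    "query\tsubject_B\t31.0\t60\t40\t1\t70\t129\t5\t64\t2e-04\t38.5\n"
)
kept = [hit["sseqid"] for hit in filter_hits(io.StringIO(example))]
print(kept)  # ['subject_A']
```

In practice the same function could be given an open file handle instead of the in-memory example.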

Phylogenetic analysis of the SMNDC1 gene family in animals

The amino acid sequences of 72 SMNDC1 genes from 66 animal species were used for phylogenetic analysis. For genes with multiple transcript isoforms, the isoform with the longest protein-coding sequence was used. Multiple sequence alignment of all selected SMNDC1 sequences was carried out using MUSCLE v3.8. A rooted Bayesian phylogenetic tree of the SMNDC1 proteins was constructed using MrBayes 3.2. An additional maximum likelihood tree was constructed with PhyML v3.0 to validate the Bayesian tree [54]. The phylogenetic trees were edited using FigTree v1.4.3 [55].
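
The rule of keeping the longest protein-coding isoform per gene can be sketched as follows. This is an illustrative example only, not the authors' pipeline: it assumes FASTA headers of the hypothetical form `>isoform_id gene=GENE_ID`, and the function names and toy sequences are invented for the example. When two isoforms have the same length, the first one encountered is kept.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def longest_isoform_per_gene(records):
    """Keep, for each gene, the isoform with the longest protein sequence.

    Assumes headers look like 'isoform_id gene=GENE_ID' (hypothetical layout).
    """
    best = {}
    for header, seq in records:
        isoform_id, gene_field = header.split()[:2]
        gene_id = gene_field.split("=", 1)[1]
        if gene_id not in best or len(seq) > len(best[gene_id][1]):
            best[gene_id] = (isoform_id, seq)
    return best


# Toy input: two isoforms of geneA and one isoform of geneB.
example = """>iso1 gene=geneA
MKTAYIAK
>iso2 gene=geneA
MKTAYIAKQRQISFVK
>iso3 gene=geneB
MSE
"""
chosen = longest_isoform_per_gene(parse_fasta(example))
print({gene: iso for gene, (iso, _) in chosen.items()})
```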

Analysis of Gene Structures, Protein Domains and Conserved Motifs

Conserved motifs in the protein and cDNA sequences were identified with the MEME online tool (http://meme-suite.org/tools/meme) (Bailey et al. 2009). Protein domains were predicted using the HMMER website (https://www.ebi.ac.uk/Tools/hmmer/) [56] and drawn using TBtools [57], and the exon–intron structures of all genes were downloaded and reconstructed from the Ensembl database.

Analysis of Protein Interaction Networks

The protein sequences of humans (ENSP00000363129.3), Mus musculus (ENSMUSP00000156644.1) and Saccharomyces cerevisiae (YLR298C_mRNA) were selected to obtain the interaction network on the STRING web server (https://string-db.org/) [58]. Finally, the predicted functional partners of each SMNDC1 protein were presented in the form of an interaction network drawn by Cytoscape 3.8 software.
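
The networks described here were obtained through the STRING web server. For readers who prefer a scripted route, the sketch below builds a request URL for STRING's public REST endpoint `interaction_partners`; using the API rather than the web interface is an assumption of this example, as are the function name and the choice of a limit of 10 partners. The code only constructs the URL and does not contact the server.

```python
from urllib.parse import urlencode

STRING_API_BASE = "https://string-db.org/api"


def interaction_partners_url(identifier, species_taxid, limit=10, fmt="tsv"):
    """Build a STRING 'interaction_partners' request URL (no request is sent)."""
    params = {
        "identifiers": identifier,   # protein name or STRING identifier
        "species": species_taxid,    # NCBI taxonomy ID
        "limit": limit,              # maximum number of partners returned
    }
    return f"{STRING_API_BASE}/{fmt}/interaction_partners?{urlencode(params)}"


# NCBI taxonomy IDs: 9606 = Homo sapiens, 10090 = Mus musculus.
print(interaction_partners_url("SMNDC1", 9606))
print(interaction_partners_url("Smndc1", 10090))
```

The resulting URLs can be passed to any HTTP client, and the tab-separated response loaded into Cytoscape or a table for inspection.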

AS Profile Analysis and Identification of Conserved Splice Sites

All available alternative transcripts (splicing isoforms) of animal SMNDC1 genes were downloaded from the Ensembl database. Selected splice junction sequences (15 bp on each side) were further examined using BLAST. Consensus sequences at representative splice sites were analyzed and visualized using WebLogo v3.0 (https://weblogo.berkeley.edu/logo.cgi).
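
The extraction of junction-flanking windows and a simple consensus can be sketched as below. This is a minimal illustration under stated assumptions rather than the WebLogo procedure itself: the junction coordinate convention (0-based index of the first base downstream of the junction), the function names and the toy sequences are all hypothetical, and the consensus is a plain per-column majority vote.

```python
from collections import Counter

FLANK = 15  # bases kept on each side of the junction, as stated in the text


def junction_window(sequence, junction, flank=FLANK):
    """Return `flank` bases on each side of a splice junction.

    `junction` is the 0-based index of the first base downstream of the
    junction (a convention assumed here for illustration).
    """
    if junction - flank < 0 or junction + flank > len(sequence):
        raise ValueError("sequence too short for the requested flank")
    return sequence[junction - flank:junction + flank]


def consensus(windows):
    """Majority base in each column of equal-length windows (ties: first seen)."""
    if len({len(w) for w in windows}) != 1:
        raise ValueError("windows must be non-empty and share one length")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*windows))


# Toy 5' splice sites: exon ends with AG, intron starts with GT.
toy_sequences = [
    "C" * 18 + "AG" + "GTAAGT" + "T" * 14,
    "C" * 18 + "AG" + "GTGAGT" + "T" * 14,
    "C" * 18 + "AG" + "GTAAGA" + "T" * 14,
]
windows = [junction_window(seq, junction=20) for seq in toy_sequences]
print(consensus(windows))
```

A sequence logo conveys per-position frequencies rather than a single majority base, so this consensus string is only a coarse summary of the same alignment.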

Expression Analysis of SMNDC1 from Online Microarray Datasets

Expression data for animal SMNDC1 family members were downloaded from the Expression Atlas (https://www.ebi.ac.uk/gxa/home). The retrieved expression data were reorganized and presented as heatmaps by using online BAR HeatMapper Plus software (http://bar.utoronto.ca/ntools/cgi-bin/ntools_heatmapper_plus.cgi).

In this study, we identified a total of 72 SMNDC1 genes from 66 animal species and comprehensively analyzed their phylogenetic relationships, genomic organization, motif and protein domain composition, and splicing pattern conservation. These results provide a foundation for molecular research on the roles of SMNDC1 proteins in human diseases, as investigated in mammalian cell lines or animal models. In conclusion, the study of SMNDC1 is of great significance not only for elucidating the related mechanisms, but also for the diagnosis and treatment of related diseases.

COMPETING INTERESTS

There are no competing interests to declare.

AUTHOR CONTRIBUTIONS

Conceptualization, CS, HMW, and BX-H; writing original draft preparation, OY-GJ, Y-NL, BX-H and CS; writing review and editing, OY-GJ, HMW, Y-NL, CS and M-XC; funding, M-XC. The final version of the manuscript was agreed by all authors.

ACKNOWLEDGMENT

This work was supported by the Program for Science Technology and Innovation Committee of Shenzhen (2021N062-JCYJ20210324115408023), the National Natural Science Foundation of China (NSFC32001932), and the Hong Kong Research Grant Council (AoE/M-05/12, AoE/M-403/16, GRF12100318, 12103219, 12103220).

