Single Nucleotide Polymorphism (SNP) in splice sites and prediction for the doublesex (dsx) gene alternative splicing in the malaria vector Anopheles gambiae.

Background: The malaria burden remains significant in tropical regions, and conventional vector control methods face challenges such as insecticide resistance. To overcome these challenges, additional vector control interventions are vital; they include modern genetic approaches as well as classical methods such as the sterile insect technique (SIT). In the major human malaria vector Anopheles gambiae, a favourable candidate gene for sterility induction is the doublesex (dsx) gene, which governs somatic sexually dimorphic traits in mosquitoes. However, the mechanism that regulates the expression of this gene in anopheline mosquitoes is poorly understood. This study aims to screen the An. gambiae dsx gene for single-nucleotide polymorphisms (SNPs) that could be critical to its alternative splicing. Results: Variant annotation data from phase 2 of the Ag1000G project were analysed to identify splice-relevant SNPs within the acceptor and donor splice sites of the An. gambiae dsx gene (Agdsx). SNPs were found in both donor and acceptor sites of Agdsx. No splice-relevant SNPs were identified in the female-specific intron 4 acceptor site or in the corresponding region in males. Two SNPs (rs48712947, rs48712962) were found in the female-specific donor site of exon 5. They were not specific to either sex, as rs48712947 was found in female mosquitoes from Cameroon and in both males and females from Burkina Faso. Among the other splice sites, the intron 3 acceptor site carried the greatest number of SNPs. Conclusion: The identified SNPs showed no association with sex and were randomly distributed across mosquito populations. The SNPs in Agdsx splice sites are not critical for its alternative splicing. Other molecular mechanisms should be considered and investigated.


Background
Malaria is a vector-borne infectious disease caused by protozoan parasites of the genus Plasmodium (1). Transmission between humans occurs through the bite of female Anopheles mosquitoes. The disease is among the top ten causes of death in low-income countries (2) and continues to take a heavy toll on communities, especially in African regions. The malaria transmission cycle involves four major elements: the host (human), the parasite, the vector and the environment (3). In the absence of an effective vaccine or sustainable treatment options, vector control is the cornerstone of malaria management and is based on prevention of human-vector contact and reduction of vector population density (1,4). Traditional vector control strategies rely on the distribution of long-lasting insecticidal nets (LLINs) and on indoor residual spraying (IRS), which have contributed to the decrease in malaria cases and mortality (5,6). However, resistance to the existing insecticides is increasing in natural mosquito populations (7)(8)(9).
In the last decade, scientific advances in additional tools for vector control have included technologies such as cytoplasmic incompatibility induced by Wolbachia infection (10); repressible dominant lethal systems in Aedes aegypti (11,12); Y-chromosome shredding gene drive (13); and the genetic sterilisation of Anopheles sp., known as the Sterile Insect Technique (SIT) (14). The latter technique, SIT, is based on the repeated, high-density release into the environment of males sterilized by gamma radiation, which compete with wild males for mating with native female Anopheles mosquitoes and thereby hinder the production of offspring (14). Indeed, females mated with sterile males will not produce viable offspring, resulting in reduced population numbers or even elimination of the target species. However, instead of exposing males to a source of radiation, sterility could be induced by genetic modification of the mosquito genome, which may improve the effectiveness of classical SIT-based approaches (14).
In An. gambiae, one of the major malaria vectors, population suppression strategies that target sex determination genes such as the doublesex (dsx) transcription factor gene are already under investigation (15,16). The Anopheles gambiae doublesex gene (Agdsx) therefore represents a useful candidate gene for genetic manipulation and for the improvement of alternative mosquito control technologies. Interest in this gene comes from the fact that it undergoes alternative splicing, which results in the female- and male-specific transcripts necessary for sex determination in this species (17).
The use of transgenic tools targeting the dsx gene in anopheline mosquitoes could improve sterility induction and genetic sexing, which are major requirements for SIT technologies. However, the molecular mechanisms underlying sex determination are highly variable. Although the Yob1 gene (Y-linked) was shown to be one of the determining factors of the male sex (18), the molecular pathways controlling the signal of somatic sexual commitment (dsx splicing and regulation) in An. gambiae are not well understood. The only well-characterised model of dsx splicing comes from the sex determination pathway of the fly Drosophila melanogaster (19). The dsx gene encodes a transcription factor targeting several genes, most of which have sex- and tissue-specific functions in the fly (20,21). Transformer (TRA) and Transformer 2 (TRA2) are the key regulatory factors of the female-specific alternative splicing of dsx pre-mRNA, while the absence of TRA leads to male-specific splicing (20). Both TRA and TRA2 are downstream targets of the Sex lethal gene (Sxl) product (19). However, the An. gambiae dsx gene (Agdsx) has a different structure, suggesting that its sex-specific splicing is driven by a mechanism different from that of D. melanogaster dsx (17,22).
In mammalian cells, genetic variations such as single nucleotide polymorphisms (SNPs) within the donor and acceptor splice sites can influence splicing and may lead to changes in the normal splicing pattern (23)(24)(25). Donor sites (5' splice sites) are defined by a GT dinucleotide at the 5' end of the intron, at the exon-intron border, while acceptor sites (3' splice sites) are defined by an AG dinucleotide at the 3' end of the intron, at the intron-exon border (26). Thus, we hypothesize that SNPs occurring in the Agdsx acceptor and donor splice sites might result in splice variation. With this in mind, the current report seeks to screen Agdsx for single-nucleotide polymorphisms (SNPs) that could be associated with alternative splicing.
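As a minimal illustration of the GT/AG definitions above, the following sketch checks whether an intron carries the canonical boundary dinucleotides. The function name and the toy sequences are hypothetical and are not taken from the Agdsx locus.

```python
def has_canonical_splice_sites(intron: str) -> dict:
    """Check the boundary dinucleotides of an intron given 5'->3' on the coding strand.

    Returns whether the intron starts with GT (donor) and ends with AG (acceptor).
    """
    seq = intron.upper()
    return {
        "donor_gt": seq.startswith("GT"),
        "acceptor_ag": seq.endswith("AG"),
    }


# Toy introns (hypothetical sequences, for illustration only).
canonical = has_canonical_splice_sites("gtaagtttcccttttcag")
# Same intron with a T>C substitution at the second intronic position.
mutated = has_canonical_splice_sites("gcaagtttcccttttcag")
```

In this toy case, a single substitution in the donor dinucleotide is enough to make the site non-canonical, which is the kind of change the screening looks for.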

Results
Identification of An. gambiae dsx gene (Agdsx) donor and acceptor splice site sequences.
Agdsx is located in band 17C of chromosome 2R (2R: 48703664-48788460) on the reverse strand. The gene is 84.8 kb long and encodes male- and female-specific transcripts. The male transcript (6975 bp) is shorter than the female transcript (8667 bp). The difference between the two sex-specific transcripts is due to the alternative splicing of exon 5. This exon is a cassette exon, which is retained in the female transcript and skipped in the male transcript. The whole sequence of the female-specific exon 5 lies within the male intron 4 region and is spliced out. This gene structure causes a shift in intron/exon numbering in males. Thus, males and females have both common and sex-specific splice sites.
Male and female mosquitoes share the donor splice sites of exons 1, 2, 3, 4 and 6, while the exon 5 donor site is specific to females (Table 2). Similarly, both sexes share the intron 1, 2, 3 and 6 acceptor sites. Male intron 4 and female intron 5 share the same 3' end, as the female exon 5 is included in the male intron 4 sequence. However, females have a specific intron 4 acceptor site (Table 2). Splice site sequences are given in the 5'→3' direction on the reverse strand. Exonic coding sequences are shown in uppercase letters, and non-coding regions are in lowercase letters. The 12 bp preceding the 3' splice-acceptor site (NYag) are indicated, where Y = T or C and N = any nucleotide.
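Because Agdsx lies on the reverse strand, a forward-strand genomic fragment has to be reverse-complemented before it can be read 5'→3' as in Table 2. The sketch below shows one way to do this and to apply the lowercase (intronic) / uppercase (exonic) convention. The function names, the assumption that the fragment ends with 6 exonic bases in transcript orientation, and the example sequence are all hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, in upper case."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def acceptor_site_from_forward(forward_seq: str, exon_nt: int = 6) -> str:
    """Format an acceptor site of a reverse-strand gene in the 5'->3' direction.

    forward_seq is the forward-strand fragment spanning the junction. After
    reverse-complementing, the last exon_nt bases are assumed to be exonic
    (upper case) and everything before them intronic (lower case).
    """
    transcript = reverse_complement(forward_seq)
    intron, exon = transcript[:-exon_nt], transcript[-exon_nt:]
    return intron.lower() + exon


# Hypothetical forward-strand fragment (not the real Agdsx sequence).
example = acceptor_site_from_forward("GTTCACCTGTAATGGAAGAGAA")
```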
SNPs in female-specific intron 4 acceptor and exon 5 donor splice sites.
Along the Agdsx gene, 17196 polymorphic sites were identified. Nucleotide diversity was similar between male and female mosquitoes (Fig. 1). The potential splice-relevant SNPs that could trigger skipping of the female-specific exon 5 should lie in the intron 4 acceptor and exon 5 donor sites. However, there was no SNP in the acceptor sequence of the female-specific intron 4 or in the corresponding male region (Fig. 2). In the female-specific exon 5 donor site, two SNPs (rs48712947, rs48712962) were found. Nevertheless, they were not specific to females, as rs48712947 was found in female mosquitoes from Cameroon and in both males and females from Burkina Faso (Fig. 3). rs48712962 was absent from male mosquitoes and was found only in females from Cameroon. The minor allele frequencies (MAF) of both SNPs were very low in each population. The MAF of rs48712947 and rs48712962 was less than 1% in each female population, and only 2% of Burkina Faso males carried rs48712947. Moreover, neither of these SNPs was associated with the sex phenotype (rs48712947: p = 0.32; rs48712962: p = 0.68).
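For readers unfamiliar with the quantity, the minor allele frequency at a biallelic site can be computed from diploid genotype calls as sketched below. This is an illustrative calculation only, not the pipeline used in this study; the function name and the toy genotypes are hypothetical.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Compute the MAF at one biallelic site.

    genotypes: iterable of (allele, allele) pairs for diploid individuals,
    with alleles coded 0 (reference) or 1 (alternate); None marks a missing call.
    """
    counts = Counter(a for gt in genotypes for a in gt if a is not None)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no called alleles at this site")
    alt_freq = counts.get(1, 0) / total
    return min(alt_freq, 1.0 - alt_freq)


# Toy population (hypothetical): 49 homozygous-reference individuals
# and a single heterozygote, i.e. 1 alternate allele out of 100.
toy_genotypes = [(0, 0)] * 49 + [(0, 1)]
maf = minor_allele_frequency(toy_genotypes)
```

With one alternate allele among 100 called alleles, the MAF is 0.01, the order of magnitude reported above for the exon 5 donor site SNPs.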
SNPs in other splice sites of Agdsx.
The other splice sites were also examined to identify sex-specific SNPs (see additional Figs. 1-3). No SNP was found in the shared exon 1 donor site or in the intron 1 and intron 6 acceptor sites. The highest number of SNPs (seven) was found in the common intron 3 acceptor site sequence (rs48715291, rs48715294, rs48715302, rs48715306, rs48715307, rs48715308, rs48715309) (Fig. 4). However, each of these SNPs occurred in a non-specific manner in both male and female populations, with variable minor allele frequencies.

Discussion
The An. gambiae doublesex (Agdsx) gene is a gene of interest for the sterile insect technique (SIT), as a candidate for genetic modification (15,16). The successful translation of dsx into SIT methodology requires a clearer understanding of the genetic basis of the sex determination pathway. This study screened the Agdsx donor and acceptor splice sites to identify splice-relevant SNPs.
According to the D. melanogaster model (19), the alternative splicing of the Agdsx gene is governed by exon 5 skipping in male mosquitoes (17), suggesting that recognition of the female-specific splice sites (intron 4 acceptor and exon 5 donor sites) by the splicing machinery is silenced in males. Such silencing could be due to changes in the splice site sequences. However, the female-specific intron 4 acceptor site sequence is present within male intron 4, and no SNP was found in this sequence in either sex. The SNPs rs48712947 and rs48712962 identified in the female-specific exon 5 donor site were neither splice-relevant nor sex-specific. They appeared in only two (Burkina Faso and Cameroon) of the eight mosquito populations considered. In each population where these SNPs were identified, they appeared in very few individuals: less than 1% of females and no more than 2% of males. These observations suggest that skipping of the Agdsx cassette exon 5 is not associated with SNP-induced changes in splice site patterns. SNPs in the other splice sites also showed variable distributions and were not specific to the sex of the mosquitoes.
Another factor in exon skipping is the pyrimidine content of the polypyrimidine tract in the acceptor splice sequence. Indeed, a poor polypyrimidine tract causes the splicing machinery to shift to the next acceptor site, leading to exon skipping, as in the case of exon 4 skipping in male Drosophila (19). In Anopheles gambiae, the number of pyrimidines (8) in the 12 bp preceding the acceptor site pattern (acag) (Table 2) in the female-specific intron 4 is the same as in the corresponding male region. The same number of pyrimidines in this acceptor sequence was reported by Scali et al. (17). Furthermore, the authors found that this number did not differ from the consensus number of pyrimidines (8.69) in An. gambiae splice acceptor sites, and concluded that the intron 4 site may not be a weak acceptor site (17). Drosophila dsx splicing regulation involves the products of the transformer (Tra) and transformer 2 (Tra2) genes (19). Although an ortholog of Drosophila Tra2 is present in An. gambiae (accession number: AGAP006798), no ortholog of Tra was found in the An. gambiae genome (www.flybase.org). This suggests that Tra may be missing in An. gambiae or that its splicing regulatory function is ensured by another gene. Overall, these findings add further evidence that other mechanisms underlie alternative splicing in An. gambiae and open perspectives for further investigation of the molecular mechanisms of Agdsx splicing.
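The pyrimidine count discussed above can be reproduced with a short helper such as the one below. This is a sketch under stated assumptions: the input is the intronic acceptor sequence written 5'→3' and ending with the 4-nt NYAG pattern, and the example sequence is a hypothetical one chosen to contain 8 pyrimidines, not the Agdsx intron 4 sequence from Table 2.

```python
def pyrimidine_count(acceptor_seq: str, motif_len: int = 4, window: int = 12) -> int:
    """Count pyrimidines (C or T) in the window preceding the acceptor motif.

    acceptor_seq: intronic sequence, 5'->3', whose last motif_len bases are
    the NYAG pattern. The window bases immediately upstream are scanned.
    """
    seq = acceptor_seq.lower()
    if len(seq) < window + motif_len:
        raise ValueError("sequence shorter than window plus motif")
    upstream = seq[-(window + motif_len):-motif_len]
    return sum(base in "ct" for base in upstream)


# Hypothetical acceptor sequence: 12 bp tract followed by 'acag'.
n_pyrimidines = pyrimidine_count("ttctcgaacattacag")
```

A count close to the consensus value for the species would argue against a weak polypyrimidine tract, which is the reasoning applied to the intron 4 acceptor site above.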
It is known that the regulation of alternative splicing involves trans-acting splicing factors, such as serine-arginine-rich (SR) family proteins and heterogeneous nuclear ribonucleoproteins (hnRNPs), that bind to auxiliary enhancer and silencer cis-elements (ESE: exonic splicing enhancers; ESS: exonic splicing silencers; ISE: intronic splicing enhancers; ISS: intronic splicing silencers) (27)(28)(29). Similar regulatory cis-elements were found in the Drosophila melanogaster female-specific exon, and putative homologs were identified in the An. gambiae female-specific exon 5 (17). Therefore, further molecular analyses are needed to characterize these regulatory sequences and their binding trans-factors in order to clarify somatic sex determination in An. gambiae.
Moreover, epigenetic mechanisms have also been reported to regulate alternative splicing in mammalian and insect cells. Indeed, it was demonstrated that changes in DNA cytosine methylation on the gene body in honey bees may lead to alternative splicing (30)(31)(32). Histone post-translational modifications (PTMs) such as lysine acetylation and methylation have also been associated with alternative splicing events (33)(34)(35). Consequently, similar mechanisms could regulate alternative splicing in the malaria vector An. gambiae. However, no significant DNA methylation has been reported in Diptera, including An. gambiae (36,37). The only epigenetic modifications that could be linked to alternative splicing in this species are therefore histone PTMs. Indeed, the methylation and acetylation of lysines 4, 9 and 29 of histone H3 were reported in An. gambiae (38,39). It will therefore be interesting to evaluate whether differential enrichment of such histone modifications at Agdsx between male and female mosquitoes could be critical for dsx alternative splicing.

Methods
... whole genome and that pass the variant filtering process described by (22). Only Anopheles gambiae samples were considered in our study. These mosquito samples were collected from natural populations from 2002 to 2012 in eight African countries (Table 1). The reference sequence of Agdsx (AGAP004050) was also downloaded from the VectorBase website.
... (41). The genomic position of the acceptor sites was used to select SNPs in the last 12 nucleotides of an intron preceding the 3' splice pattern NYAG and in the first 6 nucleotides of an exon. In donor splice sites, SNPs were identified within the last 6 nucleotides of an exon and the first 16 nucleotides of an intron. The association of SNPs with the sex phenotype (male or female) was evaluated by running an association analysis using the general linear model (GLM) function in TASSEL.
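The window-based SNP selection described above can be sketched as follows for a gene on either strand. This is an illustration, not the authors' script. It assumes 1-based inclusive exon coordinates, and it assumes that the acceptor window covers the 12 intronic nucleotides plus the 4-nt NYAG pattern (16 intronic positions in total). The function names, exon coordinates and SNP positions are hypothetical.

```python
def acceptor_window(exon_start, exon_end, strand, intron_nt=16, exon_nt=6):
    """Genomic window around the 5' end of an exon (acceptor side)."""
    if strand == "+":
        return exon_start - intron_nt, exon_start + exon_nt - 1
    # On the reverse strand, the exon's 5' end is its highest coordinate.
    return exon_end - exon_nt + 1, exon_end + intron_nt


def donor_window(exon_start, exon_end, strand, exon_nt=6, intron_nt=16):
    """Genomic window around the 3' end of an exon (donor side)."""
    if strand == "+":
        return exon_end - exon_nt + 1, exon_end + intron_nt
    # On the reverse strand, the exon's 3' end is its lowest coordinate.
    return exon_start - intron_nt, exon_start + exon_nt - 1


def snps_in_window(positions, window):
    """Keep the SNP positions falling inside an inclusive (low, high) window."""
    low, high = window
    return [p for p in positions if low <= p <= high]


# Hypothetical reverse-strand exon and SNP positions (not Agdsx coordinates).
exon = (1000, 1100)
snp_positions = [980, 984, 1005, 1006, 1095, 1116, 1117]
acceptor_hits = snps_in_window(snp_positions, acceptor_window(*exon, strand="-"))
donor_hits = snps_in_window(snp_positions, donor_window(*exon, strand="-"))
```

Keeping the strand logic in the window functions means the same SNP filter can be reused unchanged for every splice site of a reverse-strand gene such as Agdsx.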
The average nucleotide diversity at the dsx locus in males and females was calculated using scikit-allel version 1.2.1 (42) in order to determine whether SNP density at the dsx locus differs between the two sexes.

Availability of data and materials

Competing interests
The authors declare that they have no competing interests.

Funding