Identification of cotton Dicer-like gene family and chromosomal localization
Using Arabidopsis and rice DCLs as query sequences, we searched the cotton genome database for Dicer-like genes in two diploid cottons, G. raimondii and G. arboreum, and in the allotetraploid G. hirsutum acc. TM-1 [46–48]. Based on these BLAST searches, we identified 6 genes encoding DCL proteins in G. raimondii, 7 in G. arboreum and 11 in the allotetraploid G. hirsutum (Table 1).
DCL genes were named according to their closest orthologs in A. thaliana. All four A. thaliana DCL genes (AtDCL1, AtDCL2, AtDCL3, and AtDCL4) have orthologs in cotton. AtDCL1 has only one ortholog in G. raimondii and in G. arboreum, and two in G. hirsutum. AtDCL2 has 2 orthologs in G. raimondii, 3 in G. arboreum, and 4 in G. hirsutum. AtDCL3 has 2 orthologs in each of the two diploid genomes and 3 in G. hirsutum, so all three cotton species carry a duplicated DCL3. This duplication could be relevant for cotton evolution, as it is present in the three species studied here; however, we cannot exclude the possibility that it is neutral. AtDCL4 has 2 orthologs in tetraploid cotton, but only 1 ortholog in each diploid cotton. Distinct cotton DCL paralogs were designated a and b, according to their order on the homologous chromosomes.
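As a quick consistency check, the per-type ortholog counts given above can be summed and compared with the totals reported for each species. The sketch below only restates numbers from the text; the dictionary layout and variable names are illustrative.

```python
# Ortholog counts per A. thaliana DCL type, transcribed from the text above.
orthologs = {
    "G. raimondii": {"DCL1": 1, "DCL2": 2, "DCL3": 2, "DCL4": 1},
    "G. arboreum": {"DCL1": 1, "DCL2": 3, "DCL3": 2, "DCL4": 1},
    "G. hirsutum": {"DCL1": 2, "DCL2": 4, "DCL3": 3, "DCL4": 2},
}

# Total DCL genes per species as reported in the text (Table 1).
reported_totals = {"G. raimondii": 6, "G. arboreum": 7, "G. hirsutum": 11}

# Sum the per-type counts for each species.
totals = {species: sum(counts.values()) for species, counts in orthologs.items()}

for species, total in totals.items():
    status = "matches" if total == reported_totals[species] else "differs from"
    print(f"{species}: {total} DCL genes ({status} the reported total)")
```

The per-type counts add up to 6, 7 and 11 genes, in agreement with the totals stated for the three species.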
Detailed information for these genes is listed in Table 1, including chromosome location, ORF and protein lengths, molecular weight and theoretical isoelectric point. The newly identified Dicer-like loci showed coding potential for polypeptides of 1209 to 2009 amino acids, with predicted molecular weights (MW) of 132.99 to 220.99 kDa, respectively. A very small putative DCL polypeptide was identified in G. arboreum for DCL2a (Cotton_A_34031), which was 770 amino acids in length (84.7 kDa). It represents an extra DCL2a, which is present only in the G. arboreum genome.
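The molecular weights quoted above agree with a common rule of thumb of roughly 110 Da per amino acid residue. The short sketch below illustrates that arithmetic; the 110 Da average is an assumption used here for illustration and is not stated in the text as the method behind Table 1.

```python
# Assumption: average residue mass of ~110 Da (a common approximation,
# not a method stated in the source).
AVG_RESIDUE_DA = 110.0

def approx_mw_kda(n_residues: int) -> float:
    """Approximate protein molecular weight in kDa from its length."""
    return n_residues * AVG_RESIDUE_DA / 1000.0

# Protein lengths (aa) and molecular weights (kDa) quoted in the text.
reported = {1209: 132.99, 2009: 220.99, 770: 84.7}

for length, mw in reported.items():
    estimate = approx_mw_kda(length)
    print(f"{length} aa: estimated {estimate:.2f} kDa, reported {mw} kDa")
```

A sequence-specific calculation that sums the individual residue masses would normally give slightly different values for each protein.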
The physical locations of the DCL genes of the three cotton species are shown in Figure 1. A total of eleven GhDCL genes were distributed on 10 G. hirsutum chromosomes. Nine chromosomes (A01, A04, A05, A06, A07, D01, D06, D07, and D13) each contained a single GhDCL, whereas chromosome D05 contained both GhDCL2a and GhDCL4. Notably, almost all the GhDCLs mapped to chromosomes inherited from the corresponding parental-related species. G. raimondii presented 6 DCL genes distributed on 5 chromosomes, and G. arboreum had 7 DCLs distributed on 5 chromosomes and on a scaffold region (scaffold3086) that had not yet been incorporated into the physical map of the chromosomes. G. arboreum presented two DCL2a genes located in very close proximity on chromosome 10, separated by approximately 9 kb. These two DCL2as (named herein GaDCL2a31 and GaDCL2a32) were highly similar, sharing 93.4% identity at the amino acid level. The predicted GaDCL2a32 was shorter than its orthologs and seemed to have lost nucleotides/amino acids at the 5’ extremity/N-terminus.
The intron/exon (I/E) distribution and the intron numbers are shown in Table 1 and Figure 2. GaDCL1 (Cotton_A_14097) showed the longest ORF, 6027 bp, with coding potential for a polypeptide of 2009 amino acids. The maximum number of introns, 25, was found in DCL3a and DCL3b of G. raimondii. DCL4 showed a very similar intron/exon distribution in all three species. The same was observed for DCL3a of G. arboreum, G. raimondii, and G. hirsutum subgenome D. GhDCL3a from subgenome A, however, showed a distinct pattern at the 5’ end of the gene and seemed to have lost some of the nucleotides from this region. A very similar distribution of introns and exons was also observed for DCL2b of G. raimondii and of G. hirsutum subgenomes A and D. GaDCL2b, on the other hand, showed a very different I/E distribution.
Cotton DCL phylogenetic and domain composition analysis
The DCL amino acid sequences of the three cotton species (G. raimondii, G. arboreum, and G. hirsutum) were used to construct a neighbor-joining (N-J) phylogenetic tree using MEGA 7.0 software (Figure 2). These analyses indicated that the DCL genes clustered in two separate clades, one composed of DCL1, 2, and 4 and the other of DCL3s. The two clades were further divided into many subclades that clearly grouped the DCLs from the parental/ancestral diploid species with each corresponding subgenome in the allotetraploid cotton. The extra DCL3, named herein DCL3b, was present in all three species, but the allotetraploid cotton presented only one copy of this gene in its genome, which showed the closest relationship to GrDCL3b. We hypothesize that the DCL3b copy acquired from G. arboreum was lost during G. hirsutum evolution. Alternatively, DCL3b may have been acquired independently by the three species after allotetraploid hybridization. The two DCL2a genes from G. arboreum grouped together with GhDCL2a from subgenome A, suggesting that they probably duplicated after tetraploid hybridization. The constructed tree suggested a high level of sequence conservation among the DCLs of the diploid species and the allotetraploid G. hirsutum during evolution.
SMART was used to identify the domains of all Dicer-like proteins from the three species (Figure 2). All DCLs contained DEAD, helicase-C, DUF283 and PAZ domains, except GaDCL2a31 and GaDCL2b, which lacked both the DEAD and helicase-C domains. In fact, GaDCL2a31 was a truncated DCL, as it also lacked the DUF283 domain. Two RNase III domains were present in all DCLs, and at least one DSRM domain was present in 15 out of the 22 cotton DCL proteins. All 7 cotton DCL3s had a dsRB domain, which is characteristic of plant DCL3 proteins. Interestingly, GhDCL1 from subgenome D seemed to have lost a DSRM domain, while DCL1 of subgenome A maintained 2 DSRB domains.
A comparison of cotton DCL proteins with those from Arabidopsis, rice, poplar, grapevine and Medicago sp. revealed that cotton DCL1s shared a common ancestor with grapevine and poplar DCL1s (Figure 3). DCL2a and 2b from cotton, however, were more closely related to poplar DCL2s. Of all the species analyzed, grapevine and poplar DCL3s showed the closest relationship to cotton DCL3s. DCL3 duplication in rice seemed to have occurred long before the cotton DCL3 duplication, but after the dicot/monocot divergence. The duplication of DCL3 in cotton probably occurred before cotton allotetraploid hybridization and therefore more than 1.2 MYA. DCL4 seemed to be the most conserved DCL among the eudicots analyzed herein.
DCL gene expression profiles in different G. hirsutum organs
To evaluate whether the DCLs identified in silico in cotton were able to generate transcripts, we collected root, leaf, stem and flower samples from greenhouse-grown plants of two commercial G. hirsutum cultivars (cv.), Fibermax 966 (FM966) and Delta Opal (DO), at 60 days post germination (60 dpg) and analyzed gene expression by quantitative real-time PCR (RT-qPCR). These two cultivars were selected because they show contrasting responses to cotton blue disease (CBD), an important viral disease of cotton with worldwide distribution. Fibermax is susceptible to CBD, while Delta Opal is resistant.
As shown in Figure 4, transcripts of all 6 DCL types were identified in G. hirsutum plants. DCL1, DCL2a, DCL3a, and DCL4 were expressed in all analyzed tissues of both FM and DO. DCL2b was not detected in leaves of DO or in flowers of FM, while DCL3b was almost undetectable in stems of FM and in leaves of both cultivars. DCL4, which is essential for intracellular antiviral silencing, was expressed at the same levels in FM and DO plants, as was DCL2a. However, DCL2b showed slightly contrasting basal expression levels between the two cultivars. The extra DCL3 identified in cotton (DCL3b) seemed to be important, especially in flower and root tissues of healthy plants.
DCL gene expression is modulated in response to herbivore attack and virus infection
Plant DCLs initiate the RNAi innate defense system against invading viruses because they recognize and process incoming viral and transposon nucleic acids into siRNAs of 21, 22, and 24 nt. Thus, we were interested in shedding some light on how cotton DCL expression is modulated during RNA virus infection, using contrasting virus-resistant and virus-susceptible cotton cultivars: Delta Opal and FM966, respectively. Cotton leafroll dwarf virus (CLRDV) (genus Polerovirus; family Luteoviridae), which is transmitted only by an aphid vector (Aphis gossypii), is the causal agent of cotton blue disease. CLRDV is phloem-restricted, and its genome consists of a single-stranded, positive-sense, non-polyadenylated RNA (5.8 kb) containing six open reading frames (ORFs).
As CLRDV is transmitted only by its aphid vector, it is also important to understand the contribution of the aphid vector to DCL modulation. Consequently, we evaluated cotton DCL expression patterns during herbivore attack by Aphis gossypii and/or CLRDV infection.
To analyze the influence of herbivore attack on DCL expression, 30 dag cotton plants were inoculated in the greenhouse with virus-free aphids. Aphids were restricted to one basal leaf (the inoculated leaf) per plant and, twenty-four hours after inoculation, were eliminated by insecticide application. The expression levels of all cotton DCLs were evaluated in young systemic leaves (3–4 leaves above the inoculated leaves) at 24 hpi and at 5, 15, and 25 dpi after contact with the aphids (Figure 5). For CLRDV infection, a similar approach was applied using viruliferous aphids harboring CLRDV.
In general, all DCLs were induced in the virus-resistant DO plants after aphid contact (Figure 5). When these plants were exposed to the aphid and the virus simultaneously, through contact with viruliferous aphids, DO DCL levels instead showed a repression pattern. In the virus-susceptible cultivar, the inverse was observed (Figure 6).
Interestingly, a systemic modulation of all DCL mRNAs was already observed within the first 24 hpi in both cotton cultivars, meaning that aphid feeding on inoculated leaves induces a systemic response. This systemic response was more pronounced in FM plants than in DO plants. At 24 hpi, FM plants showed strong downregulation of the three cotton DCLs involved in virus defense, DCL2a, 2b and 4, with reductions of approximately 5-, 10- and 15-fold, respectively. DO plants, in contrast, showed a systemic upregulation of these DCLs. This contrasting modulation of DCLs by aphid feeding may make the CBD-susceptible FM cultivar more vulnerable to virus infection, as less DCL2 and DCL4 accumulates in the tissues that the virus is about to invade. In DO plants, by contrast, a virus attempting to spread from the local infection site faces strong antiviral silencing that is pre-activated in systemic tissues, which present 3- to 5-fold higher levels of DCL2a, 2b and 4 than healthy plant tissues. This prior systemic accumulation of DCLs, induced distally by aphid feeding, may help these plants block the establishment of the virus infection cycle in new cells far from the local infection site.
DCL1 mRNA expression was also modulated, being induced in systemic leaves of the DO cultivar at 24 hpi and 5 days after aphid contact. At 15 and 25 dpi, DCL1 expression was reduced in both FM and DO in comparison to healthy control plants (Figure 5).
DCL3a expression was induced by aphid feeding in both DO and FM at most of the time points analyzed; the exceptions were FM at 24 hpi and DO at 25 dpi, when this DCL was downregulated. In contrast, the extra cotton DCL3 (DCL3b) showed strong repression from 24 hpi to 5 dpi in both cultivars.
When DO and FM plants were infected using viruliferous aphids (Figure 6), cotton DCLs showed an expression pattern in systemic leaves distinct from that of plants inoculated with virus-free aphids, showing that the presence of the virus also modulates DCL expression (Figure 6 and Figure S2 - Additional file 2). The fold change analysis between CLRDV-infected FM plants and mock FM plants (virus-free aphid inoculum) showed that the presence of the virus induced an additional systemic downregulation of almost all DCLs during the initial stage of virus infection (24 hpi), with the exception of DCL2a. Five days later, however, the levels of all FM DCLs had markedly increased, with fold changes of almost 60, 10, 47, 17, 28 and 28 for DCL1, DCL2a, DCL2b, DCL3a, DCL3b, and DCL4, respectively. In contrast, DCL transcript levels in DO plants were only slightly induced or repressed at all time points, remaining very similar to those of uninfected mock plants. Despite the strong DCL modulation observed in the susceptible FM plants, CLRDV could be easily detected in systemic leaves from 24 hpi onward, showing that even with such extensive efforts of the antiviral machinery, the virus was replicating and spreading throughout the plant (Figure 6B). This strong modulation of DCL expression was not observed in DO plants, in which DCLs were only slightly induced or reduced up to 25 dpi, and the virus was not detected in any systemic leaves between 24 hpi and 25 dpi (Figure 6B). The absence of virus accumulation in DO plants may explain why DO DCLs were barely induced in these leaves.
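The fold changes reported above compare infected plants with mock plants. One common way to derive such values from RT-qPCR data is the 2^-ΔΔCt calculation, sketched below. This section does not state which quantification method was used, so the method itself, the function names and all Ct values here are assumptions chosen for illustration only.

```python
def fold_change_ddct(ct_target_treated: float, ct_ref_treated: float,
                     ct_target_control: float, ct_ref_control: float) -> float:
    """Relative expression (treated vs. control) by the 2^-ddCt calculation."""
    delta_treated = ct_target_treated - ct_ref_treated   # normalize to reference gene
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_treated - delta_control)

def signed_fold(fold_change: float) -> float:
    """Report downregulation as a negative fold (e.g. 0.25 becomes -4.0)."""
    return fold_change if fold_change >= 1.0 else -1.0 / fold_change

# Hypothetical Ct values (not data from this study).
induced = fold_change_ddct(22.0, 18.0, 25.0, 18.0)    # ddCt = -3
repressed = fold_change_ddct(27.0, 18.0, 25.0, 18.0)  # ddCt = +2

print(f"induced: {induced:.2f}-fold, repressed: {signed_fold(repressed):.2f}-fold")
```

Under this convention, a "5-fold reduction" corresponds to a relative expression of 0.2 compared with the control.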
Deep sequencing of viral small RNAs from CLRDV-infected FM plants, previously performed by our group on systemic leaves at this same time point, showed that the most abundant viral siRNAs were 22 nt long, highlighting the importance of DCL2 in combating the spread and/or replication of the virus in aerial parts far from the inoculated leaves. These efforts, however, are insufficient to block virus infection. Both DCL2s probably participate in the processing/dicing of viral dsRNA to generate these secondary viral siRNAs of 22 nt. However, as DCL2b levels are more than 4 times higher than DCL2a levels in systemically infected leaf cells at 5 dpi, we can hypothesize that DCL2b is the more important for 22-nt viral siRNA generation.
Modulation of DCL2 and DCL4 expression by virus/aphid infection at local infection sites
It has already been shown in Arabidopsis that DCL4 is an essential component of intracellular antiviral silencing, whereas both DCL4 and DCL2 are necessary for the inhibition of systemic infection. Our results indicated that DCL4 was barely modulated by CLRDV infection in aerial parts of virus-resistant cotton, while DCL2a and 2b were downregulated (Figure 6). In susceptible plants, however, DCL4 was strongly downregulated systemically at the beginning of infection. Thus, the next step was to examine how these DCLs were expressed at the infection site. Consequently, we collected samples from inoculated leaves 24 h after virus infection and analyzed DCL mRNA expression profiles by RT-qPCR (Figure 7).
Aphid contact induced DCL2a and DCL4 expression at similar levels in both FM and DO (Figure 7A). DCL2b, in turn, was induced 6 times more in DO than in FM plants. The local response of DCL4 to aphid attack seems to be independent of the resistance/susceptibility phenotype, as both cultivars showed similar levels of induction of this DCL in the inoculated leaves.
The presence of the virus, in contrast, induced stronger modulation of almost all DCLs. At virus infection sites, DCL2a, DCL2b, and DCL3b were strongly induced (approximately 250-, 70-, and 100-fold, respectively) in the resistant cultivar at 24 hpi (Figure 7B). The strong induction of DCL2a and 2b seemed to be relevant for virus resistance, because systemic spread of the virus was completely inhibited in DO plants (Figure 6B). Virus replication was inhibited even in the virus-inoculated leaves, as shown by high-sensitivity nested RT-PCR for CLRDV detection (Figure 7C). The susceptible cultivar FM did not respond to the presence of the virus in the same way as DO, as only a very slight induction of DCL2a and 2b was observed at 24 hpi in inoculated leaves. In these plants, virus accumulation was observed in systemic and inoculated leaves from 24 hpi (Figures 6B and 7C, respectively).
An induction of FM DCL4 6 times higher than that of DO DCL4 was also observed. This finding indicated that both susceptible and resistant plants produced more DCL4 to combat virus invasion; however, this overexpression alone did not seem to be sufficient to inhibit the spread of infection, since FM plants, which produce more DCL4 than DO plants, are completely susceptible. Thus, the two cotton DCL2s seem to be very important for preventing virus dissemination and accumulation at inoculation sites, while DCL4 may play a secondary role. However, we cannot conclude from these results that the strong induction of DCL2 is responsible for the CLRDV resistance phenotype of DO, as an unrelated resistance mechanism may also be acting. Curiously, DCL3a and 3b were highly induced in the FM cultivar (11- and 260-fold greater expression, respectively, than in mock-infected plants, and more than 2-fold greater than in DO plants), while only DCL3b was induced in the DO cultivar (approximately 100-fold greater expression than in the mock). These results suggested that DCL3b is important during the initial activation of virus defense.
Taken together, our results suggested that the contrasting CLRDV susceptibility phenotypes of FM and DO plants might be related to their distinct DCL modulation, mediated by both aphid feeding and virus infection.
Corroborating the importance of cotton DCL2 in polerovirus infection, a profile of the vsiRNAs produced by DCL dicing was obtained by deep sequencing for cotton plants infected with CLRDV and with another polerovirus. As observed in Figure 7D, in plants infected with either CLRDV or Cotton anthocyanosis virus (CAV), 22-nt vsiRNAs accumulated to higher levels than 21-nt vsiRNAs, showing the relevant role of cotton DCL2s in virus defense.
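The size-class comparison described above can be illustrated with a short sketch that tallies small RNA reads by length. The reads below are toy sequences, not data from this study, and the mapping of 21-nt and 24-nt classes to DCL4 and DCL3 follows the general Arabidopsis model and is an assumption here; the text itself links only the 22-nt class to DCL2.

```python
from collections import Counter

# Size classes of siRNAs (nt) and the Dicer-like protein usually associated
# with each class; the DCL4 and DCL3 assignments are assumptions.
SIZE_CLASS_TO_DCL = {21: "DCL4", 22: "DCL2", 24: "DCL3"}

def size_profile(reads):
    """Return the fraction of reads in each siRNA size class."""
    counts = Counter(len(read) for read in reads if len(read) in SIZE_CLASS_TO_DCL)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {size: counts.get(size, 0) / total for size in SIZE_CLASS_TO_DCL}

# Toy reads (hypothetical): three 21-nt, five 22-nt, two 24-nt, one 19-nt.
toy_reads = ["A" * 21] * 3 + ["C" * 22] * 5 + ["G" * 24] * 2 + ["U" * 19]

profile = size_profile(toy_reads)
dominant = max(profile, key=profile.get)
print(f"dominant class: {dominant} nt ({SIZE_CLASS_TO_DCL[dominant]})")
```

In a real analysis, only reads that map to the viral genome would be counted, and the counts would typically be normalized to library size.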
Biotic and abiotic stress-responsive cis-acting regulatory elements are present in the promoters of upland cotton DCL genes
To understand how cotton DCLs are modulated by both virus infection and herbivore attack, promoter sequences comprising 1.5 kb upstream of the translation start of all G. hirsutum DCL genes were obtained from the cotton genome project. Transcription-responsive cis-elements in the DCL gene promoters were analyzed using PlantCARE. Analysis of the promoter regions of all 11 upland cotton DCL genes revealed the presence of various biotic and abiotic stress-responsive cis-acting regulatory elements, including the TCA-element, ERE and ABRE. Light stress-responsive elements, specifically Box1, Box4 and the GT1-motif, were the most abundant in the promoters of the upland cotton DCL genes (Figure 8 and Additional file 3 Figure S3), indicating that all DCL proteins might have an important functional role in light stress responses. All DCL promoters displayed development-related cis-elements, especially HD-Zip1 and 2, and almost all displayed elements responsive to drought (especially MBS), salicylic acid (TCA) and other biotic stresses (such as WUN, box-W1, ELI-box3, TC-rich repeats, Box S, JERE, and GCC boxes), revealing possible roles for almost all DCLs in drought tolerance and biotic stress responses in the upland cotton G. hirsutum. Surprisingly, GhDCL4A did not show typical biotic stress or ethylene-responsive cis-elements, although they were present in GhDCL4D. In general, there were significant differences in the average proportions of the promoter elements detected among the different DCL gene families, as well as between copies of the same DCL originating from distinct parental diploid cottons (Figure 8). However, abiotic stress elements presented the highest average proportions in all DCLs. Phytohormone-responsive elements, especially those associated with ethylene and gibberellin, were found in DCL1, DCL2a and 2b, DCL3b, and DCL4, while those related to Me-jasmonate were predominant in DCL2b, DCL3, and DCL4-A.
A large number of enhancer elements were found in all DCLs (Additional file 3 Figure S3), suggesting that all the DCLs from the two subgenomes were able to generate transcripts.
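The promoter analysis described above (1.5 kb upstream of the translation start, scanned for cis-elements) can be approximated by a simple motif search. The sketch below is not PlantCARE and does not reproduce its matching rules; the short core motifs are simplified stand-ins, the toy sequence is hypothetical, and the helper names are illustrative.

```python
import re

# Simplified core motifs (assumptions; PlantCARE entries are more specific).
MOTIFS = {
    "ABRE": "ACGTG",
    "W-box": "TTGAC",
    "GCC-box": "GCCGCC",
}

def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def upstream(chrom_seq: str, start_codon_pos: int, length: int = 1500) -> str:
    """Slice up to `length` bp before the ATG of a plus-strand gene (0-based)."""
    return chrom_seq[max(0, start_codon_pos - length):start_codon_pos]

def count_motifs(promoter: str) -> dict:
    """Count motif occurrences on both strands, allowing overlapping matches."""
    promoter = promoter.upper()
    counts = {}
    for name, core in MOTIFS.items():
        forward = len(re.findall(f"(?={core})", promoter))
        rc = revcomp(core)
        # Skip the reverse strand for palindromic motifs to avoid double counting.
        reverse = 0 if rc == core else len(re.findall(f"(?={rc})", promoter))
        counts[name] = forward + reverse
    return counts

# Toy sequence (hypothetical): one ABRE core, one W-box on the reverse strand.
toy_chrom = "GG" + "ACGTG" + "AAA" + "GTCAA" + "TT" + "ATGGCC"
promoter = upstream(toy_chrom, toy_chrom.index("ATG"))
print(count_motifs(promoter))
```

For genes on the minus strand, the downstream region would be sliced and reverse-complemented before scanning.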