Domain structures of characterized Dicers and their distribution across the taxonomy
Few Dicers have been well characterized in the fungal kingdom. The model fungi Neurospora crassa and Magnaporthe oryzae, for instance, each encode two redundant Dicers with the typical structure of two separate RNase III domains and at least one double-stranded RNA-binding domain (dsRBD) [42, 43]; these are referred to above as “canonical” Dicers. Despite the presence of predicted Argonaute and/or RNA-dependent RNA polymerase (RDRP) genes, a number of species such as Saccharomyces castellii, Candida albicans, and Kluyveromyces polysporus lack canonical Dicers and instead retain a single non-canonical Dicer for dsRNA processing. These non-canonical Dicers share an identical domain profile of one RNase III domain and two dsRBDs.
To survey the RNase III family, including canonical and non-canonical Dicers, and the relationships among its members, a total of 427 RNase III domain-containing protein sequences were retrieved from the proteomes of 83 species covering 12 bacteria, three Oomycetes, and 68 fungi. Canonical Dicers, non-canonical Dicers and non-canonical Dicer-like proteins (NCDLs) shared essential domains but varied in their number and configuration, resulting in differentiated functions. All proteins were composed of different combinations of three domains: the RNase III domain (RIIID), an RBD preceding the RNase III domain that is normally found in canonical Dicers (Dicer dimerisation domain, IPR005034), and RBDs following the RNase III domain (IPR014720). Based on this observation, all RNase III-containing proteins were divided into eight types in three subgroups (Figure 1A): canonical Dicers (TYPE1 and 2), non-canonical Dicers (TYPE3) and non-canonical Dicer-like proteins (TYPE4 to 8). To allow discussion across kingdoms, additional types found exclusively in five model species of animals and plants were also included (TYPE9-10, Supplementary Figure S1).
Canonical Dicers were subdivided into two types depending on the presence of an additional dsRBD at the C-terminus (TYPE1 and TYPE2), and non-canonical Dicers were assigned as TYPE3. Interestingly, these eight types were distributed in a taxon-specific manner: Dicers and NCDLs occurred together in most species, and each phylum or subphylum had a relatively fixed combination of Dicer and NCDL types (Figure 1B).
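The subgroup assignment described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the pipeline used in this study: the `Protein` record, the `"RIIID"` placeholder label and the function name `subgroup` are hypothetical, and the rules encode only what the text states (canonical Dicers carry two RNase III domains plus the Dicer dimerisation domain; non-canonical Dicers carry one RNase III domain and two trailing dsRBDs; everything else is treated as an NCDL). The finer split into TYPE1 to TYPE8 is not attempted.

```python
from dataclasses import dataclass

# InterPro accessions quoted in the text.
DIMERISATION = "IPR005034"  # RBD preceding the RNase III domain
TRAILING_RBD = "IPR014720"  # RBD following the RNase III domain
RIIID = "RIIID"             # placeholder label for an RNase III domain (assumption)


@dataclass
class Protein:
    """Hypothetical record: domain labels ordered from N- to C-terminus."""
    name: str
    domains: list


def subgroup(protein: Protein) -> str:
    """Assign one of the three subgroups from domain counts alone."""
    n_riiid = protein.domains.count(RIIID)
    n_trailing = protein.domains.count(TRAILING_RBD)
    has_dimerisation = DIMERISATION in protein.domains

    if n_riiid == 0:
        raise ValueError(f"{protein.name} has no RNase III domain")
    # Two signatures of canonical Dicers: double RIIIDs and the dimerisation domain.
    if n_riiid == 2 and has_dimerisation:
        return "canonical Dicer"
    # Non-canonical Dicers: one RIIID followed by two dsRBDs.
    if n_riiid == 1 and not has_dimerisation and n_trailing == 2:
        return "non-canonical Dicer"
    # Everything else is treated as a non-canonical Dicer-like protein.
    return "NCDL"


if __name__ == "__main__":
    example = Protein("example", [RIIID, TRAILING_RBD, TRAILING_RBD])
    print(example.name, "->", subgroup(example))
```

Counting domains rather than matching exact architectures keeps the sketch tolerant of variable spacing between domains, at the cost of ignoring domain order.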
No canonical Dicers appeared in Bacteria, where the simplest forms, TYPE4 and TYPE8 proteins, processed all dsRNA. Canonical Dicers started to appear alongside NCDLs in lower eukaryotes. TYPE1 greatly outnumbered TYPE2 and was present in the majority of fungal species and in Oomycota, whereas TYPE2 appeared only in Pezizomycotina (Figure 1B). Canonical Dicers (TYPE1 and 2) were completely absent from Saccharomycotina, whereas Ustilaginomycotina and Microsporidia had only canonical Dicers. The co-occurrence of TYPE1 and TYPE2 Dicers was found only in Ascomycetes, mostly in Sordariomycetes (12/15), while some species of the other classes, Leotiomycetes (1/2), Dothideomycetes (2/4) and Eurotiomycetes (2/10), also had both types of canonical Dicers. It was not clear how TYPE2 differed from TYPE1 in function or why both needed to be retained, but TYPE2 was not simply a duplication of TYPE1. Although the two types of Dicers encoded by Neurospora crassa were functionally redundant, the Magnaporthe oryzae Dicers showed clear diversification in both function and structure [43, 44]. Three species had only TYPE2 Dicers: Verticillium albo-atrum VaMs.1021, Botrytis cinerea and Mycosphaerella graminicola. Budding yeasts seemed to have completely lost the RNAi pathway; instead, TYPE3 proteins were found solely in four budding yeast species, including the previously characterized non-canonical Dicers with validated RNA-processing function, ScaDcr1 in Saccharomyces castellii and CaDcr1 in Candida albicans SC5314. Two additional proteins with TYPE3 structure, from Pichia stipitis and Kluyveromyces polysporus, were also included in our data set. Furthermore, a non-canonical Dicer has also been reported in Entamoeba histolytica, indicating that such proteins are widespread.
Among the NCDLs, TYPE4 and TYPE8 were common in most species regardless of taxon. TYPE4 was found in all species except one bacterium, one Oomycete and seven fungal species. TYPE8, the simplest form of all, was absent from most Bacteria (11 out of 13) and from seven fungal species. TYPE5, 6 and 7 were rare: TYPE5 was present in only two Oomycetes and seven fungi, TYPE6 in seven fungi and TYPE7 in just three. All of these types existed in the five animal and plant species, along with a few proteins that did not fit into any of the types described.
Phylogenetic analysis of the RNase III domain
The relationships among RNase III domains were examined in a phylogenetic tree constructed from all RIIIDs regardless of protein type. There was a clear distinction: RIIIDs from canonical Dicers separated from those of NCDLs (Figure 2, Clades A and B vs Clades C and D), and RIIIDs occupying the same position resembled each other, i.e. the first RIIIDs were more similar to one another than to the second RIIIDs. The first RIIIDs of canonical Dicers (S1 and S3) grouped exclusively near each other in Clade A, as did S2 and S4 (the second RIIIDs of canonical Dicers) in Clade B. None of S1-4 (canonical Dicer RIIIDs) was found in the other half of the tree, where S6 and S10 were prevalent. Meanwhile, S5 from non-canonical Dicers was scattered mostly in Clade C with S6 rather than clustering independently, showing its closer relationship with the RIIIDs of NCDLs. Most S8 and S9 from TYPE6 NCDLs were located with S1/3 in Clade A and with S2/4 in Clade B, respectively (10 out of 13), whereas S10 and S11 from TYPE7 NCDLs were all located with S6. Among the RIIIDs of NCDLs, S6 and S12 each formed a major clade, with slight mixing. Most S6 RIIIDs formed an S6-dominant clade (Clade C) that also harbored a few S5, S7 and S10 and a small group of S12, while some S6 RIIIDs were interleaved with the majority of S12 in Clade D. In the tree, S1/3/8 and S2/4/9 each clustered together, in accordance with their positions in the protein sequences: S1/3/8 were closer to the N-terminus and S2/4/9 closer to the C-terminus. The topology implied that the RIIIDs of Dicers might have evolved from the RIIIDs of NCDL ancestors and subsequently acquired the function of canonical and/or non-canonical Dicers.
Furthermore, two regions of high variation located between the conserved sites were identified in canonical Dicer RIIIDs, with no evidence that they affect catalytic ability: a conserved region at the N-terminus of S1/3 RIIIDs and a clear insertion in the middle of S2/4 RIIIDs (Figure 3, green boxes). The first was a short, highly conserved pattern (referred to as the S13 pattern) present in 28 S1/3 RIIIDs but not in the remaining S1/3 or in S2/4. This pattern occurred mostly in Ascomycetes, not in Basidiomycetes. The other region was a relatively long insertion present in 45 S2/4 RIIIDs from 40 Ascomycetes and Basidiomycetes. This insertion was less conserved and showed a bias between protein types: most TYPE2 proteins (14 out of 18) carried it, compared with 31 out of 99 S2 RIIIDs (TYPE1). Interestingly, no sequence carried both the S13 pattern in the first RIIID (S1/3) and the insertion in the second RIIID (S2/4) (Supplementary Figure S2). Sequence variation in RIIIDs has previously been linked to function: a long insertion in human Drosha RIIIDa, absent from bacterial RIIIDs, was reported to work as a “ruler”. That insertion is not the same as the one described here, possibly because fungal RIIIDs are more similar to human Drosha RIIIDa than to bacterial RIIIDs, and Drosha does not exist in fungi or plants.
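One way the bias in insertion frequency between protein types could be quantified is a two-sided Fisher exact test on the counts given above (14 of 18 TYPE2 versus 31 of 99 TYPE1 second RIIIDs). The sketch below is illustrative only: no such test is reported in this study, and the function name `fisher_exact_two_sided` is hypothetical. It uses only the Python standard library.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x "hits" in row 1 with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


if __name__ == "__main__":
    # Rows: TYPE2, TYPE1; columns: with insertion, without insertion.
    with_t2, without_t2 = 14, 18 - 14
    with_t1, without_t1 = 31, 99 - 31
    print("TYPE2 insertion rate:", round(with_t2 / 18, 3))
    print("TYPE1 insertion rate:", round(with_t1 / 99, 3))
    print("p-value:", fisher_exact_two_sided(with_t2, without_t2, with_t1, without_t1))
```

An exact test is chosen here over a chi-square approximation because one cell of the table (TYPE2 without insertion) is small.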
Phylogenetic analysis of RBDs
There were two types of dsRNA-binding domains among the RNase III domain-containing proteins: IPR005034, which always preceded the RIIIDs (assigned as A in Figure 4A), and IPR014720, which followed the RIIIDs (assigned as B in Figure 5A). Phylogenetic trees of IPR005034 and IPR014720 were constructed separately to determine their relationships among Dicers and NCDLs. Among the RBDs preceding RIIIDs (IPR005034) in TYPE1, TYPE2 and TYPE5, most A2s (10 out of 18) gathered in one subclade, while the rest were scattered among the A1s, with four A2 RBDs forming a small subclade (Figure 4B). Only one A3 mixed with the A2s, while the rest gathered with the A1s. Among the RBDs following the RIIIDs, B4s (the RBDs of TYPE4) were the most frequent. All B1s were located in the outermost clade, while the other RBDs (B2, B3 and B5) were interspersed among the B4 RBDs. The B2 and B3 double RBDs of non-canonical Dicers were similar to each other but were not duplications, like the double RIIIDs of canonical Dicers. B5s from TYPE7 were mixed with B4s and distant from the B1 to B3 RBDs. Based on the topology of this tree, either B1 was the ancestral form of B1-3, or B1 was very different from the other RBDs because it belonged to canonical Dicers (Figure 4C).
Structural comparison of non-canonical Dicers
To assess the structural conservation of the four non-canonical Dicers, their protein sequences were submitted to SWISS-MODEL (https://swissmodel.expasy.org/interactive) for 3D structure prediction, and the models were visualized with PyMOL. Each RIIID and its adjacent dsRBD were selected for prediction, and the model with the highest GMQE and an acceptable QMEAN (no less than -4.0) was chosen. Pairwise comparison showed that the predicted regions were structurally conserved (Figure 5A), with most helices shared by two or more proteins when superimposed (Figure 5B). In Pichia stipitis, whose genome lacked Argonaute and RdRP, the putative Dicer shared structural conservation with the other three characterized non-canonical Dicers. Given the industrial importance of P. stipitis, further studies to decipher its RNAi mechanism may be required. In addition, the 3D structure of Rnt1 (a TYPE4 protein) of S. cerevisiae was modeled by the same approach and showed the largest overlap with DCR1 of K. polysporus and P. stipitis (Figure 5C). This suggested that TYPE4 might be an intermediate or building block for a non-canonical Dicer.
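The model-selection rule stated above (highest GMQE among models whose QMEAN is no less than -4.0) can be written as a short filter-then-rank step. This is a minimal sketch, assuming the scores have already been collected into plain dictionaries; the field names, the function name `pick_best_model` and the example scores are hypothetical and are not SWISS-MODEL output.

```python
def pick_best_model(models, qmean_cutoff=-4.0):
    """Return the model with the highest GMQE among those with acceptable QMEAN.

    `models` is a list of dicts with hypothetical keys "template", "gmqe", "qmean".
    Returns None when no model passes the QMEAN cutoff.
    """
    acceptable = [m for m in models if m["qmean"] >= qmean_cutoff]
    if not acceptable:
        return None
    return max(acceptable, key=lambda m: m["gmqe"])


if __name__ == "__main__":
    # Made-up scores for illustration only.
    candidates = [
        {"template": "model_a", "gmqe": 0.62, "qmean": -4.8},  # best GMQE, QMEAN too low
        {"template": "model_b", "gmqe": 0.55, "qmean": -2.1},
        {"template": "model_c", "gmqe": 0.48, "qmean": -1.3},
    ]
    print(pick_best_model(candidates))
```

Filtering on QMEAN before ranking on GMQE means a model with the top GMQE is still rejected if its QMEAN falls below the cutoff, which matches the wording of the selection criterion.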
A conserved motif at the C-terminus of canonical Dicers
In Schizosaccharomyces pombe, a C-terminal domain including a zinc-binding motif was responsible for nuclear retention of Dcr1, a canonical Dicer. Such CHCC motifs were identified in the C-terminal region following the second RNase III domain of canonical Dicers and NCDLs in different proportions. Among the 116 canonical Dicers, the CHCC motif was present in 58 out of 98 TYPE1 and 4 out of 18 TYPE2 proteins. Four TYPE6, one TYPE4 and one TYPE5 protein also carried the CHCC motif. A number of fungi pathogenic to animals or plants have Dicer(s) with a CHCC motif at the C-terminus, as previously reported. Here we found that this motif occurred in all fungal species with canonical Dicers, but not in Oomycetes or Bacteria, and that these species included saprotrophic fungi such as Neosartorya fischeri, Neurospora crassa, Trichoderma virens, S. octosporus, and Podospora anserina (Supplementary Table S1).
The two Dicers of Neurospora crassa had different features: NCU06766T0 carried the insertion in its S4 RIIID, whereas NCU08270T0 carried the conserved S13 pattern in its S1 RIIID together with the CHCC motif. The Magnaporthe oryzae Dicers showed the same arrangement: DCL2 (MDL-2), which produces siRNA, had the insertion in the second RIIID, while DCL1 possessed both the S13 pattern and the CHCC motif.
Interestingly, the CHCC motif and the S2/4 insertion were mutually exclusive: sequences with CHCC motifs did not possess insertions in S2/4 RIIIDs, and vice versa. The 68 sequences with a CHCC motif largely overlapped with the 28 S1/3 RIIIDs carrying the conserved pattern, as 26 RIIIDs had both features. In total, seven canonical Dicers had none of the features mentioned here, and all of them were TYPE1 proteins (Supplementary Figure S2).
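The feature relationships described in this section can be checked with simple set operations. The sketch below is illustrative, not the analysis performed here: it records only the four Dicers whose features are named in the text, uses hypothetical feature labels (`"CHCC"`, `"S13"`, `"insertion"`) and a hypothetical helper `co_occurrences`, and adds a tally confirming that the per-type CHCC counts sum to the 68 sequences quoted above.

```python
# Features of the four Dicers named in the text (labels are illustrative).
features = {
    "N. crassa NCU06766T0": {"insertion"},
    "N. crassa NCU08270T0": {"S13", "CHCC"},
    "M. oryzae DCL2": {"insertion"},
    "M. oryzae DCL1": {"S13", "CHCC"},
}

# CHCC-positive sequences per protein type, as reported in the text.
chcc_counts = {"TYPE1": 58, "TYPE2": 4, "TYPE6": 4, "TYPE4": 1, "TYPE5": 1}


def co_occurrences(table, first, second):
    """Names of proteins that carry both features, sorted alphabetically."""
    return sorted(name for name, feats in table.items() if first in feats and second in feats)


if __name__ == "__main__":
    # An empty list here is consistent with mutual exclusivity in this small sample.
    print("CHCC + insertion:", co_occurrences(features, "CHCC", "insertion"))
    print("CHCC + S13:", co_occurrences(features, "CHCC", "S13"))
    print("Total CHCC-positive sequences:", sum(chcc_counts.values()))
```

Applied to the full data set rather than these four entries, the same helper would list any sequence that violates the reported exclusivity between the CHCC motif and the S2/4 insertion.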
Evolution of RNase III domain-containing proteins
Taking a parsimonious view of evolution, we assumed that the earliest ancestors had a simple structure with a single RIIID, i.e. TYPE8-like proteins. Based on the phylogenetic tree of RIIIDs, S12 or TYPE8 might have diverged into the other types along their respective evolutionary courses. By acquiring additional domain(s), some RNase III domain-containing proteins gained the corresponding functions and eventually evolved into Dicers. The steps of domain acquisition and functional gain might not have happened at random. Furthermore, Dicers (canonical and non-canonical) and Dicer-like proteins should each have had their own separate ancestors. We speculate that the direct ancestor of canonical Dicers was a TYPE4-like protein.
Starting in Bacteria, a TYPE8-like ancestor first gained one RNA-binding domain and became a ribonuclease, like E. coli ribonuclease and yeast Rnt1, which function in dsRNA processing but not necessarily in RNA silencing. TYPE4 rather than TYPE8 proteins were retained in nearly all bacterial genomes, whereas most eukaryotic genomes possessed both TYPE4 and TYPE8 proteins. For reasons that remain unclear, these proteins appear to be essential, as only very rare species lacked both types at the same time. For instance, Puccinia graminis possessed one TYPE6 as its only NCDL alongside six TYPE1 Dicers, while Nosema ceranae had only one TYPE1 and no NCDLs. Outside Bacteria, TYPE4 proteins diverged slightly in their architecture. In bacteria the RBD and RIIID were separated by a very short distance or were directly adjacent; this distance later increased, and larger distances became more common in fungi. The distance was not strictly fixed, but might reflect a tendency in how Dicers evolved.
Judging by their architecture, TYPE3 proteins appeared to derive from TYPE4 by gaining an extra RBD (Figure 6A), and they existed only in budding yeast species. Taking together the presence of non-canonical Dicers and Argonautes in some Saccharomycotina genomes, TYPE3 proteins possibly work as substitutes for canonical Dicers, so that Saccharomycotina may retain partial RNA silencing ability.
In eukaryotes, two signatures marked the difference between canonical Dicers and NCDLs: double RNase III domains and the Dicer dimerisation domain. The additional RIIID could have been added along two possible routes, depending on the order in which domains were gained: either by first evolving into a double-RIIID structure like TYPE6 (Figure 6B), or through TYPE5 by gaining the Dicer dimerisation domain first (Figure 6C). The RIIID tree indicated that most S8 and S9 RIIIDs were already very similar to S1/3 and S2/4, respectively. Interestingly, the S10 RIIIDs in TYPE5 were very similar to S12 RIIIDs, while their Dicer dimerisation domains were similar to those of canonical Dicers. TYPE5 was therefore an important intermediate type in the evolution of canonical Dicers. At these stages, neither TYPE5 nor TYPE6 proteins had become functional, and they remained in genomes only as very rare cases. Further support came from the S7 RIIIDs of TYPE5, which were all closer to S6 of TYPE4 than to the RIIIDs of canonical Dicers, indicating RIIID duplication. TYPE5 proteins were relatively rare, with only 13 sequences found across a variety of taxa, including one Oomycete, one Zygomycete, two Pezizomycetes, three Pucciniomycetes and six Agaricomycetes. Subsequently, NCDLs gained the second RIIID and then RBDs to become canonical Dicers. No intermediate type with one RIIID, one Dicer dimerisation domain and one following RBD was found, which suggested that the TYPE2 structure derived from TYPE1 in the higher kingdoms of plants and animals (Figure 6, red box). With the current evidence, it cannot yet be determined which route canonical Dicers and NCDLs followed. Nevertheless, the TYPE4-to-TYPE6 route would logically offer the simpler explanation for the evolutionary paths of all types. Other protein types found in plants and animals could also be placed among these intermediate types and paths.
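The alternative routes discussed above can be laid out as a small directed graph and enumerated. This is a sketch of one reading of the proposed paths, not a model fitted to data: the edge list is an assumption drawn from the text (TYPE8 to TYPE4 by gaining an RBD, TYPE4 to TYPE3 by gaining an extra RBD, TYPE4 to canonical TYPE1 via either TYPE6 or TYPE5, and TYPE2 deriving from TYPE1), and the names `STEPS` and `all_paths` are hypothetical.

```python
# Directed edges: ancestor type -> list of (descendant type, domain gained).
# The edge labels paraphrase the text; the graph itself is an assumed summary.
STEPS = {
    "TYPE8": [("TYPE4", "RBD")],
    "TYPE4": [
        ("TYPE3", "extra RBD"),
        ("TYPE5", "Dicer dimerisation domain"),
        ("TYPE6", "second RIIID"),
    ],
    "TYPE5": [("TYPE1", "second RIIID")],
    "TYPE6": [("TYPE1", "Dicer dimerisation domain")],
    "TYPE1": [("TYPE2", "additional dsRBD")],
}


def all_paths(start, goal, path=None):
    """Enumerate every simple path from `start` to `goal` by depth-first search."""
    path = (path or []) + [start]
    if start == goal:
        return [path]
    found = []
    for descendant, _gain in STEPS.get(start, []):
        if descendant not in path:  # avoid revisiting a type
            found.extend(all_paths(descendant, goal, path))
    return found


if __name__ == "__main__":
    for route in all_paths("TYPE8", "TYPE1"):
        print(" -> ".join(route))
```

Enumerating the routes makes explicit that both candidate paths to canonical Dicers pass through TYPE4 and differ only in which domain is gained first.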
In conclusion, we identified and classified the RNase III-containing proteins in fungi by domain structure. The different types reflected their functions as well as their occurrence among taxa. Regions of sequence variation between the two RIIIDs of canonical Dicers correlated with the CHCC motif. Based on the relationships of RIIIDs and RBDs, possible evolutionary paths covering all protein types were proposed. From the simplest form (TYPE8), there were two distinct directions: Dicers, comprising canonical Dicers (TYPE1 and TYPE2) and non-canonical Dicers (TYPE3), and Dicer-like proteins (TYPE4 to 8). They consistently accumulated the essential domains by a different strategy in each direction. These different directions led to different evolutionary outcomes with different functions, building up the complex and diverse RNase III family that includes canonical Dicers.