3.1. Causative pathogens of mucormycosis
The initial database searching results covered 672 articles with 25 duplications, which had been deleted with exclusions (Figure 1). 631 new patient datas from 272 eligible articles were included in the final analysis (Appendix S2). Merged with the data from 2000 to 2017 without duplications [4], 1078 individual cases all together have been analyzed.
A total of eight major genera (36 specific species and several unidentified species) of Mucorales organisms were identified in 1078 mucormycosis patients (Figure 2). Frequency outcomes showed that Rhizopus spp. (616/1078, 57.14%) was the most common pathogenic species, followed by Mucor spp. (124/1078, 11.5%) and Lichtheimia spp. (105/1078, 9.74%). So far, Rhizopus arrhizus (239/403, 59.31%) and Lichtheimia corymbifera (23/45, 51.11%) was the most frequent strain in Rhizopus spp. and Lichtheimia spp. respectively. Majority of Mucor spp. had not been identified to species level (48/61, 78.69%), which might be partially accounted for the countries with high incidence of mucormycosis lacking in adequate laboratory technologies for their poor sanitation and poverty. The top 10 identified pathogenic strains, in sequence, were Rhizopus arrhizus (28.66%), Rhizopus microspores (8.26%), Lichtheimia corymbifera (5.94%), Cunninghamella bertholletiae (3.53%), Apophysomyces elegans (3.06%), Apophysomyces variabilis (2.6%), Rhizopus homothallicus (2.41%), Saksenaea vasiformis (2.32%), Rhizomucor pusillus (1.76%), Lichtheimia ramosa (1.3%). To our surprise, Mucor racemosus, which had been recognized as one of the medically important Mucorales [20] showed no reported case in our data, as well as Jeong’s result [4]. However, as Prakash has figured out, the true prevalence may be far underestimated in mucormycosis, with the difficulties in collecting specimens and diagnostic tests, especially in some high incidence Asian countries [8].
3.2. Sequence analysis and assemblies of the whole genomes
De novo sequencing of four pathogenic Mucorales isolates B50a, B63a, B66h and B81a (Table 2) were generated by Illumina Novaseq 6000 with PE150 mode. B81a had the longest raw reads as 4.88GB, while B66h had the shortest one as 13.7GB. B50a (8.11GB raw reads) and B63a (11.68GB raw reads) were intermediate in the sequence reads. Correspondingly, B50a had the minimum PacBio reads as 931,044,277 while B63a had the maximum ones as 1,546,237,919. In between, B66h had 1,167,129,942 and B81a had 1,403,589,167 PacBio reads. All four genomes had similar Q30 value of clean reads (92.73–93.97%). For following genome assembly and analysis, we have chosen the entire subreads longer than 500bp. B81a had the largest genome of the four isolates (51.80Mb), while B66h had the smallest one (32.29Mb). B50a (33.99Mb) and B63a (38.99Mb) owned the moderate genome sizes. The overall G+C content of the four isolates entire genome were 37.86% in B50a, 43.44% in B63a, 33.05% in B66h and 35.71% in B81a. We predicted 8,458 genes in B50a, 11,099 genes in B63a, 5,013 genes in B66h and 11,109 genes in B81a, which covered a total of 12,932,300 bp of the genome with a mean length of 1,529bp per gene, total of 16,636,830 bp of the genome with 1499bp per gene, total of 6,975,655bp of the genome with 1392 per gene, and total of 13,553,219bp of the genome with 1,220 per gene, respectively. Moreover, we identified 37 rRNA genes and 242 tRNA genes in B50a, 257 rRNA genes and 254 tRNA genes in B63a, 149 rRNA genes and 300 tRNA genes in B66h, and 40rRNA genes and 299 tRNA genes in B81a.
The whole genome sequencing of four Mucorales isolates have been uploaded to NCBI respectively: SRR16108680: Mucor irregularis (B50a), SRR16108679: Lichtheimia corymbifera (B63a), SRR16108678: Mucor hiemalis (B66h) and SRR16108677: Rhizopus arrhizus (B81a).
Table 2
De novo sequencing results of four pathogenic Mucorales isolates B50a, B63a, B66h and B81a.
| Mucor irregularis | Lichtheimia corymbifera | Mucor hiemalis | Rhizopus arrhizus |
Genome Size: | 33,988,018 | 38,994,095 | 32,290,736 | 51,800,396 |
GC Content(%): | 37.87 | 43.45 | 33.05 | 35.7 |
Gene Number: | 8,457 | 11,099 | 5,013 | 11,109 |
Gene Total Length: | 12,930,096 | 16,636,830 | 6,975,655 | 13,553,220 |
Gene Average Length: | 1,529 | 1,499 | 1,392 | 1,220 |
Gene's GC Content: | 42.25 | 46.04 | 38.21 | 39.2 |
% of Genome (Genes): | 38.04 | 42.66 | 21.6 | 26.16 |
Intergenic region Length: | 21,057,922 | 22,357,265 | 25,315,081 | 38,247,176 |
Intergenic's GC Content: | 35.18 | 41.52 | 31.63 | 34.46 |
% of Genome (Intergenic): | 61.96 | 57.34 | 78.4 | 73.84 |
3.3. Repetitive sequences and genome annotation
Repetitive sequences (RS) have been shown to serve as vehicle to maintain genomic variability and to serve evolutionary change. We identified a total of 6,488 RSs in B50a occupying 2.12% of the global genome, 13,337 RSs in B63a with 3.44%, 6,214 RSs in B66h with 1.98%, and 20,845 RSs in B81a with 20.90%.
The gene sequences of B50a, B63a, B66h and B81a were compared against NR, GO, SWISS-Prot, KEGG and EGGNOG databases. The prediction of gene function from NR revealed 7692 genes which accounted for 90.95% of entire genome in B50a, 10938 genes in B60a with a percentage of 98.55%, 4800 genes in B66h with a percentage of 95.75% and 10689 genes in B81a took a percentage of 96.22%.
With NR protein database, the prediction of gene function from the clusters of orthogroups (KOG) database were 5541 genes in B50a, 6970 genes in B63a, 3847 genes in B66h and 7137 genes in B81a (data not shown).
3.4. Pan-genome and phylogenetic analysis
According to the pathogenic proportions of mucormycosis, we assembled, annotated and analyzed the genomes of (i) 25 species of the Mucorales order, including Rhizopus (Rhizopus arrhizus, Rhizopus microsporus, Rhizopus stolonifer, Rhizopus delemar), Mucor (Mucor circinelloides, Mucor hiemalis, Mucor indicus, Mucor irregularis, Mucor lusitanicus, Mucor racemosus and Mucor velutinosus), Lichtheimia (Lichtheimia corymbifera, Lichtheimia ramosa), Apophysomyces (Apophysomyces elegans, Apophysomyces ossiformis, Apophysomyces trapeziformis, Apophysomyces variabilis), Rhizomucor (Rhizomucor miehei, Rhizomucor pusillus), Cunninghamella (Cunninghamella bertholletiae, Cunninghamella elegans), Syncephalastrum (Syncephalastrum monosporum, Syncephalastrum racemosum), Actinomucor (Actinomucor elegans) and Cokeromyces (Cokeromyces recurvatus), and (ii) 22 isolates of the non-Mucorales genera, including Conidiobolus (Conidiobolus incongruous and Conidiobolus coronatus), Basidiobolus (Basidiobolus heterosporus and Basidiobolus meristosporus), Dermatophyte (Trichophyton rubrum, Trichophyton violaceum, Trichophyton mentagrophytes and Microsporum canis), Candida (Candida albicans, Candida glabrata, Candida parapsilosis, Candida tropicalis, Candida orthopsilosis and Candida metapsilosis), Cryptococcus (Cryptococcus neoformans and Cryptococcus gattii), Malassezia (Malassezia furfur, Malassezia dermatis and Malassezia globosa), and Aspergillus (Aspergillus niger, Aspergillus flavus and Aspergillus fumigatus) to identify both of common and taxa-specific genetic elements in Mucorales and pathogenic non-Mucorales fungi (Table S2).
Pan-genome analysis was conducted to figure out Mucorales-specific genes, as well as in different Mucorales species to find out Mucorales genus-specific genes. One comparison of all 47 fungal genomes indicated that Mucorales species have 50 specific orthogroups which appear discrepant expression levels (Figure 3). Mucorales-specific orthogroups showed distinct homology to Non-Redundant (NR) Protein Sequence Database (Table 3). In sequence, OG0003655 had 96.9% similarity with EPB92269.1 as the STE/STE20 protein kinase, OG0002531 had 83% similarity with GAN03371.1 as TRM2 tRNA methyltransferase 2 homolog A, OG0000774 had 81.2% similarity with GAN00621.1 as pkinase-domain-containing protein, OG0004491 had 78.3% similarity with GAN01313.1 as transmembrane and coiled-coil domain-containing protein 3, and OG0002100 had 76.7% similarity with OAD09236.1 as glycoside hydrolase family 36 protein. Another comparison in the 9 Mucorales genera, genomes shared 1157 common orthogroups (Figure 4). Cunninghamella spp. owned the most specific orthogroups with 741 ones, followed by Lichtheimia spp. (333 orthogroups). Instead, Rhizopus spp. had 22 specific orthogroups and Mucor spp. had none. Based on the KOG database, the putative proteins were functionally classified into 22 molecular families covering cellular structure, biochemistry metabolism, molecular processing, and signal transduction (Figure 5). Orthogroups functioned more as signal transduction mechanisms in Apophysomyces spp. at 22.34% (42/188). The common orthogroups of Cunninghamella spp. gathered more in posttranslational modification, protein turnover, chaperones at 14.18% (37/261), as well as that of Lichtheimia spp. at 26.80% (26/97). Intracellular trafficking, secretion, and vesicular transport appeared to be main function in the common orthogroups of Syncephalastrum species.
Table 3
Information of the Mucorales-specific orthogroups compared to NR database.
Orthogroup | NR tophit name | NR tophit description | NR top Similarity |
OG0003655 | EPB92269.1 | STE/STE20 protein kinase [Mucor circinelloides f. circinelloides 1006PhL] | 96.9% |
OG0002531 | GAN03371.1 | TRM2 tRNA methyltransferase 2 homolog A [Mucor ambiguus] | 83% |
OG0000774 | GAN00621.1 | pkinase-domain-containing protein [Mucor ambiguus] | 81.2% |
OG0004491 | GAN01313.1 | transmembrane and coiled-coil domain-containing protein 3 [Mucor ambiguus] | 78.3% |
OG0002100 | OAD09236.1 | glycoside hydrolase family 36 protein [Mucor circinelloides f. lusitanicus CBS 277.49] | 76.7% |
OG0001006 | GAN03654.1 | sel1 repeat protein [Mucor ambiguus] | 69.1% |
OG0003562 | PHZ17618.1 | Quino protein alcohol dehydrogenase-like protein [Rhizopus microsporus ATCC 52813] | 67.5% |
OG0002562 | GAN07947.1 | small GTPase rabE [Mucor ambiguus] | 58.6% |
OG0002218 | OBZ90755.1 | Centrosomal protein [Choanephora cucurbitarum] | 51.1% |
OG0002849 | OBZ88254.1 | DNA-dependent protein kinase catalytic subunit [Choanephora cucurbitarum] | 51% |
Based on the pan-genome analysis data, a maximum likelihood-based phylogenetic tree was constructed in 25 Mucorales fungi and 18 non-Mucorales fungi, with 4 Entomopthorales fungi as outgroups (Figure 6). We confirmed that Mucorales species were far away from non-Mucorales species. In addition, we noticed that the M. irregularis (B50a) and M. hiemalis (B66h) were clustered in one clade, sharing a very close relationship. Cokeromyces recurvatus and Actinomucor elegans were phylogenetically related to Mucor spp. within one phylogenetic cluster.
3.5. Expressions of proteins related to the virulence
To verify the reported proteins related to the virulence of Mucorales species, the copies of these proteins were compared in Mucorales species and other pathogenic fungi. Despite that arf and aqp1 showed no difference, atf, gcn4, igp1, pps1, mcmyo5, mcplD, 112092, hmgR, cotH3 and fob had significant difference while ico1 varied slightly between the two groups (Figure 7). For further knowledge of the unequal pathogenicity of the Mucorales species, we listed the expression of proteins related to the virulence in all the fungi we have referred above (Figure 8). Few expressions of 112092, cotH3, gcn4 and igp1 was detected in non-Mucorales species, while almost all the Mucorales species expressed in different levels. Although the protein expression of fob and hmgR exhibited significant difference between Mucorales species and other pathogenic fungi, both were not as high as other proteins. Compared to the non-Mucorales species, Mucorales species displayed abundant but uneven expression levels of pps1, atf, mcplD and mcmyo5. Gene pps1 expressed mainly in Rhizopus stolonifer, Rhizopus delemar and Mucor irregularis; atf had high expression in Mucor racemosus, Rhizopus delemar, Mucor lusitanicus, Mucor circinelloides and Rhizopus arrhizus; mcplD expressed more in Rhizopus stolonifer, Apophysomyces elegans and Mucor racemosus; mcmyo5 had multiple expressions in Rhizopus stolonifer, Mucor racemosus and Rhizopus delemar.