Rhizophagus Proliferus Genome Sequence Reiterates Conservation of Genetic Traits in AM Fungi, but Predicts Putative Higher Saprotrophic Activity

Unavailability of the genome sequences of several species of arbuscular mycorrhizal (AM) fungi limits the opportunities for optimizing these biofertilizer species for agricultural benefits. The present work comprises the first draft of the genome sequence of Rhizophagus proliferus, an important AM species present in biofertilizer consortia for agricultural purposes. The estimated genome size of R. proliferus is ~110 Mbp, and the genomic assembly created from paired-end Illumina reads is 94.35% complete. Genome mining was carried out to identify putative gene families important for biological functions. A total of 22,526 protein-coding genes were estimated in the genome, with an abundance of kinases and a reduced number of glycoside hydrolases compared with other fungal classes. A striking finding in the R. proliferus genome was a higher number of carbohydrate esterases (CE), which may suggest higher saprotrophic activity in this species than in previously reported AM fungi. The genome sequence and annotation of R. proliferus presented here would serve as an important reference for the functional genomics studies required for developing biofertilizer formulations in future. In addition, the findings from this work may also prove important in deciphering the molecular mechanisms in AM fungi that govern host-specific interaction and the associated agricultural benefits.


Introduction
AM fungi comprise approximately 350 to 1000 molecularly defined species under the Division Glomeromycota. These fungi are obligate biotrophs and complete their life cycle by developing a symbiotic relationship with a host plant (Smith and Read 2008). AM fungi have a mutualistic association with more than 80% of terrestrial plants (Schüßler et al., 2001; Kivlin et al., 2011) and benefit them by improving nutrient and water uptake efficiency, as well as resistance to various abiotic and biotic stresses (Jung et al., 2012). These fungi are endosymbionts and form highly branched hyphal structures called arbuscules inside the plant cortical cells. The arbuscules deliver mineral nutrients to the cortical cells and also function in carbon acquisition from the host. The mechanisms underlying the obligate symbiotic relationship of AM fungi with plants are not completely understood.
Knowledge regarding genome organization, genetic functions and reproductive mechanism of several species of AM fungi is not available at present.
Lack of information about the gene repertoires of AM fungi makes it difficult to optimize them for crop improvement. Therefore, in order to unravel the genetics of AM fungi, researchers have started exploring the genome sequences of different species. Information on the gene repertoires of some important species of AM fungi, namely Rhizophagus irregularis (Tisserant et al., 2013), Rhizophagus clarus (Kobayashi et al., 2018), Diversispora epigaea (Sun et al., 2018), Gigaspora rosea, Rhizophagus diaphanus, Rhizophagus cerebriforme (Morin et al., 2019) and Gigaspora margarita (Venice et al., 2020), is now available in the public domain. The most striking findings of these projects include the absence of some core eukaryotic genes and the presence of a putative sexual reproductive mechanism in AM fungi. An interesting finding reported by researchers studying the molecular exchanges between AM fungi and the host plant during mycorrhizal interaction is that plant-synthesized lipids are imported by AM fungi (Jiang et al., 2017). In agreement with this finding, absence of the multi-domain fatty acid synthase FAS I gene has also been described in the genomes of AM fungi (Tisserant et
In the present work, we performed de novo genome sequencing and genome annotation of an important AM fungal species, namely Rhizophagus proliferus, and compared its genetic structure with previously reported AM fungi, ectomycorrhizal (EM) fungi and a pathogenic ascomycetous fungus. These included Rhizophagus irregularis (Tisserant et al., 2013) (Ma et al., 2009) as a pathogenic ascomycetous fungus. R. proliferus (previously known as Glomus proliferum) is morphologically different from the model AM fungal species R. irregularis. R. proliferus was first described by Declerck et al. (2000) as possessing distinguishing spore characteristics, such as small size, hyaline color, smooth wall surface, a permanent four-layered spore wall structure, and long hyphae that produced clusters of spores containing several hundred individuals. Most noticeably, anastomoses between hyphae and retraction septa, which are peculiar traits for spore germination in the absence of a host (Logi et al., 1998), were frequently observed in R. proliferus. The isolate of R. proliferus sequenced in this study has been found to provide agricultural benefits to several crops.

Aim, design and setting of the study
The research work reported here was undertaken to understand the genome structure and function of R. proliferus. Genome sequencing was done using Illumina's next-generation sequencing method. A de novo assembly was created from the sequenced reads after quality control. Gene repertoires in R. proliferus were then predicted in silico, annotated, and assessed for important protein families. An additional exploration was carried out to identify the presence of core eukaryotic genes in R. proliferus.

Description of materials
Fungal isolate and DNA extraction
The isolate AM-1901 of R. proliferus, from the Centre for Mycorrhizal Culture Collection (CMCC) of The Energy and Resources Institute (TERI), India, was used for genome sequencing in this study. To investigate the morphological features of the species, microscopic analysis of spores after PVLG and Melzer's staining was carried out using a compound microscope (Carl Zeiss Primostar) and a previously published protocol (Błaszkowski et al., 2014). Molecular identification of R. proliferus was carried out using the SSUmAf-LSUmAr and SSUmCf-LSUmBr primer pairs in a nested PCR as described by Krüger et al. (2009). Spores were produced in mono-axenic cultures that were maintained on Agrobacterium rhizogenes-transformed roots of carrot (Daucus carota, Clone GP1). A total of 150,000 sterile spores were collected and high molecular weight (HMW) genomic DNA was extracted.
Genome sequencing, assembly and annotation
DNA was fragmented and a library was constructed using the Nextera DNA Library Prep protocol. Sequencing (2x150 bp paired-end) was performed by a commercial service provider (AgriGenome Labs Pvt. Ltd., Kerala, India) on a HiSeq 2500 sequencing platform. Quality control of the DNA library was done by analysis on an Agilent 2000 Bioanalyzer. Reads were preprocessed (adapter trimming and Q > 20) using AdapterRemoval version 2.2.0. A homology search of the preprocessed reads was done using the BLASTn suite against a bacterial database, and only the unaligned reads were retained. Mitochondrial sequences were removed from these unaligned reads by comparing the reads with the NCBI database mitochondrion.1.1.genomic.fna.gz. The filtered sequences were assembled into scaffolds de novo using SPAdes version 3.12.0 (Bankevich et al., 2012). All scaffolds with length < 1000 bp were excluded from the final assembly. The scaffold sequences were also subjected to a homology search in the NCBI nucleotide database to remove non-fungal contamination from carrot-root DNA sequences; all scaffolds with identity > 90%, query coverage > 75%, GC content > 50% and of non-fungal origin were excluded from the assembly. Repeat sequences were masked using RepeatMasker version 4.0.7 (http://www.repeatmasker.org/) and the remaining scaffolds were used for downstream analysis. The R. proliferus draft genome assembly was searched against the core eukaryotic genes present in CEGMA (Parra et al., 2007) to evaluate genome completeness. tRNAs were identified using tRNAscan-SE version 1.
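The scaffold-level exclusion rules described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the `Hit` record, the function names and the reading that all four contamination criteria must hold jointly are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

MIN_LENGTH = 1000  # scaffolds shorter than this (bp) are excluded


@dataclass
class Hit:
    """Best NCBI nucleotide hit for a scaffold (hypothetical record layout)."""
    identity: float        # percent identity of the alignment
    query_coverage: float  # percent of the scaffold covered by the hit
    is_fungal: bool        # True if the hit is of fungal origin


def gc_content(seq: str) -> float:
    """Return GC content as a percentage of the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 100.0 * gc / (gc + at) if (gc + at) else 0.0


def is_contaminant(seq: str, hit: Optional[Hit]) -> bool:
    """Assumed reading of the text: all four criteria must hold together."""
    if hit is None:
        return False
    return (
        hit.identity > 90
        and hit.query_coverage > 75
        and gc_content(seq) > 50
        and not hit.is_fungal
    )


def keep_scaffold(seq: str, hit: Optional[Hit] = None) -> bool:
    """Keep scaffolds that are long enough and not flagged as contamination."""
    return len(seq) >= MIN_LENGTH and not is_contaminant(seq, hit)
```

With these rules, a 1,200 bp AT-rich scaffold without a database hit is kept, whereas a GC-rich scaffold with a strong non-fungal hit is dropped.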

Morphological and molecular characterization
The morphological details of the isolate of R. proliferus sequenced in this study are presented in Fig. 1. The spores were small, with diameters ranging from 65 to 125 µm at different stages of the life cycle, and had three distinct wall layers. The spores were hyaline with a smooth wall surface and were borne in bunches on long hyphae. Sequencing and phylogenetic analyses of SSU-ITS-LSU nrDNA sequences and morphological studies of spores confirmed the species under investigation to be R. proliferus.
Genome sequencing, assembly and structure
A total of 15 million reads and 7.8 Gb of primary sequence were obtained from the whole-genome sequencing project. After quality control, the raw sequences were assembled into 12,903 scaffolds with an assembly size of ~102.4 Mbp and an average GC content of 27.99%. N50 scaffolds and L50 values were 2,126 and 13,544 bp, respectively (Table 1). The assembled R. proliferus genome was found to be 94.35% complete by CEGMA (Table S1). The genomic assembly statistics of Rhizophagus proliferus, in comparison to previously reported AM fungi (Rhizophagus irregularis, Rhizophagus cerebriforme, Rhizophagus diaphanus, Gigaspora rosea, Gigaspora margarita), EM fungi (Tuber melanosporum, Laccaria bicolor) and a pathogenic ascomycetous fungus (Rhizopus oryzae), are presented in Table 1.
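The two half-assembly statistics reported above can be computed from a list of scaffold lengths, as in the following sketch. The function name is hypothetical and the code is not the tool used in this study. Because naming conventions differ between tools (the values above pair the scaffold count with "N50" and the length with "L50", whereas many assemblers label them the other way round), the function simply returns both numbers without naming them.

```python
from typing import Iterable, Tuple


def half_assembly_stats(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (length, count) at the half-assembly point.

    Scaffolds are sorted from longest to shortest and accumulated until the
    running total reaches at least 50% of the assembly size. `length` is the
    size of the scaffold at which this happens and `count` is the number of
    scaffolds needed to get there.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one scaffold length is required")
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise AssertionError("unreachable: the full sum always reaches the half point")


# Toy example: total = 290, half = 145; 80 + 70 = 150 reaches the half point.
print(half_assembly_stats([80, 70, 50, 40, 30, 20]))  # (70, 2)
```

Applied to the scaffold lengths of the present assembly, the two returned values would correspond to the 13,544 bp and 2,126 scaffold figures listed in Table 1.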

Genome annotation
A total of 22,526 protein-coding genes were estimated by specifying Saccharomyces cerevisiae as the model species in AUGUSTUS version 3.1.0. In addition, 187 tRNA genes were predicted (Table S2). A total of 15,087 proteins shared homology with the NCBI nr database. The 3,988 genes identified by InterProScan search were distributed across 52 different domains (Table S3). Two predicted domains were unique to R. proliferus: PPM-type phosphatase and PTP-type protein phosphatase, both of which have been found to influence signal transduction and the cell cycle. The terms protein phosphorylation, nitrogen compound metabolic process, cell communication, and signal transduction were frequent GO biological functions. Hydrolases, transferases and proteins involved in binding different types of compounds and molecules were among the top 10 terms under GO molecular functions (Fig. 2a, 2b and 2c).
A total of 4,569 genes were assigned to 321 KEGG pathways in R. proliferus (Table S4) (Table 2). Furthermore, alpha protein kinases were predicted in R. proliferus, as in other AM fungi; these kinases are reported to be absent in ectomycorrhizal and pathogenic fungal species. For genes involved in sexual reproduction, a total of 89 HMG (high mobility group) box-containing genes (Table S6) and 47 meiosis-related genes (Table S7) (Table S8) and 19 genes from the set of "missing ascomycetes core genes (MACGs)" were identified. Furthermore, many important CUG, including the fatty acid synthase (FAS) gene, were not found. Table S8 presents the comparative presence of these genes in the other fungi included for comparison in this study.

Discussion
AM fungi constitute an important group of fungi for sustainable agricultural benefits; however, the genome sequences and gene repertoires of most AM species are not yet explored. Information on the genetic structure of these fungi could reveal the molecular mechanisms underlying host-specific interaction with different species of crop plants and the associated agricultural benefits (Prasad et al., 2019). For the majority of fungal classes and species, information regarding genetic structure and function has commonly been acquired by comparative studies with the genomes of model species belonging to Ascomycota and Basidiomycota. However, it is difficult to understand the genetic architecture of AM fungi by similar comparisons, as Ascomycota and Basidiomycota are only distantly related to Glomeromycota and extensive divergence between them has occurred over the long evolutionary period (Sanders and Croll 2010). Against this background, the exceptional findings in Glomeromycota, namely the lack of many genes constituting the basic machinery of eukaryotic metabolic pathways, the expansion of the kinome and the reduction of CAZymes, are being cautiously probed.
The investigation reported here provides the first draft of the genome sequence and genome annotation of R. proliferus, one of the important species of AM fungi known to benefit multiple crops. The estimated genome size is ~110 Mbp, which is the smallest of all AM fungi reported to date. As in the previously reported AM fungi, fewer carbohydrate-active enzymes and a higher number of protein kinases were predicted in R. proliferus than in EM and ascomycetous fungi. A high proportion of protein classes representing "establishment of localization" and "signal transduction" proteins was seen in the GO classification of R. proliferus. Genes coding for "establishment of localization" could be crucially involved in the development of plant-microbial interactions in a symbiotic association. processes that are involved in establishment of symbiotic interaction between AM fungi and plant. Interestingly, TKL-containing proteins have been observed to be over-expressed in germinating spores and intraradical mycelium in R. irregularis (Tisserant et al., 2013). Conservation of alpha protein kinases, an ancient class of proteins (Drennan and Ryazanov 2004), in R. proliferus and other AM fungi, unlike in other fungal groups, may indicate either the inefficiency of AM fungi in expelling genetic load through sexual reproduction or a strong conservation of the molecular mechanisms that support the lifestyle of AM fungi.
The reduced number of glycoside hydrolases found in R. proliferus (24) in comparison to other fungal divisions was in conformity with other species of AM fungi. Expansins (EXPN) and polysaccharide lyases (PL) were absent from the R. proliferus genome, similar to previous reports in AM species. Expansins function in cell wall loosening and help the accommodation of the fungus inside the cortical cells (Cosgrove et al., 2002). Expansins of fungal origin are supposed to function in the loosening of interfacial material (Balestrini et al., 2005).
Polysaccharide lyases (PL) play a role in the degradation of the pectin layers of wood (Kristiina and Miia 2018). These observations in AM fungi, unlike in EM and pathogenic fungi, have been proposed as "functional tradeoffs" in an obligate symbiont for achieving stealthy entry into and colonization of the root while evading the plant immune response (Tisserant et al., 2012). In contrast to the previous reports in AM fungi (Tisserant et
A widespread notion of the absence of sexual recombination in AM fungi was challenged by the contrasting observations made in the whole-genome analysis of R. irregularis (Lin et al., 2014). An exploration of the sexual potential of R. irregularis identified a putative AM fungal mating-type locus with prominent similarities to the mating-type locus of Basidiomycota (Ropars et al., 2016). In addition, 76 HMG (high mobility group) box-containing genes were identified in R. irregularis (Riley et al., 2014). Also, 48 meiosis-related genes were found in G. rosea (Tang et al., 2016). In agreement with these findings, 89 HMG box-containing genes and 47 meiosis-related genes were identified in R. proliferus. Such conservation of meiosis-related genes re-emphasized the existence of a yet unknown sexual reproduction mechanism in Glomeromycotan fungi, and particularly in R. proliferus.
Thiamine is a cofactor for enzyme complexes involved in the citric acid cycle, namely pyruvate dehydrogenase and α-ketoglutarate dehydrogenase, and is therefore an essential constituent of all cells. The biosynthetic pathway for thiamine has been reported missing in AM fungi. In congruence with the previous reports, thiamine biosynthetic pathway genes were not predicted in R. proliferus.
The proteins uridine permease, uracil permease and dihydroorotate dehydrogenase support uracil metabolism and transport and maintain the intracellular level of uracil. Tight control of intracellular uracil has been suggested to be important for reducing the rate of uracil incorporation into DNA (Sun et al., 2013). Dihydroorotate dehydrogenase (DHODH; EC 1.3.99.11), the fourth enzyme of the pyrimidine de novo biosynthesis pathway, was the only gene from the pathway that was present in both the R. irregularis and R. proliferus genomes. Genes for glutamate metabolism and glutathione metabolism were predicted in R. proliferus, which indicated its potential for the metabolism of nucleic acids and proteins (Yelamanchi et al., 2016) and for detoxification of xenobiotics and the oxidative stress response (Shen et al., 2015), respectively, similar to other AM fungi. The transporters and channels that AM fungi use to move potassium from the soil to the host are still not completely deciphered.
Seven sequences from an EST library of R. irregularis were annotated as K+ transport systems (Casieri et al., 2013), which coded for SKC-type channels and a KT/KUP/HAK transporter. Noticeably, no Trk and TOK members were identified in either the EST library or the sequenced nuclear genome (http://genome.jgi.doe.gov/Gloin1/Gloin1.home.html). In congruence with the previous reports, no homologue of the yeast TOK1 gene was identified in R. proliferus. However, a Trk-type K+ transport system was predicted in R. proliferus. Probable ferric reductase transmembrane component 8, which is expected to function in the assimilation of iron, was identified only in R. proliferus by the conserved domain analysis and comparison with the proteins coded by yeast. In our comparative analysis, the absence of several CUG in R. proliferus was mostly consistent with the previously reported AM fungi, which suggested high conservation of genetic features among all species belonging to Glomeromycotina.

Conclusions
The genome of R. proliferus shared several conserved features with previously reported AM species with respect to genetic structure and function. These included the absence of several eukaryotic genes, prominently the type I FAS gene, an abundance of protein kinases and a reduced number of glycoside hydrolases. A unique finding was a higher proportion of carbohydrate esterases, which might suggest higher saprotrophic activity in R. proliferus than in other AM fungi. The present first draft of the R. proliferus genome would serve as a reference for all future genetics and functional genomics analyses of the species. It would also provide information for the comparative genomics analyses required for developing a comprehensive understanding of the structure and function of AM fungi in future.