Morphological and microscopic analysis of A. candida
The leaves collected from white rust infected B. juncea plants were observed for disease development. The pustules appeared on the abaxial surface in the form of irregularly distributed, small pinheads, that were creamish in colour with length ranging from 2.5–4.5 mm (av. 3.6 mm, n = 50) and diameter 1–4 mm (av. 2.6 mm, n = 50). Light microscopy of the infected leaves revealed hyphae that were branched and regularly or irregularly swollen. Some intercellular hyphae developed into unbranched, club-shaped sporogenous hyphae that formed long chain of sorus filled with numerous sporangia arranged in a basipetal succession (Fig. 1A). Sporangia appeared hyaline, globose and arranged in basipetal chains with 22.5–27.5 µm (av. 22.7) diameter (n = 50) (Fig. 1B). Furthermore, the sexual oospores were thick walled, pale yellow in colour, and globose with verrucose indentations on the wall surface (Fig. 1C). Haustoria, which are the exchange sites for nutrient uptake and fungal virulence factors such as effectors, were mostly globose and surrounded by a thick sheath with a narrow stalk (Fig. 1D). .
Genome sequencing, assembly, and BUSCO analysis
The purity of the high molecular weight-DNA, isolated from the zoosporangial mass of the A. candida (Ac2v) pathogen ‘Delhi' isolate was confirmed by sequence homology of the ITS region via PCR amplification followed by Sanger sequencing. The contaminant free DNA was then subjected to complete sequencing on the Oxford Nanopore (ONT) platform for long reads and on the Illumina HiSeq platform for short reads. The paired-end ONT and Illumina libraries generated a total data of 6.3 GB and 4.5 GB, respectively. The raw data were then filtered based on the quality score, trimmed and adapter sequences were removed. Error correction was performed to obtain reads representing at-least 40x genomic coverage. The corrected reads were then used for genome assembly into contigs and then assembled into scaffolds using MaSuRCA v3.4.2. Furthermore, Illumina reads were used to improve the base accuracy of the genomic scaffolds and gap filling. The aforementioned assembler statistics resulted in 547 contigs that were assembled into 415 scaffolds with N50 of 301.91 kb, L50 of 39, and a hybrid assembly of 36.88 Mb of A. candida (Table 1). The assembly is longer than SOLID-based Ac2VRR (34.56 Mb) and the Illumina-based AcNc2 assemblies, however, is slightly smaller than the PacBio based Ac2VPB assembly (38.96 Mb) [19, 20]. The contigs were also evaluated for homology search using NCBI blast, while most of the contigs match to the Ac2vPB assembly, 62 contigs were identified with no significant hits (Supplementary table 6). To validate the new findings, 10 contigs (contig_129, contig_131, contig_135, contig_138, contig_163, contig_22, contig_24, contig_25, contig_27 and contig_163) were taken at random and primers were designed to validate the novel contigs through PCR which confirmed their presence (Fig. 3). The total length of 62 contigs correspond to 2,34,330 bp which represents 0.63% of the total variation in the 36.88 Mb of genome. Since, only the mapped reads have been considered for the assembly into contigs, the absence of significant similarity in the 62 contigs suggests rearrangement of certain regions in the Delhi strain resulting in the varaiation. The quantitative assessment of A. candida hybrid genome assembly and annotation completeness was carried out using BUSCO. The draft assembly consisted of 97% of complete BUSCOs and a very minute percentage of missing and fragmented BUSCOs, i.e., 1% and 2%, respectively. Meanwhile, BUSCO analysis of the predicted proteins from the assembly showed 99% of complete BUSCOs for Stramenopile database and 57.1% of fungal database (Table 2).
Table 1
Statistics of draft genome assembly of A. candida Delhi isolate
Statistics | A. candida assembled genome |
Quantitative analysis of genome assembly |
Size of assembled genome | 36.88 Mb |
Total number of scaffolds is assembly | 415 |
N50 scaffold length | 301.91 Kb |
Longest scaffold length | 1.03 Mb |
L50 | 39 |
GC (%) | 43.44% |
No. of genes predicted | 13,715 |
Qualitative analysis of assembly |
BUSCOs Complete (percentage) /fragmented (percentage)/Missing (percentage) (Metaeuk) | 100% / 0%/ 0% |
BUSCOs Complete number (percentage) /fragmented number (percentage)/Missing number (percentage)(Augustus) | 97%/ 1%/ 2% |
BUSCO for proteins |
BUSCOs Complete number (percentage) /fragmented number (percentage)/Missing number (percentage) (Stramenopiles_odb) | 99 (99%)/ 0 (0%)/ 1 (1%) |
BUSCOs Complete number (percentage) /fragmented number (percentage)/Missing number (percentage)(fungi_odb) | 433 (57.1%)/ 50 (6.6%)/ 275 (36.3%) |
Bioinformatics analysis
Transposable Elements (TEs), Repetitive Sequences and variant analysis
Transposable elements (TEs) are the chief mechanistic drivers of genome evolution, therefore, whole-genome data was analyzed for understanding the transpositional landscape of the A. candida pathogen as represented in Table 2. The A. candida genome harbours both Class I and Class II transposable element (TE) families. The repeat elements occupied only 24.29% of the genome. Long terminal repeat (LTR) retroelements were the most abundant TEs, representing 7.87% of the assembly, followed by Long interspersed nuclear element (LINEs), which constitute 7.3% of the genome. Among the LTR elements, Copia/TY1 (5.6%) was more copious than Gypsy/DIRS1 (2.01) and retroviral elements (0.27). In addition, the repeat sequences consisted of DNA transposons (3.19%), low-complexity repeats (0.02%), small RNA (0.39%), simple repeats (0.23%), and some unknown sequences (5.65%).
Table 2
Types of transposable elements identified in A. candida genome
Types of transposable elements | Number of elements | length occupied (bp) | Percentage of sequence (%) |
Retroelements | 5219 | 5697587 | 15.45 |
SINEs: | 274 | 101513 | 0.28 |
Penelope: | 0 | 0 | 0 |
LINEs: | 1732 | 2691482 | 7.3 |
CRE/SLACS | 0 | 0 | 0 |
L2/CR1/Rex | 0 | 0 | 0 |
R1/LOA/Jockey | 0 | 0 | 0 |
R2/R4/NeSL | 0 | 0 | 0 |
RTE/Bov-B | 134 | 90188 | 0.24 |
L1/CIN4 | 673 | 1083305 | 2.94 |
LTR elements: | 3213 | 2904592 | 7.87 |
BEL/Pao | 0 | 0 | 0 |
Ty1/Copia | 2460 | 2066099 | 5.6 |
Gypsy/DIRS1 | 696 | 740442 | 2.01 |
Retroviral | 57 | 98051 | 0.27 |
DNA transposons | 1535 | 1175672 | 3.19 |
hobo-Activator | 0 | 0 | 0 |
Tc1-IS630-Pogo | 468 | 505134 | 1.37 |
En-Spm | 0 | 0 | 0 |
MULE-MuDR | 1052 | 667095 | 1.81 |
PiggyBac | 0 | 0 | 0 |
Tourist/Harbinger | 0 | 0 | 0 |
Other (Mirage, P-element, Transib) | 0 | 0 | 0 |
Rolling-circles | 0 | 0 | 0 |
Unclassified | 5954 | 2084549 | 5.65 |
Total interspersed repeats | | 8957808 | 24.29 |
Small RNA | 357 | 143706 | 0.39 |
Simple repeats | 1295 | 85889 | 0.23 |
Low complexity | 164 | 7310 | 0.02 |
A total of 1039 SSRs were revealed with a relative abundance of 28.16 SSRs/Mb of the genome of the test race. The most frequent motifs were mono- (63.6%) followed by di- (22.1%), tri- (8.7%), tetra- (2.1%), penta- (0.1%), and hexa-nucleotides (3.3%). The genome sequence also yielded 108 compound SSRs. The percentage frequency of SSRs based on the nucleotide size revealed that SSRs composed mostly of smaller repeats, while SSRs with large repeats represented a smaller percentage. The distribution of the different SSR types is heterogeneous, particularly in mono- and di-nucleotides. Among P1, A/T (77.7%) was most frequently occurring and were also the most frequent motif in the entire genome followed by G/C (22.2%). Among P2, the highly distributed motifs were AG/CT (32.6%), AT/AT (31.7%) and AC/GT (26.5%). While in P3, motifs AAG/CTT (32.9%) and AAC/GTT (19.7%) were the most abundant. Among P4, ACAT/ATGT (54.5%) was copious, in P5 the only representative sequence was AAGAT/ATCTT and in P6, motif AATGTG/ACATTC (41.1%) was mostly present (Fig. 3).
The present assembly was also compared with the Ac2vPB and Ac2VRR assembly for variant analysis. A total of 1,24,974 variants were identified in the genome against Ac2vPB, out of which 1,24,923 were SNPs and 51 were InDels, while 92,444 SNPs and 52 InDels were found against the Ac2VRR assembly. This is an indicative of variation that exists between the two isolates delimited by the geographical barriers, even though they belong to the same race (race 2). Therefore, suggesting the highly variable nature of the A. candida pathogen depending upon the place of origin.
Gene prediction and functional annotation
Scaffolds arising from the primary assembly were repeat masked and further comprehensive gene predictions in the assembly was carried out using the BRAKER2 program followed by functional annotation using eggNOG emapper v2.1.8 and InterProScan v5.55-88.0 to predict protein domains, GO terms and putative pathway. A total of 13,715 genes was predicted in the Indian isolate of A. candida, which is higher than the predicted genes in Ac2vPB assembly (13,073), however, surprisingly fewer than Ac2VRR (15,826). Out of the predicted 13,715 genes, 11,556 genes were annotated with GO, COG, KEGG, CAZy and Pfam databases accounting for 84.2% of the total predicted genes (Table 3). tRNAscan-SE, Rfam and Rnammer were used to predict tRNA, snRNA and rRNA, respectively. Results showed that the genome contained 314 transfer RNAs, 22 snRNAs, and 66 ribosomal RNAs represented by 48 number of 8S subunit while, only 6 number of 28S and 18S subunit each. GO analysis for the genome revealed that the top five enriched GO term were cell, cellular process, metabolic process, and binding, with 4419 genes, 4214 genes, 3517 genes, and 3468 genes, respectively (Fig. 7).
Table 3
Functional annotation summary of A. candida genome from different databases
Database | Number of genes | Percentage |
GO | 4698 | 34.2% |
KEGG | 6357 | 46.3% |
COG | 7358 | 53.6% |
Pfams | 9602 | 70.0% |
CAZy | 231 | 1.7% |
PHI | 5922 | 43.2% |
BiGG | 128 | 0.9% |
Reactome Pathway | 8783 | 64.0% |
MetaCyc Pathway | 7288 | 53.1% |
TMHMM | 2209 | 16.1% |
Among 31 KEGG pathways, the top five pathways were ‘Global and Overview’, ‘Signal transduction’, ‘Organismal system’, Transport and catabolism’, and ‘Cell growth and death’, including 2826 genes, 2821 genes, 1480 genes, 1117 genes, and 1082 genes, respectively (Fig. 8). According to the results of gene set enrichment analysis for top KEGG pathways, a significant proportion of genes in A. candida Indian isolate are likely involved in metabolism, environmental information processing and organismal systems.
Through COG mapping, 7358 proteins were assigned to different COG categories (Fig. 9A). The most gene-abundant classes in the COG groups included ‘Replication and repair’, with the highest number of genes (897), followed by ‘Post-translational modification, protein turnover, chaperone functions’, ‘Translation, ribosomal structure and biogenesis’, ‘Signal Transduction’, ‘Intracellular trafficking and secretion’, ‘Amino acid metabolism and transport’ and ‘Carbohydrate metabolism and transport’. These findings emerged as indicative of an enriched and diverse set of proteins associated with metabolism and secretion functions. These proteins perform a central role in enhancing the absorption and metabolism of nutrients from substrates, thereby fostering infection in the host plant. CAZy database was used on the predicted fungal genes to annotate the carbohydrate-active enzyme (CAZymes). A total of 231 genes were assigned to CAZymes families out of which the highest percent was represented by glycoside hydrolases (GHs, 49.1%), followed by glycosyl transferases (GTs, 37.3%), carbohydrate-binding modules (CBMs, 5.5%), polysaccharide lyases (PLs, 4.5%), and carbohydrate esterases (CEs, 3.6%) (Fig. 9B). GHs are usually predominant in majority of fungal pathogens, their function to break down glycosidic bonds is an indicative of their diverse role in the fungal pathogenesis like breakdown and modification of the plant cell wall components. Especially for a biotroph like A. candida which depends on its host for nutrition, GHs can play a major role in cell wall manipulation allowing development of haustoria. An inclusive analysis of these CAZys can give valuable information to discern the host-pathogen interaction, ultimately contributing towards our understanding of disease establishment. Homology search for pathogenicity-related genes among 13,715 predicted genes revealed 9152 genes that aligned, with the threshold of Evalue < 1e − 05, to the PHI database. These pathogenesis-related proteins were predominantly categorized in six groups considering their functional characteristics and distinctive traits. Most PHI genes were categorized under reduced virulence (3732 genes), followed by unaffected pathogenicity (1175 genes), loss of pathogenicity (555 genes), lethal (210 genes), increased virulence (179 genes) and least under plant_avirulence_determinant (71 genes).
Secretome and effector diversity
Of the total 13,715 protein encoding genes, SignalP v6.0 predicted 688 secretory proteins with signal peptides (SP) in the N-terminal region. This repertoire of secretory protein appears to be smaller than that identified from the transcriptome of the Ac2VRR (929 secreted proteins) and Ac2VPB (1,104 secreted proteins) isolates of A. candida [19, 20]. The predicted secretory proteins were then subjected to TMHMM that identified 526 secretory proteins without transmembrane domain from the protein dataset. Removing the rest 162 proteins that had one or multiple transmembrane domains the predicted secretory proteins were then classified using EffectorP, which predicted 355 effector proteins. Among these effector proteins, 304 were categorized as cytoplasmic effectors (85.6%) and 51 as apoplastic effectors (14.4%). The predicted effector proteins were then annotated against Pfam, GO, KEGG, COG, PHI and CAZy database to evaluate their role in pathogenesis. Annotation of the 355 effector proteins showed only 125 proteins with identity and description while rest 230 proteins were novel and were further classified based on the motifs into RxL and CCG types.
Out of the 125 annotated effectors, 64 proteins aligned to PHI-base at 1e-05 under different categories such as reduced virulence, unaffected pathogenicity, loss of pathogenicity, hypervirulence and avirulence effector. Of the rest 61 proteins, 4 were classified under CAZy as glycoside hydrolase and glycosyl transferases, nine were identified to have plant cell wall modification activity out of which three were identified to be pectate lyase while five were polygalacturonase and only one was found to have cellulase activity. Moreover, the results showed three cysteine rich secretory proteins that are known to be responsible for eliciting defense response in non-host plants [21, 22, 23].
Among the crinkling and necrosis (CRN) class effectors six proteins were identified that had amino-terminal “LYLAK” instead of “LFLAK” domain similar to the CRN effector identified in Phytophthora sojae (Fig. 4) [24]. However, Interestingly, out of the six CRN proteins, only one protein was identified as effector while the rest five had no signal peptide sequence suggesting their non-secretory nature. Apart from CRNs we also identified six secretory proteins without transmembrane domain annotated as elicitins through InterProScan. Elicitins are a highly conserved group of secreted proteins and includes members such as INF1, a 10 KDa protein from P. infestans. The presence of six cysteine residues is a common feature of elicitins and has been used to mine databases revealing 156 elicitins (ELI) or elicitin-like (ELL) genes from seven Phytophthora species [25].
Cellulose Binding Elicitor Lectin (CBEL) is a cell wall glycoprotein first reported from P. parasitica var. nicotianae that is responsible for eliciting a defense response in the host Nicotiana benthamiana [26]. CBELs from Phytophthora species generally has a secretion signal followed by interleaved Cellulose Binding (CBD) and Apple domains which mediate protein-protein or protein-carbohydrate interactions. Links et al. (2011) reported two CBELs in their Ac2VRR genome. Ac2VRR-CBEL1 had a signal peptide while Ac2VRR-CBEL2 did not contain a signal peptide. However, three CBELS were identified in the present assembly containing signal peptide with no transmembrane domain and having sequence similarity to the CBEL reported from P. parasitica. The results also showed two effector proteins with cellulose binding domain following the signal peptide but no apple domain.
The search for RXLR effector candidates among the 290 novel effector proteins against the RdEER.hmm, showed no hits for homology suggesting a lack of the commonly found RxLR motif in A. candida [27]. However, Links et al. (2011) reported a small group of genes that encode secreted proteins with a variant RxLR motif (Ac-RxL). Therefore, we build a MEME motif using sequences of “Ac-RxL” effector proteins reported in Ac2VRR assembly. The motif “RSLRRSTE” was then used to search in our list of putative effector proteins through MAST. As opposed to 26 predicted RXL effector identified in Ac2VRR assembly, in the present assembly only 3 gene moles were predicted which contained a putative sec-dependent signal peptide, lacked homology to known proteins, and had an Ac-RXL motif R(SQK)L(FRKE) near the N terminal (Fig. 5).
CHxC effectors are another conserved effector motif commonly present in most of the Oomycetes. However, Furzer et al. (2022) reported that compared with the consensus motif from AlNc14 (A. laibachii) genome, the previously reported histidine residue in the CHxC motif was less conserved in Ac2VPB, but a glycine six amino acids after the second cysteine was highly conserved, making the consensus motif CxxCxxxxxG. They redefined the effector class as “CCGs” in A. candida. CCG type effectors were also reported in the Ac2VRR assembly. Therefore, based on the previously identified CCG class effectors in Ac2VRR and Ac2VPB genome, a motif was generated through MEME (Fig. 6). Using the predicted effectors as input file in MAST, a total of 13 CCG protein candidates were identified that showed no homology to other protein, accounting for 3% of the secretome. This number is far less than the, 110 CCG candidates identified in Ac2vPB assembly and 40 in Ac2VRR, this could be an indicative of the difference in virulence between the two variants, suggesting a low degree of virulence in the Indian isolate as compared to its Canadian variant. Another reason for the difference in the numbers of CCG and RxLR effectors could be that, in the case of the Canadian isolate, the effectors were predicted from the transcriptome dataset. In contrast, in the present study, the proteins predicted from the genome were used as the query for effector prediction, although the same pipeline and tools were used as in previous studies. Rest of the secretome had no known motif and need to be further evaluated for their role in pathogenesis.
Genome-level phylogenetic analysis
Although previous studies have revealed the phylogeny and genetic diversity of A. candida, most of them have been based on the sequence alignment of nuclear ITS or mitochondrial COX [12, 28]. In current investigation, the phylogenetic position of A. candida race Ac2v Delhi isolate was elucidated by constructing genome-based phylogeny using average nucleotide identity (ANI). It is a robust tool that provides reliable evolutionary relationships among species by calculating the nucleotide-level genomic similarity [29]. As the ANI values remain unaffected by horizontal gene transfer or variable evolutionary rates, it serves as a fine indicator of the relatedness among the coding region of two genomes [30]. Typically, an ANI value surpassing 95% is widely accepted and is a benchmark for species classification and clustering. Conferring from the results as shown in Fig. 10, the ANI values varied from 75.2–99.8%. Among Albugo genus the present assembly had high average nucleotide identity with race Ac2vPB (99.64%) and Ac2vRR (99.47%) therefore clustering them together in the phylogeny tree (Fig. 11), followed by race Ac7v (98.5%), while for Albugo laibachii race AlNc14, infecting Arabidopsis thaliana, the genome showed low ANI value (80.23%), confirming the difference in species. The phylogeny confirmed that the most prevalent race of A. candida infecting B. juncea in India is a variant of race 2 rather race 7 which also infects the same host plant. Although the difference in ANI values among the A. candida isolates might not seem much, but as low as 0.36% between Ac2vPB and Ac2v Delhi variant suggest that the genome have frequently undergone wide range of re-arrangements rather than indels accumulating to huge variations thus evidencing towards its generalist behaviour. This is also suggested by identification of new contigs in case of the Ac2v Delhi assembly, as the reads were mapped to the reference genome however, when assembled into contigs the BLAST result showed no sequence similarity.