a) Alignment of upstream nucleotide and downstream gene sequences
In a previous study on the time course of sunflower infection by Pl. halstedii, gene expression analysis revealed a total of 97 expression clusters comprising 8,444 genes (Bharti et al. 2022 (submitted)). In the current study, sequence alignments of orthologous genes from the genomes of Ph. infestans and Ph. sojae could be obtained for 96 of 97 clusters, using an E-value < 0.01, identity > 70%, and hits on the same strand as criteria. A similar procedure applied to their downstream regions (~25K gene sequences) revealed sequence identity > 50% within the gene sequence alignments (Fig 2a; Additional file 1). For cluster 22, no sequence alignment was obtained. The remaining 96 clusters contained about 4,084 sequences, had between 4 and 689 members each, and comprised sequences of up to 400 nt in length from the ATG start site (Additional file 1).
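The hit-selection criteria stated above (E-value < 0.01, identity > 70%, same strand) can be sketched as a simple filter. This is a minimal illustration, not the pipeline used in the study: the `Hit` record, its field names, and the example values are hypothetical, and the alignment tool that produced the hits is not assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One alignment hit; field names are illustrative, not from the study."""
    query_id: str
    subject_id: str
    evalue: float
    percent_identity: float
    same_strand: bool


def passes_filter(hit: Hit, max_evalue: float = 0.01,
                  min_identity: float = 70.0) -> bool:
    """Keep hits with E-value < max_evalue, identity > min_identity, same strand."""
    return (hit.evalue < max_evalue
            and hit.percent_identity > min_identity
            and hit.same_strand)


# Made-up hits for illustration only (not data from the study).
example_hits = [
    Hit("geneA_up", "Ph_sojae_1", 1e-20, 85.0, True),   # passes all criteria
    Hit("geneB_up", "Ph_sojae_2", 0.5, 90.0, True),     # E-value too high
    Hit("geneC_up", "Ph_inf_3", 1e-8, 65.0, True),      # identity too low
    Hit("geneD_up", "Ph_inf_4", 1e-8, 80.0, False),     # opposite strand
]
kept = [h.query_id for h in example_hits if passes_filter(h)]
print(kept)
```

Under the same assumptions, the > 50% identity reported for the downstream gene sequences would correspond to calling `passes_filter(hit, min_identity=50.0)`.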
b) Reference motifs database generation from orthologous oomycete genomes and motif discovery on clustered orthologous upstream regions intersected with downstream gene sequences
The tool STREME generated a reference motif database containing 15 over-represented, statistically significant motifs (p-value < 0.05). Of these 15 reference motifs, STREME-5 (AGCGCGTG, 622 occurrences) had the most occurrences, followed by STREME-1 (CAGCGGGGCTGCCGT, 369 occurrences) (Table 1; Additional file 2). A total of 245 novel, statistically significant (p-value < 0.05) motifs were found using the tools MEME and STREME (Additional file 3). AME derived the genomic coordinates (p-value < 0.05) for 112 putative, cross-species conserved motifs (STREME-IDs) against the reference motif database (Additional file 3). No significantly enriched motifs were generated for 25 out of 97 expression clusters.
c) Motif categorization
The 245 motifs found were further divided into five categories on the basis of conservation in the upstream genomic alignments. Motifs conserved in all three genomes (Pl. halstedii, Ph. sojae, and Ph. infestans) were assigned to Category A (112 motifs). Category B (76 motifs) comprised motifs conserved in Pl. halstedii and Ph. sojae, Category C (10 motifs) those conserved in Ph. sojae and Ph. infestans, and Category D (4 motifs) those conserved in Pl. halstedii and Ph. infestans; the remaining 43 motifs were assigned to Category E. All motif categories are shown in Fig 2b, and Category A motifs with their genomic positions are shown in Figs 3 and 4 (Additional file 3; Additional Figs 1, 2, 3). The reference gene cluster expression profiles for Category A motifs shown in Fig 3 were adapted from Bharti et al. 2022 (submitted).
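The categorization rule can be written as a lookup keyed on the set of genomes in which a motif is conserved. The sketch below is illustrative only: the function name is hypothetical, and treating every other species combination as Category E is an assumption, since the text does not define Category E explicitly. The reported counts are taken from the paragraph above and checked against the total of 245.

```python
# Species sets for Categories A-D as described in the text.
CATEGORY_BY_SPECIES = {
    frozenset({"Pl. halstedii", "Ph. sojae", "Ph. infestans"}): "A",
    frozenset({"Pl. halstedii", "Ph. sojae"}): "B",
    frozenset({"Ph. sojae", "Ph. infestans"}): "C",
    frozenset({"Pl. halstedii", "Ph. infestans"}): "D",
}


def categorize(species_with_motif):
    """Return the category letter for the genomes in which a motif is conserved.

    Assumption: any combination not listed above falls into Category E.
    """
    return CATEGORY_BY_SPECIES.get(frozenset(species_with_motif), "E")


# Counts as reported in the text; they should add up to the 245 motifs found.
reported_counts = {"A": 112, "B": 76, "C": 10, "D": 4, "E": 43}
total_motifs = sum(reported_counts.values())
print(total_motifs)  # 245
```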
Using sequence similarity-based criteria, the 112 Category A motifs conserved in Pl. halstedii, Ph. sojae, and Ph. infestans were further analyzed for association with genes related to pathogenicity, exocytosis and vesicle transport, ion channels and calcium-binding proteins, plant cell wall degrading enzymes (PCWDEs), and transcription factor (TF) proteins. No motifs were found to be associated with protease inhibitors or ion channels (Table 1).
Table 1 Category A motifs classified according to the gene groups
Groups | Motif_ID (Cluster number [Sequences/Total unique sequences])
RxLRs | MOTIF-60 (36 [2/26]) *, MOTIF-61 (36 [2/24]) *, STREME-3 (90 [1/8]) *
Crinklers (CRNs) | MOTIF-84 (61 [2/14]), MOTIF-85 (65 [3/24]) *, STREME-5 (54 [1/7]) *, MOTIF-38 (19 [2/6]) *, MOTIF-13 (5 [2/15]) *
Proteases | MOTIF-15 (6 [2/21]), MOTIF-17 (7 [2/21]), MOTIF-86 (65 [2/26]), MOTIF-98 (81 [1/10]), MOTIF-103 (88 [3/9]), MOTIF-117 (111 [2/8])
ABC-transporter | MOTIF-78 (56 [1/21])
Calcium binding channel proteins | MOTIF-15 (6 [2/21]), MOTIF-17 (7 [2/21])
Exocytosis and vesicle transport | MOTIF-54 (28 [2/18]), MOTIF-61 (36 [2/24]), STREME-5 (12 [3/17]), MOTIF-85 (65 [2/24]), MOTIF-86 (65 [2/26]), MOTIF-104 (90 [3/15])
Cell-wall degrading enzymes | MOTIF-25 (12 [3/21]), STREME-12 (12 [3/3]), MOTIF-82 (60 [4/20]), MOTIF-96 (75 [2/13]), STREME-14 (23 [2/5])
Transcription factor (TF)-proteins | STREME-1 (54 [2/4]), STREME-13 (54 [2/6]), MOTIF-11,12 (4 [1/6]), STREME-1 (2 [1/9]), STREME-14 (23 [1/5]), MOTIF-64 (41 [3/7]), STREME-12 (3 [3/11]), MOTIF-121 (113 [5/18]), MOTIF-78 (56 [2/21]), STREME-1 (3 [3/4]), MOTIF-60 (36 [3/26]), MOTIF-61 (36 [3/24]), MOTIF-70 (47 [2/18]), MOTIF-23 (12 [2/11]), MOTIF-15 (6 [5/21]), MOTIF-17 (7 [5/21]), MOTIF-67 (45 [2/19])
The upstream motifs and clusters of Category A (Pl. halstedii, Ph. sojae, Ph. infestans) were separated according to gene group. On the basis of a protein sequence similarity search and the mBed-like clustering guide tree generated using Clustal Omega, the protein sequences within the clusters were further divided into sub-groups in order to analyze the orthologous sequence conservation among the gene group(s) within cluster(s). The secretion motif information was adapted from Sharma et al. (2015) and examined for occurrences in the identified RxLR and CRN sequences within clusters. * Presence of a secretion motif in the cluster member(s).
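The Table 1 entry notation, Motif_ID (Cluster number [Sequences/Total unique sequences]) with an optional asterisk for a secretion motif, is regular enough to be read programmatically. The following is a rough sketch under that assumption; the `Entry` type, the `parse_entries` function, and the regular expression are hypothetical helpers written for illustration, not part of the study's workflow.

```python
import re
from typing import NamedTuple


class Entry(NamedTuple):
    motif_id: str          # e.g. "MOTIF-60" or a combined ID such as "MOTIF-11,12"
    cluster: int           # expression cluster number
    sequences: int         # sequences of the gene group carrying the motif
    total_unique: int      # total unique sequences with the motif in the cluster
    secretion_motif: bool  # True if the entry is marked with "*"


# Tolerates optional spaces, e.g. "MOTIF-61(36 [2/24])" or "MOTIF-86 (65[2/26])".
ENTRY_RE = re.compile(
    r"(?P<motif>[A-Z]+-\d+(?:,\d+)*)\s*"
    r"\(\s*(?P<cluster>\d+)\s*"
    r"\[(?P<seqs>\d+)/(?P<total>\d+)\]\s*\)"
    r"(?P<star>\s*\*)?"
)


def parse_entries(cell: str) -> list[Entry]:
    """Parse one table cell into a list of entries; malformed entries are skipped."""
    return [
        Entry(m["motif"], int(m["cluster"]), int(m["seqs"]),
              int(m["total"]), m["star"] is not None)
        for m in ENTRY_RE.finditer(cell)
    ]


crn_entries = parse_entries("MOTIF-84 (61 [2/14]), MOTIF-85 (65 [3/24]) *")
for entry in crn_entries:
    print(entry)
```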
d) Association of TFBS motifs with pathogenicity related genes, exocytosis and vesicle transport, calcium-binding proteins, plant cell wall degrading enzymes (PCWDEs) and transcription factors
Genes in 25 clusters showed significant over-representation of regulatory motifs (46 motifs; p < 0.05) for genes involved in pathogenicity, exocytosis and vesicle transport, ion channels and calcium-binding proteins, plant cell wall degrading enzymes (PCWDEs), and transcription factor (TF) proteins. This implies that genes involved in these biological functions are regulated in a conserved fashion in Pl. halstedii and the two Phytophthora species, in spite of their different genomic locations (Table 1).
e) RxLR and Crinkler (CRN) motifs and their upstream regulatory motifs
The upstream motifs found were MOTIF-60 (TCAWBKNSMRKCYGRD; 29 sites; 26 genes) and MOTIF-61 (ACAAGC; 30 sites; 24 genes) in cluster (cl) 36, and STREME-3 (CTCTGACGCAAAG; 8 sites; 8 genes) in cl 90 (Additional file 4). In the cross-protein sequence similarity analysis, both motifs were found to be associated with previously reported effectors (Fig 4). Interestingly, two RxLR-like protein-coding genes, CEG44730 and PHYSODRAFT_293198, with the putative regulatory motifs MOTIF-60 and MOTIF-61, and the RxLR-like protein-coding gene CEG41057 with STREME-3 upstream were retrieved (Sharma et al., 2015). Further, sequence similarity with the functionally characterized effectors Ph. infestans Avrblb2 and Ph. sojae avirulence homolog-5 (Avh5) revealed the putative location of an alternative RxLR motif (Fig 4). Interestingly, proteins are also conserved among genes within the clusters containing RxLR-like proteins, such as Radial spoke protein 11 and DnaJ heat shock protein.
MOTIF-84 (cl 61; CMCVCTSWRGAY; 17 sites; 14 genes) was found upstream of the conserved CRN domain-containing proteins CEG41789 and PHYSODRAFT_316479, which showed no similarity to previously experimentally validated effectors. Three CRN-like genes, including CEG44506 (crn1 and PsCRN108 like; conserved LQLFLAK domain) and PHYSODRAFT_471561 (crn2 like), with FLAR at the LFLAK position and the significantly over-represented MOTIF-85 (cl 65; CGTM; 28 sites; 24 genes) upstream, were found to be conserved (Fig 5). Two orthologous CRN-like protein-coding genes carrying HVLVVVP-position motifs were conserved for MOTIF-38 (cl 19; AGAAKRYRATCAAGG; 7 sites; 6 genes; HVLVVVP:67) and MOTIF-13 (cl 5; GWCMKDDTTC; 17 sites; 15 genes; HVLVAL:75) (Additional file 4). One CRN-like gene, CEG44994 (Avr4/6 like), with the putative regulatory motif STREME-5 (cl 54; AGCGCGTG; 10 sites; 7 genes; HVLVVVD:37) was found.
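The motif consensus sequences reported here use IUPAC degenerate nucleotide codes (for example, M stands for A or C). As a rough illustration of how such a consensus can be matched against a sequence, the sketch below converts it to a regular expression and reports forward-strand match positions. This is a simplification and not the scoring used by MEME, STREME, or AME, which work with position weight matrices; the function names and the test sequence are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def consensus_to_regex(consensus: str) -> str:
    """Translate an IUPAC consensus string into a regex pattern."""
    return "".join(IUPAC_REGEX[symbol] for symbol in consensus.upper())


def find_sites(consensus: str, sequence: str) -> list[int]:
    """Return 0-based start positions of all (possibly overlapping) matches
    on the forward strand only."""
    # A lookahead lets overlapping sites be reported as separate matches.
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]


# MOTIF-85 consensus from the text against a made-up sequence.
print(consensus_to_regex("CGTM"))          # CGT[AC]
print(find_sites("CGTM", "AACGTACGTCGG"))  # [2, 6]
```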
f) Other groups and the upstream regulatory motifs
Pathogenic microbes produce a variety of peptidases, enzymes that catalyze the breakdown of host proteins into small polypeptides, to disrupt the host defense and create conditions suitable for pathogen colonization (Marshall, Finlay and Overall, 2017; Figaj et al., 2019). In the current study, members of the orthologous ubiquitin-specific protease, OTU (ovarian tumor)-like cysteine protease, and serine protease families were found in some clusters with the following regulatory motifs: MOTIF-15 (cl 6; GSCACCAASYT; 24 sites; 2/21 genes), MOTIF-17 (cl 7; WSAAMTSKCBVC; 24 sites; 2/21 genes), MOTIF-86 (cl 65; YTGACKCAAA; 29 sites; 2/26 genes), MOTIF-98 (cl 81; CKCYGA; 12 sites; 1/10 genes), MOTIF-103 (cl 88; CKYSTCAMHSSCYKC; 10 sites; 3/9 genes), and MOTIF-117 (cl 111; GTCTCSCSKNCCAC; 10 sites; 2/8 genes) (Table 1; Additional file 4).
ATP-binding cassette (ABC) transporter proteins constitute one of the largest protein families, are present in both prokaryotes and eukaryotes, and transport a broad range of substances across biological membranes (Dassa and Bouige, 2001). The regulatory motif MOTIF-78 (cl 56; CDSYCAS; 23 sites; 1/21 genes) was found to be enriched for ABC-transporter proteins (Table 1; Additional file 4).
In oomycetes, cytoplasmic Ca2+ levels are controlled by calcium-binding channel proteins, and channels represent key targets for anti-oomycete fungicides (Judelson and Blanco, 2005). In the current study, the regulatory motifs MOTIF-15 (cl 6; GSCACCAASYT; 24 sites; 2/21 genes) and MOTIF-17 (cl 7; WSAAMTSKCBVC; 24 sites; 2/21 genes) were found to be associated with such proteins (Table 1; Additional file 5).
Exocytosis serves to deliver vesicle contents such as proteases, glucanases, and callose from the pathogen cell to the host extracellular matrix and to the plasma membrane (Leborgne-Castel and Bouhidel, 2014). Rab family GTPases and the transfer proteins SEC20 and SEC14, which mediate export from the endoplasmic reticulum to the Golgi apparatus, were conserved in Pl. halstedii and the two Phytophthora species and were associated with the motifs MOTIF-54 (cl 28; WCGG; 24 sites; 2/18 genes), MOTIF-61 (cl 36; ACAAGC; 30 sites; 2/24 genes), STREME-5 (cl 12; AGCGCGTG; 17 sites; 3/17 genes), MOTIF-85, 86 (cl 65; CGTM, TGCAG; 28 and 29 sites; 2/24 and 2/26 genes), and MOTIF-104 (cl 90; GYTACGGCAGCCCCG; 23 sites; 3/15 genes) (Table 1; Additional file 5).
Phytopathogenic oomycetes enter the plant through multiple routes, and many, including downy mildews, Ph. infestans, and various Pythium species, penetrate the host using appressoria (Judelson and Ah-Fong, 2019). Genes induced at the appressorium stage in Phytophthora species include those encoding cell wall-degrading enzymes (CWDEs), which degrade cellulose, hemicelluloses, xylan, pectin, β-1,3-glucans, and glycoproteins in the plant cell wall (Blackman et al., 2015) so that the pathogen can grow into the host. Conserved glycoside hydrolase-coding genes were associated with the motifs MOTIF-25 (cl 12; AGCTACGGCAGCCCC; 35 sites; 3/21 genes), STREME-12 (cl 12; TCTTCGCCAGGA; 5 sites; 3/3 genes), MOTIF-82 (cl 60; CRKACA; 26 sites; 4/20 genes), MOTIF-96 (cl 75; GTGKCCGT; 14 sites; 2/13 genes), and STREME-14 (cl 23; CCGATGGTCARACAG; 5 sites; 2/5 genes) (Table 1; Additional file 5).
Transcription factors (TFs) are sequence-specific DNA-binding proteins that bind directly to regulatory regions of DNA. TFs regulate the production of virulence factors by modulating gene expression (Charoensawan, Wilson and Teichmann, 2010). Mostly, genes encoding zinc finger and myb transcription factors were found to be associated with the putative upstream TFBS motifs STREME-1,13 (cl 54; CAGCGGGGCTGCCGT, GGCGTCTCRGCGCAR; 4 and 7 sites; 2/4 and 2/6 genes), MOTIF-11,12 (cl 4; GCCAT, CGTACCGG; 7 sites each; 1/6 genes each), STREME-1 (cl 2; CAGCGGGGCTGCCGT; 22 sites; 1/9 genes), STREME-14 (cl 23; CCGATGGTCARACAG; 5 sites; 1/5 genes), MOTIF-64 (cl 41; GAGGCTGCM; 8 sites; 3/7 genes), STREME-12 (cl 3; TCTTCGCCAGGA; 12 sites; 3/11 genes), MOTIF-121 (cl 113; GTGG; 29 sites; 5/18 genes), MOTIF-78 (cl 56; CDSYCAS; 23 sites; 2/21 genes), STREME-1 (cl 3; CAGCGGGGCTGCCGT; 4 sites; 3/4 genes), MOTIF-60, 61 (cl 36; TCAWBKNSMRKCYGRD, ACAAGC; 29 and 30 sites; 3/26 and 3/24 genes), MOTIF-70 (cl 47; CTCC; 20 sites; 2/18 genes), MOTIF-23 (cl 12; ATCCGCAGMCT; 14 sites; 2/11 genes), MOTIF-15 (cl 6; GSCACCAASYT; 21 sites; 5/21 genes), MOTIF-17 (cl 7; WSAAMTSKCBVC; 24 sites; 5/21 genes), and MOTIF-67 (cl 45; GYTSDAGACMRT; 22 sites; 2/19 genes), most of which are highly GC-rich (Table 1; Additional file 5).