We explored the phylogenomics and methylomics of NLR genes in 41 plant species and found that the expansion and diversification of plant NLR genes have co-evolved with their DNA methylation.
Research Article
Co-evolution of plant disease resistance NLR genes and their DNA methylation
https://doi.org/10.21203/rs.3.rs-2563255/v1
This work is licensed under a CC BY 4.0 License
published 13 Apr, 2023
You are reading this latest preprint version
NLR genes
DNA methylation
evolution
transposons
Disease resistance (R) genes confer resistance to a wide variety of pathogens. Plants use the nucleotide-binding domain and leucine-rich repeat (NLR) family of intracellular receptors to detect the presence of pathogens. After recognizing pathogen invasion, NLR proteins activate the hypersensitive response and a series of immune responses, ultimately leading to the death of infected cells and limiting the proliferation and spread of pathogens (Wang et al. 2020). NLR genes evolve very rapidly in response to constantly evolving pathogens, so their expansion and diversification occur quickly and give rise to multiple distinct evolutionary patterns during plant speciation (Li et al. 2021). Based on their evolutionary rates, NLR genes can be divided into two types. Type I NLR genes evolve fast and have high copy numbers; frequent sequence exchange with other genes gives them extensive chimeric structures, obscure allelic/homologous relationships, and low sequence similarity. In contrast, type II NLR genes are highly conserved, have low copy numbers, rarely recombine with other genes, and evolve independently (Kuang et al. 2004). Regarding the mechanism underlying the difference between type I and type II NLR genes, we found that the G/C content of type I NLR loci in rice is slightly but significantly higher than that of type II NLR loci, while the homology around type I loci is markedly lower than that around type II loci (Luo et al. 2012). However, why the evolutionary patterns of type I and type II NLR genes are associated with G/C content remains unknown.
DNA methylation in the G/C context is a conserved epigenetic modification, and its level in different tissues or cells is tightly controlled throughout the life cycle (Zhang et al. 2018). In mammalian evolution, DNA methylation plays a dominant role in rebalancing gene dosage after gene duplication by inhibiting transcription initiation of the duplicated genes (Chang and Liao 2012). Here we asked whether DNA methylation has a regulatory effect on NLR gene expansion. In this study, we analyzed NLR genes using published bisulfite sequencing data from 41 plant species. We found significant differences in methylation level between NLR genes following the type I and type II evolutionary patterns, which indicates that DNA methylation might affect the expansion and diversification of NLR genes in plants.
We identified a total of 12,381 NLR genes in 41 publicly available plant genomes using HMMER searches (Supplementary Tables 1 and 2). These genes were clustered into orthologous/paralogous groups, representing 509 gene families (Supplementary Table 3). The evaluation of the 41 plant genomes revealed large differences in NLR copy number between genomes (Fig. 1a). NLR families were numbered from 1 to 509 based on the gene copy number in each family. Interestingly, NLR genes from the 20 largest (most duplicated) families accounted for 85.07% of our dataset, showing that these 20 families were highly repetitive (Fig. 1a). To further investigate the evolutionary mechanism of NLR gene expansion, we analyzed the copy number variation (CNV) within each NLR gene family for each species. We defined NLR genes with a copy number greater than 5 as high copy and the remainder as low copy. In the five largest NLR families, high-copy NLR genes were dominant (high-copy: 99.53%; low-copy: 0.47%), indicating that these NLR genes evolved rapidly and were highly amplified. We calculated the pairwise similarity of NLR genes at the two CNV levels for each family and observed that the sequence similarity of high-copy NLR genes was low, while that of low-copy NLR genes was high (p-value = 1.725e-05) (Fig. 1b; Supplementary Table 5), which suggested that the CNV of NLR genes is positively correlated with evolutionary divergence within a given family. Moreover, the sequence difference between high-copy and low-copy NLR genes was consistent with the evolutionary patterns of type I and type II NLR genes. Thus, we defined NLR genes with high copy number (>5) as type I NLR genes and those with low copy number (<=5) as type II NLR genes.
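The copy-number rule above can be sketched in a few lines of Python. This is only an illustration under assumptions: the input layout (a list of species, family, gene tuples), the function name `classify_nlr_genes`, and the toy identifiers are hypothetical and not taken from the study's scripts; only the cut-off of 5 comes from the text.

```python
from collections import defaultdict

# From the text: copy number > 5 is high copy (type I), <= 5 is low copy (type II).
COPY_NUMBER_CUTOFF = 5


def classify_nlr_genes(assignments, cutoff=COPY_NUMBER_CUTOFF):
    """Label each gene by the copy number of its family within its species.

    assignments: iterable of (species, family_id, gene_id) tuples
                 (hypothetical layout, e.g. derived from orthogroup output).
    Returns a dict mapping gene_id to "type I" or "type II".
    """
    members = defaultdict(list)
    for species, family_id, gene_id in assignments:
        members[(species, family_id)].append(gene_id)

    labels = {}
    for genes in members.values():
        label = "type I" if len(genes) > cutoff else "type II"
        for gene_id in genes:
            labels[gene_id] = label
    return labels


# Toy data: family 1 has six copies in sp1 (high copy) but one copy in sp2.
toy = [("sp1", 1, f"g{i}") for i in range(6)]
toy += [("sp1", 2, "h0"), ("sp2", 1, "k0")]
labels = classify_nlr_genes(toy)
```

Note that in this sketch the label depends on the species as well as the family, so the same family can be type I in one genome and type II in another.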
Previous reports suggested that NLR gene expression is regulated by DNA methylation. For example, two tandem miniature inverted-repeat transposable elements (MITEs) in the promoter of the CNL PigmS (ectopic expression of R8) determine pollen-specific expression in rice; these MITEs repress PigmS through RNA-directed DNA methylation (RdDM) during development (Deng et al. 2017). We hypothesized that DNA methylation might also affect NLR gene expansion. To test this hypothesis, we calculated the methylation levels in the upstream regions, gene bodies, and downstream regions of the NLR genes at the two CNV levels for each species, and compared the CG, CHG, and CHH methylation levels of type I and type II NLR genes in the 41 plant species (Supplementary Table 7). Consistent with our hypothesis, 23 species showed significant differences in the upstream, gene body, or downstream regions. CG methylation levels differed significantly between type I and type II NLR genes in 4, 3, and 8 species in the upstream, gene body, and downstream consensus regions, respectively; CHG methylation levels differed in 5, 10, and 2 species, and CHH methylation levels in 3, 3, and 5 species. Subsequently, we singled out the regions that showed significant differences in CG, CHG, or CHH methylation to explore whether type I and type II NLR genes also differed significantly in methylation at the overall level (Supplementary Table 8). The CHH methylation level differed significantly in the gene body and flanking consensus regions, especially in the downstream consensus regions (Fig. 1c). CHG methylation also differed significantly in the upstream and gene body consensus regions (Supplementary Fig. S1b), whereas no marked change was observed in CG methylation (Supplementary Fig. S1a).
In the regions with significant differences, type I NLR genes had significantly higher methylation levels than type II NLR genes, suggesting that DNA methylation could affect the expansion and diversification of NLR genes. As an example, we plotted the CHH methylation levels of Populus trichocarpa across the gene body and flanking regions (Fig. 1d). The methylation level of type II NLR genes was significantly lower than that of type I NLR genes in the gene body and downstream regions, which again supports the view that DNA methylation can affect the expansion of plant NLR genes. Tandem duplication events are major contributors to the expansion of NLR families, and direct tandem duplication may promote unequal crossover and family expansion (Zhang et al. 2016). High expression of plant NLR defense genes is usually fatal to plant cells, and DNA methylation can modulate the expression of NLR genes to buffer the gene dosage imbalance caused by gene duplication, minimizing the fitness cost (Zhang et al. 2016; Xia et al. 2013). However, whether increased methylation is a driver of NLR gene expansion, rather than a result of it, remains to be explored.
DNA methylation is widely recognized to silence transposons. Combined with the above results, this led us to speculate that transposons may play a notable role in the DNA methylation-mediated evolution of NLR genes in plants. We therefore calculated the transposon coverage of type I and type II NLR genes in the upstream regions, gene bodies, and downstream regions for each species (Supplementary Table 9). Transposon coverage differed significantly between type I and type II NLR genes in 3, 7, and 4 species in the upstream, gene body, and downstream consensus regions, respectively. We then singled out the regions that showed significant differences in transposon coverage to explore whether type I and type II NLR genes also differed significantly at the overall level (Supplementary Table 10). Transposon coverage differed significantly in the upstream and gene body consensus regions (Fig. 1e). We next compared the regions with significant differences in methylation levels to the regions with significant differences in transposon coverage (Fig. 1f). Of the regions with significant changes in CHG methylation upstream of NLR genes, 60% also showed a marked difference in transposon coverage (TE region number/methylation region number: 3/5); in the gene body, the corresponding proportion was 40% (4/10). More than 30% of the regions with significant differences in gene body CG methylation and upstream CHH methylation of NLR genes also differed substantially in transposon coverage (Fig. 1f). This suggests that transposons may play an important role in how DNA methylation influences the evolution of type I and type II NLR genes. TE insertion causes local changes in chromatin structure.
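The overlap proportions reported above (for example 3/5 = 60% for upstream CHG methylation) amount to a set intersection. The sketch below assumes that each "region" can be identified by the species in which the difference was significant; the species labels and the function name `overlap_fraction` are hypothetical placeholders, not data from the study.

```python
def overlap_fraction(methylation_hits, te_hits):
    """Fraction of methylation-significant regions that are also TE-significant.

    Both arguments are iterables of region identifiers (here, species labels).
    Returns 0.0 when there are no methylation-significant regions.
    """
    methylation_hits = set(methylation_hits)
    if not methylation_hits:
        return 0.0
    shared = methylation_hits & set(te_hits)
    return len(shared) / len(methylation_hits)


# Hypothetical labels: five species differ in upstream CHG methylation,
# and three of those also differ in upstream transposon coverage.
chg_upstream = {"sp_a", "sp_b", "sp_c", "sp_d", "sp_e"}
te_upstream = {"sp_a", "sp_b", "sp_c"}
fraction = overlap_fraction(chg_upstream, te_upstream)
```

With these placeholder sets the fraction is 3/5, matching the form of the ratio quoted in the text.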
The epigenetic control of TE DNA allows for genome expansion and provides new regulatory functions in most plant species. Cytosine DNA methylation is a hallmark of transposable element silencing. One well-described example is that H3K9me2 levels on Copia-type retrotransposons affect the correct transcription and splicing of NLR genes (Tsuchiya and Eulgem 2013).
In summary, we investigated the association between DNA methylation and NLR genes using the available genomes of 41 land plant species, and concluded that the expansion and diversification of NLR genes have co-evolved with their DNA methylation. Our findings provide new insights into the evolution of NLR genes in the context of DNA methylation.
Acknowledgement
This work was supported by the National Natural Science Foundation of China (32070250), the Natural Science Foundation of Guangdong Province (2020A1515011030) and the open research project of “Cross-Cooperative Team” of the Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences. We thank Yong Feng for providing suggestions. Computational support was provided by the Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture and Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China.
Authors’ contributions
Y. Z. planned and designed the research; Q. C. and S. W. performed the research; Q. C. and S. W. analyzed the data; Q. C. wrote the paper. All authors reviewed and edited the manuscript.
Competing interests
The authors have declared that no competing interests exist.
All the data used in this study are publicly available at NCBI. We selected species with a complete genome, whole-genome bisulfite sequencing (WGBS) data, gene annotation, and protein sequences. We then retained 41 species with a sequencing depth greater than 5 and a methylation mapping rate greater than 60% for subsequent analysis. Bisulfite reads were aligned using HISAT-3N with default parameters (Zhang et al. 2021). We converted the genome-wide cytosine methylation reports into a specific format (chromosome, position, strand, count methylated, count unmethylated, C-context, trinucleotide context) using custom Perl scripts. Transposons were annotated in each genome using the Extensive de-novo TE Annotator pipeline with default parameters (Ou et al. 2019). We removed redundant protein sequences and searched the predicted proteins of each genome with HMMER v3 (Finn et al. 2011), using the hidden Markov model (HMM) of the Pfam (Finn et al. 2014) NBS (NB-ARC) family (PF00931; http://pfam.sanger.ac.uk/). High-quality hits (E-value < 1 × 10⁻¹⁰) were retained after filtering. We clustered the NLR proteins using OrthoFinder (Emms and Kelly 2019) and then obtained the copy number of NLR genes for each species in each family.
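As an illustration of how the seven-column cytosine report described above could be consumed, the following Python sketch sums methylated and unmethylated read counts per C-context within one interval and reports methylated reads divided by all reads. It assumes tab-separated columns in the order listed in the text and 1-based inclusive coordinates; the function name, the toy records, and the choice of this read-weighted summary are assumptions for illustration, not the study's Perl scripts.

```python
from collections import defaultdict


def context_methylation(report_lines, chrom, start, end):
    """Read-weighted methylation level per C-context in chrom:start-end.

    Each line is assumed to hold seven tab-separated fields:
    chromosome, position, strand, count methylated, count unmethylated,
    C-context, trinucleotide context. Coordinates are 1-based, inclusive.
    """
    methylated = defaultdict(int)
    covered = defaultdict(int)
    for line in report_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        c, pos, _strand, n_meth, n_unmeth, context, _tri = line.split("\t")
        if c != chrom or not (start <= int(pos) <= end):
            continue
        methylated[context] += int(n_meth)
        covered[context] += int(n_meth) + int(n_unmeth)
    # Contexts with no covering reads are left out rather than reported as 0.
    return {ctx: methylated[ctx] / covered[ctx] for ctx in covered if covered[ctx]}


# Toy report (made-up counts).
toy_report = [
    "chr1\t10\t+\t8\t2\tCG\tCGA",
    "chr1\t20\t-\t1\t9\tCHH\tCAA",
    "chr1\t30\t+\t3\t7\tCHH\tCTT",
    "chr1\t500\t+\t10\t0\tCG\tCGT",
    "chr2\t15\t+\t5\t5\tCHG\tCAG",
]
levels = context_methylation(toy_report, "chr1", 1, 100)
```

The same function can be called separately for the upstream region, gene body, and downstream region of a gene by passing the corresponding coordinates.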
We counted the frequency of each copy number of NLR genes for each species in each family and calculated the differences in frequency. The value at which the slope of these differences became stable was selected as the cut-off (cut-off = 5; Supplementary Table 5). We defined genes with a copy number greater than 5 as high-copy genes and those with a copy number less than or equal to 5 as low-copy genes. The sequence identity of high-copy and low-copy NLR proteins was calculated using BLASTP. Because one sequence can match multiple sequences, the average sequence identity was calculated using custom scripts. We calculated methylation levels and transposon coverage in the gene body and flanking regions of high-copy and low-copy NLR genes for each species using custom Perl scripts, and assessed significance using t-tests (p-value < 0.01 was considered significant). Methylation levels in the gene body and flanking regions of Populus trichocarpa were determined with ViewBS (Huang et al. 2018) by dividing each region into 100 equal-sized bins and computing weighted methylation levels.
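The binning step can be pictured with a small hand-written sketch. It is not ViewBS code: it only shows, under assumptions, how cytosines in a region might be assigned to equal-sized bins and how a weighted methylation level (methylated reads over all reads in the bin) could be computed per bin. The function name and the toy counts are hypothetical.

```python
def binned_weighted_methylation(sites, start, end, n_bins=100):
    """Weighted methylation level for each of n_bins equal-sized bins.

    sites: iterable of (position, count_methylated, count_unmethylated).
    start, end: 1-based inclusive bounds of the region.
    Returns a list of length n_bins; bins without covered cytosines are None.
    """
    length = end - start + 1
    methylated = [0] * n_bins
    covered = [0] * n_bins
    for pos, n_meth, n_unmeth in sites:
        if not (start <= pos <= end):
            continue
        # Integer arithmetic maps the region onto bins 0 .. n_bins - 1.
        idx = min((pos - start) * n_bins // length, n_bins - 1)
        methylated[idx] += n_meth
        covered[idx] += n_meth + n_unmeth
    return [m / t if t else None for m, t in zip(methylated, covered)]


# Toy region of 1000 bp: two cytosines fall in the first bin, one in the last.
toy_sites = [(5, 3, 1), (8, 1, 3), (995, 0, 4)]
profile = binned_weighted_methylation(toy_sites, 1, 1000)
```

Pooling read counts within a bin before dividing gives highly covered cytosines more weight than averaging per-site ratios would, which is the sense of "weighted" used in this sketch.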
References
Emms DM, Kelly S (2019) OrthoFinder: Phylogenetic orthology inference for comparative genomics. Genome Biol 20:1–14. https://doi.org/10.1186/s13059-019-1832-y
Finn RD, Bateman A, Clements J, et al (2014) Pfam: The protein families database. Nucleic Acids Res 42:222–230. https://doi.org/10.1093/nar/gkt1223
Finn RD, Clements J, Eddy SR (2011) HMMER web server: Interactive sequence similarity searching. Nucleic Acids Res 39:29–37. https://doi.org/10.1093/nar/gkr367
Huang X, Zhang S, Li K, et al (2018) ViewBS: A powerful toolkit for visualization of high-throughput bisulfite sequencing data. Bioinformatics 34:708–709. https://doi.org/10.1093/bioinformatics/btx633
Ou S, Su W, Liao Y, et al (2019) Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol 20:1–18. https://doi.org/10.1186/s13059-019-1905-y
Zhang Y, Park C, Bennett C, et al (2021) Rapid and accurate alignment of nucleotide conversion sequencing reads with HISAT-3N. Genome Res 31:1290–1295. https://doi.org/10.1101/gr.275193.120
Reviewers agreed at journal
07 Mar, 2023
First submitted to journal
14 Feb, 2023
Editor assigned by journal
09 Feb, 2023