A retrotransposon insertion caused epr pseudogenization in Parasponia species
The genus Parasponia comprises five species, for three of which genome sequence data have been generated: P. andersonii, P. rigida, and P. rugosa [4]. Earlier analysis revealed that these Parasponia species, as well as close relatives of the genus Trema, possess a single LjEPR3/MtLYK10 orthologous gene named EXOPOLYSACCHARIDE RECEPTOR (EPR). P. andersonii and P. rigida EPR accumulated different mutations in the first exon that disrupt the predicted open reading frame (ORF), whereas P. rugosa EPR experienced a large deletion affecting exons 1 to 5 (Table 1). As these mutations in EPR are not shared between the three Parasponia species, they must have occurred independently. This may suggest that the loss of EPR in Parasponia is the result of genetic erosion rather than specific selection. Alternatively, a shared but as yet unknown mutation occurred in the non-coding region of the gene, affecting its function.
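The notion of a disrupted ORF used above can be illustrated with a minimal Python sketch. It is not the procedure used in this study; the function name `orf_status` and the toy sequences are hypothetical and chosen only for illustration. The sketch assumes a coding sequence given 5' to 3' and flags a missing start codon, a length that is not a multiple of three, or an in-frame stop codon before the final codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_status(cds: str) -> str:
    """Classify a coding sequence as 'intact' or report the first problem found.

    Hypothetical helper for illustration only; real pseudogene annotation
    also has to consider splice sites and comparison with orthologs.
    """
    seq = cds.upper().replace("-", "")  # drop alignment gaps, if any
    if not seq.startswith("ATG"):
        return "no start codon"
    if len(seq) % 3 != 0:
        return "frameshift (length not a multiple of 3)"
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    # Any stop codon before the last codon truncates the protein.
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            return f"premature stop at codon {index + 1}"
    if codons[-1] not in STOP_CODONS:
        return "no terminal stop codon"
    return "intact"


if __name__ == "__main__":
    # Toy sequences, not taken from any EPR gene.
    print(orf_status("ATGGCTGAATGGTAA"))   # unmodified toy ORF
    print(orf_status("ATGGCGAATGGTAA"))    # one base deleted
    print(orf_status("ATGGCTTGATGGTAA"))   # nonsense mutation
```

A single-base deletion changes the sequence length and is reported as a frameshift, whereas a substitution that creates an in-frame stop is reported with its codon position.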
Table 1: Independent mutations in the presumed coding region of the epr pseudogene of three Parasponia species
Species | Gene name | Mutation in CDS | Encoded protein | GenBank

To find evidence for this latter scenario, we investigated the putative promoter region of EPR in Parasponia and Trema species. In L. japonicus, the functional promoter region of LjEPR3 is relatively short, spanning only 329 bp upstream of the translational start codon [7]. We analysed the EPR promoter region in three Parasponia and two Trema species. The alignment of these promoters revealed a large 5.7 kb insertion in all three Parasponia species, just 154 bp upstream of the predicted translational start codon (Fig. 1A; Supplemental data file 1). Homology searches using BLAST revealed that this insertion represents a unique TY3-GYPSY-type retrotransposon element, which occurs only as a single copy in the genomes of the three Parasponia species, whereas it is absent in Trema. We compared the expression of the P. andersonii epr pseudogene to that of close homologs of the LysM-type receptor kinase (LYK) family [13]. No epr expression was detected in any of the samples, including roots and nodules at different stages of development (Figure S1). This further supports the hypothesis that the retrotransposon insertion in the putative regulatory region of EPR could have been instrumental in the pseudogenization of this gene in the Parasponia lineage.
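How an insertion and its position relative to the start codon can be read from a promoter alignment is sketched below in Python. This is an illustrative simplification, not the analysis pipeline used here. The function name `largest_insertion` and the toy sequences are hypothetical, and the sketch assumes a pairwise gapped alignment of the upstream region whose last column lies immediately before the translational start codon.

```python
import re


def largest_insertion(reference_aln: str, query_aln: str):
    """Find the longest run of gaps in the reference row of a pairwise alignment.

    Returns (insert_length, downstream_length), where insert_length is the
    number of query bases opposite the gap run and downstream_length is the
    number of query bases between the insertion and the end of the alignment.
    Returns None if the reference row contains no gaps.
    """
    if len(reference_aln) != len(query_aln):
        raise ValueError("aligned rows must have equal length")
    runs = list(re.finditer(r"-+", reference_aln))
    if not runs:
        return None
    longest = max(runs, key=lambda m: m.end() - m.start())
    # Count only real bases, ignoring any gaps in the query row.
    inserted = query_aln[longest.start():longest.end()].replace("-", "")
    downstream = query_aln[longest.end():].replace("-", "")
    return len(inserted), len(downstream)


if __name__ == "__main__":
    # Toy alignment: the query carries 4 extra bases, 6 bases before the end.
    trema_like = "ACGT----ACGTAC"
    parasponia_like = "ACGTTTGGACGTAC"
    print(largest_insertion(trema_like, parasponia_like))
```

Applied to an alignment with a Trema promoter as reference and a Parasponia promoter as query, the two returned values would correspond to the insertion size and its distance upstream of the start codon, which are reported above as 5.7 kb and 154 bp.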
Trema EPR is expressed in rhizobium-induced nodules
In L. japonicus, the LjEPR3 promoter possesses putative binding sites for the nodulation-specific transcription factors NIN and ERN1 [7]. We analysed the putative promoter region of Trema and Parasponia EPR using MEME combined with manual curation [14, 15]. This predicted the occurrence of conserved putative transcription factor binding sites for ERN1 (1x) and NIN (3x) in both Trema and Parasponia EPR promoters, within a confined ~500 bp region (Fig. 1B). This may suggest that transcriptional regulation of EPR3 orthologous genes is conserved in legumes and non-legumes. As the putative NIN and ERN1 binding sites are also present in the T. orientalis EPR promoter, we asked whether Trema EPR still possesses a nodule-enhanced expression profile, despite the loss of the nodulation trait.
To find support for the functioning of EPR in nodulation in a Trema-Parasponia ancestor, we generated transgenic P. andersonii lines carrying a TorEPR promoter-GUS reporter construct (pTorEPR:GUS). As a putative promoter, a fragment of 1,730 bp upstream of the translational start codon was used, which includes the putative NIN and ERN1 binding sites. Two independent transgenic lines were studied. GUS staining of root tissue did not reveal any blue coloration. Subsequently, plantlets (2x n = 10) were inoculated with the compatible strain Bradyrhizobium elkanii WUR3 [16] and studied 4 and 8 weeks post-inoculation. TorEPR promoter GUS activity was observed in rhizobium-induced cell divisions (Fig. 2A,B), which in P. andersonii occur in the root epidermis and outer cortical cell layers [17]. In mature nodules, pTorEPR:GUS-induced blue staining was confined to the meristematic zones in the apex of the nodule (Fig. 2C-E). In both cases, the GUS-expressing cells had yet to be infected by rhizobium.
To find additional support for Trema EPR expression in nodules, we studied gene expression in an intergeneric F1 hybrid of the cross P. andersonii x Trema tomentosa. Earlier studies showed that such hybrid plants can be nodulated but are hampered in hosting rhizobium intracellularly [4]. T. tomentosa is an allotetraploid. We analysed available genome sequence data and identified two T. tomentosa EPR genes, which were named TtoEPRa and TtoEPRb (Supplemental data file 2). Next, we studied EPR allele-specific expression in P. andersonii x T. tomentosa F1 hybrid roots and nodules. This revealed nodule-specific expression of TtoEPRa and TtoEPRb, whereas no expression of P. andersonii EPR was detected (Fig. 3).
Taken together, expression analysis of the P. andersonii x T. tomentosa F1 hybrid as well as pTorEPR:GUS reporter studies in P. andersonii confirm that Trema EPR possesses essential cis-regulatory elements allowing nodule-specific expression. This suggests that in a Trema-Parasponia ancestor, EPR functioned in nodulation.
The loss of EPR in nodulating species is specific to the Parasponia lineage
In L. japonicus and M. truncatula, LjEPR3 and MtLYK10 perform essential functions in rhizobium infection, whereas in Parasponia the orthologous gene is pseudogenized. Earlier studies showed that the LjEPR3/MtLYK10 orthologous gene is also absent in the legume Aeschynomene evenia [18]. However, this species possesses a close paralog, possibly evolved as a result of a legume-specific duplication event, that may perform a similar function [9, 18]. To determine whether loss of EPR3 occurred more often in nodulating species, we analysed genome sequences of 34 species: 26 legumes (including A. evenia, L. japonicus, and M. truncatula), 7 actinorhizal plant species that nodulate with Frankia, and P. andersonii. In all species, 1 to 4 putative EPR3 orthologous genes were identified. Many of these gene models were predicted by automated bioinformatics, without manual curation. As LjEPR3, MtLYK10 and TorEPR/Panepr have a conserved gene structure consisting of 10 exons, we used these to manually curate the gene models in other species (Table S1). This revealed that all species investigated possess at least one gene copy that can encode a LysM-type receptor kinase that is comparable in length and structure to LjEPR3/MtLYK10/TorEPR. Subsequent phylogenetic reconstruction, based on a coding sequence alignment and using the close paralogs LjLYS4, LjLYS5, MtLYK11, and PanLYK4 as an outgroup, supported the orthologous relation (Fig. 4; Supplemental data file 3). It also supports the occurrence of a duplication event in the legume Papilionoid subfamily, and the subsequent loss of one copy in the clade formed by Cicer, Medicago, Trifolium, Vicia, and Pisum. As all analysed plant genomes, except Parasponia, possess an EPR3-type gene, we conclude that loss of this gene in nodulating plant species is uncommon.