Comparison of RWP-RK transcription factors between Arabidopsis and legumes uncovers a relationship between nitrogen signaling and nodulation

Background Nitrogen, as a constituent of amino acids and nucleic acids, is an essential macronutrient for all living organisms. The nitrogen-fixation clade (NFC) is a clade, consisting of Fabales, Fagales, Cucurbitales, and Rosales, where all nodulating plants have been originated. The plant-specific RWP-RK family of transcription factors are involved in nitrate responses and play specific roles in nodule inception. In the present study, by investigation of RWP-RKs at genome-wide level and comparative coexpression networks, the roles of RWP-RKs involved in nitrate response and nodulation were analyzed to reveal evolution of RWP-RKs and a relationship between nitrogen signaling and nodulation. Results Here, we systematically investigated 292 RWP-RKs from 26 species of legumes and non-legumes of NFC by whole-genomic analysis and characterized their evolutionary relationships, protein motifs, and gene structures. We compared RWP-RK networks from Arabidopsis thaliana under N-starvation and N-supplementation conditions, as well as transcriptome atlases from Phaseolus vulgaris and Glycine max . This revealed that N starvation, which is essential for nodulation, alters the connectivity of RWP-RKs to other genes, including symbiosis-related genes. Meanwhile, appropriately low concentrations of nitrates stimulate nodulation by regulating RWP-RK expression in P. vulgaris . Conclusions Our comparative evolutionary analysis of RWP-RKs between A. thaliana and legumes revealed the evolutionary features and the relationship between the nitrate signaling pathway in a model organism and nodulation in legumes.

limited N levels by the plant [11,12]. The nitrate signaling pathway and nodulation appear to be involved in controlling processes regulated by RWP-RKs. Some NFC members lack NIN and do not form nodules. To date, the RWP-RK family in the NFC has not been subjected to detailed analysis. The recent sequencing of multiple genomes of legumes and non-legumes capable or incapable of undergoing nodulation has paved the way for comparisons of the RWP-RK family across species [7].
A. thaliana as an ideal model dicot, our understanding of the roles of RWP-RKs in nitrate signaling in A. thaliana could facilitate the improvement of NUE in crops via evolutionary and comparative analysis [13,14]. The construction of coexpression network based on correlated gene expression patterns is a valuable tool for revealing sets of genes that function in specific biological processes [15]. Therefore, combined coexpression network and gene evolution analysis of both model plants and non-model crops should provide new insights into the relationship between the nitrate signaling pathway and symbiosis mediated by RWP-RK family members.
In this study, we constructed a phylogenetic tree of 26 plant species based on genome-wide gene duplication events to explore the evolutionary features of the NFC. We identified 292 RWP-RK family members in the 26 species covering five orders (Fabales, Rosales, Cucurbitales, Fagales, and Brassicales), including 25 species of four orders belonging to the NFC, using A. thaliana (Brassicales, non-NFC) as the outgroup (Table S1) Of the 25 species from the NFC, 21 form nodules, whereas the four remaining species lack the capacity for nodulation (Fig. 1A). A total of 292 RWP-RK domains and 156 PB1 domains has been identified in these analyzed species (Table S1).
RWP-RKs appear to be randomly distributed in each order, indicating that genomewide gene duplication events did not contribute to the expansion of RWP-RKs in specific orders. Moreover, it appears that the RWP-RKs did not expand in nodulated plants ( Fig. 1A and Fig. S1). For example, Begonia fuchsioides (which lacks nodules) contains the second-highest number (22) of RWP-RKs after G. max, and A. thaliana contains 14 RWP-RKs, more than many nodule-forming plants. These results suggest that the expansion of RWP-RKs involved in the nitrate signaling pathway and nodule inception might represent an adaptive response of plants to diverse nitrogen conditions. Compared to RKDs, NLPs (such as NIN, which is specifically required for nodulation in the NFC) contain an additional PB1 domain, which increases their interaction with other proteins [33,34]. Despite the similar median values for each domain number between nodule-forming and non-nodule-forming plants, the numbers of RWP-RK and PB1 domains are more variable in nonnodulating than in nodulating plants, except for the outlier, large number of RWP-RK in G. max (possibly because of its two genome-wide duplications) (Fig. 1B). We speculate that non-nodulating plants require more RWP-RK and PB1 domains to improve NUE due to their inability to fix nitrogen from atmospheric inorganic nitrogen.
We also analyzed the consensus motifs of both the RWP-RK and PB1 domains within each species. The consensus motifs could not be analyzed in some species with few instances of these domains, for example for both the RWP-RK and PB1 domains in Lupinus angustifolius and for PB1 in Vigna angularis and Lotus japonicus (Table   S1). A few species have large insertions in their RWP-RK domains, such as A.

Analysis of RWP-RK and PB1 domains in the NFC
Despite the important roles of RWP-RKs in both nodulating and non-nodulating plants [7,9,10], the features of these proteins within the NFC, such as protein motifs, gene structures, and evolutionary relationships, have not been investigated in detail. We therefore analyzed the consensus domains of RWP-RK and PB1 in nodule-forming and non-nodule-forming plants separately. These domains are strongly conserved but contain some modified insertions at different positions ( Fig.   2A). Alignment of the domains revealed that the 49 th site (K, Lys) and 63 th site (R, Arg) are conserved across all RWP-RKs (Fig. S3) (Fig. S3).
Interestingly, almost all of these modifications occurred in the first amino acid of the conserved RWP-RK motif in nodule-forming Fabales and Cucurbitales plants. To explore the possible origins of all the identified RWP-RKs among the 26 species, we grouped the RWP-RKs into different orthogroups. An orthogroup is defined as a set of genes that descended from a single gene in the last common ancestor of the analyzed species [17]. Except for one A. thaliana RWP-RK (AT4G38340) and six Arachis duranensis RWP-RKs (Aradu.I1BME, Aradu.TG0QF, Araip.377BK, Araip.6M6N8, Araip.YB35N, Araip.YWB61), which lack orthogroups (and are possibly orphan genes), the 285 remaining RWP-RKs clustered into 11 orthogroups containing 1-69 RWP-RKs genes, suggesting that these genes have diverse origins (Fig. 2B and Table S3).
We predicted the length, isoelectric point, and subcellular localization of each RWP-RK using various databases [24,35]. The isoelectric points of the 292 RWP-RKs range from 4.73 for Drydr146S16094 to 10.6 for Aradu.I1BME, and the molecular weights range from 7.14 kDa for Aradu.I1BME to 156.01 kDa for Aradu.G4SB3. Most proteins were predicted to localize to the nucleus (Table S4).

Phylogeny and characteristics of the RWP-RKs
To investigate the phylogenetic relationships among RWP-RK family members in the 26 species, we constructed a phylogenetic tree of 292 RWP-RKs via the neighborjoining method with 1000 bootstrap values. To limit issues related to high divergence between proteins, we selected the RWP-RK domains with 30 additional amino acids upstream and downstream of the domains for alignment and phylogenetic analysis. The tree formed six clades based on the relationships of the NLPs and RKDs in A. thaliana. The NLP subfamily clustered into a single clade with three subclades, NLP-1 (AtNLP1, AtNLP5), NLP-2 (AtNLP6, AtNLP7), and NLP-3 (AtNLP8, AtNLP9), containing all NLPs with PB1 domains. The RKD subfamily includes RKD-1 (AtRKD4, AtRKD5), RKD-2 (AtRKD1, AtRKD2, AtRKD3), and RKD-3 (Fig. 3A).
To learn more about the features of each clade of RWP-RKs in the phylogenetic tree, we investigated their gene structures using GSDS 2.0 [36] and predicted additional conserved motifs using MEME 5.0.1 [25]. Due to the lack of exon information in the annotated gff3 file of Trifolium subterraneum (with 13 RWP-RKs), we omitted those data and used the 279 remaining RWP-RKs for analysis. The average number of exons in the NLP subfamily and RKD subfamily is 4.9 (ranging from 2-13) and 4.0 (1-9), respectively. The number of exons in each subclade of NLP subfamily members is 4.1 for NLP-1 (2-11), 5.3 for NLP-2 (3-8), and 5.6 for NLP-3 (4-13). The number of exons in each subclade of RKDs is 3.7 for RKD-1 (1-6), 5.6 for RKD-2 (2-9), and 3.5 for RKD-3 (3-5) (Fig. 3B and Table S5). Therefore, the average number of exons is higher in the NLPs than in the RKDs.
Of the 50 consensus motifs predicted by MEME, in addition to the three common motifs (including known RWP-RK and PB1 motifs) in both the RKDs and NLPs, the NLP subfamily contains 39 enriched motifs, whereas the RKD subfamily contains only 8 enriched motifs. Each subclade contains some unique motifs, pointing to the diverse roles of RWP-RKs, such as motif #24 in NLP-1, motif #18 in NLP-2, and motif #23 in NLP-3. Based on predictions from NetPhos 3.1 Server [37], motif #24 contains a predicted phosphorylation site (Y) with a score of 0.89; motif #18 contains a predicted phosphorylation site (S) with a score of 1.00; and motif #23 contains multiple predicted phosphorylation sites (S) with high scores. In addition, members of RKD-3 (only in the NFC) contain several unique motifs, including motif #34, motif #37, and motif #41 (Fig. S5, Fig. S6 and Table S6). These unique motifs in each subclade may increase the number of specific interactions with other proteins.
Statistical analysis of intron phases, that is whether the intron disrupts a codon, revealed that RWP-RKs contain the most phase 0 introns, which cause no disruption of a codon), followed by phase 2 and phase 1, which disrupt the codon between bases 2 and 3 or bases 1 and 2, respectively. However, RKDs contain more phase 1 introns than NLPs with phase 1 introns (Fig. 3C). Overall, the results of exon number and intron phase analyses were consistent with the phylogenetic analysis, but there were some exceptions. For example, Cerca58S27147 has more short additional exons than other members of the NLP-1 clade but lacks additional protein motifs, suggesting that additional exons of this gene might play a regulatory role in Cercis canadensis (Fig. S6).
Comparison of RWP-RKs in non-nodulating vs. nodulating plants a.
thaliana has many advantages to study interactions between diazotrophic bacteria and dicots [14], and therefore the knowledge of A. thaliana is helpful to translate biological knowledge from model organisms to crops. To explore the relationship of RWP-RKs in nodulating and non-nodulating plants, we select A. thaliana (non-nodulation), G. max (nodulation), and P. vulgaris (nodulation) for subsequently comparative analysis. We classified the RWP-RKs from A.
thaliana (14), G. max (28), and P. vulgaris (12) into three clades: unique RKDs in G. max and P. vulgaris (Clade I), RKDs in all three species (Clade II, including AtRKD1-AtRKD5), and NLPs in all three species (Clade III, including AtNLP1-AtNLP9). Analysis of gene structure revealed that Clade III genes contain many more exons than those of clade I, which is consistent with the finding that additional protein motifs (such as the important PB1 domain) are essential for the functioning of NLPs (Fig. 4A, Fig. S6 and Table S6). In addition to motif Despite the important roles of nitrates in plant growth, N limitation, including N starvation or low N, is essential for nodulation in legumes [11,12]. To explore the relationship between N limitation and nodulation, we integrated time-series transcriptome datasets from A. thaliana, including root samples treated with KCl (defined as N starvation) and KNO 3 (defined as N supplementation) [38], as well as expression atlases from G. max and P. vulgaris [39,40]. NLP genes are expressed in different tissues, with some specifically expressed in the nodules of G. max and P. vulgaris ( Glyma.02G311000, Glyma.06G000400, Glyma.04G000600, Glyma.14G001600, Phvul.008G291800, and Phvul.009G115800).
These nodule-specific genes clustered together and are closely related to AtNLP1 and AtNLP2 within subclade IIIc (Fig. 4A, C, D). However, AtNLP1 and AtNLP2 showed opposite fold changes in response to N starvation, indicating that regulatory elements have rapidly evolved. Taking the evolutionary relationships and expression patterns of model dicot and legume crops together, we conclude that compared to subclades IIIa and IIIb, the acquisition of additional protein motifs in subclade IIIc has provided the prerequisite for nodule inception and that the further evolution of regulatory elements within IIIc will be crucial for the development of this process.
The connected genes of AtNLPs in coexpression networks provide the prerequisite for nodulation Our results demonstrate that AtNLPs are widely involved in plant responses to N starvation. Weighted correlation network analysis (WGCNA) is a powerful tool for dividing genes with correlated expression patterns into different modules with biological significance [15,27,41]. To investigate the possible relationship between N starvation and nodulation, we explored differences in the modules containing AtNLPs under N starvation vs. N supplementation. We used WGCNA to construct a weighted network from N-starvation datasets with rlog normalization (Fig. S7) implemented in the DESeq2 package [31]. GO enrichment analysis of genes in the biological process category revealed no commonly enriched biological process in the five modules ( Fig. 5F and Table S7).
In the purple module (with AtNLP2, AtNLP4 and AtNLP9), in which genes were continuously upregulated after 20 min under the treatment of N starvation, we detected a high proportion of unique GO terms associated with transport (20/54), such as "calcium ion transport" (GO:0006816) and "calcium ion transmembrane transport" (GO:0070588), which may be related to "calcium spiking", a symbiotic signaling event. Genes in the blue module (with AtNLP3) were relatively upregulated at the early stage (10-15 min of treatment). We discovered uniquely enriched GO terms in the blue module, such as "endocytosis"  (Table S7). Although genes in both modules were highly expressed only at 10-15 min under the treatment of N starvation, the genes enriched for the GO term "symbiosis, encompassing mutualism through parasitism" did not overlap, highlighting the independent roles of genes regulated by AtNLP3, AtNLP6, and AtNLP7 within each module. Also, this result suggested that response to N starvation at early stage may be essential for symbiosis. similarity, changed little (fold change < 2) between N-starvation and Nsupplementation conditions. AtNLP7 is a major regulatory element among NLP proteins [42]. Despite their weak upregulation, AtNLP6 and AtNLP7 might function primarily at the protein level rather than the transcriptional level, as previously reported [42].
We constructed a protein association network of AtNLP3, AtNLP6, and AtNLP7 to genes using STRING 11.0 (Fig. 6C,D Most NLPs (AtNLP2-AtNLP7), especially AtNLP5 from the turquoise module, which were significantly upregulated under N starvation (Fig. 4B), showed higher correlations to more genes under N starvation than under N supplementation (absolute pcc ≥ 0.8) (Fig. S9). AtNLP5 (in the turquoise module) was significantly upregulated after 20 min of N starvation, whereas AtNLP3 (in the blue module) was significantly downregulated (Fig. 5C,E); these opposite expression patterns indicate that AtNLPs play diverse roles under N starvation. AtNLP1, AtNLP8, and AtNLP9 without strong correlation to any other genes (absolute pcc≥ 0.8) under N supplementation (defined as lonely expressed genes) showed differential expression patterns under N starvation vs. N supplementation. The expression patterns of these genes were not significantly correlated to those of other genes under N starvation, indicating that they have lost many connected neighbors.
Among the genes most closely related to nodule-specific NLPs of G. max and P.
vulgaris, AtNLP1 and AtNLP2 had opposite expression patterns. AtNLP2 has many connected neighbors, whereas AtNLP1 has completely lost highly connected neighbors ( Fig. 4 and Fig. S9). AtNLP2 reflects the evolutionary imprint of nodulespecific NLP genes. Taking the expression patterns of AtNLP6/AtNLP7 together, these results suggest that the differential connectivities of the NLPs , along with their differential expression patterns, may be essential for nodulation under N starvation. and PKs connected to NLPs at the transcriptional level (Fig. 7A,B and Table S10).
The highly correlated TFs, TRs, and PKs were enriched in both the same and unique  Table S11). The highly correlated PKs were enriched in "protein phosphorylation" (GO:0006468), "cell communication" (GO:0007154), and "cell surface receptor signaling pathway" (GO:0007166), indicating that N starvation influences plant cellcell signaling pathways mediated by NLPs (Fig. 7E and Table S11).
To further explore the possible roles of RWP-RKs in nodule-forming plants, we constructed networks for G. max and P. vulgaris. P. vulgaris contains 9 RWP-RKs, including 7 NLPs and 2 RKDs, which are highly correlated to 130 regulators, with 1,639 edges. G. max contains 16 RWP-RKs, including 14 NLPs and 2 RKDs, which are highly correlated to 270 regulators, with 4,113 edges (Fig. 8A,B and Table S12).
Recent whole-genomic duplication event within G.max affecting RWP-RKs have increased the number of edges of RWP-RKs connected to gene regulators compared to P. vulgaris. Among these RWP-RKs, Phvul.009G115800 and Phvul.008G291800 in P. vulgaris and Glyma.04G000600, Glyma.02G311000, Glyma.14G001600, and Glyma.06G000400 in G. max, which are NIN (nodule inception) genes [6], were highly expressed in separate nodules and are located in separate modules, pointing to their overall upregulation in nodule tissue (Fig. 8C, D and Table S12). These nodule-specific NLPs clustered together in clade IIIc.
Two other genes in G. max, Glyma.13G346300 and Glyma.12G050100, which are closely related to AtNLP8 and AtNLP9 within clade IIIb, were highly expressed in nodules. Interestingly, Phvul.005G15510 in clade IIIb was expressed at higher levels in ineffective N-fixation nodules than in effective N-fixation nodules. This observation suggests that the regulation of NLPs from clade IIIb is also influenced by rhizobia without the capacity for N fixation and that NLPs from clade IIIc are critical for N-fixing nodules.
We also detected divergence within IIIc: Phvul.009G01120, together with Glyma.04G017400 and Glyma.06G017800, which are closely related to AtNLP4 and AtNLP5, also showed the highest expression levels in ineffective N-fixation nodules.
Only genes in the small clade closely related to AtNLP1 and AtNLP2 were specifically upregulated in effective N-fixation nodules in both G. max and P. vulgaris (Fig. 4).
These findings reflect the functional divergence of NLPs with regard to nodulation, that is, the existence of nodulation with and without N fixation.

Effect of nitrogen on nodulation via the regulation of NLPs in P. vulgaris
Not only are NLPs involved in the nitrate signaling pathway, but they also influence nodule inception [6]. Whether nitrate influences nodulation via the regulation of NLPs mediating nitrate signaling in P. vulgaris has been unclear. In plants inoculated with Rhizobium tropici treated with different concentrations of nitrate, 10 mM nitrate inhibited nodulation in both early and mature nodules. Mature nodules from plants treated with 5 mM nitrate appeared to be browner, and plant height was taller, than for plants treated with either 0 mM or 10 mM nitrate (Fig. 9A, B, C and   Fig. S10).

Expression analysis of six PvNLP genes in early roots and root-nodule mixtures
under different concentrations of nitrates indicated that Phvul.004G114100, Phvul.008G291800, and Phvul.011G052100 showed higher expression at 5 mM nitrate (low-nitrogen conditions) with inoculation than they did under either 0 mM nitrate (nitrogen-free conditions) or 10 mM nitrate (high-nitrogen conditions) regardless of inoculation (Fig. 10A). When the roots were treated with different concentration of nitrates without inoculation, Phvul.009G011200 and Phvul.009G115800 showed highest expression under low-nitrogen as compared to both nitrogen-free and high-nitrogen conditions, whereas the other genes showed gradual inhibition with increasing nitrogen concentration. Finally, Phvul.008G291800, Phvul.009G011200, and Phvul.011G052100 were significantly inhibited under high-nitrogen conditions (Fig. 10B). In mature nodules, Phvul.007G071900 was significantly upregulated under low-nitrogen vs. nitrogenfree conditions, whereas other genes were downregulated or varied in expression pattern (Fig. 10C).
discussion Nitrogen (N) is an essential element for crop growth, productivity, and grain quality.
N is also a central component of DNA, RNA, proteins, and various metabolic compounds [1]. To biosynthesize the precursors of these biomacromolecules, plants must integrate inorganic carbon (CO 2 ) and nitrogen (NO 3 -/NH 4 + ); however, compared to CO 2 , NO 3 -/NH 4 + is more difficult to obtain from the environment.
Therefore, plant growth is generally nitrogen limited in both natural and agricultural environments [46]. The extensive use of synthetic fertilizers in developed countries is expensive and environmentally damaging. By contrast, in developing countries, the lack of fertilizer causes low crop yields, leading to hunger and malnutrition.
NLPs of RWP-RKs play important roles in nitrate signaling [4,9,47]. Meanwhile, the NFC is a clade originated at 100 million years ago, in which NIN (a member of RWP-RKs) is reported to be essential for nodulation [7,48]. Therefore, comparative research on RWP-RKs in A. thaliana and NFC which involve in both nitrogen signaling and nodulation is crucial for improving sustainable agriculture and human health now and in the future.

Distribution and features of RWP-RKs in NFC
In contrast to animals, the success of angiosperms is partially attributed to innovations caused by gene or whole-genome duplications [49]. An accurate phylogenetic tree of species is required for evolutionary comparisons of RWP-RKs. It is impossible to deduce a phylogenetic tree from one-to-one corresponding orthologues in that few of these orthogroups in 26 species contain just one orthologue from each species. Therefore, we used a newly developed method for phylogenetic tree construction based on the analysis of gene duplications for all genes [50]. In the phylogenetic tree (Fig. 1A), species from separate orders clustered together, regardless of their ability to undergo nodulation. These results suggest that the genome-wide duplication of genes is not the direct driver of the evolution of nodulation across the NFC. This is also supported by the finding that the evolutionary relationship between polyploidy and nodulation is not sufficient to make a species able to form nodules [51]. The presence of RWP-RKs with multiple origins (Fig. 1B) might help these species adapt their growth and metabolism in response to fluctuations in nitrogen availability in different habitats, as legumes are exceptionally ecologically diverse [52,53].
In both RKDs and NLPs, there are conservative and non-conservative amino acid sites ( Fig.2A, Fig.S3 and Fig.S4), which appears to be not relevant to status of nodulation across the species of NFC. However, the 49 th site (K, Lys) and 63 th site (R, Arg) are conserved across all RWK-RDs, suggesting that these sites are vital for RWP-RK activity. Also, completely conserved sites (4 th K, 13 th R, 62 th D and 75 th D) in PB1 domain is conserved in animals, fungi, amoebas, and plants indicating in diverse biological processes beside plant-specific nitrate signaling and nodulation [54]. In addition, variation of motifs of both the RWP-RK and PB1 domains caused by insertion/deletion may provide novel regulatory function for different RWP-RK members. For example, AtNLP8 with insertion in the RWP-RK domain functions as a master regulator of nitrate-promoted seed germination and is activated by nitrate via a mechanism different from AtNLP7 without insertion [4]. Whether this insertion within RWP-RK of AtNLP8 is involved in this specific mechanism (as is the case for AtNLP6) remains to be explored. Variations in the domain lengths of RWP-RK and PB1 and specific amino acid sites (resulting in noncanonical domains) might mediate noncanonical interactions during various biological events [54]. The diversity of RWP-RKs within species indicates that these proteins play diverse roles in plant responses to nitrates.
Phylogeny of all RWP-RKs from 26 species based on protein sequences showed six subclades into two clades (RKDs and NLPs) (Fig.3A). The variations in exon number, leading to more protein motifs in NLPs than in those in RKDs (Fig.3B), highlight the functional novelty of the RWP-RK subclades, as these variations are usually accompanied by the additions/deletions of functional motifs. Meanwhile, high numbers of exons increase the chances for alternative splicing of precursor mRNAs, which is important for gene regulation [55]. In addition, analysis of intron phases of the six subclades (Fig.3C) suggested that generally number of exons with phase 0 introns for RWP-RKs is higher than those with phase 0 and phase 1 introns. There is an excess of phase 0 introns also found in other cases, supported by prediction of the exon theory of genes [56]. However, the RKDs has more exons with phase 1 introns than NLPs with phase 1 introns, which indicates different origins of intronexon structures for RKDs and NLPs, respectively [57].

NLPs
Over the past few decades, the model dicot A. thaliana has become an excellent model for studying symbiosis in non-nodulating plants [14]. The most important nodule-bearing legume crops such as G. max and P. vulgaris have not been studied in as much detail as A. thaliana. RWP-RKs play important roles in both nitrate responses and nodule inception, and they interact with each other to coordinate nitrate signaling and nodulation [6]. Therefore, we felt that a comparison of RWP-RK expression and homology in A. thaliana vs. legume crops (G. max and P. vulgaris) would allow gene functional analyses in model organisms to be applied to noduleforming crops. Such studies might also facilitate the transfer of the nitrogen-fixation trait into non-nodulating plants to improve NUE [3,58]. Interestingly, AtRKD1-AtRKD4 are highly expressed in reproductive organs, and AtRKD5 has pleiotropic effects on phytohormone pathways, highlighting the regulatory importance of AtRKDs in female gametophyte development [59,60]. Finally, almost RKDs are absent from the separated transcriptome atlas of G. max and P. vulgaris due to the lack of gametophyte tissues (Fig. 4C, D). Together, our results indicate that the RKDs and NLPs show significant functional diversity in many biological processes.
At the post-transcriptional level, AtNLP7 localizes to the cytoplasm after N starvation, whereas it relocates to the nucleus after N supplementation [42]. Here, Modules including different AtNLPs provide differential prerequisites for nodulation.
GO term of "endocytosis" (GO:0006897) in the blue module results in the formation of organelle-like structures known as "symbiosomes" during early rhizobial invasion.
The term "response to cytokinin" (GO:0009735) in the blue module is also essential for the initiation of nodulation. Auxin signaling is required for rhizobial infection in

Simulation of low nitrogen on nodulation mediated by NLPs
It is reported that high nitrate repressed nodulation while nodulation only occurred under low nitrates or free nitrates [11,12,62]. In our experiment, we found the effects of nitrate on nodulation were possibly mediated by regulation of NLPs in P.  S11). This finding, combined with the uniquely high expression pattern of Phvul.007G071900 in mature nodules under low-nitrogen conditions (Fig. 10C), indicates that Phvul.007G071900 is involved in integrating nitrate signaling and nodule symbiosis in P. vulgaris.
Finally, we determined that Phvul.007G071900 and NRSYM1 are closely related to AtNLP6 and AtNLP7 within clade IIIa (Fig. 4A and Fig. S11). This finding points to the functional similarity of these proteins from clade IIIa in integrating the nitratesignaling pathway and nodulation. This notion is supported by the observation that transformation with AtNLP6 or AtNLP7 partially rescues the nodulation phenotype of the L. japonicus nrsym1 mutant [47]. Indeed, AtNLP6 or AtNLP7 might be involved in integrating symbiosis and nitrate signaling under N-starvation. Compared to nitrogen-free conditions, we propose that the differential upregulation of genes under low-nitrogen conditions partially explains the stimulatory effects of lownitrogen treatment on nodulation, which might occur via the upregulation of specific NLPs in P. vulgaris.   [20] used for searching using HMMER 3.2 [21]. After integration of the results from BLASTP and HMMER and removal of redundancy, the results were further checked using NCBI CDD [22] and the SMRT database [23]. Basic bioinformatics analysis of various features of the proteins including the molecular weight (MW), isoelectric point (pI), and length were performed using ExPASy (http://www.expasy.ch/tools/pi_tool.html), and the subcellular localizations of the proteins were predicted using BUSCA [24]. Additional conserved motifs were identified using the MEME 5.0.1 package [25] with the parameters minimum motif width, 6; maximum motif width, 50; and maximum number of motifs, 50.
Phylogenetic analysis of RWP-RKs Availability of data and material The plant samples that are used in the present study are available from the corresponding author on reasonable request. The genome information in our study can be found in Table S1 in the supplemental data.  Figure 1 Species tree based on genome-wide duplication events constructed using the maximum-likel Comparison of gene expression patterns and homology among Arabidopsis thaliana, Phaseolu Relationships of AtNLP3, AtNLP6, and AtNLP7 to genes involved in symbiosis (GO:0044403) a Expression analysis of selected RWP-RK genes in early roots, early nodules, mature roots, an

Supplementary Files
This is a list of supplementary files associated with the primary manuscript. Click to download. Supplemental_figures_tables_190914.rar