Genome-wide identification, phylogeny, and expression profiling of RWP-RK gene family under low nitrogen and nodulation in Arabidopsis and legumes

Background Nitrogen, as a constituent of amino acids and nucleic acids, is an essential macronutrient for all living organisms. The nitrogen-fixation clade (NFC) is a clade, consisting of Fabales, Fagales, Cucurbitales, and Rosales, where all nodulating plants have been originated. The plant-specific RWP-RK family of transcription factors are involved in nitrate responses and play specific roles in nodule inception. In the present study, by investigation of RWP-RKs at genome-wide level and comparative coexpression networks, the roles of RWP-RKs involved in nitrate response and nodulation were analyzed to reveal evolution of RWP-RKs and a possible relationship between nitrogen signaling and nodulation. Results Here, we systematically investigated 292 RWP-RKs from 26 species of legumes and non-legumes of NFC by whole-genomic analysis and characterized their evolutionary relationships, protein motifs, and gene structures. We compared RWP-RK networks from Arabidopsis thaliana under N-starvation and N-supplementation conditions, as well as transcriptome atlases from Phaseolus vulgaris and Glycine max . This revealed that N starvation, which is essential for nodulation, alters the connectivity of RWP-RKs to other genes, including symbiosis-related genes. Meanwhile, appropriately low concentrations of nitrates stimulate nodulation by regulating RWP-RK expression in P. vulgaris . Conclusions Our comparative evolutionary analysis of RWP-RKs between A. thaliana and legumes revealed the evolutionary features and the relationship between the nitrate signaling pathway in a model organism and nodulation in legumes. results suggest that the differential connectivities of the NLPs, along with their differential expression patterns, may be essential for nodulation under N starvation. The association of genes with NLP genes under N starvation, which are related biological processes of symbiosis, cell cycle, cell wall organization or biogenesis, and calcium ion transport, might have paved the way for nodulation in the NFC.

(TFs) that control N uptake efficiency and N utilization by sensing nitrate signals. These plant-specific TFs are grouped into two subfamilies: Nodule inception (NIN)-like proteins (NLPs) and RWP-RK-domain proteins (RKDs). In addition to sensing nitrate signals in non-nodule-forming Arabidopsis thaliana [4] and rice (Oryza sativa) [5], RWP-RKs have specifically evolved with the TF NIN (which is involved in nodule inception) in nodule-forming plants, such as Medicago truncatula [6,7]. Besides the common DNA binding RWP-RK domain, NLPs contain an additional domain known as PB1 (Phox and Bem 1) at their C-termini that allows interactions with additional proteins [8]. For example, NLP1 functions through PB1-mediated interactions with NIN, leading to the suppression of its target gene, CRE1, in Medicago [6]. In A. thaliana, NLPs play a central role in nitrate signaling by binding to the nitrateresponsive cis-elements in their target genes [9].
NIN, an RWP-RK family member, is essential for nodulation in legumes and non-legumes of the NFC [7,10]. Nodule formation is dependent on the perception of limited N levels by the plant [11,12]. The nitrate signaling pathway and nodulation appear to be involved in controlling processes regulated by RWP-RKs. Some NFC members lack NIN and do not form nodules. To date, the RWP-RK family in the NFC has not been subjected to detailed analysis. The recent sequencing of multiple genomes of legumes and non-legumes capable or incapable of undergoing nodulation has paved the way for comparisons of the RWP-RK family across species [7].
A. thaliana as an ideal model dicot, our understanding of the roles of RWP-RKs in nitrate signaling in A. thaliana could facilitate the improvement of NUE in crops via evolutionary and comparative analysis [13,14]. The construction of coexpression network based on correlated gene expression patterns is a valuable tool for revealing sets of genes that function in specific biological processes [15]. Therefore, combined coexpression network and gene evolution analysis of both model plants and non-model crops should provide new insights into the relationship between the nitrate signaling pathway and symbiosis mediated by RWP-RK family members.
In this study, we constructed a phylogenetic tree of 26 plant species based on genome-wide gene duplication events to explore the evolutionary features of the NFC. We identified 292 RWP-RK family members in the 26 species covering five orders (Fabales, Rosales, Cucurbitales, Fagales, and Brassicales), including 25 species of four orders belonging to the NFC, using A. thaliana (Brassicales, non-NFC) as the outgroup (Table S1). We analyzed the physicochemical properties, genomic positions, gene structures, protein motifs, and phylogenetic relationships of these genes. We then constructed comparative coexpression networks of A. thaliana under N-starvation and N-supplementation conditions, as well as coexpression networks of transcriptome atlases from Phaseolus vulgaris and Glycine max containing nodules, to uncover the relationships among RWP-RKs involved in nitrate responses and symbiosis. Quantitative RT-PCR analysis of P. vulgaris under nitrogen-free, lownitrogen, and high-nitrogen conditions shed light on the diverse roles of these RWP-RKs in nitrate signaling and nodulation.

Phylogeny of NFCs and distribution of RWP-RKs
To learn about the distribution of RWP-RKs across the NFC, based on gene duplication events, we built a phylogenetic tree with high bootstrap values for 26 species from three clades, defined as clade I (Brassicales), clade II (Fagales, Cucurbitales, and Rosales), and clade III (Fabales), with A. thaliana as the outgroup. Of the 25 species from the NFC, 21 form nodules, whereas the four remaining species lack the capacity for nodulation (Fig. 1A). A total of 292 RWP-RK domains and 156 PB1 domains has been identified in these analyzed species (Table S1). RWP-RKs appear to be randomly distributed in each order, indicating that genome-wide gene duplication events did not contribute to the expansion of RWP-RKs in specific orders. Moreover, it appears that the RWP-RKs did not expand in nodulated plants ( Fig. 1A and Fig. S1). For example, Begonia fuchsioides (which lacks nodules) contains the second-highest number (22) of RWP-RKs after G. max, and A. thaliana contains 14 RWP-RKs, more than many nodule-forming plants. These results suggest that the expansion of RWP-RKs involved in the nitrate signaling pathway and nodule inception might represent an adaptive response of plants to diverse nitrogen conditions. Compared to RKDs, NLPs (such as NIN, which is specifically required for nodulation in the NFC) contain an additional PB1 domain, which increases their interaction with other proteins [33,34]. Despite the similar median values for each domain number between nodule-forming and non-nodule-forming plants, the numbers of RWP-RK and PB1 domains are more variable in non-nodulating than in nodulating plants, except for the outlier, large number of RWP-RK in G. max (possibly because of its two genome-wide duplications) (Fig. 1B). We speculate that non-nodulating plants require more RWP-RK and PB1 domains to improve NUE due to their inability to fix nitrogen from atmospheric inorganic nitrogen.
We also analyzed the consensus motifs of both the RWP-RK and PB1 domains within each species. The consensus motifs could not be analyzed in some species with few instances of these domains, for example for both the RWP-RK and PB1 domains in Lupinus angustifolius and for PB1 in Vigna angularis and Lotus japonicus (Table S1). A few species have large insertions in their RWP-RK domains, such as A. thaliana AT2G43500 (AtNLP8) and Discaria trinervis Distr2169S18482. In addition to conserved motifs commonly existed in almost RWP-RKs, similar insertions also occur in PB1 domains to some extent (Fig. S2).

Analysis of RWP-RK and PB1 domains in the NFC
Despite the important roles of RWP-RKs in both nodulating and non-nodulating plants [7,9,10], the features of these proteins within the NFC, such as protein motifs, gene structures, and evolutionary relationships, have not been investigated in detail. We therefore analyzed the consensus domains of RWP-RK and PB1 in nodule-forming and non-nodule-forming plants separately. These domains are strongly conserved but contain some modified insertions at different positions ( Fig. 2A). Alignment of the domains revealed that the 49th site (K, Lys) and 63th site (R, Arg) are conserved across all RWP-RKs ( Fig. S3 S4).
We predicted the length, isoelectric point, and subcellular localization of each RWP-RK using various databases [24,35]. The isoelectric points of the 292 RWP-RKs range from 4.73 for Drydr146S16094 to 10.6 for Aradu.I1BME, and the molecular weights range from 7.14 kDa for Aradu.I1BME to 156.01 kDa for Aradu.G4SB3. Most proteins were predicted to localize to the nucleus (Table S4).

Phylogeny and characteristics of the RWP-RKs
To investigate the phylogenetic relationships among RWP-RK family members in the 26 species, we constructed a phylogenetic tree of 292 RWP-RKs via the neighbor-joining method with 1000 bootstrap values. To limit issues related to high divergence between proteins, we selected the RWP-RK domains with 30 additional amino acids upstream and downstream of the domains for alignment and phylogenetic analysis. The tree formed six clades based on the relationships of the NLPs and RKDs in A. thaliana. The NLP subfamily clustered into a single clade with three subclades, NLP-1 (AtNLP1, AtNLP5), NLP-2 (AtNLP6, AtNLP7), and NLP-3 (AtNLP8, AtNLP9), containing all NLPs with PB1 domains.
To learn more about the features of each clade of RWP-RKs in the phylogenetic tree, we investigated their gene structures using GSDS 2.0 [36] and predicted additional conserved motifs using MEME 5.0.1 [25]. Due to the lack of exon information in the annotated gff3 file of Trifolium subterraneum (with 13 RWP-RKs), we omitted those data and used the 279 remaining RWP-RKs for analysis. The average number of exons in the NLP subfamily and RKD subfamily is 4.9 (ranging from 2-13) and 4.0  Table S5). Therefore, the average number of exons is higher in the NLPs than in the RKDs.
Of the 50 consensus motifs predicted by MEME, in addition to the three common motifs (including known RWP-RK and PB1 motifs) in both the RKDs and NLPs, the NLP subfamily contains 39 enriched motifs, whereas the RKD subfamily contains only 8 enriched motifs. Each subclade contains some unique motifs, pointing to the diverse roles of RWP-RKs, such as motif #24 in NLP-1, motif #18 in NLP-2, and motif #23 in NLP-3. Based on predictions from NetPhos 3.1 Server [37], motif #24 contains a predicted phosphorylation site (Y) with a score of 0.89; motif #18 contains a predicted phosphorylation site (S) with a score of 1.00; and motif #23 contains multiple predicted phosphorylation sites (S) with high scores. In addition, members of RKD-3 (only in the NFC) contain several unique motifs, including motif #34, motif #37, and motif #41 (Fig. S5, Fig. S6 and Table S6).
These unique motifs in each subclade may increase the number of specific interactions with other proteins.
Statistical analysis of intron phases, that is whether the intron disrupts a codon, revealed that RWP-RKs contain the most phase 0 introns, which cause no disruption of a codon), followed by phase 2 and phase 1, which disrupt the codon between bases 2 and 3 or bases 1 and 2, respectively. However, RKDs contain more phase 1 introns than NLPs with phase 1 introns (Fig. 3C). Overall, the results of exon number and intron phase analyses were consistent with the phylogenetic analysis, but there were some exceptions. For example, Cerca58S27147 has more short additional exons than other members of the NLP-1 clade but lacks additional protein motifs, suggesting that additional exons of this gene might play a regulatory role in Cercis canadensis (Fig. S6).
Comparison of RWP-RKs in non-nodulating vs. nodulating plants A. thaliana has many advantages to study interactions between diazotrophic bacteria and dicots [14], and therefore the knowledge of A. thaliana is helpful to translate biological knowledge from model organisms to crops. To explore the relationship of RWP-RKs in nodulating and non-nodulating plants, we select A. thaliana (non-nodulation), G. max (nodulation), and P. vulgaris (nodulation) for subsequently comparative analysis. We classified the RWP-RKs from A. thaliana (14), G. max (28), and P. vulgaris (12) into three clades: unique RKDs in G. max and P. vulgaris (Clade I), RKDs in all three species (Clade II, including AtRKD1-AtRKD5), and NLPs in all three species (Clade III, including AtNLP1-AtNLP9). Analysis of gene structure revealed that Clade III genes contain many more exons than those of clade I, which is consistent with the finding that additional protein motifs (such as the important PB1 domain) are essential for the functioning of NLPs (Fig. 4A, Fig. S6 and Table S6). In Despite the important roles of nitrates in plant growth, N limitation, including N starvation or low N, is essential for nodulation in legumes [11,12]. To explore the relationship between N limitation and nodulation, we integrated time-series transcriptome datasets from A. thaliana, including root samples treated with KCl (defined as N starvation) and KNO 3 (defined as N supplementation) [38], as well as expression atlases from G. max and P. vulgaris [39,40]. In A. thaliana, all AtRKDs are not expressed in roots, whereas AtNLPs are differentially regulated. Specifically, AtNLP1 and AtNLP3 are downregulated and other AtNLPs are upregulated, indicating that the different NLPs have different effects on plant responses to N starvation (Fig. 4B). NLP genes are expressed in different tissues, with some specifically expressed in the nodules of G. max and P. vulgaris (Glyma.02G311000, Glyma.06G000400, Glyma.04G000600, Glyma.14G001600, Phvul.008G291800, and Phvul.009G115800). These nodule-specific genes clustered together and are closely related to AtNLP1 and AtNLP2 within subclade IIIc (Fig. 4A, C, D). However, AtNLP1 and AtNLP2 showed opposite fold changes in response to N starvation, indicating that regulatory elements have rapidly evolved. Taking the evolutionary relationships and expression patterns of model dicot and legume crops together, we conclude that compared to subclades IIIa and IIIb, the acquisition of additional protein motifs in subclade IIIc has provided the prerequisite for nodule inception and that the further evolution of regulatory elements within IIIc will be crucial for the development of this process.
The connected genes of AtNLPs in coexpression networks provide the prerequisite for nodulation Our results demonstrate that AtNLPs are widely involved in plant responses to N starvation. Weighted correlation network analysis (WGCNA) is a powerful tool for dividing genes with correlated expression patterns into different modules with biological significance [15,27,41]. To investigate the possible relationship between N starvation and nodulation, we explored differences in the modules containing AtNLPs under N starvation vs. N supplementation. We used WGCNA to construct a weighted network from N-starvation datasets with rlog normalization ( GO enrichment analysis of genes in the biological process category revealed no commonly enriched biological process in the five modules ( Fig. 5F and Table S7). In the purple module (with AtNLP2, AtNLP4 and AtNLP9), in which genes were continuously upregulated after 20 min under the treatment of N starvation, we detected a high proportion of unique GO terms associated with transport (20/54), such as "calcium ion transport" (GO:0006816) and "calcium ion transmembrane transport" (GO:0070588), which may be related to "calcium spiking", a symbiotic signaling event. Genes in the blue module (with AtNLP3) were relatively upregulated at the early stage (10-15 min of treatment).
We discovered uniquely enriched GO terms in the blue module, such as "endocytosis" (GO:0006897).
The unique GO terms in the green module (with AtNLP6 and AtNLP7) included many important terms, such as "cell wall organization or biogenesis" (GO:0071554), "cell wall organization" (GO:0071555), "plant-type cell wall organization or biogenesis"(GO:0071669), "cell wall modification" (GO:0042545), "root hair elongation" (GO:0048767), "root hair cell development" (GO:0080147), and "auxinactivated signaling pathway"(GO:0009734). Interestingly, the blue and green modules were both enriched in "symbiosis, encompassing mutualism through parasitism" (GO:0044403), with 36 and 24 genes, respectively (Table S7). Although genes in both modules were highly expressed only at 10-15 min under the treatment of N starvation, the genes enriched for the GO term "symbiosis, encompassing mutualism through parasitism" did not overlap, highlighting the independent roles of genes regulated by AtNLP3, AtNLP6, and AtNLP7 within each module. Also, this result suggested that response to N starvation at early stage may be essential for symbiosis. (ATDBP1, regulation of defense response to virus), AT2G23350 (PAB4, viral process), AT1G60800 (AtNIK3, defense response), and AT3G04720 (AtPR4, defense response to bacterium).
The downregulation of AtNLP3, along with the downregulation of immune response genes, may be beneficial for rhizobial invasion via reducing the immune response. The expression levels of AtNLP6 and AtNLP7, which have high sequence similarity, changed little (fold change < 2) between Nstarvation and N-supplementation conditions. AtNLP7 is a major regulatory element among NLP proteins [42]. Despite their weak upregulation, AtNLP6 and AtNLP7 might function primarily at the protein level rather than the transcriptional level, as previously reported [42].
We constructed a protein association network of AtNLP3, AtNLP6, and AtNLP7 to genes using STRING 11.0 (Fig. 6C,D). While these proteins were connected to genes from GO term GO:0044403 (symbiosis, encompassing mutualism through parasitism), there was no significant association between these genes and other proteins. Additional experiments should be carried out to investigate the interactions of these highly connected genes, such as AT4G13350 (response to virus, connectivity Moreover, AtNLP6 and AtNLP7 were both associated with AT1G13300 (NIGT1/HRS1), which is regulated by AtNLP7 [43]. Although AtNLP6 and AtNLP7 might play redundant roles [44], greater differences in correlation were observed between AtNLP6 and NIGT1/HRS1 than between AtNLP6 and NIGT1/HRS1 under N-starvation and N-supplementation conditions. NIGT1/HRS1 is induced by NO 3 − [43]. The higher correlation between AtNLP6/AtNLP7 and NIGT1/HRS1 and the higher connectivity of  (Table S8), indicate that these genes strongly interact under N starvation conditions.
Most NLPs (AtNLP2-AtNLP7), especially AtNLP5 from the turquoise module, which were significantly upregulated under N starvation (Fig. 4B), showed higher correlations to more genes under N starvation than under N supplementation (absolute pcc ≥ 0.8) (Fig. S9). AtNLP5 (in the turquoise module) was significantly upregulated after 20 min of N starvation, whereas AtNLP3 (in the blue module) was significantly downregulated (Fig. 5C,E); these opposite expression patterns indicate that AtNLPs play diverse roles under N starvation. AtNLP1, AtNLP8, and AtNLP9 without strong correlation to any other genes (absolute pcc ≥ 0.8) under N supplementation (defined as lonely expressed genes) showed differential expression patterns under N starvation vs. N supplementation. The expression patterns of these genes were not significantly correlated to those of other genes under N starvation, indicating that they have lost many connected neighbors.
Among the genes most closely related to nodule-specific NLPs of G. max and P. vulgaris, AtNLP1 and AtNLP2 had opposite expression patterns. AtNLP2 has many connected neighbors, whereas AtNLP1 has completely lost highly connected neighbors ( Fig. 4 and Fig. S9). AtNLP2 reflects the evolutionary imprint of nodule-specific NLP genes. Taking the expression patterns of AtNLP6/AtNLP7 together, these results suggest that the differential connectivities of the NLPs, along with their differential expression patterns, may be essential for nodulation under N starvation. The association of genes with NLP genes under N starvation, which are related biological processes of symbiosis, cell cycle, cell wall organization or biogenesis, and calcium ion transport, might have paved the way for nodulation in the NFC.
Genes regulated under N starvation and nodulation are connected to multiple transcription factors, transcriptional regulators, and protein kinases TFs, transcriptional regulators (TRs), and protein kinases (PKs) are important classes of regulatory proteins associated with numerous aspects of plant growth and development, as well as biotic and abiotic stress responses [45]. To further explore the relationships of NLPs to other TFs, TRs, and PKs under the N-starvation conditions, N supplementation in A. thaliana, and in transcriptome atlaes of G. max and P. vulgaris, we analyzed sub-networks of the NLPs connected to these regulatory proteins with absolute pcc ≥ 0.8. Many more TFs, TRs, and PKs were highly correlated to NLPs under N starvation (1,371 with 674 TFs, 199 TRs, and 498 PKs, with 397,777 edges) vs. N supplementation (288 with 166 TFs, 29 TRs, and 93 PKs, with 7,298 edges), indicating that N starvation strongly influences the TFs, TRs, and PKs connected to NLPs at the transcriptional level (Fig. 7A,B and Table   S10).
The highly correlated TFs, TRs, and PKs were enriched in both the same and unique GO terms.  Table S11). The highly correlated PKs were enriched in "protein phosphorylation" (GO:0006468), "cell communication" (GO:0007154), and "cell surface receptor signaling pathway" (GO:0007166), indicating that N starvation influences plant cell-cell signaling pathways mediated by NLPs ( Fig. 7E and Table S11).
To further explore the possible roles of RWP-RKs in nodule-forming plants, we constructed networks for G. max and P. vulgaris. P. vulgaris contains 9 RWP-RKs, including 7 NLPs and 2 RKDs, which are highly correlated to 130 regulators, with 1,639 edges. G. max contains 16 RWP-RKs, including 14 NLPs and 2 RKDs, which are highly correlated to 270 regulators, with 4,113 edges (Fig. 8A,B and Table   S12). Recent whole-genomic duplication event within G.max affecting RWP-RKs have increased the number of edges of RWP-RKs connected to gene regulators compared to P. vulgaris. Among these RWP-RKs, Phvul.009G115800 and Phvul.008G291800 in P. vulgaris and Glyma.04G000600, Glyma.02G311000, Glyma.14G001600, and Glyma.06G000400 in G. max, which are NIN (nodule inception) genes [6], were highly expressed in separate nodules and are located in separate modules, pointing to their overall upregulation in nodule tissue (Fig. 8C, D and Table S12). These nodulespecific NLPs clustered together in clade IIIc.
Two other genes in G. max, Glyma.13G346300 and Glyma.12G050100, which are closely related to AtNLP8 and AtNLP9 within clade IIIb, were highly expressed in nodules. Interestingly, Phvul.005G15510 in clade IIIb was expressed at higher levels in ineffective N-fixation nodules than in effective N-fixation nodules. This observation suggests that the regulation of NLPs from clade IIIb is also influenced by rhizobia without the capacity for N fixation and that NLPs from clade IIIc are critical for N-fixing nodules.
We also detected divergence within IIIc: Phvul.009G01120, together with Glyma.04G017400 and Glyma.06G017800, which are closely related to AtNLP4 and AtNLP5, also showed the highest expression levels in ineffective N-fixation nodules. Only genes in the small clade closely related to AtNLP1 and AtNLP2 were specifically upregulated in effective N-fixation nodules in both G. max and P. vulgaris (Fig. 4). These findings reflect the functional divergence of NLPs with regard to nodulation, that is, the existence of nodulation with and without N fixation.

Effect of nitrogen on nodulation via the regulation of NLPs in P. vulgaris
Not only are NLPs involved in the nitrate signaling pathway, but they also influence nodule inception [6]. Whether nitrate influences nodulation via the regulation of NLPs mediating nitrate signaling in P.
vulgaris has been unclear. In plants inoculated with Rhizobium tropici treated with different concentrations of nitrate, 10 mM nitrate inhibited nodulation in both early and mature nodules.
Mature nodules from plants treated with 5 mM nitrate appeared to be browner, and plant height was taller, than for plants treated with either 0 mM or 10 mM nitrate (Fig. 9A, B, C and Fig. S10).

Expression analysis of six PvNLP genes in early roots and root-nodule mixtures under different
concentrations of nitrates indicated that Phvul.004G114100, Phvul.008G291800, and Phvul.011G052100 showed higher expression at 5 mM nitrate (low-nitrogen conditions) with inoculation than they did under either 0 mM nitrate (nitrogen-free conditions) or 10 mM nitrate (highnitrogen conditions) regardless of inoculation (Fig. 10A). When the roots were treated with different concentration of nitrates without inoculation, Phvul.009G011200 and Phvul.009G115800 showed highest expression under low-nitrogen as compared to both nitrogen-free and high-nitrogen conditions, whereas the other genes showed gradual inhibition with increasing nitrogen concentration. Finally, Phvul.008G291800, Phvul.009G011200, and Phvul.011G052100 were significantly inhibited under high-nitrogen conditions (Fig. 10B). In mature nodules, Phvul.007G071900 was significantly upregulated under low-nitrogen vs. nitrogen-free conditions, whereas other genes were downregulated or varied in expression pattern (Fig. 10C).

Discussion
Nitrogen (N) is an essential element for crop growth, productivity, and grain quality. N is also a central component of DNA, RNA, proteins, and various metabolic compounds [1]. To biosynthesize the precursors of these biomacromolecules, plants must integrate inorganic carbon (CO 2 ) and nitrogen (NO 3 − /NH 4 + ); however, compared to CO 2 , NO 3 − /NH 4 + is more difficult to obtain from the environment. Therefore, plant growth is generally nitrogen limited in both natural and agricultural environments [46]. The extensive use of synthetic fertilizers in developed countries is expensive and environmentally damaging. By contrast, in developing countries, the lack of fertilizer causes low crop yields, leading to hunger and malnutrition. NLPs of RWP-RKs play important roles in nitrate signaling [4,9,47]. Meanwhile, the NFC is a clade originated at 100 million years ago, in which NIN (a member of RWP-RKs) is reported to be essential for nodulation [7,48]. Therefore, comparative research on RWP-RKs in A. thaliana and NFC which involve in both nitrogen signaling and nodulation is crucial for improving sustainable agriculture and human health now and in the future.

Distribution and features of RWP-RKs in NFC
In contrast to animals, the success of angiosperms is partially attributed to innovations caused by gene or whole-genome duplications [49]. An accurate phylogenetic tree of species is required for evolutionary comparisons of RWP-RKs. It is impossible to deduce a phylogenetic tree from one-to-one corresponding orthologues in that few of these orthogroups in 26 species contain just one orthologue from each species. Therefore, we used a newly developed method for phylogenetic tree construction based on the analysis of gene duplications for all genes [50]. In the phylogenetic tree (Fig. 1A), species from separate orders clustered together, regardless of their ability to undergo nodulation.
These results suggest that the genome-wide duplication of genes is not the direct driver of the evolution of nodulation across the NFC. This is also supported by the finding that the evolutionary relationship between polyploidy and nodulation is not sufficient to make a species able to form nodules [51]. The presence of RWP-RKs with multiple origins (Fig. 1B) might help these species adapt their growth and metabolism in response to fluctuations in nitrogen availability in different habitats, as legumes are exceptionally ecologically diverse [52,53].
In both RKDs and NLPs, there are conservative and non-conservative amino acid sites ( Fig. 2A, Fig.S3 and Fig.S4), which appears to be not relevant to status of nodulation across the species of NFC.
However, the 49th site (K, Lys) and 63th site (R, Arg) are conserved across all RWK-RDs, suggesting that these sites are vital for RWP-RK activity. Also, completely conserved sites (4th K, 13th R, 62th D and 75th D) in PB1 domain is conserved in animals, fungi, amoebas, and plants indicating in diverse biological processes beside plant-specific nitrate signaling and nodulation [54]. In addition, variation of motifs of both the RWP-RK and PB1 domains caused by insertion/deletion may provide novel regulatory function for different RWP-RK members. For example, AtNLP8 with insertion in the RWP-RK domain functions as a master regulator of nitrate-promoted seed germination and is activated by nitrate via a mechanism different from AtNLP7 without insertion [4]. Whether this insertion within RWP-RK of AtNLP8 is involved in this specific mechanism (as is the case for AtNLP6) remains to be explored. Variations in the domain lengths of RWP-RK and PB1 and specific amino acid sites (resulting in noncanonical domains) might mediate noncanonical interactions during various biological events [54]. The diversity of RWP-RKs within species indicates that these proteins play diverse roles in plant responses to nitrates.
Phylogeny of all RWP-RKs from 26 species based on protein sequences showed six subclades into two clades (RKDs and NLPs) (Fig. 3A). The variations in exon number, leading to more protein motifs in NLPs than in those in RKDs (Fig. 3B), highlight the functional novelty of the RWP-RK subclades, as these variations are usually accompanied by the additions/deletions of functional motifs. Meanwhile, high numbers of exons increase the chances for alternative splicing of precursor mRNAs, which is important for gene regulation [55]. In addition, analysis of intron phases of the six subclades (Fig. 3C) suggested that generally number of exons with phase 0 introns for RWP-RKs is higher than those with phase 0 and phase 1 introns. There is an excess of phase 0 introns also found in other cases, supported by prediction of the exon theory of genes [56]. However, the RKDs has more exons with phase 1 introns than NLPs with phase 1 introns, which indicates different origins of intron-exon structures for RKDs and NLPs, respectively [57].

Effects of N-starvation on nodulation mediated by NLPs
Over the past few decades, the model dicot A. thaliana has become an excellent model for studying symbiosis in non-nodulating plants [14]. The most important nodule-bearing legume crops such as G.
max and P. vulgaris have not been studied in as much detail as A. thaliana. RWP-RKs play important roles in both nitrate responses and nodule inception, and they interact with each other to coordinate nitrate signaling and nodulation [6]. Therefore, we felt that a comparison of RWP-RK expression and homology in A. thaliana vs. legume crops (G. max and P. vulgaris) would allow gene functional analyses in model organisms to be applied to nodule-forming crops. Such studies might also facilitate the transfer of the nitrogen-fixation trait into non-nodulating plants to improve NUE [3,58].
Interestingly, AtRKD1-AtRKD4 are highly expressed in reproductive organs, and AtRKD5 has pleiotropic effects on phytohormone pathways, highlighting the regulatory importance of AtRKDs in female gametophyte development [59,60]. Finally, almost RKDs are absent from the separated transcriptome atlas of G. max and P. vulgaris due to the lack of gametophyte tissues (Fig. 4C, D).
Together, our results indicate that the RKDs and NLPs show significant functional diversity in many biological processes.
At the post-transcriptional level, AtNLP7 localizes to the cytoplasm after N starvation, whereas it relocates to the nucleus after N supplementation [42]. Here, we determined that AtNLP7 was downregulated at 30 min and upregulated at 45 min under N-starvation conditions, and its direct targets, AtNLP1 and AtNLP3, were expressed at minimal levels at 30 min of N-starvation treatment.
However, other direct targets of AtNLP7, i.e., AtNLP2 and AtNLP5, were upregulated under Nstarvation conditions. These findings suggest that in addition to having its own cellular localization regulated at the protein level, AtNLP7 in turn functions at the transcriptional level to regulate other AtNLPs.
Modules including different AtNLPs provide differential prerequisites for nodulation. GO term of "endocytosis" (GO:0006897) in the blue module results in the formation of organelle-like structures known as "symbiosomes" during early rhizobial invasion. The term "response to cytokinin" (GO:0009735) in the blue module is also essential for the initiation of nodulation. Auxin signaling is required for rhizobial infection in Medicago truncatula. The cell wall starts to weaken when the growing infection thread is close to the base of the root hair. Therefore, unique genes associated with the term "cell-wall modifications" are also essential for nodule initiation. The GO term "cell cycle" (GO:0051726) in the purple module, also is related to the rhizobial infection [61], was uniquely enriched in the turquoise module. These unique GO terms involved in specific biological processes reflect the diverse roles played by AtNLPs under N-starvation conditions and provide the prerequisite for nodulation.

Simulation of low nitrogen on nodulation mediated by NLPs
It is reported that high nitrate repressed nodulation while nodulation only occurred under low nitrates or free nitrates [11,12,62]. In our experiment, we found the effects of nitrate on nodulation were possibly mediated by regulation of NLPs in P. vulgaris. Higher efficiency of nitrogen-fixation under 5 mM nitrates than that under 0 mM suggests that appropriately low concentrations of nitrates simulate nodulation by increasing nitrogenase activity or nodule number in P. vulgaris. The stimulation of nodulation by low nitrogen treatment has also been observed in G. max [63]. In Lotus japonicus, the NLP NITRATE UNRESPONSIVE SYMBIOSIS 1 (NRSYM1) is a key regulator of the nitrateinduced control of root nodule symbiosis [47]. The phylogenetic tree of NLPs from A. thaliana and P.
vulgaris, as well as NRSYM1 from L. japonicus, indicates that Phvul.007G071900 is closely related to NRSYM1 (Fig. S11). This finding, combined with the uniquely high expression pattern of Phvul.007G071900 in mature nodules under low-nitrogen conditions (Fig. 10C), indicates that Phvul.007G071900 is involved in integrating nitrate signaling and nodule symbiosis in P. vulgaris.
Finally, we determined that Phvul.007G071900 and NRSYM1 are closely related to AtNLP6 and AtNLP7 within clade IIIa (Fig. 4A and Fig. S11). This finding points to the functional similarity of these proteins from clade IIIa in integrating the nitrate-signaling pathway and nodulation. This notion is supported by the observation that transformation with AtNLP6 or AtNLP7 partially rescues the nodulation phenotype of the L. japonicus nrsym1 mutant [47]. Indeed, AtNLP6 or AtNLP7 might be involved in integrating symbiosis and nitrate signaling under N-starvation. Compared to nitrogen-free conditions, we propose that the differential upregulation of genes under low-nitrogen conditions partially explains the stimulatory effects of low-nitrogen treatment on nodulation, which might occur via the upregulation of specific NLPs in P. vulgaris.

Conclusions
Our analysis of and comparative evolution between A. thaliana (an ideal model dicot) vs. NFC

Inference of species tree based on gene duplication events
The complete protein sequences of 26 plant species with available whole-genome sequences were integrated from multiple databases (Table S1). To avoid false annotation as much as possible, only the proteins with complete coding sequences encoded by standard genetic codes were retained. Allagainst-all BLASTP (BLAST 2.7.1+) was conducted for complete proteins from 26 plant species.
STRIDE [16] implemented in OrthoFinder-2.2.7 was used for phylogenetic analysis with A. thaliana as the outgroup based on deduced gene duplication events. Orthogroups descended from a single gene in the last common ancestor were deduced using OrthoFinder-2.2.7 [17].

Identification and characterization of RWP-RK genes
To identify a complete set of the RWP-RK family in the 26 species as much as possible, 131 RWP-RKs from nine taxa covering the plant kingdom were retrieved from PlantTFDB 4.0 [18] [21]. After integration of the results from BLASTP and HMMER and removal of redundancy, the results were further checked using NCBI CDD [22] and the SMRT database [23]. Basic bioinformatics analysis of various features of the proteins including the molecular weight (MW), isoelectric point (pI), and length were performed using ExPASy (http://www.expasy.ch/tools/pi_tool.html), and the subcellular localizations of the proteins were predicted using BUSCA [24]. Additional conserved motifs were identified using the MEME 5.0.1 package [25] with the parameters minimum motif width, 6; maximum motif width, 50; and maximum number of motifs, 50.

Phylogenetic analysis of RWP-RKs
All identified RWP-RKs were aligned using ClustalW implemented in MEGA X. Availability of data and material The plant samples that are used in the present study are available from the corresponding author on reasonable request. The genome information in our study can be found in Table S1 in the supplemental data.
RQ, ZHW and HL conceived and designed the experiments. ZHW, WH, LSY, EDQ, TQY and YXZ performed the bioinformatic analysis, CA, YQG, XRW and JW performed the experiments. ZHW, RQ and HL wrote the paper. All authors read and approved the final manuscript.

Supplementary Files
This is a list of supplementary files associated with this preprint. Click to download. Supplemental_figures_tables_191217.rar