Genome-wide identification of poplar malectin/malectin-like domain-containing proteins and in-silico expression analyses find novel candidates for signaling and regulation of wood development


 Background: Malectin domain (MD) is a ligand-binding protein motif of pro- and eukaryotes. It is particularly abundant in Viridiplantae, where it occurs as either a single (MD, PF1721) or tandemly duplicated domain (PF12819) called malectin-like domain (MLD). In herbaceous plants, MD- or MLD-containing proteins (MD proteins) are known to regulate development, reproduction, and resistance to various stresses. However, their functions in woody plants have not yet been studied. To unravel their potential role in wood development, we carried out genome-wide identification of MD proteins in the model tree species black cottonwood (Populus trichocarpa), and analyzed their in-silico expression and co-expression networks.Results: P. trichocarpa had 146 MD genes assigned to 14 different clades, two of which were specific to the genus Populus. 87% of these genes were located on chromosomes, the rest being associated with scaffolds. Based on their protein domain organization, and in agreement with the exon-intron structures, the MD genes identified could be classified into five superclades having the following domains: leucine-rich repeat (LRR)-MD-protein kinase (PK), MLD-LRR-PK, MLD-PK (CrRLK1L), MLD-LRR, and MD-Kinesin. Whereas the majority of MD genes were highly expressed in leaves, particularly under stress conditions, eighteen showed a peak of expression during secondary wall formation and their co-expression networks suggested signaling functions in cell wall integrity, pathogen-associated molecular patterns, calcium, ROS, and hormone pathways.Conclusion: P. trichocarpa MD genes exhibit a variety of domain organizations, and include genes apparently specific to Populus, as well as genes with potential involvement in signaling pathways regulating secondary wall formation.

kinases, extensin-like RLKs, lectin RLKs, and leucine-rich repeat RLKs. RLCKs are cytoplasmic kinases without a transmembrane domain (TMD) and they recognize signaling molecules intracellularly. The RLKs usually function as heterodimers: one subunit with a large extracellular domain interacts with a ligand, and the other, which has a smaller extracellular domain, stabilizes this interaction and enhances signal transduction (Xi et al., 2019).
Among the different clades of plant RLKs, the Catharanthus roseus receptor-like kinase 1-like proteins (CrRLK1Ls) have received signi cant attention as mediators of CWI (reviewed by Wolf  The MLD, which is characteristic of CrRLK1L proteins, and the MD, are also found in other types of plant RLKs (Zhang et al., 2016;Bellande et al., 2017). The MD was rst identi ed in the protein called malectin residing in the endoplasmic reticulum of Xenopus laevis and other animals, where it monitors protein glycosylation by binding di-glucose motifs with a 1,4-, a 1,3-, and a 1,2-linkage in glycosylated proteins (Schallus et al., 2008;. However, the crystal structure of MLD in ANX1, ANX2, and FER indicated an absence of the aromatic residues that interact with di-glucosides in animal MDs, and suggested different ligand speci cities and/or functions of the MLDs in these proteins (Du et  Recently it has been shown that the binding of RALF23 to FER is stabilized by interaction with LORELEI-like GPI-ANCHORED PROTEINS (LLGs) and the formation of such a heterocomplex is required for PTI signaling (Xiao et al., 2019). Moreover, the ectodomain of FER has been shown to bind to the leucine-rich repeat (LRR) domain of LRR-extensin 1 (LRX1) (Dünser et al., 2019) and to pectin .
MD is classi ed as CBM57 in the CAZy database (http://cazy.org). Interestingly, the CBM57 family is greatly expanded in the model tree species Populus trichocarpa compared to the herbaceous model plant Arabidopsis thaliana (Kumar et al., 2019). Moreover, transcript of the CBM57 family members are highly upregulated in developing wood tissues of Populus tremula (Kumar et al., 2019) and Eucalyptus grandis (Pinard et al., 2015). These data suggest that MD/MLD-containing proteins (subsequently called MD proteins) have important functions in trees. We hypothesize that MD proteins are involved in the regulation of cell wall formation during secondary growth via pathways analogous to those reported for primary growth (Wolf and Höfte, 2014;Hamann, 2015;Li et al., 2016;Wolf, 2017), and that they participate in signaling cascades related to stress responses and developmental processes in trees. To nd candidates for receptors active during secondary growth, we rst carried out genome-wide identi cation of P. trichocarpa genes with predicted MD and MLD. Second, we used expression datasets from different organs (Sundell et al., 2015, Immanen et al., 2016 and high-resolution expression data for wood developmental zones in P. tremula (Sundell et al., 2017) to identify those MD proteins that are expressed during wood biosynthesis, and to classify them according to expression at speci c stages of xylogenesis. Finally, we identi ed co-expression networks for the MD proteins expressed during secondary wall deposition, which include their putative interactors. Our analyses provide a framework within which to identify CWI monitoring, stress-response, and other signaling pathways operating during wood development.

Results And Discussion
Identi cation of MD proteins in P. trichocarpa and their classi cation Searches of the P. trichocarpa and A. thaliana genomes for MD proteins resulted in the identi cation of 146 and 87 gene models respectively (Table S1 and  The P. trichocarpa proteins identi ed were analyzed for sequence similarity using protein sequence alignment and phylogenetic analysis, revealing the presence of 12 clades supported by at least 87 % of bootstrap replicates, and three ungrouped sequences, two of which had orthologous sequences in A. thaliana, and were therefore considered to be two single member clades III and XI ( Fig. 1 and 2). The sequences were numbered PtMD1 to PtMD146 according to their sequential appearance in the intraspeci c phylogenetic tree (Fig. 1). The predicted protein properties and probable subcellular localizations of PtMD proteins are listed in Table S1. The deduced sequence lengths ranged from 274 to 1192 amino acids, and isoelectric points (pIs) ranged from 4.55 to 9.49. Seventy-six out of the 146 PtMD proteins had a signal peptide (SP) cleavage site. The SP was not found in any members of clades I and XIV. Thirteen of the PtMDs were predicted to be soluble proteins, with the predicted localization of six of them being extracellular, six -including all members of clade XIV -being cytoplasmic and one being peroxisomal. Out of 133 membrane proteins, one was predicted to localize in the endoplasmic reticulum.
Chromosomal distribution of MD genes in P. trichocarpa 127 out of the 146 poplar MD gene models were mapped to chromosomes, while 19 gene models were located on ve different scaffolds (Fig. 4). The majority of chromosomal genes (79) were present in clusters comprising between two and eleven genes ( Fig. 4; Table S3). Clusters were also present on the scaffolds. The clusters consisted of tandem repeats having the same or reverse orientations. This large number of tandem duplications strongly suggests that the main mechanism of MD family expansion in P. trichocarpa is via local gene duplication, rather than whole genome duplications. Gene multiplication at a given locus could occur via an unequal crossing over mechanism, which after multiple rounds would result in large numbers of tandemly repeated sequences. Such a mechanism was proposed as featuring  (Table S3).
Tandem duplications allow rapid gene family expansion and the creation of novel alleles and are thought to be particularly important for the co-evolution of R and Avr genes in hosts and their parasites (Holub 2001; Choi et al., 2016). Partial duplications with omission of some domains form a key mechanism for neofunctionalization. Such a process apparently characterized the poplar MD family, since there were seven out of 16 clusters that included genes with LRR and kinase domains along with closely related members without LRR domains ( Fig. 4; Table S3).

Analysis of exon-intron structures of PtMD genes
Exon-intron structure re ects the evolutionary history of genes; hence we analyzed the exon-intron organization of PtMDs. Although the majority of clades displayed very diverse numbers of introns (Table  S4; Fig. 1), the maximum number of introns for clades within a superclade was similar. The superclade LRR-MD-PK, comprising clades I-VIII, had genes with very large numbers of introns (maximum between 23 and 26); superclade MLD-LRR-PK (clades IX-XI) had at most 15 introns; superclade MLD-PK (clade XII or the CrRLK1L group) contained genes with up to two, but typically without any, introns; and superclades MLD-LRR (clade XII) and MD-Kin (clade XIV) had at most 10 and 17 introns respectively (Table S4;  The phylogenetic tree of MD proteins was generally consistent between P. trichocarpa and A. thaliana with bootstrap values of greater than 76 % for the main clades (Fig. 2). Three exceptions were noted, however: one orphan protein PtMD89, clade VI that included PtMD41-PtMD56, and clade VIII with PtMD62-PtMD71. These poplar genes apparently did not have orthologs in A. thaliana. Close homologs to PtMD89 were found primarily among other trees, such as several Populus and Prunus species, Quercus suber, and Juglans regia, suggesting that PtMD89 may have a specialized function in trees. Clade VI included clusters of tandemly duplicated genes located on chromosome 19 ( Fig. 4; Table S3). BLAST searches using the PLAZA database (https://bioinformatics.psb.ugent.be/plaza/) revealed the presence of similarly replicated genes in some other species, such as Hevea brasiliensis, Manihot esculenta, Ricinus communis and Prunus persica, but not in Arabidopsis. Clade VIII is speci c to some Rosids, where its genes are also tandemly and block duplicated. The largest representation of this clade outside poplar is found in Hevea brasiliensis and Citrus clementina, but there are no representative genes in Arabidopsis or other Brassicaceae. Taking into consideration the conclusion that the genes of clade VI and VIII had apparently undergone tandem duplication events in the P. trichocarpa lineage after its separation from that of A. thaliana, it is possible that they represent specialized genes, such as R genes important for immunity, that co-evolved with poplar symbionts and/or pathogens (Holub, 2001).
Besides identifying clades not represented in Arabidopsis, we found that the relative clade sizes (number of genes per clade relative to genome size) show some differences between the two species (Fig. 2).
Clade IX was expanded in A. thaliana, whereas clade I was expanded in P. trichocarpa (c 2 test at P≤0.05).

Expression of PtMDs in different organs of Populus
RNA sequencing datasets available for aspen species were analyzed to reveal differential expression of PtMD genes among different organs and tissues. Out of the 146 genes, 145 were expressed at least in one of the organs and tissues tested, and the variance-stabilized transformation (VST) of expression values are shown in Table S5. Interestingly, the majority of PtMD genes (101) showed maximum expression in leaves. Moreover, many of them showed the highest expression in leaves exposed to abiotic/biotic stress, such as beetle (32), drought (8) or mechanical damage (11). Nine PtMD genes were most highly expressed in roots exposed to drought. Genes with maximal expression values detected in stressed organs were distributed among clades I, II, IV, V, VI, VII, VIII, X, XII, and XIII ( Fig. 2), suggesting stress-response functions for these clades. Interestingly, no gene that was maximally expressed in stressed organs was found in clades III, IX, XI or XIV, suggesting their involvement in other types of signaling. Several PtMD genes were most highly expressed in the vegetative growing organs: young roots (11) or leaves (4) ( Fig. 5; Table S5: "root-control" and "expanding leaves", respectively). Eight genes, all from clades XII and XIII, were most highly expressed in female owers at various developmental stages, and four in mature seeds. The genes highly expressed in expanding female ower buds or in mature seeds were in many cases also highly expressed in developing secondary tissues, vascular cambium or developing secondary xylem and phloem ( Fig. 5 and Table S5).

PtMDs involved in wood biosynthesis
To investigate the expression of PtMDs during different stages of wood biosynthesis, we used the AspWood database (http://aspwood.popgenie.org/aspwood-v 3.0/), which provides data on high-spatialresolution transcript abundance in developing secondary xylem and phloem tissues of aspen (Sundell et al., 2017). Only 89 PtMDs (61%) were found to be expressed in developing secondary vascular tissues (Table S6), with the majority exhibiting distinct patterns of expression, clustering in ten expression groups (Table S6; Fig. 6). This clustering indicates that certain sets of PtMDs have speci c functions at certain stages of secondary vascular development. Some of the PtMD genes expressed in secondary vascular tissue also exhibited high expression under diverse stress conditions in leaves or roots (Table S6, Fig 2).
The largest group of PtMD genes (50) that were expressed in secondary vascular tissue showed a peak of expression in the phloem (Table S6; Fig. 4). These genes were mostly from superclades LRR-MD-PK, MLD-PK (CrRLK1L), and MLD-LRR-PK. Cambium and radial expansion zones were the zones characterized by the greatest variety of PtMD transcripts including members of superclades MD-Kin, LRR-MD-PK, MLD-LRR, and MLD-PK (CrRLK1L). In contrast, PtMD genes having a peak of expression at the transition between primary and secondary wall deposition were mostly from the MLD-PK (CrRLK1L) group. Intriguingly, the genes with maximum expression during secondary wall deposition were expressed at relatively low levels and many of them belonged to clade I of PtMDs, which lacks LRR. PtMD genes with the highest expression in the maturation zone were mostly from clades V and XII.

Networks of xylogenesis-related PtMD genes
To nd putative partners involved in signaling pathways together with the xylogenesis-related PtMD genes, we analyzed co-expression networks of PtMD genes identi ed as being expressed during xylogenesis. Ten PtMD genes having a peak of expression in the cambium-radial expansion zone and primary to secondary transition zone (CA-RE/PW-SW), and eight genes from clusters PW-SW/Secondary Wall and Secondary Wall (Fig. 6, Table S6), representing, respectively, the early and main stages of secondary wall deposition were used as baits for network analyses.
The baits for the CA-RE/PW-SW zones formed ve separate networks ( Fig. 7A; Table S7), the largest being that of PtMD126 -one of the two poplar orthologs of AtFER. It included several candidates for functioning in signaling by phosphate relay and ROS, and for regulation of cell wall development. Apoplastic ROS in wood forming tissues could have a double role, in signaling and in regulation of lignin polymerization. Thus the PtMD126 network included a homolog of PBS1-LIKE 19 (AtPBL19), encoding a RLCK of subfamily VII-4, which signals a response to chitin perceived by CHITIN ELICITOR RECEPTOR KINASE 1 (AtCERK1) through a phosphate relay , and ROS production . Homologs of AtTGA1 and AtTGA7, which encode basic leucine zipper transcription factors involved in oxidative-stress mediated responses to biotrophic and necrotrophic pathogens (reviewed by Gatz, 2013), were, respectively, positively and negatively correlated with PtMD126 ( Fig. 7A; Table 1). The oxidation state of AtTGA1 is regulated by a glutaredoxin, AtROXY19 , the homolog of which has been found to respond to altered secondary wall xylan in aspen (Ratke et al., 2018), suggesting that the PtMD126 network might include candidates for sensing secondary wall integrity. AtTGA1 interacts with the BLADE-ON-PETIOLE 1 and 2 (AtBOP1/2) transcription factors (Wang et al., 2019b), which are known to regulate xylem ber differentiation (Liebsch et al., 2014). Moreover, the network includes a homolog of the BEL1-LIKE HOMEODOMAIN 8 (AtBLH8) transcription factor, which controls expression of BOP1 (Khan et al., 2015) ( Fig. 7A; Table 1). The network also includes a homolog of the gene encoding GROWTH-REGULATING FACTOR9 (AtGRF9), a 14-3-3 protein that regulates developmental programs and stress signaling by binding phosphoproteins and regulating their activities ( Table 1) further supports the involvement of GRF/14-3-3 genes in the PtMD126 network.
A separate large network was formed by neighbors of PtMD98 -one of the two poplar orthologs of AtTHE1 ( Fig. 7A; Table 1; Table S7). This network comprised genes related to hormonal signaling by IAA and GA, and to the regulation of xylogenesis. One example is a homolog of AtLAX3, which encodes an auxin in ux carrier (Swarup et al., 2008). Another is a homolog of AtAGC1-12, encoding a kinase phosphorylating the auxin e ux carrier AtPIN1 (Haga et al., 2018). We have also identi ed a homolog of AtGASA4 involved in GA responses and redox regulation (Rubinovich and Weiss, 2010). It is noteworthy that GA responses and GASA genes were also found to be upregulated in response to a secondary wall xylan defect in aspen (Ratke et al., 2018). Moreover, the co-expression network included an LRR-RLK homologous to AtPXY-CORRELATED 1 (AtPXC1), which is required for secondary wall deposition .
The network of PtMD94 -the clade XII member related to AtHERK1 and AtCVY1 -included a homolog of AtRALFL31 ( Fig. 7A; Table 1; Table S7). RALF genes encode hormone peptides that signal developmental processes and stress responses by interacting with CrRLK1Ls. AtRALFL31 belongs to subfamily IIIA which includes as yet uncharacterized members, but both AtRALFL31 and Potri.017G059500 have the conserved YISY motif essential for interaction with AtFER (Campbell and Turner, 2017). Thus, Potri.017G059500 could potentially encode a peptide hormone recognized by PtMD94. The PtMD94 network also included other candidates for signaling.  (Antignani et al., 2015). Interestingly, a homolog to another PtMD94 network member encodes ROOT HAIR DEFECTIVE 4 (AtRHD4) which mediates polar localization of AtRABA4B (Thole et al., 2008). Consequently, it seems likely that these PtMD94 network members are indeed functionally linked within the same network.
Two other PtMD genes expressed during early secondary wall biosynthesis, PtMD88 and PtMD145, formed small networks, which included important regulatory genes in xylem cell differentiation ( Fig. 7A; Table 1; Table S7). One of them was the homolog of the master spatial regulator of vascular differentiation, PHLOEM INTERCALATED WITH XYLEM (AtPXY), encoding an LRR-RLK which signals tracheid fate upon binding the small CLE peptide AtTDIF (Fisher and Turner, 2007). The other was a homolog of VASCULAR-RELATED RECEPTOR-LIKE KINASE 1 (AtVRLK1) (Huang et al., 2018) which is probably responsible for the switch between xylem cell expansion and secondary wall deposition.
The late secondary wall-expressed baits formed ve networks ( Fig. 7B; Table 1; Table S8). The largest of these was associated with two PtMD genes, PtMD129, a clade XII member related to AtANX1 and AtANX2, and PtMD137, which encodes an LRR-RLK, from clade XIII. Orthologs of key signaling-related genes were included within this network. One of them was CALCIUM PERMEABLE STRESS-GATED CATION CHANNEL 1 (AtCSC1). Stretch-activated Ca 2+ channels have been predicted to be important players in CWI (Engelsdorf and Hamann, 2014). AtCSC1 belongs to a newly characterized family of stretch-activated Ca 2+ channels conserved in eukaryotes (Hou et al., 2014; Liu et al., 2018). In addition, we found Potri.015G108700/AT5G61820, encoding an uncharacterized NOD19-like protein, which has been implicated in responses to cold stress downstream of mechanosensitive Ca 2+ channels (Mori et al., 2018). The aspen homolog of AtCSC1 is thus a promising candidate for a secondary wall damage sensor. Another important signaling-related homolog is TAPETUM DETERMINANT 1 (AtTPD1), which encodes a small peptide hormone that is recognized by an RLK complex consisting of AtEMS1 and AtSERK1/2 to activate transcription factors of the BES1 family (Chen et al., 2019). Moreover, a homolog of Arabidopsis ROP (RHO OF PLANTS) GUANINE NUCLEOTIDE EXCHANGE FACTOR 7 (AtROPGEF7) was among the hits for PtMD137. AtROPGEF7 interacts with the kinase domain of AtFER, mediating downstream NADPH oxidase-dependent ROS signaling which is needed for polarized cell growth (Duan et al., 2010). Finally, we identi ed a homolog of Arabidopsis GT-2 Like (AtGT2L), which encodes a Ca 2+dependent calmodulin (CaM)-binding trihelix transcription factor involved in plant abiotic stress signaling (Xi et al. 2012). In addition to signaling-related genes, the PtMD129-PtMD137 network included some key cell fate regulator proteins ( Fig. 7B; Table 1; Table S8). One of these was the homolog of WIP DOMAIN PROTEIN 5 (AtWIP5), which encodes a zinc-nger protein involved in root patterning downstream of auxin (Crawford et al., 2015) and ROS signaling (Miao et al., 2004). Another was a homolog of Arabidopsis SQUINT (AtSQN), which encodes a cyclophilin 40-like protein that promotes the accumulation of miRNAs miR156 and miR172, targeting master regulatory genes in organ development (Smith et al. 2009;Prunet et al., 2015). The network also included orthologs of two genes encoding master transcriptional regulators, AtMYB46 and AtMYB83, which activate the secondary wall program (Zhong and Ye, 2012). Both these genes showed positive correlation with PtMD129. In contrast, orthologs of three genes with roles in cell division were negatively correlated with PtMD129 (Table 1). This supports the hypothesis that secondary wall integrity signaling results in coordination between cell division and secondary wall formation activities in developing wood (Ratke et al., 2018).
The network for PtMD110, which together with PtMD111 forms a pair orthologous to AtHERK2, included a homolog of the Arabidopsis gene encoding the transcription factor BES1-INTERACTING MYC-LIKE1 (AtBIM1) (Fig. 7B; Table 1; Table S8), which mediates brassinosteroid signaling (Chandler et al., 2009). Several other genes discussed above can be linked to BR-dependent or BES-related BR-independent signaling (Table 1) (Table S1) and expanded the set for A. thaliana (Table S2).
In total, 146 MD genes were found in P. trichocarpa and they were assigned to fourteen clades based on sequence similarity, and to ve superclades based on predicted protein domain organization and intronexon structures (Figs. 1 and 2). The variety of MD protein structures re ects their range of different functions in plants.
Certain MD genes appeared to be speci c either to trees or to the Populus lineage and absent from Arabidopsis. The prevalence of tandem duplications within the MD gene family, which apparently led to family expansion, may have created conditions conducive to gene neofunctionalization and rapid evolution (Choi et al., 2016;Schaper and Anisimova, 2015).
The majority of the poplar MD genes were found to be highly expressed in leaves, particularly those subjected to biotic and abiotic stress conditions (Fig. 5), supporting their role in stress signaling. Detailed analysis of expression in wood forming tissues revealed subsets upregulated in xylem cells during secondary wall deposition (Fig. 6). These genes, not unexpectedly, include candidates for the sensing of cell wall integrity. We identi ed their co-expression networks revealing potential molecular pathways in which these MD genes might participate to ensure the coordination of secondary wall formation (Table  1). This study provides a framework for future investigations aiming at elucidating stress and developmental signaling pathways operating in trees.

Materials And Methods
Identi cation of P. trichocarpa proteins with malectin and malectin-like domains The MD proteins of black cottonwood (P. trichocarpa Torr. & A. Gray) were identi ed by BLAST searches in the genome browser of the PopGenIE database (http://popgenie.org) containing P. trichocarpa genome assembly v3.0, using as baits the P. trichocarpa proteins containing Pfam domains 11721 and 12819, corresponding to MD and MLD respectively, retrieved from the Pfam database (https://pfam.xfam.org) (El-Gebali et al., 2019). The same approach was applied to A. thaliana using the TAIR database (v. 10.0) for BLAST searches (http://www.arabidopsis.org/). The presence of MDs/MLDs in the proteins selected for both P. trichocarpa and A. thaliana was con rmed using the CDvist web tool (http://cdvist.zhulinlab.org) (Adebali et al., 2015), which also served to identify other conserved domains in these proteins. The amino acid sequence lengths, molecular weights, isoelectric points and indices of protein stability of the putative proteins were calculated using the ProtParam tool provided on the ExPASy website (https://web.expasy.org/protparam/). The presence of signal peptides and subcellular localization were predicted with the SignalP 4.

PtMDs involved in wood biosynthesis
The AspWood high-spatial-resolution RNA-Seq dataset (Sundell et al., 2017) was used for analysis of expression of PtMDs during wood biosynthesis. Identity of wood developmental zones was based on the expression of marker genes (Sundell et al., 2017). A heatmap of PtMD expression in wood developmental zones was constructed for one representative tree (tree 1) using the AspWood server (http://aspwood.popgenie.org/aspwood-v3.0/).

Co-expression analysis
Ten PtMD genes showing maximum expression in the CA-RE/PW-SW zone, and eight genes with maximum expression in either the PW-SW/secondary wall zone or the secondary wall zone ( Figure 6; Table S6) were used as 'Guide Genes' to obtain co-expression networks for developing secondary tissues, using the AspWood website (http://aspwood.popgenie.org/aspwood-v3.0/). The corresponding GraphML les were generated using the ExNet tool (http://popgenie.org/exnet) with a Z-score threshold of 5.0, and visualized using Cytoscape 3.4.0 (Shannon et al., 2003).

Declarations
Ethics approval and consent to participate Not applicable.

Consent for publication
Not applicable.

Availability of data and materials
All data generated or analyzed during this study are included in this published article [and its supplementary information les].

Competing interests
The authors declare that they have no competing interests.       Heatmap of PtMD gene expression patterns in different organs of aspen. Data were retrieved from the aspen expression atlas available at http://popgenie.org and are listed in Table S5.