Identification of MD proteins in P. trichocarpa and their classification
Searches of the P. trichocarpa and A. thaliana genomes for MD proteins resulted in the identification of 146 and 87 gene models respectively (Table S1 and S2). Previous analyses identified 62 MD genes in strawberry (Zhang et al., 2016), 74 in A. thaliana (Bellande et al., 2017; Sultana et al., 2019), and 84 in rice (Jing et al., 2020).
The P. trichocarpa proteins identified were analyzed for sequence similarity using protein sequence alignment and phylogenetic analysis, revealing the presence of 12 clades supported by at least 87 % of bootstrap replicates, and three ungrouped sequences, two of which had orthologous sequences in A. thaliana, and were therefore considered to be two single member clades III and XI (Fig. 1 and 2). The sequences were numbered PtMD1 to PtMD146 according to their sequential appearance in the intraspecific phylogenetic tree (Fig. 1). The predicted protein properties and probable subcellular localizations of PtMD proteins are listed in Table S1. The deduced sequence lengths ranged from 274 to 1192 amino acids, and isoelectric points (pIs) ranged from 4.55 to 9.49. Seventy-six out of the 146 PtMD proteins had a signal peptide (SP) cleavage site. The SP was not found in any members of clades I and XIV. Thirteen of the PtMDs were predicted to be soluble proteins, with the predicted localization of six of them being extracellular, six - including all members of clade XIV - being cytoplasmic and one being peroxisomal. Out of 133 membrane proteins, one was predicted to localize in the endoplasmic reticulum.
Chromosomal distribution of MD genes in P. trichocarpa
127 out of the 146 poplar MD gene models were mapped to chromosomes, while 19 gene models were located on five different scaffolds (Fig. 4). The majority of chromosomal genes (79) were present in clusters comprising between two and eleven genes (Fig. 4; Table S3). Clusters were also present on the scaffolds. The clusters consisted of tandem repeats having the same or reverse orientations. This large number of tandem duplications strongly suggests that the main mechanism of MD family expansion in P. trichocarpa is via local gene duplication, rather than whole genome duplications. Gene multiplication at a given locus could occur via an unequal crossing over mechanism, which after multiple rounds would result in large numbers of tandemly repeated sequences. Such a mechanism was proposed as featuring particularly prominently in various LRR gene families (Schaper and Anisimova, 2015) including LRR-RLK (Shiu and Bleeker 2001; Zan et al., 2013; Zulawski et al., 2014; Zhang et al., 2016, Wang et al., 2019a) and R genes (Choi et al., 2016). Indeed, 11 out of our16 clusters of PtMD genes had members with LRR domain(s) (Table S3).
Tandem duplications allow rapid gene family expansion and the creation of novel alleles and are thought to be particularly important for the co-evolution of R and Avr genes in hosts and their parasites (Holub 2001; Choi et al., 2016). Partial duplications with omission of some domains form a key mechanism for neofunctionalization. Such a process apparently characterized the poplar MD family, since there were seven out of 16 clusters that included genes with LRR and kinase domains along with closely related members without LRR domains (Fig. 4; Table S3).
Analysis of exon-intron structures of PtMD genes
Exon-intron structure reflects the evolutionary history of genes; hence we analyzed the exon-intron organization of PtMDs. Although the majority of clades displayed very diverse numbers of introns (Table S4; Fig. 1), the maximum number of introns for clades within a superclade was similar. The superclade LRR-MD-PK, comprising clades I-VIII, had genes with very large numbers of introns (maximum between 23 and 26); superclade MLD-LRR-PK (clades IX-XI) had at most 15 introns; superclade MLD-PK (clade XII or the CrRLK1L group) contained genes with up to two, but typically without any, introns; and superclades MLD-LRR (clade XII) and MD-Kin (clade XIV) had at most 10 and 17 introns respectively (Table S4; Fig. 1). Lack, or low frequency, of introns in CrRLK1L genes has also been observed in other species including strawberry, Arabidopsis and rice (Zhang et al., 2016; Bellande et al., 2017; Jing et al., 2020). Thus the exon-intron organization of poplar MD genes supported their grouping into superclades, which represent ancestral diversification of plant MD genes.
The phylogenetic tree of MD proteins was generally consistent between P. trichocarpa and A. thaliana with bootstrap values of greater than 76 % for the main clades (Fig. 2). Three exceptions were noted, however: one orphan protein PtMD89, clade VI that included PtMD41-PtMD56, and clade VIII with PtMD62-PtMD71. These poplar genes apparently did not have orthologs in A. thaliana. Close homologs to PtMD89 were found primarily among other trees, such as several Populus and Prunus species, Quercus suber, and Juglans regia, suggesting that PtMD89 may have a specialized function in trees. Clade VI included clusters of tandemly duplicated genes located on chromosome 19 (Fig. 4; Table S3). BLAST searches using the PLAZA database (https://bioinformatics.psb.ugent.be/plaza/) revealed the presence of similarly replicated genes in some other species, such as Hevea brasiliensis, Manihot esculenta, Ricinus communis and Prunus persica, but not in Arabidopsis. Clade VIII is specific to some Rosids, where its genes are also tandemly and block duplicated. The largest representation of this clade outside poplar is found in Hevea brasiliensis and Citrus clementina, but there are no representative genes in Arabidopsis or other Brassicaceae. Taking into consideration the conclusion that the genes of clade VI and VIII had apparently undergone tandem duplication events in the P. trichocarpa lineage after its separation from that of A. thaliana, it is possible that they represent specialized genes, such as R genes important for immunity, that co-evolved with poplar symbionts and/or pathogens (Holub, 2001).
Besides identifying clades not represented in Arabidopsis, we found that the relative clade sizes (number of genes per clade relative to genome size) show some differences between the two species (Fig. 2). Clade IX was expanded in A. thaliana, whereas clade I was expanded in P. trichocarpa (c2 test at P≤0.05).
Expression of PtMDs in different organs of Populus
RNA sequencing datasets available for aspen species were analyzed to reveal differential expression of PtMD genes among different organs and tissues. Out of the 146 genes, 145 were expressed at least in one of the organs and tissues tested, and the variance-stabilized transformation (VST) of expression values are shown in Table S5. Interestingly, the majority of PtMD genes (101) showed maximum expression in leaves. Moreover, many of them showed the highest expression in leaves exposed to abiotic/biotic stress, such as beetle (32), drought (8) or mechanical damage (11). Nine PtMD genes were most highly expressed in roots exposed to drought. Genes with maximal expression values detected in stressed organs were distributed among clades I, II, IV, V, VI, VII, VIII, X, XII, and XIII (Fig. 2), suggesting stress-response functions for these clades. Interestingly, no gene that was maximally expressed in stressed organs was found in clades III, IX, XI or XIV, suggesting their involvement in other types of signaling. Several PtMD genes were most highly expressed in the vegetative growing organs: young roots (11) or leaves (4) (Fig. 5; Table S5: “root-control” and “expanding leaves”, respectively). Eight genes, all from clades XII and XIII, were most highly expressed in female flowers at various developmental stages, and four in mature seeds. The genes highly expressed in expanding female flower buds or in mature seeds were in many cases also highly expressed in developing secondary tissues, vascular cambium or developing secondary xylem and phloem (Fig. 5 and Table S5).
PtMDs involved in wood biosynthesis
To investigate the expression of PtMDs during different stages of wood biosynthesis, we used the AspWood database (http://aspwood.popgenie.org/aspwood-v 3.0/), which provides data on high-spatial-resolution transcript abundance in developing secondary xylem and phloem tissues of aspen (Sundell et al., 2017). Only 89 PtMDs (61%) were found to be expressed in developing secondary vascular tissues (Table S6), with the majority exhibiting distinct patterns of expression, clustering in ten expression groups (Table S6; Fig. 6). This clustering indicates that certain sets of PtMDs have specific functions at certain stages of secondary vascular development. Some of the PtMD genes expressed in secondary vascular tissue also exhibited high expression under diverse stress conditions in leaves or roots (Table S6, Fig 2).
The largest group of PtMD genes (50) that were expressed in secondary vascular tissue showed a peak of expression in the phloem (Table S6; Fig. 4). These genes were mostly from superclades LRR-MD-PK, MLD-PK (CrRLK1L), and MLD-LRR-PK. Cambium and radial expansion zones were the zones characterized by the greatest variety of PtMD transcripts including members of superclades MD-Kin, LRR-MD-PK, MLD-LRR, and MLD-PK (CrRLK1L). In contrast, PtMD genes having a peak of expression at the transition between primary and secondary wall deposition were mostly from the MLD-PK (CrRLK1L) group. Intriguingly, the genes with maximum expression during secondary wall deposition were expressed at relatively low levels and many of them belonged to clade I of PtMDs, which lacks LRR. PtMD genes with the highest expression in the maturation zone were mostly from clades V and XII.
Networks of xylogenesis-related PtMD genes
To find putative partners involved in signaling pathways together with the xylogenesis-related PtMD genes, we analyzed co-expression networks of PtMD genes identified as being expressed during xylogenesis. Ten PtMD genes having a peak of expression in the cambium-radial expansion zone and primary to secondary transition zone (CA-RE/PW-SW), and eight genes from clusters PW-SW/Secondary Wall and Secondary Wall (Fig. 6, Table S6), representing, respectively, the early and main stages of secondary wall deposition were used as baits for network analyses.
The baits for the CA-RE/PW-SW zones formed five separate networks (Fig. 7A; Table S7), the largest being that of PtMD126 – one of the two poplar orthologs of AtFER. It included several candidates for functioning in signaling by phosphate relay and ROS, and for regulation of cell wall development. Apoplastic ROS in wood forming tissues could have a double role, in signaling and in regulation of lignin polymerization. Thus the PtMD126 network included a homolog of PBS1-LIKE 19 (AtPBL19), encoding a RLCK of subfamily VII-4, which signals a response to chitin perceived by CHITIN ELICITOR RECEPTOR KINASE 1 (AtCERK1) through a phosphate relay (Bi et al., 2018), and ROS production (Rao et al., 2018). Homologs of AtTGA1 and AtTGA7, which encode basic leucine zipper transcription factors involved in oxidative-stress mediated responses to biotrophic and necrotrophic pathogens (reviewed by Gatz, 2013), were, respectively, positively and negatively correlated with PtMD126 (Fig. 7A; Table 1). The oxidation state of AtTGA1 is regulated by a glutaredoxin, AtROXY19 (Li et al., 2019), the homolog of which has been found to respond to altered secondary wall xylan in aspen (Ratke et al., 2018), suggesting that the PtMD126 network might include candidates for sensing secondary wall integrity. AtTGA1 interacts with the BLADE-ON-PETIOLE 1 and 2 (AtBOP1/2) transcription factors (Wang et al., 2019b), which are known to regulate xylem fiber differentiation (Liebsch et al., 2014). Moreover, the network includes a homolog of the BEL1-LIKE HOMEODOMAIN 8 (AtBLH8) transcription factor, which controls expression of BOP1 (Khan et al., 2015) (Fig. 7A; Table 1). The network also includes a homolog of the gene encoding GROWTH-REGULATING FACTOR9 (AtGRF9), a 14-3-3 protein that regulates developmental programs and stress signaling by binding phosphoproteins and regulating their activities (Mayfield et al., 2007; Liu et al., 2014; Omidbakhshfard et al., 2018). The presence of a homolog of IMPORTIN b-4 (AtIMB4), which is required to transport GRF-INTERACTING FACTOR 1 (AtGIF1) to the nucleus (Liu et al., 2019) (Fig. 7A; Table 1) further supports the involvement of GRF/14-3-3 genes in the PtMD126 network.
A separate large network was formed by neighbors of PtMD98 – one of the two poplar orthologs of AtTHE1 (Fig. 7A; Table 1; Table S7). This network comprised genes related to hormonal signaling by IAA and GA, and to the regulation of xylogenesis. One example is a homolog of AtLAX3, which encodes an auxin influx carrier (Swarup et al., 2008). Another is a homolog of AtAGC1-12, encoding a kinase phosphorylating the auxin efflux carrier AtPIN1 (Haga et al., 2018). We have also identified a homolog of AtGASA4 involved in GA responses and redox regulation (Rubinovich and Weiss, 2010). It is noteworthy that GA responses and GASA genes were also found to be upregulated in response to a secondary wall xylan defect in aspen (Ratke et al., 2018). Moreover, the co-expression network included an LRR-RLK homologous to AtPXY-CORRELATED 1 (AtPXC1), which is required for secondary wall deposition (Wang et al., 2013).
The network of PtMD94 – the clade XII member related to AtHERK1 and AtCVY1 - included a homolog of AtRALFL31 (Fig. 7A; Table 1; Table S7). RALF genes encode hormone peptides that signal developmental processes and stress responses by interacting with CrRLK1Ls. AtRALFL31 belongs to subfamily IIIA which includes as yet uncharacterized members, but both AtRALFL31 and Potri.017G059500 have the conserved YISY motif essential for interaction with AtFER (Campbell and Turner, 2017). Thus, Potri.017G059500 could potentially encode a peptide hormone recognized by PtMD94. The PtMD94 network also included other candidates for signaling. For example, there was a homolog of PLANT U-BOX 13 (AtPUB13), which encodes an E3 ligase involved in signal-activated ubiquitination and subsequent degradation of different receptors including ABA INSENSITIVE 1 (AtABI1) (Kong et al., 2015), BRASSINOSTEROID INSENSITIVE 1 (AtBRI1) (Zhou et al., 2018), LYSM-CONTAINING RECEPTOR-LIKE KINASE 5 (AtLYK5) (Liao et al., 2017), and FLAGELLIN-SENSITIVE 2 (AtFLS2) (Liu et al., 2012; Antignani et al., 2015). The ubiquitination of flg22-bound AtFLS2 by AtPUB13 depends on its interactor protein RAB GTPASE HOMOLOG A 4B (AtRABA4B) (Antignani et al., 2015). Interestingly, a homolog to another PtMD94 network member encodes ROOT HAIR DEFECTIVE 4 (AtRHD4) which mediates polar localization of AtRABA4B (Thole et al., 2008). Consequently, it seems likely that these PtMD94 network members are indeed functionally linked within the same network.
Two other PtMD genes expressed during early secondary wall biosynthesis, PtMD88 and PtMD145, formed small networks, which included important regulatory genes in xylem cell differentiation (Fig. 7A; Table 1; Table S7). One of them was the homolog of the master spatial regulator of vascular differentiation, PHLOEM INTERCALATED WITH XYLEM (AtPXY), encoding an LRR-RLK which signals tracheid fate upon binding the small CLE peptide AtTDIF (Fisher and Turner, 2007). The other was a homolog of VASCULAR-RELATED RECEPTOR-LIKE KINASE 1 (AtVRLK1) (Huang et al., 2018) which is probably responsible for the switch between xylem cell expansion and secondary wall deposition.
The late secondary wall-expressed baits formed five networks (Fig. 7B; Table 1; Table S8). The largest of these was associated with two PtMD genes, PtMD129, a clade XII member related to AtANX1 and AtANX2, and PtMD137, which encodes an LRR-RLK, from clade XIII. Orthologs of key signaling-related genes were included within this network. One of them was CALCIUM PERMEABLE STRESS-GATED CATION CHANNEL 1 (AtCSC1). Stretch-activated Ca2+ channels have been predicted to be important players in CWI (Engelsdorf and Hamann, 2014). AtCSC1 belongs to a newly characterized family of stretch-activated Ca2+ channels conserved in eukaryotes (Hou et al., 2014; Liu et al., 2018). In addition, we found Potri.015G108700/AT5G61820, encoding an uncharacterized NOD19-like protein, which has been implicated in responses to cold stress downstream of mechanosensitive Ca2+ channels (Mori et al., 2018). The aspen homolog of AtCSC1 is thus a promising candidate for a secondary wall damage sensor. Another important signaling-related homolog is TAPETUM DETERMINANT 1 (AtTPD1), which encodes a small peptide hormone that is recognized by an RLK complex consisting of AtEMS1 and AtSERK1/2 to activate transcription factors of the BES1 family (Chen et al., 2019). Moreover, a homolog of Arabidopsis ROP (RHO OF PLANTS) GUANINE NUCLEOTIDE EXCHANGE FACTOR 7 (AtROPGEF7) was among the hits for PtMD137. AtROPGEF7 interacts with the kinase domain of AtFER, mediating downstream NADPH oxidase-dependent ROS signaling which is needed for polarized cell growth (Duan et al., 2010). Finally, we identified a homolog of Arabidopsis GT-2 Like (AtGT2L), which encodes a Ca2+-dependent calmodulin (CaM)-binding trihelix transcription factor involved in plant abiotic stress signaling (Xi et al. 2012). In addition to signaling-related genes, the PtMD129-PtMD137 network included some key cell fate regulator proteins (Fig. 7B; Table 1; Table S8). One of these was the homolog of WIP DOMAIN PROTEIN 5 (AtWIP5), which encodes a zinc-finger protein involved in root patterning downstream of auxin (Crawford et al., 2015) and ROS signaling (Miao et al., 2004). Another was a homolog of Arabidopsis SQUINT (AtSQN), which encodes a cyclophilin 40-like protein that promotes the accumulation of miRNAs miR156 and miR172, targeting master regulatory genes in organ development (Smith et al. 2009; Prunet et al., 2015). The network also included orthologs of two genes encoding master transcriptional regulators, AtMYB46 and AtMYB83, which activate the secondary wall program (Zhong and Ye, 2012). Both these genes showed positive correlation with PtMD129. In contrast, orthologs of three genes with roles in cell division were negatively correlated with PtMD129 (Table 1). This supports the hypothesis that secondary wall integrity signaling results in coordination between cell division and secondary wall formation activities in developing wood (Ratke et al., 2018).
The network for PtMD110, which together with PtMD111 forms a pair orthologous to AtHERK2, included a homolog of the Arabidopsis gene encoding the transcription factor BES1-INTERACTING MYC-LIKE1 (AtBIM1) (Fig. 7B; Table 1; Table S8), which mediates brassinosteroid signaling (Chandler et al., 2009). Several other genes discussed above can be linked to BR-dependent or BES-related BR-independent signaling (Table 1). Intriguingly, a secondary wall xylan defect induced transcriptomic changes suggesting stimulation of BR signaling in aspen (Ratke et al., 2018), supporting the involvement of the AtBIM1 homolog in sensing secondary wall integrity.