Genome-wide Identification and Characterization of G-type lectin in Fragaria vesca

Background: Lectins make up a large and diverse group of proteins in plants. G-type lectins are an important type of lectin involved in plant development and defense. However, studies of G-type lectins are limited to lectin receptor kinases. Results: In this study, genome-wide identification of the G-type lectin gene family was carried out in Fragaria vesca. A total of 133 genes were found to belong to this family and were classified into four groups according to their domain organization: G-type lectin receptor kinases, G-type lectin kinases, G-type lectin receptor proteins and G-type lectin proteins. Their chromosome localization and their phylogenetic and evolutionary relationships were also analyzed. The results showed that tandem and dispersed duplication occurred frequently, which led to the expansion of the G-type lectin gene family in F. vesca and may have increased the number of domain arrangements. The expression profile of G-type lectin genes at different developmental stages of F. vesca and under various biotic/abiotic stresses was inferred from the available databases. G-type lectin genes are actively expressed during F. vesca development and respond to multiple biotic/abiotic stresses. Additionally, to better understand the functions of G-type lectins, we predicted strawberry genes that may be co-expressed with these G-type lectin genes. Conclusions: The G-type lectin gene family is a large gene family in F. vesca. Domain organization and expression analysis suggest that its members function under biotic/abiotic stresses.

GNA-related lectins, also known as G-type lectins or G-lectins, make up a large part of the plant lectins that have an affinity for mannose or mannose complexes [12][13][14]. Since the first GNA-related lectin was isolated from the bulbs of Galanthus nivalis, this type of lectin is also named bulb-type lectin or B-lectin, and the domain is named the B-lectin or GNA domain (Galanthus nivalis agglutinin-related lectin domain) [15].
Besides the GNA domain, G-type lectins also contain other domains, such as the S-locus glycoprotein domain (SLG), PAN/Apple domain (PAN), transmembrane domain (TM), and protein kinase domain (PK) [16]. G-type lectins are predicted to have important functions in plant development and resistance. A large group of G-type lectins has shown insecticidal properties, particularly against aphids of wheat, maize, potato, and sugarcane, by affecting their development and fecundity [17][18][19][20]. CaMBL1 and CaGLP1 are pepper G-type lectin genes involved in signaling and plant cell death that were shown to play a role in defense against Xanthomonas campestris pv. vesicatoria [8,14]. Similarly, a G-type LecRK gene of Arabidopsis takes part in defense signaling by recognizing lipopolysaccharides of Xanthomonas and Pseudomonas [21]. Lipopolysaccharides are well-described PAMPs that trigger plant innate immunity [22,23]. The transfer of the G-type LecRK gene Pi-d2 to rice conferred race-specific resistance to Magnaporthe grisea [4], and knocking down the OslecRK gene, also a G-type LecRK, reduced the resistance of rice plants to X. oryzae pv. oryzae and brown planthopper [24]. In strawberry, FaMBL1 was found to be involved in the resistance of unripe fruits to Colletotrichum acutatum [25].
Besides their role in resistance to biotic stress, G-type lectins play a role in plant adaptation to abiotic stress. OsSIK2 enhanced rice tolerance to salt and drought stresses and also delayed dark-induced leaf senescence [26]. Transgenic Arabidopsis plants expressing GsSRK exhibited enhanced salt tolerance and higher yields under salt stress [6]. Both OsSIK2 and GsSRK could be induced by abscisic acid, salt, and drought stresses [6,26].
Interestingly, G-type lectins also have potential medical applications. Some G-type lectins can recognize high-mannose N-glycans exposed at the surface of gp120 of HIV-1 [27,28], thereby inhibiting the entry of HIV-1 into CD4+ T-lymphocytes. In addition, some G-type lectins can specifically bind aberrant hypermannosylated N-glycans on the surface of cancer cells and cause programmed cell death of tumor cells [29].
Functional analysis and genome-wide studies of G-type lectins have been performed in different plants, such as Arabidopsis [30][31][32], soybean [30], rice [31], tomato [32], mulberry [33], and cucumber [34]. Strawberry is a good model plant for the study of Rosaceae, and the study of the G-type lectin family in this species can provide information for other Rosaceae plants as well. Recently, with the updated genome annotation and comprehensive gene expression atlas of F. vesca (https://www.rosaceae.org/species/fragaria_vesca/genome_v4.0.a2) [35], reliable data have become available for genome-wide analysis of G-type lectin genes in strawberry. Moreover, most studies on plant G-type lectins have focused on the G-type LecRKs and offer little insight into the potential biological functions of G-type lectins without a kinase domain. In this study, using the newly released F. vesca genome annotation (v4.0.a2), we identified the woodland strawberry G-type lectin gene family members and characterized their genomic organization and phylogenetic relationships. To gain insight into their functions, we further analyzed the variation in domain composition and their expression profiles at different stages.
The first BLASTp was carried out using the amino acid sequence (from 70 to 208 aa) of the GNA domain (domain ID: IPR001480) encoded by the FvH4_3g18380 gene. This gene is the homolog of F. x ananassa FaMBL1, which encodes a protein containing GNA and PAN domains and was reported as overexpressed in white strawberry fruit in response to anthracnose disease [25]. The search retrieved 77 different protein sequences. To find more proteins and reduce redundancy, 20 of these sequences with relatively low similarity were chosen and used for a second BLASTp search, leading to a total of 133 proteins with GNA domains found in F. vesca (Supplemental file S1).
Among these, 102 proteins containing both PK and TM domains were classified as G-LecRK; 23 proteins lacking both domains were classified as G-LecP; and finally 4 proteins lacking PK but retaining the TM domain were grouped as G-LecRP. In addition, four genes (FvH4_3g03241; FvH4_3g03300; FvH4_3g15980; FvH4_6g44240) missing the TM domain but containing both GNA and PK domains were found, and they were classified as G-LecK.
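The grouping rule described above depends only on the presence or absence of the PK and TM domains in a GNA-containing protein. The following minimal sketch expresses that rule in Python; the function name, the domain labels as strings, and the example proteins are hypothetical and serve only to illustrate the logic, not to reproduce the pipeline used in this study.

```python
def classify_g_lectin(domains):
    """Assign a GNA-containing protein to one of the four groups in the text.

    `domains` is any iterable of domain labels, e.g. ["GNA", "PAN", "TM", "PK"].
    The labels themselves are an assumption made for this illustration.
    """
    domains = set(domains)
    if "GNA" not in domains:
        raise ValueError("not a G-type lectin: GNA domain missing")
    has_pk = "PK" in domains
    has_tm = "TM" in domains
    if has_pk and has_tm:
        return "G-LecRK"   # receptor kinase: kinase and transmembrane domains
    if has_pk:
        return "G-LecK"    # kinase without transmembrane domain
    if has_tm:
        return "G-LecRP"   # receptor protein: transmembrane domain, no kinase
    return "G-LecP"        # neither kinase nor transmembrane domain


# Hypothetical domain lists, for illustration only.
examples = {
    "protein_a": ["GNA", "SLG", "PAN", "TM", "PK"],
    "protein_b": ["GNA", "PAN"],
    "protein_c": ["GNA", "SLG", "TM"],
    "protein_d": ["GNA", "PK"],
}
for name, doms in examples.items():
    print(name, classify_g_lectin(doms))

# Group sizes reported in the text add up to the 133 identified proteins.
group_sizes = {"G-LecRK": 102, "G-LecP": 23, "G-LecRP": 4, "G-LecK": 4}
print(sum(group_sizes.values()))
```

Other domains such as SLG, PAN, or EGF do not affect the group in this sketch; they only contribute to the diversity of arrangements within each group.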
Besides the GNA domain, most G-type lectins of F. vesca also contain other domains such as the S-locus glycoprotein domain (SLG), PAN/Apple domain (PAN), and Epidermal Growth Factor domain (EGF) (Fig. 1). SLG is involved in the self-incompatibility reaction during flower fertilization [37], and the PAN domain is believed to mediate protein-protein and protein-carbohydrate interactions [38]. In some cases, G-type lectins have an EGF domain, which may take part in the formation of disulfide bonds [16]. Multiple arrangements of these domains lead to various G-type lectins in F. vesca (Fig. 1).

Phylogenetic tree and nomenclature of G-type lectin genes
To highlight evolutionary differences, a phylogenetic tree of all 133 proteins was generated (Fig. 2). G-lectin genes are classified into six clades (I to VI). FvH4_1g03780 does not fall in any of these clades and is designated a singleton. All proteins in clade I have the same domain arrangement, GNA/PAN/TM/PK. In clades III, IV, and V, most proteins are G-LecRKs with the domain arrangement GNA/SLG/PAN/TM/PK, while proteins of clade VI show the greatest diversity, with ten types of domain arrangement in total.
To make it convenient to refer to the F. vesca G-lectin genes, we propose a nomenclature based on the similarity shown in the phylogenetic tree (Table 1), where the genes included in each clade are named following a sequential numbering. In the name, the letters "Fve" indicate that the gene is from the organism F. vesca [39]; "GLRK", "GLRP", "GLP" and "GLK" represent G-LecRK, G-LecRP, G-LecP, and G-LecK, respectively. Since FvH4_1g03780 did not fall in any of the clades, it was named FveGLRK7.1, to distinguish it from genes in the six clades. These FveG-lectin names are used in the following sections.

Chromosome location and duplication of G-lectin genes
To visualize their chromosome location, G-lectin genes were mapped to the F. vesca genome (Fig. 3). G-LecRKs are distributed on all chromosomes, with the majority on chromosomes 3 and 6, while G-LecPs are distributed on chromosomes 2, 3, 5, and 6. G-LecRPs are found only on chromosomes 3 and 6.
Tandem and dispersed duplication are two modes of single gene duplication: i) tandem duplication, which generates consecutive gene copies in the genome and is believed to originate from unequal chromosomal crossing over [40]; ii) dispersed duplication, which occurs in unpredictable and random patterns by still unclear mechanisms, resulting in two gene copies that are neither neighbors nor colinear [41]. Out of the 133 G-lectin genes, 86 appear as duplicated genes, arising from tandem duplication (55 genes), dispersed duplication (51 genes), or both (20 genes). This shows that duplication events are common in the G-lectin family of strawberry and have led to its expansion (Fig. 4). Genes from the same tandem duplicated cluster (Supplemental file S2) are usually close on the phylogenetic tree, and dispersed duplicated genes are also close on the tree (Supplemental file S3). The duplication events of F. vesca G-lectin genes were visualized by chord diagram (Fig. 5). These duplications are not evenly distributed across the seven chromosomes of F. vesca: chromosomes 3 and 6 show more duplications than the other chromosomes. No tandem duplication of G-type lectin genes was found on chromosome 4 or chromosome 7.
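The duplication counts given above can be cross-checked by inclusion-exclusion, since genes affected by both duplication modes are counted once in each category. The short Python sketch below uses only the numbers stated in the text; the variable names are illustrative.

```python
# Counts reported in the text.
tandem = 55        # genes involved in tandem duplication
dispersed = 51     # genes involved in dispersed duplication
both = 20          # genes involved in both modes
total_genes = 133  # all G-lectin genes identified in F. vesca

# Inclusion-exclusion: genes counted in both sets are subtracted once.
duplicated = tandem + dispersed - both
tandem_only = tandem - both
dispersed_only = dispersed - both
percent_duplicated = round(100 * duplicated / total_genes, 1)

print(duplicated, tandem_only, dispersed_only, percent_duplicated)
```

The union of the two sets is 86 genes, matching the reported number of duplicated genes, and corresponds to roughly 65% of the family, consistent with the statement that more than half of the genes originate from duplication.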

Kinase domain analysis of strawberry G-LecRK
The kinase domain of G-LecRKs is of the serine/threonine kinase type. Subdomains of kinases were defined according to typical patterns of conserved residues that are essential for function [42]. The eleven subdomains of FveG-LecRKs were analyzed by aligning the kinase subdomain motifs with the kinase domains of Pi-d2 [4] and OsSIK2 [26], which were shown to be active and capable of phosphorylation (Supplemental file S4). In total, 21 of the 102 G-LecRKs were found to carry mutations in their kinase domains. The majority of mutated G-LecRKs were encoded by genes originating from tandem or dispersed duplication (Supplemental file S5). The main types of mutation are subdomain loss and amino acid substitution, as summarized in Fig. 6. These occurred mainly in subdomains VIII to XI at the C-terminus of G-LecRKs, many of which were lost, probably resulting in loss of catalytic activity (Supplemental file S5), whereas subdomains III and IV of G-LecRKs in F. vesca appeared quite stable.

G-lectin gene expression during development and under stress conditions
The expression profile of G-lectin genes was analyzed in different tissues and at different developmental stages, based on available strawberry RNA-seq datasets. As shown in the heatmap (Fig. 7), G-type lectin genes display a wide range of transcription levels among the different tissues: some are highly expressed in various tissues, while others are completely silent. Many of the highly expressed genes belong to the G-LecRK group. A few genes are expressed specifically in only one or two tissues, for instance FveGLRP2.1, which is highly expressed in pollen and anthers but not in other tissues. In contrast, FveGLP6.4, the homolog of a known F. x ananassa G-type lectin, FaMBL1, showed active expression in many tissues during development. In general, G-lectin genes are more highly expressed in the ovary wall, seedling, style, root, and leaf, while they are hardly expressed in cortex and embryo (Fig. 7). The expression profile of the FveG-LecRK genes identified above as encoding mutated kinase domains was also analyzed (Fig. 8). Most of these genes are barely expressed at any developmental stage. However, FveGLRK6.12 showed relatively high expression in several tissues and developmental stages, in spite of the mutations occurring in several conserved subdomain motifs. Genes such as FveGLRK1.3, FveGLRK6.15, FveGLRK2.2, and FveGLRK6.21 were expressed only in one or two tissues (Fig. 8).
The log2 fold changes of G-lectin gene expression under stress were visualized in a heatmap. G-lectin genes were found to be differentially expressed during the interaction with the pathogenic fungi B. cinerea and P. aphanis as well as the oomycete P. cactorum (Fig. 9). Around 50 G-lectin genes were differentially expressed during the interaction with P. cactorum in strawberry root, and most of them were upregulated. Several G-lectin genes were also found to be upregulated in response to P. aphanis infection at 8 days post-infection. Compared to these two pathogens, few G-lectin genes were transcriptionally altered upon B. cinerea infection (Fig. 9). Moreover, some G-lectin genes appear to be regulated by the plant resistance elicitors benzothiadiazole and chitosan. Interestingly, genes like FveGLP6.4 (homolog of FaMBL1) were upregulated upon B. cinerea, P. cactorum, and P. aphanis infection as well as by the inducers benzothiadiazole and chitosan. With regard to abiotic stress, cold stress caused both upregulation and downregulation of several G-lectin genes (Fig. 9).
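For readers unfamiliar with the scale used in the stress heatmap, a log2 fold change is the base-2 logarithm of the ratio between expression under treatment and expression in the control. The Python sketch below shows the calculation under stated assumptions: the pseudocount of 1 and the expression values are hypothetical choices for illustration, and the source datasets may have derived their values with dedicated differential expression tools.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """Return log2((treated + pseudocount) / (control + pseudocount)).

    The pseudocount avoids division by zero for unexpressed genes; its
    value here is an assumption, not a setting taken from the study.
    """
    return math.log2((treated + pseudocount) / (control + pseudocount))


# Hypothetical expression values (treated, control), for illustration only.
up = log2_fold_change(7.0, 1.0)         # (7 + 1) / (1 + 1) = 4, so log2 FC = 2
down = log2_fold_change(1.0, 7.0)       # (1 + 1) / (7 + 1) = 0.25, so log2 FC = -2
unchanged = log2_fold_change(5.0, 5.0)  # ratio of 1, so log2 FC = 0

print(up, down, unchanged)
```

Positive values therefore indicate upregulation and negative values downregulation, with each unit corresponding to a twofold change.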

G-lectin gene co-expression prediction
To gain further insight into the function of strawberry G-type lectins, the genes co-expressed with FveG-lectin genes were retrieved from the co-expression database (www.fv.rosaceaefruits.org) [43]. Genes with functions in different plant defense pathways were predicted to co-express with G-lectin genes (Supplemental file S6

Subcellular location
The subcellular localization of strawberry G-lectin proteins was predicted by CELLO and TargetP (Supplemental file S7). CELLO uses a reliability index to compare the likelihood of different subcellular locations. Based on the CELLO prediction, all G-LecRKs had the highest reliability index for the plasma membrane, except for FveGLRK6.31 and FveGLRK4.13, which were predicted to be located in the extracellular compartment. Conversely, almost all G-LecPs were predicted to be located in the extracellular compartment. TargetP localization prediction is based on the presence of a signal peptide, which drives proteins into the secretory pathway. According to this prediction, most F. vesca G-lectin proteins are driven to the secretory pathway, which is consistent with the CELLO prediction that most F. vesca G-lectin genes encode proteins located at the plasma membrane or in extracellular compartments.

Discussion
In plants, G-type lectins form a large gene family that is believed to play roles in responses to biotic and abiotic stresses [44,45]. Their role in defense has also been reported in strawberry. For instance, 34 G-type LecRK genes were found to be upregulated in F. vesca root after P. cactorum inoculation [46], and the G-type lectin gene FaMBL1 was found to be involved in F. x ananassa resistance against C. acutatum [25]. A study of the strawberry Serine/Threonine Kinase disease resistance gene family showed that many Serine/Threonine Kinase genes belong to the G-type LecRKs [47], but insights into the genomic organization of G-lectin genes in strawberry were still limited. Recently, the high-quality F. vesca genome annotation provided a good opportunity for a genome-wide study of G-lectin genes in F. vesca.
To identify G-lectin encoding genes, we used only the sequence of the GNA domain as a query rather than the whole protein sequence. This choice was made to avoid using the kinase domain sequence as a query, which would introduce considerable ambiguity into G-lectin identification. Eventually, 133 proteins were found to belong to the G-lectin family in F. vesca, and the majority (102 out of 133) contained a kinase domain and belonged to the G-LecRK class. Four genes containing both GNA and kinase domains, but lacking the TM domain, were classified as G-LecK. The lack of the TM domain may alter the function of these G-lectins.
TM domains are required for the plasma membrane localization of G-LecRKs [4,48]. In rice, a single amino acid substitution (Ile144Met) in the TM domain of Pi-d2, a rice G-LecRK conferring resistance to M. grisea strain ZB15, made the plant susceptible to strain ZB15, suggesting that the TM domain of Pi-d2 may participate in ligand recognition and signal transduction [4]. Indeed, the substitution did not change the plasma membrane location of Pi-d2, so the altered structure of the mutated TM domain may have lost or modified its ligand-binding function and its signal transduction from the extracellular domain to the intracellular kinase catalytic domain [4]. This implies that the TM domain of G-lectins has a role in both membrane localization and signal transduction. However, most G-LecPs in F. vesca, despite lacking TM domains, were also predicted to anchor to the plasma membrane. In this regard, a pepper G-LecP, CaMBL1, consisting of a GNA domain and a PAN domain and regulating plant defense against the bacterium X. campestris pv. vesicatoria, was reported to be located at the plasma membrane [14]. Moreover, the transient expression of CaMBL1 induces the accumulation of salicylic acid and the activation of defense-related genes, which indicates a role in defense signaling despite the absence of TM and kinase domains [14]. These data show that, although most previous studies on G-lectins focused on G-LecRKs, studies on G-LecPs may also reveal important functions in plants.
The kinase domain of lectin receptor kinases can interact with downstream signaling molecules and display catalytic activity [48]. The expression analysis of G-LecRKs with mutations in their kinase subdomains reveals that some of these lectins are expressed in spite of a predicted loss of kinase activity. Further studies should establish the importance of each amino acid residue for kinase domain activity, so as to test the relationship between conserved motifs and G-LecRK function.
In addition to GNA, TM, and kinase domains, G-lectins also contain SLG, PAN, and EGF domains in various combinations. The various domain arrangements of G-lectins create an enormous degree of protein diversity. Proteins with arrangements containing PAN and SLG domains have GO functions related to the recognition of pollen, protein phosphorylation, and cell recognition, which make these proteins important in reproduction and, more generally, in signal perception and/or transduction [49]. Multiple-domain proteins are more species-specific than single-domain proteins, which are commonly shared among many plant species [49]. In F. vesca, more than 90% of G-type lectins were found to be multiple-domain proteins. These species-specific domain arrangements might be a consequence of frequent duplication events followed by lineage-specific retention [50]. This is consistent with our result that a large portion of F. vesca G-lectin genes appear to originate from duplication and show various domain arrangements. The various domain arrangements of G-type lectins could be considered a kind of flexible genetic mechanism for producing species-specific adaptation to changing environments [49].
Tandem and dispersed duplication contribute significantly to the expansion of the G-lectin gene family in F. vesca. More than half of the G-type lectin genes of F. vesca originate from duplication events. Chromosome 3, where the highest number of G-type lectin genes is located, showed a large number of duplication events of G-type lectin genes. Conversely, no tandem duplication of G-type lectin genes was found on chromosome 4 or chromosome 7, and these two chromosomes also contain fewer G-type lectin genes than other chromosomes. Species-specific expansion of the G-type lectin gene family was also reported in a study of lectins in soybean, rice, and Arabidopsis, where tandem and segmental duplications were regarded as the major mechanisms driving lectin expansion [30]. Consistently, a study of lectin genes in cucumber also revealed that 106 out of 146 genes (76.8%) were involved in tandem duplication events [34].
According to the transcriptome data, many G-lectin genes, whether G-LecRKs or G-LecPs, are actively expressed in different tissues at different developmental stages of strawberry. G-lectins in F. vesca actively respond to pathogens, abiotic stress, and elicitors, and some G-lectin genes appear to respond to both biotic and abiotic stress. Up to now, only one G-lectin gene, FaMBL1 (homolog of FveGLP6.4), has been studied for its involvement in resistance against pathogens in strawberry [25]; however, the underlying molecular mechanism is not yet elucidated. FveGLP6.4 is not only expressed in several tissues of strawberry during development but is also upregulated after challenge by the pathogens B. cinerea, P. aphanis, and P. cactorum (Fig. 7 and Fig. 9), implying the involvement of FveGLP6.4 in F. vesca (or FaMBL1 in F. x ananassa) in plant defense.
The molecular features of some G-type lectins from other plant species are better known: Pi-d2 [4], LORE [21], OsSIK2 [26], and CaMBL1 [14], which can regulate plant defense responses, were shown by confocal microscopy to be located at the plasma membrane. For CaMBL1, its mannose affinity and the importance of the GNA domain for its localization are known [14]. According to that study, CaMBL1 has affinity toward Manα and/or Manβ and GalNAc residues, and the GNA domain is essential for its binding to D-mannose. A preliminary working model of OslecRK was also proposed by Cheng et al. [24]. In this model, sensing of biotic stress first stimulates OslecRK expression, followed by the interaction of its kinase domain with OsADF (actin-depolymerizing factor) to transduce the signals. Following these events, the expression of defense-related genes (PR1a, LOX and CHS) is induced to strengthen the plant's immune response.
To further predict the function of G-lectins in F. vesca, we retrieved the genes predicted to be co-expressed with them. G-lectin genes may be co-expressed with other G-lectin genes, receptor kinase genes, and disease resistance genes, which provides clues for uncovering their function. These predictions need to be confirmed experimentally.

Conclusion
In conclusion, the G-type lectins form a large gene family in F. vesca, with various domain arrangements and active responses to biotic/abiotic stresses. This indicates that G-lectins constitute an important gene family that requires further study to understand its role in depth, especially its involvement during biotic stress. Studying the mannose-binding ability of G-type lectins and identifying their downstream interacting proteins is important for uncovering their role in strawberry defense.

Phylogeny analysis
To build a phylogenetic tree, full-length protein sequences were obtained by running Blastx on the GDR website, using mRNA sequences from GDR as queries to search the protein database. Protein sequences were aligned using MUSCLE in MEGA-X [53]. Aligned sequences were analyzed via maximum likelihood bootstrapping (ML-BS) using IQ-TREE 1.6.12 (http://www.iqtree.org/) [54]. Once the best-fit model of molecular evolution (WAG + F + I + G4) was determined for G-type lectin genes, based on the Bayesian information criterion (BIC) scores [55], ML-BS analysis was conducted with IQ-TREE 1.6.12. Statistical support for the branches was evaluated by conducting an ML-BS analysis of 5000 replicates. The tree was annotated with iTOL (https://itol.embl.de/) [56].
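Model selection by BIC chooses the substitution model with the lowest score, where BIC = k ln(n) - 2 ln(L), with k the number of free parameters, n the number of alignment sites, and L the maximized likelihood. The Python sketch below illustrates the criterion only; the log-likelihoods, parameter counts, and alignment length are hypothetical values chosen for illustration and are not results from this study.

```python
import math


def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


# Hypothetical candidate models: (log-likelihood, number of free parameters).
candidates = {
    "WAG": (-51000.0, 263),
    "WAG+F+I+G4": (-50200.0, 284),
}
n_sites = 900  # hypothetical alignment length

scores = {model: bic(lnl, k, n_sites) for model, (lnl, k) in candidates.items()}
best_model = min(scores, key=scores.get)  # lower BIC is preferred

for model, score in scores.items():
    print(model, round(score, 1))
print("best:", best_model)
```

With these illustrative numbers, the richer model is preferred because its gain in likelihood outweighs the penalty for its additional parameters.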

Expansion and evolution of G-lectin genes
Gene duplication was investigated in the F. vesca genome. Tandem duplication was explored using the PTGBase plant tandem duplicated gene database (http://ocri-genomics.org/PTGBase/) [58], and dispersed duplication was investigated using the plant duplicate gene (plantDGD) database (http://pdgd.njau.edu.cn:8080/) [41]. The numbers of tandem and dispersed duplications were shown in a Venn diagram using the R package VennDiagram [59]. The relationships of duplicated gene pairs were visualized in a chord diagram using the R package circlize [60].

Kinase subdomain investigation of G-LecRKs
The kinase domains are divided into eleven (I-XI) smaller subdomains, according to Hanks and Hunter's study [42]. The eleven subdomains of F. vesca G-LecRKs were analyzed by aligning their kinase domain protein sequences to kinase domains of known G-LecRKs, Pi-d2 [4] and OsSIK2 [26].

Expression analysis of G-type lectin genes
F. vesca G-lectin gene expression profiles were extracted from the database reported by Li and colleagues [35]. The expression levels of the genes in different tissues (flowers, fruit at different developmental stages, seedlings, leaves, meristems, and roots) were used to draw a heatmap in R. Published transcriptome datasets were also used to obtain G-lectin gene expression profiles. G-lectin genes from F. x ananassa transcriptome datasets were converted to their F. vesca orthologs.
The genes co-expressed with F. vesca G-lectin genes were retrieved from the co-expression database (www.fv.rosaceaefruits.org) [43] after conversion of gene names from the previous genome annotations to version 4.0.a2 [35], which was used here for lectin gene identification. Since different networks indicate varying correlation strengths, the networks with the highest correlation were chosen for G-lectin co-expression analysis: consensus100_hd_ltpm (consensus100 network of hand-dissected tissues), consensus100_lcm_ltpm (consensus100 network of laser captured tissues), and consensus100_fruit_ltpm (consensus100 network of ripening fruit tissue-only).

Subcellular localization prediction
For subcellular localization, amino acid sequences of strawberry G-lectin genes were submitted to two online predictors, TargetP 1.

Availability of data and materials
All data generated or analysed during this study are included in this published article and its supplementary information files.

Competing interests
The authors declare that they have no competing interests.

Differentially expressed strawberry G-lectin genes after challenge by pathogens and treatment with inducers and cold. Data are expressed as log2 FC; negative values represent downregulation, positive values represent upregulation, and NDE represents no differential expression. WF: white fruit; RF and red: red fruit; HWL: leaves of strawberry cv. Hawaii 4; YWL: leaves of Yellow Wonder 5AF7; Bc: Botrytis cinerea; Pc: