Exploring the Role of WRKY Transcription Factor in the Growth and Development of Bletilla striata based on Bioinformatics Analysis

WRKY-type transcription factors (TFs) play crucial roles in plant growth and development. However, comprehensive analyses of the WRKY family in Bletilla striata, a valuable Chinese medicinal orchid, and in other orchids remain limited. In this study, the WRKY gene family was screened from B. striata transcriptome data using bioinformatics methods. The 29 WRKY TFs identified, named BsWRKY1 to BsWRKY29, were divided into three clades: group I (8 WRKY sequences), group II (18) and group III (3). Group II was further divided into five subgroups: II-a (1 WRKY sequence), II-b (5), II-c (3), II-d (7) and II-e (3). EST-SSR marker mining showed that 10 markers could be stably amplified with obvious polymorphisms among 4 landraces. Our data suggest that BsWRKY genes may work together to regulate plant growth and development. In their different subcellular locations, BsWRKY proteins not only perform their own functions but also help coordinate the regulation of overall life activities. Taken together, these results provide a theoretical basis for further studies on the functions and regulatory mechanisms of the WRKY gene family in B. striata.


Introduction
Transcription factors (TFs) are proteins that bind to specific gene sequences so that a gene is expressed at a specific intensity, at a specific time and in a specific tissue. WRKY transcription factors are important regulators of growth and development in higher plants. They are involved in responses to biotic and abiotic stresses such as drought, salt injury, pests and diseases, nutrient deficiency, high temperature and cold injury. Chen et al. found that WRKY regulates fruit growth and development in jujube (Chen et al., 2019). WRKY interacts with NPR1 to regulate gene expression and thereby activate the plant defense response (Wang et al., 2006). Cheng et al. demonstrated that WRKY13, WRKY45-2 and WRKY42 form a regulatory cascade involved in rice blast resistance (Cheng et al., 2015). Xu et al. used gene microarray technology to find that SbWRKY14, SbWRKY32 and SbWRKY39 are all involved in the drought stress response of sorghum (Xu, 2021). ClWRKY47 of lemon and CsWRKY47 of sweet orange can be induced by high salinity, drought and low temperature stress (Shen, 2021). The ZmWRKY102 transcription factor in maize can improve drought resistance (Li, 2015). The WRKY transcription factors of Cinnamomum kanehirae are involved in responses to drought and low temperature stress (Zhao, 2020). The WRKY family is also involved in plant nutrient stress and plant hormone signal transduction (Jing, 2021; Bu, 2020). The WRKY domain, a DNA-binding region of about 60 amino acids that contains the conserved WRKYGQK heptapeptide, is the most conserved structural feature of the WRKY transcription factor family. WRKY proteins regulate the transcription of downstream target genes and respond to stress by binding specifically to the W-box cis-acting element [(T/C)TGAC(C/T)] (Wang, 2020).
Within the WRKY domain, the N-terminal part carries the almost invariant WRKYGQK heptapeptide, while the C-terminal part carries a conserved zinc-finger motif. WRKY proteins are classified into three groups according to the number of WRKY domains and the structure of the zinc finger. Group I members typically contain two WRKY domains with a C2H2 (CX4-5CX22-23HXH) motif, while group II and group III members have a single WRKY domain. Group II also contains a C2H2 zinc-finger motif and can be further divided into five subgroups (II-a, II-b, II-c, II-d and II-e) based on the phylogeny of the WRKY domains, whereas group III contains a zinc-finger motif of the C2HC type (CX7CX23HXC) and is found only in higher plants.
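The grouping rules above can be illustrated with a minimal Python sketch. It is not the classification procedure used in this study (which relies on domain databases and phylogeny); it only counts exact WRKYGQK heptapeptides and tests the two zinc-finger patterns with regular expressions. The group III pattern is assumed to read C-X7-C-X23-H-X-C, heptapeptide variants are ignored, and the function name and toy sequences are hypothetical.

```python
import re

# Patterns taken from the motif descriptions in the text; "." stands for any residue (X).
HEPTAPEPTIDE = re.compile(r"WRKYGQK")
C2H2 = re.compile(r"C.{4,5}C.{22,23}H.H")  # C-X4-5-C-X22-23-H-X-H
C2HC = re.compile(r"C.{7}C.{23}H.C")       # C-X7-C-X23-H-X-C (assumed reading)


def classify_wrky(protein):
    """Return a rough group label (I, II, III) from simple sequence rules."""
    seq = protein.upper()
    n_domains = len(HEPTAPEPTIDE.findall(seq))
    if n_domains == 0:
        return "no WRKY heptapeptide"
    if n_domains >= 2:
        return "I"
    if C2HC.search(seq):
        return "III"
    if C2H2.search(seq):
        return "II"
    return "unclassified"


# Toy sequences built only to exercise the rules (not real BsWRKY proteins).
linker = "A" * 10
c2h2 = "C" + "A" * 4 + "C" + "A" * 22 + "HAH"
c2hc = "C" + "A" * 7 + "C" + "A" * 23 + "HAC"
toy = {
    "toy_group_I": "WRKYGQK" + linker + c2h2 + linker + "WRKYGQK" + linker + c2h2,
    "toy_group_II": "WRKYGQK" + linker + c2h2,
    "toy_group_III": "WRKYGQK" + linker + c2hc,
}

for name, seq in toy.items():
    print(name, classify_wrky(seq))
```

Subgroups II-a to II-e cannot be separated by such rules; they are defined by the phylogeny of the WRKY domains.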
Bletilla striata (Thunb.) Rchb. f. is a perennial herb of the orchid family. It contains phenolic acids, dihydrophenanthrenes, bibenzyls and other compounds with anti-tumor, hemostatic, antibacterial, anti-inflammatory and wound-healing activities, and it can serve as a plasma substitute (Tang, 2014). It is also used in food, cosmetics and other industries (Qian et al., 2015; Zhou et al., 2020). Abundant secondary metabolites are the material basis of the pharmacological action of B. striata. Analyzing the synthesis pathways of these secondary metabolites and exploring functional genes from the perspective of gene families is therefore of great significance: it helps to clarify how the synthesis of active components is regulated and, ultimately, to enhance secondary metabolite production. WRKY TFs have been studied in many plants, such as peach (Yanbing et al., 2020), apple (Gu, 2015), pepper (Diao, 2015), melon (Ma, 2017), castor bean (Zou, 2013), Hordeum vulgare (Jiang et al., 2021) and Prunella vulgaris (Zhu et al., 2020), but little is known about WRKY genes in B. striata. In this study, WRKY TFs were screened and identified from B. striata transcriptome data, and their genetic information, conserved domains, evolutionary relationships and functions were analyzed. In addition, SSR molecular markers were mined to help classify the function of WRKY TFs in B. striata, providing a reference for further exploration of their role in regulating secondary metabolite synthesis.

Materials
B. striata capsules were collected from the B. striata Germplasm Garden of Zunyi Medical University, Xinpu District, Zunyi City, Guizhou Province, China (27°42'N, 107°01'E), and the seeds were induced for suspension culture for a total of 45 days (Pan et al., 2020). Samples were taken at random every 3 days after callus induction (3 replicates at each time point), and total RNA was extracted from each sample after grinding in liquid nitrogen. Equal amounts of RNA from all samples were pooled for transcriptome sequencing using Iso-Seq on the PacBio platform. WRKY gene sequences of B. striata were screened from the sequencing results for the following analyses.

WRKY gene family identification
The total RNA of the pooled samples was reverse-transcribed into cDNA for sequencing, which was performed on the PacBio platform with Iso-Seq parameters. The resulting data were assembled de novo with Trinity to obtain the transcriptome data. Pfam and NCBI BLAST were used for annotation. Sequences annotated as WRKY genes were then screened for conserved domains: all protein sequences obtained were examined for the presence of the WRKY domain (PF03106) using the hidden Markov model of Pfam together with SMART and InterPro. After incomplete sequences were eliminated, the remaining candidate sequences were used for the following analyses.

Physical and chemical properties exploring
NCBI ORF Finder was used to identify the open reading frame (ORF) of each BsWRKY candidate. The online server ExPASy was used to characterize the physical and chemical properties of the WRKY transcription factors, including molecular weight, number of amino acids, isoelectric point, instability index, aliphatic index and grand average of hydropathicity (GRAVY). Protein secondary structure was predicted using the online server SOPMA.
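Two of the indices listed above follow simple published formulas and can be reproduced without the web server. The sketch below is an illustration only, not the ExPASy implementation: GRAVY is taken as the mean Kyte-Doolittle hydropathy over all residues, and the aliphatic index follows Ikai's formula, 100 × (x_Ala + 2.9 x_Val + 3.9 (x_Ile + x_Leu)) with x as mole fractions. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein):
    """Grand average of hydropathicity; negative values indicate a hydrophilic protein."""
    residues = [aa for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("no standard amino acids in sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


def aliphatic_index(protein):
    """Relative volume occupied by aliphatic side chains (Ala, Val, Ile, Leu)."""
    seq = protein.upper()
    if not seq:
        raise ValueError("empty sequence")
    n = len(seq)
    x = {aa: seq.count(aa) / n for aa in "AVIL"}  # mole fractions
    return 100 * (x["A"] + 2.9 * x["V"] + 3.9 * (x["I"] + x["L"]))


# The WRKY heptapeptide itself is strongly hydrophilic.
print(round(gravy("WRKYGQK"), 3))
print(round(aliphatic_index("AVIL"), 1))
```

Values for real sequences should still be taken from the server output, since handling of non-standard residues may differ.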

Analysis of subcellular localization and conserved domains
The conserved domains of the detected WRKY members were analyzed using the CD-search function of NCBI. Conserved motifs were analyzed with MEME, with the number of motifs to be identified set to 10. WoLF PSORT was used to predict the subcellular localization of the WRKY proteins.

Signal peptide, transmembrane structure prediction and promoter analysis
The SignalP 4.0 server was used to analyze signal peptides, and TMHMM-2.0 was used to predict transmembrane domains. The cis-acting elements in the 2000 bp sequence upstream of each WRKY gene were analyzed with the online tool PlantCARE, and the results were visualized with TBtools.
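As a minimal illustration of what cis-element scanning involves (this is not the PlantCARE procedure), the W-box consensus (T/C)TGAC(C/T) given in the Introduction can be located on both strands of an upstream sequence with a regular expression. The function names and the toy promoter below are hypothetical.

```python
import re

# Lookahead so that overlapping occurrences are also reported.
W_BOX = re.compile(r"(?=([TC]TGAC[CT]))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_w_boxes(promoter):
    """Return (forward-strand start, strand, matched sequence) for every W-box."""
    seq = promoter.upper()
    n = len(seq)
    hits = [(m.start(), "+", m.group(1)) for m in W_BOX.finditer(seq)]
    for m in W_BOX.finditer(reverse_complement(seq)):
        # Convert the reverse-strand coordinate back to a forward-strand start.
        hits.append((n - m.start() - 6, "-", m.group(1)))
    return sorted(hits)


toy_promoter = "AATTGACCGGGGTCAAAT"  # toy sequence, not a BsWRKY promoter
for hit in find_w_boxes(toy_promoter):
    print(hit)
```

Starts are 0-based and refer to the forward strand, so a hit on the minus strand covers the same six positions when read in the opposite direction.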

Evolutionary analysis
The Arabidopsis thaliana WRKY sequences were obtained from the transcription factor database PlantTFDB, and the Dendrobium catenatum WRKY sequences were obtained from NCBI. MEGA-X was used to construct a phylogenetic tree of the WRKY gene family members with the Neighbor-Joining (NJ) method, with the bootstrap value set to 1000.

Functional analysis of the BsWRKYs genes
Gene Ontology (GO) analysis was used for the functional classification of BsWRKYs. The mission of the GO Consortium is to develop a comprehensive, computational model of biological systems, ranging from the molecular to the organism level, across the multiplicity of species in the tree of life. The biological pathways of the B. striata WRKYs were mapped to the reference pathways in KEGG, an encyclopedia of genes and genomes used to assign functional meaning to gene and protein elements at the molecular and higher levels. Based on the molecular functions and biological pathways in the KEGG database, the analysis results were used to mine biologically significant information. The differentially expressed cytoplasmic and nuclear proteins were matched with the KEGG pathway database to generate the predicted pathways (Ramdas et al., 2019).

SSR detection and verification
NWISRL was used to detect candidate SSR sites in BsWRKYs, and primers for each site were designed with DNAMAN 6.0. The SSR candidates were verified by PCR followed by polyacrylamide gel electrophoresis (PAGE) in four landraces of B. striata collected from Zheng'an, Chongqing, Xiuwen and Anhui. The PCR reaction had a total volume of 10 μL, containing 1.5 μL DNA template (50 ng/μL), 6 μL 2×PCR MIX, 0.75 μL of each primer (10 μmol/L) and 1 μL ddH2O. Amplification conditions were: pre-denaturation at 95 ℃ for 5 min; 30 cycles of denaturation at 95 ℃ for 30 s, annealing at 52 ℃ for 30 s and extension at 72 ℃ for 60 s; and a final extension at 72 ℃ for 5 min. The amplified products were separated on a 10% polyacrylamide gel using a PowerPac stabilized steady-flow electrophoresis instrument, with 1×TBE as the electrophoresis buffer, at a constant voltage of 150 V for 150 minutes. After silver nitrate staining, the bands were observed and photographed.
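The kind of SSR site searched for here can be sketched with a regular expression. This is a simplified stand-in, not the NWISRL algorithm: it reports perfect di- and trinucleotide repeats with at least five copies, the lowest repeat number reported in the Results. The threshold, the function name and the toy sequence are assumptions for illustration.

```python
import re


def find_ssrs(seq, min_repeats=5, unit_lengths=(2, 3)):
    """Return (start, motif, copy number) for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for k in unit_lengths:
        # A motif of length k followed by at least (min_repeats - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if len(set(motif)) == 1:
                continue  # skip homopolymer runs such as AAAAAA
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


# Toy sequence containing (AG)6 and (CTT)5; not a real BsWRKY transcript.
example = "GGC" + "AG" * 6 + "TTC" + "CTT" * 5 + "A"
print(find_ssrs(example))
```

Primers would then be designed in the flanking sequence on either side of each reported site.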

Physical and chemical properties of WRKY protein
A total of 135 sequences annotated as WRKY genes were initially obtained from the transcriptome data of B. striata. After sequences lacking a typical WRKY domain and incomplete sequences were removed, 29 WRKY sequences were retained and renamed BsWRKY1 to BsWRKY29. Analysis of physical and chemical properties showed that the BsWRKY proteins ranged from 159 to 703 aa in length and from 17546.1 to 76820.1 Da in molecular weight (Table 1). The theoretical isoelectric point ranged from 4.48 to 9.94: 11 proteins were basic (isoelectric point greater than 7.5), 13 were acidic (less than 6.5) and 5 were neutral (between 6.5 and 7.5), so acidic proteins formed the largest class. The instability indices of the 29 WRKY proteins were all greater than 40, the aliphatic indices were all less than 100 and the GRAVY values were all negative, indicating that the WRKY transcription factors of B. striata are unstable hydrophilic proteins. The predicted secondary structure showed that α-helix, β-turn and extended strand accounted for 21.07%, 4.07% and 11.84%, respectively, and random coil for 63.02% (Table 1). Among the 29 proteins, the β-turn was normally more than the α-helix except in BsWRKY20. Subcellular localization prediction placed 24 WRKY proteins in the nucleus, while BsWRKY16 was located in the chloroplast, BsWRKY14 and BsWRKY29 in mitochondria, BsWRKY17 in the vacuole and BsWRKY28 in the endoplasmic reticulum. Subcellular location determines the specific biological effects of a protein. WRKYs can form a network that contributes to various cytoplasmic and nuclear processes, including signaling events from organelles or the cytoplasm to the nucleus (Bakshi and Oelmüller, 2014). Studies have shown that WRKY TFs act downstream of the ABAR-ABA complex at the chloroplast envelope, regulate seed germination and other processes, and are key nodes of the abscisic acid signaling pathway (Rushton et al., 2012). These results indicate that WRKY genes might be involved in regulating plant growth and development and are important nodes in metabolic regulation.
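The cut-offs used in this section can be written down as a small Python sketch: isoelectric point below 6.5 is called acidic, above 7.5 basic and otherwise neutral; an instability index above 40 is read as unstable; a negative GRAVY value is read as hydrophilic. The function names are hypothetical, and apart from the two reported isoelectric point extremes (4.48 and 9.94) the example values are invented for illustration rather than taken from Table 1.

```python
def classify_pi(pi, low=6.5, high=7.5):
    """Label a protein by its theoretical isoelectric point."""
    if pi < low:
        return "acidic"
    if pi > high:
        return "basic"
    return "neutral"


def describe(instability_index, gravy_value):
    """Combine the stability and hydropathy calls used in the text."""
    stability = "unstable" if instability_index > 40 else "stable"
    polarity = "hydrophilic" if gravy_value < 0 else "hydrophobic"
    return stability + ", " + polarity


print(classify_pi(4.48))      # lowest reported isoelectric point
print(classify_pi(9.94))      # highest reported isoelectric point
print(describe(55.0, -0.8))   # hypothetical instability index and GRAVY value
```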

Promoter cis-regulatory elements of BsWRKY genes
The upstream regions of the BsWRKY genes were searched for cis-regulatory elements, including promoter elements and other cis-acting elements related to hormone regulation and stress response (Fig. 1). The cis-regulatory elements in the BsWRKY promoters were related to growth and development (meristem expression, endosperm-specific expression, seed-specific regulation and circadian rhythm regulation), plant hormones (auxin, abscisic acid, methyl jasmonate (MeJA), gibberellin and salicylic acid) and stress (drought, low temperature, hypoxia-specific induction and anaerobic induction). All 29 BsWRKY genes had light-response elements (LRE), and 14 of them had drought-inducibility elements. In contrast, cell cycle regulatory elements, elements involved in defense and stress response, flavonoid synthesis elements and seed germination elements were found only in BsWRKY28, BsWRKY7, BsWRKY16 and BsWRKY24, respectively. This indicates that BsWRKY genes are not only associated with plant growth but may also play a vital role in drought stress regulatory networks. Collectively, these results indicate that WRKY family members participate in embryonic development, meristem growth and environmental stress regulation during the growth and development of B. striata.

Conserved motifs of WRKY proteins
A total of 10 conserved motifs were obtained by motif analysis of the B. striata WRKY transcription factors with the online tool MEME (Fig. 2A, B, C). Motif3 was present in 28 members, all except BsWRKY7. Motif1 and motif2 were present in 26 members, motif8 in 6 members and motif4 in 4 members. CD-search analysis showed that motif1, motif2 and motif4 belong to WRKY domains and that motif8 is a zinc-finger domain related to WRKY. Interestingly, motif3 currently has no functional record in the database and needs further study.

Conserved domain identification and evolutionary analysis
The domain structures of the WRKY gene family were analyzed and compared using the online CD-search server. The results showed that 24 of the 29 BsWRKY members had the typical WRKYGQK heptapeptide domain and W-box, although with different degrees of variation, mainly at the N-terminus. Five transcription factors (BsWRKY7, 16, 17, 22 and 27) had incomplete domains, lacking the N- or C-terminus. These deletions may have occurred during evolution.
To clarify the evolutionary position of the BsWRKY genes and further identify their functions, the 29 WRKY sequences from B. striata, 22 WRKY sequences from Arabidopsis thaliana and 20 WRKY sequences from Dendrobium catenatum were used for phylogenetic analysis (Fig. 3).

EST-SSR polymorphism of WRKY genes in B. striata
EST-SSR markers offer high polymorphism and variability, high reproducibility, and accurate and rapid detection. NWISRL detected SSR sites in 10 of the 29 sequences, of which 3 contained dinucleotide repeats and 7 contained trinucleotide repeats. The number of repeats ranged from 5 to 18. Primer pairs for the 10 SSR sites were designed with DNAMAN and amplified stably in all four landraces (Table A), with amplified products ranging from 100 to 200 bp in length (Fig. 4). These results indicate that the WRKY gene family is probably highly conserved among different B. striata germplasms. The newly found SSR primers could be used as molecular markers to identify members of the BsWRKY gene family in different germplasms (Zhong et al., 2021).

Discussion
WRKY transcription factors are important regulators of growth, development and stress response, and play an important role in plant survival. Using bioinformatics analysis, we examined the physicochemical properties, restriction enzyme sites, conserved motifs, cis-elements, evolutionary relationships and functions of the WRKY gene family of B. striata, and detected and verified EST-SSR sites. A total of 29 members of the BsWRKY gene family were screened from the transcriptome data of B. striata. Among them, 24 members had the typical conserved WRKYGQK structure and the downstream W-box [(T/C)TGAC(C/T)] cis-acting element. However, five members (BsWRKY7, 16, 17, 22 and 27) had different degrees of deletion at the C-terminus or N-terminus, which may reflect self-regulation or cross-regulation among WRKY genes. Among them, BsWRKY7 retained less than half of the conserved domain, yet it had elements involved in defense and stress response. GO enrichment indicated that BsWRKY7 has sequence-specific DNA-binding transcription factor activity and participates in the regulation of transcription; it belongs to subgroup II-c. Its function and binding specificity therefore merit further investigation.
Subcellular localization showed that 24 BsWRKY proteins act in the nucleus, while BsWRKY28 was located in the endoplasmic reticulum. BsWRKY28 had a response element for regulating the cell cycle as well as other regulatory elements related to drought, anaerobic stress and salicylic acid response, and an ATBP-1 binding site. KEGG analysis showed that BsWRKY28 is involved in environmental adaptation, and it also belongs to subgroup II-c. These results indicate that it is involved in the synthesis and regulation of important substances during cell growth and development and plays a vital role in stress response. BsWRKY17 was located in the vacuole and had a signal peptide and a transmembrane structure. We speculate that BsWRKY17 may function as a secreted protein that is secreted after synthesis and then exerts its function. BsWRKY16 was located in the chloroplast, with seven restriction sites, and also had cis-acting elements related to flavonoid synthesis, suggesting that BsWRKY16 is involved in regulating the synthesis of plant secondary metabolites. Previous studies have shown that WRKY is also involved in the synthesis of terpenoids and plays a key regulatory role (Park et al., 2021). BsWRKY14 and BsWRKY29 were located in mitochondria. BsWRKY14 had an auxin response element and a low temperature response element. BsWRKY29 had an MYBHv1 binding site, an abscisic acid response element, a hypoxia-specific induction element, a salicylic acid response element, an ATBP-1 binding site and a MeJA response element. Both had a gibberellin response element and an anaerobic induction element, so we speculate that these two elements may be involved in the regulation of cell maturation and senescence. BsWRKY29 also belongs to subgroup II-c, suggesting that subgroup II-c might play an important role in stress response (Na et al., 2018). The BsWRKY gene family had plant hormone and stress response elements, which were tissue-specific and had synergistic effects among members. Its function is to respond to hormonal signals and stress. All genes contain light-response elements, which we speculate affect the cell cycle and help plants cope with various stresses. Many studies show that WRKY genes in banana (Zheng et al., 2021), longan (Fan et al., 2017), Artemisia annua (Xue and Jue, 2021), rice (Li et al., 2021), barley (Viana et al., 2021) and other plants are involved in regulating the maturity and senescence of plants, promoting the accumulation of specific products, and responding to one or more abiotic stresses. Despite these similarities, plant development is a very complicated process, and BsWRKY genes could be directly or indirectly involved in a certain regulatory role. Therefore, it is necessary to conduct more in-depth and detailed research on these genes.
Previous studies reported that AtWRKY3 is involved in biotic stress response (Aboul-Maaty and Oraby, 2019), that AtWRKY40 participates in drought stress response in A. thaliana (Ju et al., 2019), and that AtWRKY14 plays a role in plant antiviral defense (Che et al., 2018). We therefore speculate that BsWRKY5, BsWRKY12 and BsWRKY3 might play regulatory roles in stress resistance in B. striata. GO enrichment classified the BsWRKY gene family into the three categories of biological process, molecular function and cellular component, with the family involved in important processes such as cell cycle, transcriptional regulation and meiosis. KEGG clustering also confirmed that the BsWRKY gene family is involved in signal transduction and environmental information processing, growth and development regulation, stress response and hormone signal transduction, which suggests that the WRKY family is an important factor in plant growth and development and in the response to environmental stress. EST-SSR detection showed that SSR sites occurred in 34.48% (10 of 29) of the BsWRKY sequences and were dominated by trinucleotide repeats, indicating that the WRKY family is genetically conserved among different B. striata varieties. The SSR sites can therefore be used to assess the genetic diversity of germplasm resources and provide a reference for molecular-assisted breeding and genetic diversity analysis, which is of great significance for the development and utilization of gene function and the evaluation of germplasm resources.

Conclusion
The WRKY transcription factor family is named after its highly conserved WRKY domain, which binds specifically to the W-box in the promoter of a target gene and regulates its expression. The WRKY gene family of B. striata is a highly conserved gene family with hereditary conservation among different species. It contains many hormone response elements and stress response elements, and has many restriction enzyme sites, which make it a participant in plant growth and development and a key point in regulating the synthesis of secondary metabolites. It is one of the regulators of primary and secondary metabolism and plays an important role in stress response. Based on the bioinformatics analysis and functional characterization of each member of the BsWRKY family, we screened the relevant sequences for each function. In different locations, WRKY proteins not only perform their own functions but also help coordinate the regulation of overall life activities. These results provide a theoretical basis for further study of the functions and regulatory mechanisms of the WRKY gene family of B. striata. As transcription factors, WRKY genes differ in function, and some WRKY TFs may control growth and development in B. striata, for instance tuber development and biomass accumulation. However, more functions and internal regulatory mechanisms remain to be studied. Building on this analysis, gene editing technology can be used for in-depth research on the members of this gene family, to further explore their functions and the mechanisms by which they regulate plant growth, development and stress response. This offers opportunities to improve stress resistance and secondary metabolite production, which is of great significance for valued medicinal plants.
Table 2 is not available with this version.

Figure 1
Cis-responsive elements in the upstream region of the initiation codon of BsWRKY genes. Different colored boxes indicate different cis-responsive elements.

Figure 2
The size (A), sequence (B) and location (C) of the conserved motifs in the WRKY protein family. The length and order of the differently colored boxes represent the actual size and location of each motif in the protein sequence, respectively.

Supplementary Files
This is a list of supplementary files associated with this preprint. Appendix.docx