Systematic analysis of HD-ZIP transcription factors in sesame genome and gene expression profiling of SiHD-ZIP class I entailing drought stress responses at early seedling stage

Sesame is an ancient oilseed crop, known for its high oil content and quality. Its sensitivity to drought at early seedling stage is one of the limiting factors affecting its world-wide growth and productivity. Among plant specific transcription factors, the association of HD-ZIPs with sesame drought responses at early seedling stage is not well-established yet and is very important to develop our molecular understanding on sesame drought tolerance. In this study, total 61 sesame HD-ZIP proteins were identified, based on their protein sequence homology with Arabidopsis and protein domain(s) architecture prediction, followed by their phylogenetic, conserved domain(s) motifs and gene structure analyses to classify them into four classes (HD-ZIP Class I-IV). HD-ZIP Class I was also subdivided into four subgroups: α (SiHZ25, SiHZ43, SiHZ9 and SiHZ16), β1 (SiHZ10, SiHZ30, SiHZ32 and SiHZ26), β2 (SiHZ42 and SiHZ45) and γ (SiHZ17, SiHZ7 and SiHZ35) by a comparative phylogenetic analysis of sesame with Arabidopsis and maize. Afterwards, twenty-one days old sesame seedlings were exposed to drought stress by withholding water for 7 days (when soil moisture content reduced to ~16%) and gene expression of HD-ZIP Class I (13 members) was performed in well- watered (control) and drought stressed seedlings. The gene expression analysis showed that the expressions of SiHZ7 (6.8 fold) and SiHZ35 (2.6 fold) from γ subgroup were significantly high in drought seedlings. This study is useful in demonstrating the role of SiHD-ZIP Class I in sesame drought responses at early seedling stage and to develop its novel drought tolerant varieties


Introduction
Sesame (Sesamum indicum L., 2n= 26) is a very ancient oil seed crop. It is lauded as the "Queen of Seeds" as its seeds are a rich source of high quality and highly stable edible oil (50-60%) [1]. It is essential to human life in several ways serving as a good source of proteins (25%), carbohydrates (13.5%), polyunsaturated fatty acids, lignans (sesaminol sesamin, sesamol and sesamolinol), vitamins (B & E) and several mineral ions (phosphorous, iron, magnesium calcium, manganese, copper and zinc) [2,3]. Due to the pleasing aroma and taste of its seeds and oil, it is widely used for multiple purposes like cooking, confectionary, baking, livestock, medicines and cosmetics [4]. Sesame is cultivated in arid and semi-arid lands, but its growth and yield are substantially reduced due to high temperatures and episodic/ terminal drought conditions [2]. Although, sesame is a wellknown "survivor" or "drought tolerant" oil crop yet it is highly sensitive to drought, particularly at germination and flowering (anthesis) stage, which ultimately affect its growth and yield [2].
The identification and development of drought tolerant varieties is a crucial to sustain production of sesame crop. However, for the development of drought tolerant varieties, it is very important to understand drought tolerance mechanisms [5]. So far, only a handful molecular studies can be found on sesame drought responses, particularly at early growth stage. Transcription factors (TFs) are one of the most important gene families participating in transcriptional regulation of drought stress responses in sesame [2].
Homeodomain-leucine zipper (HD-ZIP) is a well-characterized TF family specific to plant species. It is related to growth, development, and stress responses in plants. It belongs to a very large gene family, well-known as homeobox (HB). It is characterized by a highly conserved region (~60 amino acid sequence) known as homeodomain (HD) for specific DNA binding and less conserved motif, called leucine zipper (LZ) for protein-protein interactions through homo-and/or hetro-dimerization. HD-ZIP TF family is mainly divided into four classes (HD-ZIP Class I-IV) in plants [6][7][8].
HD-ZIP Class I which is characterized by highly conserved HD-domain and less conserved LZ motif, serves as a binding-site for dyad-symmetric sequence CAAT(A/T) ATTG [9,10]. This class has crucial roles in plant growth, development and in abiotic stress responses. Moreover, it is also reported to be linked with cell membrane stability and abscisic acid (ABA) signaling leading to reduced plant growth specifically under drought stress. HD-ZIP Class II possesses CPSCE (Cys, Pro, Ser, Cys, Glu amino acids) motif and N-terminal consensus sequence, in addition to well-conserved HD-domain and LZ motif [11]. Like Class I, Class II members also bind to CAAT-N-ATTG sequence but differ in the recognition of nucleotide located in the center of the target site. This class is particularly involved in plant growth and development, shade avoidance, auxin response and drought stress. In contrast, HD-ZIP Class III is distinguished from other HD-ZIP classes by having START domain (steroidogenic acute regulatory protein-related lipid transfer), an adjacent conserved region SAD (STARTadjacent domain) and MEKHLA at C-terminal along with HD-domain and LZ motif. Based on in-vitro studies, HD-ZIP Class III recognizes GTAAT(G/C) ATTAC sequence as binding-site [9]. It is mainly involved in plant growth/development and oxygen redox/light signaling processes [12]. In contrast to Class I and II, role of Class III is least understood in abiotic stress like drought, salt stress and ABA treatment [7]. In HD-ZIP Class IV, HD-domain and LZ motif (loop in the middle) are also present as in Class I-III. HD-ZIP Class IV also possesses START domain (lipid binding domain) like Class III which binds to AAATG(C/T) A but lacks MEKHLA domain [9]. Like Class III, Class IV has no role in abiotic stresses rather it bears functional importance in cell differentiation, flowering, and cuticle development [6].
In plants, HD-ZIP Class I of HD-ZIP TF family involved in abiotic stresses, including drought. HD-ZIP Class I is further categorized into six monophyletic subgroups or clades (α, β, γ, δ, ε, and φ) in Arabidopsis. HD-ZIP Class I TFs (like Zmhdz10 in maize) were induced by ABA treatment in drought stress and regarded as the positive transcriptional regulators to confer drought tolerance in plants [13,14]. As evident from previous studies, HD-ZIP Class I genes from γ, β and α clades showed notable response to drought stress in different plant species such as Arabidopsis [13] and [15]. In fact, depending upon plant developmental stage and environmental conditions, different gene members of this class exhibit different gene expression levels during drought stress. The expression of genes from γ-clade -AtHB7, and AtHB12 (Arabidopsis), OsHOX6, OsHOX22, and OsHOX24 (rice), HaHB4 (sunflower), MtHB1 (Medicago) and NaHD20 (tobacco) -is typically high in drought stress demonstrating reduction in plant growth [16]. Similarly, upregulation of ATHB5, -6, -7 and -12 (β-clade) in Arabidopsis [8] and Zmhdz7, Zmhdz1Zmhdz3 and Zmhdz2 (β-clade) in maize (Zhao et al., 2011)also signifies their functional importance under drought stress/ABA treatment in plants. Downregulation of Zmhdz15 (β-clade) under drought stress and in ABA treatment was reported in maize [15]. Like HD-ZIP I β-clade, gene expression of HD-ZIP α-clade can also be increased (AtHB13 and HaHB1) or decreased (Zmhdz11, Zmhdz5 and Zmhdz13) upon drought stress [17].
The genes of HD-ZIP Class I (wheat TaHDZipI-3, -4 and -5) were characterized to specifically bind to stress-related cis-elements in the promoter sequences of drought-responsive genes to regulate drought responses [18]. In addition to this, these genes might act as upstream regulator of other TFs conferring drought tolerance in plants [6]. This type of association was previously reported between AtHB13 (HD-ZIP Class I) and JUB1 (a member of NAC TF family), establishing a joint drought stress control module as an upstream regulator of JUB1 and later as a positive regulator of drought tolerance in Arabidopsis [17]. Moreover, altered gene expression of stress/ABA-responsive genes such as D1-Pyrroline-5-carboxylate synthetase 1 (P5CS1), Responsive to dehydration 22 (RD22), Responsive to dehydration 29B (RD29B) and ABA insensitive1 (ABI1) is another attribute of HD-ZIP Class I mediated drought stress tolerance, as reported in Zmhdz10 over-expressors transgenic rice plants [19].
So far, genome-wide analyses have been performed on several stress related TF families such as ERF, WRKY, MYB, NAC and bZIP in sesame. Moreover, genome-wide classification of HD-ZIP gene family and their expression analysis under drought and salinity stress in different tissues (root, stem, flower, leaf, capsule and seed) of sesame has already been reported [16]. In the current study, the classification of HD-ZIP proteins in sesame has not only extended from its four well-conserved classes to further subgrouping of Class I into α, β1, β2 and γ but also possible roles of α, β1, β2 and γ were explored in response to severe drought stress in sesame at early stage. To our knowledge, the role of HD-ZIP Class I in drought stress responses entailing early growth stages in sesame has not been studied so far.

Materials and methods
In silico identification of SiHD-ZIP proteins HD-ZIP protein sequences were identified from sesame whole proteome (S_indicum_v1.0), downloaded from National Center for Biotechnology Information (NCBI: https:// www. ncbi. nlm. nih. gov/ genome/). This identification was based on typical homology search and prediction of protein domain architecture using The HMMER approach. First of all, homology search was performed between sesame whole proteome sequences with that of protein sequences of A. thaliana transcription factors families, retrieved from The Arabidopsis Information Resource (TAIR) (https:// www. arabi dopsis. org/) to create a database for homology search on BLAST standalone version (2.6.0+). The candidate TFs orthologs in sesame were predicted based on reciprocal BLASTP hits defined as: ≥80% query coverage, e-value <10 −6 and ≥50% sequence identity. From the set of identified TFs orthologs in sesame, redundant protein data (duplicate hits) was eliminated using python scripting. Thus, obtained non-redundant protein sequences with query coverage ≥80% were selected as candidate transcription-associated proteins (TAPs) in sesame. Candidate TAPs were verified based on their domain(s) architecture using Superfamily (1.75) based on HMM library comprising Hidden Markov's Models [7] and python scripting. After domain(s) verification, the proteins were classified into plant-specific TF and TR families in sesame, according to a consensus rule based on previously established methods for domain(s) assignment in protein sequences, as reported by [20]. Total number of well-characterized HD-ZIP proteins as well as their respective gene loci, linkage groups (LG1-LG16), chromosomal locations and exon counts from NCBI were determined in sesame. Subsequently, based on this information, a numerical name was designated to each of these predicted SiHD-ZIP protein sequences depending upon their distribution order (LG1-LG16) in sesame.

Phylogenetic analysis and classification of SiHD-ZIP proteins
After removing unassigned and outlier protein sequences from the analysis, the predicted SiHD-ZIP proteins were selected for their classification into four well-conserved classes of HD-ZIP TF family known in plants. Multiple sequence alignments were built by ClustalW (http:// www. clust al. org/). Then, a phylogenetic tree was constructed by choosing maximum likelihood (ML) method and 1000 bootstrap replications in order to divide them into four classes (SiHZ I-IV) using MEGA (version X) [21]. For the sub-classification of SiHD-ZIP Class I into subgroups and clades, second ML tree was built from the multiple sequence alignment of protein sequences of SiHD-ZIP Class I and already-reported members of SiHD-ZIP Class I in maize and Arabidopsis [22]. After construction, the phylogenetic trees were analyzed using Dendroscope version 3.7.2.
Identification of conserved domain(s), isoelectric point (pI)/molecular weight (kDa) and isoelectric point (pI) in SiHD-ZIP Class I-IV To validate the classification of SiHD-ZIP Class I-IV protein sequences, the conserved motifs of these proteins were determined by subjecting protein sequences to online MEME (Multiple Expectation Maximization for Motif Elicitation) version 5.1.1 with parameters set as: optimum motif width (≥ 6 and ≤ 200); maximum number of motifs (20) [23] The conserved motifs were then further verified by two conserved motifs prediction tools, Expasy PROSITE [24] and Interpro 82.0 [25]. Moreover, domain architectures of HD-ZIP Class I-IV proteins were also predicted by SMART (Simple Modular Architectural Research Tool), to confirm the presence of class-specific domains in SiHD-ZIP protein sequences [26]. The isoelectric point (pI)/ molecular weight (kDa) of each of these proteins were also calculated by Expasy online tool https:// web. expasy. org/ compu te_ pi/ to further validate the predicted protein sequences in the given study.

Gene structure analysis of SiHD-ZIP proteins
The gene structure analysis of SiHD-ZIP Class I-IV was performed by Gene Structure Display Server (GSDS) version 2.0. [27]To determine their gene structures, the coding and genomic sequences of these proteins were obtained from NCBI database.

Plant material and drought stress treatment
Sesame seeds of TS-5 variety were collected from National Agricultural Research Centre (NARC), Islamabad, Pakistan. After overnight soaking (~24 h) in fresh water, seeds were dried and sown in plastic pots filled with a mixture of soil, sand and peat in 2:1:1 ratio and kept at 25 ± 3 °C with 16-h day, 8-h night photoperiod. After 20 days of seedling growth, drought stress was imposed by complete water withholding in drought treated seedlings. The control plants were watered regularly throughout the experiment. The leaf samples were subsequently collected from both control and drought stressed seedlings when the soil moisture reached ~16%. After collection, each leaf sample was stored in liquid nitrogen for 1 3 subsequent RNA extraction and qPCR. The experiment was performed with three independent biological replicates.

RNA extraction and quantitative polymerase chain reaction (qPCR) analysis
For the gene expression analysis of SiHD-ZIP Class I genes under drought stress, total RNA was extracted from the leaf samples of control and drought stressed seedlings using Trizole method [28]. The integrity of the isolated RNA was assessed by 1% denaturing agarose gel. The concentration of isolated RNA was determined by a spectrophotometer (Analytik Jena SPECORD 200 plus). After RNA quantification, each sample was further processed with DNase 1 (Catalog #: EN0525, Thermo Fisher Scientific) to remove DNA contamination. The DNA-free RNA sample was converted into cDNA by using a cDNA synthesis kit (Catalog #: K1621, Thermo Fisher Scientific). Gene-specific primers for qPCR were designed for the identified genes of HD-ZIP Class I by using the NCBI primer designing tool (https:// www. ncbi. nlm. nih. gov/ tools/ primer) and are listed in Supplementary  Table S3. The qPCR was carried out using Maxima SYBR Green/ROX qPCR Master Mix (Catalog #: K0221, Thermo Fisher Scientific) at 54 °C. The actin gene was used as reference gene and three independent biological replicates for control and drought stressed seedlings were used to carry out the gene expression analysis by qPCR. Relative foldchange of target genes was determined by 2 −ΔΔCt method as described by [29].

Statistical analysis
For statistical analysis one-way ANOVA was performed followed by Tukey's test at significance level P= 0.05 using Origin 2019b (https:// www. origi nlab. com/ index. aspx? go= Produ cts/ Origin/ 2019b & pid= 3322). The results were expressed as mean ± standard error (SE) of three replicates.

In-silico identification of SiHZ proteins
In the given study, total 61 well-annotated HD-ZIP proteins were predicted in sesame genome. After prediction, names were assigned to these protein sequences (SiHZ 1-45) according to their chromosomal locations on sesame linkage groups (LGs) (Supplementary Table S1).

Phylogenetic analysis of HD-ZIP proteins and their classification in sesame
Based on the phylogenetic analysis, the identified HD-ZIP protein sequences were classified into four classes (Fig. 1).
Subsequently, an ML tree was also constructed for further division of HD-ZIP Class I into its clades or sub-groups (α, β2, β1, ζ, δ, ε, γ and φ) as shown in Fig. 2. In Class I, the highest number of protein members fall into β subgroup (6). Out of these six members, SiHZ10, SiHZ30, SiHZ32 and SiHZ26 were clustered in β1 while SiHZ42 and SiHZ45 in β2 (Fig. 2). Besides these, four members (SiHZ25, SiHZ43, SiHZ9 and SiHZ16) were included in α sub-group followed by three members (SiHZ17, SiHZ7 and SiHZ35) in γ subgroup (Fig. 2). None of the members of Class I were found in subgroups ζ, δ, ε and φ in sesame (Fig. 2).

Sesame HD-ZIPs conserved domains, their isoelectric points/molecular weight (pI/ MW (kDa) in Class I-IV
In sesame, for validation of HD-ZIP proteins (Class I-IV), their conserved domains were identified by MEME tool (Supplementary Fig. 1). Total 20 conserved domains were identified in SiHZ proteins, ranging from 20 to 98 amino acids in length (T). Then, conserved amino acid sequences of these domains were used to confirm domain's presence and its molecular function in all four classes with the help of Expasy Prosite and InterPro (Supplementary Table S2). From Expasy Prosite, only Homeobox domain was predicted in Motif 1 (common feature of all four classes of HD-ZIP proteins) and START domain in Motif 4, which is a characteristic feature of both, Class III and IV (Supplementary Table S2). The selected protein sequences were analyzed with InterPro. Analysis by InterPro showed greater number of predicted conserved domains and their molecular function to validate HD-ZIP proteins in all four classes in sesame i.e., Motif 1,3,4,5,8,9 and 11 (Supplementary Table S2). The presence of specific domains belonging to four classes was confirmed by SMART ( Supplementary Fig. S2). According to results, HOX domain was found in all Classes (I-IV), HALZ domain was exclusively identified in Class II, START domain was found both in Class III and IV but MEKLA domain was only present in Class III ( Supplementary Fig.  S2).
Furthermore, number of amino acids (a.a), isoelectric point (pI)/molecular weight (MW) of these selected proteins were also analyzed. The result showed great variation among member proteins belonging to each class in their amino acids sequence length, molecular weight and isoelectric point (Table S1). In sesame, HD-ZIP proteins ranged from a minimum of 201 (SiHZ7 from Class I) to a maximum of 853 (SiHZ41b from Class III) amino acids in length. Among SiHZ proteins, the largest amino acids sequence length was obtained in Class III ranging from 803 a.a in SiHZ22c to 853 a.a in SiHZ41b while the shortest range was found in the members of Class I from 201 a.a (SiHZ7) to 340 a.a (SiHZ30). Likewise, the pI values were high in Class III and ranged from 5.81 (SiHZ34) to 6.28 (SiHZ13) as compared rest of the classes. In Class IV, it ranged from 5.40 (SiHZ6a, b) to 8.50 (SiHZ12b) while in Class II ranging from 4.99 (SiHZ3a, b,) to 8.74)] and 4.73 (SiHZ10) to 7.71 (SiHZ7) in Class I (Table 1). Similarly, the range of molecular weights (kDa) was determined highest in Class III members followed by Class IV, II and I (Table 1).

Sesame HD-ZIPs chromosomal locations and their gene structure analysis in Class I-IV
In sesame, an uneven distribution of SiHD-ZIP genes was found, located over 1-12 LGs (Table 1). Among these genes, total 13 genes of Class I were unevenly distributed on six different LGs (2, 4, 8, 11, 13 and 15) while 10 genes of Class II were located on LGs 1-4, 6, 11 and 15 (Table 1). Similarly, genes in Class III and IV, each comprising 8 and 9 genes, respectively, were located on LGs (1,3,4,6-9 and 12) and (1-4 and 8), respectively ( Table 1). The greatest number of genes (9) in Class IV were found distributed over lesser number of LGs (1-4 and 8) as compared to Class I-III (Table 1). Moreover, LG 4 was found common to all classes having two genes from Class I and one of the gene members of Class II-IV (Table 1). Besides this, LG 13 was only common to Class I and LGs (7, 9 and 12) were only identified in Class III (Table 1).
Likewise, the gene structure analysis of SiHD-ZIP genes showed great variability in the number of introns ranging from 1 to 17 ( Supplementary Fig. S1). The results showed genes of each class were quite similar in exonintron structure and clustered together. The number of introns in Class I and II were comparatively lesser than Class III and IV. In Class I, majority of the genes had two introns except for SiHZ 7, SiHZ17 and SiHZ35 which had only one intron in their genes and SiHZ26 had three introns. Likewise, most of the genes had three introns in Class II but SiHZ14 and SiHZ44 were the genes with two introns. Out of these genes, the highest number of introns (17) was found in genes of the Class III while the number of introns in Class IV genes was comparatively greater Fig. 1 Classification of HD-ZIP TF family in sesame -Maximum likelihood (ML) tree of 61 identified HD-ZIP transcription factors (TFs) from sesame. Full protein sequences of the HD-ZIP TF family in sesame were aligned by ClustalW and the ML tree was constructed by using MEGA (version X). Each class is labelled than Class I & II but lower than Class III, ranging from 8 to 10 introns (Supplementary Fig. S1).

Effect of drought stress on sesame seedlings
Under drought stress, sesame seedlings showed reduced or slow growth as compared to their control (well-watered) seedlings. In addition to slow plant growth, curling of leaves with mild yellowing were also observed in drought treated seedlings (Fig. 3).

Expression analysis of HD-ZIP Class I genes under drought stress at early seedling stage in sesame
In this study, gene expression analysis of 13 genes of SiHD-ZIP Class I was performed in drought stressed seedlings along with their well-watered control. Out of these 13 genes, 8 genes showed differential expression under drought stress. The results demonstrate significantly high upregulation of two genes SiHZ7 (6.8-fold) and SiHZ35 (2.6-fold) from γ subgroup while downregulation of two other genes (SiHZ25 with fold change ~0.76 and SiHZ43 having fold change ~0.59) of α subgroup (at P < 0.05) in severe drought stressed as compared to control seedlings (Fig. 4). In addition to this, the expression of rest of the selected genes i.e., SiHZ9 and SiHZ16 (α); SiHZ10, SiHZ26, SiHZ30 and SiHZ32 (β1); SiHZ42 and SiHZ45 (β2) and SiHZ17 (γ) showed no significant difference in the gene expression levels between drought stressed and control seedlings (Fig. 4).

In-silico identification of SiHD-ZIP proteins
HD-ZIP is a plant-specific TF family known for its roles in plant growth, development and responses to light and stress. So far, HD-ZIP is a widely characterized TF family throughout plant kingdom such as Arabidopsis (48) [8], rice (48) [30], maize (55) [15], wheat (46) [14], tomatoes [31] Medicago truncatula (52) [32] cassava (57) [6], soybean (88) [33], Chinese cabbage (113) [34], poplar (63) [35] and eucalyptus (40) [36]. In the given study, total 61 sesame HD-ZIP proteins, encoded by comparatively smaller number of genes have been identified, based on sequence homology search (A. thaliana) and domains assignment rules by HMM (Table S1). In the light of previous studies, results showed that species bias does exist in the distribution of HD-ZIP genes, and it varies from one plant species to another [35]. The HD-ZIP genes are most likely to evolve and expand in numbers due to segmental gene duplication events in different plant species [18]. Thus, the number of HD-ZIP genes in sesame (44) found in this study was almost like that of potato. In contrast to this, the highest number of HD-ZIP genes has been reported in Chinese cabbage and soybean. Thus, these findings shed light on the evolutionary history of sesame as well. From evolutionary point of view, it is distantly related to other oil crops such as soybean, castor and rape but closely related to potato (Solanaceae family) and Phrymaceae families in the core lineage of eudicotsangiosperms. Thus, the number of HD-ZIP genes in sesame sharing similarity with potato but greatly differing from soybean is providing an interesting clue about the evolutionary relationship of sesame with other oilseed crops.

Phylogenetic analysis of SiHD-ZIP proteins and their classification
In this study, a phylogenetic analysis was performed for the classification of sesame HD-ZIP proteins into four well-conserved classes known in plants (Fig. 1). The clustering of similar HD-ZIP members within the same class in sesame is clear from this phylogenetic analysis (Fig. 1). These results clearly reflect high conservation of HD-ZIP TF family in plants and agree with previously classified HD-ZIP TFs in other plant species like Arabidopsis [8], maize [15] and eucalyptus [36] .
In addition to this, sesame HD-ZIP Class I was further divided into subgroups in this study. The subgroups assignment of member proteins in Class I was done based on their  clustering within already known subgroups (α, β2, β1, ζ, δ, ε, γ and φ) in Arabidopsis and maize (Fig. 2). According to results, clades α, β2, β1 and γ were common to all three plant species selected for the phylogenetic analysis. In previous studies, some of these subgroups were shown to be lineagespecific such as β1 clade (subgroup) is exclusively shared among eudicots (Arabidopsis, Medicago, grape, and poplar) while clade ζ is common to monocots (rice, millet, and stiff brome) and clade η to mossy [35]. It is clear from these results that these subgroups were neither found in sesame nor Arabidopsis and maize. In addition to these, sesame also lacks clade δ which was common to both Arabidopsis and maize as well as clades ε, and φ which were only common to Arabidopsis [37]. So, the subgrouping within the Class I found to be consistent with previously reported studies. In Fig. 2, the presence of β1 clade and absence of clade ζ in sesame and Arabidopsis as compared to maize (monocot) ensures their eudicot lineage. It is most likely that sesame might share more functional similarity with Arabidopsis HD-ZIP I class in transcriptional regulation of genes during growth, development and/or abiotic stress responses, including drought.

SiHD-ZIP Class I-IV conserved domains and their isoelectric points (pI)/ molecular weight (Da)
In the given study, high consensus was found in the conserved domain(s) architectures of member proteins within each class, represented by their respective member as shown in Supplementary Fig. S2. According to results, the domain(s) architectures were common to all members within each class of SiHD-ZIPs and were supported by previous findings. Likewise, all SiHD-ZIP members within each class shared great similarity in the values of their amino acid number, pI/MW (Table 1). Overall, members of Class I and II have shorter amino acids sequence length as compared to Class III and IV (Table 1). Similarly, there was more difference in pI values within members of Class I and II as compared to Class III and IV in sesame. Thus, these results clearly reflect same motif organization within the same class but different motif organization among all four classes in SiHD-ZIPs. Hence, these results clearly demonstrate classspecific diversity in the protein sequence, structure and function of HD-ZIP TF family in sesame. LG15 3 I Fig. 3 Effect of drought stress on sesame early seedling growth. The seedlings were >20 days old, grown at ±25 °C (A) Control (wellwatered) seedlings (B) Drought stress was imposed to the seedlings by water withholding for 7 days and soil moisture content reached up to ~16%. The experiment was performed in three replicates

SiHD-ZIP Class I-IV chromosomal locations and their gene structure analysis
According to results, there was an uneven distribution of SiHD-ZIP genes located on [1][2][3][4][5][6][7][8][9][10][11][12] LGs in sesame (Table 1). These results represented the highest number of gene distribution on LG 8 [10 genes: Class I (3), Class III (1) and Class IV (5)] in sesame. In contrary, the lowest gene distribution of HD-ZIP genes was found on LGs 7, 12 and 13 i.e., only one gene per LG (Table 1). Moreover, the gene structure analysis of SiHD-ZIP genes also revealed similar gene structure organization (gene size, number of introns and exons as well as their lengths) shared among members of each class (Fig. S3). Thus, these findings clearly suggest that SiHD-ZIP genes located on the same LG (chromosome) are most likely to be linked with each other. Thus, it is inferred that these linked genes might function together in transcriptional regulatory gene networks involving plant growth, development, metabolic pathways and/or or stress responses in sesame. This is another inevitable finding of this study that the number of SiHD-ZIP genes (44) identified in this study was quite close to the previously reported number of SiHD-ZIP genes (45) in sesame [18]. According to results, the number of identified SiHD-ZIP proteins (61) was much greater than the number of total genes (44) in sesame (Table 1). In this respect, HDZIP Class III is the largest class having the greatest number of SiHD-ZIP proteins (21) encoded by only 9 genes in sesame (Table 1). Similarly, Class IV has total 14 protein members, encoded by only 7 genes and Class II comprises of 12 member proteins encoded by 10 genes (Table 1). In contrast to them, the 13 member proteins are encoded by 13 different genes in Class I (Table 1). Such comparative analysis between SiHD-ZIP proteins (splice variants or protein isoforms) and their respective genes provides useful information to elaborate the functional diversity of HD-ZIP TF family and organism's complexity in sesame. [38] So, in the given study, some of the HD-ZIP proteins in sesame are possible variants (protein isoforms) of the same HD-ZIP gene which can be formed at a time depending upon the tissue type or stage of development [39]. These results are consistent with previously reported studies e.g. the number of HD-ZIP genes found in potato were 15, 10, 7 and 9 in Class I-IV, respectively [40], which is quite similar to the results reported in this study. In contrary, Arabidopsis and maize, both of them have the least number of genes (5) in HD-ZIP Class III while more genes members belong to HD-ZIP Class II (18) and HD-ZIP Class I (17) followed by HD-ZIP Class IV (15) in Arabidopsis and 17, 16 and 10 in Class I, IV and II, respectively, in maize [15]. Moreover, a good comparison can also be made between the number of HD-ZIP proteins (Class I-IV) found in sesame and soybean. In soybean, the largest number of HD-ZIP proteins falls in Class I (30) and II (27) while least in the Class III (12) and 19 in class IV, as compared to sesame [41]. Thus, if the comparison of these classes is made based on number of genes, more structural and functional diversity can be linked in the genes of Class I. Thus, the functional diversity in sesame's HDZIP TF family can be implicated by two means, either through the expansion of the Class I by segmental gene duplication events [18] or alternative splicing of genes to produce multiple splice variants (protein isoforms) from a single or few genes of Class III, IV and II respectively [42].

Effect of drought stress on sesame seedling growth
It is evident from results that sesame seedlings did not show wilting rather resumed slow plant growth with slight yellowing and curling of leaves (Fig. 3). As evident from previous findings, this reduction in growth can be linked to some molecular alterations (e.g. over-expression of member gene(s) from HD-ZIP Class I) to confer drought tolerance in sesame at seedling stage [17].

Expression analysis of HD-ZIP Class I genes under drought stress at early seedling stage in sesame
The functional relevance of SiHD-ZIP Class I in drought and other abiotic stress responses is widely reported in several plant species like Arabidopsis, sunflower [43] and tobacco [14]. Recently, significantly high expression of HD-ZIP I genes (MtHDZ17, MtHDZ34, MtHDZ35, MtHDZ39, MtHDZ47, MtHDZ48, and MtHDZ50) along with several members of HD-ZIP Class II has been reported in response to drought and salinity stress treatments in Medicago truncatul [32]. However, changes in HD-ZIP Class I gene expression are inevitable, induced by water-limiting conditions or ABA treatment. In cassava, a dramatic change in the expression of Class I genes in leaves under drought stress suggests longer periods of drought could cause dramatic effects on cassava [6]. Similarly, both, upregulation or downregulation of Class I genes at different time points (6 h, 12 h & 24 h) have also been reported in soybean [33].
According to the results of this study, members of Class I showed either upregulation, downregulation or no change in their gene expression in the drought treated seedlings compared to the control (well-watered) seedlings (Fig. 3). Similar pattern of gene expression was observed in previous studies in other plants such as tobacco, where HD-Zip class 1 genes showed similar trends in response to drought stress. [14] In the given study, SiHZ7 and SiHZ35, two members of subgroup γ subgroup exhibited significant upregulation under drought stress. Similarly, as previously reported in maize, the gene expression levels of subgroup γ (Zmhdz12, -9, -6 and -4) were strongly induced by more than 2-fold compared to the control [15]. Such high gene expression of members from γ clade is typically induced by ABA treatment and water deficit conditions and have previously reported in Arabidopsis (AtHB7, AtHB12), sunflower (HaHB4), Medicago (MtHB1), tobacco (NaHD20), maize (ZmHDZ4) and rice (OsHOX6, OsHOX22, and OsHOX24) enhancing drought and salinity tolerance via ABA signaling in transgenic plants [18]. Albeit SiHZ7 and SiHZ35, are the two members of the same subgroup γ, showed high gene expression levels under drought stress in this study. They share great similarity with each other as two paralogs, yet their divergent evolution and function can be speculated in order to mediate growth responses under water deficit conditions as previously reported in Arabidopsis (AtHB7 and AtHB12) [44]. In the light of previous findings, reduction in cell elongation and plant growth can be inferred with the upregulation of SiHZ7 and SiHZ5 in sesame under drought stress. Most probably, this reduction in cell elongation and subsequent drought tolerance in sesame can be either due to inhibition of GA biosynthesis such as AtHB12 in Arabidopsis or without the inhibition of GA biosynthesis like AtHB7 in Arabidopsis [17]. In addition to this, like AtHB13 (Arabidopsis), either both or any of these two TFs (SiHZ7 or SiHZ35) may confer drought tolerance at early seedling stage by acting as upstream positive regulator(s) of other TFs such as NAC (JUB1 or its homolog) in sesame. Thus, it can be speculated that SiHZ7 and SiHZ35 may work in a coordinated fashion by changing their gene expression levels to fine tune processes associated with growth (early seedling stage) and water stress in sesame [45] However, functional analysis of SiHZ7 and SiHZ35 is required to draw a better picture of gene or metabolic network(s) involved in sesame drought stress responses/tolerance at early seedling stage.
On the other hand, two members of subgroup α (SiHZ25 and SiHZ43) showed their downregulation in drought seedlings as compared to control. These results seem quite similar with the expression of HDZ-I genes from-clade in maize. According to [15], drought stress resulted in the downregulation of Zmhdz11, -5 and -13 of subgroup α while the expression levels of the subgroup γ genes (Zmhdz12, -9, -6 and -4) were strongly induced more than 2-fold as compared to the control plant. In contrast to this, AtHB13 (Arabidopsis) and HaHB1 (sunflower) were found to constitutive overexpression of α clade members with no significant effect on the growth and yield of transgenic Arabidopsis under drought stress. However, their high expression levels may induce plant stress tolerance to cold, drought and/or salinity by enduring cell membrane integrity and stability in plant cells. [46] In the given study, no significant gene expression changes were found in case of subgroup β under drought stress, particularly, SiHZ26 and SiHZ32 (β1) and SiHZ42 (β2). In contrast to these results, either up-or downregulation of subgroup β gene members is evident in different plant species from previous studies. In Arabidopsis, elevated levels of ATHB5, ATHB6, ATHB7, or ATHB12 were induced in response to ABA and water deficit conditions to regulate plant growth. Likewise, high expression levels of Zmhdz7, Zmhdz1, Zmhdz3 and Zmhdz2 of subgroup β1 were induced while the expression of Zmhdz15 was greatly downregulated by drought stress in maize [15]. Besides this, under drought stress, the downregulation of Class I members such as SiHZ25 and SiHZ43 (subgroup α) can be linked to an increase in plant water content and malondialdehyde content to enhance drought tolerance in sesame. Thus, it is most likely that SiHD-ZIP Class I genes may function in a coordinated manner, particularly, SiHZ7 and SiHZ35 from subgroup γ, in the developmental reprogramming of sesame at early seedling stage in response to drought stress, as already reported in Arabidopsis. [45] Thus, these results clearly suggest both upregulation and downregulation of SiHD-ZIP Class I gene members under drought stress responses at seedling stage in sesame. However, it is most likely that gene members of SiHD-ZIP Class I belonging to subgroups α and γ have more functional importance to regulate early drought stress responses in sesame.

Conclusions
Based on phylogenetic analysis, the total number of identified SiHZ proteins (61) have been classified into four wellknown classes of HD-ZIP TF family (HD-ZIP Class I-IV) in sesame. The given classification was further validated by their conserved domains, isoelectric points, molecular weights, and gene structure analyses. The given study further extended for the subdivision of SiHDZIP Class I into four subgroups: α (SiHZ25, SiHZ43, SiHZ9 and SiHZ16), β1 (SiHZ10, SiHZ30, SiHZ32 and SiHZ26), β2 (SiHZ42 and SiHZ45) and γ (SiHZ17, SiHZ7 and SiHZ35) by a comparative phylogenetic analysis of sesame with Arabidopsis and maize. This study also reveals differential gene expression patterns of SiHD-ZIP Class I under drought stress in sesame seedlings due to its sensitivity to drought at early seedling stage. Therefore, this study provides meaningful information to extend our understanding on sesame drought responses at early seedling stage. In the light of these results, SiHDZIP Class I members from subgroup γ (SiHZ7 and SiHZ35) can be considered good candidate genes to have significant role in drought stress developmental reprogramming of early growth stages under drought stress in sesame. Thus, the given study may help to reveal the SiHD-ZIP Class I mediated regulatory networks of drought stress response at early seedling stage in sesame. Moreover, these findings may provide baseline molecular information to design new strategies to engineer sesame novel varieties with improved drought tolerance.