2.1 Identification of ZF-HD family genes in Medicago sativa L.
A total of 49 MsZF-HD genes, including alleles, in XinJiangDa Ye and 11 genes in Zhongmu No. 1 were identified (only the longest splice variant was reserved at each genomic locus for further analysis). For the 60 putative MsZF-HD proteins, their molecular weight (Mw) and isoelectric point (pI) values were also determined by the Expasy online service (Supplementary Table 1). The protein sequences encoded by the M. sativa ZF-HD genes ranged in length from 75 amino acids for MsMIF-Z1 to 400 amino acids for MsZHD-X1–13, with an average of approximately 232 amino acids. The predicted Mw of the MsZF-HD proteins ranged from 8.14 to 43.36 kDa, and the theoretical pI values ranged from 5.05 to 9.07. These results confirmed that the 60 MsZF-HD proteins had large differences in sequence and protein characteristics.
2.2 Phylogenetic analysis of M. sativa ZF-HD genes
MEGA X software was used to construct a phylogenetic tree of 60 M. sativa and 17 Arabidopsis ZF-HDs (Fig. 1). According to the phylogenetic tree, MsZF-HD genes can be divided into six subfamilies: ZHD Ⅰ, ZHD Ⅱ, ZHD Ⅲ, ZHD Ⅳ, ZHD Ⅴ and MIF. The number of ZHD Ⅳ subfamily proteins is the largest, with a total of 15 MsZHDs and AtZHD5–7; while the number of ZHD Ⅲ subfamily proteins is the least, with 5 MsZHDs and AtZHD1. In addition, 5 XinJiangDa Ye and 3 Zhongmu No.1 MIF proteins were assigned to the MIF subgroup and AtMIF1–3 were separately classified.
2.3 Conserved motifs and gene structures
To support the phylogenetic reconstruction, a motif analysis was performed by transferring the 60 MsZF-HD protein and 17AtZF-HD sequences to the online MEME Web server (Fig. 2). Ten conserved motifs were identified in the MsZF-HD proteins, and the results were largely consistent with the phylogenetic analysis. Motif 2 and motif 3 were detected in most of the MsZF-HD proteins except for AtZHD-14, MsZHD-X2–3, MsZHD-X3–2, MsZHD-X2–13, MsZHD-X2–14 and 7 MIFs (including only motif 2 or motif 3). Each protein contains an average of 5 motifs, and the MIF subfamily contains only 1 or 2. The motif distribution pattern of MsZF-HD proteins in each subfamily was basically the same, such that ZHD III subfamily members all contained motifs 2, 3, 6, 7 and 8.
To identify the characteristics of the alfalfa ZF-HD gene family, the structure of these ZF-HD genes was analyzed (Fig. 2B). Most MsZF-HD genes do not contain introns, and only a few of them contain one intron, such as MsZHD-X1–3, MsZHD-X2–1, MsZHD-X2–5, MsZHD-X2–13, MsZHD-X3–6, MsZHD-X4–2 and MsZHD-Z2. The results revealed that members of the MIF and ZHD I subfamilies do not contain any introns.
2.4 Chromosome distribution and gene duplication analysis
The results showed that 60 MsZF-HD genes were unevenly distributed on six chromosomes of XinJiangDa Ye and Zhongmu No. 1 while none were distributed on Chr2 and Chr4 (Fig. 3). The locations of MsZF-HD genes from the same subfamily were basically the same in XinJiangDa Ye and Zhongmu No. 1. For example, the chromosomes and relative positions of MsMIF-X1–1 and MsMIF-Z1 in the two varieties were fixed, indicating that the arrangement of MsZF-HD genes was highly conserved. When the query coverage and consistency of the candidate genes were ≥80, they were considered repetitive genes. Chromosomal regions within the 200 kb range of two or more genes were defined as tandem replication events [20]. Therefore, the analysis of MsZF-HD gene duplication showed that only one tandem repeat gene (MsZHD-X1–11 and MsZHD-X1–12) was found on chromosome 8.1 in XinJiangDa Ye and abundant genes were involved in fragment repeat events (Fig. 4).
To further infer the phylogenetic relationship between alfalfa and closely related dicotyledonous plants, we analyzed the collinear relationships between alfalfa and the other three plants (Fig. 5). The results showed that 28 MsZF-HD genes of XinJiangDa Ye were collinear with A. thaliana genes, followed by G. max (17) and M. truncatula (15), and 7 MsZF-HD genes of Zhongmu No. 1 were collinear with M. truncatula and G. max genes, followed by A. thaliana (5). The collinear relationship between the MsZF-HD genes of Chr5 and that of the other three plants was significantly greater compared with that of the other chromosomes.
2.5. Promoter region analysis of the MsZF-HD genes
The identification of cis-regulatory elements in the promoter region can provide references for tissue-specific or stress-response expression patterns of genes. The 2.0-kb promoter region located upstream of the transcriptional start site (ATG) of each MsZF-HD gene was analyzed to determine their potential regulatory mechanisms by using online Plant CARE software. Several light response, hormone response and stress response elements were relatively highly abundant among these cis-elements. (Fig. 6). The cis-elements of the MsZF-HD genes belonging to the same subfamily did not show the same pattern. Hormone-related cis-elements are predominantly represented by the MeJA (CGTCA-motif) and ABA (ABRE) response elements, which are endogenous growth regulators of higher plants and have physiological effects, such as inhibiting plant growth and germination, promoting senescence, and improving resistance (Fujita et al. 2013). These elements include 5 ABREs in MsZHD-X1–7 and 6 CGTCA motifs in MsZHD-X1–3, MsZHD-X1–5, and MsZHD-X2–13. Stress response elements can be divided into five categories: MBS, ARE, LTR, TC-rich repeats and WUN-motif. Among them, AREs (anaerobic response elements) were the most abundant, MsZHD-X4–5 and AtZHD9 all contained 6 AREs, and almost 80% of the upstream MsZF-HD genes contained ARE elements. In addition, Box-4, GT1 motifs, and G-boxes that respond to light were also prominent in the upstream MsZF-HD genes, especially Box-4. Ninety percent of the upstream MsZF-HD genes contained BOX-4 elements, and the number of these motifs was also the largest compared with the other motifs.
2.6 Expression of MsZF-HD genes in response to abiotic treatment
To investigate the molecular functions of the MsZF-HD genes in response to abiotic stress, the transcriptome data of Zhongmu No. 1 under different abiotic treatments (250 mM NaCl, 10 M ABA and 400 mM mannitol) were obtained from the NCBI and assembled into the genome of Zhongmu No.1 (Additional file 1: Table S2) .The expression levels of these genes are presented in a heatmap (Fig. 7A). Of the 11 MsZF-HD genes in Zhongmu No. 1, the transcript levels (FPKM values) of three genes indicated that they were not expressed in any tissue while the remaining 8 genes were expressed at least once. Among them, most MsZF-HD genes were differentially expressed in all the examined treatments. Different genes showed different expression patterns; for example, MsZHD-Z5 showed a tendency of upregulation under all three treatments, while most genes (such as MsZHD-Z1, MsZHD-Z4, and MsMIF-Z6) showed a low expression level. The expression of MsMIF-Z3 was upregulated under the ABA treatment, while the expression patterns of the mannitol and NaCl treatments were the opposite.
We also performed RNA-seq to detect the expression levels of the MsZF-HD gene under alkaline stress (Fig. 7B). After mapping those reads to the alfalfa genome, we determined the expression of 43 MsZF-HD genes that were expressed in at least one time period in the RNA-Seq data. All MsZF-HD genes decreased with time and could be clustered into three groups based on the expression profiles. Group I included genes with high expression levels, and members of subfamily I occupied the main position, followed by ZHD IV. Almost all genes in subfamily V were not expressed, and the expression level of only two expressed genes (MsZHD-X1–11 and MsZHD-X2–13) belonging to group II was extremely low. ZHD II and III were almost all concentrated in group III, with low expression levels. To further confirm the RNA-seq result of these MsZF-HD genes, qRT–PCR was performed for the MsZF-HD genes from different clusters (Fig. 7C). The expression patterns of most MsZF-HD genes in the qRT–PCR analysis were consistent with the RNA-Seq analysis, implying that our RNA-seq results are highly reliable. The expression level of most ZF-HD genes was very low, and the ZF-HD genes were downregulated after the alkaline treatment from 3 h to 48 h.