Genome-Wide Identification of MYB Gene Family Reveals Their Potential Functions in Cadmium Stress Response and the Regulation of Cannabinoid Biosynthesis in Hemp ( Cannabis sativa L.)

Backgroud: Heavy metal pollutant, Cadmium (Cd), is an inorganic pollutant in China. Hemp (Cannabis sativa L.) is an important fiber crop to remediate heavy metal-contaminated soils. The MYB family is one of the largest and most important gene families that influences plant growth and produce secondary metabolites. Results: 115 CsMYB genes were identified in the hemp genome. The 1R-MYB subfamily had 17 genes, the 2R-MYB subfamily 88 genes, and the 3R-MYB subfamily 8 genes. The synteny analysis of the CsMYB genes indicated that there were 3 pairs of tandem repeats, 21 pairs of segmental duplication, 6 CsMYB genes present before species differentiation, and 60 genes belonging to the hemp-specific CsMYB genes. 7 CsMYB genes were identified to influence the response to Cd stress by combining the transcriptome data of two hemp species under Cd stress. Based on the changes in the Cannabinoid content of hemp under Cd stress, the expression of different CsMYB genes in hemp with high and low cannabidiol (CBD) contents, and tissue-specific expression, it was inferred that CsMYB024 may be affected by Cd stress and mediate the CBD synthesis pathway. Conclusions: This study provides a comprehensive analysis of the hemp MYB family. These results should be helpful in understanding of their potential functions in Cd-Stress Response and the Regulation of Cannabinoid Biosynthesis in Hemp. In addition, lays the foundation for the further study of biological functions of CsMYBs in hemp. conducted BIO-RAD


Introduction
Plants are subjected to various abiotic stresses, which seriously affect plant growth and development [1]. Heavy metal pollution is a common abiotic stressor, with Cd ranking first among inorganic pollutants in China and being the most serious pollutant of arable land [2]. When plants are exposed to heavy metals, many genes change their expression and produce secondary metabolites that defend against the external stresses. Identifying key genes that respond to abiotic stresses can lay the foundation for breeding highly resistant plants [3−5]. Transcription factors are a class of proteins that bind to specific sequences upstream of gene sequences and regulate biological growth by altering gene expression. The identification of key transcription factors and their functions is an important method used for studying plant growth and development and for coping with external abiotic stresses [6]. MYB [7,8].
Many MYB gene families have been identified in various plants and analyzed genome-wide.
As of December 2020, 69,027 MYB genes were officially registered in the NCBI database (https://www.ncbi.nlm.nih.gov/). Physiological, biochemical, and molecular tests indicated that many MYB transcription factors are associated with plant growth, stress resistance, and the synthesis of secondary metabolites [9]. For example, AtMYB2 [10] and AtMYB68 [11] improved drought tolerance and boric acid tolerance in Arabidopsis; OsARM1 [12] improved resistance to arsenic stress in rice; AtMYB58 [13], AtMYB123 [14], and AtMYB11 [15] are associated with the Arabidopsis phenylpropanoid metabolic pathway, proanthocyanidin synthesis, and flavonol biosynthesis, respectively. While the MYB gene family has been intensively studied and explored in various plants, no systematic studies have yet been reported for many hemp crops, particularly industrial hemp.
Hemp (Cannabis sativa L.) is a variety of the genus Cannabis in the family Cannabaceae and is widely used in textiles, medicines, skin care, and health care [16]. A tetrahydrocannabinol (THC) content of less than 0.3 % in hemp indicates an industrial hemp and is considered to be of no use as a drug hemp variety [17]. Internationally, industrial hemp has been grown in soil contaminated with Cd to remediate the soil, producing economic benefits while also preventing the heavy metal from entering the human food chain [18]. The MYB transcription factor is the most important transcription factor for plant development and stress resistance. Systematic identification of the whole hemp genome and related research into Cd stress is necessary. In recent years, studies have found that exogenous heavy metal stress can increase CBD, a representative cannabinoid, content in hemp [19]. One of the objectives of this study was to determine whether Cd stress leads to changes in the content of cannabinoids by affecting the expression of MYB genes.
The aim of this study was to conduct a genome-wide analysis of hemp MYB genes using bioinformatics and to analyze their genetic structural differences and evolutionary relationships, and conjoint analysis with Cd-stress and cannabinoid synthesis. In this study, 115 MYB genes were identified from the hemp genome by analyzed their evolutionary relationships and gene structure. Their evolutionary relationships with dicotyledonous soybean (Glycine max), Raymond's cotton (Gossypium raimondii) and monocotyledonous maize (Zea mays L.), and rice (Oryza sativa L.) were compared. simultaneously, we analyzed transcriptome data of two hemp varieties with Cd-resistance, we identified 7 CsMYB genes that may potentially play a vital role in Cd-stress resistance. More importantly, by determining the cannabinoid changes of hemp flowers and leaves after Cd treatmet, expression patterns of CsMYB genes of hemp varieties with different cannabidiol contents, measured the expression levels of CsMYB genes in , roots, stems, leaves, flowers. we identified a CsMYB gene that may regulate cannabinoid biosynthesis. This study can provide a meaningful reference for potato breeding improvement.

Identification of hemp MYB genes
The hemp genome [20] and genome annotation (assembly number: GCA_900626175.2) were downloaded from the NCBI database (https://www.ncbi.nlm.nih.gov/). The MYB gene sequences of rice and soybean were downloaded from the PlantTFDB [21] database (http://planttfdb.gao-lab.org/) and compared to the hemp genome using the software TBtools [22], for blast sequences (E-value < 1E -5 ), to screen the candidate genes of MYB family in hemp. The candidate genes were submitted to the UniProt database (https://www.uniprot.org/) for batch comparison to verify whether the candidate genes contained MYB-conserved structural domains and to remove false-positive genes. Physicochemical properties were analyzed using the online software ExPASy Proteomics (http://web.expasy.org/compute_pi).

Construction of phylogenetic tree and gene structure analysis of hemp MYB gene
Hemp MYB protein sequences were imported into the software MEGA7.0 [23], and the results of the comparison were used to construct a phylogenetic tree using the neighbor-joining (NJ) method (execution parameter: bootstrap method 1000) [24,25]. The software FigTree [26] was used to refine the embellished evolutionary tree. Protein structure prediction was performed using the online software NCBI-CDD (https://www.ncbi.nlm.nih.gov/cdd/) to detect the protein structural domain of hemp MYB (E-value < 0.01). The conserved motifs of the hemp MYB gene were analyzed using the online software MEME (http://meme-suite.org) with the prediction number set to 10. The software TBtools was used to extract the relevant coding sequences (CDS) and untranslated regions (UTRs) of hemp MYB from the hemp genome annotation files. TBtools was then used to combine the evolutionary tree, protein structural domain, gene conserved motifs, CDS, and UTR to construct an overall comparison map of MYB evolutionary relationships and structures.

Chromosomal distribution and synteny analysis of hemp MYB gene
TBtools software was used to extract MYB gene location information from the hemp genome files and gene annotation files and to construct a physical map of the MYB gene on the chromosomes. TBtools, MCScanX [27] and Circos [28] were used to calculate and map the presence of tandem repeats of hemp MYB on chromosomes, presence of covariance genes between different chromosomes of hemp, and relationship between hemp and soybean, Raymond's cotton, maize, and rice.

Analysis of MYB gene expression in Yunnan hemp no. 1 and Inner Mongolia small grain hemp under Cd stress
The transcriptomic data of Yunnan hemp no. 1 (Cd-tolerant) and Inner Mongolia small grain hemp (Cd-sensitive) were selected under normal environment and Cd-stress [18]. The raw sequencing data were converted into Sequenced Reads or Raw Reads using CASAVA base calling analysis and then filtered again to obtain high-quality, pure reads. The quality-controlled data were compared to the Hemp (version GCF_900626175.2) reference genome [20] using both TopHat2 (http://ccb.jhu.edu/software/tophat/index.shtml) and HISAT2 (http://ccb.jhu.edu/software/hisat2/index.shtml) software, and mapped reads were spliced using String Tie [29] or Cufflinks [30]. The raw transcriptome data were matched to the hemp MYB genes obtained from genome-wide analysis, and the number of reads was counted and further converted to FPKM values [31] for comparative analysis. The expression of homologous genes, which displayed consistent change trends under Cd stress, was compared between the two hemp varieties, and the gene-to-gene correlation coefficients were calculated using the Spearman correlation algorithm [32,33] to infer the key genes responsible for Cd tolerance, based on the degree of association between the genes.

Changes in the CBD and CBDA contents of hemp flowers and leaves after Cd treatment
Adult hemp plants (Yunnan hemp no. 1 grow with greenhouse), grown to the flowering stage, were transferred to soil containing Cd at a concentration of 100 mg/kg, and the CBD and the CBD precursor (CBDA) Percentage contents in the flowers and leaves were measured after 15 days of treatment [34].

MYB gene expression analysis of hemp varieties with different cannabidiol contents
Two hemp varieties with different CBD contents were selected: H8 (higher CBD content) and FM (lower CBD content), and the expression of 14 MYB genes was examined in randomized samples from both the hemp flowers and leaves. The primers were designed using Primer 3 plus (Table S5), with DHS2 [35] as the internal reference gene, and the upstream and downstream primer sequences were sense: 5-TTGTCAACCAACCTCGTGTC-3; antisense:

5-GAGGGCTCTCTCGTAGCACATC-3. The tests were conducted using a BIO-RAD CFX
Connect fluorescence qPCR instrument. The period taken was the flowering period of the two hemp species, and RNA was extracted using the Aidlab kit. cDNA was synthesized using the Takara reverse transcription kit, and real-time fluorescence quantitative PCR was verified using the Takara kit. The reaction conditions were as follows: pre-denaturation at 95 ℃ for 2 min, denaturation at 95 ℃ for 15 s, annealing at 60 ℃ for 30 s, and 40 cycles. The relative expression of the target genes was calculated using the 2 -ΔΔ Ct method.

Tissue-specific expression analysis of key MYB genes
The key MYB genes selected in 4.4 and 4.6 were subjected to tissue-specific expression analysis in Yunnan hemp no. 1 grow with greenhouse. The methods previously described in 4.6 were used to determine their expression in tissues collected from the roots, stems, leaves, and flowers (Table S5).

Identification of hemp MYB genes
Using gene sequence comparison and conserved structural domain analysis, 115 MYB genes were identified from hemp and named CsMYB001−CsMYB115 based on the information obtained from the CDS fragments (Table s1) (Table S1).

Phylogenetic tree construction and structure analysis of the MYB gene family
To understand the evolutionary relationships among hemp MYB genes, an evolutionary tree was constructed with the 115 hemp MYB genes, which were divided into three subfamilies; 1R-MYB/MYB-related, 2R-MYB/R2R3-MYB, and 3R-MYB ( Figure 1).
The 2R-MYB subfamily had the most members, consisting of 88 genes, the 3R-MYB subfamily had the least members with 8 genes, and the 1R-MYB subfamily contained 17 genes.
CsMYB097 and CsMYB098 were identified but were not placed into any of the three subfamilies because of the dispersion of the conserved structural domains of the genes; however, CsMYB097 was similar to the combination of two 2R-MYB genes, and CsMYB098 resembled the R2R3-MYB structural domain divided into two segments. Interestingly, CsMYB024, CsMYB037, CsMYB064, and CsMYB112 were 3R-MYB genes that appeared in the 2R-MYB subfamily in the evolutionary tree, and CsMYB023 and CsMYB094 were 1R-MYB genes that also appeared in the 2R-MYB subfamily in the evolutionary tree, presumably evolving from the more closely related 2R-MYB subfamily. Figure (Table S2). Synteny analysis of the five plants revealed 60 MYB genes that were unique to hemp and may have been derived from tandem repeat amplification of the original MYB genes of hemp, which may be relevant for understanding the functional differences between the MYB genes of hemp and other plants [36]. Conversely, CsMYB005, CsMYB013, CsMYB016, CsMYB041, CsMYB053, and CsMYB081 were identified in all five plants, suggesting that these genes may be MYB genes that existed before the divergence of dicotyledons from monocotyledons [37]. In contrast, the CsMYB113, which has a covariance between hemp chromosomes, was present in all three species except in maize, where no covariance was found. None of the remaining paralogous hemp genes were present in the analysis of covariance with the other four species, strongly indicating that CsMYB113 may have arisen by amplification.

Expression analysis of MYB genes in two hemp varieties under Cd stress
In  (Table S3, Table S4). The expression amounts reported for all of the genes in Table S3 and  These hemp MYB genes that displayed greater variation in expression may be the key MYB genes that code for greater Cd tolerance. The key differential genes identified were subjected to Spearman correlation analysis, and based on the correlation between genes, it was hypothesized that CsMYB045, CsMYB016, CsMYB067, and CsMYB098 may be the key genes in the upregulated MYB network (Figure 7c), whereas CsMYB010, CsMYB061, and CsMYB005 may be the key genes in the downregulated MYB network (Figure 8c).

Changes in the contents of CBD and CBDA after Cd treatment in hemp
CBD is a unique secondary metabolite in hemp that can be used to treat a variety of diseases.
It is currently an important plant secondary metabolite of international concern. Three asexual hemp plants grown to the flowering stage were transferred into soil containing Cd at a concentration of 100 mg/kg, and three asexual hemp plants were grown in a normal environment.
Fifteen days later, the CBD and cannabidolic acid (CDBA) contents in the flowers and leaves of the six asexual hemp plants were measured. As seen in Figure 9, the CBD and CDBA contents in flowers increased in the Cd-stressed environment, whereas the CBD and CDBA contents in leaves decreased in the Cd-stressed environment. The change in CBD content of the leaves was significant (p≤0.05), which indicates that external Cd stress may affect the CBD content of hemp, even though the changes in contents of the other cannabinoids were not statistically significant.
This may have been because the tolerance to Cd in adult hemp plants was high and the external Cd stress treatment time was too short.

Analysis of MYB gene expression in hemp varieties with high and low cannabidiol contents
The application of external Cd-stress to hemp plants was found to alter the plants' CBD and CBDA contents and the expression of a large number of hemp MYB genes; thus, MYB genes might be associated with the production of plant secondary metabolites. To investigate MYB genes that might be associated with CBD production, we randomly selected 14 MYB genes:  [42], Oryza sativa L. (155 members) [41], soybean (244 members) [43], Physcomitrella patens (116 members) [44], and Brassica napus L. (249 members) [45,46].
In this study, 115 members of the MYB gene family were identified in hemp with an 876.14 Mb genome size. Considering the different genome sizes and numbers of MYB genes in these plant species, we hypothesized that the number of MYB genes is not entirely associated with their genome.
Gene duplication events, which result from unequal crossing-over, duplication-dependent chromosome breakage, and nonhomologous exchange, are the main factors contributing to the expansion of gene families [47]. In a previous study, a large number of tandem-duplicated gene pairs and segmental duplications were identified in some plants, including potato, pear, apple, grape, physic nut, and Gossypium raimondii [48,49]. Similarly, 3 pairs of tandem-duplicated genes and 21 segmental-duplicated genes were identified in the hemp genome ( Figure 4, Figure 5).
Taken together, we speculate that segmental duplication was the main factor during the evolutionary process that resulted in the expansion of the MYB family in hemp.

Potential functions analysis of CsMYB in Cd-Stress response and regulate cannabinoid biosynthesis
To further explore the evolutionary pattern of MYB genes in hemp, synteny analyses were performed between the MYB genes of hemp and four representative plants. CsMYB genes were found to have more homologous gene pairs with MYB genes in Gossypium raimondii and Glycine max than those in Oryza sativa L. and Zea mays L. (Figure 6). One possible reason for these results is that hemp, Gossypium raimondii, and Glycine max are dicot plants, whereas Oryza sativa L. and Zea mays L. are monocot plants, and the results are similar to the synteny analyses results for potato, poplar, and wheat [50,51]. Among these homologous genes, CsMYB005, CsMYB013, CsMYB016, CsMYB041, CsMYB053, and CsMYB081 were identified in all five plants, which suggests that these genes existed before the divergence of dicot and monocot plants, and some MYB genes found in different plants may have originated from a common ancestor.
Hemp can be used for the remediation of heavy metal-contaminated soils [52], and MYB genes have been widely reported to be involved in the regulation of plant defense responses to heavy metal stress [53]. In our study, the expression of most CsMYB genes was altered in both Cd-sensitive and Cd-tolerant hemp varieties in response to Cd-stress. Although the trends in expression patterns were similar in Cd-tolerant and Cd-sensitive hemp varieties, the induction of transcript levels was more pronounced in Cd-tolerant varieties than in Cd-sensitive varieties.
Based on correlation coefficient analysis, seven CsMYB genes were further hypothesized to be the key regulators in response to Cd stress ( Figure 7, Figure 8). Four CsMYB genes were upregulated under Cd stress, with CsMYB016 having a direct homolog in five species, CsMYB067 having a direct homolog in three species, CsMYB098 having direct homologs in two species, and CsMYB045 having no direct homologs. These results indicated that the four genes were in turn less primitive. Both CsMYB016 and CsMYB067 were involved in one amplification of paralogous homologs within hemp species, suggesting they may have evolved along with new functions in hemp, while CsMYB067 displayed similarity to genes in Arabidopsis that may be related to cytokinin [36] or abscisic acid [37] and are significantly upregulated under salt stress [38], thus providing additional evidence that these four genes may be the key genes in regulating Cd tolerance. Three other CsMYB genes were downregulated under Cd stress, with CsMYB005 having direct homologs in five species, CsMYB061 having direct homologs in three species, and CsMYB010 having no direct homologs, indicating that the original degree of the three genes was reduced in order. Similar genes in Arabidopsis are associated with cell growth and cell wall development [54,55]; however, the functions of these genes in hemp need further validation.
Cannabinol is a secondary metabolite and one of the main phenolic compounds in hemp. In recent years, CBD sales has captured an increased segment of the market, and scientific research into the use of CBD continues to expand. External stress can affect the content of plant secondary metabolites: when the industrial hemp is exposed to Cd stress, there are significant changes in cannabinol content and the expression of MYB genes, which are the key transcription factors for the synthesis of secondary metabolites. Although the pathway of cannabinol synthesis has been clarified, the related genes (such as MYB genes) that regulate its content within the plant are not yet clear. We found that the expression of CsMYB024, CsMYB028, and CsMYB062, selected from 2.7, was higher in both the leaves and flowers of hemp with high CBD content than in those with low CBD content. In the examination of tissue-specific expression, CsMYB024 was found to be relatively highly expressed in the flowers and leaves, both of which are organs with high CBD accumulation, indicating that this gene may be associated with CBD synthesis. While the expression of CsMYB024 showed a decreasing trend in the Cd-stressed hemp of both varieties, the CBD and CBDA contents in the hemp leaves were significantly downregulated under Cd stress. It is speculated that CsMYB024 may be involved in the regulation of a Cd-mediated decrease in CBDA content, but its executive function is not yet clear and needs to be verified in future experiments.
A total of 115 CsMYB genes were identified from the hemp genome and were further studied using bioinformatics and tissue-specific expression analysis. Furthermore, the potential function of CsMYB genes in response to Cd stress and the regulation of CBDA biosynthesis were explored.
Several potential candidate genes for participating in Cd stress responses or regulating the synthesis of CBDA have been identified and characterized. Our research provides insight into the functional analysis of CsMYB genes in hemp and indicates areas in which further analysis is needed. The next step of this study can increase the concentration gradient and time gradient of hemp Cd-stress, and it is also necessary to carry out transgenic verification of certain genes. Table S1. List of the 115 CsMYB genes identified in this study. Table S2. One-to-one orthologous relationships between hemp and Raymond's cotton.