Article Functional MYB transcription factor gene HtMYB2 is associated with anthocyanin biosynthesis in Helianthus tuberosus L.

Background: Tuber color is an important trait for Helianthus tuberosus L. (Jerusalem artichoke). Purple tubers with high anthocyanin content are usually more nutritious than white tubers, but the molecular mechanism underlying tuber color is unknown.
Results: In the current study, high-throughput RNA sequencing was used to compare the transcriptomes of plants with red or white tuber epidermis. Compared with the white-skinned tubers of cultivar QY3, anthocyanin biosynthesis structural genes had greater expression in the red-skinned tubers of cultivar QY1, indicating that the anthocyanin biosynthesis pathway was activated in 'QY1'; quantitative PCR confirmed this difference in expression. HtMYB2 (Unigene44371_All) was the only MYB transcription factor homologous to the MYB transcription factors regulating anthocyanin biosynthesis that was expressed in the red tuber epidermis of 'QY1'. The anthocyanin concentration in the root, stem, leaf, flower, and tuber epidermis of 'QY1' was higher than in 'QY3', especially in the tuber epidermis. Correspondingly, HtMYB2 had greater expression in these tissues of 'QY1' than in 'QY3'. The expression of HtMYB2 was associated with anthocyanin accumulation in the different tissues. Overexpression of HtMYB2 activated the anthocyanin biosynthesis pathway and caused the pigment to accumulate in leaves of transgenic tobacco, supporting the model that HtMYB2 regulates anthocyanin biosynthesis. Further experiments found that HtMYB2 had nearly identical coding and genomic sequences in 'QY1' and 'QY3', but that the promoter regions of the two alleles differed by several single nucleotide polymorphisms and one insertion–deletion (indel) mutation of 21 nucleotides. As a result of the deletion of the three nucleotides "AAA", the promoter of 'QY1' was predicted to contain one additional possible promoter region. A specific primer, based on the indel, could differentiate between cultivars with red or white tuber epidermis. The genetic variation in HtMYB2 was associated with tuber skin color in a natural population.
Conclusions: RNA-seq successfully identified the candidate gene (HtMYB2) controlling anthocyanin biosynthesis in the purple epidermis of Jerusalem artichoke tubers. HtMYB2 can regulate anthocyanin biosynthesis in plants and is closely related to the formation of the purple phenotype in tubers. This study should be useful in understanding the genetic mechanism underlying different tuber skin colors and in breeding new H. tuberosus cultivars with different tuber skin colors.

Background
Helianthus tuberosus L., Jerusalem artichoke or topinambour, belongs to the Asteraceae family and is native to North America [1]. The tubers of H. tuberosus are rich in fructans, making them a good source of inulin [2], bioethanol [3], and animal feed [4]. The tuber skin of H. tuberosus is usually white, although some cultivars produce tubers with pink, purple or red epidermis. Tuber color is an important parameter for differentiating between cultivars of H. tuberosus; the color difference is due mainly to qualitative and quantitative differences in anthocyanins [5,6].
Generally, expression of the structural genes of anthocyanin biosynthesis is regulated by transcription factors, namely WD40, bHLH and R2R3-MYB proteins. These transcription factors regulate the expression of the structural genes by forming ternary complexes that bind to the promoters of the structural genes [11]. Allelic variation in the transcription factor genes has been associated with phenotypic variation related to anthocyanin biosynthesis. The transcription factor encoded by the R3MYB gene of dahlia, another member of the Asteraceae, has a domain typical of a MYB gene; it is expressed in colorful dahlia cultivars and can activate the anthocyanin synthesis pathway [12]. The CtMYB13 transcription factor from safflower (an Asteraceae member) is an important regulator of the structural genes of the safflower flavonoid biosynthesis pathway [13]. The genetic mechanism of anthocyanin pigment formation has been studied thoroughly in a number of plants, but little is known of the mechanisms involved in H. tuberosus.
High-throughput sequencing (RNA-Seq) technology has become a low-cost and highly efficient tool, which can be used to quickly obtain transcripts of various plant types [14,15]. Due to the large amount of information available on the anthocyanin biosynthesis pathway in plants, the genes related to anthocyanin biosynthesis can be quickly identified through transcriptome analysis in plants, even without the availability of the corresponding genome sequence. Through transcriptome sequencing, the gene encoding the MYB transcription factor LrAN2 was isolated from Lycium barbarum, and those encoding bHLH transcription factors TaMYC1 and ThMYC4E were isolated from wheat without genome sequence information, and further experiments confirmed that they were the key genes responsible for black fruit, purple grain and blue grain traits in the corresponding species, respectively [16][17][18].
For H. tuberosus, there have been no reports on the identification of the key genes responsible for traits associated with anthocyanin biosynthesis, and only a few of the genes related to anthocyanin biosynthesis have been isolated based on homolog cloning. In the current study, RNA-Seq was employed to compare the transcript differences between cultivars with white or red tuber epidermis, and the candidate key genes were isolated to perform functional verification and to understand the relationship between allelic and phenotypic variation.

Transcriptome analyses of two H. tuberosus cultivars
Based on the HiSeq 2000 platform, RNAs from the tuber epidermis of 'QY1' and 'QY3' were sequenced (Fig. 1A). A total of 50 Gb of clean data was obtained from three samples from each of the two cultivars after filtering (Table S1). Using Trinity software, 197,769 unigenes were assembled. A total of 55,354 unigenes were differentially expressed, of which 28,113 unigenes were up-regulated and 27,241 unigenes were down-regulated (Fig. 1B). The unigenes identified as being homologous to the genes involved in anthocyanin synthesis were selected, and their FPKM values for each cultivar were aggregated. None of the anthocyanin biosynthesis structural genes had lower expression levels in 'QY1' than in 'QY3' (Fig. 1C). The qPCR results also confirmed these findings, though the numerical values differed somewhat for some genes (Fig. 1D). Therefore, the activation of the anthocyanin biosynthesis structural genes appeared to be the cause of the red tuber trait in 'QY1' but not 'QY3'. As with the up-regulation of expression of the structural genes in 'QY1', the genes encoding transcription factors MYB and bHLH exhibited greater expression levels in 'QY1' than in 'QY3' (Table S2). Considering that the structural genes were regulated by the transcription factors, and that the MYB transcription factor could induce expression of the bHLH transcription factor [32], HtMYB2 (Unigene44371_All) is likely to be the key gene responsible for the red tuber skin color trait in H. tuberosus.

Molecular characteristics of HtMYB2
Based on transcriptome information, the genomic and coding sequences (CDSs) of HtMYB2 were isolated from 'QY1' and 'QY3'. The genomic sequences of HtMYB2 from 'QY1' and 'QY3' were 1066 bp and 1068 bp long, respectively, while the coding sequences were the same length. HtMYB2 contained two introns and three exons (Fig. 2A). Although two nucleotide differences existed in the third exon of the CDSs of 'QY1' and 'QY3', only one amino acid difference was found in the translated sequence (Fig. 2C). The phylogenetic tree of the MYB transcription factors showed that HtMYB2 was similar to the MYB transcription factors controlling the traits associated with anthocyanin biosynthesis in several species, including members of the Asteraceae, the Solanaceae, and the Brassicaceae (Fig. 2B). Like the most similar MYB transcription factors, CmMYB6 (from Chrysanthemum morifolium, Asteraceae), GbMYB1 and GbMYB2a (from Gynura bicolor, Asteraceae), GhMYB10 (from Gossypium hirsutum, Malvaceae), and HaMYB90 (from Helianthus annuus, Asteraceae), HtMYB2 contained the intact MYB-like binding domain (Fig. 2C), which is important for the function of MYB transcription factors in regulating anthocyanin biosynthesis. This implied that HtMYB2 is likely to function in regulating anthocyanin biosynthesis.
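
The observation that two nucleotide differences can produce a single amino acid difference may be illustrated with a short Python sketch. The two 9-nt sequences are toy examples, not the HtMYB2 sequences, and the function names are hypothetical; only the codon table (the standard genetic code) is taken as given.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds):
    """Translate an in-frame coding sequence; unknown codons become 'X'."""
    cds = cds.upper()
    codons = (cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def differences(a, b):
    """Return the 1-based positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Toy sequences (hypothetical): one substitution changes R to K, the other is silent.
cds_a = "ATGAGAGCT"
cds_b = "ATGAAAGCC"
nt_diffs = differences(cds_a, cds_b)
aa_diffs = differences(translate(cds_a), translate(cds_b))
print(nt_diffs, aa_diffs)
```

In this toy case the substitution in the second codon is non-synonymous (AGA to AAA) and the one in the third codon is synonymous (GCT to GCC), which is the same pattern as two nucleotide differences giving one amino acid difference.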

Overexpression of HtMYB2 induces anthocyanin biosynthesis in tobacco
The pJAM1502:HtMYB2 plasmid was transferred into Agrobacterium tumefaciens strain LBA4404 by the freeze-thaw method. The Agrobacterium-mediated leaf disk transformation method was used to obtain transgenic tobacco. For further experiments, T3 lines that carried the target gene and no longer segregated were used. The positive transgenic lines exhibited deep purple leaves (Fig. 3A), and the relative anthocyanin concentration of the transgenic lines was much higher than that of the wild type (Fig. 3B). The qPCR experiment showed that the expression levels of the anthocyanin synthesis-related structural genes and of HtMYB2 were up-regulated in the transgenic lines (Fig. 3C). These results showed that HtMYB2 can activate anthocyanin biosynthesis by acting as a MYB transcription factor in tobacco.

The relation between the transcript abundance of HtMYB2 and anthocyanin concentration in different tissues
Visually, the root and tuber epidermis of 'QY1' were clearly redder than those of 'QY3', whereas there was little phenotypic difference in stem, leaf and flower between the two cultivars (Fig. 4A). Correspondingly, the anthocyanin concentration of the tuber peel and root of 'QY1' was significantly higher than that of 'QY3', while there was no significant difference in the anthocyanin concentration of stem, leaf or flower between the two cultivars (Fig. 4B). The expression of HtMYB2 was consistent with the anthocyanin concentrations. The tissue with the highest HtMYB2 expression was the tuber epidermis of 'QY1', followed by the root of 'QY1' (Fig. 4B), whereas the other tissues of 'QY1' and all the tissues of 'QY3' showed little expression of HtMYB2. Each treatment was replicated three times.
Allelic variation of HtMYB2 in natural populations of Helianthus tuberosus L.
HtMYB2 exhibited clear differences in expression level in the tuber epidermis between 'QY1' and 'QY3'. To explain this difference, the promoter of HtMYB2 was isolated from each cultivar using TAIL-PCR. According to the promoter prediction software BDGP, the promoter from 'QY1' had three possible promoter regions, while that from 'QY3' contained only two (Table S3). The deletion of three nucleotides "AAA" in 'QY1' caused this difference between the promoters of the two cultivars.
Compared with the promoter of 'QY3', 21 bp were deleted in the region -1360 to -1342 of the promoter of 'QY1' (Fig. 5A). Based on the indel difference between the two promoters, the diagnostic primer pair HtproS was designed to differentiate the HtMYB2 alleles of 'QY1' and 'QY3'. The length of the amplification fragment from 'QY1' was 103 bp, whereas that of the 'QY3' amplification fragment was 124 bp (Fig. 5A). This primer pair can effectively distinguish HtMYB2-QY1 from HtMYB2-QY3 (Fig. S1). Among 180 selected plants, the 90 individuals with red-skinned tubers carried the genotype HtMYB2-QY1, while the 90 individuals with white-skinned tubers carried the genotype HtMYB2-QY3 (Fig. 5B; Table S4). The results showed that allelic variation in HtMYB2 was consistent with tuber skin color in H. tuberosus.
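
Scoring the HtproS marker amounts to assigning each plant an allele from its amplicon size and comparing that call with tuber skin color. The Python sketch below illustrates this under stated assumptions: the size tolerance, the function names, the rule that each plant yields a single band, and the example band sizes are all hypothetical; only the 103 bp and 124 bp fragment lengths and the 90 + 90 population size come from the text.

```python
# Expected amplicon sizes reported for the HtproS marker (bp).
QY1_SIZE = 103
QY3_SIZE = 124


def call_allele(size_bp, tolerance=3):
    """Assign an allele from an estimated band size; the tolerance is an assumed value."""
    if abs(size_bp - QY1_SIZE) <= tolerance:
        return "HtMYB2-QY1"
    if abs(size_bp - QY3_SIZE) <= tolerance:
        return "HtMYB2-QY3"
    return "uncalled"


def concordance(records):
    """Fraction of plants whose called allele matches the allele expected from skin color."""
    expected = {"red": "HtMYB2-QY1", "white": "HtMYB2-QY3"}
    matches = sum(call_allele(size) == expected[color] for color, size in records)
    return matches / len(records)


# Hypothetical band sizes for 90 red-skinned and 90 white-skinned plants.
population = [("red", 103)] * 90 + [("white", 124)] * 90
print(concordance(population))
```

A concordance of 1.0 corresponds to the situation described above, in which every red-skinned plant carries HtMYB2-QY1 and every white-skinned plant carries HtMYB2-QY3.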

Discussion
In this study, we isolated a MYB transcription factor, HtMYB2, from H. tuberosus and explored its function in relation to anthocyanin biosynthesis and the red tuber skin color trait.
HtMYB2 is a functional MYB transcription factor gene regulating anthocyanin biosynthesis.
HtMYB2 has the characteristics of a functional MYB transcription factor. It has two introns and three exons. The protein encoded by HtMYB2 contained an intact MYB-like DNA-binding domain and a SANT domain, which play an important role in the regulation of anthocyanin biosynthesis. In the phylogenetic tree, HtMYB2 was closest to the MYB transcription factors GbMYB2 and CmMYB6 [33]. GbMYB2 encodes an R2R3-MYB transcription factor and regulates anthocyanin biosynthesis in leaves of G. bicolor, another member of the Asteraceae [34]. CmMYB6 from C. morifolium, also a member of the Asteraceae, could induce an approximately 34-fold increase in transcription of CmDFR, with the help of MrbHLH [35]. Most importantly, the overexpression of exogenous HtMYB2 in tobacco activated the expression of the endogenous structural genes related to anthocyanin biosynthesis and increased the anthocyanin concentration in the tobacco leaves. The anthocyanin biosynthesis structural genes that were most strongly up-regulated differed between 'QY1' and transgenic tobacco, probably because of genetic variation in the promoters of the structural genes in different species. All of these results implied that HtMYB2 was a functional MYB transcription factor regulating anthocyanin biosynthesis.
The HtMYB2 function was associated with the tuber epidermis color trait.
In the transcriptome analysis, expression of the structural genes of anthocyanin biosynthesis was activated in the tuber epidermis of 'QY1', a finding which was also confirmed by qPCR. Expression of the anthocyanin structural genes is known to be regulated by MYB and bHLH transcription factors, with the MYB transcription factors inducing the expression of the bHLH transcription factor [32]. HtMYB2 was the only MYB transcription factor regulating anthocyanin biosynthesis that was expressed at a high level in the H. tuberosus 'QY1' tuber epidermis, where anthocyanins accumulated, indicating that HtMYB2 was involved in anthocyanin biosynthesis in the tuber epidermis of 'QY1'. Moreover, the transcript abundance of HtMYB2 was consistent with the anthocyanin concentrations in different tissues. Anthocyanins were detected in only the root and tuber organs of 'QY1', which also contained higher transcript abundance of HtMYB2 than the other organs. The promoters of the two alleles differ: the promoter of HtMYB2-QY3 carries a 21-bp insertion relative to that of HtMYB2-QY1, and promoter prediction indicated that the promoter of HtMYB2-QY1 has one additional functional region, with a score of 0.83, from -1300 bp to -1250 bp. These differences likely explain the failure of HtMYB2 to activate the anthocyanin biosynthesis pathway in white varieties. Two alleles, HtMYB2-QY1 and HtMYB2-QY3, were present in the H. tuberosus cultivars 'QY1' and 'QY3', respectively. The allelic variation was associated with the tuber epidermis color in natural populations of H. tuberosus segregating for the tuber skin color trait; HtMYB2-QY1 was linked to the red tuber epidermis trait, whereas HtMYB2-QY3 was associated with the white tuber epidermis trait. All in all, HtMYB2 appears to be the key gene responsible for the red tuber epidermis trait in H. tuberosus.

Conclusion
In the present study, HtMYB2 was isolated from H. tuberosus by RNA-seq. It had the same intron and exon number and the same functional domain as other MYB transcription factors which had been shown to regulate anthocyanin biosynthesis in other plants. HtMYB2 was close to such functional MYB transcription factors in a phylogenetic tree. Overexpression of HtMYB2 induced anthocyanin biosynthesis in tobacco. Though HtMYB2 had similar coding sequences in cultivar QY1 with red-skinned tubers and cultivar QY3 with white-skinned tubers, the transcript abundance of HtMYB2 was significantly higher in the tuber epidermis of 'QY1' than in 'QY3'. HtMYB2 transcripts were detected in only the root and tuber epidermis of 'QY1'. Promoter differences were associated with differences in transcript abundance of HtMYB2 between 'QY1' and 'QY3'. Allelic variation in the HtMYB2 gene was closely associated with tuber color in a natural population. All results implied that HtMYB2 is a functional MYB transcription factor, regulating anthocyanin biosynthesis in H. tuberosus, and playing an important role in determining the red tuber epidermis trait, which should be useful information for breeding new cultivars of H. tuberosus with different tuber colors.

Methods
Plant materials
'QY1' and 'QY3' are H. tuberosus cultivars bred by Qinghai Academy of Agricultural and Forestry Sciences (Xining 810000, China). The tuber epidermis of 'QY1' is red, whereas that of 'QY3' is white (Fig. 1A). All materials were planted and stored in the Institute of Horticulture, Qinghai Academy of Agricultural and Forestry Sciences (E101°45′08.15″, N36°43′32.06″). The library labels of these samples are recorded in Table S4. The Nicotiana tabacum cultivar Samsun was chosen as the transformation plant. Nicotiana tabacum (Samsun) was provided by Professor Cathie Martin of the John Innes Centre and is now stored in the Northwest Plateau Institute of Biology, Chinese Academy of Sciences. No permission was required to collect the plants. In this study, Yuan Zong was responsible for the planting and identification of these samples.

Transcriptome analysis
Tuber epidermis samples of 'QY1' and 'QY3' were collected in triplicate and used as the source material from which the transcriptomes were generated. For each cultivar, each of the three transcriptomes was generated from a different sample. The cDNA libraries of tuber epidermis were created according to the mRNA-Seq sample preparation requirements (Illumina Inc., San Diego, CA, USA). The cDNA library products were sequenced by Novogene on the Illumina HiSeq 2000 platform, using paired-end sequencing with read lengths of 150 bp and three replicates. Before assembly, the original reads were filtered to obtain high-quality clean reads. Sequences with ambiguous bases (> 5% 'N' in the sequence trace), low-quality reads (more than 20% of bases with a quality value ≤ 10) and reads with adapters were removed. After filtering, Trinity was used with default parameters to assemble the high-quality reads and construct unique consensus sequences [19]. The expression level of every unigene was calculated as the FPKM (fragments per kilobase of transcript per million mapped reads) value.
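
The read-filtering criteria and the FPKM definition given above can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline actually used: the function names are hypothetical, adapter detection is omitted, and quality values are assumed to be already decoded Phred scores.

```python
def keep_read(seq, quals, max_n_frac=0.05, low_q=10, max_low_q_frac=0.20):
    """Return True if a read passes the N-content and low-quality filters.

    A read is discarded when more than 5% of its bases are 'N' or when more
    than 20% of its bases have a quality value <= 10. Adapter screening is
    not modelled here.
    """
    if not seq or len(seq) != len(quals):
        return False
    n_frac = seq.upper().count("N") / len(seq)
    low_q_frac = sum(q <= low_q for q in quals) / len(quals)
    return n_frac <= max_n_frac and low_q_frac <= max_low_q_frac


def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# 100 fragments on a 1 kb unigene in a library of one million mapped fragments.
print(fpkm(100, 1000, 1_000_000))
```

The factor 1e9 combines the per-kilobase (1e3) and per-million (1e6) scalings in a single step.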
Differences in unigene expression between the purple and white samples were identified by the Chi-square test, using IDEG6 software [20]. The false discovery rate (FDR) method was used to determine the p-value threshold; FDR ≤ 0.001 and |log2Ratio| ≥ 1 were used as the thresholds for significant differential expression of unigenes [21]. All genes related to anthocyanin biosynthesis in the Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) databases were collected and aligned to the unigenes of the transcriptome, using BlastX with an e-value < 1e-5 [22]. To compare relative expression levels, the FPKM values of the unigenes aligned to genes of the anthocyanin biosynthesis pathway were summed.
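
The thresholds and the FPKM aggregation described above can be illustrated with the following Python sketch. It is not a reimplementation of IDEG6: the Benjamini-Hochberg procedure is assumed as the FDR method, the pseudocount that avoids division by zero is an assumption, and all function names and example values are hypothetical.

```python
import math


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_deg(fpkm_a, fpkm_b, fdr, pseudocount=0.01):
    """Classify a unigene as 'up', 'down' or 'ns' in sample A relative to sample B."""
    log2_ratio = math.log2((fpkm_a + pseudocount) / (fpkm_b + pseudocount))
    if fdr <= 0.001 and abs(log2_ratio) >= 1:
        return "up" if log2_ratio > 0 else "down"
    return "ns"


def aggregate_fpkm(unigene_fpkm, unigene_to_gene):
    """Sum the FPKM values of all unigenes aligned to the same pathway gene."""
    totals = {}
    for unigene, value in unigene_fpkm.items():
        gene = unigene_to_gene.get(unigene)
        if gene is not None:
            totals[gene] = totals.get(gene, 0.0) + value
    return totals


# Hypothetical unigenes: two align to CHS, one to DFR, one to no pathway gene.
example = aggregate_fpkm(
    {"U1": 2.0, "U2": 3.0, "U3": 5.0, "U4": 7.0},
    {"U1": "CHS", "U2": "CHS", "U3": "DFR"},
)
print(example)
```

Summing over unigenes is useful for a de novo assembly because a single gene is often split across several assembled unigenes.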

DNA and cDNA preparation
Genomic DNA of Jerusalem artichoke was extracted from 1 g fresh weight of tuber [23]. Total RNA was extracted from the root, stem, leaf, flower and tuber epidermis of Jerusalem artichoke, using the Trizol method [24].  Table S5 In order to analyze the transcription level of genes related to anthocyanin synthesis, real-time fluorescence quantitative PCR (qPCR) was performed on an Applied Biosystems QuantStudio® 3 Real-Time PCR System (Thermo Fisher Company, Beijing, China). The melting curve was analyzed to confirm the specificity of the amplification. The reaction mixture (20 μL

Bioinformatics analysis
The online ExPASy Translate tool (https://web.expasy.org/translate/) was used to predict the protein sequence. BlastP (https://blast.ncbi.nlm.nih.gov/blast.cgi) in NCBI was used to predict the conserved protein regions. The neighbor-joining method was used to construct phylogenetic trees with default parameters in the software MEGA6 (http://www.megasoftware.net/mega6/faq.html) [26]. BDGP (http://www.fruitfly.org/seq_tools/promoter.html) was used to predict the functional regions in the promoter.

Overexpression of HtMYB2 in tobacco
The overexpression vector for tobacco transformation was based on the pJAM1502 binary vector, which contains a double CaMV35S promoter [27]. The pJAM1502:HtMYB2 construct was generated using the Gateway cloning kit (Invitrogen, Carlsbad, CA, USA). Binary vectors were electroporated into Agrobacterium tumefaciens strain GV3101. Tobacco (Nicotiana tabacum) transformation was carried out using a leaf disc transformation method [28]. Transgenic shoots were grown on selective medium containing 3% ( the absorbance results were corrected [29,30]

Genotyping of a natural population of Helianthus tuberosus
The promoter sequences of HtMYB2 were isolated from 'QY1' and 'QY3', based on thermal asymmetric interlaced (TAIL)-PCR [31]. According to the nucleotide sequence differences between the promoters of HtMYB2 of 'QY1' and 'QY3', a polymorphic PCR marker HtproS was designed to distinguish between 'QY1' and 'QY3' (

conditions in this study. The funding bodies did not play a role in the design of the study, in the collection, analysis, and interpretation of data, or in writing the manuscript; they provided only financial support.

HaMYB90: XP_022033410.1. The triangle represents the site of the amino acid that differs between HtMYB2 from 'QY1' and 'QY3': the amino acid "R" in 'QY1' is "K" in 'QY3'.