Transcriptome Analysis of Black and White Sesame Seed Reveals Candidate Genes Associated with Black Seed Development in Sesame (Sesamum indicum)

Senouwa Segla Koffi Dossou, Oil Crops Research Institute, Chinese Academy of Agricultural Sciences (https://orcid.org/0000-0002-7556-9555)
Linhai Wang, Oil Crops Research Institute, Chinese Academy of Agricultural Sciences
Xin Wei, Shanghai Normal University
Yanxin Zhang, Oil Crops Research Institute, Chinese Academy of Agricultural Sciences
Donghua Li, Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences
Jingyin Yu, Oil Crops Research Institute, Chinese Academy of Agricultural Sciences
Xiurong Zhang (zhangxr@oilcrops.cn), Oil Crops Research Institute, Chinese Academy of Agricultural Sciences

RNA-Seq is an advanced approach to transcriptome profiling that uses deep-sequencing technologies. It has become an essential tool for transcriptome-wide analysis of differential gene expression and differential splicing of mRNAs [22]. RNA-Seq has been used successfully to identify candidate genes for seed coat color in many plants, including Arabidopsis thaliana [23], Brassica rapa [24], Brassica napus [25] and soybean [26]. It has also been used to detect candidate genes that shaped oil content variation in sesame [27]. In this study, we investigated the seed transcriptome at 5, 8, 11, 14, 17, 20, 23, 26 and 30 days post-anthesis (DPA) in two sesame varieties that differ in seed coat color, "Zhongfengzhi No.1" (white seed) and "Zhongzhi No.33" (black seed). Differentially expressed genes (DEGs) and candidate genes for sesame seed coat color were identified. This study will facilitate further investigation of seed coat color modulation in sesame and will help to improve the quality of sesame varieties.

Results
Overview of the changes in seeds and expressed genes in black and white sesame
The seed samples showed that black sesame starts to accumulate black pigment or related compounds from 11 DPA (Fig. 1a). The typical change was observed at 11 DPA, when the seeds became brown, and they turned black at 14 DPA. The number of expressed genes followed a similar trend in black and white sesame, increasing at the early stages and then decreasing. Black sesame reached its maximum number of expressed genes at 11 DPA, with 20,253 genes having an FPKM (Fragments Per Kilobase of transcript per Million fragments mapped) above 0.1. The inflection point in white sesame appeared later than in black sesame. The number of expressed genes increased again from 23 DPA to 30 DPA in black sesame, which may indicate that some biosynthesis pathways were newly activated (Fig. 1b).

Stage differences of the expressed genes along with seed development
As black sesame starts to accumulate pigment or related compounds at 11 DPA, the 5 and 8 DPA samples were used as controls to study how gene expression changes. At the early stages, from 5 to 8 DPA, the black sesame seeds were similar in color (Fig. 1a). We compared the other samples with each of these two time points and found that the pattern in black sesame depended on the control used. With 5 DPA as the control, the number of DEGs increased sharply from 8 DPA to 11 DPA, but the increase was only slight with 8 DPA as the control (Figures S1 and S2 in Additional file 1). This may suggest that some genes involved in pigment biosynthesis had already been initiated at 8 DPA, so 5 DPA can be used as the control for comparing differentially expressed genes in black sesame. In white sesame, although the number of differentially expressed genes tended to increase with seed development, the patterns were similar with 5 DPA and 8 DPA as controls. We also found that the number of down-regulated genes increased more than the number of up-regulated genes in both black and white sesame seeds.
We then investigated how gene expression changes by comparing each stage with the preceding one. This revealed a great difference between black and white sesame. In black sesame, more genes were up- or down-regulated at 8, 11 and 30 DPA, and the number increased from 8 to 11 DPA (Fig. 2a). In white sesame, the numbers of both up- and down-regulated genes decreased from 8 to 23 DPA and then increased slightly (Fig. 2b). These results also point to the essential roles of 8 and 11 DPA in black pigment biosynthesis and accumulation, which is consistent with the seed color change.

The differentially expressed genes between black and white sesame
We compared gene expression between black and white sesame at each stage. The number of DEGs changed with seed development. The early stages, from 5 to 8 DPA, were marked by more up-regulated genes; the number then decreased to a low level, with only 225 up-regulated DEGs at 20 DPA. Interestingly, many DEGs were observed at the late stage of 30 DPA, when the seed reached maturity (Fig. 3a). This may suggest that more biosynthesis pathways were strengthened to accumulate different storage reserves or other components in black and white sesame. We also examined the DEGs shared between each pair of adjacent time points to reduce the effect of the arbitrarily chosen sampling stages. Overall, black sesame seed again had more up-regulated genes at the early stages before 17 DPA and more down-regulated genes at the later stages from 20 to 30 DPA (Fig. 3b).

The candidate genes responsible for the seed coat color of black sesame
The above analysis indicated that seed coat color differs not only between black and white sesame but also across seed development within black sesame. To identify candidate genes, we took 5 DPA as the initial control and grouped 11 to 23 DPA as the period of black pigment biosynthesis and accumulation. We first identified the up- and down-regulated DEGs shared among 11, 14, 17, 20 and 23 DPA relative to 5 DPA in black sesame, which gave 1,254 up-regulated and 1,617 down-regulated DEGs (Fig. 4 and Figure S3 in Additional file 1). We then identified the DEGs between black and white sesame shared among 11, 14, 17, 20 and 23 DPA, which gave 53 up-regulated and 57 down-regulated DEGs (Fig. 4 and Figure S3 in Additional file 1). Finally, the two sets of up- and down-regulated DEGs were intersected, leaving only 31 up-regulated and 11 down-regulated genes (Fig. 4).
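The two-step intersection described above can be illustrated with a minimal Python sketch. The gene names and DEG sets below are invented toy data, and the function names are hypothetical; the sketch only shows the set logic (shared across the five stages within black sesame, shared across the five stages between black and white sesame, then the overlap of the two), not the pipeline actually used in this study.

```python
from functools import reduce

# Stages grouped as the pigment biosynthesis and accumulation period.
STAGES = (11, 14, 17, 20, 23)


def shared_across_stages(degs_by_stage, stages=STAGES):
    """Return the genes that appear in the DEG set of every listed stage."""
    return reduce(set.intersection, (set(degs_by_stage[s]) for s in stages))


def candidate_genes(within_black, black_vs_white, stages=STAGES):
    """Intersect the two shared DEG sets.

    within_black:   stage -> DEGs of black sesame at that stage vs 5 DPA
    black_vs_white: stage -> DEGs of black vs white sesame at that stage
    Both dictionaries hold DEGs of a single direction (all up or all down).
    """
    return (shared_across_stages(within_black, stages)
            & shared_across_stages(black_vs_white, stages))


# Toy data (hypothetical gene names, not from the study).
within_black = {s: {"geneA", "geneB", "geneC"} for s in STAGES}
within_black[17] = {"geneA", "geneB"}          # geneC drops out at 17 DPA
black_vs_white = {s: {"geneA", "geneD"} for s in STAGES}
black_vs_white[20] = {"geneA", "geneB", "geneD"}

print(sorted(candidate_genes(within_black, black_vs_white)))
```

Running the up-regulated and down-regulated sets separately through `candidate_genes` mirrors how the 31 up-regulated and 11 down-regulated genes were obtained in principle.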
We examined the 11 common down-regulated DEGs in black and white sesame seed. The expression of all of these genes tended to decrease during seed development, and all were expressed at lower levels in black sesame than in white sesame (Figure S4 in Additional file 1). Considering the characteristics of black seed development, these genes are unlikely to be involved in the biosynthesis of black pigment or related compounds, as all of them were expressed in both sesames and none showed an unusual pattern in either white or black sesame.
For the 31 up-regulated genes in black sesame, we first filtered out 8 genes: those with a high FPKM (over 100) in white sesame and those with an FPKM below 5 in both black and white sesame, because such genes are unlikely to be associated with the biosynthesis of black pigment. The remaining 23 genes were then grouped into 4 subgroups by hierarchical clustering (Fig. 5). Subgroup 1 consisted of only 2 genes, SIN_1003674 and SIN_1009127, but the expression level of the latter increased in white sesame, so it was filtered out. Subgroup 2 consisted of 3 genes, all of which were expressed in both sesames at different FPKM levels, and these were not included among the candidate genes for black sesame. Subgroups 3 and 4 consisted of 7 and 9 genes, respectively. The 7 genes in Subgroup 3 (SIN_1024143, SIN_1016759, SIN_1018917, SIN_1018959, SIN_1001138, SIN_1026689 and SIN_1006892) and the 9 genes in Subgroup 4 (SIN_1002392, SIN_1006025, SIN_1025056, SIN_1013986, SIN_1006242, SIN_1018543, SIN_1020696, SIN_1018961 and SIN_1022200) were selected as candidate genes for black sesame according to their expression patterns and levels (Figures S5 and S6 in Additional file 1).
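The expression filter applied to the 31 up-regulated genes can be sketched as follows. The text does not state whether the FPKM thresholds were applied to the maximum, the mean or every stage, so this sketch assumes the maximum FPKM across stages; the function name and the example values are hypothetical.

```python
def keep_candidate(black_fpkm, white_fpkm, high_white=100.0, low_both=5.0):
    """Decide whether a gene passes the expression filter.

    black_fpkm, white_fpkm: FPKM values across stages for one gene.
    Assumption: thresholds are compared against the maximum FPKM over stages.
    """
    max_black = max(black_fpkm)
    max_white = max(white_fpkm)
    if max_white > high_white:
        # Highly expressed in white sesame: unlikely to be pigment-specific.
        return False
    if max_black < low_both and max_white < low_both:
        # Barely expressed in either variety.
        return False
    return True


# Toy examples (invented values).
print(keep_candidate([2.0, 50.0, 80.0], [1.0, 3.0, 4.0]))   # kept
print(keep_candidate([200.0, 300.0], [150.0, 90.0]))        # high in white
print(keep_candidate([1.0, 2.0, 4.0], [0.5, 3.0]))          # low in both
```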
Wang et al. [4] mapped 4 loci on linkage groups 4, 8 and 11 that were predicted to be associated with the a*, b* and L* values of seed coat color. Here we identified 17 candidate genes (Table 1), and their expression differences were validated with qRT-PCR (Figure S7 in Additional file 1). Five of them, SIN_1006242, SIN_1016759/PPO, SIN_1026689, SIN_1006025 and SIN_1025056, are located on chromosomes 4, 8 and 11. In particular, SIN_1016759/PPO has been reported to be responsible for black sesame based on GWAS [20]. The 17 genes also included 2 chalcone synthase genes, SIN_1018961 and SIN_1018959, that may function in the phenylpropanoid pathway. Thus, some of these genes are likely to be involved in the biosynthesis of black pigment or related compounds in black sesame.

Discussion
The seed coat is the external protective layer of the seed. It develops from the integuments that initially surround the ovule and is maternal in origin [28]. It protects the embryo and endosperm from external factors such as mechanical injury, desiccation and infection [29]. Moreover, it helps the developing seed to regulate its metabolism in response to changes in the external environment by transmitting environmental signals to the interior of the seed [30]. In sesame, seed coat color is strongly associated with seed quality [5][7][31]. Therefore, genetic resources on pigmentation, mainly of black seed coats, will help to improve sesame seed quality. In this study, RNA-seq was used to examine transcriptome differences between "Zhongfengzhi No.1" (white seed) and "Zhongzhi No.33" (black seed) at different stages of seed development. DEGs differentially regulated during seed coat development were screened, and candidate genes associated with black pigmentation were detected.

The key stage for black pigment biosynthesis and accumulation in black sesame
In this study, we analyzed seed samples and observed that the seeds started browning and turned black from 11 DPA. A great difference was observed between the expression profiles of black and white sesame. Black sesame reached its maximum number of expressed genes at 11 DPA, with 20,253 genes having an FPKM above 0.1, and more genes were up- or down-regulated at the early stages before 17 DPA. These results are consistent with the findings of Wei et al. [19][20], who reported that the gene SIN_1016759/SiPPO was highly expressed in seeds from 11 to 20 DPA. PPO encodes polyphenol oxidase, which generates black pigments via the browning reaction in plants [32].
As in previous studies, our results confirmed the pivotal role of the later stages in the biosynthesis of nutrients (oil, protein and lignans) in sesame [33][34][27]. Indeed, we observed that more genes were active and up- or down-regulated at the later stages, from 23 to 30 DPA, in the two varieties, especially in black sesame. Our findings support the view that, in developing sesame seed, the early stages (before 23 DPA) play an important role in pigment biosynthesis and in preparing substrates for nutrient biosynthesis at the later stages. Taken together, we suggest that black pigment is biosynthesized and accumulates in the seed at the early stages of seed development, mainly from 8 to 17 DPA.

Candidate genes controlling black pigment biosynthesis in sesame
Flavonoids (including anthocyanins and proanthocyanidins), lignin and melanin are secondary metabolites that influence seed color in plants [21]. They are derived from the phenylpropanoid pathway and are controlled by a complex regulatory network with multiple transcription factors [35]. This pathway involves many genes, such as those encoding chalcone synthase (CHS), chalcone isomerase (CHI), flavanone 3-hydroxylase (F3H), flavonoid 3′-hydroxylase (F3′H), dihydroflavonol 4-reductase (DFR), anthocyanidin synthase (ANS), anthocyanidin reductase (ANR) and laccase. Some of these genes have been cloned from Arabidopsis and many other plants [36]. Previous studies in sesame found that seed coat color is controlled by two major genes with additive-dominant-epistatic effects plus polygenes with additive-dominant-epistatic effects, and several major QTL have been identified [14][4]. In addition, the gene SIN_1016759/SiPPO has been reported as the candidate gene for black sesame [19][20]. Here, we studied the DEGs shared between black and white sesame and identified 17 candidate genes associated with black pigmentation in sesame. Five of the 17 candidate genes, SIN_1016759/PPO, SIN_1006242, SIN_1026689, SIN_1006025 and SIN_1025056, are located on chromosomes 4, 8 and 11. This finding is congruent with the work of Wang et al. [4]. SIN_1016759/PPO encodes polyphenol oxidase; SIN_1006242 is a cytochrome P450 gene; SIN_1026689 encodes a WAT1-related protein; SIN_1006025 encodes isochorismate synthase; and SIN_1025056 encodes a beta-glucosidase isoform. In plants, isochorismate synthase converts chorismate, the last product of the shikimate pathway, into isochorismate, a precursor of phylloquinone (vitamin K1) and salicylic acid [37]. The 17 candidate genes also included 2 chalcone synthase genes, SIN_1018961 and SIN_1018959. Three chalcone synthase-like genes have been identified in A. thaliana [23].
Chalcone synthase is the first committed enzyme in the biosynthesis of all flavonoids, which are produced through the phenylpropanoid pathway [23]. It catalyzes the formation of naringenin chalcone from p-coumaroyl-CoA and three molecules of malonyl-CoA [38]. All 17 candidate genes will be targeted in future functional genomic studies. We suggest in particular that SIN_1016759/SiPPO, SIN_1018961 and SIN_1018959 may play a major role in pigment biosynthesis, especially of black pigment, in sesame.

The compound responsible for black seed color in sesame
In plants, browning of seed coat pigments is often induced by the oxidation of phenolic compounds by polyphenol oxidases (PPOs), such as laccases and tyrosinases, and mostly results in melanin formation [39][32]. Tyrosinase catalyzes the first two steps of the melanin synthesis pathway and controls the rate and yield of melanin production [40]. Xiaoli et al. [41] reported that free tyrosine and polyphenols are needed for melanin biosynthesis and that, relative to anthocyanin, the melanin content was higher in brown-seeded than in yellow-seeded rapeseed. Metabolomic analysis demonstrated that phenylpropanoid biosynthesis, tyrosine metabolism and riboflavin metabolism were the main pathways differentially activated between black and white sesame and were responsible for the color difference [13]. Furthermore, the black pigments present in the skin of banana and sunflower have been assumed to be melanin [42][43]. Wan et al. [29] found that brown and dark peanut seeds (mutants) had lower lignin, anthocyanin and proanthocyanidin contents and a higher melanin content than the light-colored wild type. They also reported that the expression of polyphenol oxidases was significantly activated in the developing seeds of the mutants. These results indicate that PPO might be one of the dominant genes responsible for black pigment biosynthesis in sesame. We therefore suggest that the black pigment in sesame seed is likely to be melanin. Future studies will help to confirm whether this is the case.

Strategies to improve black sesame quality
Sesame is widely grown, especially for its highly nutritious seeds [18]. However, compared with white sesame, black sesame contains less oil, protein, linoleic acid, sesamin and sesamolin [5][6][7][31]. Hence the need to improve black sesame quality. The study carried out by Wei et al. [19] revealed that, in sesame, the genes SiPPO and SiNST1 (SIN_1005755), associated respectively with black pigmentation and lignification in the seed coat, are strongly associated with variation in oil, protein, sesamin and sesamolin content in seeds. Furthermore, in sesame the aromatic amino acids L-phenylalanine (Phe) and L-tyrosine (Tyr) are needed for protein biosynthesis and serve as precursors for numerous compounds, including flavonoids, melanin, lignin, lignans, quinones and condensed tannins [44]. These amino acids are produced from chorismate, the final product of the shikimate pathway, which involves many genes [44]. Here, we identified 17 candidate genes associated with black pigment synthesis in sesame, including genes encoding PPO, chalcone synthase and isochorismate synthase. Thus, functional analysis coupled with genetic transformation of these genes, together with other pivotal genes, may be sufficient to improve black sesame quality.

Conclusion
Overall, our study revealed the transcriptome differences between black and white seed coats during seed development in sesame using RNA-seq analysis. Our results provide valuable information on the complex transcriptome dynamics involved in the control of seed coat color in sesame. The early stages play a crucial role in the biosynthesis and accumulation of the black pigment in S. indicum black seed. Genes of the phenylpropanoid and flavonoid biosynthetic pathways previously identified in other plants are also involved in the formation of seed coat color in sesame. Notably, 17 candidate genes controlling black pigmentation in sesame were identified and will be targeted for validation in future studies. As seed coat color in sesame is strongly associated with seed biochemistry and disease resistance, functional studies (cloning, genome editing and transformation in sesame or Arabidopsis) of these candidate genes will help to elucidate the molecular mechanisms underlying these correlations and to breed high-quality sesame varieties.

Sesame varieties
Two sesame varieties, "Zhongfengzhi No.1" (white seed) and "Zhongzhi No.33" (black seed), were used in this study. The seeds were provided by the National Sesame Medium-term Genebank (Wuhan, China).

Planting and sampling
The two varieties were sown under identical growth and experimental conditions in Wuhan in 2019. Flowers were labeled so that sampling could follow days post-anthesis (DPA) at three-day intervals. At 5, 8, 11, 14, 17, 20, 23, 26 and 30 DPA, capsules of each variety were sampled from 10 plants (Figure 1), and seeds were separated from the capsules on ice. Seeds from the different plants were then mixed in equal amounts to form the sample for each time point. All samples were prepared in two replicates and subjected to RNA-seq analysis.

RNA extraction and library preparation
RNA from the 18 seed samples was extracted and sequenced as per Wang et al. [27]. Briefly, total RNA was extracted from each sample with the TRIzol reagent (Invitrogen Corp.). The mRNA was then purified from the total RNA using the Oligotex mRNA Midi Kit (Qiagen, Germany). The quantity and quality of the mRNA were assessed with the Invitrogen Qubit 2.0 and the Agilent 2100. All mRNAs were then converted into double-stranded cDNAs with the SMART cDNA Library Construction kit (Clontech, USA) following the user guide. Finally, fragments of appropriate size (200 ± 25 bp) were chosen for PCR amplification, and adapters were ligated to the targeted fragments.

Data generation and quality assessment
The 18 cDNA libraries generated from the sesame seeds were sequenced as paired-end reads on the Illumina HiSeq 2000 platform. The FastQC software (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) was used to check the base quality of the reads. We then removed all paired-end reads with more than 5% ambiguous residues (Ns) and those in which more than 10% of the bases had a Phred quality score below 20. The remaining reads were considered "clean reads" [45].
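The read-filtering rule described above can be illustrated with a short Python sketch. It assumes Phred+33 quality encoding and assumes that a pair is discarded when either mate fails; neither detail is stated in the text, and the function names are hypothetical rather than taken from the tools actually used.

```python
def passes_qc(seq, qual, max_n_frac=0.05, max_lowq_frac=0.10,
              min_phred=20, offset=33):
    """Return True if a single read passes both filters.

    seq:  base string (A/C/G/T/N)
    qual: quality string of the same length (assumed Phred+33 encoded)
    """
    if not seq or len(seq) != len(qual):
        return False
    n_frac = seq.upper().count("N") / len(seq)
    low_quality = sum(1 for ch in qual if ord(ch) - offset < min_phred)
    lowq_frac = low_quality / len(qual)
    # A read is removed when it has MORE than 5% Ns or MORE than 10%
    # bases below Phred 20, so values equal to the limits are kept.
    return n_frac <= max_n_frac and lowq_frac <= max_lowq_frac


def keep_pair(read1, read2):
    """Keep a read pair only if both mates pass (an assumption)."""
    return passes_qc(*read1) and passes_qc(*read2)


# Toy reads: 'I' encodes Phred 40 and '#' encodes Phred 2 under Phred+33.
good = ("ACGT" * 5, "I" * 20)
too_many_n = ("ACGTN" * 4, "I" * 20)
low_quality = ("ACGT" * 5, "I" * 17 + "#" * 3)

print(keep_pair(good, good))
print(keep_pair(good, too_many_n))
print(keep_pair(good, low_quality))
```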

Statistical analysis of gene expression
Gene expression levels in each sample were evaluated from the number of reads that mapped uniquely to the sesame genome sequence [1]. The expression level of each gene was then normalized to FPKM as per Trapnell et al. [46] using the Cufflinks 2.0 software. Differentially expressed genes (DEGs) among the sesame seed samples were identified following the method described by Chen et al. [47] and Wang et al. [48]. The threshold P-value in multiple tests was determined using the Poisson distribution [49] and the false discovery rate (FDR). DEGs were considered significant at an FDR ≤ 0.01 and an absolute log2 ratio ≥ 1 [50].
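As an illustration of the FPKM normalization and the DEG thresholds given above, a minimal Python sketch is shown below. It is not the Cufflinks implementation: the FPKM function uses the standard textbook definition, the small pseudo-count added before taking the log2 ratio is an assumption made here to avoid division by zero, and the FDR value is taken as an input rather than computed.

```python
import math


def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


def log2_ratio(fpkm_a, fpkm_b, pseudo=0.01):
    """log2(a / b) with a small pseudo-count (an assumption of this sketch)."""
    return math.log2((fpkm_a + pseudo) / (fpkm_b + pseudo))


def classify_deg(fpkm_black, fpkm_white, fdr,
                 max_fdr=0.01, min_abs_log2=1.0):
    """Label a gene 'up', 'down' or 'not_de' in black relative to white."""
    ratio = log2_ratio(fpkm_black, fpkm_white)
    if fdr > max_fdr or abs(ratio) < min_abs_log2:
        return "not_de"
    return "up" if ratio > 0 else "down"


# Toy values (invented): 500 fragments on a 2 kb gene, 10 million mapped.
print(fpkm(500, 2000, 10_000_000))
print(classify_deg(40.0, 10.0, fdr=0.001))
print(classify_deg(10.0, 40.0, fdr=0.001))
print(classify_deg(40.0, 10.0, fdr=0.05))
```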

Real-time quantitative PCR (qRT-PCR)
The expression profiles of the 17 candidate genes were validated with qRT-PCR as described by Wang et al. [48] using the LightCycler® 480II Real-Time PCR Detection System (Roche Diagnostics, Rotkreuz, Switzerland). Seed samples from five stages (5, 11, 17, 23 and 30 DPA) were run in triplicate on the same plate together with a negative control that lacked cDNA. The sesame actin7 gene was used as the positive control. The relative expression levels of the target genes were calculated using the 2^−ΔΔCT method [51].

Fig. 3 The differentially expressed genes between black and white sesame at different stages. a Number of DEGs between black and white sesame. b The differentially expressed genes between black and white sesame shared by two contiguous stages.
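The 2^−ΔΔCT calculation used for the qRT-PCR validation can be sketched in Python as follows. The sketch assumes that actin7 serves as the reference gene for normalization and that one sample (for example, 5 DPA) is chosen as the calibrator; the choice of calibrator, the function name and the Ct values are hypothetical.

```python
from statistics import mean


def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Relative expression by the 2^-ddCt method.

    Each argument is a list of replicate Ct values (e.g. triplicates).
    dCt  = mean Ct(target) - mean Ct(reference gene)
    ddCt = dCt(sample) - dCt(calibrator)
    """
    dct_sample = mean(ct_target_sample) - mean(ct_ref_sample)
    dct_calibrator = mean(ct_target_calibrator) - mean(ct_ref_calibrator)
    ddct = dct_sample - dct_calibrator
    return 2.0 ** (-ddct)


# Toy Ct values (invented): the target gene crosses the threshold
# 3 cycles earlier in the sample than in the calibrator.
fold = relative_expression(
    ct_target_sample=[22.0, 22.0, 22.0],
    ct_ref_sample=[18.0, 18.0, 18.0],
    ct_target_calibrator=[25.0, 25.0, 25.0],
    ct_ref_calibrator=[18.0, 18.0, 18.0],
)
print(fold)
```

A value above 1 indicates higher expression in the sample than in the calibrator, and a value below 1 indicates lower expression.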