Comparative transcriptome profiling provides insights into endocarp lignification of walnut ( Juglans Regia L . )

Background Lignin is the main component of walnut endocarp, although we know little about the molecular mechanism of lignin formation in walnut endocarp. To understand the molecular mechanisms behind the two kinds of walnut phenotype and explore the genes involved into lignin formation, transcriptome sequencing was conducted in the walnut endocarp of the ‘Zanmei’ (ZM) and ‘Liaoning 7’ (L7) cultivars, which have different endocarp thicknesses. Compared with L7 walnut endocarp, the endocarp of ZM walnut is thicker, which decreases dehiscent nuts and compromised kernels. Results There are more differentially expressed genes (DEGs) in the ZM walnut cultivar. The DEGs involved in the phenylpropanoid biosynthesis were significantly upregulated in both cultivars 45 days after full bloom (DAFB), but more genes were upregulated in ZM than in L7. Moreover, the same DEGs showed different expression levels in the two cultivars. Most of the key genes in ZM had more different multiples than those in L7. Interestingly, when qRT-PCR was used to determine the expression of the key genes in different development stages of the two varieties, the expression patterns were different from those known in other species. Furthermore, transcription factors regulating secondary cell wall and lignin biosynthesis were identified. Quantitative real-time PCR results were consistent with transcriptome data. Conclusion In this study, transcriptome analysis was used to understand the molecular mechanisms of lignin formation in two walnut cultivars with different shell thickness. Several important key genes in the phenylpropanoid biosynthesis pathway were significantly different in the two cultivars, which may be the reason for the phenotypic differences. The analysis of transcription factors revealed that the regulation network in endocarp of walnut may be different from that of drupe such as apricot or peach. This study provides important candidate genes for exploring the complicated metabolic processes involved in the formation of walnut lignin. Dalian, China). It was used as a template for qRT-PCR to detect the expression level of candidate genes. Real-time quantification was performed on the Lightcycler 96 (Roch, USA) with SYBR Green Master (Roche). The expression levels of the candidate genes were calculated using the 2 −ΔΔCT method. All PCR experiments were performed on three independent biological samples, each including three technical replicates. Actin was used as the reference gene in walnut.


Conclusion
In this study, transcriptome analysis was used to understand the molecular mechanisms of lignin formation in two walnut cultivars with different shell thickness. Several important key genes in the phenylpropanoid biosynthesis pathway were significantly different in the two cultivars, which may be the reason for the phenotypic differences. The analysis of transcription factors revealed that the regulation network in endocarp of walnut may be different from that of drupe such as apricot or peach. This study provides important candidate genes for exploring the complicated metabolic processes involved in the formation of walnut lignin.

Background
As an economically important tree, walnut (Juglans regia L.) is widely cultivated globally for its nuts and timber [1,2]. Walnut endocarp plays a vital role in the growth, development, maturation, rinsing, transportation and storage of walnut, and it is also closely related to the commercial quality of the kernels [3]. In recent years, the cultivation area of thin-shell walnut in China has increased rapidly, and problems of incomplete hulls, dehiscent nuts and dark kernels have become increasingly serious in some areas. These problems are caused by shell dysplasia; without complete shell protection, walnuts are vulnerable to pests and other injuries during growth, and they are also vulnerable to contamination during transportation, cleaning or storage after harvesting, thus reducing the quality of the walnuts. The lignin content of walnut shells has been shown to be significantly positively correlated with the thickness of the shell and the tightness of the suture seal [4].
Lignin is the second most abundant plant biopolymer; it is deposited most commonly in the secondary cell walls of plants and is essential for water transport, mechanical support and plant pathogen defence [5][6][7]. The lignin biosynthesis pathway has been studied by scholars for many years, and with the progress of science and technology, understanding of this pathway has undergone a major revision in the past decade. Lignin biosynthesis is energy-consuming and irreversible, and is affected by many developmental and environmental factors, such as light, circadian cycles, plant hormones, and injuries [8]. The lignin biosynthesis pathway is complicated and involves many enzymatic reactions. Great progress has been made in cloning new genes using genetic information and biochemistry [9,10]. The biosynthesis pathway of lignin is very complex and can be divided into three steps: 1) the shikimic acid pathway: a series of processes from the assimilation of plant photosynthesis products to the synthesis of aromatic amino acids; 2) the phenylpropane pathway and lignin synthesis pathway: a natural lignin polymer generally comprised of guaiacyl (G), syringyl (S) and phydroxyphenyl (H) units, which are produced by three primary monolignols, coniferyl, sinapyl alcohols and pcoumaryl respectively; and 3) the polymerization of lignin monomers into lignin [11]. Recent studies have found that there may be other types of monomers [12]. Since most enzymes involved in the biosynthesis of lignin monomers have been found and documented, the monolignol biosynthesis pathway is becoming better understood [13][14][15]. Metabolic engineering of novel lignin is now used to improve forage crop nutritional quality and industrial processing [16].
Many genes encoding key enzymes have been separated and the functions of these genes are gradually being clarified [17][18][19]. In Arabidopsis, Rohde et al. found a significant positive correlation between the AtPAL and lignin content, affecting both lignin and flavonoids in the knock-out mutant without the gene [20][21][22]. Xu et al. inhibited Pv4CL in Panicum virgatum by RNAi and found that the expression of 4CL in transgenic switchgrass decreased by 80%, the total lignin content and G lignin content were significantly reduced, and the biomass of the transgenic plants and fibre hydrolysis ability were significantly improved [23]. Wagner et al. inhibited the 4CL activity of Pinus tabulaeformis, and found that the total lignin content of the transgenic plants decreased by 28% and the lignin type was changed, although the plant phenotype remained unchanged [24]. In hard-seeded pomegranate, CCR and CAD had higher expression levels during the critical period of seed hardening [25]. Although the genes encoding key enzymes in lignin metabolism have been isolated from many plants, the function of these genes in specific species cannot be fully inferred from other species. For example, the genes involved in lignin metabolism vary in different species, and the same gene may have different effects. The removal of 4CL from Arabidopsis had no significant effect on lignin content or monomer composition [26]. There remains much to be learned about lignin metabolism; the transcriptional regulation network of lignin synthesis is also being explored and revised. Related gene family members such as MYB and NAC continue to be discovered in various species. Zhao constructed a transcriptional network model of secondary cell wall biosynthesis in Arabidopsis, and he found that the whole secondary cell wall biosynthesis can be activated by either NST1/2/3 or MYB46/MYB83 [27]. Chen found that MYB46 in apples altered plant resistance by affecting lignin content [28].
In recent years, transcriptome sequencing technology has been widely used in plants. Zhang et al. [29] used transcriptome technology to analyse key genes in apricot endocarp and identify the molecular mechanisms in 'Liehe' apricot, the endocarp-cleaving phenotype. Zhang [30] used transcriptome technology to analyse the sequence of cotton and screened out candidate genes for lignin biosynthesis, which provided a new mechanism for cotton lignin biosynthesis and laid a foundation for further improving the fibre quality. With the help of transcriptome sequencing, more comprehensive understanding of the biological system can be achieved. Using transcriptome sequencing technology to reveal the molecular mechanism of lignin synthesis in the endocarp during development can improve walnut quality and lay a foundation for walnut breeding.
Research on walnut cultivars mainly focuses on the study of the fat, protein, fatty acid and other nutrients of seed kernel [31,32], or the flavonoids, phenolic acids and quinones in the shoots [33], but few studies have focused on the development of the hard shell, which plays a vital role in the health of the seeds. Yu [34] et al. used transcriptome sequencing technology to screen the differentially expressed genes in the lignification stage of the endocarp of 'Zhipi' walnut and predicted their functions. However, they only studied the lignification stage, and did not explain the differences in different growing stages or varieties. Because of good taste and rich nutrition, 'Zanmei' and 'Liaoning 7' walnut are widely cultivated in China. However, the endocarp in 'Liaoning 7' is relatively thin, which often results in split and exposed kernels, seriously affecting the quality of the nuts. In this study, the mechanisms of endocarp development and phenotype formation were investigated through analysis of transcriptome sequencing and the expression patterns of candidate genes in walnut endocarp development and lignification. It is of great significance to clarify the molecular mechanism of lignin formation to achieve artificial regulation to improve walnut quality and enrich germplasm resources.

Morphological characteristics of walnut endocarp
To identify the critical lignification periods during development of young fruit, the timing and pattern of lignin deposition was studied. Within 30 DAFB, the pericarp was clearly stratified, and the endocarp appeared milky white and the Wiesner reaction revealed that no lignification occurred during this period ( Fig.1 A, C, E, G). Only the vascular tissue at the junction of the mesocarp and endocarp was stained. At 45 DAFB, the endocarp near the outer pericarp began to lignify. The Wiesner reaction showed that lignification gradually occurred out from the outside to the inside of the endocarp (Fig.1 B, D, F, H). The apex (blossom end) and the junction of the endocarp and mesocarp were most stained. Pink gradually diffused inward from the junction and became shallow. Little or no staining was observed in tissues other than the endocarp with the exception of scattered small vascular elements in the mesocarp, trichomes present on the fruit surface and residual floral structure. Seed kernels were still water-like at this time. The key point at which the lignification of the endocarp begins to develop is between 30 and 45 DAFB, suggesting that the lignification of the endocarp begins relatively early in the development of the fruit. To ensure there was no lignin deposition in the whole endocarp, the endocarp of ZM and L7 of 20 DAFB was collected for transcriptome sequencing, when the endocarp and mesocarp had differentiated, ensuring the accuracy of sampling. Forty-five DAFB is a time of rapid lignin formation in the endocarp. Therefore, samples of 45 DAFB were also collected for transcriptome sequencing. As seen from the fresh fruit pictures in each dimension, the mesocarp of ZM is thicker and oval in shape, while the mesocarp of L7 is thinner and round in shape. The difference in the endocarp between the two cultivars, however, was not obvious.

Transcriptome sequencing and quality control
In total, 501,163,896 raw reads were obtained from the transcriptome sequencing of all samples, and 423,289,427 clean reads were mapped to the walnut reference genome (https://www.ncbi.nlm.nih.gov/genome/? term=txid51240[orgn]) after filtering out the low-quality reads ( Table 1). The results of the mapping rate, Q30 rate and GC content suggested the high quality of our data. Clean reads: the reads removing low-quality reads. HQ clean reads: the reads from the second filter. Q20 and Q30: the mistake rate lower than 1% and 0.1%, respectively. GC content: the GC ratio of total base number. Total mapped: the number of reads which can be mapped to the reference genome. Multiple mapped: the number of reads with multiple alignment positions on the reference sequence. Uniquely mapped: the number of reads with unique alignment positions on the reference sequence.
Supplementary Fig. S1 showed the Pearson's correlation coefficient results of all groups. The correlation of each group was higher than 0.95, suggesting the high quality of the sequencing data. PCA is largely used to show the structure and relationship of the samples; the results of PCA in this study showed good correlation in each group and the difference between the two periods in each cultivar ( Supplementary Fig. S2).

Analysis of genes between groups
There were 36,861 reference genes in total, 30,276 of which were known. The new gene number of all the samples through sequencing was 1,870. The expressed gene number of each group was clearly illustrated in Table 2. To identify the overall difference, differential expression analysis was performed for the different periods and cultivars. The (T-ZM vs F-ZM) and (T-L7 vs F-L7) groups revealed the shells' DEGs in different developmental stages and showed that they were related to lignin biosynthesis, which had the most DEGs. As a result, 3502 and 3663 genes were individually up-and downregulated between T-ZM and F-ZM while there were 3418 and 2727genes between T-L7 and F-L7 (Fig. 2a). Compared with L7, ZM had more DEGs during the lignin formation period. In the (T-ZM vs F-ZM) group, there were more downregulated genes than upregulated genes. But in the (T-L7 vs F-L7) group, there were more upregulated genes than downregulated genes. The two groups had 4,537 DEGs in common ( Fig. 2b) from which the same regulatory genes controlling the growth and development of the two varieties can be found. Apart from these common differences, the (T-ZM vs F-ZM) and (T-L7 vs F-L7) group had 2,628 and 1,608 genes, respectively, which may hold the key to the differences between the two varieties. The above results showed that the number of DEGs in different developmental stages of ZM was higher than that of L7, indicating that the developmental regulation of ZM was more complicated than that of L7.

Function prediction
To learn the potential functions of the DEGs between the different groups, GO (http://www.geneontology.org/) and KEGG (https://www.genome.jp/) enrichments were conducted for functional classification. GO analysis showed that all DEGs can be classified into biological processes (BP), cell components (CC) and molecular functions (MF) and each of them contains several exact functions and GO items (Fig. 3).
The overall GO analysis showed that the differential genes for CC were mainly concentrated in the cell parts, organelles and membranes, the MF included catalytic activity and binding; and the BP were metabolic processes, cellular processes and single-organism processes, etc. The proportion of upregulated genes was significant in (T-L7 vs T-ZM) and (F-L7 vs F-ZM) groups, while the proportion of downregulated genes was significant in the other two groups, and the total number of DEGs was much larger. In comparing the (T-ZM vs F-ZM) and (T-L7 vs F-L7) groups, the number of downregulated genes of ZM in each component was significantly higher than that of L7, while the upregulated genes were not significantly different. After FDR<0.05 (adjusted p value) condition screening and refinement classification, further functional analysis could be performed (Supplementary Table  S1). Except for the (F-L7 vs F-ZM) group, the lignin metabolic process pathway and phenylpropanoid metabolic processes were significantly enriched in the rest of the groups. The lignin metabolic process pathway was the most significant in the (F-L7 vs F-ZM) group. No lignin was found in the endocarp at 30 DAFB ( Fig. 1), nor 20 DAFB. The enrichment of the lignin metabolic process pathway and phenylpropanoid metabolic process in the (T-L7 and T-ZM) group may have been due to the growth and development of cell walls in the endocarp. It may also have been due to the development and formation of vascular tissue. There were more different pathways in the (T-ZM vs F-ZM) and (T-L7 vs F-L7) groups, and the cell wall related pathways and DEGs were also significantly increased. Secondary metabolic processes, responses to hormone and hormones transport were also significantly enriched. A large number of hormones regulated and participated in regulating the changes in the different developmental stages, indicating that the growth and development at this time were due to secondary metabolism.
The top 20 KEGG pathway enrichment analyses of four groups are shown in Fig. 4. In the (T-ZM vs F-ZM) and (T-L7 vs F-L7) groups, most DEGs were enriched in plant hormone signal transduction, phenylpropanoid biosynthesis, tryptophan metabolism and starch and sucrose metabolism (Fig. 4A, B). A large number of hormones were involved in the transformation of the endocarp from growth to secondary metabolism. The differences in these metabolic pathways were consistent with the results in GO enrichment. In the (T-ZM vs F-ZM) group, fatty acid elongation and DNA replication were also significantly enriched. In the (T-L7 vs F-L7) group, photosynthesis, ascorbate and aldarate metabolism, thiamine metabolism and ABC transporters were significantly enriched. These differences might explain the developmental status and nutrient supply differences between the two varieties. Glutathione metabolism and taurine and hypotaurine metabolism showed significant differences in the (F-L7 vs F-ZM) group, while DNA replication, glutathione metabolism and purine metabolism showed significant differences in the (T-L7 and T-ZM) group (Fig. 4C, D).

Expression analysis of DEGs related to lignin biosynthesis and validation of DEG expression by RT-qPCR
The continuous deposition of lignin in walnut endocarp and its eventual lignification is complicated. A series of biological processes are involved, including the synergistic action of many genes to regulate various pathways, of which the phenylpropanoid biosynthesis pathway is one of the most important.
The relative expression of the DEGs in the (T-ZM vs F-ZM) and (T-L7 vs F-L7) groups in the phenylpropanoid biosynthesis pathway are intuitively showed in Fig.5. The expression of the genes corresponding to the key enzymes in the metabolic pathway at different stages of ZM and L7 are also visually illustrated. In total, seventysix DEGs were annotated in this pathway in ZM and forty DEGs were highly expressed, including, 4CL (6), CAD (1) and POD (18). Most of the genes were upregulated and the most downregulated gene was POD. A total of sixty-five DEGs in L7 were enriched in this pathway, of which thirty-two were highly expressed, including PAL (1), (1) and POD (14). As in the case of ZM, most of the DEGs in L7 were upregulated, although there were fewer DEGs than in ZM. The expression of most of the DEGs was consistent in the two varieties, and the differences were mainly concentrated in PAL, HCT, POD, CSE, etc. PAL is located at the entrance of the phenylpropanoid pathway and is involved not only in the synthesis of lignin monomers but also in the synthesis of other phenolic substances [34]. Therefore, there was no expression difference between the two periods in ZM. PAL, 4CL and C4H often form multi-enzyme complexes to efficiently regulate the phenylpropanoid metabolism pathway [35]. Both 4CL and C4H showed an upregulated trends. HCT controls the biosynthesis of lignin monomers and is an important control point for the biological conversion of H monomer and G/S monomer [36,37]. During the period of lignin synthesis, the downregulation of the gene expression might be related to the regulation of lignin monomer type. POD enriched the most DEGs in both ZM and L7. POD is a key enzyme involved in the last step of lignin synthesis, which catalyses the dehydrogenation of the monomers metabolized by phenylpropane to form lignin. POD is encoded by multiple gene families. Due to the differences in gene structure and function, POD also participates in a variety of metabolic pathways, such as auxin metabolism, cell wall extension and thickening, metabolism of reactive oxygen species and reactive nitrogen, and various resistance processes of plants [38,39]. CSE has been demonstrated to have the ability to catalyse the conversion of caffeoyl shikimate into caffeate in Arabidopsis thaliana [40], and Wout [41] confirmed its role in poplar lignin synthesis. However, due to a lack of further evidence, its function has not been fully explored.
The transcriptome results narrowed the scope for further screening. RT-qPCR was performed on some identified genes related to the lignin pathway. The specific gene primers involved in the RT-qPCR analysis and corresponding gene information are listed in Supplementary Table S2. Eighteen genes encoding key enzymes in the phenylpropanoid biosynthesis pathway were selected. Flavonoids are important secondary metabolites, and involved in plant growth and development, disease resistance and other important processes [42]. The synthesis of flavonoids is derived from the phenylpropanoid pathway, of which chalone synthase (CHS) is the first step. Thus, a gene encoding CHS was also being examined.
The results showed that the expression profile of the pathway related to DEGs determined by RT-qPCR was consistent with the corresponding log 2 (F/T) value of the RNA-seq analysis ( Table 3).
The expression level of genes involved in the phenylpropanoid pathway was downregulated in L7 compared with the ZM cultivar in both the development stages and lignin synthesis. The two COMT genes showed the same expression pattern in ZM and L7. They were highly expressed at the growth stage in ZM, and they continuously increased in the lignin formation stage until they reached the peak value of 65DAFB. This gene showed the same change trend in L7, but the difference multiple was much lower than that of ZM (Fig.6). Except for CCR, the expression levels of other genes in the phenylpropanoid biosynthesis pathway were all greater in ZM than in L7. C3'H (gene11203) showed the same expression trend in ZM and L7, but the peak value of L7 appeared later (70DAFB). C3'H (gene21068) was always in highly expressed in ZM, reaching its maximum value at 65DAFB, which was much higher than that in L7. HCT (gene 11284), CAD (gene39775), and C4H (gene40342) had a similar situation with C3'H (gene21068). CCR1 was highly expressed at the beginning of lignin formation in L7 and reached its peak at 50 DAFB. The trend of CCoAOMT (gene 32982) was similar. The expression levels of POD varied greatly. The large differences in these key genes may be the key reason for the differences in endocarp lignification between the two varieties. Note: The data in RNA-seq is the difference multiple of log 2 (F/T) value, and the data in qPCR is the relative expression of 2 -ΔΔCT value. The endocarp of 20 DAFB was used as control group.
According to the previous studies, the lignin biosynthetic genes are generally regulated by different families of transcription factors, such as the MYB, NAC,LBD and NST families etc. [27,43]. We made a heat map analysis of all the screened genes in these families (Supplementary Figure S3). The specific gene information is shown in the Supplementary Table S3.Under the given conditions (FPKM>5, |log 2 F/T|>1), the lignin-related transcription factors concerned in some studies were not screened out, such as MYB46 and MYB83. NST1 and LBD41 were highly expressed in the lignin generation stage of both varieties, but they were significantly higher in ZM than in L7, which is note worthy. Several transcription factors whose differential multiples did not meet the criteria were also screened because their expression might be involved in lignin metabolism. Most transcription factors were upregulated during lignin formation. Several groups of different transcription factor families had similar expression patterns, which were related to the lignin formation law. NAC100 and NAC12 had significant high expression in F-L7.

Discussion
It takes approximately 130 days from siphonogamy to ripening for walnut, which can be roughly divided into three stages, including rapid growth, shell hardening and ripening. The period of 35-40 days after fertilization is a rapid growth period, and the next 35-40 days is the shell hardening stage. During the shell hardening stage, the endocarp of the walnut gradually becomes lignified, forming a sclereid layer, which is the endocarp of the nut. Studies have shown that lignin is the most important component of walnut endocarp [44]. In lignified cells, lignin firstly forms in cell corners, then synthesizes between cells and gradually becomes primary and secondary cell walls [45]. With the further development of mature lignification of plant tissues, lignin deposits inside and outside cell walls and connects cells [9]. Shell structures are formed in the cell walls of plants due to the deposition of lignin [46]. The content and composition of lignin vary not only between plant taxa but also between different cell types within the same tissue and even within the same cell wall [47]. Dardick believes that the lignin in peach endocarp begins to deposit in the parenchyma layer radiating out from the flower end [48], while Zhang believes that the lignin deposition process of apricot begins at the top of the endocarp [29]. Walnut fruits, however, do not belong to the true drupe, as they develop from bracts, perianth and ovaries. Wen et al. believed that the deposition of lignin in walnut endocarp begins at the top corner [49], while Yu et al. believed that the lignification of 'Zhipi' walnut endocarp might occur almost simultaneously throughout the entire endocarp [34]. In this study, dyed walnut samples were collected 30 and 45 days after full bloom, since lignin surely began to form between these two points. Unfortunately, due to the large time span, the exact time of lignin formation cannot be determined in this study. The vertex, junction and vascular bundle were stained deeply. However, it was not sufficiently accurate to judge the position of lignin only by the depth of colour. Dense sampling can be conducted near the key points of lignin synthesis, and the application of microsection technology is conducive to observing the initial site of lignin synthesis more directly. Approximately 45 days after flowering is a period of rapid lignin production in the endocarp, but it is not the turning point. Therefore, transcriptome sequencing was performed on 20 DAFB, a time when there was no lignin at all, and on 45 DAFB, a time at which lignin is produced rapidly.
Lignin can be used as a source of pulp, animal feed, or as fuel resource, and it serves as an important supporting and protective material in plants. Fruit endocarp lignification occurs in many angiosperms and plays an important role in seed dispersal and protection. Lignified endocarp can protect seed kernels from foodborne pathogens and is also an important symbol of fruit ripening [50]. In recent years, increasing attention has been paid to it. As the two main walnut cultivars in China, ZM and L7 differ significantly in their development of endocarp. The endocarp of ZM is thick and hard after ripening, while the endocarp of L7 is thin and brittle after ripening. Phenylpropanoid biosynthesis is vital in endocarp lignification in both ZM and L7. Analysis of transcriptome sequencing revealed a series of differentially expressed genes involved in the phenylpropanoid pathway, such as C3'H, CCoAOMT, COMT, HCT, CCR, POD, and CAD. Among these genes, the genes that differ between the two cultivars at different periods may better reveal the molecular mechanism of lignin synthesis in walnut endocarp, and their different expression patterns may be the reason for the phenotypic differences between the two cultivars. We therefore measured the expression of these DEGs over more time periods. CCoAOMT and COMT are methylases at different substrate levels in the lignin biosynthesis pathway, that form two parallel pathways in the lignin biosynthesis pathway [51]. A previous study found that a reduction in COMT reduced the lignin content in transgenic alfalfa and affected the structure of S-type lignin [52]. After inhibiting CCoAOMT expression in transgenic tobacco, lignin content decreased significantly [53]. COMT in pomegranate endocarp was highly expressed in the early stage of development, and the expression in soft seeds was significantly higher than that in hard seeds at the same time. The CCoAOMT was also highly expressed at the beginning of lignin biosynthesis and then gradually decreased [25]. The expression of COMT in apricot endocarp also decreased gradually from the initial high expression [29]. The qRT-PCR results in this study demonstrated that the expression level of COMT and CCoAOMT in the two varieties was similar at the early stage of development, but the expression level of COMT in ZM was significantly higher than that in L7 at the middle and late stages of lignin formation. When T-DNA was inserted into C3'H in Arabidopsis thaliana, the mutant plant was severely stunted in growth and development, and the lignin content was reduced by 20%-40% compared with the wild type, while the G-type and S-type monolignol content was significantly decreased [54]. The xylem development and lignin deposition mode of HCT1-rnai transgenic poplar were changed, and the wall of vessel and wood fibre in the xylem decreased [55]. The relationship between HCT and C3'H also affected the proportion of different lignin monomers [56][57][58][59]. In this study, both C3'H and HCT in ZM were significantly highly expressed at 65 days after flowering, while only some of the expression levels in L7 were consistent with those in ZM.
When there was no lignin at the early stage of growth and development, C3'H (gene21068), HCT (gene11284) and POD72 (gene16314) were highly expressed in both varieties. Whether the high expression of these genes was due to the development of vascular tissue, preparation for subsequent lignin synthesis, or the need for cell proliferation is uncertain. C4H, which catalyses the second step in the phenylpropanoid pathway, showed high expression at the stage of lignin biosynthesis in ZM, while L7 had high expression at 85 DAFB. CAD and COMT had similar expression patterns. The expression of CCR in hard seed pomegranate was significantly higher than that in soft seed [25], but CCR was significantly lower in ZM during the lignin formation period. Several POD showed different expression patterns in different varieties at different periods. These results suggested that difference in endocarp thickness cannot be the result of one specific phenylpropanoid pathway gene. These genes in the phenylpropanoid pathway may be responsible for the differences in endocarp development and lignification in the two cultivars. The monolignol biosynthesis pathway is still being revised as new genes and regulation mechanism continue to be discovered [43].
The synthesis of flavonoids shares the same substrate, p-coumaroyl-CoA, with lignin (Fig.5). There is no doubt that the flavonoid synthesis pathway competes with the lignin pathway, and the coordination of this process may be the key to controlling some important agronomic traits in fruits and nuts. CHS is the first key enzyme in plants that shifts from p-coumaroyl-CoA in the phenylpropanoid metabolic pathway to flavonoids biosynthesis [60,61]. We speculate that the relationship between CHS and HCT may determine the direction of the two major metabolic pathways. The expression of CHS2 increased in the late period in L7 and HCT decreased. CHS2 in ZM did not show the same trend. Perhaps the competition between the flavonoid pathway and the phenylpropanoid pathway does not start when sufficient substrates are accumulated in the metabolic pathway in the beginning. Maybe when the substrate is insufficient, the ratio of CHS to HCT determines the next flow direction. The induction of the lignin pathway is variable and the induction of flavonoids is conservative in Rosaceae, which may also be the case in walnuts. There may be a common transcription factor upstream of these two enzymatic control genes that regulates at the overall level of the plant. This conjecture still requires further experiments to verify. The competition model, however, would be more complicated. These expression data reveal a complex relationship between lignin and flavonoid pathways regulating the development of walnut fruit. This may be partly responsible for the difference in endocarp between the two varieties.
SHP and NST are transcription factors that mediate the development of the lignified endocarp layer (enb) in Arabidopsis. NST1 promotes enb lignification after tissue identity has been established [62]. The expression of SHP and STK decreased before the lignin deposition, and then induced the expression of NST1in peaches [48]. NST1 was found in DEGs, but SHP and STK were not found in our analysis. The expression of NST1 in the lignin generation stage was higher than that in the early stage. Walnuts differ significantly from Arabidopsis and peach. In Arabidopsis thaliana, MYB46 and MYB83 are considered to be key regulators of lignin, while NST1 is the switch that regulates MYB46 and MYB83 [27]. However, in our transcriptome data, both MYB46 and MYB83 were filtered out due to low FPKM values. We screened out MYB4 and MYB7 downstream of MYB46 and MYB83 regulation. AtMYB4, a lignin inhibitor, was downregulated in both the wounded leaves and in systemically nonwounded leaves of the same rosette. Therefore, we speculate that the reduction in AtMYB4 levels upregulate the monolignol pathway, enabling lignin to be produced in damaged tissues as a defense strategy. However MYB4 was highly expressed in the lignin synthesis stage in this study and the expression level was higher in L7, which has a thinner shell. NAC transcription factors can control the formation of secondary walls in woody tissues. The positive feedback regulation network formed by LBD and NAC protein regulates the secondary growth of Arabidopsis by controlling the differentiation of xylem cells [63]. The LBD family has several biological functions such as plant development, secondary growth, pathogen response, biological stress and hormone signaling [64][65][66][67][68]. LBD25 can control the growth of the hypocotyl of Arabidopsis by regulating the morphogenesis of light, resulting in the failure of the hypocotyl of LBD25 mutant under dark culture conditions [69]. In the transcriptome sequencing results, LBD41 was highly expressed in the lignin synthesis stage and the value in F-ZM was significantly higher than in L7. In the KEGG enrichment pathway, hormone signal transduction enriched the most differentially expressed genes, so it can be seen that the initiation and development of endocarp lignification is necessary to mobilize a large number of hormones, and the significant changes in LBD41 may be related to this. LBD25 was highly expressed at 20DAFB, during the growth and development stage of walnut. AtLBD6/AtAS2 downregulates the expression of KNOX, thus participating in leaf morphogenesis [69]. In our results, LBD6 was highly expressed at 45 DAFB, which may be related to the location of lignin deposition. Walnut endocarp showed some ways of transcriptional regulation that were different from those of Arabidopsis regulatory network, and perhaps the function of these genes was also different from the existing known functions; hence the regulation of walnut endocarp still requires further study. The lignification of walnut endocarp is a complex process under the combined action of hormones, enzymes, related genes and regulatory networks.

Conclusions
The results showed that the lignification of the endocarp in both of ZM and L7 walnut started between 30 DAFB and 45 DAFB, the early stage of fruit growth and development. Transcriptome sequencing results and analysis showed that there were differences in endocarp development and lignification between ZM and L7 cultivar. The analysis of sequencing and qPCR identified DEGs involved in phenylpropanoid pathway including C3'H, CCoAOMT, COMT, HCT, CCR, POD, and CAD. The different expression patterns of these key genes in the phenylpropanoid pathway may be responsible for the differences in endocarp phenotypes in the two cultivars. The analysis of transcription factors revealed that the regulation network in endocarp of walnut may be different from that of drupe such as apricot or peach. Meanwhile some transcription factors, such as MYB, NAC and LBD, were also significantly different in the two varieties, which may be the reason of discrepancies in endorcarp development and lignification of ZM and L7 walnut. were used for transcriptome sequencing. ZM had a thicker shell than L7; the shell's average thickness of ZM was 1.30mm, the average shell thickness in L7 was 0.97mm. Walnut endocarp tissue was mashed and mixed in liquid N then stored at −80°C for follow-up experiments. Three biological replicates of walnut endocarp were collected in each treatment and some samples were collected for slice observation.

Specimen preparation for light microscopy
The endocarp was cut into cubes of approximately 5 mm 3 and fixed with FAA. The tissues were dehydrated with 70%, 85%, 95% and 100% ethanol at room temperature (1 hour for each step). Gradient vitrification was performed from 100% ethanol to 100% xylene; then the samples were permeated and embedded in paraffin. Sections of 8 μm each were then made using a microtome (SLEE-CUT4062) and subsequently mounted on adhesive microscope slides that were gradually dewaxed to expose the pure tissue. The sections were stained with 1% phloroglucinol-HCl and then were quickly observed under a microscope. Phloroglucinol-HCl staining is the fastest and simplest method to detect the presence and content of lignin. Under acidic conditions, lignin is dyed pink by phloroglucinol, and the higher the lignin content, the darker the colour is.

Samples in this part
The endocarp of ZM and L7 samples at 20 DAFB (T) and 45 DAFB (F) were used for transcriptome sequencing. Samples were named T-ZM, F-ZM, T-L7 and F-L7 (shown in Table 4).

RNA isolation and qualification
Total RNA was extracted from the samples using RNAprep Pure Plant Plus Kit (Tiangen Biotech, Beijing, China). RNA degradation and contamination were monitored on 1% agarose gels and RNA purity and concentration were measured using the BIOSPEC-NANO ® (230V) spectrophotometer (SHIMADZU). Three biological replicates were used for each sample. After total RNA was extracted, mRNA was enriched using Oligo (dT) beads.

cDNA synthesis and amplification
The enriched mRNA was fragmented into short fragments using fragmentation buffer and reverse transcribed into first strand of cDNA with random primers using cDNA first strand synthesis kit (Takara, Dalian, China). Second-strand cDNA were synthesized by DNA polymerase I, RNase H, dNTP and buffer. Then the cDNA fragments were purified with QiaQuick PCR Purification kit (Qiagen, Shanghai, China), end repaired, poly (A) added, and ligated to Illumina sequencing adapters. The ligation products were size selected by agarose gel electrophoresis, PCR amplified, and sequenced using Illumina HiSeq 2500 by Gene Denovo Biotechnology Co. (Guangzhou, China).

Sequencing data analysis
Reads obtained from the sequencing machines included raw reads containing adapters or low-quality bases which should be filtered out to get high quality clean reads. The rRNA removed reads of each sample were then mapped to the reference genome by TopHat2 (version 2.0.3.12). The reconstruction of the transcripts was performed using Cufflinks software, which together with TopHat2, allows biologists to identify new genes and new splice variants of known genes. To identify the new gene transcripts, all of the reconstructed transcripts were aligned to the reference genome and were divided into twelve categories by using Cuff compare. Gene abundances were quantified by RSEM software. The gene expression level was normalized using the FPKM method. Next, the relationship analysis of the samples was performed using correlation analysis and principal component analysis (PCA). PCA was performed using R package models (http://www.r-project.org/). To identify differentially expressed genes across samples or groups, the edge R package (http://www.r-project.org/ ) was used. We identified genes with a fold change ≥2 and a false discovery rate (FDR) <0.05 in a comparison as significant DEGs. DEGs were then subjected to enrichment analysis of GO functions (http://www.geneontology.org/) and KEGG pathways (https://www.kegg.jp/). Transcription factor analysis was based on the HMM model HMMER v3.1 TFs predict b2 program (http://hmmer.org/), while the configuration of the database for plant transcription factor was PlantTFDB (http://planttfdb.cbi.pku.edu.cn/).

Gene expression analysis by quantitative RT-PCR
Eighteen genes were chosen from the differentially expressed genes (DEGs) that control the key enzymes in the phenylpropanoid biosynthesis pathway, which can catalyse phenylalanine turn into lignin. Total RNA was separately extracted from T-ZM, F-ZM, T-L7 and F-L7. Gene-specific primers were designed by Primer Premier 5.0. The first strand of cDNA was synthesized with reverse transcription following the TransScript One-

Ethics approval and consent to participate
No specific permit was required for the present study in P. R. China.

Consent for publication
Not applicable.

Availability of data and materials
The raw RNA-seq data of the samples is uploading into the NCBI with accession number SUB7208520.

Competing interest
We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.

Funding
This work was supported by the Natural Science Foundation of Hebei Province (C2019204270), the Science and Technology Research Project of the Universities of Hebei Province (QN2019159), and Research Foundations for Returned Scholars from Overseas of the Human Resources Dept. of Hebei Province (C20190345). The funders played no role in the design or operation of the study or in the analysis of data or in writing the manuscript.

Author' contributions
XTW conceived and designed the experiments. XTW and YBQ collected the material. XTW and ZCZ performed the experiments and analyzed the data. XTW wrote the manuscript. MTS XHA and SGZ participated in the revision of manuscripts and gave suggestions. All authors read and approved the manuscript.    The circle size represents the number of different genes. The more DEGs, the larger the circle. Richfactor refers to the ratio of the number of genes in the pathway entry in DEGs to the total number of genes in the pathway entry in all genes. The larger the rich factor, the higher the degree of enrichment. Qvalue is a pvalue corrected by multiple hypothesis tests. The value ranges from 0 to 1. The closer it approaches zero, the more significant the enrichment is. The graph is plotted using pathways in the top 20 rankings of qvalue from small to large.  qPCR analyses of selected candidate genes involved in this study. Blue bars indicate ZM walnut; red bars indicate L7 walnut. Numbers under the x-axis indicate the days after full bloom.

Supplementary Files
This is a list of supplementary files associated with this preprint. Click to download. AdditionalFiles.zip