Identication of Potential Shade Response Regulators in Endangered Species Magnolia Sinostellata

Magnolia sinostellata is one of the endangered species in China and largely grows under canopy shade. However, the shade response molecular mechanism remains unclear. To explore potential regulators in the mechanism, weighted gene co-expression network analysis (WGCNA) was performed to analysis the trancriptome data of M. sinostellata leaves subjected to shading with different time. Gene co-expression analysis illustrated that lightsteelblue1, paleturquoise, darkolivegreen modules are closely associated with shade treatment. Gene ontology and KEGG analyses showed that genes in lightsteelblue1 module mostly participated in amino and nucleotide metabolism, genes in paleturquoise module mostly involved in carbon xation and genes in darkolivegreen module mainly participated in photosynthesis related pathways. Through Cytoscape3.8.2, we identied 6, 7 and 8 hub genes in lightsteelblue1 module, paleturquoise module and darkolivegreen module, respectively. This study found that shading impacted photosynthesis, carbon assimilation and owering of M. sinostellata. In addition, key shade response regulators identied in this study have laid a rm foundation for further investigation of shade response molecular mechanism and protection of shade sensitive plants.


Introduction
Records on Magnoliaceae plants can be traced back to the Mesozoic era. Until now, many Magnoliaceae plants have been favored by people for their important ornamental characteristics. However, with the changes of the climate, forest community composition, and the succession of forests, deciduous Magnoliaceae species are facing endangerment in the wild 1 . The growth and distribution of endangered species of Magnoliaceae are mostly con ned to narrow areas. Magnoliaceae plants was mainly adapted to coniferous and broad-leaved mixed communities, while declined to evergreen broad-leaved communities [2][3][4] . Magnolia sinostellata is an endangered species belonging to family Magnoliaceae. It is a deciduous shrub which can grow to 3 m in height and blossom in early spring (February and March) in subtropical regions 5 . M. sinostellata is endemic to a narrow area of China (mainly distributed in Jingning, Wenzhou counties, Zhejiang Province), with extremely small population. Pre-investigations indicated that M. sinostellata mainly distributed in coniferous forests, sparsely distributed in broad-leaved forests and mixed forests 4 . The upper tree layer of forest community determines the light intensity in the community.
The light intensity of coniferous communities is higher than that of evergreen broad-leaved communities 6 .
Canopy shade is the major factor that affect plant growth and development in understories in natural environment [7][8][9][10] . The primary effect of shading on plants is weakened photosynthesis e ciency 11 .
Under shady conditions, light captured by light harvesting complex (LHC) is limited in plants 12 . Electron transport rate through PS I (ETR I) is signi cantly reduced in Rice due to the defect in chloroplast 13 .
Moreover, in Zea mays, photochemical quenching coe cient(qp) and effective quantum yield of PSII photochemistry (ΦPSII) were signi cantly reduced under shade, which indicating that the photosynthesis ability weakened 14 . In photosynthetic plants, Ribulose 1,5-bisphosphate carboxylase/oxygenase (Rubisco) is the key enzyme in Calvin cycle, which transforms CO 2 into carbohydrates for further metabolism 15 . Rubisco activity is repressed and the content of Rubisco enzyme is decreased under shade in plants 16 , showing that the xed carbon for further carbon assimilation in plants was reduced. In addition, the activity of sucrose phosphate synthase and sucrose synthase are limited in wheat, indicating that its starch and sucrose metabolism was inhibited 17 . With the increase of shading degree, the carbon assimilation and metabolism capacities of seedlings decreases 18 , water usage and transpiration are also affected 19 , which inhibit plant growth and community regeneration. At the same time, with the growth of seedling age and the development of plant reproduction, the demand for light of light-needing species will also increase. The lack of light intensity will inevitably reduce the number of owers and affect the fruiting of plants. Shading dramatically reduced ower quality, alkaloid yield and seed number of Papaver somniferum 20 . In Paeonia lacti ora, shading led to delayed owering date, reduced ower fresh weight and faded ower color 9 .
Transcriptome data analysis is a sharp tool to analyze and reveal underlying molecular mechanism in response to various abiotic stress in plants [21][22][23] . In recent studies, various shade-responsive genes involved in photosynthesis, plant hormone signal transduction, circadian rhythm and metabolic pathways have been identi ed through RNA-seq [24][25][26] . However, the shade response molecular mechanism in plants remains unclear. More importantly, the key shade response regulatory genes in M. sinostellata and other shade sensitive plants remains unknown. Weighted gene co-expression network analysis (WGCNA) is an effective tool to identify co-expressed modules and hub genes 27 , which has been applied in various plants including Camellia sinensis 28 , hot peppers 29 , rice 30 and wheat 27 to explore various abiotic stresses response molecular mechanisms. It is a signi cant way to understand gene function and association from the trancriptome-wide [31][32][33] . However, there is no previous report of using WGCNA to identify shade response hub genes in M. sinostellata.
In this study, transcriptomes of shade treated and untreated M. sinostellata leaves were rstly analyzed to identify core differential expression genes (DEGs) under shade. Further, WGCNA was applied to detect key modules involved in shade response. Finally, hub genes in response to shade were pinpointed, which can be breakthrough points to understand shade response molecular networks in M. sinostellata. Our results provide a theoretical basis for further understanding of shade response molecular system in plants and provide essential information for protection of shade sensitive species.

Results
Identi cation of the shade-responsive core DEGs in M.sinostellata A total of 181,902 genes were detected in this transcriptome sequencing, and the average comparison rate with the compared gene set was 80.49%. 246481 non-redundant transcript sequences were obtained.
The total length of these sequences is 270,112,156, the longest sequence is 20445bp, the shortest sequence is 199bp, and the average sequence length is 3420.95bp. In this study, we analyzed the global gene expression pro les of M. sinostellata for shade response using ve datasets, including CK-D0, CK-D5, ST-D5, CK-D15 and ST-D15, the details of which are shown in Table S2. Each dataset contains 3 replicates. Correlation analysis was performed among these 15 samples, which showed good reproductivity in the same group and a signi cance difference between control groups and shade treated groups ( Figure S1). This analysis suggests that these samples can be used for subsequent screening and analysis.
From a total of 181,902 expressed genes, 4734 core DEGs were identi ed using cross-compared venn diagrams including CK-D0 vs ST-D5, CK-D0 vs ST-D15, CK-D5 vs ST-D5 and CK-D15 vs ST-D15 (Log2FC > 1, Qvalue < 0.05) (Fig. 1). Then, we characterized these 4734 core DEGs to get more insight into their contributing molecular pathways. First, 4734 DEGs were subjected to GO analysis, which were classi ed into three groups and 33 subgroups ( Fig. 2A). The biological process group can be divided into 18 subgroups, among which cellular process and metabolic process were the top two sub-groups involved the most genes. Four subgroups were related to cellular component, among which cellular anatomical entity and intracellular were the main subgroups involved most genes. Eleven subgroups constitute the molecular function group, and the catalytic activity and binding involved the most genes. GO enrichment analysis showed that most signi cant GO terms are related to photosynthesis. The top ve enriched GO terms were (GO:0009522); photosynthesis, light harvesting (GO:0009765); photosystem (GO:0009521); photosynthesis (GO:0015979) and photosystem II (GO:0009523) (Fig. 2B). Then, we performed KEGG analysis to explore contributing pathways of 4734 DEGs (Fig. 3A). For KEGG classi cation, genes were annotated into ve groups and 19 subgroups. Among metabolism group, global and overview maps, carbohydrate metabolism and energy metabolism subgroups were signi cantly enriched. Photosynthesisantenna proteins was the most signi cantly enriched KEGG pathway. The top ve frequently enriched pathways were as follows: photosynthesis-antenna proteins, galactose metabolism, phenylpropanoid biosynthesis, starch and sucrose metabolism and alanine, aspartate and glutamate metabolism (Fig. 3B).
WGCNA of core shade responsive DEGs in M. sinostellata To identify key shade responsive genes in M. sinostellata, WGCNA was performed to analysis 4734 core DEGs, which can identify modules of highly related genes. The selection of an optimal soft threshold power is an essential step to construct WGCNA. A network topology research of 1-20 was executed, and the scale independence and the average connectivity of the WGCNA relative equilibrium were determined.
Threshold 18 was selected to construct a hierarchical clustering tree of DEGs (Fig. 4A). MEDiss Thres was set to 0.22 to merge similar modules and 4 modules was generated (Fig. 4B). Genes in grey module that cannot be assigned to any modules were not analyzed in the further study. Four different modules were generated by WGCNA including lightsteelblue1, paleturquoise, darkolivegreen and violet. The darkolivegreen module, lightsteelblues1 module, paleturquoise module and violet module including 2481, 2008, 75 and 71 DEGs, respectively (Table 1). To identify co-expression similarity of modules, characteristic genes were calculated and clustered according to their correlation (Fig. 4C). We found that these 4 modules are divided into two categories: the rst included lightsteelblue1 and paleturquoise modules; the second included darkolivegreen and violet modules. Gene modules of the same category may have similar functions or contributing to the same regulatory mechanism. To investigate modules associated with shade treatment, we plotted module-trait relationship heat map (Fig. 4D). This result shows that lightsteelblue1 module positively correlated with two shade-treated groups (ST-D5, r = 0.7; ST-D15, r = 0.49) but negatively correlated with control groups (CK-D0, r=-0.40; CK-D5, r=-0.38; CK-D15, r=-0.41). Similarly, paleturquoise module had a strong correlation with ST-D15 (r = 0.68), while had a negative correlation with CK-D0 (r=-0.26), CK-D5(r=-0.25) and CK-D15 (r=-0.25). In contrast, darkolivegreen showed a positive correlation with control groups (CK-D0, r = 0.38; CK-D5, r = 0.35; CK-D15, r = 0.49) but illustrated a signi cant negative correlation with treated groups (cor<-0.5, p < 0.05). However, genes in violet module showed no signi cant correlation with treated groups or control groups. These results suggest that DEGs in lightsteelblue1 and paleturquoise modules mainly up-regulated and DEGs in darkolivegreen module down-regulated under shade treatment. These results showed that lightsteelblue1, paleturquoise and darkolivegreen modules were signi cantly correlated with shade treatment. To obtain further understand of the expression pattern of these three modules, heat maps of gene expression for these modules were generated along with eigengene expression values (Fig. 5). We observed that lightsteelblue1 module genes were signi cantly responsive to shade treatment and mainly up regulated. In paleturquoise module, genes were slightly induced in ST-D5 while markedly up-regulated in ST-D15. In contrast, darkolivegreen genes mainly downregulated under shade treatment. The functions of genes under shade treatment in speci c modules and its contributing regulatory pathways were revealed by GO and KEGG analysis. GO analysis suggested that 'N-acetyltransferase activity' and 'acetyltransferase activity' as the most signi cantly enriched GO terms in lightsteelblue1 module, which means genes in this module mainly response to shade by regulating acetyltransferase activity (Fig. 6A). KEGG analysis in lightsteelblue1 module identi ed 'Valine, leucine and isoleucine degradation' and 'Amino sugar and nucleotide sugar metabolism' as the most enriched regulatory pathways (Fig. 6B). Concerning paleturquoise module, GO analysis displayed that 'hydrolase activity' and 'hydrolyzing O-glycosyl compound' were the top two enriched GO terms indicating that these genes regulating hydrolase-related metabolism (Fig. 6C). KEGG analysis demonstrated that 'Starch and sucrose metabolism' pathway was appreciably in uenced by shade (Fig. 6D). These results matched GO and KEGG analysis conclusion of 4734 core DEGs, indicating that the results are reliable. In darkolivegreen module, GO analysis indicated that enriched GO terms of cellular component, biological process and molecular function were mainly related to photosynthesis, among which the top ve GO terms were 'photosystem I', 'photosynthesis, light harvesting', 'photosystem', 'photosynthesis' and 'thylakoid' (Fig. 6E). KEGG analysis showed that 'Photosynthesis-antenna proteins' and 'Photosynthesis' were the most enriched pathways (Fig. 6F). These indicate that genes in darkolivegreen module mainly involved in photosynthesis.
Hub genes identi cation in key Co-Expressed modules in M. sinostellata under shade To identify hub genes in these three modules, genes with a weight parameter over 0.4 were analyzed and visualized through Cytoscape 3.8.2 to construct interaction networks (Fig. 7). We identi ed 6, 7 and 8 hub genes in lightsteelblue1, paleturquoise and darkolivegreen modules respectively via integrated analysis results of MCODE, cytoHubba and Centiscape2.2 in Cytoscape3.8.2 (   (Table S3).

Veri cation of hub genes in M. sinostellata under shade by RT-qPCR
We executed a quantitative reverse-transcription PCR (RT-qPCR) analysis to verify the expression level of 21 hub genes under shade treatment after 0d, 5d and 15d in M. sinostellata. These results suggested that the expression level of all the hub genes were signi cantly changed during shade treatment, which indicating that these genes were all shade-responsive (Fig. 8)  to have essential roles in stress response 43,44 . MsUGT73C7 (isoform_12760) and MsUGT91C1 (isoform_121918) belongs to UDP-glucose: glycosyltransferase (UGT) family, which acatalyzed the steps from derhanmosyl acteoside to acteoside and from hydroxytyrosol glucoside to decaffeoylacteoside 45 .
MsUGT73C7 and MsUGT91C1 reduced the expression of superoxide dismutase synthesis genes (isoform_15962) to weaken the stress resistance ability of M. sinostellata.
In general, under shading conditions, the photosynthesis and stress resistance ability of M. sinostellata were weakened, leading to diminished growth potential. Due to the weakening of photosynthesis, M. sinostellata can only support its life activities by consuming the carbohydrate in the plant. The oweringrelated genes and pathways of M. sinostellata were enhanced, which might lead to earlier ower bud differentiation and owering. In natural environment, the early owering of plants can lead to excessive consumption of nutrients and affect its population renewal in the next year. The hub genes identi ed in this research have set a solid foundation for further research on the mechanism of shade response in plants. µmol·m − 2 ·s − 1 ) and bamboo poles. The control group seedlings were unsheltered (100% full light, luminance 1400 ± 30 µmol·m − 2 ·s − 1 ), and the other conditions remained unchanged. There were no leaves or stems overlapping between plants, each treatment had 3 replicates. Leaves were collected from the plants in both the treated and control groups after shade treatment for 0d, 5d and 15d, among which the samples of 0d were evenly pooled samples of control and treated groups, then immediately frozen in liquid nitrogen and stored at -80°C for further experiments and RNA-seq analyses. At sampling, collections were performed from 3 plants, and each sample collection was repeated 3 times for biological replicates.

Methods
Transcriptome sequencing and data analysis To identify shade responsive genes, RNA sequencing was performed to analyze transcriptome gene expression pro les in 15 samples. These 15 samples were divided into ve groups, including three control groups (CK-D0, CK-D5 and CK-D15) and two shade-treated groups (ST-D5 and ST-D15). Total RNA of 15 samples were proceed by mRNA enrichment method or rRNA removal method. The puri ed RNA was fragmented with the interrupted buffer and reversed with random N6 primer, and then synthesized into cDNA two-strand to form double-stranded DNA. The ends of synthetic double-stranded DNA are lled in and the 5'end is phosphorylated. The 3'end forms a sticky end with an 'A' protruding, and then a bubbly linker with a protruding 'T' on the 3'end is connected. The ligation product is ampli ed by PCR with speci c primers. The PCR product is heat-denatured into single-stranded, and then the single-stranded DNA is circularized with a bridge primer to obtain a single-stranded circular DNA library. DNBSEQ platform was employed to sequence the libraries. The R package (edgeR v3.16) was employed to identify the differentially expressed genes (DEGs) between shade-treated and control samples. Genes ful lled stringent criteria were identi ed as DEGs (fold-change > 2 and q value < 0.05, with false discovery rate (FDR) less than 0.05). The function and involved pathways of the DEGs were classi ed according to the GO and KEGG annotation results and o cial classi cation. Phyper function in R software was used for enrichment analysis.
Weighted gene co-expression network construction and hub genes detection The weighted gene co-expression network was constructed using WGCNA package in R software to further analysis gene functions and contributing pathways in response to shade in M. sinotellata. The 4734 core DEGs of 15 samples were used to construct this network. The similarity matrix was calculated by identifying the Pearson correlation coe cient between all gene pairs. The correlation matrix was transformed by soft-thresholding process to mimic the scale-free topology. The adjacency matrix was converted into a topological overlap matrix (TOM), and all coding sequences were hierarchically clustered by TOM similarity algorithm. The co-expression gene modules of the gene dendrogram were detected by the dynamic tree cut method, which using a height-cut less than 0. 22 sinostellata EF1-α was employed as the reference gene 5 . DNA primers used are listed in Table S1. Each sample testing was repeated at three times.

Declarations
Author contribution D.L. contributed to writing, statistical analysis and visualization of this article. Z.L., Y.W., S.Z., Y.S. put forward and carried out this study, and Y.S. was the corresponding author. Q.Y., Z.L., M.R. and C.W. investigated and collected data.     Qvalue were selected to plot these charts.