Genome-scale Transcriptomic Insights Into the Gene-regulatory Network of Seed Abortion in Triploid Siraitia Grosvenorii

Rongchang Wei Guangxi Academy of Agricultural Sciences Dongping Tu Guangxi University of Chinese Medicine Xiyang Huang Guangxi Institute of Botany, Guangxi Zhuang Autonomous Region and Chinese Academy of Sciences Zuliang Luo Chinese academy of Medical Science & Peking Union Medical College Xiaohua Huang Guangxi Academy of Agricultural Sciences Nan Cui Guangxi Institute of Botany, Guangxi Zhuang Autonomous Region and Chinese Academy of Sciences Juan Xu Cash Crops Research Institute, Guangxi Academy of Agricultural Sciences Faqian Xiong (  xfq2002@126.com ) Guangxi Academy of Agricultural Sciences Haifeng Yan Guangxi Academy of Agricultural Sciences Xiaojun Ma Chinese academy of Medical Science & Peking Union Medical College


Background
Siraitia grosvenorii (Swingle) C. Jeffrey is a species of the genus Siraitia Merr. (Cucurbitaceae). It is native to the southern parts of China and is mainly found in Guangxi Province [1]. S. grosvenorii is also known as Luohanguo (LHG) or monk fruit. LHG is not only used as a food ingredient, but also as in traditional Chinese medicine. In China, LHG has been used as a natural cough suppressant and expectorant for over 300 years and is one of the rst approved medicinal food homology species. It is also highly favored internationally [1,2].
Previous studies have revealed that several different classes of compounds were isolated from S. grosvenorii, including triterpenoids, iridoid, avonoids glycosides, vitamins, proteins, saccharides, and a volatile oil [3][4][5]. Because LHG contains sweet glycosides that are naturally low in calories, the extracts from ripe LHG can be used as supplements and sweeteners in sugar-free health foods and beverages [6,7]. In addition, this plant is currently used as an analgesic for the lungs and an emollient for the treatment of sore throat, thirst, and constipation [8]. The crude extracts and puri ed products of S. grosvenorii have several biological activities, including immunologic, antioxidant, anti-tussive, sputumreducing, hypoglycemic, hepatoprotective, and antimicrobial [9][10][11][12][13]. The triterpene compound, mogrosides, is considered the main active ingredient, contributing to the sweet taste and the main biologically active ingredient of S. grosvenorii [14]. With the development of science and technology, the application of S. grosvenorii in the eld of medicine can be further explored. Therefore, S. grosvenorii is a natural product with high development potential for medicine and commodity production, and it is attracting increasing attention from scienti c research community and commercial establishments [3,15].
In the past 10 years, chromosome engineering has been a hot topic in research on the germplasm resources of S. grosvenorii, and several polyploid S. grosvenorii resources have been obtained. Triploid S. grosvenorii is an excellent polyploid germplasm resource. The tetraploid female plant obtained by colchicine induction and the diploid male parent were hybridized, and triploid seeds were obtained after the fruit ripened. Compared with the diploid S. grosvenorii, the triploid plants had larger vegetative and reproductive organs. Further, it had seedless fruits or fruits with only few seeds. The content of mogroside V in triploid fruit was 36.28% higher than that in the control. In addition, triploid plants were found exhibiting strong resistance [16,17].
Seed development can be divided into three major phases: embryogenesis, seed maturation, and desiccation. Embryogenic processes include cell division and expansion, morphogenesis, and the beginning of the endosperm and embryo development. A series of morphological, physiological, and biochemical changes are involved in seed development [18][19][20]. The production of viable seeds is thought to require the co-regulation of multiple plant endogenous hormones. Abscisic acid (ABA) and gibberellin (GA) are the main hormones that regulate seed formation. The process of seed dormancy interruption is determined by the distribution of GA and ABA. In addition, ABA regulates seed maturation, embryo morphogenesis, and desiccation. Auxin is an important molecule that regulates seed development in conjunction with ABA [21][22][23][24]. Trans-zeatin (ZR) is a cytokinin that exists in seeds that can induce cell division and proliferation [25]. We speculate that seed abortion may be regulated by uctuations in the abovementioned four hormones; however, related evidence is absent.
At present, there are few reports on breeding new species of S. grosvenorii using genetic-engineering technology. Thus, studying the molecular mechanism of seed abortion and identifying the genes and pathways related to seed abortion in S. grosvenorii will provide a theoretical basis for genetic engineering to breed new seedless S. grosvenorii species. Here, we analyzed the transcriptome sequencing data of the triploid seeds of LHG at different developmental stages and combined this with the endogenous hormonal changes of seed development to reveal the gene-regulatory network involved in the underlying mechanisms for formation of triploid LHG abortive seeds.

Results
The pro le of seed abortion in S. grosvenorii The structural changes in the tissues at each stage were observed using TEM. There was no obvious edema in the cytoplasm and no obvious separation of the plasma wall at stage 5DAF (Figure 1a and b). The nucleus was irregular, the mitochondria were oval, the crest and rough endoplasmic reticulum were moderately dilated, the surface ribosome was locally exfoliated, and the number of intracellular starch granules was abundant. Severe edema of the cytoplasm and slight separation of the plasma wall were observed at stages 10DAF (Figure 1c and d) and 15DAF (Figure 1e and f), respectively. Severe separation of the plasma wall, local damage to the cell membrane, and thickness of the cell wall were observed at 20DAF (Figure 1g and h) and gradually recovery was observed from stages 25DAF (Figure 1i and j) to 30DAF (Figure 1k and l).
We also examined the levels of endogenous plant hormones, ABA, ZR, IAA, and GA3, during seed abortion using ELISA (Figure 2a-d). Interestingly, the levels of ABA and ZR decreased remarkably at stages 15DAF and 20DAF, respectively. The above changes in submicroscopic structure and levels of endogenous plant hormones indicated that 15-20DAF was a key transition period during seed abortion. We will mainly focus on the exploration of the transcriptomic changes at stages 15DAF and 20DAF in the subsequent analyses.
Global analysis of the RNA-Seq data Genes with signi cant differential expression (|log2FC|>1 and FDR-adjusted p-value < 0.05) were identi ed between consecutive stages. The number of DEGs in the ve comparisons is shown in Figure  3a  The DEGs were further functionally classi ed into GO slims (Tables S1-S5; p < 0.05). DEGs in 20DAF vs. 15DAF were identi ed to be involved in meristem structural organization, morphogenesis of a branching structure, adaxial/abaxial axis speci cation, positive regulation of catalytic activity, oxidoreductase activity, and glucosidase activity (Table S3), whereas those in 25DAF vs. 20DAF were involved in trehalose metabolic process, seed germination, response to sucrose, transmembrane receptor histidine kinase activity, and so forth. (Table S4).
Seven common DEGs shared by all comparisons of adjacent stages were identi ed (Figure 3b), which revealed uctuations in expression levels across all stages (Figure 3c). These DEGs were found to be associated with toxin metabolic processes, regulation of gene expression, glucan metabolic process, and external encapsulating structure organization.
Identi cation of temporal expression trends across S. grosvenorii seed transcriptomes We performed STEM analysis to visualize the expression patterns of mRNAs. As a result, seven expression pro les were found to be statistically signi cant (Figure 4a). Pro les 0 and 19 tended to be continuously downregulated and upregulated during seed abortion, respectively, whereas pro les 1 and 18 reached an expression peak and valley at stages 15DAF and 20DAF, respectively.
KEGG pathway enrichment analysis was performed for the four expression trends (19, 0, 1, and 18). The genes in pro le 0 with a continuously downregulated trend were mainly enriched in starch and sucrose metabolism, plant hormone signal transduction, linoleic acid metabolism, replication and repair, and other glycan degradation (Figure 4b), whereas those in pro le 19, which were continuously upregulated, were related to amino acid metabolism; phenylpropanoid biosynthesis; ubiquinone and other terpenoidquinone biosynthesis; avonoid biosynthesis; stilbenoid, diarylheptanoid, and gingerol biosynthesis; diterpenoid biosynthesis; glycolysis/gluconeogenesis; and plant-pathogen interactions (Figure 4c). The genes in pro le 1 with the lowest expression level at 15DAF and 20DAF were involved in RNA polymerase, butanoate metabolism, inositol phosphate metabolism, and fatty acid degradation ( Figure 4d). Finally, the genes in pro le 18 with the highest expression levels at stages 15DAF and 20DAF were involved in arachidonic acid metabolism, amino sugar and nucleotide sugar metabolism, one carbon pool by folate, cyanoamino acid metabolism, glutathione metabolism, isoquinoline alkaloid biosynthesis, phagosome, glycerolipid metabolism, and fructose and mannose metabolism ( Figure 4e).

Co-expression network associated with seed abortion
A gene co-expression network was used to cluster 21,152 genes after ltering those with low expression levels, resulting in 16 modules with different colors (Figure 5a). Combined with the module-trait relationship and module signi cance, we eventually identi ed the brown2 module as the most phenotypically relevant module, which showed a negative correlation with the level of ABA (the correlation coe cient was -0.71 and the FDR-adjusted p-value was 9e-04), whereas the lightsteelblue module showed a positive correlation (Figure 5b). The indianred4 and skyblue3 modules were also found to be negatively and positively associated with the level of ZR, respectively, with statistical signi cance ( Figure 5b).
We explored the expression pro les of each module across the different stages ( Figure 6a). Interestingly, we found that the genes in the brown2 module were speci cally, highly expressed at 15DAF and 20DAF ( Figure 6a). By combining the above results, we considered the brown2 module to be negatively associated with the changes in ABA levels at 15DAF. According to the results of GO enrichment analysis, the genes in the brown2 module were enriched in monoterpene metabolic processes and terpene metabolic processes ( Figure 6b). Additionally, KEGG pathway analysis revealed that the genes were involved in sesquiterpenoid and triterpenoid biosynthesis and amino sugar and nucleotide sugar metabolism ( Figure 6c). These ndings indicate that these genes, which are related to terpene metabolism, are highly expressed at stage 15DAF and may play an important role in the decrease of ABA levels, thereby leading to seed abortion.

Discussion
The structural changes in the aborted seeds of S. grosvenorii at different phases observed using TEM revealed that the developmental abnormality began on 10DAF and gradually deteriorated ( Figure 1). Furthermore, the defects were found to be serious at 20DAF. The results of the endogenous plant hormone determination of these sterile seeds were also consistent with those of TEM ( Figure 2). ABA is considered one of the main hormones that regulate seed formation [26], and ZR can induce cell division and proliferation [25]. Both were signi cantly decreased at stages 15DAF and 20DAF. These ndings indicate that 15-20 DAF is the critical period during seed abortion.
Transcriptome data at different developmental stages provide a large amount of data for the dynamic formation of triploid S. grosvenorii seeds. The results showed that the number of genes with signi cant changes in expression was highest during the early stages of seed formation. However, relatively fewer genes changed signi cantly before and after 20DAF (Figure 3a). Although few genes were differentially expressed at 20DAF vs. 15DAF, these genes were most likely associated with seed abortion as [15][16][17][18][19][20] DAF was considered to be the critical period during seed abortion. During the embryonic development of a seed, a complete tissue structure must be established, and the cells of the embryo need to become speci c and must differentiate into cell types in an integrated manner. This process establishes a simple body structure. This structure is composed of shoot meristem, cotyledons, hypocotyl, root, and root meristem arranged along the apical-basal axis and a concentric arrangement of epidermis, subepidermal basal tissue, and central vascular cylinder arranged concentrically along the radial axis [27]. The basic body of a plant is formed during embryogenesis, and formation of most organs and tissues occurs in the late stage of the embryo [18]. As early as the eight-cell division stage in Arabidopsis, the regions that generate the shoot meristem and root meristem begin to differentiate. Meristem is vital in postembryonic organ formation; moreover, the shoot apical meristem can be subdivided into regions with different properties and functions [27]. In addition to meristem structural organization, morphogenesis of a branching structure and adaxial/abaxial axis speci cation are also important for the formation of the complete tissue structure. Both ABA and ZR regulate embryo morphogenesis. Hence, seed abortion may be caused by abnormal embryonic morphology induced by uctuations in ABA and ZR levels.
The continuously downregulated genes involved in starch and sucrose metabolism may lead to seed abortion. Sucrose is not only a nutrient, but also a signal that induces storage-related differentiation. It can induce premature mitotic termination and cell enlargement, activate the expression of storage-related genes in maize and Vicia, and promote storage activity in cotyledons at the transcriptional level and starch accumulation [28][29][30][31][32][33]. Under drought stress, the transcription and activity of invertase, which is important for controlling the ratio of sucrose to hexose, are reported to be reduced, thereby interfering with the sucrose use capacity and the ratio of sucrose to hexose, which disrupt maternal tissue development and eventually lead to seed abortion [34]. Sucrose and ABA also have similar effects on gene expression. Sucrose may increase ABA levels or enhance ABA sensitivity [35]. However, this hypothesis has not been proven. Here, the variation trend of plant hormone signal transduction is similar to that of sucrose metabolism, which also re ects the potential mutual assistance between endogenous hormones and sucrose. Notably, many genes that are continuously upregulated during all developmental phases are involved in the biosynthesis of various secondary metabolites. Some classes of secondary metabolites, including phenylpropanoids, triterpenoid glycosides, and avonoids, have been isolated from S. grosvenorii in previous studies [3]. However, upregulated genes associated with secondary metabolite biosynthesis were not involved in the synthesis of mogrosides or other triterpenoid glycosides. Although mogrosides are responsible for the main biological effects of S. grosvenorii fruits, the biological effects of the other chemical components were still present. For example, the avonoid fraction of S. grosvenorii leaves showed antioxidative effects in rats [36]. These data can serve as a reference for the directional breeding of S. grosvenorii with important medicinal value. In addition, some genes related to plantpathogen interactions were continuously upregulated. This may explain the previously reported strong resistance of triploid S. grosvenorii [16] and provide a theoretical basis for breeding S. grosvenorii cultivars with high resistance.
Some genes with the lowest or highest expression levels at 15DAF and 20DAF are involved in the biological processes that are regulated by ABA. These biological processes include inositol phosphate metabolism, fatty acid degradation, and arachidonic acid metabolism ( Figure 4). The mRNA levels of Dmyo-inositol-3-phosphate synthase and inositol bisphosphate were induced by ABA in Spirodela polyrhiza [37]. In Chlorella vulgaris, changes in the regulation of fatty acid biosynthesis are triggered by exogenous ABA [38]. In Mortierella alpina, ABA stimulates arachidonic acid biosynthesis [13]. In addition, ABA is considered to trigger a wide range of transcriptome changes that help plants respond to environmental stimuli [39]. RNA polymerase is essential for gene transcription. Although the regulation of RNA polymerase by ABA has not been reported, we suspect that these RNA polymerase-related genes, which are expressed at extremely low levels in the critical phases of triploid seed abortion, may be in uenced by the endogenous hormone, ABA, to regulate transcription and lead to seed abortion. Furthermore, we speculated that these biological processes, which have been reported or may be regulated by ABA, may lead to abnormal seed development and sterility due to the effects of decreased ABA in these triploid plants.
Further analysis of the co-expression networks provided new insights into the regulatory mechanisms of ABA genes. The genes in the brown2 module, which were negatively associated with the changes in ABA levels, were enriched in monoterpene metabolic processes and terpene metabolic processes ( Figure 6). Both monoterpenes and terpenes contain a series of terpenoids. In higher plants, terpenoids are substrates for several compounds, including ABA, cytokinins, and the phytol side chain of chlorophyll. In addition, ABA and the phytol side chain of chlorophyll, as well as many secondary metabolites, including monoterpene and terpene, are downstream products of isopentenyl diphosphate [40][41][42]. CAB40 in brown2 module encodes the chlorophyll a-b binding protein of LHCII type 1. The expression level of CAB40 was high and negatively associated with the uctuation of ABA level at stage 15DAF, suggesting that terpenoids ux toward synthesis of the phytol side chain of chlorophyll. These results suggest that at 15DAF, the enhancement of the metabolic processes of monoterpene and terpene may lead to seed abortion by reducing substrate ow to ABA and cytokinins.
Notably, a variety of genes in the brown2 module were associated with plant resistance. CHIT3-2 encodes an acidic chitinase. Acidic chitinase is found in plants responding to pathogen attack and plants treated with salt solutions or salicylic acid [43,44]. Another gene, LYK2, may also be involved in plant-microbial interactions. LYK2 is a homolog of Nod factor receptor 5 (NFR5) of Lotus japonicus, which functions in Nod factor recognition during nodulation. The bacterial Nod factor is a GlcNAc lipochitooligosaccharide with a structure similar to that of peptidoglycan [45]. In Arabidopsis thaliana and rice, broad-spectrum Resistance 2 (BSR2), a gene encoding an uncharacterized cytochrome P450 protein belonging to the CYP78A family, enhances resistance against Rhizoctonia solani [46]. Trehalose-6-phosphatase synthase S4 (TPS4) is a protein involved in trehalose biosynthesis. It was considered to play a positive role in the response to salt stress in Brassica napus var. oleifera [47]. These resistance-related genes were found to be negatively correlated with ABA, suggesting that they may be affected by ABA or they may be one of the factors causing ABA uctuations. This hypothesis requires further veri cation.
Previously, LTP was reported to be induced by ABA in the vegetative tissues of rice [48]. However, LTP, which had the highest betweenness centrality score in the brown2 module, was negatively correlated with ABA based on our results, indicating that ABA may have different effects on the regulation of LTP expression in different tissues or developmental phases. In previous reports, two cytochrome P450 monooxygenases, CYP735A1 and CYP735A2, were demonstrated to catalyze the biosynthesis of ZR in Arabidopsis [49]. Here, in the indianred4 module, CYP75B2 expression was negatively correlated with the level of ZR, suggesting a different role of cytochrome P450 monooxygenases in the biosynthesis of ZR.

Conclusions
In conclusion, we found that ABA and ZR uctuated signi cantly at 15 and 20 DAF when the abortive seeds displayed structural abnormalities. Subsequent transcriptomic analyses provided us with a large amount of information, including the gene networks that may regulate the biosynthesis of hormones, ABA and ZR, as well as the metabolic pathways or biological processes that may be affected by plant hormones. These ndings provide a strong theoretical basis and candidate genes for breeding seedless S. grosvenorii species using the genetic-engineering technology.

Methods
Plant materials and microscopic monitoring of seed abortion S. grosvenorii line seedless 1 used in this study were kindly provided by Mr. Xiangjun Jiang, Guilin yiyuansheng modern biotechnology Co., Ltd, Guilin, China. The triploid seeds of LHG were collected at consecutive intervals of 5 days (5DAF, 10DAF, 15DAF, 20DAF, 25DAF, and 30DAF). Transmission electron microscopy (TEM) analyses were performed to gain insights into the submicroscopic structural changes in the tissues. The fresh tissues were xed with Servicebio's xative solution, followed by xation in 1% osmic acid and 0.1 M phosphate-buffered saline (pH 7.4) for 5 h. Specimens were then dehydrated with alcohol, embedded in SPI-Pon 812 epoxy resin monomer, thinly sectioned, stained with uranyl acetate and lead citrate, and examined under a transmission electron microscope (model HT7700; Hitachi, Japan).

Detection of endogenous plant hormone levels during seed abortion
The levels of endogenous plant hormones, such as ABA, ZR, indole-3-acetic acid (IAA), and gibberellin A3 (GA3), were determined using enzyme-linked immunosorbent assay (ELISA). Each sample was measured in parallel three times, except for the time point 15 days after owering (DAF) with four replicates.
Brie y, the tissues stored in liquid nitrogen were thawed and homogenized at a temperature of 2-8 °C. Following the manufacturer's instructions of ELISA kit provided by SHUANGYING Biological Ltd.
(Shanghai, China), standards and samples were added to the wells of microtiter plates and incubated for 30 min at 37 °C. A total volume of 50 μL of horseradish peroxidase (HRP)-labeled detection antibody was then added and mixed gently, followed by incubation for 30 min at 37 °C. The 3,3,5,5′-tetramethybezidine dihydrochloride (TMB) was used as a substrate, and the reaction between the hormones (in standard and samples) and HRP was stopped with a reaction inhibitor within 15 min. The absorbance of each hormone was measured at a wavelength of 450 nm. After obtaining the hormone concentration in the sample, the hormone content in the sample was calculated.

RNA extraction and sequencing
Three biological replicates were collected for each stage. Total RNA at each stage was extracted according to the manufacturer's instructions for the Trizol reagent kit (Invitrogen, Carlsbad, CA, USA). Eukaryotic mRNA with a polyA tail was enriched from total RNA using oligo (dT) coupled to magnetic beads and fragmented. The rst strand of cDNA was synthesized with M-MuLV reverse transcriptase, using fragmented mRNA as a template and random oligonucleotides as primers. The mRNA template chain was subsequently degraded by RNase H, and the second strand of cDNA was synthesized using dNTPs and DNA polymerase I. The double-stranded cDNA was puri ed with the QiaQuick PCR extraction kit (Qiagen, Venlo, Netherlands) and subjected to end repair, dA tailing, and adaptor ligation. cDNA fragments of approximately 200 bp were obtained using AMPure XP beads for PCR ampli cation, and the PCR product was puri ed using the AMPure XP system. The quality and quantity of the library were analyzed using a NanoPhotometer® spectrophotometer (IMPLEN, CA, USA), Qubit® RNA Assay Kit with a Qubit® 2.0 Fluorometer (Life Technologies, CA, USA), and Agilent 2100 bioanalyzer and RNase free agarose gel electrophoresis. Based on the nal libraries, a paired-end sequencing strategy (PE150) was carried out on the BGISEQ-500 platform (BGI Technology, Shenzhen, China).
Global and differential gene expression analysis After removing adapters, low-quality reads, and ambiguous reads from the raw data, the clean reads were aligned to the rRNA reference sequences (no mismatches allowed) using bowtie2 (version 2.2.8) [50]. The rRNA-depleted reads were mapped to the reference genome of S. grosvenorii using HISAT 2. 2.4 [51] with "-rna-strandness RF" and other parameters set as a default. Assembly of the aligned reads was performed using StringTie software according to the reference gff le [52,53].
The number of reads mapped to each transcript was counted for each sample and then normalized as fragments per kilobase of transcripts per million fragments mapped (FPKM). The Pearson's correlation coe cients of RNA expression between samples were calculated and principal component analysis was performed using R version 3.6.3. Differential expression analysis was performed using DEseq2 [54], and the genes with |log2FC|>1 and false discovery rate (FDR) < 0.05 were determined as differentially expressed genes (DEGs) between pairs of adjacent stages.
Gene Ontology (GO) enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses of the DEGs were performed. All DEGs were mapped to GO terms in the GO database (http://www.geneontology.org/) and KEGG database for the pathways [55,56]. Gene numbers were calculated for every term/pathway, and signi cantly enriched GO terms/KEGG pathways in DEGs compared to the genome background were de ned using a hypergeometric test. The calculated p-values were FDR corrected, and GO terms and KEGG pathways with FDR ≤ 0.05 were de ned as signi cantly enriched GO terms and KEGG pathway respectively.

Temporal analysis
Time-series analysis was performed using a short time-series expression miner (STEM) [57] (p.191). Each mRNA was assigned to the model pro le based on the expression pattern based on the correlation coe cient. The number of mRNAs assigned to each model pro le and expected to be assigned to a pro le were calculated by randomly permuting the original time point values, renormalizing the mRNA expression values, assigning mRNAs to their most closely matching model pro les, and repeating this process for a large number of permutations. The statistical signi cance of the number of mRNAs assigned to each pro le compared with the expected number was then computed.

Seed abortion associated gene co-expression network analysis
To explore the relationship between gene co-expression modules and plant hormone levels, weighted gene co-expression network analysis (WGCNA) was conducted using the WGCNA package in R (3.2.2.) [58] (p. 559). The genes with reads per kilobase per million (RPKM) > 0.3 were used for WGCNA [59], and modules were obtained using the automatic network construction function, blockwiseModules, with default settings, except that the power was 18, minModuleSize was 50, and mergeCutHeight was 0.7. A module-trait relationship analysis was also performed using the module eigengene and the levels of ABA and ZR.
For genes in each module, GO and KEGG pathway enrichment analyses were conducted to analyze the biological functions of modules as described above. In addition, the network connections among the most connected genes (topological overlap above the threshold of 0.15) for the modules with statistical signi cance were further visualized using Cytoscape

Declarations
Ethics approval and consent to participate Not applicable.

Consent for publication
Not applicable.

Availability of data and materials
Raw data for this project have been submitted to the Sequence Read Archive (SRA) database of the National Center for Biotechnology Information (Accession number: PRJNA773651).

Competing interests
The authors declare that they have no competing interests.