Unique Transcriptome And Gene Expression Analysis of Flag Leaves of A Super-Hybrid Rice WFYT025

Background: The health and physiology of ag leaves are closely related to rice yield, and ag leaves play an important role in providing photosynthetic products during grain lling, many breeding studies have tried to improve the performance of ag leaves. However, there are few studies on the heterosis of rice ag leaves up to now. Results: Thus, the present research is focused on the ag leaves heterosis of a widely used late-cropping indica super hybrid rice combination WFYT025 in China using a high-throughput next-generation RNA-seq strategy under different environment with two stages, trying to nd some genes related to photosynthesis, transpiration, and development of seeds. Mid-parent heterosis (MPH) and higher parent heterosis (HPH) were estimated for the heterosis of ag leaf. Under the environment of middle rice, the number of genes up-regulated in CHT025, WFB and WFYT025 were 892,1,273 and 819, down-regulated in CHT025, WFB and WFYT025 were 616,1934 and 2196, respectively. Among the SDGhps on the rst day after owering, 10.9% had a dominant effect, 41.81% had a partial dominant effect, 22.07% had an additive effect and the remaining 25.22% had an over-dominant effect. Meanwhile, on the tenth day after owering, there were 491 genes, accounting for 27.16%, showed over-dominance; 222 genes, accounting for 12.28%, showed dominance; 760 genes, accounting for 42.04%, showed partial dominance; and 335 genes, accounting for 18.52%, showed additive effect. Conclusion: The co-expressed gene sets via weighted gene co-expression network analysis (WGCNA) were identied, and total of 5,000 highly expressed genes were divided into 24 co-expression groups. In the two stages, we found 9 identical transcription factors. Except for 5 reported TFs, the other 4 TFs may play an important role in grain number and photosynthesis heterosis.


Background
As a principal organ in rice, leaves are involved in many fundamental physiological functions such as photosynthesis and transpiration [1]. Among the rice and other major cereals, the uppermost three leaves, especially the ag leaf, are the main source of the carbohydrates that eventually accumulate in the grains [2,3,4,5]. The top three leaves on the stem are the primary source of carbohydrate production.
Particularly, the ag leaves produce over 50% of the carbohydrates that are accumulated into grains [6], which nally in uences grain yield. The phenotype of the ag leaves, in particular its size and angle, is regarded as the major determinant of plant architecture and strongly affects high-yield performance, such as 1,000-grain weight [7,8].
Previous studies on rice leaf phenotype mainly focused on the location of quantitative trait loci (QTL). Few genes related to leaf phenotype were identi ed or cloned. Longitudinal cell division, cell elongation and cell arrangement determine rice leaf elongation. For example, the dwarf mutant dwarf and gladius leaf 1 (dgl1) have shorter leaves and rounded tips compared to wild type (WT). The dgl1 phenotype was abnormal due to poor elongation of longitudinal epidermal cells and coarse cell deformation [9]. The variation of leaf width in rice was mainly caused by changes in vascular bundles and cell division. The NARROW LEAF 1 (NAL1) mutant has narrower leaves than the WT, and its abnormal phenotype is caused by the decrease in the number of longitudinal veins. NAL1 affects vascular morphology and polar auxin transport in rice, and plays an important role in controlling lateral leaf growth [10]. In NAL 2/3 double mutant, narrowed curly leaves are caused by less longitudinal veins and reduced lateral axial branch. NAL2 and NAL3 are paralogs, encoding the same WUSCHEL-RELATED HOMEOBOX 3A (OsWOX3A; OsNS) transcriptional activator, which is related to organ development, tiller development, and so on [11]. The leaf width phenotype of the NAL1 mutant is reduced, and microscopic analysis shows that compared with WT, the leaves of the NAL1 mutant have fewer longitudinal veins [13]. The ABNORMAL VASCULAR BUNDLES (AVB) mutant has narrower leaves than the wild type. The phenotypic abnormality of AVB leaves is caused by a decrease in the vascular bundles of gas producing organs [14].
High-throughput RNA sequencing has been used to search for heterosis in rice to avoid defects of methods with low throughput, high cost, low sensitivity, clonal preference, and high background noise. Several studies have investigated gene transcript abundance during leaf senescence in crops such as cotton and maize [15,16] and in a range tissues during grain lling in rice [17], including ag leaf tissue, and transcript data are publicly available. However, there are few studies on the heterosis of rice ag leaves and the mechanism of the ag leaves to improve yields is not well understood. It is very necessary to study the heterosis of rice ag leaves. In this study, we focused on a super rice variety, WFYT025, which is widely used in China. Therefore, we took the ag leaves of WFYT025 and its two parents for high-throughput transcriptome sequencing, trying to nd some genes related to photosynthesis or transpiration and development of seeds. The aim of our study is (1) we investigated the length, width, and leaf area of ag leaves for WFYT025 and its parents. MPH and HPH were estimated for the heterosis of ag leaf. (2) We selected the ag leaves of hybrid rice WFYT025 and its parents for transcriptome sequencing, the rst day and the tenth days after owering in the environment of early rice and middle rice, respectively. (3) we referred to further investigate the gene regulatory network. Several major subnetworks were found to represents interactions among genes with similar expression pro les via gene regulatory network analysis.

Plant materials and growth conditions
The hybrid WFYT025 along with its parental lines Changhui T025 (CHT025) and Wufeng B (WFB) were planted in the experimental eld of Jiangxi Agricultural University.WFYT025 is a super-hybrid rice combination derived from the cross between female parent WFB and male parent CHT025. WFYT025 and the two parents were sown at the experimental plot in Jiangxi Agricultural University in a completely randomized block design with three replications in summer 2019. Each plot consisted of 40 rows, with each row consisting of 10 plants, each separated from its neighbour by 20 cm. Crop management followed normal procedures for rice. These three lines were selected in this study to measure phenotypic traits and conduct transcriptome analyses [18]. We took the grains and sword leaves from the rst day, the fth day, the tenth day, the fteenth day and the twenty-rst day after owering of rice. By measuring the 1000 grain weight of the grains, we nally selected the sword leaves from the rst day and the tenth day for sequencing. The ag leaves were collected and stored at − 80°C for RNA-Seq analysis, and each sample had at least three biological replications to minimize systematic errors. Pathway enrichment analysis KEGG (Analysis of Kyoto Encyclopedia of Genes and Genomes) is the major public pathway-related database. Pathway enrichment analysis identi ed signi cantly enriched metabolic pathways or signal transduction pathways in DEGs (different expression genes) comparing with the whole genome background [19]. Pathway enrichment analysis was performed using the OmicShare tools, a free online platform for data analysis (www.omicshare.com/tools). Signi cantly enriched pathways in DEGs comparing to the genome background were de ned by hypergeometric test. The calculated p-value was gone through FDR Correction, taking FDR ≤ 0.05 as a threshold. Pathways meeting this condition were de ned as signi cantly enriched pathways in DEGs.

GO enrichment analysis
Gene Ontology (GO)is an international standardized gene functional classi cation system which offers a dynamic-updated controlled vocabulary and a strictly de ned concept to comprehensively describe properties of genes and their products in any organism. GO has three ontologies: molecular function, cellular component, and biological process. The basic unit of GO is GO-term. Each GO-term belongs to a type of ontology.
GO enrichment analysis provides all GO terms that signi cantly enriched in DEGs comparing to the genome background, lter the DEGs that correspond to biological functions. GO enrichment analysis was performed using the OmicShare tools, a free online platform for data analysis (www.omicshare.com/tools).Firstly all DEGs were mapped to GO terms in the Gene Ontology database (http://www.geneontology.org/), gene numbers were calculated for every term, signi cantly enriched GO terms in DEGs comparing to the genome background were de ned by hypergeometric test [20]. The calculated p-value was gone through FDR Correction, taking FDR ≤ 0.05 as a threshold. GO terms meeting this condition were de ned as signi cantly enriched GO terms in DEGs. This analysis was able to recognize the main biological functions that DEGs exercise.

Quantitative real-time PCR (qRT-PCR) validation
To validate the RNA-seq results, different expression patterns of several genes were con rmed by quantitative real-time RT-PCR (qRT-PCR) [21]. For qRT-PCR, 1 µg of total RNA was used to synthesized cDNA using PrimeScript™ RT reagent Kit (Perfect Real Time) (TaKaRa). The qRT-PCR was carried out using SYBR® Premix Ex Taq II (Tli RNaseH Plus; TAKARA BIO Inc., Shiga, Japan) and determined in LightCycler 480 (Roche, Basel, Switzerland) according to the manufacturer's instructions. The qRT-PCR reactions were ampli ed for 95℃ for 30 s, followed by 40 cycles of 95℃ for 5 s, 55℃ for 30 s and 72℃ for 30 s. All reactions were performed with three independent biological replicates for each sample and three technical replicates for each biological replicate were analyzed. The relative gene expression was calculated by the software of ABI7500 Real-Time PCR System using the 2 −∆∆Ct method.

The mode of inheritance analysis
For statistical analysis, the analysis of variance (ANOVA) was usually by the model: y = u+(GA)+(GD)+ (SR) + e, where y is the acquired gene expression, u is the overall mean, GA is the additive effect, GD is the dominant effect, SR is the replication effect, and e is the residual error [18]. Hp=[d]/[a], referred to as the dominance ratio or potency (where [a] and [d] represent GA and GD, respectively), was also calculated to measure the non-additivity of the F1 hybrid relative to its parents [22]. Considering gene expression levels as quantitative traits, we adopted traditional quantitative genetic parameters, such as composite additive effect [

Normalization of gene expression levels and identi cation of differentially expressed genes
Sequencing reads were mapped to the reference sequences. The expression level of each gene was measured by fragments per kilobase of exon model per Million mapped reads (FPKM). To determine the time-dependent transcriptional differences between early rice and middle rice, the differential expression genes (DEGs) at 1 and 10 days after anthesis were determined by comparing the expression levels. To correct for multiple testing, the false discovery rate (FDR) was calculated to adjust the threshold of pvalue [25]. The standard of different expression between compare is a minimal 2-fold difference in expression (|log2 Ratio| ≥ 1) and a FDR ≤ 0.005 [23].
Weighted gene co-expression network analysis (WGCNA) The gene co-expression network analysis used the raw expression counts (RAW counts) of all Rice reference genes in the RNA-seq data of the 36 leaf groups obtained from the previous analysis. Firstly, the genes with low expression were eliminated. The elimination standard was that if the genes were not expressed in more than 80% of the samples, the genes with low expression were considered. Then select median absolute deviation from the genes with high expression(mad) 5000 genes of maximum value for total express network analysis using WGCNA (https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/).In the process of WGCNA analysis, 20 soft power was selected. GO enrichment analysis through the agriGOv2 (http://systemsbiology.cau.edu.cn/agriGOv2/), its parameters as follows: Statistical Test Method as HyperGeometry, Multi_ Test Adjustment Method as Hochberg FDR, Signi cance level as "0.05" and Minimum number of mapping entries as "5".

Results
Phenotype analysis for WFYT025 and its parents In this study, we investigated the length, width, and leaf area of ag leaves for WFYT025 and its parents (Table 1). In addition, the 1,000-grain weight of WFYT025 and its two parents on the rst day and the tenth days after owering were measured (Table 1). In the environment of early rice, the 1,000-grain weight of WFYT025 was 7.30g, WFB was 6.70g, and CHT025 was only 5.68g on the rst day. On the tenth day after owering, of WFYT025 was 20.42g, WFB was 24.86g, and CHT025 was only 14.77g. As the same time, the fresh weight of WFB and its progeny WFYT025 was signi cantly higher than that of CHT025 both on the rst and tenth day after owering. RNA sequencing of WFYT025 and its parents To further explore the possible genes and mechanisms involved in the heterosis of WFYT025. We selected the ag leaves of hybrid rice WFYT025 and its parents for transcriptome sequencing, the rst day and the tenth days after owering in the environment of early rice and middle rice, respectively.
As a result, a total of 1,725.9 million reads were generated, and an average of 47.94 million reads were generated for each sample. The total number of valid reads is 155,921 million, and the average valid read for each sample is 43.31 million (Supplementary Table S1). We randomly selected 12 genes using the speci c primers (Supplementary Table S2), and validated their accuracy and reproducibility using qRT-PCR ( Supplementary Fig. S1).

Analysis of differentially expressed genes
To analyze genes expression in different stages under different environment, genes expression level was mainly measured by FPKM (Fragments Per Kilobase of exon model Per Million mapped reads) or RPKM (reads Per Kilobase of exon model Per Million mapped reads) (Fig. 1A). In the leaves of CHT025 under the early rice environment, the number of DEGs (p < 0.05) on the rst day and the tenth days were 3,072 in total, among which 1,937 were upregulated and 1,135 were downregulated. At the same time, the number of DEGs in WFB leaves in the rst day and the tenth days were 5,039 in total, among which 3,244 were upregulated and 1,795 were downregulated. In the leaves of WFYT025, the number of DEGs in the rst day and the tenth days were 3,757, among which 1,937 were upregulated and 1,135 were downregulated (Fig. 1B). Under the environment of middle rice, the number of genes up-regulated in CHT025, WFB and WFYT025 were 892, 1,273 and 819, respectively, down-regulated in CHT025, WFB and WFYT025 were 616, 1,934 and 2,196, respectively (Fig. 1C).
O enrichment analysis of hybrid rice and its parents GO enrichment analysis of these DEGs has identi ed several signi cant biological processes at different stages under different environment. In the environment of early rice, "chloroplast stroma" and "oxidationreduction" process were signi cant enriched in CHT025 ( Fig. 2A). Meanwhile, "closed chloroplast", "ATP binding", "cytosol" and "mitochondrion" were signi cant enriched in WFB (Fig. 2B). At the same time, "cytosol", "chloroplast" and "oxidation-reduction" process were signi cant enriched in WFYT025( Supplementary Fig. S2A). In the environment of middle rice, "protein serine/threonine kinase activity", "protein phosphorylation", "extracellular region" and "plasma membrane" were signi cant enriched in CHT025 (Supplementary Fig. S2B). "Closed protein serine/threonine kinase activity", "ATP binding", "plasma membrane" and "defense response" were signi cant enriched in WFB ( Supplementary  Fig. S2C). Meanwhile, "ATP binding", "integral component of membrane", "plasma membrane" and "protein serine/threonine kinase activity" were signi cant enriched in WFYT025 (Supplementary Fig. S2D). This suggests that the considerable differences in ag leaves between the rst day and the tenth day, may be related to photosynthetic e ciency.

Identi cation of gene co-expression modules of ag leaf of WFYT025
After obtaining the gene expression pro le of ag leaves, we referred to further investigate the gene regulatory network. Several major sub-networks were found to represents interactions among genes with similar expression pro les via gene regulatory network analysis, which were designated as co-expression modules [27]. The co-expressed gene sets via WGCNA were identi ed, and total of 5,000 highly expressed genes were divided into 24 co-expression groups (Fig. 4). After that, we further analyzed the correlation between the gene expression pro le of the co-expression module and the three phenotypic data (TKW; length; width) ( Supplementary Fig. S4). Since the aim is to nd genes related to grain development, the co-expression module with high correlation with TKW (the color of the module is Midnightblue, and the gene expression pro le in this module is moderately negatively correlated with the thousand-grain weight phenotype) was focused on. We found the overall gene expression pro le of the co-expression module MidnightBlue, which had the highest correlation with TKW, and the co-expression module contained a total of 106 genes ( Supplementary Fig. S5). The GO enrichment analysis of these genes revealed several signi cant biological processes. Modules associated with TKW showed biological processes that are associated to "protein amino acid phosphorylation" and "protein modi cation process" and so on (Fig. 5).
Overall, these modules represent the speci c gene regulatory processes at each stage and are the indicators of operated programs of ag leaves development.

Discussion
In the cycle of plant life, leaves play a signi cant role in grain yield, ag leaves length has been recognized as an important factor that determines plant type for high-yield potential in rice [28]. However, most of the previous studies on rice yield focused on rice grains, and few focused on the effect of rice ag leaves on rice yield. In our study, we investigated the relationship between transcriptional pro les and heterosis in super hybrid rice WFYT025 by RNA-Seq.
The genetic basis of heterosis SDG (signi cant difference gene) was de ned as the signi cant difference gene between rst days and tenth days after anthesis. SDG S (signi cant difference genes) were de ned as the signi cant difference genes between rst days and tenth days after anthesis. Meanwhile, SDG that existed in hybrid but not in parents were de ned as SDG hp .
We have been able to identify a number of SDG hp s underlying ag leaves between hybrid WFYT025 and paternal line CHT025, con rming the suggestion that heterosis is a polygenic phenomenon [29]. Among the SDG hp s on the rst day after owering, 10.9% had a dominant effect, 41.81% had a partial dominant effect, 22.07% had an additive effect and the remaining 25.22% had an over-dominant effect. Meanwhile, on the tenth day after owering, there were 491 genes, accounting for 27.16%, showed over-dominance; 222 genes, accounting for 12.28%, showed dominance; 760 genes, accounting for 42.04%, showed partial dominance; and 335 genes, accounting for 18.52%, showed additive effect. Thus, partially dominance was the major contributor to the ag leaf heterosis of WFYT025 (Fig. 6).
However, on tenth day after owering, 43.19% of SDG hp s (between hybrid WFYT025 and paternal line CHT025 was partially dominance, 16.65% was dominance, 32.17% was over-dominance, and the remaining 7.99% was additive effect on the rst day after anthesis. In contrast, on the 10th day after anthesis, 52.83% of the genes were super dominant, 26.54% partial dominant, 9.56% dominant and 11.07% additive. It may be caused by too high temperature, which is inconsistent with the results of other environments (Fig. 7).

Transcription factors probably underlying heterosis
It is well known that the analysis of expression patterns of important transcription factors is helpful to the study of heterosis in rice. In CHT025, we found 13 transcription factors of WRKY family in SDG hp s under the environment of early rice and 16 genes of WRKY family in SDG hp s of under the environment of middle rice. Transcription factors including WRKY3, WRKY68 and WRKY77 were found in both environments. Recently, a research reported that WRKY family genes are involved in the process of seed growth and matching in Arabidopsis and rice [30]. However, the expression level of WRKY68 in WFYT025 and its parents in the tenth days after owering was signi cantly higher than that in the rst day after owering. The expression level of WRKY68 in WFYT025 was signi cantly higher than that of its parents. In the two stages, we found 9 identical transcription factors. Except for 5 reported TFs, the other 4 TFs may play an important role in grain number and photosynthesis heterosis.
The key metabolic pathways of heterosis In order to eliminate the impact of environment, we studied the metabolic pathway with same SDG hp (SSDG hp ) in two environments. There were 312 SSDG hp s in total. These SSDG hp s were mainly concentrated in the phosphorus metallic process, phosphorylation, and plasma membrane and so on. During senescence in annual crop plants, P (phosphorus) is mobilized from leaves and other vegetative tissues and translocated to developing seeds, which are a strong P sink during the reproductive growth phase [31,32,33]. Previous studies have shown that developing seeds are strong sinks for P, and the remobilization of P from vegetative tissues results in insu cient P for other processes necessary for continued growth such as photosynthesis. In rice (Oryzasativa L.), P is predominantly mobilised from leaves during the later stages of grain lling [34,35]. This might be the reason why the lling speed of rice is faster from the rst day to the tenth day after owering than from the eleventh day to the twenty-rst

Declarations
Ethics approval and consent to participate WFYT025 along with its parental lines Changhui T025 (CHT025) and Wufeng B (WFB) obtained from the native cultival line and all plants were grown in the test elds of Jiangxi Agriculture University. Sampling of plant materials were performed in compliance with institutional, national, and international guidelines.

Consent for publication
Written informed consent for publication was obtained from all participants.
Availability of data and materials GSE175967, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE175967 provides access to all of data

Competing interest
The authors declare that they have no con ict of interest.     Analysis results of 5000 genes in co expression module. Go enrichment analysis of midlightblue module about 106 genes in the co expression module, showed statistically signi cant GO term (FDR < 0.05). The x-axis indicates the level of statistical signi cance, and the y-axis indicates the signi cantly enriched GO term.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download. supplementarymaterials1.docx