Phenotypic variations, correlations, and ANOVA for growth period traits
Five growth period traits (DIF, DFF, FP, MT and GP) of 588 rapeseed lines were investigated in three years. Significant variation among the genotypes for five growth period traits was observed: for instance, DIF ranged from 138 d to 207 d with an average of 157.98 d in 2017, from 149 d to 174 d with an average of 157.34 d in 2018, and from 137 d to 189 d with an average of 151.87 d in 2019 (Table 1). The coefficient variation (CV) of FP and MT was the highest (over 10%) followed by DIF and DFF, and GP was the lowest. In addition, the phenotypic frequency distributions of five traits in the three years all showed approximately continuous and normal distributions, which showed that the group was suitable for association analysis (Fig. 1).To test the effects of genotype (G), environment (E) and their interactions (G´E), we conducted analysis of variance (ANOVA) for each trait (Table S1a). Significant variations were observed among environments and genotypes for all five traits. The DFF has the highest broad-sense heritability (h2) of 92.66%, the FP has relatively low heritability (74.11%). Overall, these findings indicated that all five traits were stably inherited.
To determine whether there are any relationships between five growth period traits in B. napus, the average value of three years for each trait was used for correlation analysis (Table 2). DIF showed a significant positive correlation with DFF and GP but was negatively correlated with FP and MT. DFF was significantly correlated with FP, MT, and GP with correlation coefficients of 0.148, -0.664 and 0.760, respectively. It can be concluded that a shorter flowering time corresponds to a shorter growth period and a longer maturity time.
Genome-wide association analysis for growth period traits in three years
Through a set of processes of library construction, paired-end sequencing and SNP calling, a series of 3,856,91 highly consistent and locus-specific SNPs (minor allele frequency > 0.05 and call frequencies > 0.9) were retained for the following analysis. Fig. S1 shows the density distribution of SNP markers on different chromosomes. The 3,856,91 SNPs were covered and unevenly distributed on all 19 B. napus chromosomes.
To avoid false negative associations, three general linear models (GLM), naïve, PCA and Q models, and three mixed linear models (MLM), K, Q + K and PCA+ K models were chosen to evaluate the effects of population structure (Q, PC) and relative kinship (K). According to the Q-Q plots of the six models, the MLM model can control false positives well for each trait in three years (Fig. S2). To minimize the effect of environmental variation, best linear unbiased predictor (BLUP) value for each line were also calculated for each trait using the R package lme4 [20]. Therefore, the MLM models were conducted for GWAS of five growth period traits with three years’ data and BLUP values. We mainly focused on the QTLs detected in at least two environments. In total, 146 SNP loci significantly associated (-log10(P) ≥ 5.58) with five growth-related traits were identified, including 13 related to DIF, 23 to DFF, 6 to FP, 19 to MT and 85 to GP for the whole panel of accessions (Table S2 and Fig. 2).
Identification of candidate genes for growth period traits in B. napus
Among the 146 SNP loci significantly associated with growth period traits, 60 SNPs were divided into 19 genomic regions using a haplotype block estimation, with sizes ranging from 7 bp to 270.31 kb, while the remaining 86 SNP loci were not present in the LD blocks (Table S3). Then, we obtained the genes within the same LD block or within 300 kb to either side of the significant SNPs using B. napus ‘Darmor v4.1’ as the reference genome. Finally, a total of 101 candidate genes were identified as orthologous to Arabidopsis flowering genes reported in the Flowering Interactive Database (FLOR-ID), most of which were involved in six flowering pathways of aging, autonomous pathway, vernalization, photoperiod, GA, and circadian clock; other genes functioned as flower development and meristem identity and flowering time integrator (Table S4). The flowering time integrators FLOWERING LOCUS C (FLC) control the transition from vegetative to reproductive meristem by integrating the signals from six pathways and then precisely regulating the expression of specific flower meristem identity genes APETALA2 (AP2) and FRUITFULL (FUL) [21].
By comparing the SNP regions, 28 SNP loci (within 300 kb) associated with growth period traits, were located in or near the QTL regions identified in previous studies: 23 were reported by Raman et al., seven were reported by Wang et al., and four were reported by Zhou et al. (Table S3 and Fig. 3) [3, 14, 22, 23]. Among these loci, S1_5075025, S2_5719334 and S15_5922896 were reported in two studies, and S3_13708544 and S7_10803897 were detected simultaneously in three studies. At the same time, the candidate genes we identified in this study were also reported in previous studies [14, 22-28]. The flowering time integrator (BnaFLC.C02, BnaFUL.A03 and BnaSVP.A09) were reported in three or more previous studies (Table S3). Overall, the comparison of QTLs and candidate genes further validates the reliability of the GWAS mapping and help us mine the novel QTL.
QTL linkage mapping of growth period traits in two years
The RIL population (GH06 ´ P174) grown in 2017 and 2018 was evaluated for DIF, DFF, FP, MT and GP phenotypes and then used for QTL analysis. The flowering time of GH06 was later than that of P174 in the two investigated environments. A wide range of variation (Table 3) and the normal phenotypic distribution for the growth period traits (Fig. S3) were observed in the RIL population, indicating a quantitative inheritance suitable for QTL mapping. ANOVA was performed on the phenotypic data from two years, and h2 ranged from 49% (DFF) to 53% (GP) (Table S1b).
The LOD score plots across 19 linkage groups were shown in Fig. S4. A total of 17, 25, 7, 21 and 13 QTLs for DIF, DFF, FP, MT and GP were detected in two years and were located on all B. napus chromosomes except C02 (Table S5 and Fig. 4). These QTLs for DIF, DFF, FP, MT and GP explained ~17.48%, ~15.92%, ~7.95%, ~18.30% and ~9.43% of the phenotypic variance, and the additive effect varied from -0.43 to 2.30, -1.12 to 1.79, -0.83 to 0.71, -2.09 to 1.13, and from -0.70 to 1.10, respectively. The QTLs have the overlapped confidence intervals with the same direction of additive effect are considered to be the same QTL. Among the 83 QTLs, 14 QTLs were associated with at least two traits. The QTL q18DIF.A01-2, q18DFF.A01 and q18MT.A01-1 have the same QTL region and explained the highest total phenotypic variation of each trait. However, the additive values of DIF and DFF were positive, and the additive value of MT was negative, which indicates that the QTL from GH06 increased DIF and DFF, while the QTL from P174 increased MT. The QTL q17DIF.A06-1, q17DFF.A06-2, q17MT.A06-3 and the QTL q17DIF.A07-2, q17DFF.A07-2, q17FP.A07, q17MT.A07-3 also has the same region and opposite additive effect. These results indicate that the same QTL may have opposite effects on the different traits. Based on the rapeseed genome annotation and the physical locations of these QTL regions, a total of 74 Arabidopsis flowering homolog genes were identified (Table S6).
Identification of DEGs for floral transition and flower development by RNA-sequencing
To identify candidate DEG transcripts controlling flowering time, we sequenced four RNA samples from leaves of early-flowering cultivar 18Z134 and late-flowering cultivar 18Z88 sampled at vegetative and reproductive development stages (EV, ER, LV and LR). Every sample had two replicates. The flowering time differences between 18Z134 and 18Z88 are shown in Table S7. The total reads, mapped and unique mapped reads to the reference B. napus genome are shown in Table S8. After removing low-quality sequences, 34,862,287 (86.32%) ~ 37,786,256 (90.79%) clean reads were successfully mapped to the genome using TopHat. Of these clean reads, 33,089,439 (81.93%) ~ 38,684,877 (85.93%) were uniquely mapped.
Two types of comparisons were performed: (1) to identify DEGs between the two extreme lines at each development stage: LR vs. ER and LV vs. EV; (2) to identify expression changes between different stages in each line: EV vs. ER and LV vs. LR. The false discovery rate (FDR) ≤ 0.05 and absolute value of |log2 (fold change) | ≥ 1 were used as thresholds to judge the DEGs between the two groups. In total, 3727 DEGs were identified in LR vs. ER (2421 upregulated, 1306 downregulated), 2327 DEGs were identified in LV vs. EV (1557 upregulated, 770 downregulated), 3750 DEGs were identified in EV vs. ER (2135 upregulated, 1615 downregulated), and 2038 DEGs were identified in LV vs. LR (1281 upregulated, 757 downregulated). In addition, 1662 and 683 DEGs were common to LR vs. ER/LV vs. EV and EV vs. ER/LV vs. LR, respectively (Fig. S5 and Table S9).
GO and KEGG analysis of differentially expressed genes
As the first criterion, we analyzed DEGs between vegetative and reproductive stages (EV vs. ER and LV vs. LR) to identify the key phase transition-associated genes. The 683 common DEGs between EV vs. ER and LV vs. LR were subjected to an enrichment analysis for GO annotation terms. The top 20 significantly enriched GO terms are shown in Fig. S6a and Table S10a. Among these terms, the salicylic acid biosynthetic process (GO:0009697), response to cold (GO:0009409), negative regulation of floral organ abscission (GO:0060862), intracellular auxin transport (GO:0080162) and long-day photoperiodism, flowering (GO:0048574) were involved in the phase transition process. To further determine the metabolic pathways in the phase transition process, we performed KEGG enrichment analysis (Fig. S6b and Table S10b). Circadian rhythm–plant (ko04712) and plant hormone signal transduction (ko04075) were significantly enriched and participate in plant development and flowering processes.
As the second criterion, we searched for DEGs between early-flowering and late-flowering lines that result in flowering time variation. The 2327 DEGs between LV vs. EV and 3727 DEGs between LR vs. ER were subjected to GO and KEGG enrichment analysis (Fig. S7). DEGs in the circadian rhythm were enriched in both GO and KEGG analysis between LR vs. ER, suggesting that the different expression of genes involved in circadian rhythm could result in flowering differences.
Identification of TFs and hormone-related flowering genes involved in floral transition and flower development
In this study, fifty transcription factor (TF)-encoding DEGs, including the basic helix-loop-helix (bHLH; five members), ERF (eight members), MIKC_MADS (eight members), NAC (three members) and WRKY (four members), were identified in the early and late-flowering lines at two developmental stages. Fig. S8a shows the overall expression trend in the four samples, and most TF families were significantly upregulated in the reproductive stages.
In addition, 20 genes in the ABA signaling pathway, 23 genes in the auxin signaling pathway, five genes in the GA signaling pathway, five genes in the cytokinin signaling pathway, four genes in the JA signaling pathway and five genes in the SA signaling pathway were identified in our transcriptome data. The genes involved in the SA signaling were upregulated in the reproductive stage. While the genes from other hormone signaling pathways have no obvious expression trend (Fig. S8b).
Identification of floral transition- and flower development-related genes
According to the annotation of unigenes in Arabidopsis, a total of 125 DEGs related to flowering time were identified in the comparisons LR vs. ER, LV vs. EV, EV vs. ER and LV vs. LR (Table S11). These genes mainly included the photoperiod, circadian rhythms, vernalization, GA signaling, aging, flowering time integrator and flower meristem identity genes, and the expression value of these genes in the four samples is shown in Fig. 5.
In the photoperiod pathway, 36 unigenes homologous to EARLY FLOWERING 3 (ELF3), EARLY FLOWERING 4 (ELF4), CONSTANS-like 9 (COL9), cycling DOF factor 1 (CDF1) and LATE ELONGATED HYPOCOTYL (LHY) were identified. The circadian rhythm, as an internal timekeeper, controls daily and seasonal changes, is an important part of the photoperiod pathway and plays a key role in controlling plant flowering [29]. Circadian clock associated 1 (CCA1) is a key component of the Arabidopsis circadian oscillator, and it interacts with LATE ELONGATED HYPOCOTYL (LHY) and TIMING OF CAB EXPRESSION 1 (TOC1) to inhibit transcription of the Evening Complex (EC) proteins ELF4 and ELF3. In the circadian clock pathway, two CCA1 genes, three TOC1 genes, two pseudo response regulator 7 (PRR7) genes and three pseudo response regulator 9 (PRR9) genes were included. For the aging pathway, eight unigenes, including SQUAMOSA PROMOTER-BINDING-LIKE PROTEIN 4 (SPL4), SQUAMOSA PROMOTER-BINDING-LIKE PROTEIN 5 (SPL5), SQUAMOSA PROMOTER-BINDING-LIKE PROTEIN 15 (SPL15) and TARGET OF EARLY ACTIVATION TAGGED 1(TOE1) were found. Seven homologous genes of the GA signaling pathway were also identified, including GA2 oxidase (GA2ox, three unigenes), GA2 oxidase 1 (GA2ox1), gibberellin 20-oxidase 3 (GA20OX3), DELLA protein RGA-like 2 (RGL2), and the GA receptor GA INSENSITIVE DWARF1A (GID1A). Additionally, five unigenes were annotated in the vernalization pathway, which included the AGAMOUS-like 19 (AGL19), INDUCER OF CBF EXPRESSION 1 (ICE1), vernalization5/VIN3-like (VEL1) and REDUCED VERNALIZATION RESPONSE 1 (VRN1). Furthermore, ten floral pathway integrator genes related to FLC, FLOWERING LOCUS T (FT), AGAMOUS-like 20 (AGL20), SHORT VEGETATIVE PHASE (SVP) and TWIN SISTER OF FT (TSF) and ten flowering meristem-identifying genes, such as AGAMOUS-like 8 (AGL8), AGAMOUS-like 14 (AGL14), AGAMOUS-like 24 (AGL24) and APETALA 2 (AP2), were all identified in our transcriptome database. All these DEGs are important resources for the further study of floral transition and floral development in B. napus.
Identification of candidates for growth period traits by integrating QTLs with DEGs
To further understand the roles of these DEGs in regulating floral transition and flower development, the DEGs were integrated with the significant QTLs identified in either association analysis or linkage mapping. Therefore, the DEGs were considered candidate genes if they were located within the confidence interval (CI) of the QTLs identified by GWAS, linkage mapping, or both. According to the above criteria, a total of twelve DEGs located in the CI of sixteen significant loci were identified as candidate genes of growth period traits. The loci were BnaC04g15640D, BnaA03g40160D, BnaC06g15270D, BnaC09g05250D, BnaA04g22640D, BnaA05g05560D, BnaA05g05000D, BnaC09g48370D, BnaC02g36310D, BnaC09g23670D, BnaA03g39820D and BnaA05g05010D. The twelve DEGs were known to regulate floral development by affecting aging, photoperiod/circadian clock, GA, vernalization, flower development and meristem identity and flowering time integrator (Table 4 and Fig. 6). BnaAGL6.A05, which encodes a MADS-box transcription factor, negatively regulates the FLC/MAF clade genes and positively regulates FT in Arabidopsis [30]. Among them, BnaC06g15270D (BnaLNK2.C06), BnaC09g48370D (BnaGA20OX3.C09) and BnaA03g39820D (BnaFUL.A03) were also identified as EDGs between BBCH20 and BBCH50 (vernalized and nonvernalized) associated with flowering time and yield QTLs [27]. In addition, an ortholog of the circadian clock pathway, BnaTOC1.A03 and floral meristem identity gene BnaFUL.A03 were identified in both GWAS and linkage analysis [31]. Thus, we considered these five genes to be our most promising candidate genes for future prospects.
To analyze the polymorphism and explore their relationship with five growth traits, the sequence variation was analyzed in the genomic sequence including the complete coding sequence and 2000 bp upstream of the ATG translational start codon in the twelve candidate genes in the 558 lines based on our resequencing data. Among twelve genes, seven genes had polymorphic sites: one, fifteen, four, five, one, three and ten SNPs were identified in the BnaFUL.A03, BnaELF4.A04, BnaSOC1.A05, BnaELF4.A05, BnaLNK2.C06, BnaTOC1.C09 and BnaGA20OX3.C09, respectively, and formed different types of haplotypes. Detailed haplotype information is listed in Table S12.
The association of each haplotype with five growth periods was then analyzed in the GWAS population. We compared the phenotypic variations of different haplotypes for the above agronomic traits (Fig. S9, Fig. S10, and Table S12). In general, for BnaELF.A04, accessions with BnaELF.A04-Haplc accounted for 94.2% and showed a significantly shorter MT period in 2018 and 2019 compared to BnaELF.A04-Hapla. For BnaELF.A05, varieties with BnaELF.A05-Hapla accounted for 90.8% and showed the shortest FP compared to the other two haplotypes in 2017 and 2019. For BnaSOC1.A05, most accessions have BnaSOC1.A05-Haplb (97.5%) and exhibited shorter DFF and longer MT over three years compared to the BnaSOC1.A05-Hapla. For BnaKNK2.C06, most varieties have BnaKNK2.C06-Hapla (93.7%) and showed lower DFF and GP than BnaKNK2.C06-Haplb. For BnaTOC1.C09, accessions with BnaTOC1.C09-Haplc accounted for 94.1% and had lower DFF, lower FP and longer MT over three years. For BnaGA20OX3.C09, the two haplotypes did not exhibit significant differences between the five growth period traits. Thus, the haplotype BnaSOC1.A05-Haplb and BnaLNK2.C06-Hapla showed more favorable phenotypic traits and may be used for further earliness molecular breeding.
Confirmation of candidate gene expression using qRT- PCR
To verify the accuracy and reproducibility of the transcriptome analysis, nine candidate DEGS listed in Table 4 were selected for qRT-PCR analysis. As shown in Fig. 7, the expression of nine genes were consistent with the RNA-Seq results in the four samples. These results demonstrated the reliability of the RNA-sequencing results.