Analysis of the spatio-temporal dynamics of gene expression during initiation and elongation of fiber development
We first obtained the spatiotemporal overview of the early stages of cotton fiber development using in paraffin-embedded sections of cotton ovules sampled at -1.5 DPA, -1 DPA, 0 DPA, 1 DPA, 3 DPA, and 5 DPA (Fig. 1a). To gain insight into the mechanisms underlying cotton fiber development, ST was used to explore the overall spatiotemporal dynamics of gene expression during fiber development. Specifically, ST was conducted on cotton bolls at -1.5 DPA, -1 DPA, 0 DPA, 1 DPA, 3 DPA and 5 DPA (Fig. 2). The dataset comprised 26,140 spots (each spot containing 10-20 cells), and a total of 11,696 genes were detected, with an average of 3,934 genes and 6,332 unique transcripts per spot (Supplementary information, Table S1). Quality control of the sample data was conducted as described in the ‘Materials and Methods’ section (Supplementary information, Fig. S1a-c). Next, we performed a t-distributed stochastic neighbor embedding (t-SNE) analysis on the genes in all spots at all six developmental stages and obtained 19 cell clusters (Fig. 2b). We mapped the spots to their original coordinates in the tissue sections and annotated them in detail (Fig. 2c). The 19 clusters were mapped based on the specific location distribution of each cluster in tissue sections and the function of identified marker genes. The resulting clusters corresponded to specific anatomical regions in all six bolls of DPA (Fig. 1a). ‘Mesocarp/Diaphragm cells’ (clusters 1, 2, 6, 9 and 14) appeared in the largest number of stages (Supplementary information, Table S2), while ‘fiber cells’ (clusters 3, 10 and 19) mainly appeared in 3 DPA and 5 DPA boll slices (Fig. 2b and 2c). By analyzing differentially expressed genes (DEGs) in the spatial transcriptomes, we identified several key marker genes (Fig. 2d). For example, the Arabidopsis homologous genes gretchen hagen 3 (GH3.6.2) (Gh_A03G188900) and subtilase 4.12 (SBT4.14) (Gh_D11G090900), and the Solanum lycopersicum homologous gene alpha dioxygenase 2 (DOX2) (Gh_A09G193100) were found to be highly expressed in ‘integument_1’ (cluster 4) (Fig. 2d). GhGH3, AtSBT4.14, and alpha DOX2 were identified to play a role in auxin metabolism, stress responses, and lipid synthesis, respectively 34-36. The sugars will eventually be exported transporters 15 (SWEET15) (Gh_D04G160000) was found to be highly expressed in ‘nucellus’ (clusters 5 and 13) (Fig. 2d). Two consecutive field experiments showed that, the GhSWEET15 up-regulated fibers became thinner, longer and stronger than the wild-type control, showing that SWEET15 promotes fiber elongation 37 (Fig. 2d). The Arabidopsis homologous gene brassinosteroid (BR) enhanced expression 3 (BEE3) (Gh_A09G062900) was highly expressed in ‘placenta’ (cluster 11) (Fig. 2d). The Arabidopsis homologous gene GA-stimulated 14.1 (GASA14.1) (Gh_A07G244400) was highly expressed in ‘integument_2’ (cluster 8) and has been reported to play a role in the regulation of the accumulation of ROS 38. The Arabidopsis homologous genes 3-ketoacyl-CoA synthase 19.4 (KCS19.4) (Gh_A10G219300) and high expression of osmotically responsive 3.7 (HOS3.7) (Gh_D01G000600) were found to be highly expressed in ‘fiber cells’ (clusters 3 and 10) at two different time points in the elongation stages of fiber cells (3 DPA and 5 DPA) (Fig. 2f). KCS19 is a member of the same gene family as KCS1 in Arabidopsis, and has been demonstrated to play a role in increasing the level of very-long-chain fatty acids (VLCFAs) and improving the cold tolerance of seedlings 39. Gh_D12G200300 is an ortholog of Arabidopsis marker gene basic leucine -zipper 43.11 (bZIP43.11) showing a high expression level in ‘fiber cells_3’ (cluster 19). The bZIP43.11 was reported to play a role in the regulation of bHLH109 during the induction of somatic embryogenesis (SE) 40.
Next, we identified DEGs in each cell cluster (Supplementary information, Table S3). The expression levels of the top 10 DEGs are shown as a heatmap (Fig. 2d). Gene Ontology (GO) enrichment analysis of the DEGs in each of the clusters indicated that terms associated with ‘fatty acid oxidation’ and ‘fatty acid beta-oxidation’ were associated with ‘integument_1’ (cluster 4), ‘exocarp_1’ (cluster 12), and ‘nucellus_2’ (cluster 13) (Fig. 2e). The DEGs in ‘integument_2’ (cluster 8) were also associated with ‘response to osmotic stress’ (Fig. 2e). The DEGs in ‘mesocarp’ or ‘diaphragm’ (clusters 1, 2, 6, 9 and 14) were mainly associated with ‘response to lipid’, ‘ubiquitin-dependent protein catabolic process’, ‘nucleolus’, and ‘carboxylic acid catabolic process’ (Fig. 2e). The GO terms of ‘fiber cells’ (clusters 3, 10 and 19) were mainly associated with ‘mitochondrial respiratory chain complex 1’, which is similar to the GO terms associated with the DEGs in the ‘ovary wall’ (cluster 7) (Fig. 2e).
Next, we conducted an independent ST analysis of cotton bolls at each of the six early developmental stages (Supplementary information, Fig. S2-S7), and identified DEGs in each cluster (Supplementary information, Table S4-S9). While different cell populations in the six stages of early fiber development shared similar biological processes, each cell population also had its own set of uniquely expressed genes (Supplementary information, Fig. S2-S7). Some key marker genes for specific tissue or cell types were identified. For example, 11 clusters were identified in -1.5 DPA boll samples (Supplementary information, Fig. S2a). Gh_D12G199400, a homolog of the Arabidopsis teosinte branched 1, cycloidia, and proliferating cell nuclear antigen factor 5.3 (TCP5.3) (Gh_D12G199400), was found to be highly expressed in ‘mesocarp’ (cluster 1) (Supplementary information, Fig. S2b). In Arabidopsis, TCP5.3 was reported to play a role in controlling plant thermo-morphogenesis by positively regulating PIF4 activity 41. The marker gene Gh_D09G117000 is highly expressed in ‘fiber cells_1’ (cluster 2) of the 3 DPA boll and was identified as an ortholog of the Arabidopsis fasciclin-like arabinogalactan-protein 11.3 (FLA11.3) (Supplementary information, Fig. S6b), which was shown to act with FLA12 to fine-tune the initiation of secondary cell wall development, and the balance of lignin and cellulose in the secondary cell wall in Arabidopsis 42. In 5-DPA boll samples, the marker gene transmembrane kinase 3 (TMK3) (Gh_D09G144600) in ‘fiber cells_1’ (cluster 1) is a member of the receptor-like kinase (RLK) family and encodes a key regulator of cell expansion and proliferation 43 (Supplementary information, Fig. S7b).
scRNA-seq analysis of cotton ovule
Gene expression patterns of ovules in four different early developmental stages from -1.5 to 1 DPA were analyzed by scRNA-seq to determine the heterogeneity of cells. After performing quality control and screening of the raw data, 31,651 high-quality cells were obtained, possessing an average of 1,071 genes and 1,439 UMIs per cell that were used in the subsequent analysis (Supplementary information, Fig. S8a-e, Table S10). DEGs were identified for all four early developmental stages of -1.5 DPA to 1 DPA single-cell transcriptomes (Supplementary information, Table S11-S15).
Then the relationship between the scRNA-seq and ST data was analyzed, and a strong correlation between the two datasets was observed (Supplementary information, Fig.8f). Through the association analysis of spatial transcriptome and single-cell transcriptome data, we screened the genes highly expressed in the corresponding cell type of ST and scRNA. The results are shown in Supplementary information, Table S16. We then performed t-SNE dimensionality reduction and clustering on the data of the four single-cell components (inner integument, outer integument, nucellus, and diaphragm) (Fig. 3a) and obtained 16 cell clusters. The percentage of cells in each cluster ranged from 0.8% to 12.86% (Supplementary information, Table S17). We annotated the 16 different cell clusters based on the expression levels of marker genes and the annotation information of the ST data (Fig. 3b). A trend analysis was conducted on these genes to better understand the expression patterns of the DEGs. Supplementary information, Fig.9 shows the dynamic expression of DEGs of the four early developmental stages from -1.5 DPA to 1 DPA. A total of 4,638 genes were highly expressed at 0 DPA.
We analyzed gene expression in different cell clusters and identified the DEGs in each cluster (Fig. 3c). The expression of the top 10 ranked DEGs are presented as a heatmap (Fig. 3d). Some of the identified DEGs are involved in the regulation of hormone accumulation, such as the ortholog (Gh_A05G105700) of Arabidopsis marker gene OPDA reductase 2 (OPR2), which was reported to regulate JA accumulation 44. The cotton homologue (Gh_D08G075000) of Arabidopsis gene indole-3-pyruvate monooxygenase 10.4 (YUC10.4) has been reported to mediate auxin synthesis 45. We also conducted a trend analysis on the top 10 ranked DEGs in each cluster, and identified 33 marker genes that were specifically expressed at 0 DPA (Supplementary information, Fig.10a). Their homologous genes, such as xyloglucan endotransglycosylases/hydrolase 22.3 (XTH22.3) (Gh_A08G028100), a member of the GhXTH1 family 46, expansin 2 (EXPA2) (Gh_D10G123800) and EXPA4.6 (Gh_A10G149600) 9, auxin-inducible 22 (AUX22) (Gh_A05G114100) 47, aminocyclopropane-1-carboxylic acid (ACC) synthase 2.1 (ACS2.1) (Gh_A02G179600) 48, and phytosulfokine 6 (PSK6) (Gh_A01G235100) 49, were reported to regulate cotton fiber development. GO enrichment analysis indicated that genes that were highly expressed in 0-DPA ovules are mainly related to the term ‘translation’ and ‘ribosomal small subunit assembly’ (Supplementary information, Fig.10b). We then calculated the number of TFs in different cell types to further identify the potential mechanisms responsible for the regulation of the early development of fiber cells (Supplementary information, Fig.10c). For example, anthocyaninless 2 (ANL2.3, a HD-ZIP member) (Gh_A11G092100), ERF008.1 (Gh_A12G080000), and MYB306.11 (Gh_D12G192800) were highly expressed in the ‘0-DPA outer integument cells’ (cluster 3) (Supplementary information, Fig.10d).
GO enrichment analysis of the DEGs in each cluster indicated that the term related to the ‘response to lipid’ was the most abundant, and was mainly enriched in DEGs in the ‘outer integument’ and ‘nucellus’ regions (clusters 1, 2, 3, 5 and 15) (Fig. 3e). The ‘outer integument’ (clusters 1, 3 and 15) had similar GO terms, mainly related to ‘response to lipid’ and ‘response to osmotic stress’ (Fig. 3e). Further pseudotime analysis indicated that the cells from four samples could be generally distributed along one main developmental trajectory, allowing identification of five pseudotime states (Fig. 4a). The proportion of pseudotime states in the four samples indicated that the pseudotime trajectory was generally consistent with their developmental stage (Fig. 4b). Further analysis indicated that the cells of different cell clusters had a relatively independent distribution along the pseudotime trajectory (Fig. 4c). The expression patterns of the top 10 DEGs are presented in Fig. 4d, while the annotation of the genes appears on the right of the heatmap. Fig. 4e illustrates the expression curves of some representative marker genes along pseudotime. We could see from the expression heatmap that several genes, such as MYB102 (Gh_A01G069800), responsive to dehydration 21B (RD21B) (Gh_A12G080300), glutathione 3.1 (GSH3.1) (Gh_D08G232200), GA insensitive dwarf 2.2 (GID2.2) (Gh_D05G264400) and ATHB-20 (Gh_A09G024100), exhibited a sharp decreasing trend in expression along the pseudotime trajectory. Other genes, such as DOF zinc finger protein 2.1 (DOF2.1) (Gh_D05G076800), lateral organ boundaries domain 12.5 (LBD12.5) (Gh_D03G160000), ERF026 (Gh_A11G008000), ERF017 (Gh_A06G088500), NRT1/PTR family 1.2 (NPF1.2) (Gh_A05G233200) and XTH22, exhibited a rapid increase at the end of the pseudotime trajectory.
SM analysis of cotton fiber development using air flow-assisted ionization-MSI
Through the ST and scRNA-seq analyses, we found that many genes related to hormone metabolism, such as GH3.6.2, HOS3.7 and DOX2 (Fig. 2), play a role in fiber development. Therefore, we carried out a highly sensitive and wide-coverage imaging analysis of metabolites in sections of cotton bolls to further explore the metabolic changes that occur during early fiber development (Supplementary information, Fig. S11a). Images obtained by air flow-assisted ionization-MSI (AFAI-MSI) illustrate the changes in metabolite intensity in different regions of the cotton boll samples (Supplementary information, Fig. S11b). Metabolites in different tissues can be detected by combining the results of K-means clustering with the results of hematoxylin and eosin (H&E) staining of sample tissue sections (Supplementary information, Fig. S11b-d). Supplementary information, Fig. S11e shows images of representative tissues, including the pericarp, diaphragm, placenta, ovule and gland.
Average mass spectrograms were obtained from the whole section of 3 DPA cotton bolls in positive and negative ion modes, with a mass range of m/z 70-1000 (Supplementary information, Fig. S12a, b). The main compounds in the mass range of m/z < 650, based on the matching analysis, were amino acids, polyamines, carbohydrate compounds, fatty acids, aliphatic aldehydes, aliphatic ketones, alkaloids, and metabolic intermediates of the above compounds. The m/z range from 650 to 1000 contains a number of compounds located in gland tissues. Supplementary information, Fig. S12c-h shows the average mass spectrograms obtained from the micro-regions of the gland, ovule and ovary in positive and negative ion modes, while some ions revealed specific metabolic profiles. We identified a total of 439 metabolites using high-throughput matching of all of the characteristic peaks (when positive and negative ions matched the same metabolite, only one of them was selected). We used the MetaboAnalyst website (https://www.metaboanalyst.ca) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) database to map the classified metabolites into metabolic pathways to construct a metabolic relationship network. We identified a total of 86 metabolic pathways using this approach (Supplementary information, Table S18). Supplementary information, Fig. S13a, b illustrate the metabolic relationship network diagram for some of the metabolites. The representative metabolites specific to different tissue micro-regions of cotton are shown in Supplementary information, Fig. S13c.
An unsupervised principal component analysis (PCA) analysis was performed on the metabolic profile data of cotton ovules at -1.5 DPA and 5 DPA. The scanning data in positive ion mode is presented in Supplementary information, Fig. S14a, while the scanning data in negative ion mode are presented in Supplementary information, Fig. S14b. We then utilized supervised orthogonal projections to latent structures discriminant analysis (OPLS-DA) to further mine the differential variables. The OPLS-DA score diagrams of -1.5 DPA and 5 DPA samples in both positive and negative ion modes revealed an obvious pattern of clustering and grouping (Supplementary information, Fig. S14c-f). We, next, identified differentially abundant metabolites (DAMs) based on the constructed OPLS-DA model. As a result, we identified 135 DAMs with significant changes in abundance between -1.5 DPA and 5 DPA (Supplementary information, Table S19). In general, the abundance of carbohydrate compounds at 5 DPA mostly increased, while the abundance of amino acids mostly decreased. Supplementary information, Table S19 lists the trend of the abundance level of the DAMs from -1.5 DPA to 5 DPA, which encompasses seven early growth stages (-1.5, -1, 0, 1, 2, 3 and 5 DPA).
We also identified metabolites that were present in all regions of cotton boll sections at different early developmental stages (Fig. 5, Supplementary information, Table S20). Among the 365 DAMs that were identified, we were able to link 47 DAMs to metabolic pathways (Supplementary information, Fig. S15). Fig. 5a illustrates the DAMs and 46 metabolic pathways that were identified. Some of the identified metabolites play an important role in regulating development and growth. For example, dehydroascorbic acid plays a key role in maintaining the ascorbic acid pool in leaves, increasing the ascorbic acid content in leaves and improving the oxidative stress tolerance of plants 50. Succinic acid, a four-carbon dihydric acid, that is oxidized by succinate dehydrogenase (SDH) as part of the TCA cycle, mediates ROS metabolism in plants 51. Zeatin is a cytokinin molecule that is an important regulator of plant development and environmental stress responses 52. The dynamics of the accumulation of the DAMs are illustrated in Fig. 5b, while the spatial distribution of some representative DAMs is shown in Fig. 5c.
Screening for metabolites involved in fiber development
Fiber cell development is initiated from cells in the outer integument of 0 DPA ovules 6. We conducted a metabolic network analysis of the metabolites in 0 DPA ovules to characterize the metabolic activities occurring during fiber initiation (Supplementary information, Fig. S16). Among the 115 DAMs (Supplementary information, Table S21), 17 were linked to metabolic pathways (Supplementary information, Fig. S16a, b, Table S22). Supplementary information, Fig. S16c illustrates the distribution of some representative metabolites (from -1 DPA to 0 DPA).
Our analysis of paraffin-embedded sections indicated that cotton fiber cells on the outer integument of the ovules significantly increased in length at 3 DPA. Therefore, we identified DAMs in 3 DPA ovules to determine the changes in metabolites that occur during fiber lengthening. A total of 116 DAMs were identified (Supplementary information, Fig. S17a, Table S23), and 16 of them were linked to metabolic pathways (Supplementary information, Table S24). Analysis of the metabolic network of these DAMs indicated that L-aspartic acid, and O-Acetyl-L-serine were closely related to each other (Supplementary information, Fig. S17b). Notably, α-linolenic acid and linoleic acid are key metabolites in the fatty acid metabolism pathway, and their abundance was high in 3 DPA ovules, suggesting that they were critical to the regulation of cotton fiber lengthening. Spermidine, a type of spermine compound, was also highly abundant in 3 DPA ovules, indicating that it is also potentially involved in fiber lengthening. Furthermore, α-linolenic acid, linoleic acid and spermidine were significantly abundant in 3 DPA ovules, while they were low abundant in other tissues of cotton bolls (Supplementary information, Fig. S17c).
Correlation analysis of the ST and SM data
We identified numerous DAMs during early fiber developmental process, especially L-arginine, cellulose, α-linolenic acid, and spermidine, which were all highly accumulation in ovular tissues. The genes related to these metabolites were also identified. Then a combined analysis of the metabolome and transcriptome data was conducted to further explore the interrelationship between the gene expression levels and related DAMs at the corresponding time points (Supplementary information, Table S25).
Next, we performed a combined analysis on the metabolic pathways of the metabolites and genes encoding the metabolic enzymes involved in these pathways. The α-linolenic acid metabolic pathway is illustrated in Fig. 6a. One can see in the pathway diagram that α-linolenic acid can be transformed into JA. Combined with the ST data, we identified 64 genes encoding metabolic enzymes in the entire α-linolenic acid metabolic pathway (Supplementary information, Table S26). Some of the identified genes were subjected to a detailed expression analysis and a heatmap of gene expression was generated (Fig. 6b). Our analysis indicated that most of the genes in the α-linolenic acid metabolic pathway were up-regulated in 3 DPA and 5 DPA fiber cells, and a few were highly expressed in -1 DPA and 0 DPA outer integument cells. We also analyzed the expression of genes related to metabolic enzymes in the metabolic transport pathway (Figures S18, S19, and S20) and synthesis of oxoglutaric acid, L-aspartic acid, cellulose, auxin, BR, linolenic acid and spermidine. Genes related to fatty acid synthesis and amine metabolism were highly expressed during fiber lengthening (3 DPA to 5 DPA) (Supplementary information, Fig. S19). These results indicate that L-aspartic acid, L-arginine and other amino acids, auxin and other hormones, and cellulose-like substances potentially represent signature metabolites during fiber initiation, while fatty acid metabolites, such as α-linolenic acid and linoleic acid, and amine metabolites, such as spermidine, may play an important role in fiber lengthening.
The expression patterns of representative marker genes determined by in situ hybridization
Based on the joint analysis of spatial transcriptome and spatial metabolome data, we screened out many genes and metabolites that may affect fiber development (Fig. 6, Supplementary information, Fig. S18, Fig. S19, and Fig. S20). The expression levels of several representative marker genes were determined by RNA in situ hybridization (ISH) using the mRNA of the corresponding cotton genes to verify the reliability of the ST data. The ST and ISH results were compared in detail (Fig. 7 and Supplementary information, Fig. S21). Notably, we used ST data to identify the expression patterns of three marker genes in the ‘integument cell’ cluster (cluster 4). DOX2 and SBT4.14 were strongly expressed in the outer integument at -1 DPA, while GH3.6.2 was strongly expressed in the outer integument at 0 DPA (Fig. 2d). Results of the ISH analysis also indicated that DOX2, SBT4.14 and GH3.6.2 were expressed in the outer integument region of boll paraffin sections at -1 DPA and 0 DPA, respectively (Fig. 7a-c). Similarly, we observed in the ST data analysis that ENDO1 was highly expressed in the outer integument of the -1 DPA boll (Fig. 2d). In accordance with these observations, we found in the ISH results that the outer integument region of paraffin sections of the -1 DPA cotton boll paraffin sections exhibited a deep brown-yellow color, indicating that ENDO1 was mainly expressed in the outer integument region (Fig. 7d). Consistent with the ST results, ISH analysis revealed that HOS3.7 and SWEET15 were highly expressed in the outer integument of 3 DPA boll, and the placenta cells and nucellus of ovules at 5 DPA (Fig. 7e, f), respectively. The expression patterns of DOX2 and SWEET15 at different developmental stages are presented in Supplementary information, Fig. S21, and are generally consistent with the expression patterns indicated in the ST analysis.
Role of representative marker gene BEE3 in regulating the initiation of fiber cells
In placenta we identified the higher expression of BEE3, which encodes an important component of BR signaling (Fig. 2d). BRs had been reported promote fiber initiation and elongation 9.To further determine if the screened marker genes play a role in the early development of cotton fiber, we produced the RNAi and overexpression lines of BEE3 (Gh_A09G062900) for scanning electron microscopy (SEM) (Supplementary information, Fig. S22). The results of -1 DPA showed that in total very few protruding fiber cells protruded on the ovules of ZM24 (control), OE-BEE3 and RNAi-BEE3 (Supplementary information, Fig. S22a). Compared with ZM24 at 0 DPA, the number of protruding fiber cells in OE-BEE3 ovule was significantly higher, while that in RNAi-BEE3 protruding fiber cells were lower (Supplementary information, Fig. S22a, b). Similarly, at 1 DPA, OE-BEE3 ovule surface showed significantly more protruding fiber cells, while RNAi-BEE3 ovule surface displayed less protruding fiber cells in comparison with ZM24 (Supplementary information, Fig. S22 a, c). The protruding fiber cells on the surface of ovules increased significantly in length at 2 DPA and 3DPA; however, compared with ZM24, the protruding fiber cells of OE-BEE3 were denser, while those of RNAi-BEE3 were sparser. These results indicate that BEE3 plays an important role in the initiation of cotton fiber.