Evaluation of transcriptome sequencing results
In our previous transcriptome data, six different fiber samples, including three 5DPA and three 15DPA, were obtained using Illumina sequencing technology (San Diego, CA, USA), and the statistical results of transcriptomes are shown in Table 1. The ratio of clean reads and the total mapped rate were 90.0% and 93.0%, respectively. A total of 36G data were obtained after quality control, and the Q30 base ratio was above 92.0%. In addition, the uniquely mapped rate was higher than 94.0%. All these indicators demonstrated that the transcriptomic data were highly accurate for subsequent analyses. Six differentially expressed CesA genes were identified from the transcriptome data (Table S1). KEGG pathway analysis showed that the CesA genes were enriched in signaling pathways, such as the ATP binding pathway (Fig. 1).
Table 1
Overview of high-quality of the transcriptome sequencing data
Sample
|
Clean Reads No.
|
Clean Reads %
|
N (%)
|
Q30 (%)
|
Total Mapped (%)
|
Uniquely Mapped (%)
|
15DPA1
|
39292978
|
92.90
|
0.001869
|
93.39
|
37546270 (95.55)
|
35759208 (95.24)
|
15DPA2
|
41001628
|
93.08
|
0.001863
|
93.27
|
39217817(95.65)
|
37417619 (95.41)
|
15DPA1
|
39213268
|
92.90
|
0.001862
|
93.33
|
36545369 (93.20)
|
34587599 (94.64)
|
5DPA1
|
38202518
|
90.82
|
0.001859
|
92.96
|
36437115 (95.38)
|
34694598 (95.22)
|
5DPA2
|
41301746
|
92.06
|
0.001853
|
93.53
|
39214123 (94.95)
|
37089080 (94.58)
|
5DPA3
|
40624562
|
93.07
|
0.001434
|
92.99
|
38573488 (94.95)
|
36695222 (95.13)
|
Note: Clean Reads No. represents the number of high-quality sequencing reads. Clean Reads (%) represents the ratio of high-quality sequencing reads among all the sequenced reads. N (%) represents the percentage of fuzzy bases. Q30 (%) represents the percentage of bases whose base recognition accuracy is above 99.9%. Total_Mapped (%) represents the total number of clean reads mapped on the reference genome. Uniquely_Mapped (%) represents the total number of clean reads which uniquely mapped on the reference genome. |
Classification And Phylogeny Of The Cesa Gene Family
In order to investigate the evolution of all CesA gene members, the BLASTp method was used to retrieve and identify the CesA genes in G. arboreum [13], G. raimondii [14] and G. hirsutum [15], and 70 CesA genes were finally identified based on the conserved structure of CesA (Table S2-S4). MEGA7 software was used to construct the evolutionary tree of CESA. The systematic structure showed that CesA was divided into seven subfamilies (Fig. 2), respectively (CesA1, CesA3, CesA4, CesA6, CesA7, CesA8 and CesA9). Among of them, 21 CesA6 genes (30.0%) and 16 CesA3 genes (22.8%) were discovered with a larger number of gene members, indicating that the subgroup CesA3 and CesA6 may be significant in cellulose-related physiological processes in Gossypium.
CesA3 and CesA6 genes are highly differentiated in dicot groups
To study the evolutionary history of cotton CesA3 and CesA6 genes and the evolutionary relationship of other model species in more detail, we constructed a phylogenetic tree using the CDS sequences of CesA genes in Selaginella [16], grape [17], Arabidopsis [18, 19] and cotton with the neighbor joining method. The phylogenetic tree of four species indicated that all the genes can be categorized into several groups, and almost all AtCesA genes located at the most basal branches (Fig
. 3a). MapChart software was used to draw the diagram of the position of cotton CESA on each chromosome (Table S5). The results showed four CesA3 genes and five CesA6 genes were identified among seven chromosomes (Fig. 3b), with the gene length varying from 3.0 kb to 4.6 kb (Table S6). The CesA genes from other plants, such as Solanum lycopersicum and Arabidopsis thaliana, were so complicated, since it appeared that they are not clustered based on the organisms or gene subfamilies. However, the CesA3 genes (blue) and CesA6 genes (red) from G. hirsutum appeared to be most differentiated, since their position in phylogeny tree is the most advanced. Three CesA3 genes were clustered, indicating their relatively close relationship. However, Gohir.D13G163700.1, a member of CesA3, did not appear in concert with the other three CesA3 genes. So we can speculate that the genes from CesA3 and CesA6 are highly differentiated in cotton.
Gene structure and selection pressure of CesA3 and CesA6 during the evolutionary process
The conserved motif structures were predicted to clarify the conserved structure of CesA3 and CesA6 genes that encode proteins. All the CesA3 and CesA6 genes from the three species of Gossypium (G. arboreum, G. raimondii and G. hirsutum) were queried to identify motifs in CesA3 and CesA6 to improve the quality of motif prediction. The three most likely motifs were regarded as the final results (Fig. 4). The six predicted motifs are highly conserved, with E-values that range from 3.5E− 268 to 2.9E− 284. The length of genes clearly vary (3.0 kb to 4.4 kb in CesA3, 3.8 kb to 4.5 kb in CesA6; Table S6), and all genes have the three motifs which are similarly distributed on the transcripts. The order of three motifs also remained unchanged among the CesA3 and CesA6 genes. To understand the evolutionary history of Gossypium arboretum CesA3 and CesA6 genes in more detail, the selective pressure (Ka, Ks, Ka/Ks ratio) of the CesA gene family was estimated in the evolutionary process using the kekeThecc1EG002092t1 [20] gene as a reference (Table S7). We statistically analyzed the amount of selective pressure (Table S5). The selection pressure of CesA genes during the evolutionary process is less than 1, i.e., indicating that they are subject to purifying selection.
Analysis of the expression patterns of CesA genes in Gossypium hirsutum
In order to research the function of CesA genes, expression pattern analysis was conducted. We selected two representative genes to study the CesA gene in more detail. The expression levels of the Gohir.A08G144300.1 and Gohir.D05G245300.1 genes in different tissues and fiber development stages were detected at 5, 10, 15, 20 and 25 dpa using fluorescence quantification method and the results were shown in Fig. 5. The results indicated that expression level of two genes in the fiber was significantly higher than other tissues. Besides, expression levels of two genes at different fiber development stages were analyzed. The Gohir.D05G245300.1 gene reached a peak at 25 dpa in fiber development. However, the Gohir.A08G144300.1 gene reached a peak at 10 and 15 dpa in fiber development and then decreased significantly. This gene was highly expressed throughout the entire fiber period.
Gohir.A08G144300.1 gene may play important roles in cotton fiber development
To study the possible biological functions of CesA genes during the process of plant growth, we selected Gohir.A08G144300.1 to construct the CesA plant interference vector [21] and quantified the results of cotton phenotype (Fig. 6). The number and size of cotton bolls was reduced (Fig. 6a, b), with the average length decreasing by 20% from 35 mm (wild type, wt) to 28 mm. The growth of Gohir.A08G144300.1 fibers was inhibited significantly compared with that of the control group (CK) (Fig. 6c-f). The fiber length was shortened (Fig. 6e and f), with the average length decreasing by 13% from 25 mm to 18 mm. The variations of cotton fibers may be caused by the changes in the cell volume and number.
Fiber cell glass method was used to study the cause of reduced fibers Significant differences in cell volume and number and fiber length were observed between silencing plants in which target gene Gohir.A08G144300.1 was silenced and normal plants. Fiber cell glass method was used to study the cause of reduced fibers of the TRV-CaCesA knockout and the film was observed under a 200X optical microscope (Fig. 6). The results suggested that this gene play an important role in fiber growth.