Phylogenetic and Functional Analysis of CesA Genes in Cotton

Background: Cotton (Gossypium hirsutum) is widely distributed all over the world, and improving the quality of its ber is one of the most important tasks in cotton breeding. Cotton bers are primarily composed of cellulose, which is synthesized and regulated by cellulose synthase (CesAs). But the molecular mechanism of CesA genes in cotton was unclear. Results: In this study, the phylogenetic history and purifying selection of CesA genes were investigated along with their functions. CesA3 and CesA6 are the two largest subgroups in G. arboreum, comprising 52.8% of the whole CesA family. These two CesA subgroups were then chosen for further research, and the results showed that they are highly differentiated in dicot groups. The two subgroups were also discriminated with the use of a Ka/Ks analysis. This indicated that they may play an important role in ber development based on their unique phylogenetic status. Functional studies were subsequently conducted using the most puried genes (Gohir.A08G144300.1 in CesA3 subgroup). The silencing of Gohir.A08G144300.1 visibly inhibits the growth of cotton ber, showing that it is critical for the growth of cotton bers. Conclusions: The results presented here a target gene Gohir.A08G144300.1 based on the analysis of CesA gene members, and it is found that this gene was crucial to the growth of cotton bers. This study provides more information for the understanding of the molecular mechanism of cotton ber development.

Cotton is an important plant for ber and oil production. Fiber cells primarily undergo two periods during their developmental process: cell elongation, which controls the length of bers, and secondary cell wall thickening, which determines the strength of bers. The two periods are arranged in a chronological order and do not overlap [7]. SCW thickening is closely related to the quality of the cotton ber, which merits further study.
The development of cotton bers is a complicated process which is regulated by the expression of a series of genes. The CesA gene family plays a central role in this process. A better understanding of its physiological functions and evolutionary history is of vital signi cance.
With its rapid development, the large-scale application of next generation sequencing (NGS) technology has helped to decipher a substantial amount of genetic and transcriptomic information [8]. NGS technology has been a powerful tool in the research of many plants, including Arabidopsis [9], rice [10], soybean [11] and sesame [12]. This bioinformatic method has provided much information about CesA genes at the genomic level. However, the evolution and function of CesA genes in cotton has not been systematically studied. In this study, the evolutionary history of CesA genes in cotton were investigated among a series of selected plant species. In addition to this, the function of representative CesA gene Gohir.A08G144300.1 was studied, providing more references to expand our understanding of cotton CesA genes at the evolutionary and functional levels.

Evaluation of transcriptome sequencing results
In our previous transcriptome data, six different ber samples, including three 5DPA and three 15DPA, were obtained using Illumina sequencing technology (San Diego, CA, USA), and the statistical results of transcriptomes are shown in Table 1. The ratio of clean reads and the total mapped rate were 90.0% and 93.0%, respectively. A total of 36G data were obtained after quality control, and the Q30 base ratio was above 92.0%. In addition, the uniquely mapped rate was higher than 94.0%. All these indicators demonstrated that the transcriptomic data were highly accurate for subsequent analyses. Six differentially expressed CesA genes were identi ed from the transcriptome data (Table S1). KEGG pathway analysis showed that the CesA genes were enriched in signaling pathways, such as the ATP binding pathway (Fig. 1). represents the ratio of high-quality sequencing reads among all the sequenced reads. N (%) represents the percentage of fuzzy bases. Q30 (%) represents the percentage of bases whose base recognition accuracy is above 99.9%. Total_Mapped (%) represents the total number of clean reads mapped on the reference genome. Uniquely_Mapped (%) represents the total number of clean reads which uniquely mapped on the reference genome.

Classi cation And Phylogeny Of The Cesa Gene Family
In order to investigate the evolution of all CesA gene members, the BLASTp method was used to retrieve and identify the CesA genes in G. arboreum [13], G. raimondii [14] and G. hirsutum [15], and 70 CesA genes were nally identi ed based on the conserved structure of CesA (Table S2-S4). MEGA7 software was used to construct the evolutionary tree of CESA. The systematic structure showed that CesA was divided into seven subfamilies (Fig. 2), respectively (CesA1, CesA3, CesA4, CesA6, CesA7, CesA8 and CesA9). Among of them, 21 CesA6 genes (30.0%) and 16 CesA3 genes (22.8%) were discovered with a larger number of gene members, indicating that the subgroup CesA3 and CesA6 may be signi cant in cellulose-related physiological processes in Gossypium.
CesA3 and CesA6 genes are highly differentiated in dicot groups To study the evolutionary history of cotton CesA3 and CesA6 genes and the evolutionary relationship of other model species in more detail, we constructed a phylogenetic tree using the CDS sequences of CesA genes in Selaginella [16], grape [17], Arabidopsis [18,19] and cotton with the neighbor joining method.
The phylogenetic tree of four species indicated that all the genes can be categorized into several groups, and almost all AtCesA genes located at the most basal branches ( Fig   . 3a). MapChart software was used to draw the diagram of the position of cotton CESA on each chromosome (Table S5). The results showed four CesA3 genes and ve CesA6 genes were identi ed among seven chromosomes (Fig. 3b), with the gene length varying from 3.0 kb to 4.6 kb (Table S6). The CesA genes from other plants, such as Solanum lycopersicum and Arabidopsis thaliana, were so complicated, since it appeared that they are not clustered based on the organisms or gene subfamilies. However, the CesA3 genes (blue) and CesA6 genes (red) from G. hirsutum appeared to be most differentiated, since their position in phylogeny tree is the most advanced. Three CesA3 genes were clustered, indicating their relatively close relationship. However, Gohir.D13G163700.1, a member of CesA3, did not appear in concert with the other three CesA3 genes. So we can speculate that the genes from CesA3 and CesA6 are highly differentiated in cotton.
Gene structure and selection pressure of CesA3 and CesA6 during the evolutionary process The conserved motif structures were predicted to clarify the conserved structure of CesA3 and CesA6 genes that encode proteins. All the CesA3 and CesA6 genes from the three species of Gossypium (G. arboreum, G. raimondii and G. hirsutum) were queried to identify motifs in CesA3 and CesA6 to improve the quality of motif prediction. The three most likely motifs were regarded as the nal results (Fig. 4). The six predicted motifs are highly conserved, with E-values that range from 3.5E − 268 to 2.9E − 284 . The length of genes clearly vary (3.0 kb to 4.4 kb in CesA3, 3.8 kb to 4.5 kb in CesA6; Table S6), and all genes have the three motifs which are similarly distributed on the transcripts. The order of three motifs also remained unchanged among the CesA3 and CesA6 genes. To understand the evolutionary history of Gossypium arboretum CesA3 and CesA6 genes in more detail, the selective pressure (Ka, Ks, Ka/Ks ratio) of the CesA gene family was estimated in the evolutionary process using the kekeThecc1EG002092t1 [20] gene as a reference (Table S7). We statistically analyzed the amount of selective pressure (Table S5). The selection pressure of CesA genes during the evolutionary process is less than 1, i.e., indicating that they are subject to purifying selection.

Analysis of the expression patterns of CesA genes in Gossypium hirsutum
In order to research the function of CesA genes, expression pattern analysis was conducted. We selected two representative genes to study the CesA gene in more detail. The expression levels of the Gohir.A08G144300.1 and Gohir.D05G245300.1 genes in different tissues and ber development stages were detected at 5, 10, 15, 20 and 25 dpa using uorescence quanti cation method and the results were shown in Fig. 5. The results indicated that expression level of two genes in the ber was signi cantly higher than other tissues. Besides, expression levels of two genes at different ber development stages were analyzed. The Gohir.D05G245300.1 gene reached a peak at 25 dpa in ber development. However, the Gohir.A08G144300.1 gene reached a peak at 10 and 15 dpa in ber development and then decreased signi cantly. This gene was highly expressed throughout the entire ber period.
Gohir.A08G144300.1 gene may play important roles in cotton ber development To study the possible biological functions of CesA genes during the process of plant growth, we selected Gohir.A08G144300.1 to construct the CesA plant interference vector [21] and quanti ed the results of cotton phenotype (Fig. 6). The number and size of cotton bolls was reduced (Fig. 6a, b), with the average length decreasing by 20% from 35 mm (wild type, wt) to 28 mm. The growth of Gohir.A08G144300.1 bers was inhibited signi cantly compared with that of the control group (CK) (Fig. 6c-f). The ber length was shortened (Fig. 6e and f), with the average length decreasing by 13% from 25 mm to 18 mm. The variations of cotton bers may be caused by the changes in the cell volume and number.
Fiber cell glass method was used to study the cause of reduced bers Signi cant differences in cell volume and number and ber length were observed between silencing plants in which target gene Gohir.A08G144300.1 was silenced and normal plants. Fiber cell glass method was used to study the cause of reduced bers of the TRV-CaCesA knockout and the lm was observed under a 200X optical microscope (Fig. 6). The results suggested that this gene play an important role in ber growth.

Evolutionary analysis of the CesA genes
Cotton is a highly valuable resource plant species. Improving the yield and quality of its bers is one of the essential issues in the eld of cotton breeding. In recent years, the gradual improvements in technology have resulted in substantial progress in the study of developmental mechanisms for cotton bers [22][23][24]. This has been made possible by such new technologies as RNA-Seq and Chip-Seq, which enable studies of the regulation of transcriptomes, proteomes, metabolomes and transcription and translation. The mechanism of cotton ber development has been analyzed from many different perspectives, and a large number of ber-related genes have been excavated. The CesA gene family is an essential component of cellulose synthase and is simultaneously directly responsible for the process of production of cotton bers. In this study, we analyzed the cotton ber transcriptome and found that the CesA genes are widely involved in the process of cotton ber development. We focused on the phylogeny and function of the CesA genes. We also investigated the phylogenetic history of CesA genes in cotton (G. hirsutum) and examined A08G144300.1 at the functional level.
It can easily be deduced that the CesA3 and A6 genes from G. hirsutum are located on the most differentiated branches on the dicot phylogenetic tree (Fig. 2a), which suggests that there are strong links between the CesA genes and speci c characteristics in cotton. We used the evolutionary node plants S. lycopersicum, V. vinifera and Arabidopsis to examine the evolutionary history of CesA genes in more detail. We did not compare CesA3 Gohir.A08G144300.1 and CesA6 Gohir.D05G245300 with the genes in other species. Notably, we also found that these results are not caused solely by the chronological order of the species along the overall physiological history. Therefore, we concluded that Gohir.A08G144300.1 and CesA6 Gohir.D05G245300 are very import for cotton and retention it.

Functional Analysis Of The Cesa Genes
There are many members of the CesA transcription factor family that are largely involved in the regulation of plant growth and development, response to biological stress and other processes [25][26][27]. The AtCesA3 and AtCesA6 genes in A. thaliana are involved in the composition of primary cell wall and cellulose synthesis. With the advent of the post-genomic era, increasing numbers of functions for CesA gene have been veri ed [28][29][30][31]. However, CesA is rarely reported in genes that regulate the development of ber. In this study, a series of biochemical and functional identi cation experiments, such as the determination of gene structure and Ka/Ks and the quanti cation of uorescence, were conducted on CesA genes The structure of CesA gene family is highly conservative, and these genes appear to have been the subject of selection during the process of evolution. To study the potential function of the Cest genes in more detail, we constructed the A08G144300.1 gene interference vectors and found that the length of bers of the TRV-CmANT-1 and TRV-CesA.A08G144300.1 knockout strains was signi cantly reduced compared with those of the TRV control group. The ber length appears visually as signi cantly shorter. Similar results have been observed in other species, For example, when the CESA gene is mutated in Arabidopsis [32,33], the synthesis of cellulose is hindered. This leads to a decrease in the thickness of the cell wall, which causes a series of changes in cell morphology, that result in a new perspective on the study of signal pathways of A08G144300.1. In addition, the Gohir.D05G245300.1 gene was found to be highly expressed in the ber. This implies that this gene plays an important role in ber growth. We will present an advanced analysis of their detailed functions in a future report.

Conclusions
In this study, the phylogenetic history of CesA gene families, and the function of a representative gene Gohir.A08G144300.1 were investigated. CesA3 and CesA6 were two are the two largest subgroups in CesA gene family, and target gene Gohir.A08G144300.1 play an vital role in cotton ber development by functional analysis, providing more insights into the research of molecular research of CesA genes in cotton. Although we obtained primary functional information of the genes, more experimental and computational evidence is needed to fully elucidate the function of the genes and process of cotton ber growth.

Materials
A standard system for genetics is provided by the National Cotton Breeding Center of the Cotton Research Institute of the Chinese Academy of Agricultural Sciences (Zhengzhou, China). New roots, stems, young leaves, owers (during the owering period) and different stages of bers during development were frozen in liquid nitrogen and stored at -80°C until their RNA was extracted for an analysis of tissue expression.
Identi cation of the cotton CESA gene family All of the gene sequence and annotation information of the cotton CESA family was downloaded from the CFGD (http://www.cottonfgd.org/) database [34] under the accession number iPF03552 (Table S2-S4). Six cotton CESA protein domain sequences were used as probes to examine other families using the probe search method [35], and the results were compared with the Phytozome botanical genome database (https://phytozome.jgi.doe.gov/pz/portal.html#) (Addiyional le8). The e-value was set to 1 e-5 . The transcriptome data was completed by Shanghai Personalbio (Shanghai, China).
An Illumina HiSeq platform was used to sequence the genes. The raw data obtained from the computer was processed by quality control, ltering, comparing, and other analyses. Clustal X1.8 software was used to perform multiple sequence alignment of protein sequences, and the alignment results were analyzed using MEGA7 [36]. MapChart software was used to map the distribution of genes on chromosomes. The online software MEME [37] was simultaneously used to pre-analyze the conserved motifs of proteins using the D cotton CESA sequence, and KaKs_Calculator2.0 software [38] was used to calculate the selection pressure between collinearity genes.

Quantitative real-time PCR
The cotton plant was split into several parts, including the root, shoot, leaves, owers and bers. Fibers were collected at different stages (5, 10, 15, 20 and 25 d) after owering. The RNA was extracted separately. A quantitative real-time PCR experiment was conducted using TIANGEN RealUniversal Color PreMix (SYBR Green) (QKD-201, Tiangen Biotech, Beijing, China) following the manufacturer's instructions. GaHistone3 was used as the reference gene.

Para n sectioning
Three-day-old cotton bers were xed in a xing solution that contained 4% FAA (formaldehyde-glacial acetic acid-absolute ethanol) for 24 hours, dried under vacuum and then incubated overnight at 4°C. The samples were dehydrated using a series of gradient concentrations of ethanol (50%, 70%, 85%, 95% and 100%) with each gradient consisting of 30 min. The soaked tissues were embedded in liquid para n and then cooled at -20°C. The samples were cut into 4 μm-thick sections with a para n section base. The sections were suspended in a 40°C water bath to atten them before their placement on a glass slide.
They were dried overnight at 37°C. The sections were then stained with safranin and Fast Green, and photographed with a digital camera under a microscope.

Virus-induced gene silencing (VIGS) experiments
Full-length Gohir.A08G144300.1 was ampli ed from G. arboreum cDNA (Table S6). The EcoRI and KpnI sites of pTRV:RNA2 were used as cloning sites, and the target sequences were inserted. The recombinant vectors were transformed into Agrobacterium GV3101 competent cells following the manufacturer's instructions The Agrobacterium transformant was cultivated in liquid Luria-Bertani (LB) medium containing 25 μg/mL -1 rifampicin to an OD 600 from 1.8 to 2.2. The OD 600 of the medium was adjusted to 1.5 using buffer that contained 0.5 mol/L -1 MES, 200 mmol/L -1 acetosyringone and 1 mol/L -1 MgCl 2 for transfection. Liquid medium containing pTRV:RNA2-Gohir.A08G144300.1 and pTRV:RNA2 were mixed with pTRV at an equimolar ratio. They were then injected into cotton cotyledons at the three-leaf stage. Each group had 10 replications. The cotton plants were exposed to negative pressure and subsequently grown in the dark for 48 h. The plants were then moved to a greenhouse and grown under a 16-hour photoperiod for 30 d. Finally, they were grown under an 8-hour photoperiod to trigger the differentiation of ower buds. Ethics approval and consent to participate Not applicable.

Consent for publication
Not applicable.