2.1 Identification of Betula pendula cellulose cell wall synthesis genes
A total of 29,439 coding genes in the B. pendula genome were screened for putative cell wall synthesis genes. In total, 46 gene models (Table 1) belonging to 7 families were identified. These 46 genes encode 10 cellulose synthase proteins (CESAs) and 36 cellulose synthase-like proteins (CSLAs, CSLBs, CSLCs, CSLDs, CSLEs and CSLGs). CESA was the predominant cellulose synthase gene family, with ten members. The remaining six families are all cellulose synthase-like families; CSLG was the largest of these, with eleven members, while CSLA was the smallest, with only three. We then applied quantitative criteria based on transcript abundance and tissue specificity to determine which of these genes are likely to function in secondary cell wall synthesis. Using tissue-specific expression data from xylem, roots, leaves and flowers, we calculated the expression of the 46 identified genes. A total of 8 genes were expressed at a higher level in xylem than in both flower and leaf. These genes were designated secondary cell wall synthesis genes: BpCESA4, BpCESA9, BpCESA10, BpCSLA2, BpCSLA3, BpCSLC1, BpCSLC4 and BpCSLD4.
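The selection rule described above (xylem expression higher than in both leaf and flower) can be illustrated with a minimal sketch. The gene names, expression values and the function name `xylem_enriched` are hypothetical placeholders, not data or code from this study, and the actual analysis may have applied additional abundance thresholds.

```python
# Hypothetical expression values (e.g. FPKM) per tissue; NOT data from this study.
expression = {
    "GeneA": {"xylem": 120.0, "root": 80.0, "leaf": 10.0, "flower": 5.0},
    "GeneB": {"xylem": 3.0, "root": 2.0, "leaf": 40.0, "flower": 1.0},
    "GeneC": {"xylem": 15.0, "root": 30.0, "leaf": 9.0, "flower": 20.0},
}


def xylem_enriched(expr, reference_tissues=("leaf", "flower")):
    """Return gene IDs whose xylem expression exceeds every reference tissue."""
    return sorted(
        gene
        for gene, tissues in expr.items()
        if all(tissues["xylem"] > tissues[t] for t in reference_tissues)
    )


candidates = xylem_enriched(expression)
print(candidates)
```

Root expression is deliberately left out of the comparison, mirroring the criterion stated in the text.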
2.2 Chromosomal location and gene duplication
The cellulose synthase complex mainly comprises cellulose synthases (CESAs) and cellulose synthase-like proteins (CSLs), so we investigated how the CESA and CSL families arose, using chromosomal location and intra-genome synteny information. Similar to A. thaliana, the BpCESA and BpCSL genes were scattered across the B. pendula genome and mapped to 13 of the 14 chromosomes (Figure 1). The BpCESAs were located on Bpe_Chr6, Bpe_Chr7, Bpe_Chr8, Bpe_Chr9, Bpe_Chr10 and Bpe_Chr11, with one or two genes per chromosome. The BpCSLs were distributed over 13 chromosomes, with none on Bpe_Chr5, and some BpCSLs were organized into duplicated blocks, such as BpCSLB1-7 on Bpe_Chr2, BpCSLG2-7 on Bpe_Chr14 and BpCSLG8-10 on Bpe_Chr1. Such arrangements are commonly attributed to duplicative transposition.
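As an illustration of how blocks of same-family genes on one chromosome might be flagged, the following sketch groups genes by chromosome and subfamily and reports runs of neighbours lying within a fixed distance. This is a simple proximity heuristic, not the synteny analysis used in this study; the gene records, coordinates, the 100 kb `max_distance` and the function name `find_clusters` are all assumptions made for the example.

```python
from itertools import groupby


def find_clusters(genes, max_distance=100_000, min_size=2):
    """Group same-subfamily genes on one chromosome into proximity clusters.

    genes: iterable of (gene_id, subfamily, chromosome, start) tuples.
    Returns a list of clusters, each a list of gene IDs in positional order.
    """
    ordered = sorted(genes, key=lambda g: (g[2], g[1], g[3]))
    clusters = []
    for _, group in groupby(ordered, key=lambda g: (g[2], g[1])):
        current = []
        for gene in group:
            # Start a new cluster when the gap to the previous gene is too large.
            if current and gene[3] - current[-1][3] > max_distance:
                if len(current) >= min_size:
                    clusters.append([g[0] for g in current])
                current = []
            current.append(gene)
        if len(current) >= min_size:
            clusters.append([g[0] for g in current])
    return clusters


# Hypothetical records; identifiers and coordinates are invented for illustration.
genes = [
    ("g1", "CSLB", "Chr2", 1_000),
    ("g2", "CSLB", "Chr2", 21_000),
    ("g3", "CSLB", "Chr2", 45_000),
    ("g4", "CSLB", "Chr2", 900_000),
    ("g5", "CSLG", "Chr1", 5_000),
    ("g6", "CESA", "Chr2", 30_000),
]
clusters = find_clusters(genes)
print(clusters)
```

The heuristic ignores unrelated genes lying between family members, so a flagged block would still need to be checked against the synteny data.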
2.3 Cellulose synthase (CESA) gene family
Cellulose is the principal component of the cell walls in B. pendula, and its microfibrils are formed by the crystallization of 36 hydrogen-bonded β-1,4-Glc chains synthesized by cellulose synthases. Cellulose synthase (CESA) is therefore one of the indispensable glycosyltransferases in plants and plays a crucial role in regulating cell wall cellulose synthesis and plant cell morphogenesis. We identified 10 BpCESAs in the B. pendula genome, of which BpCESA4, BpCESA9 and BpCESA10 were abundant in xylem (Figure 2). Within the CESA family, BpCESA4 was the most highly expressed gene in both root and xylem. The protein most similar to BpCESA4 was AtCESA4 of Arabidopsis thaliana, which encodes a cellulose synthase and has been associated with resistance to bacterial and fungal pathogens. The protein most similar to BpCESA9 and BpCESA10 was AtCESA8 of A. thaliana.
2.4 Cellulose synthase-like (CSL) gene family
The cellulose synthase-like (CSL) gene family was divided into six families: CSLA, CSLB, CSLC, CSLD, CSLE and CSLG. The functions of the CSL family are still being explored, although a substantial number of studies have been published in recent years. Jensen et al. reported that CSL genes are associated with hemicellulose synthesis, and Schreiber et al. and Doblin et al. reported that the cellulose synthase-like proteins CSLF and CSLH mediate the synthesis of cell wall (1,3;1,4)-β-D-glucans. The functions of the vast majority of CSL genes, however, require further study.
We identified 36 BpCSLs in the B. pendula genome, of which 5 were abundant in xylem (Figures 2 and 3): BpCSLA2, BpCSLA3, BpCSLC1, BpCSLC4 and BpCSLD4. Both BpCSLA2 and BpCSLA3 were most similar to AtCSLA9, and BpCSLD4 was most similar to AtCSLD6 of A. thaliana. In addition, the protein most similar to BpCSLC1 was AtCSLC4 of A. thaliana, which encodes a protein similar to cellulose synthase and whose mRNA is cell-to-cell mobile.
2.5 Involvement of transcription factors in cell wall synthesis
Based on transcriptome sequencing data, we performed a co-expression analysis between the putative cell wall synthesis genes and 2,816 transcription factors (Table S1) of B. pendula. A total of 51 transcription factors were co-expressed with 6 cell wall synthesis genes: BpCESA4, BpCESA9, BpCSLA2, BpCSLC1, BpCSLC4 and BpCSLD4 (Figure 4).
BpCSLC1 was co-expressed with the largest number of transcription factors, 27 in total, including ARF, IAA and several other auxin-related transcription factors. BpARF6 was most similar to AtARF17, and BpIAA16 was most similar to AtIAA16, which has transcriptional wiring with cell wall-related genes in A. thaliana. BpCESA4 and BpCESA9 were also co-expressed with each other, and 13 transcription factors were co-expressed with both of these cellulose synthase genes and may regulate them. Among them, BpMYB-HB162 was most similar to AtMYB83, and BpNAM69 was most similar to AtNAC43 (NST1) of A. thaliana, which is known to be involved in cellulose synthesis.
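The co-expression screen described in this section can be sketched as a pairwise correlation between transcription factor and target gene expression profiles. The text does not state which correlation measure or cutoff was used, so the Pearson coefficient, the 0.9 threshold, the function names and the example profiles below are assumptions for illustration only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / (sd_x * sd_y)


def coexpressed_pairs(tf_expr, target_expr, threshold=0.9):
    """Return (tf, target, r) for pairs whose |r| meets the assumed threshold."""
    pairs = []
    for tf, tf_values in tf_expr.items():
        for target, target_values in target_expr.items():
            r = pearson(tf_values, target_values)
            if abs(r) >= threshold:
                pairs.append((tf, target, round(r, 3)))
    return pairs


# Hypothetical profiles across four samples; NOT data from this study.
tf_expr = {"TF1": [1.0, 2.0, 3.0, 4.0], "TF2": [4.0, 1.0, 3.0, 2.0]}
target_expr = {"GeneA": [2.0, 4.0, 6.0, 8.0]}

pairs = coexpressed_pairs(tf_expr, target_expr)
print(pairs)
```

Correlation alone indicates shared expression patterns rather than direct regulation, which is why candidate regulators identified this way are usually validated experimentally.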