Integrating ATAC-seq and RNA-seq to identify differentially expressed genes with chromatin-accessible changes during photosynthetic establishment in Populus leaves

Leaves are the primary photosynthetic organs, providing essential substances for tree growth. It is important to obtain an anatomical understanding and regulatory network analysis of leaf development. Here, we studied leaf development in Populus Nanlin895 along a development gradient from the newly emerged leaf from the shoot apex to the sixth leaf (L1 to L6) using anatomical observations and RNA-seq analysis. It indicated that mesophyll cells possess obvious vascular, palisade, and spongy tissue with distinct intercellular spaces after L3. Additionally, vacuoles fuse while epidermal cells expand to form pavement cells. RNA-seq analysis indicated that genes highly expressed in L1 and L2 were related to cell division and differentiation, while those highly expressed in L3 were enriched in photosynthesis. Therefore, we selected L1 and L3 to integrate ATAC-seq and RNA-seq and identified 735 differentially expressed genes (DEGs) with changes in chromatin accessibility regions within their promoters, of which 87 were transcription factors (TFs), such as ABI3VP1, AP-EREBP, MYB, NAC, and GRF. Motif enrichment analysis revealed potential regulatory functions for the DEGs through upstream TFs including TCP, bZIP, HD-ZIP, Dof, BBR-BPC, and MYB. Overall, our research provides a potential molecular foundation for regulatory network exploration in leaf development during photosynthesis establishment. Anatomy and RNA-seq of Populus leaves revealed the crucial stage for establishing photosynthesis. Integrating ATAC-seq and RNA-seq identified differentially expressed genes and their potential upstream transcription factors in this stage.


Introduction
Populus is a model system for plant biology, growing in a completely different ecology to that of Arabidopsis thaliana, and current research has mainly focused on wood formation, vascular development, lignin biosynthesis, flowering, and seasonality (Jansson and Douglas 2007;Chao et al. 2019;Andre et al. 2022;Quan et al. 2019).Indeed, the carbon assimilation products produced by leaf photosynthesis are transported to the stem through the vascular system and are the source of biomass for woody plants (Cano-Delgado et al. 2010;Taku Demura 2010).However, research on leaf photosynthetic establishment in woody plants is not clear.
The leaf, the primary photosynthetic organ, is designed to effectively capture light and exchange gases, resulting in high photosynthetic efficiency, and providing the basis for the growth of the whole plant.Although numerous studies have demonstrated that leaf anatomy is inseparable from photosynthesis, the molecular mechanisms are still poor understanding for optimizing leaf anatomy (Theroux-Rancourt and Gilbert 2017; Terashima et al. 2011;Tholen et al. 2012).Therefore, there is no doubt that exploring potential target genes for engineering leaf architecture is an effective strategy for enhancing photosynthetic capacity (Ren et al. 2019).
The leaves have a complex architecture.A typical C 3 leaf consists of two epidermal layers, with densely arranged columnar cells adjacent to the upper epidermis (palisade layer) and irregularly shaped cells adjacent to the lower epidermis (spongy layer).There are also two groups of specialized cells forming powerful internal conduit cells (vascular cells) traversing through the mesophyll: the xylem and phloem.Cells within the epidermis also have specialized functions.Lenticular cells in the upper epidermis can focus the incident light several times, and tubular palisade cells, which are oriented at right angles, enhance the penetration of light.Loosely arranged, irregularly shaped spongy cells facilitate scattering of light to enhance its absorption, and the extensive intracellular spaces in this layer promotes gas exchange (Beck 2010;Evans 1999;Martin et al. 1989;Poulson and Vogelmann 1990;Ustin et al. 2001).The development of these different cell types must be coordinated.For example, the development of the leaf vein, which provides a pathway for photosynthates produced by the mesophyll to be transported to the stem, is coordinated with the development of mesophyll (Scarpella et al. 2004).Previous studies have found that changing the geometry and packing of palisade cells greatly affects the diffusion of internally transmitted light (Cui et al. 1991).Other potential ways of altering leaf anatomy to optimize leaf photosynthetic capacity include manipulating mesophyll cell density to increase the surface area of mesophyll cells and chloroplasts exposed to the intercellular air space (Ren et al. 2019).
In previous studies of dicotyledons, the development of the growing leaf has been divided into three main stages: primordium initiation, primary morphogenesis, during which cells proliferate, and secondary morphogenesis, during which cells expand (Donnelly et al. 1999;Andriankaja et al. 2012;Xu et al. 2013).Confirming the onset of photosynthesis in Populus leaves enables us to find the target genes that alter the leaf anatomy.In the model plant Arabidopsis, the transition from primary to secondary morphogenesis occurs in the third leaf, when retrograde chloroplast signaling induces cell expansion and the leaf becomes capable of photosynthesis (Andriankaja et al. 2012).The early anatomical studies in Populus showed photosynthetic products can only accumulate once the leaves have completed the transition from primary to secondary morphogenesis, during which the leaf switches from a sink to a source (Larson et al. 1980).A Populus leaf at leaf plastochron index (LPI) 0.0 has recently emerged from the apex, and its lateral margins are folded lengthwise for about one-half the lamina length, whereas a leaf at LPI 6.0 has a fully expanded lamina and is considered mature and have completed the transition from the sink to source status (Larson andIsebrands 1971, 1974).Consequently, we defined LPI 0.0 as the first leaf and selected the first leaf to the sixth leaf (L1 to L6) along a development gradient to confirm the onset of photosynthesis in Populus leaves.
A variety of key genes that regulate cell proliferation and expansion have been discovered in other species, which provides more possibilities for manipulating leaf anatomy to enhance photosynthetic capacity.For example, GREEN FOR PHOTOSYNTHESIS (GPS), a partial loss-of-function allele of NARROW LEAF1 (NAL1) in rice (Oryza sativa L), was found to increase the number of mesophyll cells between vascular bundles, leading to increased photosynthesis rates (Takai et al. 2013).Promotion of KIP-RELATED PROTEIN 1 (KRP1) increased palisade cell density thus promoting the proportion of light energy used (Lehmeier et al. 2017).Overexpression of AINTEGUMENTA (ANT) gene prolongs cell proliferation and results initiation of new veins in Arabidopsis leaves, which is correlated with leaf photosynthetic capacity (Kang et al. 2007;Feldman et al. 2017).Generally, cell proliferation and cell expansion are coordinated, and an increase in cell numbers leads to a decrease in cell volume to maintain appropriate leaf size (Horiguchi and Tsukaya 2011).For instance, although RETINOBLASTOMA 1 3 RELATED PROTEIN 1 (RBR1) in Arabidopsis leads to the generation of more and smaller cells in the mesophyll, thereby increasing photosynthetic capacity, it results in transgenic lines that were smaller than wild type (Lehmeier et al. 2017).Hence, the target genes that regulate leaf anatomy require more accurate and extensive research, and the analysis of the regulatory networks of target genes probably facilitates us to precisely manipulate leaf anatomy in the future.
Chromatin structure plays a pivotal role in the precise control of gene expression.In eukaryotic nuclei, accessible chromatin regions are considered to be cis-regulatory DNA elements (e.g.promoters and enhancers) (Berger 2007).The binding of TFs to cis-regulatory DNA elements primarily controls the activation or repression of expression of nearby target genes (Du et al. 2013;Lu et al. 2017;Felsenfeld 1995;Boyle et al. 2008).The process of leaf photosynthetic establishment leads to differential expression of genes.However, the cis-regulatory elements and TFs that control these changes in the regulatory network remain unclear.The assay for transposase-accessible chromatin (ATAC)-seq is an emerging method for mapping chromatin accessibility genome-wide (Buenrostro et al. 2015).This method is fast and sensitive, superior to DNase-Seq, and has been used in several species.For example, ATAC-seq has been used to profile differences in chromatin accessibility and TF regulatory networks between stem cells of the shoot apical meristem and differentiated leaf mesophyll cells in Arabidopsis (Sijacic et al. 2018).In rice, ATAC-seq has been used to generate a genome-wide accessible chromatin map and identify novel transcriptional regulatory modules involved in fertility transition (Jin et al. 2020).Cross-species studies using ATAC-seq data from Arabidopsis, Medicago truncatula, Solanum lycopersicum, and rice revealed that the cis-regulatory structures of the four plant genomes are remarkably similar and that TF target gene modules are generally conserved across species (Maher et al. 2018).Overall, ATAC-seq is widely useful for the identification of the open chromatin regions and can contribute to the discovery of cis-elements and the TFs that bind them.
In our study, we first combined anatomical observation and RNA-seq analysis along the developmental gradient of poplar from the newly emerging leaf to the sixth leaf (L1 to L6).These analyses revealed that the critical period of photosynthesis establishment is L1 to L3. Next, we selected L1 and L3 integrating ATAC-seq with RNA-seq to explore the genes with changes in accessible chromatin regions.We obtained 735 differentially expressed genes whose promoter accessibility was changed in the development from L1 to L3.Meanwhile, motif enrichment analysis found 32 upstream TFs.The differentially expressed genes and the upstream TFs may be related to leaf development during photosynthetic establishment in Populus.

Plant sample collection
Populus deltoides × P. euramericana cv 'Nanlin895' was grown in a phytotron with a light and dark cycle of 16 and 8 h at 23 °C under a light density of 150 μE/m 2 /s.Leaf samples were collected from the first leaf (L1), the second leaf (L2), the third leaf (L3), the fourth leaf (L4), the fifth leaf (L5), and the sixth leaf (L6) of plants.Each sample was collected from 10 plants, and three biological replicates were performed on L1-L6 leaf tissue, for a total of 30 plants.Six mixed samples from 30 plants were select for RNA-Seq, and then 2 mixed samples of L1 and L3 from 30 plants for the integrated ATACseq and RNA-seq analysis.Net photosynthetic rate was measured by gas-exchange and fluorescence system (GFS-3000, Walz, Germany) from L1 to L6.

RNA isolation, library preparation, and sequencing
Total RNA from pulverized leaf samples was isolated using the RNAprep Pure Plant Plus Kit (TIANGEN, China).
On-column DNase digestion was performed according to the manufacturer's protocol.RNA degradation and contamination were evaluated on 1% agarose gels and a Nanodrop 1000 spectrophotometer (Nanodrop, Wilmington, DE, USA).RNA integrity number and concentration were checked using an Agilent 2100 Bioanalyzer (Agilent Technologies, Inc., Santa Clara, CA, USA).At least 2 μg high-quality RNA per sample was used for the preparation of sequencing libraries using the NEBNext® Ultra™ RNA Library Prep Kit for Illumina® (NEB, United States).Sequencing of libraries was performed using standard methods on an Illumina HiSeq X Ten platform.

Transcriptome sequencing (RNA-seq) data processing
We cleaned the raw data by removing reads with adaptor sequences, unknown nucleotides > 5%, or Q20 < 20% using Perl Script.The clean reads were then mapped to the Populus trichocarpa reference genome v3.0 using Tophat2 software (Trapnell et al. 2009).Gene expression levels were estimated by FPKM values (Fragments per Kilobase of transcript per Million fragments mapped) using the HTseq software (Trapnell et al. 2010).DEGs were identified using the DESeq2 R package (Michael et al. 2014).The resulting P values were adjusted using Benjamini and Hochberg's approach to control the false discovery rate (FDR) (Hochberg 1995).Genes with an FDR ≤ 0.01 and |log2(FoldChange)|≥ 1 were considered to be differentially expressed.
K-means clustering was conducted based on the Pearson correlation coefficients for gene expression profiles (Walvoort et al. 2010).Gene ontology (GO) enrichment analysis was performed using topGO (Alexa et al. 2006) with Fisher's exact test.Three GO categories: biological process (BP), molecular function (MF) and cellular component (CC) were annotated by the Gene Ontology database (Ashburner et al. 2000).And the negative log10 transformed P values were visualized using a heatmap as previously described (Xue et al. 2013).

ATAC-seq library preparation
Intact cells were isolated and purified from L1 and L3 leaf samples following a previous protocol (Wilkins et al. 2016).Three biological replicates were prepared for each sample.In brief, leaf tissues were cut into pieces with a sharp blade in lysis buffer and suspended in a homogenous single-cell suspension.After filtration with a 40 μm cell strainer, the homogeneous solution was placed on top of a 60% Percoll solution and centrifuged at 1500g for 30 min.The intermediate phase of the solution after centrifugation was extracted and then diluted with an equal volume of lysis buffer.The diluted suspension was incubated at 4 °C on a rotation mixer for 10 min.Cell pellets were harvested by centrifuging at 800g for 10 min and washed with PBS three times.
Approximately 50,000 purified and intact nuclei were used to perform tagmentation according to standard protocols (Corces et al. 2017), and Tn5 transposed DNA was purified using AMPure DNA magnetic beads.A qPCR reaction was performed on a subset of the DNA to determine the optimum number (average 11) of PCR cycles, and amplified libraries were run on an Agilent Tapestation 2200 (Agilent Technologies) using a D5000 DNA ScreenTape to assess quality by visualizing nucleosomal laddering.The final library was sequenced on the Illumina HiSeq X Ten platform (San Diego, CA, United States) with the 150 PE mode.

ATAC-seq data processing
Prior to data analysis, quality control steps were performed.Trimmed reads were aligned to reference genome using Bowtie2 (version 2.2.6) (Langmead et al. 2012).Reads mapping to the mitochondrial genome were removed using removeChrom (https:// github.com/ jsh58/ harva rd), and PCR duplicates were removed using Picard (version 1.126) (http:// broad insti tute.github.io/ picard/).We checked the insert size distribution of sequenced fragments to evaluate ATAC-seq data, when the insert size distribution had clear periodicity of approximately 200 bp, it suggested many fragments are protected by integer multiples of nucleosomes.We also checked Transcription Start Site (TSS) Enrichment Score to evaluate ATAC-seq data, the TSS enrichment calculation is a signal to noise calculation.if there is high read signal at transcription start sites (highly open regions of the genome) there should be an increase in signal up to a peak in the middle.We take the signal value at the center of the distribution after this normalization as our TSS enrichment metri.
Call peaks on replicates, self-pseudo-replicates using MACS2 (-nomodel -extsize 200 -shift -100) (Zhang et al. 2008).To identify a set of high-to medium-confidence peaks, we selected peaks that overlapped across replicates, between each sample, and between pseudo-replicates from each sample.We ran an irreproducibility discovery rate (IDR) assessment on peaks between each sample and self-pseudo-replicates with an IDR threshold set to 0.05.All called peaks from all samples were merged to acquire a consensus peak set.The R package DESeq2 was utilized to detect potential different accessible regions (DARs) from the consensus peak set.Statistically significant DARs were defined as peaks with log2 fold enrichment > 1 and FDR < 0.05.Motif analysis of DARs was performed using the MEME Suite (Heinz et al. 2010) with default settings.Motifs were only kept when the P value was < 0.01.GO enrichment analysis was performed using the R package 'Cluster Profiler' with default settings.All sequencing tracks were viewed using the Integrated Genomic Viewer (IGV 2.3.61).We classified the DARs present as more open in L3 as gain DARs and the DARs found more open in L1 as loss DARs.

Quantitative real-time PCR
To verify the RNA-seq results, qRT-PCR reactions were performed using SYBR Premix Ex Taq (Takara, Shiga, Japan) on a LightCycler 480 system (Roche, Basel, Switzerland).The PCR protocol was as follows: 95 °C for 30 s of initial denaturation; 30 cycles of 95 °C for 10 s, 60 °C for 20 s, and 72 °C for 20 s; and a final extension step at 72 °C for 20 s.We used PtrActin as the internal control, and the relative gene expression was calculated using the 2 −ΔΔCt method.

Anatomical analysis of Populus leaves
We selected P. deltoides × P. euramericana cv.'Nanlin895' cultivar as a model to elucidate the microscopic changes of cells during the development of woody plant leaves, because it is widely grown in southern China, and it is fast growing and easily transformed (Song et al. 2011;Zhu et al. 2013).We first measured the net photosynthetic rate (Pn) of a newly emerged leaf at LPI 0.0 (hereafter referred to as L1) and five successive leaves below the L1 (referred to as L2, L3, L4, L5, and L6), as described in a previous study (Larson and Isebrands 1971) (Fig. S1).The data showed a rapid increase in net photosynthetic rate from L2 to L5 (Fig. 1p).To further investigate the changes in leaf anatomy during the rapid increase in net photosynthetic rate, we performed semi-thin sections, TEM, and SEM observation of L1 to L5 (Fig. 1).
Observation of semi-thin sections revealed that in L1 the cell layers were complete and regularly arranged in a tight and uniform cell size (Fig. 1a).There was no obvious structural differentiation of the various types of cells in L1, except that the basic structure of main vein of leaf was visible (Fig. 1a).And the primary veins in the leaf blade showed obvious signs of xylem and phloem differentiation in L2 (Fig. 1b).The differentiation of palisade tissue and spongy tissue appeared in L3, and there were obvious intercellular spaces (Fig. 1c-e).Images of the base, middle and tip of the leaf showing the pattern of developmental maturation from the tip to the base (Fig. S2).
To dissect the changes occurring during the formation of various types of cells in poplar leaves, we conducted TEM and SEM of the mesophyll cells and lower epidermis of poplar leaves at different stages of development.Based on TEM observations, numerous small vacuoles were observed in the cytoplasm of L1 (Fig. 1f), which then fused into a large central vacuole (Fig. 1g-j, q).The chloroplast gradually moved toward the periphery of the cell as the vacuoles fused (Fig. 1g, h).
During photosynthesis, CO 2 needs to enter the leaf intercellular space through stomata and eventually enter the chloroplast.Therefore, the development of stomata is crucial for photosynthesis.Previous study showed epidermal cells divide through the stomatal lineage to produce stomata and pavement cells simultaneously, which is a sign of epidermal maturation (Bergmann and Sack 2007).SEM observation of the morphology of abaxial epidermis cells showed that the cells in L1 were dense with rare stomata (Fig. 1k).Cells in L2 showed fewer stomata cells than in L3-L5 (Fig. 1l-o, r).In addition, the epidermal cells of L3 began to expand to form pavement cells, the general type of cells in the epidermis (Fig. 1m, s).And in both L4 and L5, the morphology of the pavement cells were completed (Fig. 1n, o).Through observation of the leaf base, middle, and tip, we found that the tip cells matured earlier than the base cells in the same leaf, and the tip cells of L3 had significantly more stomata than the base cells and that the epidermal cells already had a typical cell morphology (Fig. S4).
In conclusion, the anatomy of the poplar leaves are well equipped for photosynthesis after L3 and the leaf develops from the apical to basal end.

Sequencing of the Populus transcriptome using RNA-seq
To reveal the potential molecular mechanisms that regulate leaf development in 'Nanlin 895', we sequenced the transcriptomes of L1 to L6 of Populus using high-through RNA-seq.Each sample was set up with three biological replicates, and a total of 130.38 Gb of clean data were obtained.More than 93% of the reads were successfully mapped to the Populus genome (Phytozome, P. trichocarpa Pt_v3.0)(Table S1).We identified 15,877 DEGs, of which 13,502 (85.07%) were annotated in the 'Nanlin895' annotation database; these DEGs that were selected for subsequent analysis (Table S2).Pearson's correlation coefficients were higher between biological replicates than between samples, indicating that specific genes are expressed during different stages of leaf development (Fig. S5).
To better investigate the expression patterns of these DEGs in different leaves, we performed K-means clustering, and 13,502 annotated DEGs were classified into six clusters (K1-K6) (Fig. 2a and Table S3).Most genes (4256 genes) were clustered in the K5 cluster; these genes were predominantly expressed in L1 with expression decreasing rapidly from L1 to L3.We propose that these DEGs are probably involved in the rapid division and differentiation of mesophyll cells and vascular cells, which were observed in semi-thin sections of L1 (Fig. 1a-e).Expression levels of DEGs from the K3 cluster (2197 genes) were highest in L1 and L2 and gradually decreased from L2 to L4, which was similar to the expression pattern of genes in K5.The expression levels of DEGs in the K2 cluster (1247 genes) increased progressively from L1 to L3 and then rapidly decreased, possibly indicating that these genes are involved in chloroplasts development, which was observed in TEM analysis (Fig. 1f-j).In contrast, the expression levels of DEGs in K1 (1648 genes) gradually increased from L1 to L3 but reached a steady state in L3-L6.The expression levels of DEGs from the K4 cluster (2694 genes) increased gradually from L1-L6 and reached the highest level in L5-L6, while the expression levels of K6 genes (1460 genes) increased sharply from L3 to L4 and decreased slightly at L4-L6.
To further understand the biological functions of the DEGs in different leaves, we performed GO enrichment analysis of each cluster of co-expressed genes using genes in the updated annotation as the background (Fig. 2b and Table S4).GO enrichment analysis indicated that genes with peak expression in L1 (K5) are involved in "phloem or xylem histogenesis" and "cell cycle", and genes with peak expression in L1 and L2 (K3) are involved in "cell differentiation".For example, cell cycle related genes, namely Anaphase-promoting complex/Cyclosome 11 (APC11), APC6, and APC8 were predominantly expressed in L1 (K5).APC is a multisubunit E3 ubiquitin ligase that ubiquitinates different regulatory proteins during the cell cycle and degrades them through the 26S protein degradation system, and down-regulation of APC11 expression prevents entry into the G2/M phase of cells (Shi and Huo 2012).ARABIDOPSIS THAL-IAN HOMEOBOX8 (ATHB8), ATHB15, SHORT-ROOT (SHR), and SCARECROW (SCR) have similar expression patterns as APC.ATHB8 and SHR are required for the transition of the ground cells to preprocambial cells, which presage vein formation (Gardiner et al. 2011).SHR and SCR Given the immovable nature of plant cells, plant growth and development are reliant on cell expansion rather than cell migration.The new primary cell walls are born within the cells as the cells proliferate and the surface area increases rapidly as the cells expand, which consists of cellulose microfibrils, hemicellulose xyloglucan, and glycoproteins spread throughout the pectin matrix (Ochoa-Villarreal et al. 2012).GO enrichment analysis showed that genes with expression peaking at L1 (K3) and L3 (K2) are enriched in the biological process term "cell wall organization or biogenesis" and the cell component term "cell wall".The genes for Expansin (EXPA), Xyloglucan endotransglucosylase/hydrolase (XTH), and Pectinesterase (PME), showed an important role in cell expansion.Expansins catalyze cell wall loosening without lysis of wall polymers to mediate acid-induced growth (Cosgrove 2015).For example, the expression of EXPA1, EXPA2, and EXPA15 gradually increased from L1 to L3, peaking at L3 and decreasing from L4 onwards (K2).XTH directly regulates the loosening of cell walls and the ability of cells to expand, and XTH8, 9, and 16 also have the same expression pattern in leaves as the EXPA genes in our study, which have been reported to be highly expressed in young to mature rosette leaves of Arabidopsis thaliana (Weraduwage et al. 2018).PME proteins harden cell walls by catalyzing the release of pectins and methanol through demethylesterification of cell wall methylated galacturonic acid (Oikawa et al. 2011).In our study, PME1, PME3, PME22, PME34, and PME59 were peak expressed L3 (K2).Above all, L2 and L3 are probably the key periods of rapid cell expansion.
A previous study showed that in 24 poplar species, the chloroplast genome encodes 130 genes, including 30 genes encoding tRNA and 4 genes encoding rRNA (Zong et al. 2019).The ribosomes required to synthesize the proteins encoded by the chloroplast genome are different from cytoplasmic ribosomes in that they have a 70S sedimentation coefficient and are composed of 30S and 50S subunits, a structure very similar to that of prokaryotic ribosomes.The expression levels of chloroplast ribosomal genes was high in L1 and L2 and decreased beginning at L3 (K3).These genes  and L6).b Gene ontology enrichment analysis of the six clusters (K1-K6).Yellow to red, significant enrichment; white, not significant included Ribosomal protein large subunit (RPL), Ribosomal protein small subunit (RPS), and Plastid-specific ribosomal protein (PSRP).In contrast to these chloroplast ribosomal protein genes, the expression levels of cytoplasmic ribosomal subunits (40S and 60S subunits) and translation elongation factors involved in the translation of genes encoded by the nuclear genome were highest expressed in L1 (K5) and decreased rapidly from L2.The translation of nuclear gene-encoded proteins precedes that of chloroplast genomeencoded proteins suggesting that the complete differentiation of the cells possibly precedes that of the chloroplast.In addition, Epidermal patterning factor 1 (EPF1) and EPF2 were highly expressed in L1 and L2 and began to decrease in expression after L2 (K3), which are negatively regulate stomatal development early in Arabidopsis leaf development (Hunt et al. 2010).Consistent with the SEM observation, we speculate that L2 to L3 is probably the critical period of stomatal emergence.
High expression of these genes in the Populus leaves, suggests that different cell fate decisions may be completed at L1 and rapid cell expansion probably occurred in L2 and L3.Furthermore, the full differentiation of chloroplasts may be later than the full differentiation of cells, and transcriptome data showed genes involved in the establishment of the photosynthetic machinery peak expressed in L3.Those evidences above were consistent with our anatomical observations of the leaf using semi-thin section, SEM, and TEM methods.Thus, we selected L1 and L3 for further research.

Integration of ATAC-seq and RNA-seq data identifies DAR associated DEGs in open chromatin regions
TFs occupying cis-regulatory elements in the open chromatin regions of the genome play a vital role in the regulation of nearby gene expression.To identify genes that regulate the establishment of photosynthesis and the TFs that regulate these genes, we selected L1 and L3 leaf tissues to perform joint ATAC-seq and RNA-seq analysis to explore DEGs and their upstream TFs.Each sample consisted of three biological replicates, and the data from each sample were highly correlated according to Pearson's correlation coefficient, which demonstrated that there was a high degree of reproducibility (Fig. S7AB).The accessible regions identified were mostly enriched near the transcription Start Site (TSS) in the genomes indicated the good quality of the ATAC-seq (Fig. 3a).An average of 113.2 million clean RNA-seq reads were obtained and mapped to the P. trichocarpa reference genome (Table S5).We obtained a total of 11,872 DEGs through pairwise sample comparisons, namely 5908 up-regulated DEGs and 5967 down-regulated DEGs (Table S6).ATAC-seq analysis of the same samples yielded about 38.1 MB clean reads that mapped to the P. trichocarpa reference genome after removing organelle sequences (Table S7).We identified 5605 DARs, namely 2263 gain DARs (DARs present as more open in L3) and 3342 loss DARs (DARs present as more open in L1), through pairwise comparison of DARs using DESeq2 software (Tables S9, S10).
We compared the genomic distribution of ATAC-seq peaks and RNA-seq reads in L1 and L3 using an integrative genomics viewer (IGV).ATAC-seq peaks and RNA-seq reads map to the same regions in L1 and L3 revealed that they have a similar consistency (Fig. S8).The genome-wide functional regions were divided into promoter, gene body (5'UTR, 3'UTR, exon, and intron), downstream of gene body, and distal intergenic regions (Fig. 3b).Gene annotation of the DARs to find the gene represented by the TSS closest to the centre of the DAR.Compared to L1, gain DARs associated to 2090 genes and loss DARs associated to 3053 genes.A previous study reported that promoters are associated with accessible chromatin in plant genomes (Du et al. 2013).We identified 903 genes with increased accessibility in the promoter region (35.45%) and 938 genes with loss of accessibility in the promoter region (23.82%) (Fig. S7CD).Next, we examined whether genes with different accessibility in the promoter region were related to DEGs.A total of 311 up-regulated DEGs and 79 down-regulated DEGs were found in 903 genes with increased accessibility in the promoter region (gain-up and gain-down), while 93 up-regulated DEGs and 252 down-regulated DEGs were found in 938 genes with loss of accessibility in the promoter region (loss-up and loss-down) (Fig. 4a-d and Table S11).Among significant DAR-DEG pairs in the gain-up, gaindown, loss-up, and loss-down categories, change in RNAseq transcript abundance was correlated with change in ATAC-seq signal.(Fig. 4e).Heatmap analysis confirmed the correlation of differential mRNA abundance with differential ATAC-seq signal (Fig. 4f).

Analysis of DEGs with differentially accessible promoter regions
TFs play important roles in leaf development.To identify key candidate transcriptional regulators with accessible regions in promoters involved in leaf development, we counted the number of TF genes with different expression patterns, including gain-down, gain-up, loss-down, and loss-up groups, as described above.Overall, 87 TF genes belonging to 26 families were obtained (Table S12).Of these TFs, 38 gain-up TFs and 31 loss-down TFs were the most obtained, suggesting that chromosome accessibility is positively correlated with gene expression in more cases during L1 to L3 development.For example, most of ABI3VP1, AP-EREBP, bHLH, C2H2, MYB, NAC, and WRKY TFs with gain accessible promoter regions belong to the gain-up group, and most of AP-EREBP, ARF, bHLH, C2H2-Dof, GRF, SBP, and Trihelix TFs with loss accessible promoter regions belong to the loss-down group.
As previous showed, developing leaves undergo coordinated cell proliferation and cell expansion to establish the final size and shape (Wang et al. 2021).Consistent with this, in this study, genes possibly responsible for cell proliferation, such as GRF1, GRF3, GRF6, Dof3.4/OBP1, and ANT were belong to the loss-down group in L3 relative to L1. AtGRF family genes act as positive regulators of cell proliferation to control the size and shape of leaf organs (Horiguchi et al. 2005;Lee 2006;Kim et al. 2003).Overexpression of AtGRF1 was reported to cause the formation of larger leaves and, conversely, the grf1grf2grf3 triple mutants had in narrow leaves caused by a reduction in cell number (Kim et al. 2003;Kende 2004).OBP1 is directly targeted to the core cell cycle gene CYCD3;3 and the replication-specific TF gene AtDOF2;3; it also functions as a transcriptional regulator of key cell cycle genes within the cell cycle re-entry process (Skirycz et al. 2008).The loss-of-function mutant ant has smaller lateral shoot organs than wild type because of a decrease in cell number (Mizukami and Fischer 2000).Several auxin-associated TF genes were identified which had a loss-down pattern, such as ARF2 and ARF5.Loss of function of ARF2 increases leaf size by promoting cell proliferation, and plants with expressing a gain-of-function form of ARF without domains III and IV have narrow leaves and parallel veins (Krogan et al. 2012;Schruff et al. 2006;  Okushima et al. 2005).Genes possibly responsible for cell expansion, such as NGA1 and NGA3, had a gain-up expression pattern.Previous studies have shown that the TCP-NGA regulatory module controls leaf expansion, and loss of function of NGAs results in the formation of wider than normal leaves (Alvarez et al. 2016;Ballester et al. 2015).
In previous studies, NAC and MYB family TFs have been shown to play essential roles in secondary cell wall biosynthesis in the vascular plant Arabidopsis (Zhong 2010).In the present study, all NAC family genes and 9 of the 14 MYB TFs were up-regulated in L3 relative to L1, implying a potential role for NAC and MYB family TFs in the regulation of leaf morphogenesis.For example, NAC035 is involved in regulation of cell size, and overexpression of this gene causes the formation of leaves and stems with smaller cells (Kato et al. 2010).In contrast, the MYB family down-regulated gene AS1 in loss DARs is involved in the control of cell division and early cell differentiation in leaves (Sun et al. 2002;Byrne et al. 2000).The maize RS2 gene is homologous to AS1, and it regulates leaf morphology by repressing homeobox gene expression (Richard Schneeberger et al. 1998).
Many non-TF genes which are differentially expressed in leaf development are enriched (Fig. S9 and Table S13).For example, the gain-up group, which is enriched in the GO term "regulation of cell size," includes the aquaporin genes SIP1-2 and PIP2-1.The BEL1-like homeodomain protein BLH2 is annotated to the GO term "leaf morphogenesis", and the xyloglucan endotransglucosylase/hydrolase protein genes XTH6 and XTH23 are annotated to the GO term "cell wall macromolecule metabolic process".In contrast, genes in loss-down group have different functions.The auxinresponsive protein genes IAA26 and IAA33 are annotated to the GO term "auxin-activated signaling pathway", and XTH32 and the mannan endo-1,4-beta-mannosidase gene MAN7 are annotated to the GO term "cell wall macromolecule metabolic process".XTH32 has an expression pattern opposite to that of XTH6 and XTH23.The differentially expressed genes may be related to leaf development.
To confirm the reliability of the DEGs with differentially accessible promoter regions, we randomly selected NGA3, NGA1, NAC035, OBP1, KAN2, AS1, GRF6, ANT, and PEAR2 among those for qRT-PCR.The results showed that the expression patterns of all these genes were consistent with the data we obtained (Fig. 5).And the chromatin accessibility patterns between ATAC-seq (L1) and ATACseq (L3) indicated that the DEGs expression might be regulated by key TFs that play a vital role in leaf development.

Highly enriched upstream transcription factor-binding motifs in open promoter regions in L1 and L3
The open chromatin regions were enriched for binding sites of TFs that play important roles in Populus leaves.To determine which upstream TFs bind to specific motifs in the open chromatin regions in the promoters of the overlap genes, we performed motif enrichment analysis, and the TFs that potentially bind the enriched motifs are shown in Fig. 6 (E-value ≤ 0.05).In total, 17 motifs were enriched in the gain DARs in the promoter regions of DEGs (gain DAR DEGs), and 15 motifs were enriched in the loss DARs in the promoter regions of DEGs (loss DAR DEGs).Gain DAR DEGs were enriched in motifs for various TFs, including TCP family members TCP8, TCP19, AT-Hook family members AHL25, AHL20, and AHL12, bZIP family members HY5, ABF2, ABF1, GBF2, and bZIP28, and HD-ZIP family As previous studies reported, ATHB23 was expressed in the adaxial region of young leaves (Kim et al. 2007), and the encoded protein can interact with phytochrome B, which is important for phytochrome B-mediated red light signaling Motifs with an E-value equal to or less than 0.05 were considered significant (Choi et al. 2014).ATHB20 is expressed early in procambial development, and its expression domain is limited by MONOPTEROS, which controls vascular differentiation (Mattsson et al. 2003).ELONGATED HYPOCOTYL5 (HY5) acts as a master regulator of a light-mediated transcriptional regulatory pathway, and it regulates the transcription of a large number of genes by directly binding to cis-regulatory elements.For example, HY5 and BBXs (B-box proteins) form a complex transcriptionally regulatory network that promotes photomorphogenesis in Arabidopsis (Xiao et al. 2021).In agreement with this, the binding regions of HY5 was more open in the L3 relative to the L1, implying that many target genes are under the regulation of HY5 to affect leaf photomorphogenesis.In addition, TFs in loss DARs were most enriched in Dof TF family.For example, Dof5.6/HCA2 (HIGH CAMBIAL ACTIVITY2) in Populus revealed that overexpression of HCA2 results in a significant reduction in leaf epidermal cell area, a more compact arrangement of cells in the palisade and spongy tissue, and decreased photosynthesis (Zhao et al. 2022).Combined with our findings, this revealed that HCA2 is upstream targeting many genes, regulating mesophyll development, and influencing photosynthesis.VDOF2 (VASCULAR-RELATED DOF2) is expressed in vascular tissues, which negatively regulates vein formation in Arabidopsis seedlings (Ramachandran et al. 2020).Consistent with our study, the binding region of VDOF2 is lost at L3 relative to L1, suggesting that many target genes are negatively regulated by VDOF2 to manipulate the vascular development of the young leaves.
To illustrate the potential regulatory network, we plotted the partial regulatory networks of the top 4 gain DAR DEGs, including TCP8, TCP19, AHL25, and AHL20, and top 4 loss DAR DEGs, including TMO6, VDOF2, PEAR1, and HCA2 and their respective target genes in the comparisons of L1 and L3 (Fig. S10).As the regulatory network showed, many DEGs were common target genes among different TFs, especially in the same TF family, indicating the close relationship between genes.For example, ATHB-15, CESA2, SAUR67, BHLH10, and MOR1 were common target genes among TCP8, TCP19, AHL25, and AHL20 TFs.And ARF2, PIP2-1, NAC100, LAX5, and ORC6 were common target genes among TMO6, VDOF2, PEAR1, and HCA2 TFs.And detailed information on regulatory relationships of gain DAR DEGs and loss DAR DEGs and their target genes was shown in Table S14.

Conclusions
In this study, the observation of the anatomical structure of Populus leaves and the exploration of leaf-specific transcriptomic changes using RNA-seq revealed that L1 to L3 is the critical period of leaf photosynthetic establishment.Next, by integrating ATAC-seq data and RNA-seq data from L1 and L3 leaf tissue, we were able to characterize DEGs containing specific open promoter regions, which were involved in leaf development, such as GRF1, GRF3, GRF6, Dof3.4/OBP1, and ANT that affect cell proliferation, NGA1 and NGA3 that affect cell expansion, and ARF2 and ARF5, which are associated with the auxin response.Analysis of motif enrichment obtained various TFs regulating these DEGs, mainly included TCP, AT-Hook, bZIP, and HD-ZIP TFs in gain DAR DEGs, and Dof, BBR-BPC, and MYB TFs in loss DAR DEGs.There are many DEGs and their possible upstream TFs that we have identified as being involved in leaf development through our study, but the regulatory network remains to be explored.This study provides a useful resource for future studies on optimizing leaf anatomy to improve photosynthetic efficiency.

Fig. 2
Fig. 2 K-means clustering and gene ontology enrichment analysis of differentially expressed genes between L1 to L6. a Plots of K-means clusters showing transcriptome expression profiles.Six clusters were identified based on expression levels in six sequential leaves (L1, L2,

Fig. 3
Fig. 3 Accessible chromatin profiling by ATAC-seq.a ATAC-seq signal enrichment around the transcription start site (TSS) ± 3000 bp in L1 and L3 samples.b The genome-wide distribution of differential

Fig. 4
Fig. 4 Integration of ATAC-seq and RNA-seq results.a-d Venn diagram showing the overlap gene number between DARs-associated genes in promoter regions identified by ATAC-seq and DEGs identified by RNA-seq.The numbers of down-regulated DEGs with gain DARs in promoter region (a), up-regulated DEGs with gain DARs in promoter region (b), down-regulated DEGs with loss DARs in promoter region (c), and up-regulated DEGs with loss DARs in promoter

Fig. 5
Fig. 5 Quantitative real-time PCR validation of DEGs with differentially accessible promoter regions in Populus L1 and L3.Nine genes were selected for the test included NGA3, NGA1, NAC035, OBP1, KAN2, AS1, GRF6, ANT, and PEAR2.The images on the left of the

Fig. 6
Fig. 6 Upstream TFs binding motifs enriched in promoter-specific open chromatin regions.a, b The motifs enriched in gain DAR found in DEG promoters (a) and loss DAR found in DEG promoters (b) are shown.Position weight matrix (PWM) describes the probability