Early spring onset increases carbon uptake more than late fall senescence: modeling future phenological change in a US northern deciduous forest

In deciduous forests, spring leaf development and fall leaf senescence regulate the timing and duration of photosynthesis and transpiration. Being able to model these dates is therefore critical to accurately representing ecosystem processes in biogeochemical models. Despite this, there has been relatively little effort to improve internal phenology predictions in widely used biogeochemical models. Here, we optimized the phenology algorithms in a regionally developed biogeochemical model (PnET-CN) using phenology data from eight mid-latitude PhenoCam sites in eastern North America. We then performed a sensitivity analysis to determine how the optimization affected future predictions of carbon, water, and nitrogen cycling at Bartlett Experimental Forest, New Hampshire. Compared to the original PnET-CN phenology models, our new spring and fall models resulted in shorter season lengths and more abrupt transitions that were more representative of observations. The new phenology models affected daily estimates and interannual variability of modeled carbon exchange, but they did not have a large influence on the magnitude or long-term trends of annual totals. Under future climate projections, our new phenology models predict larger shifts in season length in the fall (1.1–3.2 days decade−1) compared to the spring (0.9–1.5 days decade−1). However, for every day the season was longer, spring had twice the effect on annual carbon and water exchange totals compared to the fall. These findings highlight the importance of accurately modeling season length for future projections of carbon and water cycling.


Introduction
In deciduous forests, the timing of seasonal changes in leaf area is closely tied to ecosystem-level shifts in carbon, water, and nutrient cycling. When new photosynthetic tissues (leaves) emerge in the spring, deciduous forests transform from carbon sources to carbon sinks, and from evaporation-to transpiration-dominated systems. New leaves cause abrupt changes in canopy structure, affecting ecosystem-level sensible and latent heat exchange (Hogg et al. 2000), surface roughness (Young et al. 2021), and albedo (Ollinger et al. 2008;Hollinger et al. 2010). In the fall, as deciduous forests transition back to dormancy, they also transition to a more passive exchange of carbon, water, and nutrients. The timing of these transitions is critically important because they regulate how long deciduous forests remain active, directly affecting annual carbon uptake Communicated by Rodrigo Vargas.
We improve phenology algorithms in ecosystem-level models using empirical observations from multiple data streams. The new phenology algorithms affected daily estimates and interannual variability of modeled carbon exchange, but did not have a large influence on the long-term trends of annual totals. Using the improved phenology algorithms, we assessed the relative influence of spring and fall on future ecosystem carbon exchange.
1 3 Richardson et al. 2010) and water-use (Zha et al. 2010). Therefore, being able to predict the timing of seasonal transitions in deciduous forests is of great importance for biogeochemical modeling and future projection of land-atmosphere exchange .
Growing seasons in deciduous forests are already lengthening in response to climate change worldwide (Piao et al. 2019). The timing of seasonal transitions in the spring and fall (i.e., the dates and duration of phenology changes) are particularly sensitive to changing temperatures. Increasing global temperatures are causing spring to arrive earlier (Jeong et al. 2011) and fall to arrive later (Liu et al. 2016) compared to just a few decades ago. These longer growing seasons have led to higher rates of annual gross primary productivity (GPP), and net ecosystem productivity (NEP) (Richardson et al. 2010;Dragoni et al. 2011;Keenan et al. 2014). However, increasing temperatures are also associated with higher rates of ecosystem respiration (Duveneck and Thompson 2017) and water use (Beamesderfer et al. 2020) that can counteract increases in forest carbon uptake (e.g. Sanders-Demott et al. 2020). Therefore, the future effects of longer growing seasons on ecosystem-level carbon and water cycling are not fully understood. Predictions range from slight increases in NEP (Richardson et al. 2009) to stable long-term trends due to increased rates of ecosystem respiration (Piao et al. 2007). Thus, future warming conditions will likely result in tradeoffs between increases in photosynthetic uptake and increases in ecosystem respiration.
Biogeochemical models are frequently used to better understand how ecosystems use carbon, water, and nutrients under current and future conditions. These models range in complexity from simple processes at individual sites to complex land-atmosphere feedbacks at the global scale. Although biogeochemical models are widely used to make inferences about ecosystem functioning, these models tend to have poor representations of phenology, oftentimes overestimating growing season length (Levis and Bonan 2004;Randerson et al. 2009;Richardson et al. 2012;Chen et al. 2016). Overestimating growing season length directly affects the amount of modeled carbon, water, or even nutrients exchanged in biological processes. Integrating empirical observations of phenology into existing biogeochemical models can overcome these weaknesses to represent growing season lengths more accurately (Reyer et al. 2013).
The phenology algorithms within biogeochemical models are used to simulate changes in leaf area index (LAI). Spring models conventionally trigger the onset of leaf development based on the accumulation of temperature over a set threshold (e.g., growing degree days, GDD). Simple spring models use accumulated warming temperatures (e.g., Lany et al. 2016;Menzel et al. 2006;Miller-Rushing and Primack 2008), but may also include photoperiod, water balance and/ or chilling requirements (Morin et al. 2009;Körner and Basler 2010). Like spring, fall senescence is also regulated by temperature  and photoperiod (Estiarte and Peñuelas 2015). Fall phenology models use cold temperatures under a certain threshold, accumulating in a manner similar to GDD, called chilling degree days (CDD). Compared to spring, fall senescence is less well characterized in modeling efforts. This is in part because it has received less attention in the literature, and is less predictable because leaf senescence can be more suddenly affected by harsh weather such as frost, or wind (Gallinat et al. 2015). Despite many advances in modeling spring and fall phenology (e.g., Basler 2016;Hufkens et al. 2018), incorporating more sophisticated phenology models into biogeochemical models has lagged as a priority in model development.
Using empirical data to train phenology models is becoming more practical as the availability of phenology data has increased with automated methods of observation . Near-surface remote sensing is a powerful way to monitor phenology in deciduous forests in a consistent and reproducible manner. Digital repeat photography (i.e., time lapse) is one example of near-surface remote sensing offering a cost-effective, automated tool for visually monitoring forest ecosystems from years to decades at an intermediate scale of observation between ground-based observations and satellite-based remote sensing . The PhenoCam Network provides near-real time, open-access digital repeat photography and developed techniques to extract data that can quantify seasonal changes in vegetation phenology based on a greenness index (Sonnentag et al. 2012).
In this project, we use PhenoCam data to calibrate phenology models to regional long-term datasets, then use the new models to ask the following: (1) how does altering the spring and fall phenology algorithms affect future modeled carbon, water, and nutrient exchange in a northern deciduous forest? And (2) does phenological variation in spring or fall have a larger influence on modeled ecosystem-level carbon, water, and nutrient cycling? We predicted that altering the phenology algorithms would shorten the period of modeled photosynthetic activity and therefore reduce the amount of modeled carbon, water, and nutrients used annually. We hypothesized that longer springs would have a larger effect on carbon and water cycling compared to fall because daylength in the spring is longer and rates of incoming solar radiation are higher compared to the fall (Zhang et al. 2020). By addressing these questions, we improve our understanding on the effects of changing climate on phenology and how that regulates important ecosystem processes.

Methods
We used existing near-surface remote sensing data from eight different northern forests of North America to train spring and fall phenology models (Fig. 1). Using data from several different northern deciduous forests allows us to use more years of data to train models and makes models that are more generalizable across region. Each of the forests used for model training had some combination of maple (e.g., Acer saccharum and Acer rubrum), beech (Fagus grandifolia), and birch (e.g., Betula alleghaniensis and Betula papyrifera) species in the overstory. Each site had between 6 and 14 years of PhenoCam data available for training models, in total we used 96 site years of data to train the models.

Detecting transition dates using PhenoCam
We estimated transition dates in the spring and fall of each site year using long-term PhenoCam data . Provisional data from the PhenoCam website (phenocam.nau.edu) were collated and used to estimate transitions at the beginning and end of each of the spring and fall transitions using phenocamR package  in the R Statistical Software (R Core Team 2020). We first extracted greenness values from a region of interest in PhenoCam images that was representative of the forest canopy, an approach that is well suited for canopy-level, or "species ignorant" modeling. The relative greenness of a pixel (i.e., green chromatic coordinate or Gcc) in each region of interest is calculated by dividing the average green intensity value by the sum of the red, blue and green intensities across all pixels. To reduce the noise associated with illumination and suboptimal lighting, we extracted the 90 th percentile of the Gcc across a 3-day moving window . For more information on processing digital images to extract phenology dates, see Sonnentag et al. (2012). In the spring, we estimated the transition dates as the dates Gcc values reached the 10% and 80% of the maximum seasonal amplitude. In the fall, we found greenness was unable to predict transition dates late enough to capture the end of leaf senescence (based on ground observations at Hubbard Brook and measured leaf area index from Bartlett Experimental Forest [see Toda and Richardson 2018]). We found that using yellow chromatic coordinate (Ycc; equivalent to 1 minus the blue chromatic coordinate, Kosmala et al. 2016) was better suited for modeling the end of leaf senescence because it could predict fall transition dates later than Gcc (for more information, see Fig. S1 in the Supplementary Materials). For this reason, we used a combination of Gcc and Ycc to predict the start and end of fall senescencethe start of the fall transition was estimated as the date the Gcc reached 50% of the seasonal amplitude, and the end of the fall transition was the day Ycc reached 50% of the seasonal amplitude.

Model selection, calibration, and validation
We evaluated a variety of phenology transition date models that represent a range of environmental drivers and levels of complexity (Table 1). The spring models have been evaluated in many different forest types and represent some of the best-known phenology models used to predict the spring transition. The fall models have not been widely tested (but see Richardson et al. 2006;Delpierre et al. 2009) and the drivers of fall senescence in mesic deciduous forests are not well known. Our models require a combination of accumulating warming temperature, photoperiod, and accumulating chilling temperatures.
The spring and fall phenology models were calibrated using the phenoR package . We first parameterized models for the beginning and end of the season based on dates extracted from PhenoCam: 10% of the seasonal Gcc amplitude in the spring and 50% of the seasonal Ycc amplitude in the fall. Then, to model the duration of the transition period, we trained models to predict the end of the spring transition, and the beginning of the fall transition by holding all other parameter values constant except for the critical value of accumulating temperature. By doing so, we are effectively modeling four dates for each year-the initiation and completion of both spring onset and fall senescence for each model. Modeling the beginning and end of phenology transitions is important because the rate of leaf development or senescence can vary independently from year-to-year (Klosterman et al. 2018). We optimized models using a generalized simulated annealing algorithm (GenSA package; Xiang et al. 2013) in the phenoR package that identifies the global minima of different combinations of parameter values for each model. We ran 25 chains (i.e., individual ensembles) of 40,000 iterations and identified the parameter sets with the lowest root mean square error (RMSE) as the parameters that best fit the data.
The calibrated models were then evaluated against the ground-based observations of spring and fall phenology transitions from Hubbard Brook Experimental Forest, in the White Mountains of New Hampshire, USA (Likens et al. 1970;Fahey et al. 2005). Spring leaf development and fall leaf senescence data were collected every 3-7 days during spring and fall since the early 1990s. Individual marked trees were given a score of 0-4 for both spring and fall transitions; 0 corresponding to no leaves and 4 corresponding to fully developed leaves (see Richardson et al. 2006). We compared estimates from our calibrated models to the linearly interpolated ground observations that ranged from 0-4. In the spring we compared our dates to the date the ground observations reached a 3 out of 4 (50% canopy closure), and in the fall we compared our dates to the date ground observations reached a 1 out of 4 (no more green in the canopy, half of the leaves have fallen) -see Richardson et al. (2006) for more information.
We selected the three best performing spring models for future projections based on the lowest Akaike information criterion (AIC) values. We included three spring models (each representing different drivers), and both fall models for future projections of phenology transitions and ecosystem processes. We included more models than just the top performing spring and fall models (i.e., also including models with ΔAIC > 2) for quantifying uncertainty of our model outputs.

Future phenology projections
We modeled phenology transition dates into the future at Bartlett Experimental Forest to year 2100 with an ensemble of six different future climate scenarios from NA-CORDEX data archive (Mearns et al. 2017). The archive offers a selection of different downscaled future climate projections based on General Circulation Model (GCM) simulations in the Coupled Model Intercomparison Project, Phase 5 (CMIP5) multi-model ensemble (Maurer et al. 2007). We selected six climate models based on the criteria that the models (1) had daily data resolution, (2) had representative climate pathways (RCP) 4.5 and 8.5, and (3) were bias corrected based on Daymet gridded datasets. The Daymet corrected datasets were chosen because the phenology transition models described above were fit using Daymet data in the phenoR package-this minimized biases associated with transitioning from historic data to future climate projections. These two RCPs represent two projections of atmospheric greenhouse gas concentrations under high (RCP 8.5) and intermediate (RCP 4.5) human emission scenarios.

Biogeochemical modeling in PnET-CN
Finally, we performed a sensitivity analysis to better understand how phenology models affect rates of modeled carbon, water, and nitrogen cycling. We used the Photosynthesis and Evapotranspiration Model (PnET)-a simple biogeochemical model developed in New England forests (Aber and Federer 1992;Aber et al. 1996). The family of PnET models is built on the relationships that maximum photosynthetic rate is a function of leaf area and foliar nitrogen concentrations (see Ollinger et al. 2008) and that stomatal conductance is a function of the actual photosynthesis rate. We used a daily version of the PnET-CN model, a model that has been adapted to run at the daily timescale and includes both carbon (C) and nitrogen (N) cycling (Aber et al. 1997).
PnET-CN has been tested in several biomes (Thorn et al. 2015) and has previously been validated at Bartlett Experimental Forest (Ollinger and Smith 2005). In the remainder of this manuscript, we refer to the daily version of PnET-CN simply as PnET. Previous work has shown the default spring phenology model in PnET predicts spring onset too early, and the duration of leaf development lasts too long (Chiang and Brown 2007;Klosterman et al. 2018). Less is known about the performance of PnET's fall phenology models. We replaced the default phenology models with new spring and fall phenology models described above, and ran the model to year 2100 to assess the effects the phenology algorithm has on model output. The default spring phenology model for spring onset in PnET is a simple growing degree day (GDD) model (i.e., an ecodormancy model) that begins on day of year 61 and accumulates daily temperatures above 0 °C until a userdefined parameter threshold is reached. This is similar to the Thermal-time (TT) model used in model selection and validation section. Once leaves emerge, they continue to develop until a second user-defined GDD threshold is reached. Fall senescence in PnET is indirectly affected by temperature and is triggered by the date that leaves in different canopy layers reach a negative carbon balance (due to cooling temperatures and shortening day lengths). We used a similar two-tiered approach of simulating the start and end of the spring and fall transitional periods.
We replaced the default phenology models in PnET with regionally calibrated models while keeping all other inputs and drivers constant. We ran the model at Bartlett Experimental Forest, hindcasting from 2004 to 2020 and forecasting to year 2100. This site was selected because an existing AmeriFlux tower established in 2004 (Ouimette et al. 2018;Richardson 2004) allows us to evaluate model performance compared to eddy covariance data. For the period overlapping with tower measurements (2004-2020), we used temperature, precipitation, and solar radiation from the DayMet interpolated dataset as PnET model drivers ). For future model runs, we used projections of temperature, precipitation, and solar radiation from one of the NA-CORDEX future climate projections. We selected the projection with the lowest bias compared to Daymet data over the 2004-2020 period. The projection (EC-Earth RCA4) had an average level of warming for RCP 8.5 over the next 80 years (~ 0.5 °C per decade) compared to other models. We held other model drivers constant (atmospheric CO 2 , O 3 , and deposition of NH 4 , and NO 3 ) to remove variability not directly related to phenology. We used input parameters initially developed by Aber et al. (1996) from site-specific measurements from Harvard Forest and later refined by Ollinger et al. (2002) for implementation of PnET at Hubbard Brook. Vegetation-specific input parameters such as foliar nitrogen and specific leaf weight were developed directly from field measurements in the region.
We compared the daily modeled GPP, NEP, and ecosystem respiration to the corresponding daily tower based estimates from 2006 to 2020. We then compared how annual totals of modeled carbon exchange, as well as evapotranspiration (ET), water-use efficiency (WUE), and nitrogen mineralization (Nmin), were affected by altering the phenology subroutine. To ensure the model runs were close to equilibrium, we used long spin-up times (i.e., the length of time a model is run since initiation, by recycling climate driver data) and chose not to include past disturbance after spin-up was complete. Using long spin-up times is standard practice for these types of ecosystem-level biogeochemical modes because they alter carbon and nutrient pools to balanced levels that more accurately represent natural systems. However, long spin-up times result in close to "steady-state" conditions. We decided to use the long spin-up times without past disturbance simulation because they resulted in more realistic carbon and nutrient pool sizes, albeit they also resulted in modeled NEP that was closer to steady state than the eddy covariance data suggest [Bartlett Experimental Forest is a consistent carbon sink (Ouimette et al. 2018)]. We account for this when comparing model outputs relative to fixed phenology models (i.e., spring and fall transitions occurring at the same day of year), because these differences isolate the effects of the phenology models on carbon, water, and nutrient cycling outputs.

Model selection and validation
By training spring and fall phenology models based on data from the eight regional PhenoCam sites, we found our spring models were better at predicting transition dates compared to fall models (Fig. 2) based on root mean square error (RMSE) and r 2 values. The spring models consistently fit the data better (with an average RMSE of 6.7 days) than the fall models (an average RMSE of 8.7). We determined the best fitting spring models were the sequential M1 (SM1), the parallel (PA), and the photothermal-time (PTT) models based on AIC (Table 2). These three models required different drivers, only sharing warming temperature as the common driver across models. More complex spring models (including photoperiod and, or chilling requirements) tended to outperform the simplest models (i.e., those requiring only warming temperatures) but having more variables or parameters did not always result in improved model fits based on AIC. For example, the PTT model fit the spring validation dataset as well as any other model (based on RMSE and r 2 ) but required a third of the parameters of the other top models. For the fall models, the simpler model that only requires accumulating cold temperature (CDD) performed better than the other fall model which also includes photoperiod based on RMSE or r 2 Models performed slightly better for the validation dataset (i.e., 28 years of ground observations at Hubbard Brook) compared to the regional training dataset (i.e., 96 sites years of PhenoCam data from 8 sites). This  validation shows our models captured the magnitude and variability of an independent dataset reasonably well (Fig. 3), justifying the conclusion that the models are not overfit to the training dataset. This also suggests that models trained using near surface remote sensing data (in this case, PhenoCam data) can be validated using independently derived ground-based measurements that rely on entirely different methods (also see Fig. S2).

Future phenology projections
Each of the calibrated models predicted earlier spring and later fall transition dates under all future climate scenarios ( Fig. 4) with larger shifts in growing season length during the fall compared to spring. Spring models predict an average advancement of 0.9 and 1.4 days decade −1 for RCP 4.5 and RCP 8.5, respectively. In comparison, the 1 3 average fall senescence dates were 1.2 and 2.9 days later decade −1 for RCP 4.5 and 8.5, respectively. Based on this ensemble of future climate scenarios, our models predict the growing season in northern deciduous forests of North America will lengthen by 2-5 days decade −1 depending on the climate scenario, with more pronounced changes projected in the fall compared to the spring.

Implications for daily PnET model output
Incorporating new spring and fall models into PnET resulted in transition dates that matched the interannual variability of observed phenology dates better than the original PnET phenology algorithm. The unrepresentatively high interannual variability predicted by PnET suggested the original phenology algorithm was overly sensitive to temperature compared to the new models. The original PnET spring algorithm performed reasonably well at estimating the beginning of the spring transition dates compared to the validation dataset at Hubbard Brook (r 2 = 0.44; RMSE = 10.3), but not as well as the new spring models (r 2 = 0.69-0.73; RMSE = 3.6-4.3). For the fall, the original PnET algorithm did well at predicting the average end of fall date (October 21st compared to the average observed date of October 19th) at Hubbard Brook, but the high variance in PnET's estimates resulted in overall poor fits with observed transitions (r 2 = 0.03; RMSE = 9.3). Therefore, we found our new models performed much better at predicting independent ground observations compared to PnET.
Incorporating new spring and fall phenology models into PnET also resulted in shorter phenology transitions that had better agreement with transition duration estimated from PhenoCam data at Bartlett Experimental Forest. The original PnET spring phenology model tended to predict the onset of spring too early in the season (27 April) followed by a disproportionately slow leaf development ending much later than observations (25 Jun). This almost 2 months of leaf development was uncharacteristically long compared to the average 16-day spring transitions estimated from PhenoCam. The new spring phenology models resulted in the initiation of leaf development starting later (08 May) and end of leaf development ending much earlier (25 May) than the original PnET model, resulting in a leaf development period of 16-17 days on average. The duration of the Fig. 4 Projected spring and fall transition dates from 2020 to 2099 using two different representative concentration pathways (RCP). Light gray lines represent individual projections from the CMIP5 future climate projections from the NA-CORDEX Program (Mearns et al. 2017). The ensemble mean (average of the six individual climate projections) is in black and standard deviation around the mean is represented with a colored ribbon. The spring transition dates are in red and fall transition dates in blue. The equation represents a least squares linear regression of the ensemble mean over time with the y-intercept representing the projected transition date at year 2020. All regression lines are significant at an alpha of 0.05 fall transition period using the PnET model was also longer than using the new models. The PnET fall phenology model started fall senescence approximately around the same time as the new models (26 Sep) but the end of fall senescence was on average 9 days longer (25 Oct) than the new models. As a result, the PnET fall transitions lasted nearly a month, while the observations and new model predictions lasted approximately 3 weeks. The early initiation in spring, and late fall senescence from PnET phenology models resulted in longer season lengths overall, but the long duration of transition periods resulted in a shorter period with maximum LAI (Fig. 5). The more rapid phenology transitions resulting from the new models showed clear improvements in spring LAI, but for the years with LAI data available, the PnET fall models resulted in better fits. However, there were other problems associated with fall transitions produced by PnET. The beginning of the fall transition was a fixed date (i.e., no interannual variability), and the end was sometimes 2 weeks later than observations from PhenoCam. As a result, the duration and the end of the fall transitions were found to be unrealistically variable.
We found that altering the phenology algorithms in PnET resulted in relatively small improvements between daily modeled and measured ecosystem carbon exchange (Fig. 5). The most notable improvements were found between modeled and measurement NEP. The more rapid phenology  Toda and Richardson (2018) transitions from the new models resulted in better agreement with measured NEP. The slow transitions from the original PnET models resulted in unrepresentively slow increases in NEP, and in the late fall senescence resulted unrealistically late net carbon uptake (Fig. 5). However, daily GPP and ecosystem respiration were less impacted by the phenology algorithms than we had expected; the original PnET phenology subroutine produced daily estimates of carbon exchange that performed as well as using the new phenology models. The slower transitions used in the original PnET model fit the gradual increases and decreases found in tower-based measurements of GPP and ecosystem respiration reasonably well (Fig. 5). As a result, we found the new spring and fall models resulted in minimal differences in the agreement between modeled and measured values of GPP and ecosystem respiration. Our models demonstrated smaller advancements (2.6-3.6 days decade −1 ) in spring compared to the PnET spring model (advancing at 3.8 days decade −1 ) using the projection with the lowest bias compared to Daymet data (EC-Earth RCA4, RCP 8.5). In the fall, the PnET phenology model predicted the end of canopy senescence around the same time as the new models, but with much higher interannual variability than the new models. The original PnET fall model predicted smaller future changes in fall transitions than new models, with fall arriving 1.1 day decade −1 later compared to 1.9-2.0 days decade −1 using the new models.

Implications for annual PnET model outputs
Despite the differences in phenology model structure, the new phenology models had relatively small effects on annual ecosystem carbon and water cycling. The effects of longer growing seasons using the original PnET phenology algorithm (i.e., earlier spring onset and later fall senescence compared to the new models) were compensated by longer and slower transitions to and from maximum LAI (i.e., the right answer was obtained for the wrong reason). As a result, there were only subtle differences in annual outputs of carbon and water cycling resulting from altering the models. Compared to other annual model outputs, the largest differences were seen in modeled GPP and NEP (Fig. 6). New models resulted in lower interannual variability of GPP and NEP compared to the original PnET phenology model. Using the original PnET phenology models resulted in modeled GPP increasing at a slightly faster rate (4.3 ± 0.5 g C m −2 year −1 ) compared to the new models (3.7 ± 0.1 g C m −2 year −1 ). This relatively small difference in modeled GPP over the sampling period resulted in a very slight increasing trend for modeled NEP using the PnET phenology model (0.1 ± 0.5 g C m −2 year −1 ), compared to the slight decreasing trend in NEP using the new models (− 0.3 ± 0.2 g C m −2 year −1 ). These differences demonstrate that incorporating new phenology models does indeed result in differences in annual modeled GPP and NEP over long timescales. Altering the phenology models had smaller effects on projections of water use (transpiration and WUE) and nitrogen cycling (Nmin). Similar to the carbon exchange variables, the timing of transpiration was altered with the optimized phenology models and concentrated over a shorter period, but over annual timescales the effects were fairly small. As a result, the future projections of annual transpiration and WUE show minor sensitivity to altering the PnET phenology model to the new models.
By comparing the output using one fixed transition (i.e., no interannual variability of spring or fall transitions) to output with both seasons fixed to constant transition dates, we can isolate the effect of either spring or fall transitions on model output. Using these fixed phenology dates, we found that longer seasons in the spring had larger effects on modeled carbon and water cycling compared to longer seasons in the fall (Fig. 7). For every day the season was extended in the spring, GPP increased by an average of 10.2 ± 0.2 g C m −2 compared to 4.8 ± 0.1 g C m −2 for the fall. Extended seasons in the spring or fall had smaller effects on ecosystem respiration than GPP, with slopes of 4.4 ± 0.1 g C m −2 day −1 and 3.3 ± 0.1 g C m −2 day −1 for spring and fall, respectively. As a result, NEP was also more sensitive to the spring transition date compared to fall, increasing by 5.7 ± 0.1 g C m −2 day −1 and 1.5 ± 0.1 g C m −2 day −1 for spring and fall, respectively. We found similar effects on water cycling; longer growing seasons in the spring had larger effects on transpiration (0.29 ± 0.005 cm m −2 day −1 ) compared to fall (0.14 ± 0.003 cm m −2 day −1 ), as with water use efficiency. The longer seasons had relatively little effect on nitrogen mineralization on annual scales. Based on these models' outputs, our analysis shows that for each day the season gets longer, spring can have as much as double the effect on the net carbon uptake and water use compared to fall.
Our calibrated phenology algorithms predicted springs will have smaller shifts in growing season length compared to the fall (Fig. 4, in days decade −1 ), but our biogeochemical model predicts springs will have larger effects on carbon cycling for every day of change (Fig. 7, in g C m −2 day −1 ). By combining these projections, we estimate spring will have a larger overall effect on NEP in the future (increasing by 5.2-8.7 g m −2 C decade −1 in the spring compared to 1.7-4.8 g C m −2 decade −1 in the fall), but spring and fall will have similar influences on GPP (increasing by 9.3-15.3 g C m −2 decade −1 in the spring compared to 5.3-15.4 g C m −2 decade −1 in the fall). We found that increases in respiration in the fall were higher relative to GPP, therefore spring phenology had a larger overall effect on NEP.

Discussion
Spring and fall phenology are often overlooked aspects of biogeochemical modeling despite their importance in defining the duration of photosynthetic carbon uptake and plant water use each year. Here, we use regional phenology data to train leaf development and leaf senescence algorithms in an ecosystem-level biogeochemical model. Using empirical data to train phenology algorithms is becoming easier due to the growing amount of open-source phenology data (Crimmins et al. 2017;Richardson et al. 2018;Templ et al. 2018) and tools  available. We found that incorporating the new phenology algorithms into PnET resulted in outputs that better fit measured NEP (as seen in Desai et al. 2010), but the model did not show any notable improvements for GPP and ecosystem respiration. On the daily scale, we show that the phenology algorithms exert a strong control on the timing of carbon and water cycling, but had surprisingly small effects on the magnitude and long-term trends of the future annual predictions. By improving the phenology subroutine in biogeochemical models, we strengthen a well-documented weakness of ecosystem modeling (Levis and Bonan 2004;Randerson et al. 2009;Richardson et al. 2012). Annual modeled carbon exchange [gross primary productivity (GPP), ecosystem respiration, and net ecosystem productivity (NEP)], water-use efficiency, transpiration, and nitrogen mineralization using PnET phenology subroutine (red line and dashed red trendline) modeled to year 2100 compared to output using the new models (blue cir-cles and solid blue trendline). The error bars represent the standard deviation of the outputs using different combinations of new models. Trend lines represent linear regressions of the timeseries-only regression lines that were significant at an alpha of 0.05 are included We found that parameterizing phenology models using empirical data from multiple PhenoCam sites allowed us to produce phenology models that are broadly applicable to the northeastern deciduous forests of North America. We did not find that spring models requiring chilling requirements showed any improvements over models that used warming temperatures and photoperiod, similar to other North American  and European (Basler 2016) tree species. This likely suggests the species in this region have sufficient winter chilling temperatures to release bud dormancy (Vitasse et al. 2011). Previous work has shown that including chilling temperatures can improve species specific models (Chiang and Brown 2007;Kaduk and Los 2011;Jeong et al. 2012;Flynn and Wolkovich 2018) but our canopy-level approach did not benefit from chilling temperatures enough to justify the additional model parameters. It is possible that in the future, warmer winters in North America may require more sophisticated models that include chilling requirements (Morin et al. 2009), but this is not currently the case in these northern deciduous forests. This illustrates an inherent weakness in extrapolating beyond current conditions into a future where phenology may be affected by other factors.
Our results demonstrate that fall transition dates derived from PhenoCam are reproducible and robust enough to train fall models that can be validated against ground observations. We found that using the yellowness of a forest canopy from PhenoCam, in addition to greenness, can be a useful metric for predicting the end of fall senescence in these forest types (Kosmala et al. 2016). We also found that incorporating these fall models into PnET allowed us to improve predicted leaf senescence dates produced from the original phenology algorithms that were overly sensitive to climate drivers. These are promising findings for the future of modeling fall senescence in deciduous forests. However, we also find our fall models were unable to capture the year-to-year variability as well as the spring models-also found by Vitasse et al. (2011). Our more complex fall model (CDDP, Fig. 7 Relative influence of spring and fall phenological transitions on annual carbon cycling [gross primary productivity (GPP), ecosystem respiration, and net ecosystem productivity (NEP)], transpiration, water use efficiency, and nitrogen mineralization. These values were derived by taking the difference between one-season-fixed models and models with both spring and fall fixed phenology transitions to remove interannual variability unrelated to phenology transitions. The red circles represent models with new spring phenology algorithms and fixed fall transition dates (i.e., no interannual variability) and the blue triangles represent models with new fall phenology algorithms and fixed spring transition dates. Trend lines represent linear regressions-all regression lines were significant at an alpha of 0.05 including photoperiod and temperature) also did not show any improvements over the simpler temperature only model. As previously documented, we found fall transitions were more difficult to characterize compared to spring transitions because they can be unexpectedly sudden in response to stressful environmental conditions including heavy rainfall events, heat stress (Xie et al. 2018), frost, or wind events (Gallinat et al. 2015). Compared to the literature, our models predict future changes in spring and fall transition dates that are either on track with, or less than, observations from other North American deciduous forests over the last few decades (de Beurs and Henebry 2005; Jeong and Medvigy 2014; Piao et al. 2019). Our models project growing season length will be more affected in the fall compared to the spring as temperatures are expected to increase over the next few decades [as documented by Jeong et al. (2011) and Zhu et al. (2012) from the early 1980s to mid 2000s]. Combined, our projections suggest growing season lengths will change by 2.0-4.7 days decade −1 depending on future climate projections. These are similar to other recent projections for northeastern North America (Piao et al. 2019) but are less than historical observations (Wang et al. 2015).
Using the original PnET spring algorithm, the beginning of spring was triggered too early, had higher variability, and lasted much longer than observed transitions from Pheno-Cam (as seen previously in Klosterman et al. 2018). We also found that original PnET algorithms predicted highly variable end of fall senescence dates, and because the start of senescence was fixed (i.e., no interannual variability), the duration of the fall season was poorly constrained. Despite the clear improvements our models had on season lengths generated by PnET, we were surprised that altering the phenology algorithms only marginally affected the magnitude of modeled carbon, water, and nutrient cycling on annual timescales (contrary to our hypothesis that shorter seasons would lead to decreased modeled carbon, water, and nutrient use). On a daily timescale, we found incorporating the new algorithms resulted in faster seasonal changes in carbon uptake and transpiration. As a result, the slower phenological transitions between active and dormant periods in the original PnET model compensated for uncharacteristically long growing season lengths, and annual totals were similar in magnitude to the new model outputs. We also note the longer seasons predicted by the original PnET algorithms are likely benefited by the 14% of the stand composed of evergreen conifers (Ouimette et al. 2018). Conifers tend to have longer periods of carbon uptake compared to their deciduous counterparts, and the tower-based GPP occurring before leaves are present and after leaves senesce is likely from uptake from conifer species, coincidentally resulting in better agreement with the original PnET outputs. Despite this, our new models resulted in notable improvements in PnET's ability to model NEP compared to tower-based estimates.
All of our simulations suggested increasing trends in GPP and plant water use (i.e., transpiration) at Bartlett Experimental Forest over the next 80 years (see also Chen et al. 2016). Our projected GPP increases of 5-8 g C m −2 day −1 are on the upper end of the empirical observations in the literature (Falge et al. 2002;Piao et al. 2007;Baldocchi 2008). Projections of future NEP and WUE, however, demonstrated relatively stable long-term trends. Modeled NEP showed increased variability under future climate change, but no increasing trend due to counteracting increases in GPP and ecosystem respiration in response to longer growing seasons and elevated temperatures (as also described by Piao et al. 2007;Duveneck and Thompson 2017). The stable long-term trend in NEP in this study was largely due to the long spinup times used in the model calibration process (Thornton and Rosenbloom 2005) and our decision not to simulate past disturbance. We accounted for the effects of long spin-up times (and incidentally, increasing temperature) by taking the difference of modeled outputs from a fixed spring and fall phenology model. Differences between dynamic and fixed phenology models demonstrated that phenological change (i.e., the longer growing seasons) did, in fact, have a positive effect on annual modeled NEP.
Our results indicate that, compared to the fall, longer growing seasons in the spring have a larger effect on carbon and water cycling for every day of change, as hypothesized. Compared to fall phenology, spring phenology had a larger effect on annual GPP relative to ecosystem respiration and therefore also had a larger effect on NEP (contrary to findings by Wu et al. 2013a, b). This may be partly due to higher temperatures in the spring also having higher rates of incoming solar radiation and soil moisture compared to the fall. Increasing temperatures in fall months may actually result in net carbon losses despite the later leaf senescence (Piao et al. 2008) because ecosystem respiration can respond to increasing fall temperatures more than GPP (Barichivich et al. 2013). These findings are, however, different from Keenan et al. (2014), where later fall phenology was associated with higher rates of both NEP and GPP compared to the spring. Presumably, the larger effects on modeled ecosystem carbon and water balance in our models are due to longer daylength during leaf development compared to leaf senescence allowing for higher levels of photosynthetic uptake (see Zhang et al. 2020). The seasonal effects on ecosystem respiration were less apparent because ecosystem respiration is less affected by daylength and more affected by temperature. While our projections suggest that both early springs and late falls will result in increased carbon uptake, spring was clearly more influential on annual model outputs due to the longer days and higher rates of GPP early in the season compared to the fall.

Conclusions
Training phenology models at four different seasonal transitions (start of onset, end of onset, beginning of senescence, and end of the senescence) allowed us to improve a biogeochemical model's ability to estimate a deciduous forest's growing season length. The optimized phenology subroutine resulted in season lengths that were shorter and more representative of phenology observations. Our work establishes a theoretically rigorous, but still straightforward, method to improve existing ecosystem-level biogeochemical models to be more representative of actual ecosystem processes. By highlighting direct effects of longer growing seasons on ecosystem carbon and water cycling, we quantitatively showed the importance of improving phenology models in modeling ecosystem-scale processes.