Predicting Peatland Net Ecosystem Exchange of CO 2 during the Non-Growing Season using Machine Learning

The world’s cold regions are experiencing some of the fastest warming, especially during the winter and shoulder seasons. Recent studies have highlighted the significance of carbon dioxide (CO 2 ) emissions during the non-growing season (NGS) to the annual carbon budgets of northern peatlands. Because of the positive feedback of soil microbial respiration to warming, a warmer NGS may be expected to alter the carbon balance of peatlands, which are estimated to store about one-third of global terrestrial organic carbon stocks. However, estimates of NGS net ecosystem CO 2 exchange (NEE) remain highly uncertain. In this study, we determine key environmental variables affecting the NGS-NEE from a temperate peatland (Mer Bleue Bog; Ottawa, Canada) and predict future NGS-NEE under three climate scenarios (RCP2.6, RCP4.5, and RCP8.5) using a variable selection methodology, global sensitivity analysis, and data-driven model. The model successfully reproduces the observed NGS-NEE fluxes using only 7 variables, with NGS-NEE being most sensitive to changes in net radiation. Our projections estimate that mean NEE during


Introduction
While peatlands are estimated to cover only 3% of the continents 1 , they store approximately 30% of land-based organic carbon (C) (300-450 Pg C) [2][3][4][5] . Northern peatlands are of particular interest because high latitude regions are warming at greater rates than the global average [6][7][8][9][10] . Furthermore, the greatest warming in cold regions, including northern peatlands, is occurring during the nongrowing season (NGS) 11,12 . Many previous studies have focused on peatland C dynamics during the growing season, given the inherent difficulties of measuring NGS carbon dioxide (CO2) fluxes in remote cold regions. However, a growing body of literature shows that net ecosystem CO2 exchange (NEE) from various landscapes during the NGS are non-trivial and may contribute significantly to annual ecosystem C budgets [12][13][14][15][16][17][18][19][20] . Despite these efforts, there is a lack of predictive understanding on how NGS CO2 emissions from northern peatlands will change under an evolving and uncertain 21 st century climate.
Only a few studies have projected future NGS-NEE of peatland ecosystems under representative climate concentration pathways (RCPs) for northern regions 12,[21][22][23] . In part, this reflects an incomplete understanding of the key drivers of NGS soil respiration. Previous work, however, shows that soil CO2 effluxes during the NGS are modulated by the interactions between numerous environmental parameters, including soil temperature and moisture 12,[24][25][26] , snow depth 25,[27][28][29][30] , vegetation cover 31 , availability of labile C substrates, and the soil microbial abundance and community structure 32-34 . Deterministic models are traditionally used for understanding environmental processes. The processes included, and their model representations, are derived from prior knowledge and theoretical considerations. The models in turn form the basis for hypothesis-driven experimentation with, as a possible unintended consequence, biases the observational data due to the underlying hypotheses 35 . As an example of this traditional deterministic approach, terrestrial ecosystem C cycling has been successfully simulated using process-based dynamic vegetation models (DGVMs). As an alternative approach, data-driven (rather than hypothesis-driven) modeling techniques, including machine and deep learning methods, are increasingly employed to derive quantitative relationships between environmental variables and ecosystem functions (e.g., soil respiration) based on patterns in the measured data.
Machine learning (ML) algorithms have been applied to reproduce CO2 fluxes measured at eddy covariance (EC) flux towers 36,37 . Here, we train a ML model for NGS-NEE on a 13-year (1998-2010) continuous record of EC flux measurements at the Mer Bleue Bog located in Ottawa, Canada. The most important variables modulating the NGS-NEE at the research site are determined using a variable selection methodology and a moment-independent global sensitivity analysis method. In addition, we project the NGS-NEE CO2 emission rates over the remainder of the 21 st century by considering how the key environmental variables will change under a low, moderate, and high radiative forcing scenario (RCP2.6, RCP4.5, RCP8.5, respectively). The findings of this study provide novel insights in NGS CO2 emissions from northern peatlands as they transition into a warmer world, with implications for future climate policy, evolving northern landscapes, and the associated hydrometeorological, snow and frozen-ground processes.

Site Description
The Mer Bleue research site is a low shrub domed ombrotrophic bog located within a 2,800 hectare wetland complex in Ottawa, Canada (45°24′N, 75°30′W) 38 . Originally part of the Peatland Carbon Simulator Project (PCARS) 39 , the site has been continuously monitored for CO2, energy (latent and sensible heat), radiative (long and shortwave radiation) and momentum fluxes since the construction of an EC tower in 1998. Mer Bleue has been part of a number of flux tower networks including the Fluxnet-Canada Research Network, the Canadian Carbon Program, Ameriflux and FLUXNET.
Vegetation at the Mer Bleue research site is comprised of a near-continuous cover of Sphagnum capillifolium and S. magellanicum with an overstory dominated by ericaceous shrubs including Chamaedaphne calyculata, Kalmia angustifolia, Rhododendron groenlandicum, and Vaccinium myrtilloides, along with some sedges (Eriophorum vaginatum) and herbs (Maianthemum trifolium) 40 . The area underwent bog formation starting approximately 7,000 years ago with current peat depths ranging between < 0.3 m at the margins to > 5 m in the centre of the bog 41 . The research site area has a typical hummock-hollow (70%-30%) microtopography with a mean elevation difference of 0.25 m between hummock-tops and hollow-bottoms 42 with few scattered tree species (Larix laricina, Betula populifolia, and Picea mariana) 38 .

Data Acquisition
Fluxes and ancillary measurements of micrometeorological parameters recorded at the Mer Bleue site were obtained from the Fluxnet Canada Research Network via the Oak Ridge National Laboratory's Distributed Active Archive Center for Biogeochemical Dynamics (ORNL DAAC) (https://daac.ornl.gov). The 13-year dataset (1998-2010) was retrieved and filtered to only extract NGS specific data. For this study, we define NGS as the period starting with the first day of the first 3 consecutive days with ground snow coverage and ending with the first day of the first three consecutive days with bare ground. The corresponding dates along with the associated NGS durations are provided in Supplementary Table S1.
Values of NEE were derived from CO2 fluxes measured by EC with a three-dimensional sonic anemometer thermometer (model 1012R3 prior to September 1, 2000, and model R3-50 thereafter; Gill Instruments Ltd., Lymington, UK) and closed-path H2O/CO2 gas analyzer (model LI-6252 until September 1, 2000, LI-6262 until January 1, 2004, and LI-7000 thereafter; LI-COR Inc., Lincoln, NE) located 3 meters above the bog surface 41 . High frequency data (10 Hz prior to 2004, 20 Hz thereafter) were used to average fluxes over a 30-minute period. Details on the flux calculation are given by Roulet et al. 41 ; they involved accounting for air density fluctuations using either the WPL procedure 43 or high frequency calculation of CO2 and H2O mixing ratios. Although no spectral corrections were applied, a correction factor of 1.25 was applied to the CO2 fluxes prior to 2004 to account for the changes in EC instrumentation in 2004, which improved high frequency response times 41 . A detailed analysis of uncertainties associated with EC data acquisition is presented by Massman and Lee 44 , including unreliable spectral nighttime corrections, high frequency attenuation biases, and 2D and 3D advective effects. At any given time, NEE is the sum of the turbulent CO2 fluxes and the rate of change in CO2 storage below the EC tower calculated from the concentrations of CO2 measured by the EC instrumentation. We define positive NEE as net release of CO2 from the ecosystem while negative NEE represents net CO2 uptake by the ecosystem. Thus, NEE represents the difference between gross primary productivity and heterotrophic plus autotrophic respiration (together referred to as ecosystem respiration). Note, that although primary productivity of vascular plants during the NGS is usually negligible, this may not be the case for peat-forming mosses (see section 4.2). NEE fluxes were removed from the dataset in case of instrument malfunction, statistics outside of acceptable limits, and when the nighttime friction velocity dropped below 0.1 m s -1 .

Variable Selection
The initial dataset containing 68 variables (67 predictor variables and 1 response variable) was subjected to a four-step variable selection methodology ( Figure 1). The methodology was designed to reduce the number of variables represented in the dataset, leaving only those that exert the greatest influence on the NGS-NEE of CO2. Only these remaining predictor variables were then used in the creation of the machine learning (ML) model.

Correlated Variables
As a first step, collinearities between variables were examined by computing Pearson's linear coefficients of correlation (ρ) between pairs of variables. The correlation matrix ( ) of for each pair of variables ( , ) was obtained using the following equations: where represents the covariance between variables ( ) and (y), ̿ and ̿ are the mean values of the variables ( ) and ( ), respectively, and σ is the standard deviation for each variable. Note that in M pairwise correlations between the same variables, i.e., ( , ) and ( , ), equal unity. Pairs of variables with | | ≥ 0.75 were identified and labeled as correlated variable 1 (CV1) and correlated variable 2 (CV2). The average correlation between CV1 and the remaining 67 variables was then calculated. The process was repeated for CV2. The CV with the largest average correlation with the remaining 67 variables in the dataset was removed. Using this approach, 40 predictor variables were removed. Figure 2 provides a visual representation of variable collinearities prior to the application of the variable selection methodology. For variables in which the linear correlation coefficient could not be calculated due to missing observations or a standard deviation of zero, a Not a Number (NaN) placeholder was assigned.

Missing Data and Near Zero Variance
In order to reduce biases in the development of the ML model, no gap-filling or data imputation methods were applied. Variables with more than 50% of observational values missing were removed from the dataset. With this threshold, 9 variables were eliminated. The remaining 19 variables were screened and removed if their variance approached zero. Variables with minimal to no variation in their values contribute minimally to the model learning process. They are also likely to be of lesser relevance in predicting future NEE than variables prone to change 45 . Variables were therefore discarded if the fraction of their unique values was low, and the ratio of the mode to the second most common value (defined as the frequency factor) was high 46 . Specifically, in this study, predictors with <10% of their total observed values being unique and a frequency factor greater than 20 were removed, as proposed by Kuhn and Johnson 46 . Using these criteria, 1 predictor was removed leaving 18 remaining predictors.

Random Forest Permutation
A Random Forest Ensemble Model (RFM) was used to rank the importance of each of the variables remaining after the selection procedure. The RFM consisted of an out-of-bag variable importance estimation algorithm modified for regression analysis according to Breiman 47 . The RFM was trained on the observations for each of the remaining variables ( ) using 500 learner trees. Surrogate splits helped reduce biases due to the presence of missing observations. To further minimize bias, an interaction-curvature test was applied during the training process such that tree splits were made on predictors that minimize p-values from pairwise chi-squared ( 2 ) tests of independence between the predictor and the response 48 .
To estimate the importance of each variable, predictions based on the data used for the training of the RFM were compared to observations not used in the creation of the trees (i.e., the out-of-bag samples). The mean-square error (MSE) between predictions using the training data and the outof-bag samples was determined and referred to as the out-of-bag error ( ). The observations of each variable ( ) were randomly permuted and a prediction ( ) obtained. The model error for each observation (εmj) was calculated by comparing to the out-of-bag samples. The mean difference (∆ ) between and εmj, as well as the standard deviation (σj) of the differences in permuted values over all learners were calculated. The predictor importance was then expressed as: where measures the importance of each predictor : a larger value of represents a greater importance. Based on the value of , soil temperature and moisture (at 20 cm depth below the hummock surface), wind direction and speed, air temperature, net radiation (above the canopy), and the upwelling photosynthetic photon flux density (PPFD) were identified as the final predictor variables for use in developing the NGS-NEE ML model.

NGS-NEE Model Development
Further data pre-processing standardized the values of each of the final predictor variables between 0 and 1. A K-means clustering algorithm was applied to n =15,267 observations to collapse and group the dataset into 122 representative centroids. Each centroid contained the statistical information of individual clusters hence allowing for increased training speeds by reducing the number of data points 49 . The number of clusters used in this analysis was determined through applying the K-means clustering algorithm for increasing values of K ( ∈ ℤ + , 1 ≤ ≤ ⌈√ ⌉) until 99% of the variance in inter-cluster Euclidean distances between datapoints and individual clusters was explained.
A wide range of regression algorithms (linear, tree-based, support vector, gaussian process-based, and ensembles) were trained and validated using a 10-fold cross-validation strategy on 85% of the entire dataset (training dataset). The 10-fold cross-validation strategy randomly partitioned the data into 10 segments of roughly equal size, and systematically trained data on 9 of the 10 partitioned folds while using the remaining 10 th segment to estimate model performance. The process was repeated until all 10 of the segments had been used to evaluate model performance. The 10-fold cross validation strategy was run 1,000 times, with each run consisting of a newly partitioned training (85%) and testing dataset (15%).
Model performance was evaluated based on the root-mean square error (RMSE) and coefficient of determination (r 2 ) averaged over the 1,000 runs of the 10-fold cross validation: represents the sum of squared errors between the predicted ( ) and observed NEE ( ) and ̿ represents the mean observed NEE flux over the entire data record.
Model testing using the remaining 15% of the dataset (testing dataset) was evaluated using the mean r 2 and RMSE values over 1,000 runs, with each run consisting of a uniquely partitioned testing and training dataset. Comparing the computed RMSE and r 2 values for all trained models, an Epsilon Insensitive Support Vector Machine Regression (ε-SVM) model was selected because it had the lowest RMSE and highest r 2 values. An additional advantage of a ε-SVM model is its relatively simple structure and implementation. Support Vector Machines comprise a set of Kernel-based machine learning algorithms for classification and regression analyses 50 . With few model parameters, SVMs have been shown to mimic the performance of the best artificial neural networks 51 that have been used extensively in modeling CO2 fluxes 36,37 . Moreover, the computational complexity of SVM regressions does not depend on the dimensionality of the input space making for an attractive choice for large datasets with many variables 52 . The trained ε-SVM uses a quadratic kernel, with hyperparameters determined based on the interquartile range of the observed NGS-NEE.

Sensitivity Analysis
A moment-independent global sensitivity analysis (GSA) was conducted based on PAWN 53 . Unlike other density-based sensitivity analyses, PAWN uses empirically computed cumulative density functions (CDFs) rather than probability density functions (PDFs) to calculate sensitivity indices (SI). Given the non-normal distribution of the observed NGS-NEE (Supplementary Figure  S1), the variance of the NGS-NEE distribution would not adequately capture the model uncertainty 54 . Hence, PAWN offers a more natural choice for the calculation of SIs.
We implemented PAWN using the SAFE MATLAB Toolbox, an open-source toolbox developed by F. Pianosi, F. Sarrazin, and T. Wagener (https://www.safetoolbox.info/). Modifications to this toolbox were made to handle the non-parametric distributions unique to the underlying distribution of the variables used in this study (Supplementary Figure S1). Sensitivity indices were calculated as the distance between the conditional and unconditional CDFs. Here, this distance was approximated using the Kolmogorov-Smirnov statistic. Unconditional CDFs were calculated by varying all input variables simultaneously, while conditional CDFs were obtained through varying all input variables expect one, which was allowed to vary within a specified range of values (conditioning intervals). Supplementary Figure S2 provides a visual representation of the empirically derived CDFs through a regional sensitivity analysis.
The GSA used n=8,000 samples obtained via Latin Hypercube Sampling in five conditioning intervals, yielding a 2,000 bootstrapped 95% confidence interval. A dummy variable in the GSA served as a control against which the SIs of other variables were compared. SI values range from 0 to 1, with zero representing null effects (no sensitivity) and 1 representing the largest sensitivity to modelled NEE. For a more detailed description of PAWN, the reader is referred to Zadeh et al. 55 , Pianosi and Wagener 53,56 , and to Noacco et al. 57 for examples of the use of the SAFE toolbox.

Climate Projections
Three future climate scenarios were considered: RCP2.6, RCP4.5 and RCP8.5. These scenarios describe the climate trajectories associated with stringent, moderate, and high degrees of radiative forcing, respectively 58 . For each RCP, the values of the most sensitive model variables (soil temperature and moisture, air temperature, wind speed and direction, upwelling PPFD, and net radiation above canopy; see section 3.3) were altered to match the predicted changes in air temperatures under the given RCP. Mean NGS-NEE CO2 effluxes were then calculated with the ML model using the predicted temporal trends of the 7 predictor variables for the 2021-2100 period.
Winter air temperature projections for the Ottawa region under each RCP were obtained from statistically downscaled (10 km spatial resolution) predictions from 24 global climate models participating in the fifth phase of the Coupled Model Intercomparison Project (CMIP5). The obtained data were previously statistically downscaled using the Bias Correction/Constructed Analogues with Quantile mapping version 2 (BCCAQv2) algorithm 59,60 . Changes in soil temperatures were assumed to be 1°C lower than the projected changes in air temperature based on the work of Zhang et al. 61 and Wisser et al. 62 . Winter wind speeds were kept constant for all RCPs as predictions by Jeong and Sushama 63 and McInnes et al. 64 showed little to no increase in winds speeds in the Ottawa region, in contrast to the predicted increases in regions at higher latitudes in Canada.
Predicted changes for upwelling PPFD and soil moisture were obtained as outputs from the Second Generation Canadian Earth System Model (CanESM2). Predicted reductions in upwelling shortwave radiation were used as proxy for reductions in upwelling PPFD, given that photosynthetically active radiation falls partly in the shortwave radiation spectrum. It is recognized that changes in upwelling radiation are difficult to predict due to complex interactions between changing snow-cover fractions 9 , sea ice-snow albedos 9,65 and aerosol concentrations 66 . Net changes to the model predictor variables under each RCP by the end of the 21 st century are presented in Table 1.

Model Performance and Validation
Model performance, both validation and testing, of predicted NGS-NEE is illustrated in Figure 3. Model validation and testing performance metrics varied slightly with each new training cycle. Averaged over n=1,000 model training cycles, the cross-validation results showed a mean RMSE = 0.09 and mean r 2 = 0.60, implying that the model explained most of the observed variability of NGS-NEE. Model testing results further implied good model performance across the testing data with RMSE = 0.08 and r 2 = 0.65.

Model Dependencies
The effects of each of the 7 predictor variables on the modelled NGS-NEE are shown in the partial dependence and individual conditional expectation plots of Figure 4. In most simulations, the modelled NGS-NEE showed net positive responses to increases in wind speed, soil moisture content, as well as air and soil temperatures. Wind direction exhibited little to no net influence on modelled NGS-NEE, and upwelling PPFD a slightly negative effect. By contrast, net radiation above the canopy exhibited a strong negative effect on the predicted NGS-NEE. This negative control can be attributed to increased photosynthetic CO2 uptake with increased net radiation.

Sensitivity Analysis
The calculated SIs provide further insight into the relative influence of the 7 predictor variables on NGS-NEE estimates ( Figure 5). The results implied that net radiation above the canopy has the greatest influence on modelled NGS-NEE with a mean SI of 0.98 (0.98-0.99, 95% confidence interval). Wind direction was the least sensitive variable, with a mean SI of 0.08, that is, only slightly larger than that for the dummy (control) variable, which had a mean SI of 0.03. Soil temperature exerted a pronounced influence on modelled NGS-NEE with a mean SI of 0.72, consistent with soil temperature as a major control on subsurface organic C mineralization and, hence, production of CO2. Wind speed and soil moisture content were of moderate to high importance (mean SIs = 0.62 and 0.61, respectively) and a lower influence of upwelling PPFD. Table 2 summarizes the mean values and ranges of all the SIs.

Climate Change Predictions
Under all three emission scenarios, NGS-NEE CO2 fluxes were predicted to increase over the remainder of the 21 st century ( Figure 6). That is, the Mer Bleue Bog will act as a stronger source of CO2 during the NGS, even under the lowest radiative forcing scenario (i.e., RCP2.6). As also shown in Figure 6, the predicted future NGS-NEE trends for the three climate scenarios diverged most significantly after 2050.
Under the high radiative forcing scenario considered (RCP8.5), the mean NGS-NEE CO2 emission rates at the end of the century would rise to 0.62 μmol m -2 s -1 (0.60-0.63; minimum and maximum values from n=1,000 runs of the model). Compared to the observed mean NGS-NEE of 0.31 μmol m -2 s -1 for the period 1998-2010, NGS-NEE under RCP8.5 therefore exhibited roughly a 2-fold increase by 2100 (mean increase of 103%). If the stringent emission reduction measures and policy shifts are implemented to limit radiative forcing to 2.6 Wm -2 by the year 2100 (RCP2.6), NGS-NEE values at the Mer Bleue Bog site would only increase by 17% (approximately 6.2 times less than under RCP8.5). Under the stabilization climate scenario of RCP4.5, NGS would increase by 48% to 0.45 μmol m -2 s -1 . Table 3 summarises both mid-century (2050) and end of century (2100) predicted NGS-NEE CO2 fluxes under the three RCPs.

Discussion
Our results for the Mer Bleue Bog site support the hypothesis that environmental changes accompanying climate warming have the potential to increase CO2 emissions from peatlands during the NGS, hence potentially strengthening a positive climate feedback loop. As expected, the predicted NGS-NEE increase is highest for the climate scenario with the largest radiative forcing, and vice versa. The predictor variables for NGS-NEE further imply that both subsurface and surface processes play important roles in modulating the net CO2 fluxes during the NGS. The subsurface controls are likely linked to the decomposition of soil organic matter, which is known to strongly depend on soil temperature and moisture. Aboveground turbulent transport of CO2 and energy balances, in turn, depend on meteorological conditions and snow cover dynamics 67,68 . In the following discussion special attention is given to the influence of snow on both surface and subsurface processes modulating NGS-NEE of CO2.

Subsurface processes
According to the global sensitivity analysis (GSA), at the Mer Bleue Bog site soil temperature is the second most sensitive predictor variable of NGS-NEE ( Figure 5), with a positive influence on the net emission of CO2 (Figure 4). This positive effect is consistent with the positive temperature dependence of soil microbial respiration, even when soil temperatures approach or drop below zero during the winter 18,69,70 . In the climate projections considered in this study, soil temperature changes were assumed to be controlled by the predicted changes in air temperature, with an imposed 1°C attenuation (section 2.6). The actual relationship between air and soil temperatures during the NGS may be far more complex, however, largely due to the influence of snow coverage on the belowground thermal regime 71,72 . Even at the shallow depth of 20 cm, Mer Bleue Bog peat rarely freezes despite mostly subzero winter temperatures (Supplementary Figure S2). This can be explained by the insulating effect of the snowpack, with peak annual snow depths ranging from 30 to 120 cm for the 1998-2010 period. With the projected decreases in snow depth and fractional snow-cover for the Ottawa region 9 , the more exposed soils could experience greater heat loss and, consequently, cooler peat temperatures. More detailed future projections of the changes in soil temperature will have to take into account the possible confounding effect of reduced snow cover. Snow depth and coverage also alter the mechanisms and rates of soil-atmosphere gas exchanges, with variable contributions of diffusive and non-diffusion transport 73 . A lower fraction of snowcovered soil diminishes the gas transport barrier and enhances the release of CO2 while, at the same time, facilitating the influx of molecular oxygen (O2) which fuels the production of soil CO2 by aerobic respiration. However, surface heterogeneities, including hummock and hollow peat microforms, may cause differential snow accumulation patterns that influence the overall effect on NGS-NEE of a reduction in snow cover. Such small-scale processes are a source of uncertainty not taken into account in our analysis.
Climate warming may also increase in the frequency of freeze-thaw events, especially at the start and near the end of the NGS. Many recent studies have linked freeze-thaw cycles to variations in CO2 emissions [74][75][76][77] . Recurring freezing and thawing alter the physical and biological processes that contribute to winter soil respiration [78][79][80] . In addition, freezing causes ice layers to form in the snowpack and in the near-surface peat resulting in ice encasement 81 .The formation of ice layers traps CO2 produced or stored in the peat and snowpack, while thawing of the ice results in the release of the trapped CO2 to the atmosphere.  25 and Schindlbacher et al. 83 .
In our GSA and ML modeling, we rely on the soil temperature and moisture content data recorded at 20 cm depth. This is justified because most decomposition of plant debris (and hence CO2 production) in peatlands occurs within the upper 10-20 cm 84 . In addition, the temporal records of both soil properties exhibited sufficient variability at 20 cm depth to train the ML model. At the Mer Bleue Bog site, temperature and moisture are higher deeper in the soil profile by the end of the NGS. However, the lack of fresh plant residues and O2 at these depth limits microbial mineralization 85,86 and, thus, variations in temperature or moisture below 20 cm are unlikely to have a strong impact on the NGS-NEE CO2 fluxes.
Because of its mid-latitude location (45°24′N), the Mer Bleue Bog site will continue to experience a high frequency of near freeze-thaw cycles over the remainder of the 21 st century 87 . The implications of changes in freeze-thaw frequency and timing for the instantaneous and cumulative CO2 emissions during the NGS, at this site and across peatlands in general, remain to be fully understood. Similarly, the length of time during winter when the soil is continuously frozen probably affects the NGS-NEE, in line with Humphreys et al. 38 who showed smaller winter CO2 emissions for peatlands further north in the Hudson Bay Lowlands compared to the Mer Bleue Bog. A fully predictive understanding of CO2 production and effluxes during the NGS will require additional characterization of the temporal variations in physical and biogeochemical properties and processes of the snowpack and underlying peat.

Surface processes
The most sensitive predictor variable for NGS-NEE is net radiation measured above the canopy ( Figure 5). With increasing net radiation, the modelled net CO2 emissions decrease and NEE eventually switches to net CO2 uptake. Net radiation in the NGS, particularly in early spring (February-March), strongly depends on the ground snow coverage. Upwelling PPFD also exerts a negative effect on NGS-NEE. With wavelengths between 400-700 nm, the PPFD encompasses part of the shortwave radiation spectrum and provides energy to support photosynthetic CO2 fixation. Similar to net radiation, upwelling PPFD at Mer Bleue Bog is greatest in early spring as days get longer and a large fraction of incoming radiation is reflected from the snow surface.
The strong negative correlation between NGS-NEE and net radiation and, to a lesser extent, upwelling PPFD (Figure 4) may be the result of patchy snow accumulation and melting, which exposes Sphagnum-covered hummocks to light thus enabling photosynthesis 88 . Sphagnum growth has been shown to depend on various environmental factors related to snow coverage, including the timing of snow onset and retreat, the amount of snow, mid-winter thawing and drifting snow conditions 89,90 . Non-negligible primary productivity by bryophytes may thus have important, but largely unexplored, consequences for estimating peatland NGS-NEE of CO2.
With projected decreases in snow-water equivalents and snow-cover fractions at Mer Bleue Bog over the 21 st century 9 , interesting implications arise for advective sensible heat transfer between bare soils and snow patches 91,92 . Rain-on-snow events may similarly have significant impacts on NGS-NEE. With a greater proportion of NGS precipitation falling as rain [93][94][95] , rain-on-snow events can result in wetter soils and reduced or removed snow coverage. Under these conditions, photosynthetic activity during the NGS may play an increasingly important role in the annual C budget of the Mer Bleue Bog and, by extension, other temperate and boreal peatlands 96,97 .
The final sensitive predictor variable is wind speed which correlates positively with NGS-NEE (Figures 4 and 5). A positive relationship is expected as turbulent exchanges of CO2, heat and moisture are driven by forced convection during the NGS when the surface is snow covered and the atmosphere remains neutral or stable. In addition, wind-induced ventilation of the snowpack can cause pulse emissions of CO2 stored in the snow and the porous and unsaturated near-surface peat 96,97 . Turbulent exchanges of heat and moisture play important roles in governing the snowpack energy balance 98,99 .Wind can therefore be indirectly linked to subsurface CO2 production processes through its effects on energy fluxes into the ground.

Some Cautionary Notes
As for ML in general, the development of the NGS-NEE model was only possible because the dataset for the Mer Bleue Bog site was large and diverse enough to enable the model learning process. Applying the same methodology to other locations is thus contingent on access to sufficient data records. In addition, a limitation of the ML approach is that only predictor variables included in the available dataset can be selected. Important variables not explicitly represented in the dataset may therefore remain hidden. Such variables may include, for example, the structure and pH of the peat, groundwater flow rates and pathways, and the timing and the frequency of rain-on-snow events.
Furthermore, the inferred sensitivity ranking and effects of the predictor variables, and the projected future changes in NGS-NEE, are site-specific and cannot be automatically extrapolated to other locations. Rather, the Mer Bleue Bog provides a reference site against which differences in the importance of surface and subsurface processes (sections 4.1 and 4.2) across peatlands can be hypothesized to lead to differences in NGS-NEE CO2 fluxes and their sensitivity to environmental change. For instance, compared to the Mer Bleue Bog, extensive permafrost at higher latitudes results in unique thermal and hydrologic regimes regulating CO2 exchanges of peatland ecosystems 38,100,101 . For these peatlands, the future trajectories of NGS-NEE will undoubtedly be impacted by permafrost thaw due to climate warming [102][103][104] .
The results of the sensitivity analysis and ML modeling are also affected by how the NGS is defined (section 2.2). Our results actually point to non-negligible photosynthetic CO2 fixation during the NGS, in order to explain the role of net radiation on the observed NGS-NEE (section 4.2). We speculate this may primarily reflect the onset of Sphagnum growth in early spring when snow cover is still present. To address the ambiguities related to the definition of the NGS, research should focus on a finer-resolution (say, monthly) analysis of the variations in NGS-NEE from late fall to early spring. The more detailed temporal trends of NGS-NEE should in turn yield more robust estimates of the contributions of the winter and shoulder seasons to the annual NEE of peatlands.
The projected future trajectories of NGS-NEE CO2 emissions are based on estimating how the 7 predictor variables will change under a set of radiative forcing scenarios. Future NGS-NEE trajectories will also be affected by other environmental drivers, including, for instance, future shifts in vegetation and fauna (including invasive species), and changes in land use and water management. Continued global warming may also increase the absorption of CO2 by peatlands during the growing season while decreasing the length of the NGS. Together, the changing NEE of growing and non-growing seasons will ultimately determine the evolving status of a given peatland as either a net CO2 sink or source.

Conclusions
The NGS-NEE CO2 fluxes at Mer Bleue Bog over the period 1998-2010 can be reproduced by taking into account 7 environmental variables: near-surface soil temperature and moisture, wind speed and direction, air temperature, net radiation above the canopy, and upwelling (i.e., reflected) photosynthetic photon flux density. Of these 7 predictor variables, net radiation and wind direction are the most and least influential, respectively. The significant effects of soil temperature and moisture are expected due to their roles in the subsurface production of CO2. The effects of net radiation and upwelling PPFD is attributed to photosynthetic CO2 uptake by Sphagnum mosses during the NGS, while wind speed controls the release and aboveground transport of soil CO2. In turn, most of the predictor variables are themselves influenced by variations in the spatial distribution and depth of the snow cover. The mean NGS-NEE CO2 fluxes at Mer Bleue Bog site are projected to increase during the 2021-2100 period, reaching values by the end of the 21 st century that are 17, 48 and 103% higher under the RCP2.6 (low), RCP4.5 (medium) and RCP8.5 (high) radiative forcing scenarios, respectively. Thus, in a warmer world, the Mer Bleue Bog site will act as a stronger source of CO2 during the NGS.

Author Contributions
A.R and F.R worked together in developing the variable selection methodology, creating the model, and led the preparation of the manuscript. E.R.H, P.V.C, W.L.Q, and K.W discussed the results and contributed to writing and reviewing the manuscript and have approved the final version.

Code Availability
Code used for the global sensitivity analysis, feature selection methodology, development of the model, and future predictions is available on request.    Table 3 52 Table 3. Predicted mean NGS-NEE CO2 fluxes in years 2050 and 2100. 53