Deciduous fractional layers: We developed deciduous fraction layers using random forest regression models in a supervised-learning approach (Supplementary Fig. 3). We used Landsat satellite data24 from Landsat 5 TM, Landsat 7 ETM+, and Landsat 8 OLI sensors to derive deciduous fractional cover and tree canopy cover layers at 30-meter spatial resolution. While the Landsat datasets are publicly available to download through the United States Geological Survey (USGS), we used Google Earth Engine39 (GEE) to pre-process these datasets via GEE python application programming interface (API). We used Landsat 5, 7, and 8 Collection 1 surface reflectance (SR) datasets for years 1987–1997, 1998–2002, 2003–2007, 2008–2012, and 2013–2018 to derive normalized difference vegetation index (NDVI)-based median-value image composites for nominal years 1992, 2000, 2005, 2010, and 2015, respectively. We used NDVI to derive median-value composites for spectral bands which were then used to calculate other indices. These image composites were prepared for early spring, mid-summer, and fall seasons to identify key differences in deciduous and evergreen green-up amplitudes (Supplementary Table 2; Supplementary Fig. 4). In addition to Landsat surface reflectance bands, we added several vegetation indices derived from Landsat bands as additional features for the random forest regression model: 1) normalized difference vegetation index (NDVI), 2) normalized difference wetness index (NDWI), 3) soil-adjusted vegetation index (SAVI), 4) variable atmospherically-resistant index (VARI), and 5) enhanced vegetation index (EVI). With the three seasonal composites and addition of slope, elevation, and aspect as topographical variables from GMTED2010 dataset40, the image composite layer-stack consisted of 36 bands or features (Supplementary Fig. 5). The GMTED dataset was used to exclude any discontinuities in the topographical data due to latitude, or other factors.
We obtained ground samples for modeling deciduous fraction from 1) the Cooperative Alaska Forest Inventory (CAFI)41, 2) the Bonanza Creek Long Term Ecological Research (BNZ LTER) samples in Alaska42, 3) repeatedly-measured permanent sample plots (PSPs)43 from the Canadian provinces of Yukon, Northwest Territories, British Columbia, Alberta, Saskatchewan, Manitoba, and Ontario, 4) ground sample plots from Northwest Territories44, and 5) Canadian National Forest Inventory45 plot data for sites across Canada (Fig. 1 and Supplementary Table 1). These ground samples include specie-specific basal areas at each site. We converted the specie-specific basal areas to deciduous fraction by taking ratio of deciduous tree basal area to total tree basal area to be used as the response variable in random forest26 regression models.
We used the RandomForestRegressor module in scikit-learn python package as our random forest regression model. We divided the sample locations geographically into zones: 1) east and 2) west (Supplementary Fig. 6) due to differences in spatial density of ground samples, significantly different model outputs, and large variations in importance of input features in the random forest regression models. In addition to geographically distinct random forest models, we trained two temporally distinct random forest models for each zone: 1) a 3-season model, using spring, summer, and fall seasonal composites and 2) a 1-season model, using summer seasonal image composite. The 1-season model used summer season image composite as input and was applied to pixels where all three seasons spring, summer, and fall were not available resulting in gaps in the 3-season model output. To reduce bias in the random forest models due to data imbalance because of the large number of evergreen sample plots, we resampled each zone’s sample set by binned under-sampling to create a near uniform distribution of the response variable – deciduous fraction (Supplementary Fig. 7). The random forest models were then parameterized using a multi-dimensional grid-search approach with number of trees (n_estimators), maximum number of features per tree (max_features), minimum number of samples per split node per tree (min_samp_split), and minimum number of samples at each leaf node in a tree (min_samp_leaf) as parameters (Supplementary Table 3). For each combination of the parameters, we parameterized 1,000 5-fold cross-validated model runs up to a total of 30 million models to perform a grid search for parameter sets resulting in the highest mean cross-validated r-squared value, low root mean-squared error (RMSE) and low standard deviation in RMSE (Supplementary Table 3). The random forest regression models were then applied to tiled Landsat image composites. Landsat derived seasonal composites bands that are highly correlated (absolute value of correlation coefficient > 0.95; Supplementary Fig. 8) were not used in random forest models as inputs. We used variable importance plots to assess and validate bands used in the random forest models (Supplementary Fig. 9). We also used one standard deviation of all samples in output leaves across all the decision trees in the random forest regression model as a measure of uncertainty for deciduous fractional cover layers. Additionally, the samples not used in training the model were used to provide an estimate of variance explained by the RF model using adjusted r2 value and model performance using root mean squared error (RMSE) value (Supplementary Table 3). Spatial autocorrelation effects in validation samples were reduced by resampling all the validation plot locations via gridding and using a minimum distance threshold of 1 km.
Tree canopy cover layers
The tree canopy cover layers were developed using a random forest modelling framework similar to deciduous fraction (Supplementary Fig. 3). The training sample were selected in a stratified random manner, with a stratum for each tree canopy cover level and were extracted from the Global tree cover data for 201025. We then used the tree canopy cover sample locations to extract band values from the 3-season 36 band image layer-stack. Similar to deciduous fraction modelling approach, we used the RandomForestRegressor module in scikit-learn python package as our random forest regression model to derive tree canopy cover layers. We also used a 1-season model to fill in the gaps in output of the 3-season model. However, we did not divide the study area into different zones for tree canopy cover. We used 30% of the ground samples that were selected to build the models exclusively for model validation for tree canopy cover layers. The parameterization of the model was performed similar to the deciduous fractional layer approach (Supplementary Table 3). To determine uncertainty in tree canopy cover value we again used one standard deviation of all the samples in output leaves across all the decision trees in the random forest regression model.
Assessing change and albedo-based radiative forcing: We used spring (March – May), summer (June – August), and fall (September – October) season image-composites of the blue-sky albedo product for 2000, 2005, 2010, and 2015 to assess changes in albedo along with deciduous fraction and tree canopy cover. In each seasonal mean-value composite, we used solar zenith angle > 70 deg in valid data values. Areas with low post-fire tree cover were included in our analyses because they had higher pre-fire tree cover values. To examine the strength of relationship between albedo, deciduous fractional cover, and tree canopy cover, we modelled albedo for spring (March – May), summer (June – August), and fall (September – October) using albedo data from 2000, 2005, and 2010 in a random forest supervised-learning approach (Supplementary Fig. 3). We extracted stratified random samples within each 0.1 albedo interval to train the random forest regression model from the blue-sky albedo product seasonal composites. We parameterized the random forest model by optimizing r-squared and RMSE values using a multi-dimensional grid-search approach to determine the best set of random forest regression parameters: n_estimators, max_features, min_samp_split, and min_samp_leaf (Supplementary Table 4). We multiplied the difference in our modeled albedo between the two epochs 2015 and 2000 by the “all-sky” CESM-CAM5 albedo kernel28 to compute radiative forcing. We extracted mean values of positive and negative change in deciduous fraction, tree canopy cover, blue-sky albedo composites, and radiative forcing to assess changes inside fire perimeters and across the entire domain. We then used fire boundaries obtained from Alaska Large Fire Database46 and Canadian National Fire Database47 for fires that occurred between 1950 and 2018 to assess change. We used > 500,000 stratified random samples to assess the validity of deciduous canopy layers by examining the relationships between deciduous fractional cover, tree canopy cover, and uncertainties (Supplementary Fig. 10).
Although our deciduous fractional cover and tree canopy layers capture the variation in tree-based forest composition across the boreal domain (Supplementary Fig. 11), they do not specifically consider the understory component of the boreal forest. In our analysis, we included trees that are above the height of 1.5 m. This excluded understory vegetation component may contribute to the unexplained variance in the deciduous fraction regression model. Another limitation of this analysis is the dependency on multi-spectral data to extrapolate deciduous fraction and tree canopy cover across the boreal domain. Since we trained the regression model on the spectral reflectance of tree-based deciduous fraction, there may be locations where the spectral reflectance of shrubs and other vegetation types resembles that of the treed components. While we used tree canopy cover as a masking layer to only include treed forest in our study, locations with high shrub density may be modelled as having higher tree canopy cover than the threshold, making them difficult to exclude. Further, per-pixel deciduous fraction does not categorically depict vegetation cover. While 30 m spatial resolution is fine enough to represent homogeneous vegetation, this pixel size may include mixed vegetation types such as deciduous shrubs and evergreen trees. Such mixing can occur for mid to lower tree canopy cover resulting in some pixels having a deciduous fraction value different from forest stands on the ground. Lastly, due to lack of remote sensing data for an entire fire cycle, which is typically of the order of 80-150 years19, we analyzed variations in older fire scars across a time-period of 15 years to assess changes in forest composition and forcing in early to mid-successional post-fire periods. We did not analyze fires in late-successional period in this study due to lack of reliable fire perimeter data from fires that occurred 80-150 years ago. However, our assessment of changes in forest composition and forcing for the entire North American boreal domain provides an indication of variations in such late-successional fire perimeters. We also were not able to integrate albedo and associated radiative forcing estimates for winter months. However, we note the direction of albedo changes and forcing in winter tends to follow that in spring and fall20, and, as such, we do not expect this to alter our conclusions. Finally, we were unable to account for potentially earlier snowmelt in spring due to warming during the analysis periods, which would otherwise lead to more biophysical warming21.
We used Python 3.7.5 programming language for all workflows in this analysis. Data extraction from the raster layers was performed using custom packages and codes written in python programming language. All statistical and machine learning analyses were performed using python packages numpy 1.15.0, scipy 1.2.0, and scikit-learn 0.23.2.