Southwestern ponderosa pine forest patterns following wildland fires managed for resource benefit differ from reference landscapes

Managers aiming to utilize wildland fire to restore southwestern ponderosa pine landscapes require better understanding of forest cover patterns produced at multiple scales. Restoration effectiveness of wildland fires managed for resource benefit can be evaluated against natural ranges of variation. We describe landscape patterns within reference landscapes, including restored and functioning ponderosa pine forests of northern Arizona, and compare them to wildland fires managed for resource benefit. We make comparisons along a gradient of extents and assess the effects of scale on landscape differences. Using Sentinel-2 imagery, we classified ponderosa pine forest cover and calculated landscape metrics across a gradient of landscape extent within reference and managed landscapes. We used non-parametric tests to assess differences. We used random forest models to assess and explore which landscape metrics were most importance in differentiating landscape patterns. Restored forests exhibited landscapes patterns consistent with those of ecologically intact forest landscapes. Managed wildfire landscape patterns differed significantly when compared to reference landscape patterns among nearly all landscape metrics considered and became increasingly different with increasing landscape extent (15–840 ha), tending towards both denser and larger patch areas. Landscape patterns from wildland fires managed for resource benefit we examined differ from those of reference landscapes. Differences become more pronounced with increasing landscape size. Landscape patterns among large managed forest landscapes suggest that the predominately single-entry, low-severity disturbance regime from these managed fires is failing to reduce tree densities and break up large contiguous areas of canopy cover.


Introduction
In northern Arizona, USA, an extensive history of fire exclusion and the subsequent densification of ponderosa pine (Pinus ponderosa var. scopulorum P. & C. Lawson) forest conditions have led to the need for landscape-scale restoration (Allen et al. 2002;Moore et al. 2004;Churchill et al. 2013). Climate change, uncharacteristic wildfire, and other disturbances operating over spatial extents of thousands of hectares challenge public land managers to develop landscapescale restoration plans. Restoration efforts typically utilize a combination of mechanical treatments, such as tree thinning to introduce forest structure compatible with natural fire disturbance processes, yet these treatments can be costly and limited in scale and are not suitable for all lands (Hjerpe et al. 2009;Reynolds et al. 2013;North et al. 2015). Silvicultural prescriptions for mechanized restoration approaches are often guided by reference information, or descriptions of intact forest structure and processes Reynolds et al. 2013). Such reference information is commonly used as a baseline against which effectiveness of various restoration approaches are evaluated (Morgan et al. 1994;White and Walker 1997;Landres et al. 1999). For southwestern ponderosa pine forests, natural ranges of variation have been well described for fine-scale (\ 10 acres) structural attributes, but there has been little work done to quantify landscape patterns among mid-(10-1000 acres) to landscape-([ 1000 acres) scales of ecologically functioning forest stands and large restored sites (see Reynolds et al. 2013). An approach moving beyond typical plotbased comparisons is needed to capture the variability of landscape patterns across such a gradient of landscape scales and examine the current state of forest structure across the region.
Using naturally occurring wildland (henceforth ''managed wildfire'') fire as an additional or complementary treatment alternative to mechanical treatment has seen increased interest (North et al. 2012;Huffman et al. 2017Huffman et al. , 2020. As frequent low-severity surface fire plays an integral role in regulation of structure and function in these ponderosa pine ecosystems, managed wildfires may provide a cost-effective approach for restoring contemporary forest conditions, including reduction in small-diameter tree densities, lower hazardous fuel loads and potential fire behavior, and increased ecological resilience (Hunter et al. 2011;Ager et al. 2017;Fitch et al. 2018;Barros et al. 2018;Huffman et al. 2020). While low-severity fire will provide some benefits, including maintenance of healthy forest structure, such as tree density, without mechanical intervention higher severity fire is needed to alter forest structure to reduce tree density and break up continuous canopies (Hunter et al. 2011;Hagmann et al. 2013;McCauley et al. 2019;Owen et al. 2020). Moderate burn severity among both single-and multiple-entry managed wildfires meet more restoration objectives related to ponderosa pine forest structure, fuels, and ecological function, than areas in other burn severity classes, in northern Arizona and across western North America (Collins et al. 2011;Lydersen et al. 2013;Huffman et al. 2017Huffman et al. , 2018Huffman et al. , 2020Stevens et al. 2017;Kane et al. 2019). However, the operational reality within the mosaic of multiuse lands is such that fire managers are typically averse to the risks associated with moderate-severity burning (Young et al. 2019(Young et al. , 2020, resulting in the majority of reported treated acres falling into unburned and low-severity fire, which generally lack the ability to alter forest structure (Stephens et al. 2016;Huffman et al. 2017Huffman et al. , 2018Huffman et al. , 2020Weber and Yadav 2020). Given the contemporary structure of southwestern ponderosa pine forests, large-scale low-intensity fire results in a homogenized disturbance landscape pattern, as opposed to the more varied patchwork of disturbance age and intensity that would have existed historically (Mast et al. 1999;Fulé et al. 2007;Reynolds et al. 2013). The relationship of fire severity and resulting forest structure has largely been studied at fine-scales using fixed area plots, typically less than one acre (Sánchez Meador et al. 2009Meador et al. , 2011Huffman et al. 2017Huffman et al. , 2018. To place managed wildfire landscapes in context, undisturbed, functioning, or restored landscapes (hereafter referred to as reference landscapes) can provide estimates for a natural range of variability of landscape structure (White and Walker 1997;Fulé et al. 2002a;Collins et al. 2007Collins et al. , 2017. Contemporary restored forest conditions are particularly helpful for deriving landscape-scale reference information, as landscape metrics derived from reconstructed, small-extents (0.04-1 ha) are notoriously difficult to extrapolate to larger extents (e.g. 800 ha) due to the highly dynamic disturbance regime of in frequent fire forests over larger scales and lack of large-scale empirical evidence(Sánchez Meador et al. 2011;Barth et al. 2015;Cannon et al. 2018). Today, precious few landscapes exist in the Southwest we would consider ecologically intact, but notable exceptions include remote areas in national parks and wilderness areas, as well as large landscape with ongoing monitoring efforts (Roccaforte et al. 2010;Strahan et al. 2015;Stoddard et al. 2018). For this study, we chose to compare both ecologically functioning and ecologically restored landscapes, which together provide more comprehensive estimates of the natural range of variability of landscape metrics examined here and baseline information for evaluating outcomes of managed wildfires in southwestern ponderosa pine forests.
Quantifying forest structure over large areas is only possible through remote sensing approaches, which have a well-established role within forest research. Specifically, medium resolution (10-100 m) multispectral satellite imagery such as the Landsat family of satellites (30 m) or the Sentinel-2 satellites (10-20 m), provide frequent image coverage that can be used to monitor forest cover and dynamics (Eidenshink et al. 2007;Wulder et al. 2012;Li and Roy 2017). While we cannot directly measure tree density or tree size or age distribution, those components of forest structure are directly correlated to forest cover (Donager et al. 2021), allowing us to gain insight in to the overall structure of the forest through a snapshot from above. The quantification of forest structure from inter-stand (* 15 ha) to landscape scales ([ 400 ha) is sensitive to the grain of the data used to quantify landscape metrics (Wasserman et al. 2019). Imagery from the Sentinel-2 satellites are medium-resolution (10-20 m), multispectral (12 spectral band) datasets and provide wall-to-wall coverage every 5 days on average (Li and Roy 2017). While sacrificing fine-scale (e.g., 1-3 m) detail compared to higher-resolution datasets, which are often also limited to four multispectral bands, Sentinel-2 imagery allowed for increased discrimination of land cover types through integrating spectral and temporal information into the classification scheme (Li and Roy 2017).
The intent of this study was to quantify forest structure through classification of forest cover from remotely-sensed imagery across a gradient of landscape scales to better understand spatial patterns within wildland fires managed for resource benefit and reference landscapes, consisting of restored and functioning ponderosa pine forests. From the forest cover classification, we calculated a suite of landscape metrics to quantify landscape patterns. We then used a series of random forest models to better understand which components of landscape structure are driving the differences between these two landscape types and the scales at which those differences manifest. To accomplish this, we focused on three specific objectives: (1) quantify and test differences in landscape metrics calculated across a gradient of landscape extents among managed and reference landscapes, including restored and functioning landscapes (2) determine which landscape metrics are most important for distinguishing landscapes and identify how this relationship changes with landscape extent, and (3) interpret the management implications of multi-scale landscape metric differences among landscape types.

Study area
Our study focused on two classes of landscapes within northern Arizona among federally controlled lands ( Fig. 1): reference and wildfire managed for resource benefit. Reference landscapes included both intact and experimental restoration landscapes. We included four large reference landscapes (\ 840 ha) and seven small (* 10-15 ha) experimental restoration areas in the study (Table 1). We compared these collectively described reference landscapes to 16 recent and sufficiently large managed fires which fit the study criteria ( Fig. 1, Table 1). The sites encompassed a range of elevation, soil type, and topography, which in turn influences local climate. Average yearly precipitation ranges from 385 to 825 mm (1st and 99th percentiles of precipitation) and average temperatures range from 6.1 to 12.0°C (1st and 99th percentiles of temperature) when considering values within the distribution of ponderosa pine across the study area (PRISM Climate Group, accessed May 2020). Elevation within the distribution of ponderosa pine range from 1346 to 3449 m across the study extent. International soil classifications across the study area are generally Luvisols and Kastanozems (Hengl et al. 2017). We limited our study to the distribution of the ''Rocky Mountain Ponderosa Pine Woodland'' forest type land cover according to Southwest ReGAP data (SWReGAP 2005), as this represents forest cover  Restoration sites were only used for comparisons to managed fire landscapes at the smallest extent. The percent area by fire severity (U: unburned, L: low, M: moderate, and H: high) was calculated from Monitoring Trends in Burn Severity (MTBS); dominant soil groups are an international soil classification for each area from SoilGrids data (Hengl et al. 2017); and mean climatic variables are 30-year averages from PRISM (PRISM Climate Group 2020). Reference landscape mean elevations were not significantly different from resource objective wildfire mean elevations according to Wilcoxian rank sum and Kolmogorov-Smirnov tests, while restoration site elevations were significantly higher in elevation dominated by ponderosa pine, but can include other components, including Gambel oak (Quercus gambelii). For a managed wildfire to be included in the study, the bounding polygon needed to be classified predominantly (2/3) as this forest type. The four large reference landscapes for this study included Mt. Trumbull, a restored experimental landscape initially treated with a thin/harvest treatment in 1996-1997 followed by initial prescribed fire and subsequent maintenance burning in 2000(Roccaforte et al. 2010; and Powell Plateau, Rainbow Plateau and Fire Point, three landscapes considered intact and ecologically functioning as logging and fire suppression have not occurred in large part due to their remoteness on the North Kaibab Plateau (Fulé et al. 2002a. Small experimental restoration sites (see Table 1) were treated using evidence-based ecological restoration guidelines (''ERG'' sensu Tuten et al. 2015) wherein the resulting forest structure was targeted to be consistent with site-specific historical conditions and ranged (see Fulé et al. 2001Fulé et al. , 2002b in size from 8 to 15 ha (Table 1). These experimental sites located within Centennial Forest, Fort Valley Experimental Forest on the Coconino National Forest, and the Tusayan Ranger District (Fig. 1) of the Kaibab National Forest. All reference landscapes and restoration sites are dominated by ponderosa pine (P. ponderosa), but also include varying amounts of Gambel oak (Q. gambelii).
We selected recent managed wildfires for analysis which met the following criteria: (1) size ([ 405 ha), (2) year of occurrence between 2015 and 2018, (3) identified as being managed for resource benefit from the Forest Service Activity Tracking System (FACTS) database and (4) had a vegetation cover type dominated ([ 66%) by ponderosa pine, as classified by the Southwest ReGAP land cover classification (Lowry et al. 2007). We cross-referenced selected fires with the Monitoring Trends in Burn Severity (MTBS) dataset to ensure the dates, locations, and perimeter of wildfires matched (MTBS 2017). A total of 16 wildfires matched all criteria ( Fig. 1, Table 1). Fire weather and fuels information for each fire on the date of ignition is included in the Online Appendix (A). We used FireFamily Plus (4.2) to summarize fuels and weather conditions for the first 30 days following ignition of the wildfires (Bradshaw and McCormick 2009). Historical weather data summarized was from Happy Jack (ID 020291), Flagstaff (ID 020209), and Tusayan, Arizona (ID 020207) Remote Automated Weather Stations. As our focus for this study was on ponderosa pine forest, we removed any fire meeting our criteria but not having a majority proportion of ponderosa pine forest cover (\ 66%). Among the total burned area covered by the 16 wildfires, approximately 90% was classified as unburned and low severity, while medium and high severity classifications were 8.3% and 1.7%, respectively (www.mtbs. gov).

Canopy cover classification
We used seasonal image composites of Sentinel-2 satellite imagery to classify the landscape into four categories: evergreen canopy cover, deciduous canopy cover, non-tree cover and water. As our areas of interest were limited in extent, this simple classification scheme proved to be adequate to isolate ponderosa pine cover from deciduous cover, particularly Gambel oak. We built the seasonal image composites in Google Earth Engine (Gorelick et al. 2017) using Sentinel-2 level 2A multispectral imagery calibrated to surface reflectance for the year 2019. These images are radiometrically calibrated and geographically referenced. For each seasonal composite, we calculated the median value for each pixel after masking out cloud and cloud shadow pixels from the image stack using all available imagery in a time period that spanned 1 month prior to and following solstice and equinox dates. We conducted all subsequent analysis, including random forest classification models of the image composites in R using the ranger package (ver 0.12.1; Wright and Ziegler 2017) for random forest classification. Rasters representing elevation (http:// ned.usgs.gov), derived slope and aspect as well as a soil parent material classification (Theobald et al. 2015) were included in the classification. We maintained the pixel resolution of the classification and resulting grain of landscape analyses at 10 m based on the Sentinel-2 imagery by down sampling coarser resolution image bands using a smoothing filter to match the resolution of the visible and near-infrared bands.
We built training and testing datasets of labeled point locations through manually examining NAIP imagery (2017) in combination with summarized Sentinel-2 image stacks to ensure selected points were not in an incorrect pixel due to geographic offsets of the two datasets. The non-tree category was an allinclusive ''other'' category which, for example, included training points representing bare ground, herbaceous cover, and roads. All training and testing points were located inside of a 1 km buffered area of reference landscape and wildfire perimeters. A total of 450 training points were chosen, with approximately equal quantities for each category (* 130 points) other than water (59 points).
The resulting canopy cover classification had an accuracy of 99.7% based on out-of-bag sampling, or compared tothe training labels, from the random forest model creation. A manual accuracy assessment was conducted using250 randomly sampled points generated within 1 km of reference landscapes and managed wildfire perimeters (Table 2). Overall accuracy was less than the random forest out-of-bag estimate, but still high at 93%. Mean Usersaccuracy was 91% while mean producer's accuracy was 92%.
Sampling along a gradient of landscape extents From our classified map, we calculated landscape metrics along a gradient of subsampled extents randomly generated within the reference and wildfire landscape perimeters. This approach allowed us to move beyond the traditional dependency on plotbased comparisons and undertake a more holistic assessment of each landscape. As ponderosa pine was the dominant component of the landscape, and the cover type forest managers are interested in manipulating, we eliminated all other cover classification types before calculating metrics (Fig. 2). While we calculated a total of 55 class-level landscape metrics using the R package landscapemetrics (ver 1.4; Hesselbarth et al. 2019), we chose to focus on twelve metrics which encompassing a range of relevance to land management. Metrics analyzed were patch area and edge, core area, and several shape and aggregation metrics (see Table 1). A complete list of class-level landscape metrics and corresponding summaries for al focal landscapes (55 metrics) are included in Online Appendix B. For each landscape extent from that of restoration sites (* 15 ha) to the largest managed wildfire extent (* 15,000 ha), we calculated all classlevel landscape metrics. For the purposes of this research, we specifically focus on a set of landscape metrics directly applicable to landscape-scale forest management among area and edge metrics (proportion of landscape, mean and standard deviation of patch area, and edge density), core area metrics (core area proportion of landscape, mean and standard deviation Table 2 Accuracy assessment of classification outputs using randomly sampled validation points (n = 250) located inside and within 1 km of the reference and managed wildfire landscapes included in the study. User's accuracies (right side column), producer's accuracies (bottom row) and overall accuracy (bottom-right) suggest high overall classification accuracies. Relatively lower accuracies associated with deciduous cover likely results from understory greenness obfuscating the signal. The grey color highlights counts of correctly classified validation points User's accuracies (right side column), producer's accuracies (bottom row) and overall accuracy (bottom-right) was high. Lower accuracies associated with deciduous cover likely results from understory greenness obfuscating the signal of the core area index), shape metrics (mean and standard deviation of the shape index), and aggregation metrics (patch density, normalized landscape shape index, and the division index). For further details, see McGarigal et al. (2012). We compared restoration sites to larger reference landscapes by randomly sampling 15 ha extents from within the reference landscape boundary to create a distribution of metrics at that same extent, allowing sampled extents to overlap at random (Fig. 2B). In the same way, we compared reference landscapes to managed wildfires among intervals of landscape extents by subsampling both within reference landscapes and managed wildfire boundaries and comparing the resulting distribution of landscape metrics. In this way, comparisons were made at each of those extents so as to avoid metric comparisons sensitive to extent (Wasserman et al. 2019). The maximum extent for comparison was approximately 800 ha, which was determined by the maximum size of available reference landscapes within the study area. We generated subsampled extents by randomly generating point locations within an inward-buffered landscape and buffering those points by the appropriate radius to achieve the desired extent which was then clipped by the landscape boundary (Fig. 2). The size of the inward buffer is a function of the radius of the desired subsample extent. The number of points generated is a function of the desired subsample extent area and the total area of the landscape, whereby for each individual landscape, we divided the total area by the target sampled extent, multiplied that resulting number by three, and chose that many random samples within that landscape with an interior buffer to reduce sampled areas falling too much outside the landscape extent Those sampled points were then buffered to the correct target extent. Resulting extent polygons were then filtered by removing buffered polygons whose centroid fell within a threshold number of resulting polygons (C 5% of the total subsampled extents within the landscape) to promote the independence of the metrics calculated within a given landscape. As a final check, subsampled extents were removed if they were less than 75% of the target landscape extent area.

Comparisons of landscape metrics
Our analysis resulted in three general strata of landscape metrics: (1) inclusion of restoration sites at approximately 15 ha extent, (2) comparisons to reference landscapes up to approximately 800 ha extent, and (3) values only resulting from large wildfires up to approximately 15,000 ha extent (Fig. 2). Managed wildfires varied in size, from 452 to 14,529 ha while reference landscapes were smaller overall, ranging from 8 to 800 ha (Table 1). We only compared metrics calculated at similar extents, for each extent considered. Comparisons of metrics drawn from different landscape types were made using the Kolmogorov-Smirnov test (Marsaglia et al. 2003) in addition to the Mann-Whitney test (Sijtsma and Emons 2010). For both tests, significance was determined by an alpha value smaller than that of the appropriate Bonferroni correction (a = 0.01). The two-sample Kolmogorov-Smirnov test is a Fig. 2 Example of A 100 ha sampled extents and B 15 ha sampled extents for the Mt. Trumbull reference landscape. The dark dashed line shows the inward buffering used in the process of creating the sampled extents, within which random points were generated (red dots). And then buffered and clipped to the landscape boundary. Sampled extents were removed if their centroid intersected many other polygons (determined relative to the total number of sampled extents). Extents were removed if their resulting area was less than 75% of the target extent area nonparametric test of how similar the shapes are for two continuous distributions are through a comparison of the empirical distribution function and the cumulative distribution function of the two samples, respectively (Marsaglia et al. 2003). The Mann-Whitney test is also a nonparametric rank sum test which examines the probability than any single sample is drawn from differing distributions (Sijtsma and Emons 2010).
Determining metric importance across scale using random forest classifiers Finally, to understand which landscape metrics were the most important in discriminating reference and managed wildfire landscapes and how those differences changed with scale, we implemented a series of random forest classification models and retrieved variable importance, model accuracy, and sample sizes for each model (n = 11). Random forest models were built using the 12 landscape metrics as predictors of landscape type (managed vs. reference). Models were built using 10,000 trees and assessing half of the landscape metrics at each tree node and individual landscape metric importance was assessed using a permutation approach within the random forest model. Variable importance values in combination with model accuracy were used to infer how individual landscape metrics are more or less differentiable across the scale gradient applicable to both reference and managed wildfire landscapes. Kendall rank correlation coefficients among the variables are included as an Online Appendix (C).

Comparison of small restoration sites to reference landscapes
We initially compared landscape metrics among restored (15 ha) and reference landscapes (\ 800 ha) as separate groups. However, testing showed that restored and reference landscapes were not statistically different from one another at our smallest landscape extent (Fig. 3). Because our analysis showed that these sites were consistent with functional landscapes when randomly subsampled at a similar extent (* 15 ha) from both Kolmogorov-Smirnov and Mann-Whitney test results we collapsed these categories into the singular 'reference' landscape category for further analysis. Of the metrics we examined, only edge density displayed significant differences (Fig. 3). Elevations of the smaller restored sites were greater than the managed wildfire landscapes on average, according to Wilcox and Kolmogorov-Smirnov tests. However, they were not significantly different from larger reference landscapes.

Comparisons of landscape metrics among extents
Similarities among restoration and reference landscape metrics justified combining them into a single ''reference'' class (see section 'Comparisons of landscape metrics'). Most landscape metrics were significantly different between reference and wildfire landscapes across the gradient of all extents tested (Fig. 4). The entirety of landscape metric values (55 metrics) are included as an Online Appendix (B). Among most landscape metrics we focused on, the distribution of metric values was significantly different according to both statistical tests and differences tended to increase as sampled extent increased. Of particular note were proportion of the landscape, (i.e., canopy cover), mean and standard deviation of canopy patch area, all three aggregation metrics, and the standard deviation of the shape index (Fig. 4). Median canopy cover in managed landscapes decreased with scale from 70.5% (15-ha extent) to 60.8% (840-ha extent) and continued to less than 50% at the largest wildland fire extent (15,000-ha). Similarly, canopy cover on reference landscapes decreased from 48.8% (15-ha) to 28.8% (840-ha). Differences in canopy cover between landscapes were statistically significant across all extents tested. Median patch area similarly decreased with scale, from 2.17 ± 3.32 ha (15-ha extent) to 1.5 ± 22.4 ha (840-ha extent) among managed landscapes and from 0.52 ± 1.45 ha (15ha extent) to 0.24 ± 1.78 ha (840-ha extent) among reference landscapes. The opposite relationship was true for patch density, which increased from 33.2 ha -1 (15-ha extent) to 40.9 ha -1 (840-ha extent) among managed landscapes and from 84.1 ha -1 (15-ha extent) to 122.2 ha -1 (840-ha extent) among reference landscapes.
Of the remaining landscape metrics, the distribution of landscape metric values among reference and managed wildfire sampled extents were significantly different among smaller landscape extent comparisons but became non-significantly different from one another as sampled landscape extent increased according to both comparison tests (Fig. 4). However, the majority of metric distributions were significantly different according to the Kolmogorov-Smirnov test. The notable exception was edge density, for which both tests were not significant at any sampled extent but differed significantly according to the Kolmogorov-Smirnov test for sampled extents of 240 ha and smaller (Fig. 4). While there was no Fig. 3 Boxplot comparisons of restoration treatment samples (lighter blue; n = 57) to reference landscapes (darker blue; n = 148) sampled at approximately 15-ha extents for A area and edge metrics, B core area metrics, C aggregation metrics and D shape metrics. Restoration boxplots are outlined in black if comparison tests were significant (p \ 0.01), and semi-transparent otherwise. The only significant difference was for edge density. Test results are displayed above the x-axis with asterisks for significance in each of the comparison tests (Kolmogorov-Smirnov, Mann-Whitney) and a point otherwise comparison for managed wildfire extents greater than 840 ha, among the majority of landscape metrics the trend established among smaller extents continued through the largest extent of 15,000 ha, with metrics tending smaller or larger in accordance with values extracted from smaller sampled extents (Fig. 4).

Landscape metric importance across scale
Overall, relative importance of the twelve landscape metrics for distinguishing between managed wildfires and reference sites decline with increasing landscape extent with greater importance among scales less than Fig. 4 Comparison of landscape metrics among a gradient of extents within reference landscapes (blue) and managed wildfires (red) for area and edge (Area), core area (Core), aggregation (Agg.), and shape (Shape) metrics. While we show metrics for the largest extents of wildfires, comparisons to reference were only possible up to 840 ha extents. Individual boxplots are outlined in black if both comparison tests were significant (p \ 0.01), and transparent otherwise. Test results displayed above the x-axis with asterisks for each of the comparison tests (Kolmogorov-Smirnov, Mann-Whitney). Outlier values are included but transparent to highlight the main distribution of metric values 240 ha (Fig. 5A). Despite this, model accuracy increased to nearly 100% with increasing sampled extent, increasingly rapidly up to 240 ha (Fig. 5B, C). In other words, as landscape extent increases, our resulting models were better able to distinguish landscape type (reference vs. managed) given less data and increasingly similar variable importance among predictors. An inflection in predictive accuracy appears to occur at the 240-ha scale (Fig. 5).
Area and aggregation metrics were consistently the most important landscape metrics for discriminating between landscape types across all scales examined. Specifically, proportion of the landscape, mean patch area and the normalized landscape shape index were the most important among landscape extents smaller than 240 ha (Fig. 5). Among these intermediary scales (up to 240 ha), we observed a shift in which variables were most important from proportion of landscape and the normalized landscape shape index to patch area and patch density. Patch area and division index are the most important landscape metrics at the largest extent (840 ha). Shape metrics were consistently less important among all scales while edge density was the least important metric at all extents greater than 240 ha (Fig. 5).

Discussion
Through a landscape-scale assessment of forest canopy cover and landscape patterns, we found that restored forests exhibited patterns consistent with intact landscapes and that managed wildfires were largely unsuccessful at altering forest structure. We also found that homogeneity of landscape pattern increased with landscape scale, but homogeneity trends differed significantly among reference and managed wildfire landscapes. While aggregate differences are relatively minor at small scales, stemming from a high degree of variability, differences tended to increase with scale and the metrics defining those differences change in their importance as we moved to landscape scales. Although we initially explored a large suite of landscape metrics, we ultimately concentrated on twelve common metrics which easily translate to forest management and silvicultural prescriptions (Reynolds et al. 2013;Churchill et al. 2013;Rodman et al. 2016), were parsimonious in their description of landscape pattern (Cushman et al. 2008), and have been found to be useful in evaluating restoration treatment success (Dickinson et al. 2016;Ziegler et al. 2017;Huffman et al. 2017). Our Heatmap showing variable importance among random forest models for each extent from 15 to 840 ha (A), sample sizes for each landscape extent (B) and overall model accuracy for each scale (C). Variable importance was normalized to a 0-1 scale for ease of interpretation. Overall, variables became less important with increasing landscape extent while overall model accuracy increases to nearly 100%, despite having fewer samples to draw from. Area and aggregation metrics remain the most important metric classes among all models. Importance values were re-scaled from 0 to 1 by dividing by the maximum importance value from the model series approach to assessing differences across scale allowed us to examine those outcomes and provided some insight into the scales at which management would be necessary to address particular ecological processes and functions (Kerr and Ostrovsky 2003;Uuemaa et al. 2013;Wan et al. 2020) and could serve to help outline future restoration planning efforts (Churchill et al. 2013;Dickinson et al. 2016;Cannon et al. 2020;Wan et al. 2020).

Reference landscape patterns
Landscape patterns at restored sites around the 15-ha extent were consistent with those of intact ponderosa pine forest landscapes that had not experienced notable impacts of late nineteenth century livestock grazing and fire exclusion. Based on this initial result, we were able to combine restored and functioning landscapes into a single reference class for subsequent analysis. This is unsurprising, as the removal of trees during the process of restoration mimics the discontinuous canopy patterns we would expect from lowseverity, frequent fires over sufficiently long time periods. Mount Trumbull stands out as a unique landscape in the Southwest in that it was the only large, restored landscape in the region we could include in the study. Ambitious efforts are presently underway to apply restoration treatments across millions of hectares throughout the Southwest (Schultz et al. 2012), but much more needs to be treated to assist with the restoration of large landscapes. Similarity between restoration sites and intact landscapes suggests that restoration treatments, as implemented using the ERG method (see Tuten et al. 2015), have measurably promoted forest structural conditions consistent with reference landscapes, complementing many previous studies relying on plotbased measurements of forest structure (Mast et al. 1999;Fulé et al. 2002b;Waltz et al. 2003;Roccaforte et al. 2010Roccaforte et al. , 2015Tuten et al. 2015). Canopy proportion of the landscape, mean patch size and patch density were very similar among restored and functioning landscapes, with restored landscapes were comprised of smaller and denser canopy patches. Edge density was the only metric that showed significant differences between restored and intact landscapes. This single difference could result from the implementation of mechanical treatments which have operation limitations and target the restoration of functioning conditions through tree spacing and patch sizes, but fail to promote spatial complexity that stems from low-severity, frequent fire (Hargis et al. 1998;McIver et al. 2013).
Elevations of small restoration sites did prove to be higher than those of managed wildfires but did not significantly differ from elevations of large reference landscapes. While these landscapes tended to be slightly higher in elevation, on average, they do not persist at the upper limit of ponderosa pine distribution, but reside well inside the elevation band of ponderosa pine forests (Moir et al. 1997). It should also be noted that these larger reference landscapes only occur in the northern latitudes of our study area. While we desired to incorporate a greater number of reference landscapes representing a greater diversity of environmental conditions, this was simply not a possibility within the forests of our study area as they did not exist. These larger landscapes represent what we expect to see given the range of environmental and disturbance factors that would have existed historically across the region (Moir et al. 1997;Reynolds et al. 2013;Ritter et al. 2020). The structure of these reference landscapes is dependent upon the removal of trees and limits on establishment, through either mechanical (e.g. restoration activities) or disturbance (e.g. low-severity fire) mechanisms, coupled with frequently recurring fire to maintain those structural patterns Fulé et al. 2002a;Stoddard et al. 2021).

Comparisons to managed wildland fires
Landscape metrics derived from wildland fires managed for resource benefit were significantly different from those of reference sites across the gradient of landscape extents we examined. Managed wildfires exhibited greater canopy cover, larger patch size and corresponding lower patch density, and divergent shape complexity as compared with reference sites. Taken together, these differences continue to reflect the denser forest conditions typical of fire-excluded ponderosa pine forests that dominate a region where surface fire has been excluded for decades and thinning activities have not yet occurred . Dense forest conditions and larger patches of contiguous canopy on the managed wildfire sites reflect the management tendency towards unburned or low burn severity and limited re-entry, conclusions that parallel those drawn in previous studies examining managed wildfires in the Southwest . While these studies have suggested that some mixed severities may be necessary to alter forest structure, only one fire examined had greater than 10% of its area classified as moderate burn severity, and only two fires had more than 1% of total area classified as high severity. Current research has shown that the presence of high-severity conditions within a managed wildfire is correlated with landscape patterns within southwestern forests (Singleton et al. 2021). Interestingly, edge density was similar among all landscape types in our study, perhaps resulting from larger patches having a similar amount of patch edge to that observed for smaller, yet more complex, canopy patches (see Cushman et al. 2008) and perhaps suggesting edge density is not a good metric for assessing active forest management.
Larger contiguous patches of forest canopy within managed wildfires suggest that numerous restoration objectives, such as the reduction of canopy fuels and crown fire hazard, promotion of understory plant production and diversity through decreased resource competition from overstory trees, and increases in habitat suitability for wildlife species associated with forest edges, have not been effectively met Laughlin et al. 2008;Fulé et al. 2012;Reynolds et al. 2013). Homogeneity of forest canopy and lack of structural complexity also increase vulnerability of ecosystems to landscape-scale disturbances such as stand-replacing wildfire, bark beetle attacks, and density-dependent or drought induced mortality (Covington et al. 1997, Agee andSkinner 2005;Kolb et al. 2007;McWethy et al. 2019). Although, interconnected tree crowns and greater canopy cover are important for some wildlife species, such as the tassel-eared squirrel (Sciurus aberti), Mexican spotted-owl (Strix occidentalis lucida), and northern goshawk (Accipiter gentilis) (Ganey et al. 1999;Prather et al. 2006;Loberger et al. 2011;Tuten et al. 2015;Bagdon et al. 2016), severe loss of habitat due to stand-replacing fire remains of high concern to forest managers. Over time, repeated fire occurrence, especially if allowed to burn at severity levels that reduce tree density, can increase heterogeneity by breaking up larger forest stands and creating discontinuous, smaller tree patches as well as moderating understory tree growth into the overstory (Hargis et al. 1998;Covington et al. 2001;Hunter et al. 2011;Reynolds et al. 2013;Huffman et al. 2018;Kane et al. 2019). Thus, to meet objectives associated with ecological restoration, repeated low-to moderateseverity wildfires are necessary to disrupt continuous canopy structure typical of fire-excluded forests and restore spatial patterns more consistent with the natural range of variation (Allen et al. 2002;Reynolds et al. 2013;Huffman et al. 2017). If fires are consistently managed for minimal effects and low burn severity, time required to incrementally produce fine-grain structural patterns may be beyond typical planning horizons for public land managers (Collins et al. 2009;van Wagtendonk et al. 2012;Huffman et al. 2018;Yocom et al. 2019). Generally, fragmentation, which in fire-excluded ponderosa pine forests is likely to increase heterogeneity, has been shown to be linearly related to the proportion of the landscape disturbed (Hargis et al. 1998;McGarigal et al. 2012). First-entry wildfires can reestablish canopy structure consistent with reference landscapes if allowed to burn at some moderate severities Kane et al. 2019), but this is often not preferable when natural resource and other values are perceived to be at high risk (North et al. 2015).
Landscape patterns in southwestern ponderosa pine forests are a product of the dominant disturbance process: low-severity, frequent fire (Allen et al. 2002;Veblen 2003;Reynolds et al. 2013). In a functioning forest, such as our reference landscapes, this disturbance is sufficient to maintain low tree density and discontinuous canopy, but fails to do so once the age of dominant trees develop fire resistance and high tree densities promote canopy transmission of fire (Reynolds et al. 2013;Tarancón et al. 2014;Bottero et al. 2017;Ritter et al. 2020). Repeated low-severity wildfire will promote heterogeneity at small scales, but lead to homogenous landscape patterns at increasingly larger scales (Wasserman et al. 2019). Our results showed that among all landscape metrics, greater variability transitions to increased homogeneity of metric values with increases in landscape extent. While the range of landscape metric values became increasingly constrained with increasing extent, reference landscape metrics were generally more constrained than their managed wildfire counterpart, suggesting that heterogeneity may not have been a key element at all scales for reference landscapes. It is possible that we were not able to examine landscapes large enough to see an eventual increase in heterogeneity at larger landscape extents, exemplifying a key constraint to our study, the lack of large reference landscapes. Variability in patch area was the only metric examined which exhibited the opposite pattern of increased heterogeneity with increasing scale, and this was only among managed wildfire landscapes. This was likely due to increasingly large patches being included in summaries as extents became larger; the same was not true for reference landscapes.
The effect of scale on metrics We were surprised by the random forest model results showing that decreasing sample sizes among larger landscapes extents were associated with increasing model accuracy, coupled with an overall decline in variable importance (Cutler et al. 2007). These results suggest that the landscape types predicted in the model became increasingly differentiable with smaller amounts of data. In other words, as landscape extent increased, managed and reference landscapes became increasingly distinct. Increased homogeneity of landscape metrics with increasing landscape extent suggest that these landscape types homogenize with scale in different ways, which is also evident in the distinct trajectories of each landscape metric across the gradient of sampled extents. An inflection in both model accuracy and variable importance was apparent at the 240-ha landscape extent, coincidentally corresponding to a suggested definition of mid-scale landscape structure in southwestern ponderosa pine forests (Reynolds et al. 2013). The inflection suggests that landscapes are more variable among scales up to 240 ha, likely the result of fine-scale processes and management (Kalies and Yocom Kent 2016;Ritter et al. 2020), but become increasingly homogeneous following scales greater than this threshold, exhibiting diverging structural patterns.
Landscape metrics scaled predictably across the gradient of extents we examined, consistent with a type I scaling relationship (Wu and Hobbs 2002;Wu 2004). We expect this behavior in ponderosa pine forests due to the dominant disturbance regime of lowintensity wildfire, leading to increased homogeneity with scale (Wasserman et al. 2019). We see evidence for this as the range of most metric values became increasingly constrained as the scale of landscape extents grew, while tending towards divergent metric values for each landscape type. Understanding the behavior of ponderosa pine forest landscape patterns is necessary for scale-dependent ecological function and management objectives (Turner and Gardner 2015), including wildland fire behavior (Povak et al. 2018), habitat suitability (Timm et al. 2016;McGarigal et al. 2016;Wan et al. 2020), forest health and ecohydrology (Stevens 2017), and climate mitigation and carbon sequestration (McCauley et al. 2019). Shape metrics were consistently unimportant in our models, suggesting area and aggregation metrics may be more important for assessing treatment effectiveness and guiding future restoration plans.

Methods, limitations and next steps
Our classification of ponderosa pine forest cover had high accuracy according to both out-of-bag model training and an independent comparison of manual classifications. This was in part due to the relatively simple classification scheme as well as the inclusion of a seasonal component, which effectively discriminated between evergreen and deciduous vegetation. While the Sentinel-2 imagery can fail to detect single trees and small tree groups due to the size of the object of interest relative to the 10-m resolution of the imagery, the resolution is sufficient to detect mature individual trees or tree groups of ponderosa pine due to their average size (e.g. we expect a mature ponderosa to have a radius around 6 m) and thus appropriate for examining forest structure over large landscapes (Wasserman et al. 2019;Grabska et al. 2020;Schumacher et al. 2020). See Wasserman et al. (2019) for a comparison of landscape metrics resulting from a gradient of data resolution. Higher resolution data, such as lidar or airborne imagery, may more accurately capture smaller patch sizes; however, expense and availability of these types of datasets for regional landscapes is limited (Buyantuyev and Wu 2007;Wasserman et al. 2019). We suggest that methods used in our study have broad applicability for assessment and post-treatment monitoring of ponderosa pine forest structure in the Southwest (see also Botequilha Leitão and Ahern 2002; de Almeida et al. 2020). However, further research is needed to understand why metrics vary not only with extent, but also grain as interpretation of landscape structure and pattern are known to vary with grain (Hargis et al. 1998;Buyantuyev and Wu 2007;Corry and Lafortezza 2007;Rutchey and Godin 2009;Š ímová and Gdulová 2012;Wasserman et al. 2019;Wickham and Riitters 2019).
We were limited by potential reference landscape quantity, size, and abiotic variability; stemming from the widespread and prolonged effects of historical livestock grazing and fire exclusion Moore et al. 1999;Reynolds et al. 2013) and the limited large-scale restoration activities that have occurred to date. With these limitations of reference landscapes, we were unable to control for the entirety of potential variation in abiotic factors, specifically soil parent material, elevation, topography, climate, and disturbance history (Puhlick et al. 2012;Larson and Churchill 2012;Reynolds et al. 2013;Rodman et al. 2017). Sampling extents from reference landscapes were limited to a maximum of approximately 800 ha, while managed wildfire landscapes included in our analysis were as large as 15,000 ha. This discrepancy presents an inherent challenge for understanding reference landscapes of the Southwest, where managers desire to better understand the importance scale on structural metrics and where comparisons between contemporary and reference landscapes are instrumental to assessing treatment effectiveness. We hope to increase the geographic footprint of the study area in future research, but with that increased geographic scope comes the need to account for increasingly diverse sources of variability in both biotic and abiotic factors contributing to forest structure (Johnston et al. 2016;Rodman et al. 2017). The scarcity of reference landscapes reiterates the urgent need for accelerated large-scale forest management (Stephens et al. 2016;Cannon et al. 2018), but also highlights continued difficulties in managing for ecologically functioning, resilient forest landscapes under threat of climate change and competing land uses (Bradford and D'Amato 2012;Stanturf et al. 2014;Jacobs et al. 2015;Gleason et al. 2017;McCauley et al. 2019;Cannon et al. 2020).
While our smaller (\ 15 ha) reference sites tended to be slightly higher (\ 100 m) in elevation as compared to our managed wildfire elevations, these site elevations did not significantly differ from larger restoration landscapes, nor did the elevations of larger reference landscapes significantly differ from managed wildfire landscapes. The smaller (\ 15 ha) reference sites were only included in the smallest scale comparison to managed wildfire landscapes (1 of 11 comparison extents) and make up only four percent of the sample size at that scale. The managed wildfire landscapes we examined encompass a greater range of elevations compared to reference landscapes, but mean differences in elevation are less than 100 m, on average. We also did not include wildfires at the upper or lower elevational limits of ponderosa pine forest, having filtered potential fires to only include predominately ponderosa pine forest types based on an auxiliary land cover classification. Research suggests that within the limited regional extent included in the study, the managed fire landscapes we examined would have exhibited a structure prior to disturbance largely in line with that of reference landscapes (Mast et al. 1999;Allen et al. 2002;Williams and Baker 2012;Rodman et al. 2017).

Conclusion
In general, southwestern ponderosa pine forest landscape patterns resulting from wildland fires managed for resource benefit differ from reference landscape pattern at analogous scales. Ecologically functioning landscapes, both relict and restored, facilitated comparisons up to approximately 800 ha, a scale at which we found managed wildfire landscapes significantly differed from reference. At this scale and larger, landscape patterns examined in our study were found to become increasingly homogenous, most likely due to a lack of disturbance events, which allowed large, contiguous forest canopies to persist. Our results point to predominately low burn severity and limited reentry of wildfire leading to in minimal changes in forest structure, such that large scale landscape patterns are largely homogenized through continuous canopy cover. At the current rate of multiple re-entries in this landscape, low-severity fire would take an exceedingly long time to sufficiently alter forest landscape pattern. Repeated, mixed-severity fire or mechanical treatments are needed to move forest structure towards that of ecologically functioning landscapes through the reduction of tree density, which in turn alters the canopy and overall forest cover patch patterns. However, these mechanisms largely do not occur at rates sufficient to alter landscape patterns among the larger scales we examined. Understanding the scale-dependent relationships of management activities and ecological function with landscape patterns is necessary to improve management outcomes and fire policy. Greater emphasis on the restoration of forest structure across scales is needed to provide for resilient, healthy ponderosa pine forests.